#Java使用poi将office文件转为html ##一、前言 功能需求:上传office文档,并提供文件在线预览。

解决方案:

  • 使用Aspose.cells.jar包,将文档转换为pdf格式;
  • 使用libreOffice,将文档转换为pdf格式;
  • 使用poi将文档转换为html格式。

方案一,通过Aspose的方式,该功能是付费版,需要破解,所以是能抛弃。 方案二,使用libreOffice,需要安装使用libreOffice,linux还需要装unoconv,需要使用commons-io的pom依赖,之前maven官方库查询不到这个pom依赖所以放弃了这个方案,刚才准备查询资料时发现这个依赖已经可以使用,估计是前段时间maven官方库出现问题。 方案三,只需要添加所需的依赖就可以使用,但是转换出的html会有一些格式问题,等下会再下面讲到。

二、添加依赖

<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.12</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.document</artifactId>
        <version>1.0.5</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
        <version>1.0.4</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.core</artifactId>
        <version>2.0.1</version>
    </dependency>
</dependencies>

三、word文档转html

一般word文件后缀有doc、docx两种。docx是word2007以及以后版本文档的扩展名,doc是word2003文档保存的扩展名。对于这两种格式的word转换成html需要使用不同的方法。

1、转换问题: 转换后,对于2003版本word,自动生成的目录会显示有错误;2003版本和2007版本对于特殊字符都有可能显示不出来,不过问题不是很明显;文章中的图片一定要是常用的图片格式(jpeg,jpg,png等),不然无法显示。

2、2003版本word转换成html(.doc)

public static boolean word2003ToHtml(Map params) {
  
    logger.debug("***** word2003ToHtml start params:{}", params);
    try {

    //图片存放路径
    String fileImg = params.get("fileImg").toString();
  
    //转换html后,html中图片的url前缀
    String viewImgPath = params.get("viewImgPath").toString();
    
    //html文件
    File htmlFile = new File(params.get("htmlFile").toString());
    
    File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
    
    // 1) 加载word文档生成 HWPFDocument对象  
    InputStream inputStream = new FileInputStream(file);
    HWPFDocument wordDocument = new HWPFDocument(inputStream);
    WordToHtmlConverter wordToHtmlConverter =
        new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        
    //设置图片存放的位置
    wordToHtmlConverter.setPicturesManager(new PicturesManager() {
        public String savePicture(byte[] content, PictureType pictureType, String suggestedName, float widthInches, float heightInches) {
            File imgPath = new File(fileImg);
            if (!imgPath.exists()) {//图片目录不存在则创建
            imgPath.mkdirs();
         }
        File file = new File(fileImg + suggestedName);
        try {
            OutputStream os = new FileOutputStream(file);
            os.write(content);
            os.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        //这里可以指定word文档中图片的路径。
        return viewImgPath + "/" + suggestedName;
        }
    });
    
    //解析word文档
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();
 
    OutputStream outputStream = new FileOutputStream(htmlFile);

    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(outputStream);

    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer serializer = factory.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");

    serializer.transform(domSource, streamResult);
    outputStream.close();
    } catch (Exception e) {
    e.printStackTrace();
    return false;
    }
    return true;
}

3、2007版本word转换成html(.docx)

public static boolean word2007ToHtml(Map params) throws Exception {
    logger.debug("***** word2007ToHtml start params:{}", params);
    try {
        //转换html后,html中图片的url前缀
        String viewImgPath = params.get("viewImgPath").toString();
        
        //图片存放路径
        String fileImg = params.get("fileImg").toString();
        
        File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
        
        // 1) 加载word文档生成 XWPFDocument对象
        InputStream inputStream = new FileInputStream(file);
        XWPFDocument document = new XWPFDocument(inputStream);
        
        // 2) 解析 XHTML配置 (URIResolver来设置图片存放的目录)
        XHTMLOptions options = XHTMLOptions.create();
        options.URIResolver(new BasicURIResolver(viewImgPath));
        FileImageExtractor extractor = new FileImageExtractor(new File(fileImg));
        options.setExtractor(extractor);
        
        // 3) 将 XWPFDocument转换成XHTML
        File htmlFile = new File(params.get("htmlFile").toString());
        OutputStream outputStream = new FileOutputStream(htmlFile);
        XHTMLConverter.getInstance().convert(document, outputStream, options);
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}

四、Excel文档转Html

POI中将Excel转换为HTML方法仅能转换HSSFWorkBook类型(即03版xls),故可以先将读取的xlsx文件转换成xls文件再调用该方法统一处理

1、2003版本excel转换html(excel中不可包括图片,因为poi没提供将excel中图片转换的方法)

public static boolean excelToHtml(Map params) {
    try {
        String file = params.get("FILE_NAME").toString();
        String filePath = params.get("filePath").toString() + file;


        InputStream input = new FileInputStream(filePath);

        HSSFWorkbook excelBook = new HSSFWorkbook();

        //判断Excel文件将07+版本转换为03版本
        if (file.endsWith(EXCEL_XLS)) {  //Excel 2003
            excelBook = new HSSFWorkbook(input);
        } else if (file.endsWith(EXCEL_XLSX)) {  // Excel 2007/2010
            ExcelTransFormUtil xls = new ExcelTransFormUtil();
            XSSFWorkbook workbookOld = new XSSFWorkbook(input);
            xls.transformXSSF(workbookOld, excelBook);
        }

        ExcelToHtmlConverter excelToHtmlConverter = new ExcelToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        
        //去掉Excel头行
        excelToHtmlConverter.setOutputColumnHeaders(false);
        
        //去掉Excel行号
        excelToHtmlConverter.setOutputRowNumbers(false);

        excelToHtmlConverter.processWorkbook(excelBook);

        Document htmlDocument = excelToHtmlConverter.getDocument();

        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(outStream);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();

        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        outStream.close();

        String content = new String(outStream.toByteArray());

        FileUtils.writeStringToFile(new File(params.get("htmlFile").toString()), content, "UTF-8");
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
        return true;
}

2、将2007版本转换成2003版本

public void transformXSSF(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew) {
    HSSFSheet sheetNew;
    XSSFSheet sheetOld;

    workbookNew.setMissingCellPolicy(workbookOld.getMissingCellPolicy());

    for (int i = 0; i < workbookOld.getNumberOfSheets(); i++) {
        sheetOld = workbookOld.getSheetAt(i);
        sheetNew = workbookNew.getSheet(sheetOld.getSheetName());
        sheetNew = workbookNew.createSheet(sheetOld.getSheetName());
        this.transform(workbookOld, workbookNew, sheetOld, sheetNew);
    }
}

private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                     XSSFSheet sheetOld, HSSFSheet sheetNew) {

    sheetNew.setDisplayFormulas(sheetOld.isDisplayFormulas());
    sheetNew.setDisplayGridlines(sheetOld.isDisplayGridlines());
    sheetNew.setDisplayGuts(sheetOld.getDisplayGuts());
    sheetNew.setDisplayRowColHeadings(sheetOld.isDisplayRowColHeadings());
    sheetNew.setDisplayZeros(sheetOld.isDisplayZeros());
    sheetNew.setFitToPage(sheetOld.getFitToPage());

    sheetNew.setHorizontallyCenter(sheetOld.getHorizontallyCenter());
    sheetNew.setMargin(Sheet.BottomMargin,
        sheetOld.getMargin(Sheet.BottomMargin));
    sheetNew.setMargin(Sheet.FooterMargin,
        sheetOld.getMargin(Sheet.FooterMargin));
    sheetNew.setMargin(Sheet.HeaderMargin,
        sheetOld.getMargin(Sheet.HeaderMargin));
    sheetNew.setMargin(Sheet.LeftMargin,
        sheetOld.getMargin(Sheet.LeftMargin));
    sheetNew.setMargin(Sheet.RightMargin,
        sheetOld.getMargin(Sheet.RightMargin));
    sheetNew.setMargin(Sheet.TopMargin, sheetOld.getMargin(Sheet.TopMargin));
    sheetNew.setPrintGridlines(sheetNew.isPrintGridlines());
    sheetNew.setRightToLeft(sheetNew.isRightToLeft());
    sheetNew.setRowSumsBelow(sheetNew.getRowSumsBelow());
    sheetNew.setRowSumsRight(sheetNew.getRowSumsRight());
    sheetNew.setVerticallyCenter(sheetOld.getVerticallyCenter());

    HSSFRow rowNew;
    for (Row row : sheetOld) {
        rowNew = sheetNew.createRow(row.getRowNum());
        if (rowNew != null)
            this.transform(workbookOld, workbookNew, (XSSFRow) row, rowNew);
    }

    for (int i = 0; i < this.lastColumn; i++) {
        sheetNew.setColumnWidth(i, sheetOld.getColumnWidth(i));
        sheetNew.setColumnHidden(i, sheetOld.isColumnHidden(i));
    }

    for (int i = 0; i < sheetOld.getNumMergedRegions(); i++) {
        CellRangeAddress merged = sheetOld.getMergedRegion(i);
        sheetNew.addMergedRegion(merged);
    }
}

private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                     XSSFRow rowOld, HSSFRow rowNew) {
    HSSFCell cellNew;
    rowNew.setHeight(rowOld.getHeight());

    for (Cell cell : rowOld) {
    cellNew = rowNew.createCell(cell.getColumnIndex(),
        cell.getCellType());
    if (cellNew != null)
        this.transform(workbookOld, workbookNew, (XSSFCell) cell,
            cellNew);
    }
    this.lastColumn = Math.max(this.lastColumn, rowOld.getLastCellNum());
}

private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                     XSSFCell cellOld, HSSFCell cellNew) {
    cellNew.setCellComment(cellOld.getCellComment());

    Integer hash = cellOld.getCellStyle().hashCode();
    if (this.styleMap != null && !this.styleMap.containsKey(hash)) {
    this.transform(workbookOld, workbookNew, hash,
        cellOld.getCellStyle(),
        (HSSFCellStyle) workbookNew.createCellStyle());
    }
    cellNew.setCellStyle(this.styleMap.get(hash));

    switch (cellOld.getCellType()) {
    case Cell.CELL_TYPE_BLANK:
        break;
    case Cell.CELL_TYPE_BOOLEAN:
        cellNew.setCellValue(cellOld.getBooleanCellValue());
        break;
    case Cell.CELL_TYPE_ERROR:
        cellNew.setCellValue(cellOld.getErrorCellValue());
        break;
    case Cell.CELL_TYPE_FORMULA:
        cellNew.setCellValue(cellOld.getCellFormula());
        break;
    case Cell.CELL_TYPE_NUMERIC:
        cellNew.setCellValue(cellOld.getNumericCellValue());
        break;
    case Cell.CELL_TYPE_STRING:
        cellNew.setCellValue(cellOld.getStringCellValue());
        break;
    default:
   
    }
}

private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                     Integer hash, XSSFCellStyle styleOld, HSSFCellStyle styleNew) {
    styleNew.setAlignment(styleOld.getAlignment());
    styleNew.setBorderBottom(styleOld.getBorderBottom());
    styleNew.setBorderLeft(styleOld.getBorderLeft());
    styleNew.setBorderRight(styleOld.getBorderRight());
    styleNew.setBorderTop(styleOld.getBorderTop());
    styleNew.setDataFormat(this.transform(workbookOld, workbookNew,
        styleOld.getDataFormat()));
    styleNew.setFillBackgroundColor(styleOld.getFillBackgroundColor());
    styleNew.setFillForegroundColor(styleOld.getFillForegroundColor());
    styleNew.setFillPattern(styleOld.getFillPattern());
    styleNew.setFont(this.transform(workbookNew,
        (XSSFFont) styleOld.getFont()));
    styleNew.setHidden(styleOld.getHidden());
    styleNew.setIndention(styleOld.getIndention());
    styleNew.setLocked(styleOld.getLocked());
    styleNew.setVerticalAlignment(styleOld.getVerticalAlignment());
    styleNew.setWrapText(styleOld.getWrapText());
    this.styleMap.put(hash, styleNew);
}

private short transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                      short index) {
    DataFormat formatOld = workbookOld.createDataFormat();
    DataFormat formatNew = workbookNew.createDataFormat();
    return formatNew.getFormat(formatOld.getFormat(index));
}

private HSSFFont transform(HSSFWorkbook workbookNew, XSSFFont fontOld) {
    HSSFFont fontNew = workbookNew.createFont();
    fontNew.setBoldweight(fontOld.getBoldweight());
    fontNew.setCharSet(fontOld.getCharSet());
    fontNew.setColor(fontOld.getColor());
    fontNew.setFontName(fontOld.getFontName());
    fontNew.setFontHeight(fontOld.getFontHeight());
    fontNew.setItalic(fontOld.getItalic());
    fontNew.setStrikeout(fontOld.getStrikeout());
    fontNew.setTypeOffset(fontOld.getTypeOffset());
    fontNew.setUnderline(fontOld.getUnderline());
    return fontNew;
} 手动捂脸,有点太多了。大家凑合着看。。。

五、PPT文档转Html

就是将ppt转换成一张张图片再放入html。

注: (1)ppt中文字会有中文显示问题,可以将中文文字库加入到服务器中即可。 (2)pptx2007版本文件不能显示表格

1、2003版本ppt转html

public static boolean ppt2003Tohtml(Map params) {
    try {
       String imgPath = params.get("fileImg").toString();
       File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
       InputStream inputStream = new FileInputStream(file);
       SlideShow ppt = new SlideShow(inputStream);
       inputStream.close();

       Dimension pgsize = ppt.getPageSize();
       org.apache.poi.hslf.model.Slide[] slide = ppt.getSlides();
       FileOutputStream out = null;
       String imghtml = "";
       String viewImgPath = params.get("viewImgPath").toString();
       for (int i = 0; i < slide.length; i++) {
           logger.debug("第" + i + "页。");

           TextRun[] truns = slide[i].getTextRuns();
           for (int k = 0; k < truns.length; k++) {
               RichTextRun[] rtruns = truns[k].getRichTextRuns();
               for (int l = 0; l < rtruns.length; l++) {

               rtruns[l].setFontIndex(1);
               rtruns[l].setFontName("宋体");
               }
           }
           BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);

           Graphics2D graphics = img.createGraphics();
           graphics.setPaint(Color.BLUE);
           graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));
           slide[i].draw(graphics);

           // 这里设置图片的存放路径和图片的格式(jpeg,png,bmp等等)
           out = new FileOutputStream(imgPath + (i + 1) + ".jpeg");
           javax.imageio.ImageIO.write(img, "jpeg", out);
            
           //图片在html加载路径
           String imgs = viewImgPath + "/" + (i + 1) + ".jpeg";
           imghtml += "<img src=\'" + imgs + "\' style=\'width:960px;height:530px;vertical-align:text-bottom;\'><br><br><br><br>";
           DOMSource domSource = new DOMSource();
           StreamResult streamResult = new StreamResult(out);
           TransformerFactory tf = TransformerFactory.newInstance();
           Transformer serializer = tf.newTransformer();
           serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
           serializer.setOutputProperty(OutputKeys.INDENT, "yes");
           serializer.setOutputProperty(OutputKeys.METHOD, "html");
           serializer.transform(domSource, streamResult);
           out.close();
           String ppthtml = "<html><head><META http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body>" + imghtml + "</body></html>";
           FileUtils.writeStringToFile(new File(params.get("htmlFile").toString()), ppthtml, "utf-8");
       }
   } catch (Exception e) {
       e.printStackTrace();
       return false;
   }
   return true;
}

2、2007版本ppt转换html

public static boolean ppt2007Tohtml(Map params) {
    try {
      String imgPath = params.get("fileImg").toString();
      File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
      InputStream inputStream = new FileInputStream(file);
      XMLSlideShow ppt = new XMLSlideShow(inputStream);
      inputStream.close();

      Dimension pgsize = ppt.getPageSize();

      XSLFSlide[] pptPageXSLFSLiseList = ppt.getSlides();
      FileOutputStream out = null;
      String imghtml = "";
      String viewImgPath = params.get("viewImgPath").toString();
      for (int i = 0; i < pptPageXSLFSLiseList.length; i++) {
        try {
          for (XSLFShape shape : pptPageXSLFSLiseList[i].getShapes()) {
            if (shape instanceof XSLFTextShape) {
              XSLFTextShape tsh = (XSLFTextShape) shape;
              for (XSLFTextParagraph p : tsh) {
                for (XSLFTextRun r : p) {
                  r.setFontFamily("宋体");
                }
              }
            }
          }
          BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
          Graphics2D graphics = img.createGraphics();
          // clear the drawing area
          graphics.setPaint(Color.white);
          graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));

          // render
          pptPageXSLFSLiseList[i].draw(graphics);

          //
          String Imgname = imgPath + (i + 1) + ".jpeg";
          out = new FileOutputStream(Imgname);
          javax.imageio.ImageIO.write(img, "jpeg", out);
          //图片在html加载路径
          String imgs = viewImgPath + "/" + (i + 1) + ".jpeg";
          imghtml += "<img src=\'" + imgs + "\' style=\'width:960px;height:530px;vertical-align:text-bottom;\'><br><br><br><br>";
        } catch (Exception e) {
          System.out.println(e);
          System.out.println("第" + i + "张ppt转换出错");
        }
      }


      DOMSource domSource = new DOMSource();
      StreamResult streamResult = new StreamResult(out);
      TransformerFactory tf = TransformerFactory.newInstance();
      Transformer serializer = tf.newTransformer();
      serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
      serializer.setOutputProperty(OutputKeys.INDENT, "yes");
      serializer.setOutputProperty(OutputKeys.METHOD, "html");
      serializer.transform(domSource, streamResult);
      out.close();
      String ppthtml = "<html><head><META http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body>" + imghtml + "</body></html>";
      FileUtils.writeStringToFile(new File(params.get("htmlFile").toString()), ppthtml, "utf-8");
    } catch (Exception e) {
      e.printStackTrace();
      return false;
    }
    return true;
  }

六、预览

1、每次预览将源文件转换成html后,将html文件上传nginx服务器目录下,用nginx代理访问,以防出现浏览器缓存问题。 我这里是将html文件和图片路径打包,通过sftp上传nginx服务器,然后解压。 2、使用iframe标签访问html文件。这里需要注意的是预览pdf文件,会出现下载打印的按钮,如果需求是文件只可缓存不可下载,就不可以使用iframe标签预览pdf。