Reading Word Documents with POI

Author: 蝉鸣的雨 | Published 2018-03-25 20:17

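I recently built a Word document import feature, but the project was on a tight schedule, so the implementation was rough. When the weekend finally came, I spent some time on the code, hoping to turn it into a general-purpose utility that I can simply paste in the next time I need it.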

Overview

Origins of POI

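POI is an open-source Apache project. It was created to handle documents in file formats based on the Office Open XML standard (OOXML) and on Microsoft's OLE 2 Compound Document format (OLE2), with support for both reading and writing. It is arguably the first-choice tool for working with Office documents in Java.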

HWPF and XWPF

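The two main POI modules for working with Word documents are HWPF and XWPF.
HWPF is the standard API entry point for the binary Microsoft Word 97(-2007) format (.doc). It also provides limited read-only support for the older Word 6 and Word 95 formats.
XWPF is the standard API entry point for the OOXML format introduced with Microsoft Word 2007 (.docx).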
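
Because HWPF and XWPF read two different container formats, a reader has to pick the right module for each file. The demo further down decides by file extension; as a hedged alternative, the sketch below inspects the first bytes instead: OLE2 compound files start with the signature D0 CF 11 E0 A1 B1 1A E1, and OOXML files are ZIP packages starting with "PK\3\4". The class name `WordFormatSniffer` and the `Container` enum are hypothetical, and a match only identifies the container (an OLE2 file may also be .xls or .ppt, and a ZIP may be any archive), not that the file really is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: guesses the container format from a file's first bytes. */
public class WordFormatSniffer {

    /** OLE2 -> would be read with HWPF (.doc); ZIP -> would be read with XWPF (.docx). */
    public enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (binary .doc; also used by .xls and .ppt)
    private static final byte[] OLE2_MAGIC = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4" (every OOXML file is a ZIP package, but so is any .zip)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies an already-read header; arrays shorter than a signature never match it. */
    public static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream (consuming them) and classifies them. */
    public static Container sniff(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int total = 0;
        while (total < header.length) {
            int read = in.read(header, total, header.length - total);
            if (read < 0) {
                break; // end of stream: the file is shorter than 8 bytes
            }
            total += read;
        }
        return sniff(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The `InputStream` overload consumes the bytes it reads, so either reopen the file before passing it to `HWPFDocument`/`XWPFDocument`, or wrap it in a `BufferedInputStream` and call `mark(8)` before sniffing and `reset()` afterwards.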

Reading a Word document

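POI actually provides many APIs for reading and writing Word documents. This post only gives the simplest demo, reading the text content paragraph by paragraph; reading images and tables will be covered in a later update.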

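Maven dependencies (HWPF ships in poi-scratchpad and XWPF in poi-ooxml; Guava supplies the Lists and CharMatcher helpers used in the code)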

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>24.0-jre</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.17</version>
        </dependency>

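    import com.google.common.base.CharMatcher;
    import com.google.common.collect.Lists;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.List;
    import java.util.Locale;

    public class WordReader {

        // Assumes SLF4J is on the classpath (the original snippet used LOGGER without declaring it)
        private static final Logger LOGGER = LoggerFactory.getLogger(WordReader.class);

        /**
         * Reads a .doc or .docx file and returns its text, one entry per paragraph.
         * Note: CharMatcher.whitespace().removeFrom(...) deletes ALL whitespace, including spaces between words.
         */
        public static List<String> readWordFile(String path) {
            List<String> contextList = Lists.newArrayList();
            String lowerPath = path.toLowerCase(Locale.ROOT);
            // Check the extension first so that non-Word files are never opened
            if (!lowerPath.endsWith(".doc") && !lowerPath.endsWith(".docx")) {
                LOGGER.debug("File {} is not a Word file", path);
                return contextList;
            }
            // try-with-resources closes the stream and the POI objects even when parsing throws
            try (InputStream stream = new FileInputStream(path)) {
                if (lowerPath.endsWith(".doc")) {
                    // Binary .doc (OLE2) -> HWPF
                    try (HWPFDocument document = new HWPFDocument(stream);
                         WordExtractor extractor = new WordExtractor(document)) {
                        for (String context : extractor.getParagraphText()) {
                            contextList.add(CharMatcher.whitespace().removeFrom(context));
                        }
                    }
                } else {
                    // OOXML .docx -> XWPF (the constructor already returns the document; getXWPFDocument() is redundant)
                    try (XWPFDocument document = new XWPFDocument(stream)) {
                        for (XWPFParagraph paragraph : document.getParagraphs()) {
                            contextList.add(CharMatcher.whitespace().removeFrom(paragraph.getParagraphText()));
                        }
                    }
                }
            } catch (IOException e) {
                // Log the real failure here (the original only logged when closing the stream failed)
                LOGGER.error("Failed to read Word file {}", path, e);
            }
            return contextList;
        }
    }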
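
The demo cleans each paragraph with `CharMatcher.whitespace().removeFrom(...)`, which deletes every whitespace character. For purely Chinese text that mostly strips layout spacing, but English phrases lose their word boundaries ("Hello World" becomes "HelloWorld"). The JDK-only sketch below is one hedged alternative, not the author's method: it collapses each whitespace run into a single space and trims the ends. The class `ParagraphText` and both method names are hypothetical; the `(?U)` flag makes `\s` follow the Unicode White_Space property, so the full-width space U+3000 common in Chinese documents is also matched.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for normalizing paragraph text extracted by POI. */
public class ParagraphText {

    // (?U) = UNICODE_CHARACTER_CLASS: \s matches Unicode whitespace such as U+3000 and U+00A0,
    // close to (but not guaranteed identical to) Guava's CharMatcher.whitespace()
    private static final Pattern WHITESPACE_RUN = Pattern.compile("(?U)\\s+");

    /** Collapses every run of whitespace into one ASCII space, then trims both ends. */
    public static String collapseWhitespace(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ").trim();
    }

    /** Deletes all whitespace, roughly what the demo's CharMatcher call does, for comparison. */
    public static String removeAllWhitespace(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll("");
    }
}
```

To use it, replace the `CharMatcher.whitespace().removeFrom(...)` calls in `readWordFile` with `ParagraphText.collapseWhitespace(...)`; which behavior is preferable depends on whether the imported documents mix in English text.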

Article link: https://www.haomeiwen.com/subject/cbdkcftx.html
