美文网首页
Poi导出产生OOM解决方案

Poi导出产生OOM解决方案

作者: 3c69b7c624d9 | 来源:发表于2017-11-30 22:57 被阅读348次

    背景

    1. 目前生产系统大量的数据的导出,情况不是特别理想,将常出现OOM
    2. 如果不限制程序最大内存可能导致其他应用收到影响 比如稽计任务

    思路

    考虑出现OOM是如下几个因素

    1. 查询大量数据 需要维护一个大的列表【如果出现大量的数据 超大的List】我们目前正常最大的list大概能达到接近10w
    2. 封装Excel的workBook 创建Sheet和Cell【基本上单行得有接近20个字段】通常的思路就是workbook==>sheet==》cell

    从这边可以看出通常我们出现系统OOM的应该是在EXCEL导出出现的概率较高

    可以在一个导出完成的系统中查看占比

    可以看出我们的业务数据排在第16位 4w+ 大约占用15.3M 而对应的Excel基本上最大的一个类就占用了接近550M

    可想而知目前最大的瓶颈在于优化Excel导出导致的OOM【如果限制XMX的话可能会频繁OOM 如果不限制的话可能会使得JVM急速膨胀】

    那思路就会有如下

    1. 分解多个Excel 使用多个Excel来进行导出后进行合并 【比较麻烦 并且侵入性太高】打包压缩?
    2. 将Excel导出时写到临时文件中最终把临时文件发出去即可

    Poi在新版本支持了我们所说的特性

    首先了解一下office03和office07的区别

    通常来说office03的拓展名是xls office07的拓展名是xlsx

    xls一般支持最大行数为65535【单sheet】而在xlsx中支持达到1048576

    当我们导出大量的报表的数据时建议采用xlsx的方式

    而在Poi中实现xlsx中有两种方式

    分别是

    175336_lBMO_871390.png175336_lBMO_871390.png

    其中

    1. HSSFWorkbook是office03
    2. XSSFWorkbook是我们在office07总最常使用
    3. SXSSFWorkbook这个提供了OOM的解决方案 如上述所说会临时写到临时文件中
            /**
             * Streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy.
             *
             * This allows to write very large files without running out of memory as only
             * a configurable portion of the rows are kept in memory at any one time.
             *
             * You can provide a template workbook which is used as basis for the written
             * data.
             *
             * See https://poi.apache.org/spreadsheet/how-to.html#sxssf for details.
             *
             * Please note that there are still things that still may consume a large
             * amount of memory based on which features you are using, e.g. merged regions,
             * comments, ... are still only stored in memory and thus may require a lot of
             * memory if used extensively.
             *
             * SXSSFWorkbook defaults to using inline strings instead of a shared strings
             * table. This is very efficient, since no document content needs to be kept in
             * memory, but is also known to produce documents that are incompatible with
             * some clients. With shared strings enabled all unique strings in the document
             * has to be kept in memory. Depending on your document content this could use
             * a lot more resources than with shared strings disabled.
             *
             * Carefully review your memory budget and compatibility needs before deciding
             * whether to enable shared strings or not.
             */
            public class SXSSFWorkbook implements Workbook {
                /**
                 * Specifies how many rows can be accessed at most via {@link SXSSFSheet#getRow}.
                 * When a new node is created via {@link SXSSFSheet#createRow} and the total number
                 * of unflushed records would exceed the specified value, then the
                 * row with the lowest index value is flushed and cannot be accessed
                 * via {@link SXSSFSheet#getRow} anymore.
                 */
                public static final int DEFAULT_WINDOW_SIZE = 100;
            }
    

    默认情况下提供100行的访问【换言之 访问窗口为100】

    源码

    我们确认一下代码中如何实现临时文件的建立

            /**
             * Streaming version of XSSFSheet implementing the "BigGridDemo" strategy.
            */
            public class SXSSFSheet implements Sheet
            {
                /*package*/ final XSSFSheet _sh;
                private final SXSSFWorkbook _workbook;
                private final TreeMap<Integer,SXSSFRow> _rows=new TreeMap<Integer,SXSSFRow>();
                private final SheetDataWriter _writer;
                private int _randomAccessWindowSize = SXSSFWorkbook.DEFAULT_WINDOW_SIZE;
                private final AutoSizeColumnTracker _autoSizeColumnTracker;
                private int outlineLevelRow = 0;
                private int lastFlushedRowNumber = -1;
                private boolean allFlushed = false;
             
                public SXSSFSheet(SXSSFWorkbook workbook, XSSFSheet xSheet) throws IOException {
                    _workbook = workbook;
                    _sh = xSheet;
                    _writer = workbook.createSheetDataWriter();
                    setRandomAccessWindowSize(_workbook.getRandomAccessWindowSize());
                    _autoSizeColumnTracker = new AutoSizeColumnTracker(this);
                }
    

    创建Sheet的时候传入对应参数 【默认100】

    首先在创建rows或进行判断

            @Override
            public SXSSFRow createRow(int rownum)
            {
                int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();
                if (rownum < 0 || rownum > maxrow) {
                    throw new IllegalArgumentException("Invalid row number (" + rownum
                            + ") outside allowable range (0.." + maxrow + ")");
                }
             
                // attempt to overwrite a row that is already flushed to disk
                if(rownum <= _writer.getLastFlushedRow() ) {
                    throw new IllegalArgumentException(
                            "Attempting to write a row["+rownum+"] " +
                            "in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk.");
                }
             
                // attempt to overwrite a existing row in the input template
                if(_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum() ) {
                    throw new IllegalArgumentException(
                            "Attempting to write a row["+rownum+"] " +
                                    "in the range [0," + _sh.getLastRowNum() + "] that is already written to disk.");
                }
             
                SXSSFRow newRow=new SXSSFRow(this);
                _rows.put(rownum,newRow);
                allFlushed = false;
                if(_randomAccessWindowSize>=0&&_rows.size()>_randomAccessWindowSize)
                {
                    try
                    {
                       flushRows(_randomAccessWindowSize);
                    }
                    catch (IOException ioe)
                    {
                        throw new RuntimeException(ioe);
                    }
                }
                return newRow;
            }
            /**
             * Specifies how many rows can be accessed at most via getRow().
             * The exeeding rows (if any) are flushed to the disk while rows
             * with lower index values are flushed first.
             */
            public void flushRows(int remaining) throws IOException
            {
                while(_rows.size() > remaining) flushOneRow();
                if (remaining == 0) allFlushed = true;
            }
    

    当前行数超过阈值的时候将会将最上面一行刷到磁盘

            private void flushOneRow() throws IOException
            {
                Integer firstRowNum = _rows.firstKey();
                if (firstRowNum!=null) {
                    int rowIndex = firstRowNum.intValue();
                    SXSSFRow row = _rows.get(firstRowNum);
                    // Update the best fit column widths for auto-sizing just before the rows are flushed
                    _autoSizeColumnTracker.updateColumnWidths(row);
                    _writer.writeRow(rowIndex, row);
                    _rows.remove(firstRowNum);
                    lastFlushedRowNumber = rowIndex;
                }
            }
    

    这边_writer负责将对应的数据刷到临时文件中

            /**
             * Initially copied from BigGridDemo "SpreadsheetWriter".
             * Unlike the original code which wrote the entire document,
             * this class only writes the "sheetData" document fragment
             * so that it was renamed to "SheetDataWriter"
             */
            public class SheetDataWriter {
                private static final POILogger logger = POILogFactory.getLogger(SheetDataWriter.class);
                 
                private final File _fd;
                private final Writer _out;
                private int _rownum;
                private int _numberOfFlushedRows;
                private int _lowestIndexOfFlushedRows; // meaningful only of _numberOfFlushedRows>0
                private int _numberOfCellsOfLastFlushedRow; // meaningful only of _numberOfFlushedRows>0
                private int _numberLastFlushedRow = -1; // meaningful only of _numberOfFlushedRows>0
             
                /**
                 * Table of strings shared across this workbook.
                 * If two cells contain the same string, then the cell value is the same index into SharedStringsTable
                 */
                private SharedStringsTable _sharedStringSource;
             
                public SheetDataWriter() throws IOException {
                    _fd = createTempFile();
                    _out = createWriter(_fd);
                }
             
                public SheetDataWriter(SharedStringsTable sharedStringsTable) throws IOException{
                    this();
                    this._sharedStringSource = sharedStringsTable;
                }
                /**
                 * Create a temp file to write sheet data.
                 * By default, temp files are created in the default temporary-file directory
                 * with a prefix "poi-sxssf-sheet" and suffix ".xml".  Subclasses can override
                 * it and specify a different temp directory or filename or suffix, e.g. <code>.gz</code>
                 *
                 * @return temp file to write sheet data
                 */
                public File createTempFile() throws IOException {
                    return TempFile.createTempFile("poi-sxssf-sheet", ".xml");
                }
            }
    

    很明显此处将会在Temp目录下创建一个临时文件

    当数据写完的时候如下

            public InputStream getWorksheetXMLInputStream() throws IOException
            {
                // flush all remaining data and close the temp file writer
                flushRows(0);
                _writer.close();
                return _writer.getWorksheetXMLInputStream();
            }
        
            public InputStream getWorksheetXMLInputStream() throws IOException {
                File fd = getTempFile();
                return new FileInputStream(fd);
            }
    

    做了一次导出 大约Excel 大小15m 共计4w行数据

    175552_Hk3N_871390.png175552_Hk3N_871390.png

    这样再次导出时将会性能OK

    经历多次导出后仍然显示如下

    175627_9joB_871390.png175627_9joB_871390.png

    当执行GC后数据可以显著回收

    175640_qwGA_871390.png175640_qwGA_871390.png

    相关文章

      网友评论

          本文标题:Poi导出产生OOM解决方案

          本文链接:https://www.haomeiwen.com/subject/nonbbxtx.html