读取大文件耗时很久 #4041

cengxw · 2024-10-31T09:23:00Z

建议先去看文档

异常代码

EasyExcel.read(file, new ReadListener() {
  @Override
  public void invoke(Object data, AnalysisContext context) {
	  System.out.println(data);
	  throw new ExcelAnalysisStopException();
  }
  
  @Override
  public void doAfterAllAnalysed(AnalysisContext context) {
  
  }
}).sheet().doRead();

异常提示

问题描述

就这段代码，读取一个160MB的xlsx，100W行、48列，耗时40秒。我只想读取第一行然后就结束读取，这耗时让我怀疑我是不是用错方式了

cengxw · 2024-10-31T16:13:11Z

我用以下方法来读取就很快，2~4秒就能出结果，能说说差异在哪吗？

public static Map<String, String> firstMap(File file) {
        OPCPackage pkg = null;
        InputStream sheet = null;
        final Map<String, String> titleMap = new HashMap<>();
        try {
            pkg = OPCPackage.open(file, PackageAccess.READ);
            XSSFReader reader = new XSSFReader(pkg);
            sheet = reader.getSheet("rId1");
            XMLReader parser = XMLReaderFactory.createXMLReader();
            ReadOnlySharedStringsTable rosst = new ReadOnlySharedStringsTable(pkg);
            parser.setContentHandler(new XSSFSheetXMLHandler(reader.getStylesTable(),rosst,new XSSFSheetXMLHandler.SheetContentsHandler() {

                @Override
                public void cell(String index, String value, XSSFComment comment) {
                    titleMap.put(index, value);
                }

                @Override
                public void startRow(int arg0) {
                }

                @Override
                public void endRow(int arg0) {
                    throw new ArithmeticException();
                }

            },
            false));
            try {
                parser.parse(new InputSource(sheet));
            } catch(ArithmeticException e) {
                
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                sheet.close();
            } catch (IOException e) {

            }
            try {
                pkg.close();
            } catch (IOException e) {

            }
            return titleMap;
        }
    }

wyctxwd1 · 2024-11-04T08:48:21Z

因为他源码里在读取的时候在一开始就解析了完整的sharestring，你的代码是自己处理了sharestring

wangguanquan · 2024-11-06T12:25:51Z

可以尝试换成EEC来处理这类问题， https://github.com/wangguanquan/eec
它支持filter, map, collect等标准的流式处理，可以像操作集合类一样读取Excel

try (ExcelReader reader = ExcelReader.read(Paths.get("/export/excel-1000k.xlsx"))) {
    // 指定第1行为表头并获取表头
    HeaderRow row = (HeaderRow) reader.sheet(0).header(1).getHeader();
    System.out.println(row);

    // 获取第100行的数据
    Row row100 = reader.sheet(0).rows().filter(r -> r.getRowNum() == 100).findAny().get();
    System.out.println(row100);

    // 将行数据转换为指定对象
    Item item = row100.to(Item.class);
}

cengxw · 2024-11-06T13:07:45Z

感谢各位，我去研究研究

cengxw added the help wanted Extra attention is needed label Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

读取大文件耗时很久 #4041

读取大文件耗时很久 #4041

cengxw commented Oct 31, 2024

cengxw commented Oct 31, 2024

wyctxwd1 commented Nov 4, 2024

wangguanquan commented Nov 6, 2024

cengxw commented Nov 6, 2024

读取大文件耗时很久 #4041

读取大文件耗时很久 #4041

Comments

cengxw commented Oct 31, 2024

建议先去看文档

异常代码

异常提示

问题描述

cengxw commented Oct 31, 2024

wyctxwd1 commented Nov 4, 2024

wangguanquan commented Nov 6, 2024

cengxw commented Nov 6, 2024