io.ReadAll impact the OpenReader function memory usage #2086

ijustyce · 2025-02-06T13:45:01Z

OpenReader 源码如下：

func OpenReader(r io.Reader, opts ...Options) (*File, error) {
	b, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	f := newFile()
	f.options = f.getOptions(opts...)
	if err = f.checkOpenReaderOptions(); err != nil {
		return nil, err
	}
	if bytes.Contains(b, oleIdentifier) {
		if b, err = Decrypt(b, f.options); err != nil {
			return nil, ErrWorkbookFileFormat
		}
	}
	zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
	if err != nil {
		if len(f.options.Password) > 0 {
			return nil, ErrWorkbookPassword
		}
		return nil, err
	}
	file, sheetCount, err := f.ReadZipReader(zr)
	if err != nil {
		return nil, err
	}
	f.SheetCount = sheetCount
	for k, v := range file {
		f.Pkg.Store(k, v)
	}
	if f.CalcChain, err = f.calcChainReader(); err != nil {
		return f, err
	}
	if f.sheetMap, err = f.getSheetMap(); err != nil {
		return f, err
	}
	if f.Styles, err = f.stylesReader(); err != nil {
		return f, err
	}
	f.Theme, err = f.themeReader()
	return f, err
}

其中 io.ReadAll 会导致将所有数据加载到内存，这和流式读取的初衷违背；这里 io.ReadAll 的结果 b 的使用相关代码是：

if bytes.Contains(b, oleIdentifier) {
	if b, err = Decrypt(b, f.options); err != nil {
		return nil, ErrWorkbookFileFormat
	}
}
zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))

建议将这串代码直接改为：

zr, err := zip.OpenReader(name)

并将 file, sheetCount, err := f.ReadZipReader(zr) 改为 file, sheetCount, err := f.ReadZipReader(&zr.Reader) 即可实现真正的流式读取。
我在本地 replace 后测试，修改后，读取 7.8G 的 excel 文件，内存占用不到 10M，而修改前 32G 内存占用。

但并不确定如下代码的用途：

if bytes.Contains(b, oleIdentifier) {
	if b, err = Decrypt(b, f.options); err != nil {
		return nil, ErrWorkbookFileFormat
	}
}

该代码看起来是兼容老版本的 Excel，但具体我并不清楚，在我的测试中，Excel 文件后缀为：xlsx，且读取的数据和实际数据一致。

所以，我的建议是，能否按上面的修改来确保流式读取是真正的流式读取，而不是先 io.ReadAll 全部读取到内存，然后才流式读取，这无意义。如果上面我不确定用途的代码，有实际含义，那么能否在流式读取中兼容它？如果不兼容，那么能否提供不兼容部分格式的流式读取呢？

类似的 issue 有：https://github.com/qax-os/excelize/issues/1775，但该 issue 已经关闭。

The text was updated successfully, but these errors were encountered:

ijustyce · 2025-02-06T14:05:27Z

进一步阅读源码发现上面我提到的代码：

if bytes.Contains(b, oleIdentifier) {
	if b, err = Decrypt(b, f.options); err != nil {
		return nil, ErrWorkbookFileFormat
	}
}

的用途应该是解密加密的 Excel 2003，oleIdentifier 指的是 CFB 格式，即 Excel 2003，如果 Excel 本身没加密，或采用 Excel 2007 进行加密的，那么这串代码应该可以删除。

xuri · 2025-02-08T08:43:34Z

Thanks for your suggestion, would you like to create pull request for this?

ijustyce · 2025-02-09T11:27:11Z

Thanks for your suggestion, would you like to create pull request for this?

#2087

xuri · 2025-02-12T07:53:57Z

Thanks for your PR. I've left some comments for your PR. Due to duplicated with issue #1775, I closed this.

ijustyce pushed a commit to ijustyce/excelize that referenced this issue Feb 6, 2025

try to fix qax-os#2086

553d0ae

xuri added the duplicate This issue or pull request already exists label Feb 12, 2025

xuri closed this as completed Feb 12, 2025

xuri changed the title ~~针对 OpenReader 调用了 io.ReadAll 导致读取超大 excel 时内存占用非常大的优化建议~~ io.ReadAll impact the OpenReader function memory usage Feb 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

io.ReadAll impact the OpenReader function memory usage #2086

io.ReadAll impact the OpenReader function memory usage #2086

ijustyce commented Feb 6, 2025 •

edited

Loading

ijustyce commented Feb 6, 2025

xuri commented Feb 8, 2025

ijustyce commented Feb 9, 2025

xuri commented Feb 12, 2025

io.ReadAll impact the OpenReader function memory usage #2086

io.ReadAll impact the OpenReader function memory usage #2086

Comments

ijustyce commented Feb 6, 2025 • edited Loading

ijustyce commented Feb 6, 2025

xuri commented Feb 8, 2025

ijustyce commented Feb 9, 2025

xuri commented Feb 12, 2025

ijustyce commented Feb 6, 2025 •

edited

Loading