Description
If you want to use a CSV file-based dataset that contains very long column values, you have to increase the maxCharsPerColumn property from its default value of 4096. If, however, you don't know the length of the largest datapoint in your dataset, or cannot commit to a hard limit because the data may grow, a logical thing to do is to set the value to the largest possible one, i.e. Integer.MAX_VALUE.
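For illustration, a minimal sketch of such a test (the resource name and test body are hypothetical):

```java
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvFileSource;

class LongColumnValuesTest {

    @ParameterizedTest
    // Hypothetical dataset; the point is the raised per-column limit.
    @CsvFileSource(resources = "/long-values.csv", maxCharsPerColumn = Integer.MAX_VALUE)
    void parsesLongValue(String value) {
        // assertions on the long value would go here
    }
}
```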
To my surprise, this crashes the test execution with:
```
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.input.DefaultCharAppender.<init>(DefaultCharAppender.java:40)
at org.junit.jupiter.params.shadow.com.univocity.parsers.csv.CsvParserSettings.newCharAppender(CsvParserSettings.java:93)
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.ParserOutput.<init>(ParserOutput.java:111)
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.AbstractParser.<init>(AbstractParser.java:91)
at org.junit.jupiter.params.shadow.com.univocity.parsers.csv.CsvParser.<init>(CsvParser.java:70)
at org.junit.jupiter.params.provider.CsvParserFactory.createParser(CsvParserFactory.java:61)
at org.junit.jupiter.params.provider.CsvParserFactory.createParserFor(CsvParserFactory.java:40)
at org.junit.jupiter.params.provider.CsvFileArgumentsProvider.provideArguments(CsvFileArgumentsProvider.java:64)
at org.junit.jupiter.params.provider.CsvFileArgumentsProvider.provideArguments(CsvFileArgumentsProvider.java:44)
at org.junit.jupiter.params.provider.AnnotationBasedArgumentsProvider.provideArguments(AnnotationBasedArgumentsProvider.java:52)
at org.junit.jupiter.params.ParameterizedTestExtension.arguments(ParameterizedTestExtension.java:145)
at org.junit.jupiter.params.ParameterizedTestExtension.lambda$provideTestTemplateInvocationContexts$2(ParameterizedTestExtension.java:90)
at org.junit.jupiter.params.ParameterizedTestExtension$$Lambda/0x00007427a8142bb0.apply(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
```
I was absolutely shocked to see that the CsvParser implementation used by JUnit really does pre-allocate a char array of maxCharsPerColumn length to store the CSV values. See 1.
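Paraphrased from the shaded univocity sources (field names simplified), the appender allocates the entire buffer up front:

```java
// Sketch of DefaultCharAppender's constructor: the buffer for a single
// column is sized to maxLength immediately, so a limit of Integer.MAX_VALUE
// requests a ~4 GiB char array before a single byte of CSV is read.
public DefaultCharAppender(int maxLength, String emptyValue, int whitespaceRangeStart) {
    this.emptyValue = emptyValue;
    this.chars = new char[maxLength]; // eager allocation of the full limit
    this.whitespaceRangeStart = whitespaceRangeStart;
}
```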
This is completely unusable for CSV values of unknown length. One could of course provide a value that fits within the JVM's array-size limit (i.e. Integer.MAX_VALUE - 8), at which point the parser no longer crashes, but a char array of that size still occupies roughly 4 GiB (2 bytes per char), which means your unit test now allocates an absolutely ridiculous amount of heap memory just to run.
Digging further, I found that the shaded univocity-parsers library actually has another implementation alongside DefaultCharAppender, called ExpandingCharAppender, which grows the char buffer at runtime, starting from a modest initial length of 8192.
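Again paraphrased (the real implementation has more bookkeeping), the expanding variant starts small and grows on demand:

```java
// Sketch of ExpandingCharAppender: an initial 8192-char buffer that is
// roughly doubled whenever it fills up, capped at the JVM's array limit.
public ExpandingCharAppender(String emptyValue, int whitespaceRangeStart) {
    super(8192, emptyValue, whitespaceRangeStart); // modest initial buffer
}

private void expand() {
    chars = java.util.Arrays.copyOf(chars,
            (int) Math.min(chars.length * 2L, Integer.MAX_VALUE - 8));
}
```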
The library bases its decision about which Appender to use on the CsvParserSettings, see 2. Apparently, all that is required to switch to the ExpandingCharAppender is to pass a value of -1 for maxCharsPerColumn.
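The selection logic, paraphrased, looks roughly like this:

```java
// Sketch of the appender selection in the parser settings: -1 acts as a
// sentinel that switches from the fixed-size appender to the expanding one.
protected CharAppender newCharAppender() {
    int chars = getMaxCharsPerColumn();
    if (chars != -1) {
        return new DefaultCharAppender(chars, getNullValue(), getWhitespaceRangeStart());
    }
    return new ExpandingCharAppender(getNullValue(), getWhitespaceRangeStart());
}
```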
Unfortunately, the maxCharsPerColumn attribute of the @CsvFileSource annotation requires the value to be a positive integer, so passing -1 fails fast:
```
org.junit.platform.commons.PreconditionViolationException: maxCharsPerColumn must be a positive number: -1
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Suppressed: org.junit.platform.commons.PreconditionViolationException: Configuration error: You must configure at least one set of arguments for this @ParameterizedTest
at java.base/java.util.stream.AbstractPipeline.close(AbstractPipeline.java:323)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
... 9 more
```
Steps to reproduce
OutOfMemoryError when using large column length limits
ExpandingCharAppender with maxCharsPerColumn = -1
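For reference, a minimal sketch of the second case (hypothetical resource file):

```java
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvFileSource;

class ExpandingAppenderAttemptTest {

    @ParameterizedTest
    // -1 would select the ExpandingCharAppender in univocity, but JUnit's
    // precondition check rejects it before the parser is ever created.
    @CsvFileSource(resources = "/long-values.csv", maxCharsPerColumn = -1)
    void parsesLongValue(String value) {
        // never reached: argument provision fails during test discovery
    }
}
```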
Context
Used versions (Jupiter/Vintage/Platform): JUnit 5.10.3
Build Tool/IDE: JDK 21
TLDR
Please switch to the ExpandingCharAppender by default when using @CsvFileSource, or at least allow its use by removing the positive-integer validation of the maxCharsPerColumn attribute, and document the valid range.
Alternatively, you may consider switching to a better-maintained CSV parser implementation altogether. This obscure "Univocity" library last saw a commit in 2021, and its website univocity.com returns an HTTP 404 error page.
> Please switch to the ExpandingCharAppender by default when using @CsvFileSource, or at least allow its use by removing the positive-integer validation of the maxCharsPerColumn attribute, and document the valid range.

We'll do that for now.

> Alternatively, you may consider switching to a better-maintained CSV parser implementation altogether. This obscure "Univocity" library last saw a commit in 2021, and its website univocity.com returns an HTTP 404 error page.