In a previous post, I compared Easy Batch and Spring Batch in terms of features. I concluded that Spring Batch undoubtedly provides a richer feature set and lets you do much more than Easy Batch does.
In this post, I will compare Easy Batch and Spring Batch in terms of performance. This post is constructive! I developed an alternative to Spring Batch, so I was curious about how it would behave at runtime. Regardless of the result, the goal is to understand why one framework performs better or worse than the other, not merely to show that it does.

The benchmark measures the execution time to read the customer data file customers_in.csv and write each record in uppercase to customers_out.csv. A Customer domain object is used to marshal and unmarshal the data.
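For reference, the task itself (read a file, uppercase each record, write it out) can be sketched without either framework. This is only a plain-Java baseline under the assumption that each line of the file is one record; the class name `UppercaseCopy` and the method `copyUppercase` are hypothetical, and this is not the code used in the benchmark.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Framework-free baseline (illustrative only): copy a text file line by line,
// upper-casing each record on the way.
final class UppercaseCopy {

    // Assumes one record per line. Returns the number of records written.
    static long copyUppercase(Path in, Path out) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.toUpperCase(Locale.ROOT));
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long records = copyUppercase(Paths.get("customers_in.csv"), Paths.get("customers_out.csv"));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(records + " records in " + elapsedMs + " ms");
    }
}
```

A baseline like this gives a rough lower bound for the I/O cost alone, which helps separate file handling time from framework overhead.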
I will use the Easy Random library to generate several files of different sizes for the benchmark: 100.000, 1.000.000 and 10.000.000 customers. The configuration of the Easy Batch and Spring Batch applications is pretty much like the Hello World application of the previous post; only the domain object has been changed from Tweet to Customer. Each job is launched from its own main class, one for Easy Batch and one for Spring Batch.
Results
The benchmark results were obtained as an average of 5 executions on the following hardware/software configuration:
Hardware:
Laptop: MacBook Pro (Retina, 15-inch, Late 2013)
CPU: 2 GHz Intel Core i7
RAM: 8 GB 1600 MHz DDR3
DISK: 251 GB SSD Flash Storage
Software:
OS: Mac OS X Yosemite 10.10.3
Java: version 1.7.0_67 HotSpot(TM) 64-Bit Server VM
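Averaging over several executions, as described above, could be scripted along these lines. This is a minimal sketch, not the harness used for the numbers in this post: the class `AverageTimer` and the method `averageSeconds` are hypothetical names, and the sleeping task is a placeholder for launching the real job.

```java
import java.util.Locale;
import java.util.concurrent.Callable;

// Illustrative timing helper: runs a task several times and averages the wall-clock time.
final class AverageTimer {

    // Runs the task `runs` times and returns the mean elapsed time in seconds.
    static double averageSeconds(Callable<?> task, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / runs;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task: a real benchmark would launch the batch job here.
        Callable<Void> job = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                Thread.sleep(50);
                return null;
            }
        };
        System.out.printf(Locale.ROOT, "average: %.3f s%n", averageSeconds(job, 5));
    }
}
```

Note that this measures whole-run wall-clock time, JVM warm-up included, which fits a macro benchmark but would be too coarse for anything finer.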
The commit-interval is an important parameter for the performance of Spring Batch, just like the batch-size parameter for Easy Batch.
I have used different values for these parameters: 10, 100 and 1000. The following table summarizes the number of input records, the file size and the processing time for each framework:
| Number of records (file size) | Easy Batch BS = 10 (s) | Easy Batch BS = 100 (s) | Easy Batch BS = 1000 (s) | Spring Batch CI = 10 (s) | Spring Batch CI = 100 (s) | Spring Batch CI = 1000 (s) |
|---|---|---|---|---|---|---|
| 100.000 (9.4 MB) | 1 | 1 | 1 | 10 | 6 | 5 |
| 1.000.000 (94 MB) | 8 | 7 | 7 | 74 | 40 | 38 |
| 10.000.000 (983 MB) | 82 | 76 | 73 | 773 | 424 | 388 |

BS = batch-size, CI = commit-interval.
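To put these numbers in perspective, the ratio between the two frameworks can be computed directly from the reported times. The sketch below hard-codes only the 10.000.000-record row of the table; the class `SpeedupTable` and its members are illustrative names.

```java
import java.util.Locale;

// Illustrative arithmetic on the reported times for 10.000.000 records.
final class SpeedupTable {

    // Batch-size / commit-interval values used in the benchmark.
    static final int[] SIZES = {10, 100, 1000};
    // Times in seconds, copied from the 10.000.000-record row of the table.
    static final double[] EASY_BATCH = {82, 76, 73};
    static final double[] SPRING_BATCH = {773, 424, 388};

    // How many times longer the slower run took compared to the faster one.
    static double ratio(double slowerSeconds, double fasterSeconds) {
        return slowerSeconds / fasterSeconds;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf(Locale.ROOT, "size %d: Spring Batch / Easy Batch = %.1f%n",
                    SIZES[i], ratio(SPRING_BATCH[i], EASY_BATCH[i]));
        }
    }
}
```

On this row, Spring Batch takes roughly 9.4 times longer at a size of 10 and roughly 5.3 times longer at a size of 1000, so the gap narrows as the commit-interval grows.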
Plotted as charts, these results show that the difference grows with the size of the data set.
Please note that this is a macro benchmark, not a microbenchmark at the nanosecond level (where I would have used JMH or a similar tool). The goal is to get a rough idea of the overall execution time of both applications.
Conclusion
Easy Batch is faster than Spring Batch in this case (but might be slower in another case). Now the question is: why this difference? From my understanding of Spring Batch mechanics, I guess the interaction with the job repository (even in memory) is the main reason. Persisting the job/step execution state at each commit-interval has a considerable performance overhead, but it enables job restarts in case of failure. It's always a matter of trade-offs.
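One way to get a feel for this guess is to count how often a per-chunk action would fire. The sketch below assumes that state is saved once per chunk, as described above; it says nothing about what each save actually costs in Spring Batch. The class `ChunkCounter` and the method `chunks` are hypothetical names.

```java
// Illustrative arithmetic: how many chunks (and hence per-chunk state saves,
// under the assumption of one save per chunk) a run would involve.
final class ChunkCounter {

    // Number of chunks needed to process `records` items in chunks of `chunkSize`
    // (ceiling division, so a partial last chunk still counts).
    static long chunks(long records, long chunkSize) {
        if (records < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("records must be >= 0 and chunkSize > 0");
        }
        return (records + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long[] recordCounts = {100_000L, 1_000_000L, 10_000_000L};
        long[] intervals = {10L, 100L, 1000L};
        for (long records : recordCounts) {
            for (long interval : intervals) {
                System.out.println(records + " records, interval " + interval
                        + " -> " + chunks(records, interval) + " chunks");
            }
        }
    }
}
```

For the largest file this means one million chunks at an interval of 10 versus ten thousand at 1000. That is consistent with the Spring Batch time dropping from 773 s to 388 s as the interval grows, although it does not prove that the job repository is the cause.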