faqs
Easy Batch streams data record by record from the data source. Depending on the data source type, a record can be a line in a flat file, a tag in an XML file, a row in a database table, etc.
The RecordReader abstraction is intended to be implemented with a streaming API so that the data source is never loaded entirely in memory (which is the main cause of java.lang.OutOfMemoryError in many batch applications).
There are several implementations of the RecordReader interface to read data from a variety of data sources.
Please refer to the user guide for all details about available record readers.
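The streaming idea described above can be illustrated with a minimal, self-contained sketch. The LineSource interface and StreamingLineSource class below are hypothetical names invented for this illustration and are not the actual Easy Batch RecordReader API; they only show how a reader can hand out one record at a time while holding a single line in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical, simplified reader contract (NOT the actual Easy Batch RecordReader interface).
interface LineSource extends AutoCloseable {
    // Returns the next record, or null when the source is exhausted.
    String readRecord();

    @Override
    void close();
}

// Streams a flat file one line at a time; only the current line is held in memory.
final class StreamingLineSource implements LineSource {
    private final BufferedReader reader;

    StreamingLineSource(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public String readRecord() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts records without ever materializing the whole source.
    static int countRecords(Reader source) {
        int count = 0;
        try (StreamingLineSource lines = new StreamingLineSource(source)) {
            while (lines.readRecord() != null) {
                count++;
            }
        }
        return count;
    }
}
```

For example, calling StreamingLineSource.countRecords(new java.io.StringReader("a\nb\nc\n")) walks the three lines one by one and returns 3.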
Easy Batch writes records in batches. Most APIs provide a way to write data in bulk mode for performance reasons.
The RecordWriter abstraction is designed to work this way. The writeRecords method takes a batch of records and writes them as a unit to the data sink.
Usually, the write operation is performed within a transaction boundary, so the records of a batch are either all written as a unit or all rolled back for re-processing.
Easy Batch provides several implementations of the RecordWriter interface to write data to a variety of data sinks.
Please refer to the user guide for all details about available record writers.
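The all-or-nothing behaviour of a batch write can be sketched as follows. InMemoryBatchSink is a hypothetical class made up for this illustration, not an Easy Batch writer; it stages a batch and commits it only if every record is acceptable, which mimics writing inside a transaction boundary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory sink (NOT an Easy Batch class) that commits a batch all-or-nothing.
final class InMemoryBatchSink {
    private final List<String> committed = new ArrayList<>();

    // Writes the whole batch as a unit: if any record is rejected, nothing is committed.
    boolean writeRecords(List<String> batch) {
        List<String> staged = new ArrayList<>(batch.size());
        for (String record : batch) {
            if (record == null || record.isEmpty()) {
                return false; // "roll back": the staged records are simply discarded
            }
            staged.add(record);
        }
        committed.addAll(staged); // "commit": the batch becomes visible as a whole
        return true;
    }

    // Read-only view of everything committed so far.
    List<String> committed() {
        return Collections.unmodifiableList(committed);
    }
}
```

With a real data sink, the staging and commit steps would typically be replaced by the sink's own transaction or bulk API.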
Yes. Even though common use cases involve processing textual data, the Easy Batch Record abstraction can be implemented for any type of input data.
For example, in a scenario where you need to compress a set of images in batch mode with Java, a record can be one image file.
Easy Batch API is generic and can be used to process any type of input data.
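As a rough illustration of a non-textual record, the sketch below wraps a binary payload in a generic record type and compresses it. PayloadRecord and PayloadProcessor are hypothetical names (not the Easy Batch Record or RecordProcessor interfaces), and plain DEFLATE from java.util.zip stands in for whatever image compression a real job would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical generic record (NOT the actual Easy Batch Record interface).
final class PayloadRecord<P> {
    private final String name;
    private final P payload;

    PayloadRecord(String name, P payload) {
        this.name = name;
        this.payload = payload;
    }

    String name() { return name; }

    P payload() { return payload; }
}

// Hypothetical processor: one binary record in, one compressed binary record out.
final class PayloadProcessor {
    static PayloadRecord<byte[]> compress(PayloadRecord<byte[]> record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream finishes the DEFLATE output before the buffer is read.
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(record.payload());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PayloadRecord<>(record.name() + ".deflate", buffer.toByteArray());
    }

    // Inverse operation, used here only to check that compression is lossless.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The point of the sketch is only that the payload type is a type parameter: the same read, process, write cycle applies whether the payload is a line of text or the bytes of a file.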
Yes. Easy Batch uses Hibernate Validator, the reference implementation of the Bean Validation API, to validate domain objects. For all details about how to validate data using the Bean Validation API with Easy Batch, please refer to the user guide.
No. Easy Batch was designed and implemented before JSR 352 was submitted.
Yes. You can enable JMX monitoring with JobBuilder#enableJmx(). This will register a JMX MBean named org.jeasy.batch.jmx.monitor:name=YourJobName at job startup.
You can use any standard JMX compliant tool to monitor your job metrics.
You can also use a JobMonitorProxy and register a JobMonitoringListener to listen to push notifications sent by the job at runtime.
Yes. The JobFactoryBean can be used to configure and declare Easy Batch jobs as Spring beans.
This factory bean is available by importing the easy-batch-spring module.
Comparing Easy Batch and Spring Batch would be unfair, because even if both frameworks fundamentally try to solve the same problem, they are conceptually different at several levels:
- Job structure: A job in Spring Batch is a collection of steps. A step can be a single task or chunk-oriented. In Easy Batch, there is no concept of step. A job in Easy Batch is similar to a Spring Batch job with a single chunk-oriented step using an in-memory job repository.
- Job definition: Spring Batch provides a DSL to define the execution flow of steps within a job. In Easy Batch, there is no such DSL. Creating a workflow of jobs is left to an external workflow engine like Easy Flows.
- Job execution: A Spring Batch job can have multiple job instances (each identified by its identifying job parameters). Each job instance may in turn have multiple executions. In Easy Batch, there are no such job instance or job execution concepts. Jobs are Callable objects that can be executed with a JobExecutor or an ExecutorService.
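The last point, that a job is just a Callable, can be sketched with the standard java.util.concurrent API alone. CountingJob below is a hypothetical stand-in for a job and returns a simple record count rather than a real job report.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical job (NOT the Easy Batch Job type): a Callable returning the number of records it processed.
final class CountingJob implements Callable<Integer> {
    private final List<String> records;

    CountingJob(List<String> records) {
        this.records = records;
    }

    @Override
    public Integer call() {
        int processed = 0;
        for (String record : records) {
            if (!record.trim().isEmpty()) { // blank records are skipped
                processed++;
            }
        }
        return processed;
    }

    // Submits the job to a plain ExecutorService and waits for its result.
    static int runOnExecutor(List<String> records) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = executor.submit(new CountingJob(records));
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

Because nothing here depends on a job repository, running the same job again simply means submitting a new Callable.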
That said, Spring Batch is an advanced batch processing framework with a very rich feature set, including flows, remoting, partitioning, automatic retry on failure, etc. Easy Batch is a bit like Spring Batch, but much smaller and not as bright! It's a simple and lightweight framework that can be learned quickly and used easily for the majority of batch processing use cases. Easy Batch does not compete with Spring Batch but tries to provide an alternative that is easier to learn, configure and use. A detailed comparison between Easy Batch and Spring Batch can be found in the following blog posts:
Easy Batch started from the belief that batch jobs should be designed to be restartable without relying on any tool. One of the best characteristics a batch job can have is idempotency. If a job cannot be implemented in an idempotent way, there are always patterns to make it restartable without persisting its state, like the process indicator pattern, or the staging table pattern, etc.
Persisting the job state during the execution is not only expensive in terms of performance, but also requires additional setup, configuration and maintenance.
If you think about it, the failure rate of batch jobs is always negligible compared to the success rate (even if the failure rate is high!), so adding such a feature would penalize the majority of job executions for the benefit of a minority of failures.
That said, it's possible to persist the job state in a persistent store if necessary. You can find an example in the Restart a failed job tutorial.
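The process indicator pattern mentioned above can be sketched in a few lines. Everything in this example is hypothetical and in-memory: Row, its processed flag, and the failOn parameter that simulates a crash are invented for illustration. In a real job the flag would usually be a column updated in the same transaction as the write.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the process indicator pattern; none of these types come from Easy Batch.
final class ProcessIndicatorSketch {

    static final class Row {
        final String data;
        boolean processed; // the "process indicator" stored alongside the input row

        Row(String data) {
            this.data = data;
        }
    }

    // Processes only rows not yet flagged and stops at the simulated failure,
    // so a later run resumes where this one stopped. Returns the rows handled in this run.
    static int run(List<Row> rows, List<String> target, String failOn) {
        int handled = 0;
        for (Row row : rows) {
            if (row.processed) {
                continue; // already handled by a previous run
            }
            if (row.data.equals(failOn)) {
                return handled; // simulated failure: the flag stays false for this row
            }
            target.add(row.data.toUpperCase(Locale.ROOT));
            row.processed = true;
            handled++;
        }
        return handled;
    }
}
```

Running the job a second time over the same rows neither duplicates nor skips work, which is what makes the restart safe without any persisted job state.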
Easy Batch does not implement batch jobs as a workflow of steps and this is on purpose. Implementing workflows should be delegated to workflow engines like Easy Flows. Easy Batch was specifically designed for simple ETL jobs. A job in Easy Batch is a single task that has the sole responsibility of reading/processing/writing data from a source to a target.
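A single read/process/write task of this kind can be sketched as one loop. SingleTaskJobSketch is a hypothetical illustration, not Easy Batch code: it pulls items from an iterator, applies a processing function, and groups the results into batches of a given size (assumed to be greater than zero), returning the batches a writer would have received.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical single-task job loop; it returns the batches instead of writing them to a real sink.
final class SingleTaskJobSketch {

    // Reads records one by one, processes each, and hands them over in batches of batchSize.
    static <I, O> List<List<O>> run(Iterator<I> source, Function<I, O> processor, int batchSize) {
        List<List<O>> writtenBatches = new ArrayList<>();
        List<O> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(processor.apply(source.next()));
            if (batch.size() == batchSize) {
                writtenBatches.add(batch); // a full batch is "written" as a unit
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            writtenBatches.add(batch); // the last batch may be smaller than batchSize
        }
        return writtenBatches;
    }
}
```

There is no step or flow concept in this loop; chaining several such tasks is exactly the part that would be left to an external workflow engine.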
Easy Batch used to rely on the java.util.logging API to minimise dependencies. As of v5.3, SLF4J is used for logging, so you can use any logging framework compatible with SLF4J.
Talking about "Micro"-benchmarks for batch applications is not correct. Benchmarking batch applications is hard as the majority of real world jobs interact with external resources (databases, file systems, etc).
We used to provide a JMH-based benchmark to measure the potential overhead of Easy Batch jobs, but it has been removed because we think JMH is not the right tool to benchmark batch jobs. Benchmarking batch applications heavily depends on the use case, so the best way to measure any potential overhead is to try Easy Batch on your own use case (and compare it to other frameworks).
Feel free to ask your question on Gitter.
Easy Batch was created by Mahmoud Ben Hassine with the help of some awesome contributors.