Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve JdbcJobInstanceDao to Use LIMIT and OFFSET for Efficient Pagination #4740

Open
zakyvit opened this issue Jan 7, 2025 · 1 comment
Labels
status: waiting-for-triage Issues that we did not analyse yet type: feature

Comments

@zakyvit
Copy link

zakyvit commented Jan 7, 2025

The current implementation of JdbcJobInstanceDao in Spring Batch is too generic and not optimized for scenarios where pagination is needed. Specifically, methods like:

public List<JobInstance> findJobInstancesByName(String jobName, final int start, final int count)
public List<JobInstance> getJobInstances(String jobName, final int start, final int count)

load all matching data from the database and then manually filter it based on the start and count parameters. This approach is inefficient, particularly when dealing with large datasets.

Proposal:
To improve performance, it would be beneficial to provide a more optimized implementation for databases that support LIMIT and OFFSET. These clauses allow for efficient pagination directly at the database level, avoiding the need to load and filter unnecessary data in memory.

Suggested Changes:
Add support for LIMIT and OFFSET in the SQL queries for methods like findJobInstancesByName and getJobInstances when using databases such as:

  • MySQL
  • PostgreSQL
  • H2
  • Other databases that natively support LIMIT and OFFSET.

Provide database-specific implementations for JdbcJobInstanceDao or extend the base implementation to use optimized SQL for supported databases.

Retain the existing implementation as a fallback for databases that do not support LIMIT and OFFSET.

Benefits:
Improved Performance: Reduces the amount of data fetched and processed in memory, especially for large datasets.
Better Resource Utilization: Minimizes memory usage and reduces database load by fetching only the required rows.
More Scalable: Makes the framework more efficient for applications with large job histories or high concurrency.
Example:
Instead of:

SELECT JOB_INSTANCE_ID, JOB_NAME 
FROM %PREFIX%JOB_INSTANCE 
WHERE JOB_NAME = ? 
ORDER BY JOB_INSTANCE_ID DESC;

and then manually iterating over the results, the query should be:

SELECT JOB_INSTANCE_ID, JOB_NAME 
FROM %PREFIX%JOB_INSTANCE 
WHERE JOB_NAME = ? 
ORDER BY JOB_INSTANCE_ID DESC 
LIMIT ? OFFSET ?;

Impact:
This change would primarily benefit users of supported databases, improving the performance of pagination-related methods in JdbcJobInstanceDao.

If you find this suggestion useful, I'd be happy to contribute a pull request with an implementation. Let me know your thoughts!

@zakyvit zakyvit added status: waiting-for-triage Issues that we did not analyse yet type: feature labels Jan 7, 2025
@wondywang
Copy link

Great idea, very useful feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-triage Issues that we did not analyse yet type: feature
Projects
None yet
Development

No branches or pull requests

2 participants