
Comments (3)

spring-projects-issues commented on May 7, 2024

Lucas Ward commented

After looking at the issue, it's really only necessary for the developer to specify how many skips they want to allow, since that's all they really care about. The rollback count was left in the StepExecution because it is an extremely useful statistic for operations teams. It also doesn't matter to those teams what happened after a rollback (retry or recover); they just need to know how many times a particular job rolled back.
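To illustrate why the rollback count is independent of what follows it, here is a minimal sketch. `RollbackCountingStep` is a hypothetical class, not Spring Batch API, and the "retry once, then skip" behaviour is an assumption made only for the example. Every failure increments the rollback counter before the step decides between retrying and recovering.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical step runner (not Spring Batch API). Every failure rolls the
 * work back, and the rollback is counted before deciding whether to retry
 * or to recover by skipping the item.
 */
final class RollbackCountingStep {
    private int rollbackCount;
    private int skipCount;

    /** Processes items in order; {@code process} returns false on failure. */
    void run(List<String> items, Predicate<String> process) {
        for (String item : items) {
            if (process.test(item)) {
                continue;                 // succeeded on the first attempt
            }
            rollbackCount++;              // counted no matter what happens next
            if (process.test(item)) {
                continue;                 // assumed policy: a single retry succeeded
            }
            rollbackCount++;              // the retry rolled back as well
            skipCount++;                  // recover by skipping the item
        }
    }

    int getRollbackCount() {
        return rollbackCount;
    }

    int getSkipCount() {
        return skipCount;
    }
}
```

An operations team reading these counters sees every rollback, whether the item was eventually retried successfully or skipped.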

from spring-batch.

spring-projects-issues commented on May 7, 2024

Tommy C. Trang commented

In the batch summary statistics, we definitely need to know how many records were skipped so we can investigate and handle those records after the original run. At C-IV, we used a preset maximum number of allowable skipped records. If that maximum is exceeded, the non-fatal exception is converted into a fatal exception to stop the run, because it indicates that something is seriously wrong with the input. We haven't used a percentage of records for allowable skips because we do not know, at the beginning of the run, the total number of records that will be processed. We didn't calculate that total because of the additional processing required. A few options for getting the total record count up front are:

  1. Run the SQL twice: once for the count and once for the result set to process.
    Problem: The driving query is run twice.
  2. Run the SQL once and make the result set scrollable. Go to the last record; its position gives the total number of records.
    Problem: Scrollable prepared statements and result sets are more expensive. The architecture we used is cursor-driven and kept the driving query's result set open for the entire run. Using the scrollable option here loads the entire result set into memory, which can be problematic for memory utilization.
  3. Run the SQL once, load the data into an ArrayList, then close the result set.
    Problem: All the data sits in an ArrayList, which uses a lot of memory, and the ArrayList stays in scope for the entire run.
  4. Don't use a cursor-driven approach. Run the driving query to retrieve only the record keys, load them into an ArrayList, then close the cursor. When processing each record, run the driving query again to retrieve all the data required for that record.
    Problem: The driving query is more often than not the longest-running SQL in the program. This approach runs the driving query logic once per record, which can be a performance problem. In my past projects, retrieving as much information as possible in a single driving-query SQL has usually proven best for performance.
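The "preset maximum, then escalate" rule described above can be sketched as follows. The exception classes and `SkipLimitGuard` are hypothetical names chosen for illustration; they are not the C-IV framework or Spring Batch types. Each non-fatal record error is recorded for later investigation, and once the count exceeds the maximum the error is rethrown as a fatal one that stops the run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical checked exception for a bad record that can be skipped. */
class NonFatalRecordException extends Exception {
    NonFatalRecordException(String message) {
        super(message);
    }
}

/** Hypothetical unchecked exception that stops the whole run. */
class FatalBatchException extends RuntimeException {
    FatalBatchException(String message, Throwable cause) {
        super(message, cause);
    }
}

/**
 * Tracks skipped records against a preset maximum. Once the maximum is
 * exceeded, the non-fatal error is rethrown as a fatal one.
 */
final class SkipLimitGuard {
    private final int maxSkips;
    private final List<String> skippedKeys = new ArrayList<>();

    SkipLimitGuard(int maxSkips) {
        this.maxSkips = maxSkips;
    }

    /** Records the skip; throws FatalBatchException if the limit is now exceeded. */
    void onNonFatal(String recordKey, NonFatalRecordException error) {
        skippedKeys.add(recordKey);   // kept for the summary statistics
        if (skippedKeys.size() > maxSkips) {
            throw new FatalBatchException(
                "Skipped " + skippedKeys.size() + " records; the limit is " + maxSkips, error);
        }
    }

    /** Keys of skipped records, for investigation after the run. */
    List<String> skippedKeys() {
        return Collections.unmodifiableList(skippedKeys);
    }
}
```

Keeping the original non-fatal exception as the cause means the fatal error still points at the record that tipped the run over the limit.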

In conclusion, I think specifying the skip limit as a percentage of records is a nice-to-have, but I don't think it is worth the effort or the performance hit. I like the simple approach of specifying a single integer value for allowable skips, after which the batch run stops because too many records were skipped. What to do with the skipped records, whether to re-run them automatically or manually the next day, is a separate topic. There should be a small, finite set of options for re-processing skipped records.
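The trade-off between the two policies shows up directly in what each one needs in order to be evaluated. In this sketch (all type names are hypothetical, and it assumes Java 16+ records), the fixed policy needs only the limit, while the percentage policy cannot even be constructed without the total record count, which is exactly the number that is expensive to obtain up front.

```java
/** Hypothetical policy: has the run skipped too many records? */
interface SkipPolicy {
    boolean limitExceeded(long skipsSoFar);
}

/** Fixed count: needs only the limit, so it works with a forward-only cursor. */
record FixedSkipPolicy(long maxSkips) implements SkipPolicy {
    @Override
    public boolean limitExceeded(long skipsSoFar) {
        return skipsSoFar > maxSkips;
    }
}

/**
 * Percentage: cannot be evaluated without totalRecords, which must be known
 * before the run starts (for example via one of the four options above).
 */
record PercentSkipPolicy(double maxPercent, long totalRecords) implements SkipPolicy {
    PercentSkipPolicy {
        if (totalRecords <= 0) {
            throw new IllegalArgumentException("totalRecords must be known and positive");
        }
    }

    @Override
    public boolean limitExceeded(long skipsSoFar) {
        return skipsSoFar * 100.0 / totalRecords > maxPercent;
    }
}
```

With a limit of 10 skips, or 1% of 1,000 records, both policies stop on the eleventh skip, but only the fixed one can be set before the driving query has been counted.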


spring-projects-issues commented on May 7, 2024

Lucas Ward commented

A skip limit has been added to StepConfiguration. DefaultStepExecutor probably needs some code to act on this value, but that can be created as a new issue.
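One way an executor might act on the configured value is sketched below. `SketchStepConfig` and `SketchStepExecutor` are hypothetical stand-ins, not the real `StepConfiguration` or `DefaultStepExecutor`, and treating every `RuntimeException` from the writer as skippable is a simplifying assumption. The executor reads the limit from configuration rather than hard-coding it, and fails the step once skips exceed it.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a step configuration holding a skip limit. */
final class SketchStepConfig {
    private int skipLimit; // 0: the first skippable error stops the step

    int getSkipLimit() {
        return skipLimit;
    }

    void setSkipLimit(int skipLimit) {
        if (skipLimit < 0) {
            throw new IllegalArgumentException("skipLimit must be >= 0");
        }
        this.skipLimit = skipLimit;
    }
}

/** Hypothetical executor showing where the configured limit would be enforced. */
final class SketchStepExecutor {
    private final SketchStepConfig config;
    private int skipCount;

    SketchStepExecutor(SketchStepConfig config) {
        this.config = config;
    }

    /** Writes each item; returns how many were written successfully. */
    <T> int execute(List<T> items, Consumer<T> writer) {
        int written = 0;
        for (T item : items) {
            try {
                writer.accept(item);
                written++;
            } catch (RuntimeException e) { // assumption: any runtime error is skippable
                skipCount++;
                if (skipCount > config.getSkipLimit()) {
                    throw new IllegalStateException(
                        "Skip limit " + config.getSkipLimit() + " exceeded", e);
                }
            }
        }
        return written;
    }

    int getSkipCount() {
        return skipCount;
    }
}
```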

