
I have a Spring Boot app with a scheduled job that runs every 1.5 seconds. Its goal is to fetch data from a third-party API, update the database with the results (if needed) and repeat. The next API call should not start before the previous updates are finished.

For simplicity, let's say the code looks like this:

@Scheduled(initialDelay = 3000, fixedDelay = 1500)
public void loadUpdates() {
    List<Item> recentItems = apiClient.getUpdatesAfter(lastUpdateAt);

    List<CompletableFuture<Void>> tasks = new ArrayList<>();
    for(Item item: recentItems) {
        tasks.add(CompletableFuture.runAsync(
                () -> handleItemUpdates(item),
                itemUpdateExecutor
        ));
    }

    // need to wait for all updates to finish
    CompletableFuture.allOf(tasks.toArray(CompletableFuture[]::new)).join();
    
    doSomethingElseHere();
    
    saveTheLastUpdateTime();
}

public void handleItemUpdates(Item item) {
    ItemDetails details = this.apiClient.getDetails(item.getId());

    switch (item.getStatus()) {
        case 1:
            doOneTypeOfDbUpdates(item, details);
            break;
        case 2:
            doOtherTypeOfDbUpdates(item, details);
            break;
    }
}

@Transactional
public void doOneTypeOfDbUpdates(Item item, ItemDetails itemDetails) {
    updateOneTable();
    insertManyRecordsToAnotherTable();
    deleteSomethingFromThirdTable();
}

It worked fine in the beginning, but now, as the amount of data grows, this approach takes too much time.

Let's assume that the handling code works fine and the DB queries are optimal.

Edit: One item triggers various actions such as another API call and inserts/updates into different tables (in a transaction) and/or different databases, so doing bulk inserts of all the items from the received list is not really an option.

The question is how can this be done in a better way?

I have thought of replacing the threads with a queue (RabbitMQ) and processing with multiple instances, but how can I make sure that the next iteration does not start until all of the jobs are finished?

Any suggestions are welcome - Spring Integration, Apache Camel or any other solutions/frameworks/libraries/queues/etc.

Thank you in advance.

  • "Lets assume that ... the db queries are optimal." - I think that that assumption may be your problem. You will get better performance by doing the database updates in batches.
    – Stephen C
    Apr 12 at 14:16
  • @StephenC that is true, but the processing is complex and the question is more about how to do better parallel processing or how to approach the problem in general
    – Alex
    Apr 12 at 14:45
  • Measure, figure out where the bottleneck is, and then determine the course of action. Adding threads to a problem like this might even make it worse due to contention on the DB side or other IO parts. You might want to do something smarter with the list of Items you get and figure out if you can split those into other lists with common actions and execute those in bulk/batch (see the sketch after these comments). That would really improve performance.
    – M. Deinum
    Apr 16 at 12:06
  • I would probably add a "staging" table, where the raw data is fetched or normalized for consumption, then process those rows constantly (bonus if you have a unique key you can use to discard duplicate data), so your scheduled job is just adding changes to the staging table for you to process.
    Apr 18 at 15:55
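
As an illustration of the grouping idea from the comments above (not the asker's code): a sketch that splits the fetched list by status so each group can be handled in one bulk/batch call. The handlers handleStatusOneInBulk/handleStatusTwoInBulk are hypothetical.

// Sketch: group items by status so each group can be written in one batch.
// Requires java.util.Map, java.util.List and java.util.stream.Collectors.
Map<Integer, List<Item>> byStatus = recentItems.stream()
        .collect(Collectors.groupingBy(Item::getStatus));

// Hypothetical bulk handlers, one per status/action type.
handleStatusOneInBulk(byStatus.getOrDefault(1, List.of()));
handleStatusTwoInBulk(byStatus.getOrDefault(2, List.of()));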

2 Answers


Just add a boolean variable and check its value before each iteration. When it's true, there will be no further processing:

private boolean inAction;

@Scheduled(initialDelay = 3000, fixedDelay = 1500)
public void loadUpdates() {
    if (inAction) return;
    
    try {
        inAction = true;
        List<Item> recentItems = apiClient.getUpdatesAfter(lastUpdateAt);
        ...
    } finally {
        inAction = false;
    }
}
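
If the method could ever be entered from more than one thread (for example with fixedRate or a multi-threaded scheduler), a plain boolean is not thread-safe. A sketch of the same idea with an AtomicBoolean, assuming the rest of the method stays as in the question:

// Requires java.util.concurrent.atomic.AtomicBoolean.
private final AtomicBoolean inAction = new AtomicBoolean(false);

@Scheduled(initialDelay = 3000, fixedDelay = 1500)
public void loadUpdates() {
    // only the invocation that flips the flag from false to true proceeds
    if (!inAction.compareAndSet(false, true)) return;

    try {
        List<Item> recentItems = apiClient.getUpdatesAfter(lastUpdateAt);
        // ... same processing as above ...
    } finally {
        inAction.set(false);
    }
}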

You can use Spring Batch. For example, if you have two API calls, you can create two steps in a Spring Batch job: step A calls the first API and step B calls the second. With the ability to group steps together within an owning job comes the need to control how the job "flows" from one step to another. For your use case you can use the sequential flow scenario, where all of the steps execute sequentially (a minimal sketch follows the docs link below).

Features

  • Transaction management
  • Chunk based processing
  • Declarative I/O
  • Start/Stop/Restart
  • Retry/Skip

Spring Batch docs
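
A minimal sketch of what such a sequential two-step job could look like, assuming Spring Batch 5 style builders (JobBuilder/StepBuilder with a JobRepository); the step names and tasklet bodies are placeholders, not the asker's code:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class UpdateJobConfig {

    // Step A: fetch the list of recent items from the first API (placeholder tasklet)
    @Bean
    public Step fetchItemsStep(JobRepository jobRepository, PlatformTransactionManager txManager) {
        return new StepBuilder("fetchItemsStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    // e.g. apiClient.getUpdatesAfter(lastUpdateAt) would go here
                    return RepeatStatus.FINISHED;
                }, txManager)
                .build();
    }

    // Step B: call the second API and apply the DB updates (placeholder tasklet)
    @Bean
    public Step processItemsStep(JobRepository jobRepository, PlatformTransactionManager txManager) {
        return new StepBuilder("processItemsStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    // e.g. apiClient.getDetails(...) and the per-item DB updates would go here
                    return RepeatStatus.FINISHED;
                }, txManager)
                .build();
    }

    // Sequential flow: processItemsStep only starts after fetchItemsStep has finished
    @Bean
    public Job updateJob(JobRepository jobRepository, Step fetchItemsStep, Step processItemsStep) {
        return new JobBuilder("updateJob", jobRepository)
                .start(fetchItemsStep)
                .next(processItemsStep)
                .build();
    }
}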
