I have x tasks to execute. How to execute y in parallel until all are finished?


#1

I have to make lots of HTTP requests for data to an API. I have a service with functions that give me the expected results via promises. My initial thought was to put the requests information in an array, then loop over it and call my service function to get the data, which then writes the data in another array (array2). After the long time the loop will take to get all the data I then write that array2 into storage for further usage.

Those of you that already got all the async stuff will (rightfully) laugh now, as of course this doesn’t work at all: As the promises are async, the loop is finished in a few milliseconds and an empty array is written to storage. Meanwhile my browser (and debug API server!) almost crash because of the thousands of HTTP requests I just sent out at the exact same moment. Yeah, async is hard for some of us :blush:

After I realized my mistake I thought about the problem a bit more and realized that I need to limit the number of calls that are made at the same time. My API service could probably handle that internally by keeping a waiting list of calls, so that only x requests are fired in parallel.

But that of course won’t help with the problem that the loop still finishes instantly and I write my empty array to storage. So I thought that there has to be a way to check if a list of promises is finished. There is, it is called Promise.all and “resolves when all of the promises have resolved”. Perfect: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise/all … not: “It rejects with the reason of the first promise that rejects.” :frowning: So this will reject if one request goes wrong. This is not what I want. Also there is no way to limit the parallelization - Promise.all will still DDoS my server and kill the browser.
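To illustrate the "rejects on the first failure" point: one common workaround is to wrap every promise so that it always resolves and records its failure instead. The sketch below is only that, a sketch; the names `Outcome`, `settle` and `allSettledCompat` are made up for this example and are not part of any library. Newer runtimes (ES2020) also ship a built-in `Promise.allSettled` with similar behaviour.

```typescript
// Outcome of one task: either a value or the reason it failed.
// (Hypothetical type, invented for this example.)
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; reason: unknown };

// Wrap a promise so that it always resolves; a rejection is recorded
// as { ok: false } instead of propagating.
function settle<T>(promise: Promise<T>): Promise<Outcome<T>> {
  return promise.then(
    (value): Outcome<T> => ({ ok: true, value }),
    (reason): Outcome<T> => ({ ok: false, reason }),
  );
}

// Promise.all over wrapped promises waits for every one of them,
// because none of the wrapped promises can reject.
function allSettledCompat<T>(promises: Promise<T>[]): Promise<Outcome<T>[]> {
  return Promise.all(promises.map(settle));
}
```

This only solves the "one failure kills everything" half of the problem. All requests are still started at the same moment, so it does nothing to limit parallelism.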

More googling brought me to async.mapLimit: https://caolan.github.io/async/docs.html#mapLimit Parallel execution, limited to y, collects all the responses, perfect.

But is this the way I want to go? Googling for more information I of course ended up in the pit of 2015/16 nodejs posts about the different implementations (bluebird?) of everything and … UAH! I have no idea if this will even work or if it is a monumental mistake.

So my question (finally):
I have x tasks to execute. What should I use to execute y of them in parallel until they are all finished (some not working should be handled by just continuing, not stopping all work)?
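For reference, the general shape of "run y at a time, keep going on failure" can be sketched without any library as a small pool of workers that pull items from a shared index. This is a minimal sketch under assumptions, not the implementation of async.mapLimit or any other package; `runLimited` and `Outcome` are hypothetical names invented here.

```typescript
// Outcome of one task: either a value or the reason it failed.
// (Hypothetical type, invented for this example.)
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; reason: unknown };

// Run `task` for every item with at most `limit` tasks in flight.
// Failures are recorded per item and never stop the remaining work.
async function runLimited<I, O>(
  items: readonly I[],
  limit: number,
  task: (item: I) => Promise<O>,
): Promise<Outcome<O>[]> {
  const results: Outcome<O>[] = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each worker claims one item at a time until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      // Claiming is safe: JavaScript runs this line without interruption.
      const index = next++;
      try {
        results[index] = { ok: true, value: await task(items[index]) };
      } catch (reason) {
        results[index] = { ok: false, reason };
      }
    }
  }

  // Start `limit` workers (at least one, at most one per item).
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A call might look like `runLimited(ids, 5, id => getDetails(id))`, where `getDetails` stands for whatever promise-returning service function is available. The results come back in the same order as the input, and failed items can be picked out afterwards by checking `ok`.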


#2

(While closing tabs of my research I found https://github.com/timdp/es6-promise-pool. This looks more fitting to the browser and also uses understandable words to describe what it does. But only ~100 Github stars. Is this what I am looking for?)


#3

I would redesign the API in order to bundle the requests into larger chunks so that you don’t have thousands of them. That will save on connection overhead as well.


#4

To follow up on that comment, it’s almost certain that your data is amenable to preprocessing. Very few natural problems require a thousand distinct independent queries every single time. Instead, there are a handful of choices, and everything else flows from those choices. So if you arranged your database so it had X-many large downloads, even if there was repetition of data in download 1 and download 2, in the long run you’d be saving, because of fewer connections, less likelihood of dropped connection, etc. Redundancy might help you a lot here.


#5

Unfortunately not an option. API design is external and set in stone. I have to make these many requests unfortunately, there is just no other endpoint available. I have to work with what I got :unamused:

I could add a server side wrapper that does that, but then I have a totally different kind of problem, as this app normally doesn’t need its own backend at all. So this would add servers, sysadmin and availability to my plate. Not an option.


#6

For now I have this (probably terrible) code that works (in perfect conditions). I realized that I control the resolve/reject of the service methods, so Promise.all actually could work here:

  ids: {};
  
  saveToStorage() {
    // details are keyed by id, so use a plain object rather than an array
    const details = {};

    // save ids to storage; returning the chain lets the caller
    // know when the whole download has finished (or failed)
    return this.storage.set('ids', this.ids).then(ids => {
      console.log('storage ids', ids);

      // go through ids, resolve only when all promises inside have resolved
      return Promise.all(ids.map(id => {
        console.log('id', id);

        // get details for id
        return this.dataService.getDetails(id).then(_details => {
          console.log('getDetails', id, _details);

          // save details "out" to variable
          details[id] = _details;
        });
      }));
    }).then(result => {
      console.log('all resolved: ', details, result);

      // save details to storage
      return this.storage.set('details', details);
    }).then(result => {
      console.log('storage details', result);
    });
  }

(ids was filled in an earlier function)

This creates a lot of parallel requests, but is fortunately limited by the number of HTTP connections per host of the browser/device :wink: Poor man’s limit :smile:


#7

I fear performance would be unacceptable even if the API was capable of handling all thousands of these requests concurrently simply due to the limitations of mobile networking, let alone how much more awful things would get if you deliberately started throttling things.

I don’t like being a voice of doom, but I am worried that by the time you build this out, you are going to come to the conclusion that it simply won’t work as currently designed.

Maybe if you can describe more of the details of what sort of data is being fetched by what criteria, people can suggest ways to improve things. For example, if this were a bulletin board app, instead of having an HTTP request for each post, you change things so a single request gets the next 20 or 50 posts in a thread.


#8

The API basically can return a list of IDs, and details for one ID. (Of course the real names for things are different, but that really is what it is.) I have to have all details for all IDs in my local database to do things with them depending on the details. So I send some parameters to /ids to get the correct IDs, then /details/:id for each of the returned IDs.

The expected number of IDs is between 50 and ~250. One of my testing endpoints has 2000 IDs to really stress the app and connection. I know that the app will suck on non-Wifi connections, but that’s fine. It is also okay that the “download” process takes some time to finish. I just need a stable way to get this “download” done, with enough control over it so I can later handle errors and stuff in non-perfect conditions.
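As one possible form of that "control in non-perfect conditions", a single request could be retried a few times with a growing pause before it is given up on. The following is a rough sketch under assumptions: `withRetry` and `delay` are hypothetical helpers invented for this example, and the attempt count and delays are arbitrary defaults.

```typescript
// Resolve after `ms` milliseconds.
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call `task` up to `attempts` times (must be >= 1). After each failure
// except the last, wait baseDelayMs, then 2x, then 4x, and so on.
// Rejects with the last error if every attempt fails.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await delay(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

Something like `withRetry(() => getDetails(id))` (with `getDetails` standing in for the real service call) still returns a single promise per ID, so it can be combined with whatever is used to limit the number of parallel requests.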


#9

I still think a caching application server that is capable of serving all details in one request is the best option here.


#10

Unfortunately not. The additional setup, maintenance and complications (API is rate limited per API, so having all users go through the same one will probably trigger this) forbid this. Just won’t work for this project.