Longer running background process (synching huge amounts of data) - general pointers?


#1

I am working with an API that is used to sync bigger amounts of data (JSON arrays mostly, later maybe also images) to the device. Unfortunately there are lots of endpoints to call to be able to collect all required data, so it is quite a complicated process.

How should I implement such a “background process” in general?

The user should be able to use the app as normal, and the views will handle the missing data just fine - I am just not sure how I can make sure the process is started and then running until it is finished, even if the app is closed while it is running, the user stops using the app etc.

I am thankful for any pointers or links that help. Thank you.


#2

This is more possible if you are writing a PWA for a desktop browser platform. If you want an app that can run on a handheld device, the device’s operating system probably won’t allow this, because of malware concerns. From the point of view of Android or Apple, if a user stops an app, the app should completely stop. Also, even if you find a way that works in the current OS, the next version of an OS might break your app, because companies are moving toward more security, not less.

This might be as much a user experience problem as a data problem, because you need to keep the user interested enough so the app remains open for long enough for the loading process to complete. So you might have to wireframe pages in relation to what data is most important to arrive first. Make sure there’s at least five minutes of work to do the first time someone opens the app (or whatever time window you need).

Also, look at what you really need to download. If there are lots of high-resolution images, only download tiny thumbnails of those images, and download the high-res version only if the user clicks on a thumbnail. General concept: create extra data artifacts that are small, whose only purpose is to improve user experience, and then download the big, heavy data only when it is requested.


#3

Sorry, my rambling message sent you down the wrong path; I provided too much unimportant detail:

The process only has to be active when the app is running, I just have to make sure that stuff doesn’t break when the app is restarted in between. Actually, in the first version I am even ok with it displaying a note “Please don’t restart the app now or you will have to start from scratch” and only saving everything when the process is done.

Same for the empty state / wireframe thoughts. I can of course do that, but in a first version I am okay with a modal or overlay that says “syncing data”. The user wants the data to be there, so this is not perfect UX, but a worthwhile complexity tradeoff for the first version.

The data is actually a set of entries in a database (simplified) that is delivered in JSON from the API. I have to call one endpoint to get a list of foo ids, then make many calls to the foo endpoint, as there is no foos endpoint to get many at once.


#4

Observable.concatAll()

If you want to query foo1 only once, even if the app stops and restarts later, one way is to store the result of foo1 in local storage, along with a boolean flag that says foo1 has already been queried.
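A minimal sketch of that idea, under a few assumptions: it uses a plain Promise loop instead of `concatAll()` so it runs without RxJS, the presence of a stored entry doubles as the "already queried" flag, and `KeyValueStore`, `MemoryStore`, `keyFor`, `pendingIds`, `syncFoos` and the `fetchFoo` callback are all made-up names for illustration. In a browser, `window.localStorage` should fit the `KeyValueStore` shape.

```typescript
// Minimal key-value interface (hypothetical); localStorage has the same two methods.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Assumed key scheme: one storage entry per foo id.
const keyFor = (id: string): string => `foo:${id}`;

// Ids that still need fetching. A stored entry means "already queried".
function pendingIds(allIds: string[], store: KeyValueStore): string[] {
  return allIds.filter((id) => store.getItem(keyFor(id)) === null);
}

// Fetch the missing foos one at a time and persist each as soon as it arrives,
// so a restart only repeats the call that was in flight.
async function syncFoos(
  allIds: string[],
  fetchFoo: (id: string) => Promise<unknown>,
  store: KeyValueStore
): Promise<number> {
  let fetched = 0;
  for (const id of pendingIds(allIds, store)) {
    const foo = await fetchFoo(id);
    store.setItem(keyFor(id), JSON.stringify(foo));
    fetched++;
  }
  return fetched; // number of foos fetched in this run
}
```

Calling `syncFoos` again after a restart with the same id list should skip everything that was already saved and only fetch the rest.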


#5

I’ll take the chance and ask the newb question:
Why Observables and not Promises?


#6

Ok, that makes sense. Is there a library that takes care of that (basically a queue for worker distribution) or should this be implemented manually?


#7

One or the other might be easier, depending on your API. If you can choose, though, Observables feel like a better fit for your problem statement. (Though I’ve misunderstood your problem at least once already so who knows!) The issue is that you have more control over error handling with Observables. For example, if there’s an error with an Observable, you can set your Observable to try again in five minutes. If a Promise rejects, that rejection is final; to retry you have to call the function again yourself and get a new Promise. This also means that handling errors in Observables is more subtle and difficult than what most synchronous programmers are used to. So using Observables would probably require more of a learning curve. That was certainly true for me, and I’d say I’m only about 75% there.

Regarding your other question, there might be a library, I don’t know. I’d implement it manually with an immutable Map<fooID, dataReceived>, and write to the Map as data arrived, simultaneously storing it in local storage. I don’t think that step is hard. The hard part to what you describe is handling errors. Make sure you always get everything even if the internet connection is severed at the worst possible time.
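Here is a rough sketch of the manual approach, with assumptions flagged: it uses plain Promises rather than Observables, treats the Map as immutable by copying it on every write, and retries each call with exponential backoff. All names (`FooId`, `SyncResult`, `backoffMs`, `withRetry`, `fetchAll`, the `fetchFoo` callback) and the delay values are invented for illustration. Persisting to local storage is left out to keep it short.

```typescript
type FooId = string;

interface SyncResult<T> {
  received: ReadonlyMap<FooId, T>; // everything that arrived
  failed: FooId[];                 // ids that still failed after all retries
}

// Exponential backoff delay for a zero-based attempt (assumed defaults).
function backoffMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run a task up to maxAttempts times, waiting between failed attempts.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  delayFor: (attempt: number) => number = backoffMs
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(delayFor(attempt));
    }
  }
  throw lastError;
}

// Fetch every id in order; one bad id does not abort the whole sync.
async function fetchAll<T>(
  ids: FooId[],
  fetchFoo: (id: FooId) => Promise<T>,
  maxAttempts = 3,
  delayFor: (attempt: number) => number = backoffMs
): Promise<SyncResult<T>> {
  let received: ReadonlyMap<FooId, T> = new Map<FooId, T>();
  const failed: FooId[] = [];
  for (const id of ids) {
    try {
      const foo = await withRetry(() => fetchFoo(id), maxAttempts, delayFor);
      const next = new Map(received); // copy-on-write keeps old snapshots untouched
      next.set(id, foo);
      received = next;
    } catch {
      failed.push(id); // remember it so a later run can try again
    }
  }
  return { received, failed };
}
```

The `failed` list is the part that matters for the "connection severed at the worst time" case: it tells you what to ask for again on the next run. Copying the Map on every write is simple but gets slow for very large id lists, so treat that as a tradeoff to revisit.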


#8

Got it.

Yes, the stuff I google for the call you posted screams “I am smarter than you!” at me when I look at it: https://www.learnrxjs.io/operators/combination/concatall.html :smiley: :smiley: :blush:

[quote=“AaronSterling, post:7, topic:97312”]
The hard part to what you describe is handling errors. Make sure you always get everything even if the internet connection is severed at the worst possible time.
[/quote]

Yes, there will be many edge cases. I don’t even want to think about the “User restarted device while sync was running” and similar stuff. Everything will break :wink: That’s also why I hoped someone had developed the standard library for handling this.


#9

People sort of have, but it’s called “backend as a service.” With Firebase for example, if an Observable fails for a non-fixable reason, like not having the database permission to read from a location, the query throws an error. Otherwise, the query fails silently and the Observable keeps trying again. Other pay-as-you-go databases have similar setups.

To learn Observables, start here. Then watch whatever videos you can find by André Staltz (who wrote that) or Brian Troncone (the learnrxjs guy you linked to). It’s worth it.