Reading large files leads to application crash

I’m building a new app that uses a map as its core, and the user can import geographic data from KML and KMZ files. Most of the time the app is intended to be used without an internet connection. The user can import zip files (KMZ) containing geographic data to be used within the map; some of these zips extract to a single text file (KML) of 300MB+, and the data inside these files is then read and loaded into the map.

In the beginning I used the Ionic Capacitor Filesystem plugin to read this unzipped text file and to write a new, modified file into a specific folder on the device’s filesystem, but I started getting out-of-memory errors while doing this. After some research, I managed to solve the writing problem with a chunk-writing method, which uses far less memory.
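
The chunk-writing idea, roughly (a minimal sketch, assuming @capacitor/filesystem’s appendFile; the helper name, chunk size and path are mine, not plugin APIs):

import { Filesystem, Directory, Encoding } from '@capacitor/filesystem';

// Minimal sketch of the chunk-writing idea. writeBlobInChunks and CHUNK_SIZE
// are illustrative names. Note that slicing at arbitrary byte offsets can
// split multi-byte UTF-8 characters; KML is mostly ASCII, so this is usually fine.
const CHUNK_SIZE = 1024 * 1024; // 1 MB per append

async function writeBlobInChunks(path: string, blob: Blob): Promise<void> {
  // Create (or truncate) the target file so the appends below can extend it.
  await Filesystem.writeFile({
    path,
    data: '',
    directory: Directory.Data,
    encoding: Encoding.UTF8,
  });

  for (let offset = 0; offset < blob.size; offset += CHUNK_SIZE) {
    const text = await blob.slice(offset, offset + CHUNK_SIZE).text();
    await Filesystem.appendFile({
      path,
      data: text,
      directory: Directory.Data,
      encoding: Encoding.UTF8,
    });
  }
}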

My problem now is the reading part, which also crashes with an out-of-memory error if I try to use the conventional Capacitor read method.

My main problems now are:
1 – Most of the alternatives work with different data types, like Blob or File objects, which are not easy to obtain with the Capacitor Filesystem plugin.
2 – Because of this I am forced to do conversions that involve reading the file as a whole, which always leads to an out-of-memory crash.

This is what I’m trying to do now, based on this Gist:

readBigFile(file: File, options) {
        var EventEmitter = require("events").EventEmitter

        // file - The file to read from
        // options - Possible options:
        // type - (Default: "ArrayBuffer") Can be "Text" or "ArrayBuffer" ("DataURL" is unsupported at the moment - dunno how to concatenate dataUrls)
        // chunkSize - (Default: 2MB) The number of bytes to read per chunk
        // Returns an EventEmitter that emits the following events:
        // data(data) - Returns a chunk of data
        // error(error) - Returns an error. If this event happens, reading of the file stops (no end event will happen).
        // end() - Indicates that the file is done and there's no more data to read
        // derived from: http://stackoverflow.com/questions/14438187/javascript-filereader-parsing-long-file-in-chunks

        var emitter = new EventEmitter()

        if (options === undefined) options = { }
        if (options.type === undefined) options.type = "ArrayBuffer"
        if (options.chunkSize === undefined) options.chunkSize = 1024 * 1024 * 2

        var offset = 0, method = 'readAs' + options.type//, dataUrlPreambleLength = "data:;base64,".length

        var onLoadHandler = function (evt) {
            if (evt.target.error !== null) {
                emitter.emit('error', evt.target.error)
                return;
            }

            var data = evt.target.result

            offset += options.chunkSize
            emitter.emit('data', data)
            if (offset >= file.size) {
                emitter.emit('end')
            } else {
                readChunk(offset, options.chunkSize, file)
            }
        }

        var readChunk = function (_offset, length, _file) {
            var r = new FileReader()
            var blob = _file.slice(_offset, length + _offset)
            r.onload = onLoadHandler
            // route read failures through the same handler so the 'error' event is actually emitted
            r.onerror = onLoadHandler
            r[method](blob)
        }

        readChunk(offset, options.chunkSize, file)

        return emitter

    }
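
It is meant to be consumed roughly like this (handleChunk is a placeholder for whatever processes each chunk, not part of the Gist):

// Hypothetical usage of readBigFile above.
const reader = this.readBigFile(file, { type: 'Text', chunkSize: 1024 * 1024 });

reader.on('data', (chunk) => handleChunk(chunk));
reader.on('error', (err) => console.error('read failed', err));
reader.on('end', () => console.log('done reading'));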

The problem with this approach is that I need the file as a Blob to read it in chunks, but to convert the file to a Blob I need to read it first, which brings me back to the out-of-memory error.

I have tried some different approaches too:
1 - A plugin called read-chunk, but it wasn’t exactly what I needed. The plugin reads a part of the file given a start position and a length, so I tried to use it to read the file piece by piece, but my write algorithm needs a single whole object (a Blob or File) to be able to write chunk by chunk. That creates another problem: read-chunk returns a Buffer that I need to convert before writing, and I can’t just convert each piece and send it to the write algorithm separately, because in the end I would have many files instead of one whole file.

2 Likes

AFAIK, this is not going to be easy.

The fundamental problem is that the Capacitor bridge is only capable of passing strings. Any attempt to handle this completely on the JavaScript side is (again, this is just to the best of my knowledge) doomed, because you can’t get a Blob across the bridge.

When I was faced with a vaguely similar situation, but in the opposite direction (writing huge files instead of reading them), I came across diachedelic/capacitor-blob-writer, which (as described in its README) uses a separate httpd to build its own bridge, separate from the Capacitor one.
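
From memory of that README, the usage is roughly this (a sketch; the path is illustrative and the exact option names should be checked against the plugin docs):

import { Directory } from '@capacitor/filesystem';
import write_blob from 'capacitor-blob-writer';

// Sketch of writing a large Blob via capacitor-blob-writer's local HTTP
// endpoint instead of pushing it across the Capacitor bridge as a string.
async function saveBlob(blob: Blob): Promise<void> {
  await write_blob({
    path: 'exports/modified.kml', // illustrative path
    directory: Directory.Data,
    blob,
  });
}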

I can’t think of any reason a similar tactic wouldn’t work in the read direction, but nor do I know of anybody who’s tried doing it. I do think, sadly, that you are going to need at least some help on the native side of the Capacitor bridge to pull this off. I don’t think any solution that relies solely on the Capacitor bridge is going to work for you.

The fact that the data is zipped adds additional complexity, because compressed data is generally not amenable to random access reads.

Hopefully somebody more knowledgeable than I (such as @jcesarmobile) will come along and tell me I’m an idiot and there’s a much easier path for you.

3 Likes

Lol, I would never call you an idiot.

Sadly you are right about the bridge: it’s not possible to pass complex types such as Blobs.

There is an easy way of getting a Blob from a native file using something like this: let blob = await fetch(url).then(r => r.blob());, where the url needs to look like _capacitor_file_/native/path/to/file.
But in the end, that requests the whole file from the filesystem and might still end in an out-of-memory error.
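Capacitor.convertFileSrc is the usual way to build that URL. A small sketch (the path and function name are illustrative):

import { Capacitor } from '@capacitor/core';

// convertFileSrc turns a native path into a URL the WebView is allowed to
// fetch (it contains the _capacitor_file_ prefix mentioned above).
async function readNativeFileAsBlob(nativePath: string): Promise<Blob> {
  const url = Capacitor.convertFileSrc(nativePath);
  const response = await fetch(url);
  return response.blob();
}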

4 Likes

Amazingly, using fetch was enough to solve this problem, at least for now. Currently I’m reading files of 300MB+ in size and I haven’t run out of memory yet. Even better, the object returned is a Blob, which is exactly what I need for my write-file-in-chunks code. Thank you so much for your help, this was a life saver.
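
In case it helps someone else, the rough shape of the flow is below (writeBlobInChunks is the hypothetical chunked-write sketch from earlier in the thread; the path is illustrative):

import { Capacitor } from '@capacitor/core';

// Fetch the native file as a Blob, then hand it to the chunked writer.
async function importKml(nativePath: string): Promise<void> {
  const url = Capacitor.convertFileSrc(nativePath);
  const blob = await fetch(url).then((r) => r.blob());
  await writeBlobInChunks('exports/modified.kml', blob);
}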

1 Like

Tentatively good news, but if you’re releasing this app on multiple platforms, I would urge you to test them all. The way that browsers handle memory management here isn’t standardized, and I would not at all be surprised to see Chrome and Safari doing this differently.

@jcesarmobile : see this SO discussion for context. Do you have a take on the OP’s followup comment about xhr (and therefore maybe even Angular’s HttpClient) potentially being less inclined to slurp entire files into memory?

this.http.get("_capacitor_file_/native/path/to/file", { responseType: "blob" })
  .subscribe(blob => { wootWoot(blob); });
1 Like

XHR should also work; I used the fetch sample because it was simpler.
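
For reference, the raw XHR version would be something like this (a sketch, not plugin-specific code):

// Raw XMLHttpRequest equivalent of the fetch example; responseType "blob"
// asks the WebView for a Blob instead of text.
function readAsBlobViaXhr(url: string): Promise<Blob> {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'blob';
    xhr.onload = () => resolve(xhr.response as Blob);
    xhr.onerror = () => reject(new Error('request failed: ' + url));
    xhr.send();
  });
}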

I don’t really know whether the WKWebView will treat them differently in its internal code, but the request to Capacitor will be the same, and Capacitor will return the whole file in both cases, no matter which one you use.

1 Like