CLI crashes on JSON files > ~256MB #17
Thanks for the very detailed bug report. I can confirm this issue on Linux, too. I'll have a look.
The problem is in the concat-stream package, which buffers the entire input before anything can be parsed. I verified this by instead using the JSONStream npm package to read and parse the input stream into a JS object, skipping the intermediate buffering and the native JSON.parse() call. Anyway, JSONStream is great; I've used it for several other projects and it totally solves this issue. It's slightly slower than the native JSON.parse(), but with the huge advantage that you never need to have all of the JSON document text in memory at once.
If you want to try out JSONStream, here's the code I added:

```js
// near the top of the CLI script (JSONStream needs to be installed from npm):
var JSONStream = require('JSONStream');

if (format === 'json') {
    (filename ? fs.createReadStream(filename) : process.stdin).pipe(JSONStream.parse())
        .on('root', function (data) {
            convert(data); // note: data is already an object, not a Buffer, so convert needs to test for that
        })
        .on('error', function (err) {
            console.error("ERROR: JSON input stream could not be opened or parsed.");
            process.exit(1);
        });
} else {
    (filename ? fs.createReadStream(filename) : process.stdin).pipe(concat(convert));
}
```
Actually, the bottleneck here was the
Yes, this seems like the way to go. I've also tried that and it works very well (with the only minor flaw of some rounding errors).
In fact, I do already use such an XML parser (htmlparser2), so the direct streaming approach should be fairly easy to adopt there, too.
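For reference, here is a minimal sketch of what such a direct streaming setup with htmlparser2 could look like. The file name (map.osm) and the element handling inside onopentag are made up for illustration; osmtogeojson's actual integration will differ.

```js
// Sketch only: feed the input stream into htmlparser2 chunk by chunk,
// so the whole XML document never has to exist as one huge string.
var fs = require('fs');
var htmlparser = require('htmlparser2');

var parser = new htmlparser.Parser({
    onopentag: function (name, attribs) {
        if (name === 'node' || name === 'way' || name === 'relation') {
            // collect OSM elements incrementally here instead of parsing one giant string
        }
    },
    onerror: function (err) {
        console.error("ERROR: XML input stream could not be opened or parsed.");
        process.exit(1);
    }
}, { xmlMode: true });

fs.createReadStream('map.osm', { encoding: 'utf8' }) // hypothetical input file
    .on('data', function (chunk) { parser.write(chunk); })
    .on('end', function () { parser.end(); });
```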
Fixed now in 84dbbcb. Can you give it a try? I tested it with JSON and XML files up to 1.5 GB; it worked well, with memory consumption up to ~7 GB. One needs to run the script with a reasonably high V8 memory limit (e.g. node's --max-old-space-size flag), though.
@tyrasd, thanks for the update. I've tested against my "good" and "bad" JSON files and everything works. I've also tested with a ~500 MB JSON file and that works as well. I don't currently work with XML data, but I've modified a couple of my Overpass queries to return XML and they work fine.
One of my collaborators wrote this
Slightly more explanation on how to use
Hi, great tool! But I've run into what seems like an arbitrary limitation/bug. I found two OSM Overpass output JSON files that differ by a single way with 4 nodes. The smaller one runs without failure; the larger one crashes consistently with:
When the smaller file runs, it requires less than 2 GB of memory (out of ~15 GB available).
The 1 way / 4 node difference is unremarkable, and is not the cause of the problem.
I know this because if I add space characters to the beginning of the smaller "good" file until it is the size of the larger file, it also fails with the above error. So this seems to be entirely about file size, not content. I have distilled it down to two files that contain the exact same JSON content (the good, smaller file from above) but differ by the amount of leading whitespace added (about 240 bytes).
The file that is 268435577 bytes works, and the one that is 268435578 bytes (with one additional leading space character) fails. 256 MB is 268435456 bytes (256 × 1024 × 1024), so these files are both slightly larger than that (by 121 and 122 bytes, respectively); maybe it's just a coincidence.
I'm running this on a Mac Mini with OS X 10.9.2 and 16 GB of RAM. I'm using Node.js version 0.10.26, installed from MacPorts. The osmtogeojson version is 2.0.4 and was installed from npm.
The "good" file can be obtained from Overpass with this command (assuming the OSM database doesn't change between now and when you try it):
This command currently returns a file of 268435336 bytes. If you get more than that, try pulling in the bounding box a tiny bit (6th decimal place) until it's less than this amount. That file should convert. Now add about 300 spaces to the beginning of the file until the size is about 268435600 bytes; that one should fail (a small padding script like the sketch below can do this).
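A minimal sketch of such a padding step, assuming hypothetical file names good.json and bad.json (leading whitespace is still valid JSON, so only the byte count changes):

```js
// Sketch only: prepend enough spaces to the working file to push it past the
// ~256MB threshold, writing the result out as a separate file.
var fs = require('fs');

var input = fs.readFileSync('good.json');          // the working file, just under the limit
var target = 268435600;                            // approximate failing size in bytes
var padLength = Math.max(0, target - input.length);
var padding = new Array(padLength + 1).join(' ');  // string of padLength spaces

// Write the padding first, then append the original bytes, so we never build
// one giant in-memory string out of the whole document.
fs.writeFileSync('bad.json', padding);
fs.appendFileSync('bad.json', input);
```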
I've tried adding these V8 parameters to the node command to increase the memory it will allocate, but it doesn't help (I regularly run node scripts using these params that consume 10+ GB of memory without any problems):
Let me know if you need any more info.