JSON load times: Brim v0.31.0 vs. Zui v1.6.0 #3000

Open
SoftTools59654 opened this issue Feb 8, 2024 · 5 comments
Labels
bug (Something isn't working), community

Comments

@SoftTools59654

Brim Vs Zui

To compare import times, I imported the same file into both Brim and Zui:
a 2 GB JSON file with about 250k records.
The import took about one minute in Brim, but more than 2 minutes in Zui.
I ran both imports on the same system, but I don't know how representative this is; the problem may be specific to that file.

I hope you can run this test on a standard file to see whether the problem exists.

@SoftTools59654 added the "bug" (Something isn't working) label Feb 8, 2024
@philrz
Contributor

philrz commented Feb 8, 2024

@SoftTools59654: When you say "Brim" are you talking specifically about Brim v0.31.0 which was the last release of the app when it was still called "Brim"? And for the comparison with Zui are you using the latest GA release Zui v1.6.0? Also, are you on Windows, macOS, or Linux?

@SoftTools59654
Author

Yes, that's what I mean. I tested with the same versions you mentioned.

Of course, my test may be wrong, but I ran it several times.

The system is Windows.

My assessment may be incorrect, but if you also test, we will get a more reliable result. A large JSON file with many records shows the difference more clearly.

@philrz changed the title from "Brim Vs Zui" to "JSON load times: Brim v0.31.0 vs. Zui v1.6.0" Feb 12, 2024
@philrz
Contributor

philrz commented Feb 12, 2024

@SoftTools59654: Since I don't have your specific test data, I attempted to reproduce the symptom using some publicly-available data that has high-level characteristics similar to what you described for your data. However, in my tests the Zui v1.6.0 release performed slightly faster than Brim v0.31.0. Below are the details.

As test data I used some of the hourly GitHub archives. Specifically the ones shown below, which come out to ~1.9 GB and 482k records. So, the size is roughly the same as yours, though this would indicate your records are on average 2x the size of these since you said you had only 250k records adding up to 2 GB.

$ curl -O https://data.gharchive.org/2023-02-08-0.json.gz
$ curl -O https://data.gharchive.org/2023-02-08-1.json.gz
$ curl -O https://data.gharchive.org/2023-02-08-2.json.gz

$ gunzip *.gz

$ du -sh .
1.9G	.

$ zq 'count()' *.json
482272(uint64)
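
The "2x" record-size estimate above follows from simple division. As a rough sketch in Python, using only the approximate figures quoted in this thread (these are reported sizes, not new measurements):

```python
# Rough average record size for the two data sets discussed in this thread.
# The inputs are the approximate figures quoted in the comments.

def avg_record_bytes(total_bytes: float, record_count: int) -> float:
    """Return the mean size of one record in bytes."""
    return total_bytes / record_count

# Reported data: about 2 GB across about 250k records.
reported = avg_record_bytes(2e9, 250_000)

# GH Archive sample: about 1.9 GB (from `du -sh`) across 482272 records.
gharchive = avg_record_bytes(1.9e9, 482_272)

ratio = reported / gharchive
print(f"reported:  ~{reported:.0f} bytes/record")
print(f"gharchive: ~{gharchive:.0f} bytes/record")
print(f"ratio:     ~{ratio:.1f}x")
```

That works out to roughly 8 KB per record for the reported file versus roughly 4 KB per record for the GH Archive sample.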

The tests below are on an AWS EC2 t2.xlarge instance, so 4 vCPU and 16 GB of RAM.

Dragging these into Brim v0.31.0 took ~48 seconds.

Repro-v0.31.0.mp4

Dragging these into Zui v1.6.0 took ~44 seconds.

Repro-v1.6.0.mp4

Of course, the effect you reported may be unique to your data and/or environment. Some questions to help narrow this down:

  1. Would it be possible to share your test data? If not, could you examine the publicly-available data here and see if you can spot key differences between that and your data? Or, if your records are all fairly similar in structure, perhaps you could "anonymize" a sample record and paste it here so I could create 250k random variations based on it?

  2. Could you describe the CPU/memory resources of your test environment for comparison?
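
For the "random variations" idea in question 1, a minimal Python sketch might look like the following. It is only an illustration: the function names (`randomize`, `make_variations`, `write_ndjson`) are hypothetical, and it assumes the anonymized sample is a single JSON object and that writing one record per line (newline-delimited JSON) is an acceptable input layout.

```python
import json
import random
import string
from typing import Any, Iterator


def randomize(value: Any, rng: random.Random) -> Any:
    """Return a value with the same shape and types but randomized leaves."""
    if isinstance(value, dict):
        return {key: randomize(item, rng) for key, item in value.items()}
    if isinstance(value, list):
        return [randomize(item, rng) for item in value]
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return rng.choice([True, False])
    if isinstance(value, int):
        return rng.randint(0, 10**6)
    if isinstance(value, float):
        return rng.uniform(0.0, 1000.0)
    if isinstance(value, str):
        # Keep the original length so the record size stays comparable.
        return "".join(rng.choices(string.ascii_lowercase, k=len(value)))
    return value  # None passes through unchanged


def make_variations(sample: dict, count: int, seed: int = 0) -> Iterator[dict]:
    """Yield `count` randomized copies of `sample`, reproducible via `seed`."""
    rng = random.Random(seed)
    for _ in range(count):
        yield randomize(sample, rng)


def write_ndjson(sample: dict, count: int, path: str, seed: int = 0) -> None:
    """Write the variations to `path`, one JSON record per line."""
    with open(path, "w", encoding="utf-8") as out:
        for record in make_variations(sample, count, seed):
            out.write(json.dumps(record) + "\n")
```

Calling `write_ndjson(sample, 250_000, "variations.json")` with an anonymized record as `sample` would then produce a file with the same record count as the original report. Random strings compress and parse differently from real data, so load times on such a file are only a rough proxy.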

@SoftTools59654
Author

Thanks for the detailed write-up.

You're right; I looked into the problem in more detail. Brim did not actually import all the data: it displayed the data it had loaded, but a few million lines were not imported.

The interesting thing is that when Brim imports only part of the data, even half of it, it displays that part, but Zui does not.

My initial conclusion about the speed difference may have been wrong, although I did observe differences in speed.

I will check it a few more times and let you know the result.

@philrz
Contributor

philrz commented Feb 21, 2024

@SoftTools59654: Yes, that makes sense. Brim v0.31.0 allowed "partial loads" when there were input errors, as captured in #2660, and Zui also behaved that way until fairly recent releases. I just updated and closed that issue to reflect the current state of Zui. As that issue also covers, even when "partial loads" were possible, the error messaging was poor, so it was basically a bug. That said, as captured in brimdata/super#4546, it may be desirable to allow this behavior under some circumstances (with better error messaging, of course), so we hope to revisit the topic at some point.
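
To check whether a file contains the kind of input errors that would trigger a partial load, one option is to scan it outside the app. The Python sketch below is not how Brim or Zui detect errors; it assumes the file is newline-delimited JSON (one record per line), and `find_bad_lines` is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def find_bad_lines(path: str, limit: int = 10) -> Tuple[int, List[Tuple[int, str]]]:
    """Scan an NDJSON file and report lines that fail to parse.

    Returns (good_count, bad), where `bad` holds up to `limit`
    (line_number, error_message) pairs for malformed lines.
    """
    good = 0
    bad: List[Tuple[int, str]] = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
                good += 1
            except json.JSONDecodeError as err:
                if len(bad) < limit:
                    bad.append((number, err.msg))
    return good, bad
```

If `bad` comes back non-empty, the first reported line number is a reasonable place to start looking for where a partial load would have stopped. A file whose records span multiple lines would need a different approach.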

I'll hold this issue open in the event you're able to provide any additional detail on your initial repro as you indicated in your last comment. Thanks.
