using CSV became extremely slow #324
Version 2.5 loads fast on Julia 0.6 (1.4 s), version 3.1 loads slowly on Julia 0.7 (4 s), and version 4.1 loads extremely slowly (8.8 s). I also had a version from master about 2 months ago that loaded in 0.8 s.
The codebase has changed a lot in the last few months, so it's probably just due to that.
Actually that's intentional: CSV reads a dummy file when the package is loaded (lines 244 to 250 in 8e6a5bc), so that the compilation cost is paid up front rather than on the first real read.
I'm not really sure it's a real win. In particular, it means that packages which depend on CSV will pay the compilation price even if they don't call it in a given session.
We mainly do flight testing of wind drones. After each flight (they are usually short, around 10 min) we analyse the CSV log file, which is not small, because we log about 120 values at 200 Hz. With Julia 0.6, loading a log file and preprocessing it takes about 20 s on my laptop. I will create a more realistic test that loads CSV and then loads the data, to see how much slower the current version of CSV is compared to the 0.2.5 version that I used with Julia 0.6.
Then it shouldn't make a difference for you whether compilation happens when loading the package or when reading a file for the first time. In the end the total time should be similar. You can easily check that by removing the call I linked to above.
Benchmarking with real data, measuring the total time for loading the needed packages, loading the CSV data and analysing it:
I would also like to benchmark the current CSV version, but it is not compatible with my code. I need to check why.
The CSV.jl-induced regression is at least fixed on master now.
607: stop running functions in init r=CarloLucibello a=KristofferC

Zygote currently does the same thing as CSV used to do (JuliaData/CSV.jl#324), which is to run some representative functions in `__init__` to make the first call look faster. In reality, this just shifts the latency of the first call to package load time. The problem is that Zygote can be loaded in a Julia session without necessarily getting called. In those scenarios, users have to pay the compilation cost anyway, which makes it more costly to have Zygote as a dependency.

Co-authored-by: Kristoffer <kcarlsson89@gmail.com>
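The trade-off discussed in this thread can be sketched with a toy cost model. This is only an illustration written in Python, not CSV.jl or Zygote code: the `Session` class, the `total` helper, and the three cost constants are all hypothetical and were chosen for readability, not measured.

```python
from dataclasses import dataclass

# Hypothetical costs in seconds; none of these numbers are measurements.
BASE_LOAD = 1.0  # loading the package itself
COMPILE = 3.0    # one-time compilation of the reader
READ = 0.5       # a read once the reader is compiled


@dataclass
class Session:
    warm_up_on_load: bool  # True models running a dummy read at load time
    elapsed: float = 0.0
    compiled: bool = False

    def load(self) -> float:
        """Load the package, optionally paying the compilation cost up front."""
        cost = BASE_LOAD
        if self.warm_up_on_load:
            cost += COMPILE
            self.compiled = True
        self.elapsed += cost
        return cost

    def read(self) -> float:
        """Read a file, compiling first if that has not happened yet."""
        cost = READ
        if not self.compiled:
            cost += COMPILE
            self.compiled = True
        self.elapsed += cost
        return cost


def total(warm_up_on_load: bool, calls_read: bool) -> float:
    """Total session time for one load and at most one read."""
    session = Session(warm_up_on_load)
    session.load()
    if calls_read:
        session.read()
    return session.elapsed
```

Under these assumptions, a session that reads a file costs the same either way, while a session that only loads the package as a dependency pays the compilation cost only in the warm-up variant.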
With the latest version of CSV the load time on Julia 1.0.1 and Julia 0.7 increased 10 times:
Same machine:
It used to be faster on 0.7 than on 0.6.
Any idea?