-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
uproot.iterate
throws a pandas
PerformanceWarning
#1070
Comments
The issue only appears in ROOT files with many branches.
|
Some additional information:
|
Some history of the uproot5/src/uproot/interpretation/library.py Lines 787 to 803 in 63bb43f
It was added in #281 to solve #277. It was one of many different attempts to construct a Originally, that function was for a special case, but in #734, it became the only way. (Kush removed the complex code that "exploded" ragged arrays into columns with @ioanaif is investigating |
Appending to a dataframe in pandas has O(n^2) complexity because in each iteration a new dataframe is created and the data is copied over. Thus the call to pandas_memory_efficient is not needed and indeed it raises a performance warning when a lot of columns are involved. With pandas 2.1.4 the best course of action is to build the dataframe directly from a dictionary of arrays and list of column names. We already build these two in _pandas_only_series Enlarging a dataframe through
is much slower than constructing a dataframe once: |
I am using the latest uproot version 5.2.0.
When reading a ROOT file with
uproot.iterate
intopandas.DataFrame
, a PerformanceWarning is triggered:The issue seems to be in the function
_pandas_memory_efficient
in the fileinterpretation/library.py
.The text was updated successfully, but these errors were encountered: