-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: final refactoring for Forth generation #749
Conversation
This PR solves issue #723 , |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this on our weekly Zoom call. It's great, and the project is done. Congratulations!
Oh wait—one more thing: AwkwardForth should be used by default now. Whenever a new AsObjects is constructed, it gets a |
e946dea was the last step needed for this PR, apart from getting the tests to work again. The last error in the logs (one of 6 errors total) was due to some AwkwardForth cases being declared as NotImplementedError. In general, NotImplementedError for AwkwardForth can be replaced with turning off the Forth generation and proceeding with the old code path. So you don't have to handle more Forth cases, just ensure that it gives up appropriately if it can't proceed. That will "hide" these cases, since they're no longer revealed by NotImplementedError, which trades the good thing of getting Uproot to run for the bad thing of not knowing which code path was taken in a given case. In practice, this will mean that some data types will still have the old deserialization speed and it becomes harder to debug performance issues. Does the "give up" flag leave any permanent indicator on the AsObjects object, so that we have something to check if a particular deserialization seems to be very slow? By the way, in the case that I looked at, TClonesArray, it would be very hard to write Forth. TClonesArray is a messy, complex thing. But 9a7845e was a good thing to have fixed. So I guess not all of these are things to be skipped. (I only looked at the last one in the logs.) |
Hey, so some of the tests are failing because of the difference is Awkward Array layout. I will have to figure out those and there's one that seems to be reading strings incorrectly. From what I expected, these errors are much more tame. |
A different layout is significant. I guess my shortcut of checking only the last failure was misrepresentative. When you're dealing with the NotImplementedError, be sure to do all of the cases in which the AwkwardForth is deliberately not implemented. I'm going to call the TClonesArray one hopeless, as are any with memberwise serialization, but if one of them looks like it could be easy and just fills in a case we didn't have a test for before, then at least put a note in a comment that such-and-such a test reaches that part of the code and could be implemented someday. |
I should have expanded on the comment. The difference that I spotted was a spelling mistake and some layout parameters that should not have been there. The difference in layout is not in the logical structure (at least I haven't spotted any). |
With the latest Awkward Array, all tests pass except for the two that you knew about. |
for more information, see https://pre-commit.ci
src/uproot/containers.py
Outdated
uproot._util.awkward_form(self._values, file, context), | ||
), | ||
None, | ||
parameters={"__array__": "sorted_map"}, | ||
), | ||
parameters={"__array__": "sorted_map"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one level of this should be "__array__": "sorted_map"
. I remember we were talking about this in one of our meetings, and thought about the consequences of both choices. Putting the "__array__": "sorted_map"
in with the RecordForm, rather than out with the ListOffsetForm, allows for the possibility that a whole array can be one sorted map, something we'll likely want even if it's not what we have here. (Someone could take this array
and do array[0]
to get only one list of key-value pairs, and that single list should have map-like semantics someday.)
uproot._util.awkward_form(self._values, file, context), | |
), | |
None, | |
parameters={"__array__": "sorted_map"}, | |
), | |
parameters={"__array__": "sorted_map"}, | |
) | |
uproot._util.awkward_form(self._values, file, context), | |
), | |
None, | |
parameters={"__array__": "sorted_map"}, | |
), | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might affect the test (src/uproot/streamers.py line 285). In which case, the expected test result can be changed again.
No description provided.