Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dask_awkward support to fastjet #189

Merged
merged 16 commits into from
Apr 14, 2023
Merged

add dask_awkward support to fastjet #189

merged 16 commits into from
Apr 14, 2023

Conversation

lgray
Copy link
Collaborator

@lgray lgray commented Mar 28, 2023

Fixes #188

Bindings appear to be pretty straightforward at the end of the day.

The trick we use to make delayed operations work is to construct a new ClusterSequence for each concrete array being processed.

Within DaskAwkwardClusterSequence we dispatch the function to be called and the JetDefinition.

There are a large number of functions to convert, but the majority of them seem pretty straightforward.

@lgray
Copy link
Collaborator Author

lgray commented Mar 28, 2023

@chrispap95 OK - sorry for the noise here while I beat it into shape. This is all ready for completion now. Let me know if you have any questions!

There are perhaps a few places where it's not a trivial map_partitions but I haven't thought about it too hard.

@lgray
Copy link
Collaborator Author

lgray commented Mar 28, 2023

We probably also need to turn on the thread safety options for fastjet now, since dask's default scheduler is threaded.

@lgray
Copy link
Collaborator Author

lgray commented Apr 5, 2023

@chrispap95 ok - finished this up. It would be super useful to try this out with some user analysis and derive some tests before we merge it.

@lgray
Copy link
Collaborator Author

lgray commented Apr 5, 2023

@andrzejnovak FYI

@chrispap95
Copy link
Collaborator

@lgray I can run the SUEP workflow on it. It is using only inclusive_jets and constituents though. Let me try to set up the environment and I will reply back.

@lgray
Copy link
Collaborator Author

lgray commented Apr 5, 2023

@chrispap95 cool - dask-based NanoEvents in the awkward2_dev branch of coffea should work for PFNano so you may just be able to start from that.

@lgray
Copy link
Collaborator Author

lgray commented Apr 5, 2023

@jmduarte I know you guys are using some of the lund plane functions and stuff - if you have some time could you give this new software stack a try?

@lgray
Copy link
Collaborator Author

lgray commented Apr 8, 2023

There are some bugs in this that need fixing:

  • it won't touch data correctly
  • it's presently not handling the cases where the function has additional array arguments

@lgray
Copy link
Collaborator Author

lgray commented Apr 13, 2023

OK - this is going to need some work now that I've got some tests running on a file.

Update: I have a janky fix that seems to work on files well.

@chrispap95 can you make a 2 event sample of that PFNano file you pointed me to so that I can make proper tests?
There's a lot of gotcha's in dask_awkward related to when you finally start reading from a file (and doing so efficiently).

@lgray
Copy link
Collaborator Author

lgray commented Apr 13, 2023

@jpivarski Do you think this is reasonably robust enough? I checked in the code that deeper within all that is ever asked for is E/px/py/pz, but it all looks sorta sketchy IMO.

@lgray lgray requested a review from jmduarte April 13, 2023 13:33
@lgray
Copy link
Collaborator Author

lgray commented Apr 13, 2023

uuughhh