-
To answer the bulleted questions:
On the last one, once you get the histograms back you can manipulate and sum them. You can get an idea of the multi-dataset way to do things here; sorry, it's just the tests for now. I'll make something more expository after the break. That code uses some of the functions we've put together for handling multiple datasets, doing test runs, and recovering the set of failed files and re-running on it.
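For readers following along, the "recover the failed files and re-run on them" idea can be sketched in plain Python. This is only an illustration under stated assumptions: `run_with_retries` and `flaky_process` are hypothetical names invented here, not coffea functions, and the per-file "processing" is a toy stand-in.

```python
def run_with_retries(files, process_file, max_passes=2):
    """Process each file, then re-run only the files that failed.

    Returns (results, still_failed). `process_file` is any callable that
    takes a file name and either returns a result or raises an exception.
    """
    results = {}
    pending = list(files)
    for _ in range(max_passes):
        failed = []
        for name in pending:
            try:
                results[name] = process_file(name)
            except Exception:
                # Remember the failure so the next pass retries only this file.
                failed.append(name)
        pending = failed
        if not pending:
            break
    return results, pending


# Toy stand-in for real per-file work: "b.root" fails once, then succeeds.
_already_failed = set()


def flaky_process(name):
    if name == "b.root" and name not in _already_failed:
        _already_failed.add(name)
        raise OSError("transient read error")
    return len(name)


results, still_failed = run_with_retries(["a.root", "b.root"], flaky_process)
```

After the second pass `still_failed` is empty, and `results` holds one entry per file, so the successful outputs from the first pass are not recomputed.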
-
Ok, thanks @lgray! I've created an
This seems like it might be something obvious, but I'm not sure I understand it, sorry. Wondering if you'd have any tips? Is it related to the array being a NumPy array, not a dask array?
-
I also have a question about another error I've run into further into the processor (I've just commented out the xgboost stuff for now; I'll try to figure it out later). It's on this line:
The error is this:
So I guess I cannot use
Wondering if @lgray or @nsmith- would have any tips on how to work around this? Is there maybe some different version of
-
I have a quick question: with coffea 2023, is anything like this needed for processors (using
@property
def columns(self):
    return self._columns
I'm actually not really sure what (if anything) it does (even in coffea 0.7), but I guess I must have copy-pasted it from some example at some point.
-
Happy New Year! I have a question about the structure of the output histogram object for the coffea 2023 version of my processor. The way I'd previously been running with coffea 0.7, the output histogram object had a StrCategory axis for samples (e.g. ttH, ttW, etc.). This structure was convenient for manipulating and plotting (e.g. for summing and grouping). However, in my initial attempt to migrate to coffea 2023, I'm now passing each dataset one by one to my process function, so I eventually pass an object like
The way I can think of to do this with coffea 2023 would be to pass something like
Anyway, sorry for the long question (and sorry that it's not very well formulated), but I'm wondering if you'd have any thoughts or advice on this? Thanks!
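As a rough illustration of the question above, here is one way per-dataset outputs could be regrouped into a sample-keyed layout and then summed into groups. It is a sketch under assumptions, not the coffea or hist API: plain lists of bin counts stand in for histogram objects, and the variable name `njets`, the bin contents, and the group definitions are all made up for the example.

```python
def sum_counts(hists):
    """Bin-by-bin sum of equally binned histograms given as lists of counts."""
    return [sum(bins) for bins in zip(*hists)]


# Hypothetical outputs from processing each dataset separately:
# dataset name -> variable name -> bin counts.
per_dataset = {
    "ttH": {"njets": [1, 4, 2]},
    "ttW": {"njets": [0, 3, 5]},
}

# Rebuild a layout keyed by variable first and sample second, which plays
# the role that the sample category axis played in the old output.
by_variable = {}
for sample, output in per_dataset.items():
    for variable, counts in output.items():
        by_variable.setdefault(variable, {})[sample] = counts

# Illustrative grouping: each group is the sum of its member samples.
groups = {"signal": ["ttH"], "total": ["ttH", "ttW"]}
grouped = {
    group: sum_counts(by_variable["njets"][sample] for sample in members)
    for group, members in groups.items()
}
```

With real histogram objects the same two steps apply (attach the dataset name as a category, then add the members of each group), since summing equally binned histograms is a bin-by-bin addition.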
-
I have a quick question (probably something obvious that I'm just missing). When I try to print e.g.
-
Thank you again @lgray for all of the help with this. I think my analysis code is finally pretty much fully migrated. I should clean some parts up, but as of now everything seems to be working fine for my tests with a single file (yields for the single file agree, and my CI is passing). Next up I will try to run at scale.
Just for reference, with coffea 0.7 scaled out with Work Queue, this analysis (which is still fairly preliminary, so it does not have any systematics yet) was able to turn around in ~20 minutes using ~500 cores. I will talk to @btovar and @cmoore24-24 about how to run with TaskVine, and once we have it running I can post any interesting performance numbers here.
But I have one quick question before trying to scale up. @lgray I am wondering if you could explain a bit more about how/where the
-
Hi @lgray I'm wondering if there is any type of progress bar that would be available with coffea 2023 (e.g. something similar to the extremely useful progress bars available in coffea 0.7)? |
-
Hello @lgray, @nsmith- , all,
I'm attempting to migrate this analysis code to coffea 2023. Just for future reference, here is a link to the repo at the specific current commit (so that we have a "pre coffea 2023" version to refer back to). Thank you to @btovar for pointing me to @cmoore24-24's coffea 2023 processor here. This, along with the references from @lgray (here and here), is what I'm trying to base this migration on.
Ok, so in the current version of my code, this is where the processor was run. Here are the relevant lines for the iterative executor:
When I naively just attempt to run with coffea 2023, the first thing that breaks is from that block (with an error `AttributeError: module 'coffea.processor' has no attribute 'IterativeExecutor'`). So it seems like this block is a good thing to try to focus on first. So, attempting to rewrite this block for coffea 2023, I think I need to do something like:
(Though of course eventually I will have to change things in the process function too, but I wanted to just focus on trying to call it the right way first.) If that's the right direction, then I think what I am stuck on next is `events`. It is not clear to me what this is supposed to correspond to. In the pre-coffea 2023 version, we never directly call the processor's `process` function ourselves, but I think its `events` argument just corresponds to the nanoevents object for just the events in the particular chunk.
In @lgray's announcement of coffea 2023 (on mattermost here) it was mentioned that the arrays correspond to an entire dataset, not a chunk of a dataset. So I think my questions for now would be:
- `events` argument should correspond to all of the events in just one dataset? Or would it correspond to all of the events that we are processing in total?
- `events`?)
Sorry for the long message, and thank you in advance for any help or tips!
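To make the dataset-versus-chunk distinction concrete, here is a toy sketch of the calling pattern in which `process` is called once per dataset, with `events` covering that whole dataset. Everything in it is an assumption for illustration: `ToyProcessor` is not the coffea base class, `load_events` is a hypothetical stand-in for however the events object is actually built, and the file names and values are invented.

```python
class ToyProcessor:
    """Stand-in for an analysis processor (not the coffea base class)."""

    def process(self, events):
        # `events` here holds every event of one dataset, not a single chunk.
        dataset = events["metadata"]["dataset"]
        n_selected = sum(1 for pt in events["pt"] if pt > 30.0)
        return {dataset: {"n_selected": n_selected}}


def load_events(dataset, files):
    """Hypothetical loader: gathers all files of one dataset into one object."""
    fake_pt = {
        "a.root": [10.0, 45.0],
        "b.root": [60.0],
        "c.root": [35.0, 5.0],
    }
    pt = [value for name in files for value in fake_pt[name]]
    return {"metadata": {"dataset": dataset}, "pt": pt}


# One entry per dataset; each dataset may span several files.
fileset = {"ttH": ["a.root", "b.root"], "ttW": ["c.root"]}

processor_instance = ToyProcessor()
output = {}
for dataset, files in fileset.items():
    events = load_events(dataset, files)
    output.update(processor_instance.process(events))
```

The loop over `fileset` takes the place of the old executor call: the caller decides which dataset is passed in, and the per-dataset results are collected into one dictionary afterwards.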