Sorted by time output in unifiedlogs_* scripts #31

Open
cvandeplas opened this issue Nov 19, 2024 · 4 comments

Comments

@cvandeplas
Contributor

I'm struggling a bit with the output of the unifiedlog_* example scripts.
The output is currently not sorted by time, which makes some post-processing impossible.

Considering the size of a logarchive, sorting afterwards can be memory- and CPU-intensive.
I'm wondering whether the data could be output already sorted, as the native Apple tool also exports it sorted by time.
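
To illustrate the post-processing cost mentioned above, here is a minimal sketch of an external merge sort that keeps only one bounded batch of lines in memory at a time. It is not part of the library or its example scripts. It assumes the output is JSON lines and that each record carries a sortable `time` field; both the field name and the `external_sort` helper are hypothetical.

```python
import heapq
import json
import os
import tempfile
from itertools import islice


def _time_key(line, field):
    # Assumption: every line is a JSON object with a sortable `field` value
    # (a number, or timestamps in one consistent ISO 8601 format).
    return json.loads(line)[field]


def external_sort(lines, out, field="time", batch_size=100_000):
    """Write `lines` to `out` sorted by `field`, one batch in memory at a time."""
    lines = iter(lines)
    run_paths = []
    try:
        # Phase 1: sort bounded batches and spill each one to a temporary run file.
        while True:
            batch = list(islice(lines, batch_size))
            if not batch:
                break
            # Make sure every line ends with a newline before writing it out.
            batch = [l if l.endswith("\n") else l + "\n" for l in batch]
            batch.sort(key=lambda l: _time_key(l, field))
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(batch)
            run_paths.append(path)

        # Phase 2: lazily merge the sorted runs into the output.
        runs = [open(p) for p in run_paths]
        try:
            for line in heapq.merge(*runs, key=lambda l: _time_key(l, field)):
                out.write(line)
        finally:
            for run in runs:
                run.close()
    finally:
        for path in run_paths:
            os.remove(path)
```

Memory use is bounded by `batch_size`, but every line is parsed several times and all run files are open during the merge, so this trades memory for CPU and disk I/O rather than removing the cost.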

With the current logic of iterating over the archive_path types, I guess that will not work.
Not being that familiar with the underlying data structure (and thank you for that, as that's the beauty of your library), I'm not sure whether it is really feasible. The same goes for the trick of handling oversized entries.

Do you think there is a clean way to do so?

@cvandeplas cvandeplas changed the title Sorted by time output in examples scripts Sorted by time output in unifiedlogs_* scripts Nov 19, 2024
@puffyCid
Collaborator

I think it may be possible to sort by time if we sort by filename before iterating through the tracev3 files.
@jrouaix actually had a similar idea in a recent PR.

If we sort the tracev3 files by filename in the Persist, Special, SignPost, and HighVolume directories, that could provide a way to sort by time.
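
A rough sketch of the file-ordering idea, written in Python for illustration only and not taken from the library: it walks the directories named above and returns the tracev3 files sorted by filename within each one. The helper name `sorted_tracev3_files` is hypothetical, and the directory spellings are copied from this comment, so adjust them to match the archive on disk.

```python
from pathlib import Path

# Directory names as written in the comment above; treat the exact spelling
# as an assumption about the logarchive layout.
TRACE_DIRS = ("Persist", "Special", "SignPost", "HighVolume")


def sorted_tracev3_files(archive_root):
    """Return tracev3 paths, directory by directory, sorted by filename."""
    files = []
    for name in TRACE_DIRS:
        directory = Path(archive_root) / name
        if not directory.is_dir():
            continue  # skip directories that are absent from this archive
        files.extend(sorted(directory.glob("*.tracev3"), key=lambda p: p.name))
    return files
```

This only fixes the order in which files are visited; it says nothing about the order of entries inside a file or across directories.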

@cvandeplas
Contributor Author

I fear it's more complex than this. A quick comparison shows where each line of the time-sorted output sits in the unifiedlogs_iterator output:

| sorted | unifiedlogs_iterator |
| --- | --- |
| line 1 | line 84311 |
| line 2 | line 84236 |
| line 3 | line 84322 |
| line 4 | line 84312 |

At some point I thought that the LogIterator was perhaps not providing the chunks in the right order, as the data is nicely sorted within each chunk, but judging by the table above that doesn't seem to be the case.
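
For anyone who wants to reproduce this kind of comparison, a minimal sketch follows. It assumes the iterator output was saved as JSON lines with a sortable `time` field; the field name and the `sorted_to_original_lines` helper are hypothetical. It returns, for each position in time-sorted order, the 1-based line number of that record in the original file.

```python
import json


def sorted_to_original_lines(path, field="time"):
    """Return original 1-based line numbers, listed in time-sorted order."""
    keyed = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():  # ignore blank lines
                keyed.append((json.loads(line)[field], number))
    # Tuples sort by time first, then by line number, so records with equal
    # timestamps keep their original relative order.
    keyed.sort()
    return [number for _, number in keyed]
```

If the output were already sorted by time, the returned list would simply count up from 1. Only the sort keys and line numbers are held in memory, which keeps this usable as a diagnostic on fairly large files.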

@cvandeplas
Contributor Author

Note that the numbers above are from a logarchive that only has the logdata.LiveData.tracev3 file, not the HighVolume and others.

@puffyCid
Collaborator

Hmm, yes, you are correct: ordering by filename will not fully solve the time-sorting issue.
I will need to think about this a bit more.
