Speed up augur filter without replacing Pandas #1573

victorlin · 2024-08-09T00:20:25Z

Context

See parent issue for context on how Pandas is used in augur filter and why it is slow.

Some optimizations to the current code may be possible without the full rewrite that #1574 would require.

Progress

Not pursued

filter: Rewrite priority queue logic with pandas functions #809


victorlin · 2024-11-16T05:05:43Z

Another potential speedup here is to leverage the pyarrow+pandas integration, which should be more mature with pandas v2. Pandas is also pushing further in this direction and is slated to make pyarrow a required dependency in v3.

Unfortunately, it's not as simple as setting engine='pyarrow'. I tried briefly with 11743ac. If we want to go down this route, it might be best to convert the metadata TSV to parquet upfront, which would require rewriting some logic (I'm not sure how much). Previous discussion on using parquet for metadata: (1, 2)

victorlin added the enhancement New feature or request label Aug 9, 2024

victorlin mentioned this issue Aug 9, 2024

Speed up augur filter #1575

Open

victorlin mentioned this issue Jan 13, 2025

filter: Rewrite priority queue logic with pandas functions #809

Closed
