perf(filters): parallelize feed post-processing #61

Merged
merged 2 commits into from
Mar 2, 2024

shouya
Owner

@shouya shouya commented Mar 2, 2024

I realized that post-processing the fetched HTML page is a bottleneck for the full_text filter. After profiling, I found:

  • The post-processing step in dev mode can take 100ms-400ms.
  • Most of the computation is hidden inside the tendril and html5ever libraries, which cannot easily be optimized from this crate.
  • Even worse, the post-processing for each feed item was not parallelized, so the latencies add up.

So in this patch I moved the compute-intensive part into separate tokio tasks so that the items can be processed at the same time.
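To illustrate the idea, here is a minimal sketch of fanning out per-item post-processing so the items no longer run one after another. It is not the code from this patch: to stay dependency-free it uses scoped threads from the standard library instead of tokio tasks, and `post_process` and `post_process_all` are hypothetical names, with `post_process` standing in for the real HTML parsing work.

```rust
use std::thread;

// Hypothetical stand-in for the CPU-heavy post-processing of one feed item.
// Here it only collapses runs of whitespace; the real work is HTML parsing.
fn post_process(html: &str) -> String {
    html.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Run post-processing for every item on its own scoped thread.
// Results are collected in the same order as the input items.
fn post_process_all(pages: &[String]) -> Vec<String> {
    thread::scope(|s| {
        // Spawn one worker per item; all of them start before any is joined.
        let handles: Vec<_> = pages
            .iter()
            .map(|page| s.spawn(move || post_process(page)))
            .collect();

        // Join in input order so output[i] corresponds to pages[i].
        handles
            .into_iter()
            .map(|h| h.join().expect("post-processing thread panicked"))
            .collect()
    })
}
```

Calling `post_process_all(&pages)` on a `Vec<String>` of fetched pages returns the processed pages in the same order. With this shape the total latency is bounded roughly by the slowest item rather than the sum of all items, as long as enough cores are available. In an async tokio application the same fan-out would be expressed with spawned tasks and awaited join handles rather than threads.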

For reference, before the change, the sample hackernews full text endpoint [1] took around 9s to finish. With request caching, it still took around 4s. After the change, the uncached version takes 4s and the cached version takes less than 1s (on my 8-core machine).

[1]:

  - path: /hackernews.xml
    source: https://news.ycombinator.com/rss
    filters:
      - full_text:
          simplify: true
          append_mode: true

@shouya shouya merged commit 02f038b into master Mar 2, 2024
2 checks passed
@shouya shouya deleted the improve-full-text-filter-performance branch March 2, 2024 15:55