Commit

Merge pull request #4750 from szarnyasg/nits-20250210b
Small fixes in the H2O blog post
szarnyasg authored Feb 10, 2025
2 parents e9fbe55 + 22e6d3b commit 7c4703e
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions _posts/2024-06-26-benchmarks-over-time.md
@@ -445,16 +445,16 @@ In version 0.5.1, released September 2022, DuckDB's performance when writing to
As a result, versions 0.2.7 to 0.4.0 use Pandas, and 0.5.1 onward uses Arrow.

On the import side, replacement scans allow DuckDB to read those same formats without a prior import step.
-In the replacement scan benchmark, the data that is scanned is the output of the final H20.ai group by benchmark query.
-At the 5GB scale it is a 10 million row dataset.
+In the replacement scan benchmark, the data that is scanned is the output of the final H2O.ai group by benchmark query.
+At the 5GB scale it is a 100 million row dataset.
Only one column is read, and a single aggregate is calculated.
This focuses the benchmark on the speed of scanning the data rather than DuckDB's aggregation algorithms or speed of outputting results.
The query used follows the format:

```sql
SELECT
sum(v3) AS v3
-FROM ⟨dataframe or Parquet file⟩
+FROM ⟨dataframe or Parquet file⟩;
```
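
As a minimal sketch of the query format above, the following fills in the placeholder with a Parquet file. The file name `example_v3.parquet` and its five-row contents are assumptions made up for illustration, not the benchmark data; the point is only that the file is scanned by path, without a prior import step.

```sql
-- Write a tiny Parquet file with a single column v3 (hypothetical stand-in data).
COPY (SELECT range AS v3 FROM range(5)) TO 'example_v3.parquet' (FORMAT parquet);

-- Replacement scan: the quoted path is read directly as a table.
SELECT
    sum(v3) AS v3
FROM 'example_v3.parquet';
```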

#### Window Functions