Skip to content

An experimental Athena extension for DuckDB 🐀

License

Notifications You must be signed in to change notification settings

dacort/duckdb-athena-extension

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

54 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

DuckDB Athena Extension

WARNING This is a work in progress - things may or may not work as expected πŸ§™β€β™‚οΈ

Limitations

  • Only the default database is supported
  • Not all data types are implemented yet
  • 10,000 results are returned by default (use maxrows=-1 to return everything)
  • Pushdown predicates are not supported

Getting started

The Athena extension is supported in DuckDB v0.7.0 and up. To install the extension, start duckdb with the unsigned parameter.

> duckdb -unsigned
v0.7.1 b00b93f0b1
D 

The first time you use the extension, you need to install it from a custom repository.

SET custom_extension_repository='d2j9pg7mqm9we6.cloudfront.net/athena/latest';
INSTALL athena;

Then LOAD the extension. You only need to run the INSTALL command once.

LOAD athena;

You can now extract data from tables in your default data catalog.

select * from athena_scan("noaa_gsod_pds", "s3://results-bucket/prefix");

Warning To prevent runaway queries, the extension only returns 10,000 rows by default. If you'd like to return everything, you can add maxrows=-1 as a parameter inside the function.

select * from athena_scan("noaa_gsod_pds", "s3://results-bucket/prefix", maxrows=-1);

Filter pushdown is not yet supported so the extension will scan the entire table.

Note The extension uses your environment variables to figure out region and credentials. Make sure to have your access key/secret set.

Development

  • Clone the repo with submodules
git clone https://github.com/dacort/duckdb-athena-extension.git --recurse-submodules
  • Build
cd duckdb-athena-extension
make release
  • Start up duckdb with the -unsigned parameter and your desired AWS_REGION
AWS_REGION=us-east-1 build/debug/duckdb -unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D 
  • Load the extension
load 'build/debug/extension/duckdb-athena-extension/athena.duckdb_extension';
  • Query a single table, also providing where S3 results are written to
select * from athena_scan('table_name', 's3://<bucket>/athena-results/);

Warning: 10,000 results will be returned by default! Use maxrows=-1 to return the entire table.

D select * from athena_scan("amazon_reviews_parquet");
Running Athena query, execution id: 152a20c7-ff32-4a19-bb71-ae0135373ca6
State: Queued, sleep 5 secs ...
Total execution time: 1307 millis
100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚ marketplace β”‚ customer_id β”‚   review_id    β”‚ product_id β”‚ product_parent β”‚ … β”‚  vine   β”‚ verified_purchase β”‚   review_headline    β”‚     review_body      β”‚   review_date   β”‚ year  β”‚
β”‚   varchar   β”‚   varchar   β”‚    varchar     β”‚  varchar   β”‚    varchar     β”‚   β”‚ varchar β”‚      varchar      β”‚       varchar        β”‚       varchar        β”‚      int64      β”‚ int32 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ US          β”‚ 37441986    β”‚ R2H287L0BUP89U β”‚ B00CT780C2 β”‚ 473048287      β”‚ … β”‚ N       β”‚ Y                 β”‚ Perfect Gift         β”‚ I love giving my s…  β”‚ 140454171422720 β”‚     0 β”‚
β”‚ US          β”‚ 20676035    β”‚ R1222MJHP5QWXE β”‚ B004LLILFA β”‚ 361255549      β”‚ … β”‚ N       β”‚ Y                 β”‚ Five Stars           β”‚ Great gift for out…  β”‚           16170 β”‚  2014 β”‚
β”‚ US          β”‚ 45090731    β”‚ R32ECJRNTB61K8 β”‚ B004LLIL4G β”‚ 307223063      β”‚ … β”‚ N       β”‚ Y                 β”‚ happy birthday card  β”‚ gift cards from Am…  β”‚ 140454171423232 β”‚     0 β”‚
β”‚ US          β”‚ 2207141     β”‚ RLTEU3JZ1IJAA  β”‚ B004LLILDM β”‚ 87389551       β”‚ … β”‚ N       β”‚ Y                 β”‚ Five Stars           β”‚ gracias.             β”‚           16391 β”‚  2014 β”‚
β”‚ US          β”‚ 15258       β”‚ R1ZAX1TN66QOU6 β”‚ B004LLIKVU β”‚ 473048287      β”‚ … β”‚ N       β”‚ Y                 β”‚ easy breezy          β”‚ gift card was sent…  β”‚ 140454171424000 β”‚     0 β”‚
β”‚ Β·           β”‚    Β·        β”‚       Β·        β”‚     Β·      β”‚    Β·           β”‚ Β· β”‚ Β·       β”‚ Β·                 β”‚     Β·                β”‚    Β·                 β”‚             Β·   β”‚    Β·  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 999 rows (40 shown)                                                                                                                                          15 columns (11 shown) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Credits