Quick start guide for new users working with Hive-partitioned data in R or Python.
- generate_test_data.R: standalone script to make some sample data
- Dockerfile: instruction set for building the data generator container
- generate_test_data_container.R: container script to make some sample data
To create the data in a container using Podman or Docker (the Podman commands work unchanged with docker in place of podman):
- Build the container
podman build -t demo_data .
- Run the container
podman run demo_data
- Find the container hash (the CONTAINER ID column); --all also lists containers that have already exited
podman ps --all
- Fetch the sample data from the container with:
podman cp <container hash>:/home/r-environment/test_data.tar.gz test_data.tar.gz
- Decompress the archive and try the examples.
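You can also decompress and inspect the data from Python. This is a minimal standard-library sketch, not the repo's own code. It assumes that test_data.tar.gz unpacks into Hive-style `key=value` folders; the actual partition columns of the sample data are not listed here. The helper names `extract_archive` and `hive_partitions` are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> Path:
    """Unpack a .tar.gz archive into `dest` and return that directory.

    Only extract archives you trust.
    """
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Python releases support extraction filters; use the safer
        # "data" filter where available, otherwise fall back to the default.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")
        else:
            tar.extractall(out)
    return out


def hive_partitions(root: Path) -> list[dict[str, str]]:
    """Return the partition keys encoded in each data file's folder path.

    Hive-style layouts store partition columns as key=value folder names,
    e.g. root/year=2020/region=north/part-0.parquet -> {"year": "2020", "region": "north"}.
    """
    results = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Only the directory components carry partition keys, not the file name.
        folders = path.relative_to(root).parts[:-1]
        keys = dict(part.split("=", 1) for part in folders if "=" in part)
        if keys:
            results.append(keys)
    return results
```

For example, `hive_partitions(extract_archive("test_data.tar.gz", "test_data"))` returns one dictionary of partition keys per data file. This only inspects the layout. To read the data itself, a library with Hive partitioning support, such as Arrow, is the more practical choice.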
- R_hive_example.qmd: Quarto document with R code chunks
- R_hive_example.md: read this file on GitHub for the worked examples
- Python example: work in progress
If you're interested in some Arrow benchmarks, check out this repo: https://github.com/mikerspencer/arrow_test/, which was used for an EdinbR talk.