Add a PySpark example in the quickstart (#379)
JosepSampe committed Aug 5, 2024
1 parent 522c59c commit ebd0b00
Showing 1 changed file with 48 additions and 0 deletions.
48 changes: 48 additions & 0 deletions docs/Quickstart.md
@@ -106,6 +106,54 @@ Sampling also has an operator called `TABLESAMPLE`, which can be expressed in both
spark.sql("SELECT avg(price) FROM ecommerce_qbeast TABLESAMPLE(10 PERCENT)").show()
```

## Python

As with Scala and SQL, you can also use PySpark to operate on Qbeast tables.
Below is an end-to-end example that follows the same operations demonstrated earlier on this page.

```python
from pyspark.sql import SparkSession

# Initialize Spark session with Qbeast and Delta support, running locally
spark = SparkSession.builder \
    .appName("QbeastExample") \
    .master("local[*]") \
    .config("spark.jars.packages", "io.qbeast:qbeast-spark_2.12:0.6.0,io.delta:delta-spark_2.12:3.1.0") \
    .getOrCreate()

qbeast_table_path = "/tmp/qbeast-test-data/qtable"

# Reading the ecommerce data for October into a DataFrame
ecommerce_october = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("src/test/resources/ecommerce100K_2019_Oct.csv")

# Creating a Qbeast table using PySpark
ecommerce_october.write \
    .format("qbeast") \
    .mode("overwrite") \
    .option("columnsToIndex", "user_id,product_id") \
    .option("cubeSize", "500") \
    .save(qbeast_table_path)

# Reading the ecommerce data for November
ecommerce_november = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("src/test/resources/ecommerce300k_2019_Nov.csv")

# Inserting data into the Qbeast table using PySpark
ecommerce_november.write \
    .format("qbeast") \
    .mode("append") \
    .save(qbeast_table_path)

# Sampling 10% of the Qbeast table and computing the average price
sampled_data = spark.read.format("qbeast").load(qbeast_table_path).sample(0.10)
sampled_data.selectExpr("avg(price)").show()
```
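The earlier `TABLESAMPLE` query can also be run from PySpark through `spark.sql`. Below is a minimal sketch, assuming the Qbeast table is first registered as a temporary view; the view name `ecommerce_qbeast` is illustrative.

```python
# Register the Qbeast table as a temporary view so it can be queried with SQL
spark.read.format("qbeast").load(qbeast_table_path).createOrReplaceTempView("ecommerce_qbeast")

# Sample roughly 10% of the table with the TABLESAMPLE operator, mirroring the SQL example above
spark.sql("SELECT avg(price) FROM ecommerce_qbeast TABLESAMPLE(10 PERCENT)").show()
```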

## Analyze and Optimize
