PySpark Playground

Hi there! Welcome to PySpark Playground! This is a collection of PySpark examples I put together while learning Apache Spark. I wanted to document my journey, share my progress, and hopefully help others get started with Spark too.

🌟 Why I Made This

When I started exploring Apache Spark, I realized how powerful it is for working with big data. But, like anything new, it felt a bit overwhelming at first. So, I decided to create simple, hands-on examples that make it easier to understand the basics. This repository is my way of sharing what I’ve learned so far.

📘 What You'll Find Here

This repository includes a short tutorial notebook, pyspark_tutorial_with_examples.ipynb. It’s packed with practical examples built on an ice cream sales dataset (because who doesn’t love ice cream?).

🛠️ What’s Inside:

Getting Started: Learn how to set up Apache Spark and PySpark in Google Colab.
DataFrame Basics: Examples of selecting columns, filtering rows, adding calculated columns, and grouping data.
Popular Functions: Hands-on with PySpark’s most-used transformations, like “withColumn”, “groupBy”, and aggregations (e.g., average, sum).
Unique IDs: See how to generate unique IDs for rows using “monotonically_increasing_id”.
SQL Magic: Combine SQL with PySpark to run custom transformations.
Real-Life Data: Follow along with an ice cream sales dataset for practical use cases.
Extra Tricks: Work with dates, timestamps, and other cool features.

🚀 How to Use This

Clone the repository:

git clone https://github.com/AlefRP/pyspark-playground.git

Open the notebook pyspark_tutorial_with_examples.ipynb in Jupyter Notebook or Google Colab.
Follow the examples, tweak the code, and see what happens. Learning by doing is the best way!

📚 My Go-To Resources

While I tried to keep this tutorial as clear as possible, Spark’s official documentation has been a lifesaver for me. If you want to dive deeper, I highly recommend checking it out:

Apache Spark Documentation

🤔 Why You Should Check This Out

It’s beginner-friendly (I’m a beginner too!).
It focuses on hands-on learning with fun, real-world examples.
It shows how to use PySpark in environments like Google Colab.

I hope this helps you get started with PySpark and makes your learning journey a bit easier. Let’s explore big data together!

🤝 Contributions Welcome!

If you have ideas, improvements, or your own examples, I’d love to see them! Feel free to fork this repository and contribute.

📄 License

This project is open-source and available under the MIT License.
