VillePuuska/README.md

Ville Puuska

Experience

  • 2023- Data Engineer, Solita
  • 2017-2023 PhD student/researcher, Tampere University

Data engineering

Interests

  • Portable/headless data platforms
  • Event-driven architectures and data pipelines
  • Streaming data pipelines

Tech at work

  • Python, PySpark, Spark SQL, R when absolutely necessary
  • Azure Data Factory, Databricks

Tech at home

  • Python (and a bit of Go, Rust, and Scala)
  • Airflow, Docker, DuckDB, FastAPI, Kafka, Polars

Mathematics

Research and Publications

My research is focused on the algebraic theory of topological data analysis. I'm interested in utilizing (minimal) resolutions to develop computable and interpretable representations and invariants for multiparameter persistent (co)homology and persistence modules more generally.

Education

  • 2017-2023, PhD, Mathematics, Tampere University
    Advisor: Professor Eero Hyry, Tampere University
    Field: Topological Data Analysis
    Thesis: Flat Covers and Cotorsion in Persistence https://urn.fi/URN:ISBN:978-952-03-3058-3
  • 2013-2017, MSc (and BSc), Mathematics, University of Tampere

Pinned

  1. Local-Lakehouse Public

    PoC Python package for using Unity Catalog OSS to manage local structured data and accessing it via a Polars DataFrame API and DuckDB SQL API.

    Python

  2. Journeys-pipeline-dlt-DuckDB-Polars Public

    Simple example of an ELT pipeline using dlt for ingesting from the JourneysAPI, DuckDB for intermediate storage, and DuckDB & Polars for transformations.

    Python 3

  3. Streaming-and-processing-CPU-and-RAM-usage Public

    Python

  4. Spotify-cli Public

    Simple CLI tool for managing Spotify playback and generating recommendations with Spotify's recommendations API.

    Rust

  5. DuckDB-examples Public

    Basic tutorial and example scenario for using DuckDB.

    Jupyter Notebook

  6. AoC Public

    Advent of Code solutions

    Jupyter Notebook