Docs: Add Content Library Page to the docs #13335

Merged
merged 7 commits into from
Nov 13, 2024

@alamb alamb commented Nov 10, 2024

Looking for help

Does anyone know how to automatically format the content links in the same / similar manner as the Notion page?
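One possible starting point, as a minimal sketch rather than how the Notion rendering actually works: a small Python pass that turns bullets of the form `- **date** URL` into Markdown links. The names `BARE_LINK`, `label_for`, and `format_line` are hypothetical, and the label is only derived from the URL itself (host plus last path segment); showing real page titles like Notion does would presumably require fetching each page, which this sketch does not attempt.

```python
import re
from urllib.parse import urlparse

# Matches a bullet such as "- **2024-08-14** https://example.com/post/",
# including nested bullets and entries with no space between date and URL.
BARE_LINK = re.compile(
    r"^(?P<indent>\s*)- \*\*(?P<date>[^*]+)\*\*\s*(?P<url>https?://\S+)\s*$"
)


def label_for(url: str) -> str:
    """Hypothetical label: host (without "www.") plus the last path segment."""
    parts = urlparse(url)
    host = parts.netloc.removeprefix("www.")  # requires Python 3.9+
    segments = [s for s in parts.path.split("/") if s]
    return f"{host}: {segments[-1]}" if segments else host


def format_line(line: str) -> str:
    """Rewrite a bare-URL bullet as a Markdown link; leave other lines alone."""
    m = BARE_LINK.match(line)
    if m is None:
        return line  # already a Markdown link, a heading, or plain text
    date = m["date"].strip()
    return f"{m['indent']}- **{date}** [{label_for(m['url'])}]({m['url']})"


def format_document(text: str) -> str:
    """Apply format_line to every line of a Markdown document."""
    return "\n".join(format_line(line) for line in text.splitlines())
```

Running `format_document` over the exported Markdown would leave entries that already use `[text](url)` untouched and only rewrite the bare-URL bullets.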

Which issue does this PR close?

Related to

Rationale for this change

@SamSynnada created a wonderful list of DataFusion-related content here, and I think posting it to the DataFusion website would be great.

What changes are included in this PR?

Add a new page with the content in the 👉 DF Content Library

Screenshot 2024-11-10 at 3 44 54 AM

Are these changes tested?

N/A

Are there any user-facing changes?

@alamb alamb added the documentation Improvements or additions to documentation label Nov 10, 2024

- **2020-02-27**: How Query Engines Work [Online Book](https://andygrove.io/2020/02/how-query-engines-work/)

## ✨ Good Reads
Contributor Author

Rendering of the notion page is much nicer:

Screenshot 2024-11-10 at 3 51 14 AM

I started trying to replicate the formatting with ChatGPT but it still needs cleaning up.

Here is the raw markdown from Notion (when I exported the Notion site as markdown):

📚 DF Content Library

🧭 Foundational Contents

✨ Good Reads

📅 Release Notes & Updates

🌎 Community Events

Source

# 📚 DF Content Library

# 🧭 Foundational Contents

- **2024-06-13** 2024 ACM SIGMOD International Conference on Management of Data Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (talk) [slides](https://docs.google.com/presentation/d/1gqcxSNLGVwaqN0_yJtCbNm19-w5pqPuktII5_EDA6_k/edit#slide=id.p), [recording](https://youtu.be/-DpKcPfnNms), [paper](https://dl.acm.org/doi/10.1145/3626246.3653368)
- **2024-06-07**  https://www.youtube.com/watch?v=-DpKcPfnNms&t=5s
- **2023-04-05** The Apache Arrow DataFusion Architecture Part 3: Physical Plan and Execution. [slides](https://docs.google.com/presentation/d/1cA2WQJ2qg6tx6y4Wf8FH2WVSm9JQ5UgmBWATHdik0hg), [recording](https://youtu.be/2jkWU3_w6z0)
- **2023-04-04** The Apache Arrow DataFusion Architecture Part 2: Logical Plans and Expressions. [slides](https://docs.google.com/presentation/d/1ypylM3-w60kVDW7Q6S99AHzvlBgciTdjsAfqNP85K30), [recording](https://youtu.be/EzZTLiSJnhY)
- **2023-03-31** The Apache Arrow DataFusion Architecture Part 1: Query Engines. [slides](https://docs.google.com/presentation/d/1D3GDVas-8y0sA4c8EOgdCvEjVND4s2E7I6zfs67Y4j8), [recording](https://youtu.be/NVKujPxwSBA)
- **2020-02-27** https://andygrove.io/2020/02/how-query-engines-work/

# ✨ Good Reads

- **2024-10-16** https://www.letsql.com/posts/candle-image-segmentation/
- **2024-09-23 → 2024-12-02** [Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024](https://db.cs.cmu.edu/seminar2024/)
    - **2024-10-28** https://www.youtube.com/watch?v=fltZMO8EGl0&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=6
    - **2024-10-21** https://www.youtube.com/watch?v=tyM-ec1lKfU&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=5
    - **2024-10-07**  https://www.youtube.com/watch?v=Vxb8TELNM98&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=4
    - **2024-09-23** https://www.youtube.com/watch?v=iJhRbDFJjbg&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=2
    - **2024-09-30** https://www.youtube.com/watch?v=o59s0d3HE1k&list=PLSE8ODhjZXjZc2AdXq_Lc1JS62R48UX2L&index=3
- **2024-09-17** https://www.youtube.com/watch?v=2z11xtYw_xs
- **2024-08-25** [Pydantic/logfire: We're changing database](https://github.com/pydantic/logfire/issues/408)
- **2024-08-15** https://www.youtube.com/watch?v=RVLshX6fbds
- **2024-08-14** https://uwheel.rs/post/datafusion_uwheel/
- **2024-06-17**   https://blog.lancedb.com/columnar-file-readers-in-depth-apis-and-fusion/
- **2024-06-14** [2024 Simplicity in Management of Data (SiMOD)](https://sfu-dis.github.io/simod/) DataFusion: The Case for Building Open Data Systems (keynote) [slides](https://docs.google.com/presentation/d/1K3EdknzkqU2LhWi_eNKXdcvNk0OEvk9AqTLqhZkPxuI/edit)
- **2024-05-29** https://cube.dev/blog/query-push-down-in-cubes-semantic-layer
- **2024-03-26 → 2024-06-26** Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion
    - **2024-06-26** [Microsoft Gray Systems Lab:](https://www.microsoft.com/en-us/research/group/gray-systems-lab) Building InfluxDB 3.0 (and other systems) without starting from "scratch" with Apache DataFusion [slides](https://docs.google.com/presentation/d/1a4wHZij_69drdmD32TPombQ9zSaE6l26LZ87DAz2New/edit#slide=id.p)
    - **2024-03-26** [DataCouncil 2024:](https://www.datacouncil.ai/talks24/building-influxdb-30-with-apache-arrow-datafusion-flight-and-parquet?hsLang=en) Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight and Parquet. [slides](https://docs.google.com/presentation/d/12kdYHLyH79B5__9xs3de_hZyG9geW4jC3vUpiy39VA0), [recording](https://www.youtube.com/watch?v=I-Z7kFGsYRI)
- **2024-03-20**  https://www.youtube.com/watch?v=P3dXH61Kr5U
- **2024-03-18** https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/
- **2023-10-25**  https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/
- **2023-09-26**  https://www.kamu.dev/blog/2023-09-datafusion-flightsql/
- **2023-08-15** https://www.synnada.ai/blog/running-window-query-in-stream-processing
- **2023-08-05** InfluxData: Aggregating Millions of Groups Fast in Apache Arrow DataFusion. [InfluxData](https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/), [DataFusion](https://arrow.apache.org/blog/2023/08/05/datafusion_fast_grouping/).
- **2023-07-28** https://www.synnada.ai/blog/sliding-window-hash-join-swhj
- **2023-07-13** https://www.synnada.ai/blog/probabilistic-data-structures-in-streaming-count-min-sketch
- **2023-05-25**  https://www.youtube.com/watch?v=NEL6DluUxgw
- **2023-02-20** https://www.synnada.ai/blog/general-purpose-stream-joins-via-pruning-symmetric-hash-joins
- **2023-02-15 → 2023-09-27** Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust
    - **2023-09-27** MIT Database Group: Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. [slides](https://docs.google.com/presentation/d/1_JXxapY2jksCOm5hePK8FIjO3buDzsrBBy0jUEpJR4A)
    - **2023-06-02** [[Dutch Seminar on Database System Design]](https://dsdsd.da.cwi.nl/past_talks/post_talks/Andrew-Lamb/): Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust. [slides](https://docs.google.com/presentation/d/1XTsO2zsHkgBCF6C0YVwk0BnhZzLBrm39oeapOBb-s9A), [recording](https://youtu.be/Y5K2Ik2oo-8)
    - **2023-02-15** [Invited Talk at Optum Labs]: Building a new time series database "from scratch" Using Apache Arrow, Parquet, DataFusion and Rust [slides](https://docs.google.com/presentation/d/1SzqgTtSKVqpuFUDdOHhRNC3mLmJ7oyVp0OyrYwHvgPA)
- **2023-01-01** https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
- **2022-12-07**  https://www.influxdata.com/blog/querying-parquet-millisecond-latency/
- **2022-06-27** [DataBricks Data+AI Summit]: DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine. [slides](https://docs.google.com/presentation/d/1wLORMn23RD_sQ84W2w51s-Xysly5S8F5mGXzaeJ4QWY), [recording](https://www.databricks.com/dataaisummit/session/datafusion-and-arrow-supercharge-your-data-analytical-tool-rusty-query-engine)
- **2022-05-23** [The Data Thread 2022]: Apache Arrow and DataFusion: Changing the Game for Implementing Database Systems. [slides](https://docs.google.com/presentation/d/1Tkjfup5z_nsrBWIO7dXscEzC5toTQCXj0IsZeO3endc), [recording](https://www.youtube.com/watch?v=rb61lVH2vYc)
- **2021-03-10** [InfluxData Tech Talk]: Query Engine Design and the Rust-Based DataFusion in Apache Arrow. slides ([Google Slides](https://docs.google.com/presentation/d/1z_bmjqQk_WKsyQMfmIYssjJNYwLEkjGcoCAsv8D7XO0), [Slideshare](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)), [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU)

# 📅 Release Notes & Updates

- **2024-07-24** https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
- **2024-01-19** https://datafusion.apache.org/blog/2024/01/19/datafusion-34.0.0/
- **2023-06-24** https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/
- **2023-01-19** https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/
- **2023-01-01** https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
- **2022-10-25** https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/
- **2022-05-16** https://arrow.apache.org/blog/2022/05/16/datafusion-8.0.0/
- **2022-02-28** https://arrow.apache.org/blog/2022/02/28/datafusion-7.0.0/
- **2021-11-19** https://arrow.apache.org/blog/2021/11/19/datafusion-6.0.0/
- **2021-08-18** https://arrow.apache.org/blog/2021/08/18/datafusion-5.0.0/
- **2019-09-22** https://andygrove.io/2019/09/datafusion-0.15.0-release-notes/

# 🌎 Community Events

- **2025-01-15** (Upcoming) [Boston Apache DataFusion Meetup](https://github.com/apache/datafusion/discussions/13165)
- **2024-12-18** (Upcoming) [Chicago Apache DataFusion Meetup](https://lu.ma/eq5myc5i)
- **2024-09-27** [Belgrade Apache DataFusion Meetup](https://lu.ma/tmwuz4lg), [recap](https://github.com/apache/datafusion/discussions/11431#discussioncomment-10832070), [slides](https://github.com/apache/datafusion/discussions/11431#discussioncomment-10826169), [recordings](https://www.youtube.com/watch?v=4huEsFFv6bQ&list=PLrhIfEjaw9ilQEczOQlHyMznabtVRptyX)
- **2024-06-26** [New York City Apache DataFusion Meetup](https://lu.ma/2iwba0xm). [slides](https://docs.google.com/presentation/d/1dOLPAFPEMLhLv4NN6O9QSDIyyeiIySqAjky5cVgdWAE/edit#slide=id.g26bebde4fcc_3_7)
- **2024-06-25** [San Francisco Bay Area Apache DataFusion Meetup](https://lu.ma/6bphole2). [slides](https://docs.google.com/presentation/d/1Oz2yGllrWBkNGyiRMLr8qXTt4vmvtJWuI_weGThaZak/edit#slide=id.g26bebde4fcc_3_7)
- **2024-03-27** [Austin Apache DataFusion Meetup](https://github.com/apache/datafusion/discussions/8522). [slides](https://docs.google.com/presentation/d/1S51TK8waxHEJaxi_-uiSMrgQZ09m_hfaasPk5X5ExEY), [recording](https://www.youtube.com/watch?v=q1N3pH3tFw8)

@comphead comphead left a comment

oh yeah, The only proposition is to rename Content Library to Database Concepts Library or similar

alamb commented Nov 13, 2024

Sounds good -- thanks @comphead -- I will try to get this first draft merged in the next day or two

alamb commented Nov 13, 2024

> oh yeah, The only proposition is to rename Content Library to Database Concepts Library or similar

After thinking about this, I changed the name to "Concepts, Readings, Events", which, while a bit verbose, I think gets the point across better.

I personally think the docs are looking pretty good now:

Screenshot 2024-11-13 at 10 03 39 AM

alamb commented Nov 13, 2024

Let's merge this one in, and we can continue iterating on the content in follow-on PRs

alamb commented Nov 13, 2024

Thanks again @comphead and @SamSynnada

@alamb alamb merged commit 4b5e374 into apache:main Nov 13, 2024
5 checks passed
@alamb alamb deleted the alamb/datafusion-docs branch November 13, 2024 17:27
- **2024-06-26** [New York City Apache DataFusion Meetup](https://lu.ma/2iwba0xm). [slides](https://docs.google.com/presentation/d/1dOLPAFPEMLhLv4NN6O9QSDIyyeiIySqAjky5cVgdWAE/edit#slide=id.g26bebde4fcc_3_7)
- **2024-06-25** [San Francisco Bay Area Apache DataFusion Meetup](https://lu.ma/6bphole2). [slides](https://docs.google.com/presentation/d/1Oz2yGllrWBkNGyiRMLr8qXTt4vmvtJWuI_weGThaZak/edit#slide=id.g26bebde4fcc_3_7)
- **2024-03-27** [Austin Apache DataFusion Meetup](https://github.com/apache/datafusion/discussions/8522). [slides](https://docs.google.com/presentation/d/1S51TK8waxHEJaxi_-uiSMrgQZ09m_hfaasPk5X5ExEY), [recording](https://www.youtube.com/watch?v=q1N3pH3tFw8)
- **2024-03-26** [Seattle Apache DataFusion Meetup](

@jonahgao jonahgao Nov 14, 2024

This event seems incomplete and lacks links.

Contributor Author

Thank you. Nice catch. -- PR to fix: #13445
