Improve comments on target user and unify intro summaries #12418

Merged 1 commit on Sep 12, 2024
25 changes: 22 additions & 3 deletions README.md
@@ -40,9 +40,28 @@

<img src="./docs/source/_static/images/2x_bgwhite_original.png" width="512" alt="logo"/>

Apache DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
[Rust](http://rustlang.org), using the [Apache Arrow](https://arrow.apache.org)
in-memory format. [Python Bindings](https://github.com/apache/datafusion-python) are also available. DataFusion offers SQL and Dataframe APIs, excellent [performance](https://benchmark.clickhouse.com/), built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
DataFusion is an extensible query engine written in [Rust] that
Comment by the PR author (Contributor):

This is now the same as in lib.rs

Comment by the PR author (Contributor):

While having the same content in three places is redundant, I think it is worthwhile, as these are the three most common "landing" pages for people coming to DataFusion:

  1. The main website: https://datafusion.apache.org/
  2. The docs.rs page: https://docs.rs/datafusion/latest/datafusion/index.html
  3. The GitHub repo: https://github.com/apache/datafusion/

uses [Apache Arrow] as its in-memory format. DataFusion's target users are
developers building fast and feature rich database and analytic systems,
customized to particular workloads. See [use cases] for examples.

"Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
extensive customization, and a great community.
[Python Bindings] are also available.

DataFusion features a full query planner, a columnar, streaming, multi-threaded,
vectorized execution engine, and partitioned data sources. You can
customize DataFusion at almost all points including additional data sources,
query languages, functions, custom operators and more.
See the [Architecture] section for more details.

[rust]: http://rustlang.org
[apache arrow]: https://arrow.apache.org
[use cases]: https://datafusion.apache.org/user-guide/introduction.html#use-cases
[python bindings]: https://github.com/apache/datafusion-python
[performance]: https://benchmark.clickhouse.com/
[architecture]: https://datafusion.apache.org/contributor-guide/architecture.html

Here are links to some important information

24 changes: 14 additions & 10 deletions datafusion/core/src/lib.rs
@@ -17,24 +17,28 @@
#![warn(missing_docs, clippy::needless_borrow)]

//! [DataFusion] is an extensible query engine written in Rust that
Comment by the PR author (Contributor):

I clarified this text and made it consistent with the other intros

//! uses [Apache Arrow] as its in-memory format. DataFusion help developers
//! build fast and feature rich database and analytic systems, customized to
//! particular workloads. See [use cases] for examples
//! uses [Apache Arrow] as its in-memory format. DataFusion's target users are
//! developers building fast and feature rich database and analytic systems,
//! customized to particular workloads. See [use cases] for examples.
//!
//! "Out of the box," DataFusion quickly runs complex [SQL] and
//! [`DataFrame`] queries using a full-featured query planner, a columnar,
//! streaming, multi-threaded, vectorized execution engine, and partitioned data
//! sources (Parquet, CSV, JSON, and Avro).
//! "Out of the box," DataFusion offers [SQL] and [`Dataframe`] APIs,
//! excellent [performance], built-in support for CSV, Parquet, JSON, and Avro,
//! extensive customization, and a great community.
//! [Python Bindings] are also available.
//!
//! DataFusion is designed for easy customization such as
//! additional data sources, query languages, functions, custom
//! operators and more. See the [Architecture] section for more details.
//! DataFusion features a full query planner, a columnar, streaming, multi-threaded,
//! vectorized execution engine, and partitioned data sources. You can
//! customize DataFusion at almost all points including additional data sources,
//! query languages, functions, custom operators and more.
//! See the [Architecture] section below for more details.
//!
//! [DataFusion]: https://datafusion.apache.org/
//! [Apache Arrow]: https://arrow.apache.org
//! [use cases]: https://datafusion.apache.org/user-guide/introduction.html#use-cases
//! [SQL]: https://datafusion.apache.org/user-guide/sql/index.html
//! [`DataFrame`]: dataframe::DataFrame
//! [performance]: https://benchmark.clickhouse.com/
//! [Python Bindings]: https://github.com/apache/datafusion-python
//! [Architecture]: #architecture
//!
//! # Examples
25 changes: 17 additions & 8 deletions docs/source/index.rst
@@ -32,14 +32,23 @@ Apache DataFusion
<a class="github-button" href="https://github.com/apache/datafusion/fork" data-size="large" data-show-count="true" aria-label="Fork apache/datafusion on GitHub">Fork</a>
</p>

DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
`Rust <http://rustlang.org>`_, using the `Apache Arrow <https://arrow.apache.org>`_
in-memory format.

DataFusion offers SQL and Dataframe APIs, excellent
`performance <https://benchmark.clickhouse.com>`_, built-in support for
CSV, Parquet, JSON, and Avro, extensive customization, and a great
community.

DataFusion is an extensible query engine written in `Rust <http://rustlang.org>`_ that
Comment by the PR author (Contributor):

I think it could be argued that we should move this content out of the GitHub README.md and leave a link to the main website https://datafusion.apache.org/ 🤔

uses `Apache Arrow <https://arrow.apache.org>`_ as its in-memory format. DataFusion's target users are
developers building fast and feature rich database and analytic systems,
customized to particular workloads. See `use cases <https://datafusion.apache.org/user-guide/introduction.html#use-cases>`_ for examples.

"Out of the box," DataFusion offers `SQL <https://datafusion.apache.org/user-guide/sql/index.html>`_
and `Dataframe <https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html>`_ APIs,
excellent `performance <https://benchmark.clickhouse.com/>`_, built-in support for CSV, Parquet, JSON, and Avro,
extensive customization, and a great community.
`Python Bindings <https://github.com/apache/datafusion-python>`_ are also available.

DataFusion features a full query planner, a columnar, streaming, multi-threaded,
vectorized execution engine, and partitioned data sources. You can
customize DataFusion at almost all points including additional data sources,
query languages, functions, custom operators and more.
See the `Architecture <https://datafusion.apache.org/contributor-guide/architecture.html>`_ section for more details.

To get started, see
