Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation guidelines for contribution content #13703

Merged
merged 5 commits into from
Dec 16, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 9, 2024

Which issue does this PR close?

Rationale for this change

Given recent discussions on improving DataFusion stability (see #13648) I think it is time to document the criteria to add new features to DataFusion to help ensure everyone's expectations are aligned

What changes are included in this PR?

  1. Add summary of conclusion of [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) #12357 to the docs

Are these changes tested?

CI

Are there any user-facing changes?

@alamb
Copy link
Contributor Author

alamb commented Dec 9, 2024

Rather than state some hard rule, I went with the approach of guidance on what would need more / less discussion prior to acceptance. I felt this would give us flexibility but still given potential contributors guidance

2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
5. "Small" functional improvements to existing features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. "Small" functional improvements to existing features
5. "Small" functional improvements to existing features

Non breaking improvements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good qualification -- non breaking changes are certainly more likely to get approved in my experience. I tried to clarify

5. "Small" functional improvements to existing features
6. Additional APIs for extending DataFusion's capabilities

Contributions that likely require discussion prior to acceptance include:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should note the preferred way to discuss, GH Discussion/ASF slack, etc

Contributions that likely require discussion prior to acceptance include:

1. New functionality that is part of the "standard sql"
2. New functions that aren't part of the "standard sql"
Copy link
Contributor

@comphead comphead Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following, 🤦 looks like anything related to "standard sql" requires discussion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording like this so nothing was "required" but to give a hint that major new features are likely to involve more discussion

Contributions that will likely involve more discussion (see Discussing New 
Features above) prior to acceptance include:

1. Major new functionality (even if it is part of the "standard sql")
2. New functions, especially if they aren't part of "standard sql"
3. New data sources (e.g. support for Apache ORC)

1. Bug fixes for existing features
2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 love benchmarks

docs/source/contributor-guide/index.md Outdated Show resolved Hide resolved
@findepi
Copy link
Member

findepi commented Dec 10, 2024

BTW We should strive to make datafusion easier to extend. This is not only about particular APIs but also about internal design. We may be principled or not about what kind of SQL variant we provide out of the box, but we should assume extending systems provide something else (eg Ballista speaks Spark SQL, which is different than Datafusion SQL). This is obviously a teaser to longer discussion (#12723).

I approved this PR based on my understand that the wording proposed here doesn't preclude -- and actually encourages -- ground work features serving the extensibility story (the issue linked above, #12604, etc.)

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2024

BTW We should strive to make datafusion easier to extend.

I 100% agree

I approved this PR based on my understand that the wording proposed here doesn't preclude -- and actually encourages -- ground work features serving the extensibility story (the issue linked above, #12604, etc.)

Yes, this was my intention and I think it fits directly with the stated design goals (aka customizable everything):

  1. Work “out of the box”: Provide a very fast, world class query engine with minimal setup or required configuration.
  2. Customizable everything: All behavior should be customizable by implementing traits.
  3. Architecturally boring 🥱: Follow industrial best practice rather than trying cutting edge, but unproven, techniques.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @alamb

@comphead
Copy link
Contributor

Should we let it hang for day-two to get other reviews?

@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2024

Should we let it hang for day-two to get other reviews?

Yes I think we should leave this open for several more days to make sure anyone who wants a chance to comment can do so. Obviously we can always update the wordiing with subsequent PRs too, but I don't think there is any reason to rush this one in

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

docs/source/contributor-guide/index.md Outdated Show resolved Hide resolved
Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense!

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
@Omega359
Copy link
Contributor

LGTM

@alamb
Copy link
Contributor Author

alamb commented Dec 15, 2024

Thanks for all the comments so far. I plan to merge this tomorrow, Monday Dec 16 unless anyone else would like time to comment

@alamb
Copy link
Contributor Author

alamb commented Dec 16, 2024

🚀 📖

@alamb alamb merged commit 668984e into apache:main Dec 16, 2024
5 checks passed
@alamb alamb deleted the alamb/update_criteria branch December 16, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
7 participants