
[Improvement][Doc] Provide an implementation status page to indicate libraries status of format implementation support #373

Merged
merged 9 commits into apache:main on Feb 27, 2024

Conversation

acezen
Contributor

@acezen acezen commented Feb 23, 2024

…libraries status of format implementation support

Proposed changes

This change adds tables indicating the status of format implementation support in the libraries.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

close #352
related #362

@SemyonSinchenko
Member

Just a small suggestion related to Spark/PySpark: because we are going to provide support for multiple Spark/PySpark versions, I would suggest a table like, for example:

GraphAr Spark version | Apache Spark versions | Scala version | Java version | Supported GAR format versions
1.0                   | 3.2.x-3.3.x           | 2.12.10       | 1.8          | 0.1-1.0

It is important to know not just the supported GAR format versions but also the Spark versions on which GraphAr Spark will work.
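For illustration, a minimal sketch of how such a table could be written on the status page, assuming the page stays in reStructuredText (the PR touches docs/format/status.rst); the values are copied from the example row above, and the table title is made up:

```rst
.. list-table:: GraphAr Spark version compatibility
   :header-rows: 1

   * - GraphAr Spark version
     - Apache Spark versions
     - Scala version
     - Java version
     - Supported GAR format versions
   * - 1.0
     - 3.2.x-3.3.x
     - 2.12.10
     - 1.8
     - 0.1-1.0
```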

@acezen
Contributor Author

acezen commented Feb 23, 2024

> Just a small suggestion related to Spark/PySpark: because we are going to provide support for multiple Spark/PySpark versions, I would suggest a table like, for example:
>
> GraphAr Spark version | Apache Spark versions | Scala version | Java version | Supported GAR format versions
> 1.0                   | 3.2.x-3.3.x           | 2.12.10       | 1.8          | 0.1-1.0
>
> It is important to know not just the supported GAR format versions but also the Spark versions on which GraphAr Spark will work.

Thanks for the suggestion, Sem. Yes, it's better to provide a table of supported Spark versions too. I will add such a page in this pull request as well.

@acezen
Contributor Author

acezen commented Feb 26, 2024

> Just a small suggestion related to Spark/PySpark: because we are going to provide support for multiple Spark/PySpark versions, I would suggest a table like, for example:
>
> GraphAr Spark version | Apache Spark versions | Scala version | Java version | Supported GAR format versions
> 1.0                   | 3.2.x-3.3.x           | 2.12.10       | 1.8          | 0.1-1.0
>
> It is important to know not just the supported GAR format versions but also the Spark versions on which GraphAr Spark will work.
>
> Thanks for the suggestion, Sem. Yes, it's better to provide a table of supported Spark versions too. I will add such a page in this pull request as well.

I have added tables for the libraries to provide version compatibility. @SemyonSinchenko @Thespica, can you help complete the Java and PySpark parts?

@SemyonSinchenko
Member

@acezen what do you think about publishing this table in releases too?

@acezen
Contributor Author

acezen commented Feb 27, 2024

> @acezen what do you think about publishing this table in releases too?

You mean the release notes? I think it's a good place to put the version compatibility table.

…libraries status of format implementation support

Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
Signed-off-by: acezen <qiaozi.zwb@alibaba-inc.com>
@acezen acezen marked this pull request as ready for review February 27, 2024 03:14
@acezen acezen changed the title [WIP][Improvement][Doc] Provide an implementation status page to indicate … [Improvement][Doc] Provide an implementation status page to indicate libraries status of format implementation support Feb 27, 2024
@lixueclaire
Contributor

Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?

@acezen
Contributor Author

acezen commented Feb 27, 2024

I will post the version compatibility table to the release notes after the PR has been merged.

@acezen
Contributor Author

acezen commented Feb 27, 2024

> Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?

The reader/writer implementations are quite different between C++ and Scala/Python. The C++ library has low-level to high-level readers/writers and is mainly implemented with Arrow, while the Scala library targets the Spark DataFrame API. I'm not sure we can present their status the way we do for the format implementation.

@acezen
Contributor Author

acezen commented Feb 27, 2024

> Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?

BTW, I think the format implementation is mainly about the meta-info implementation. The reader and writer status can go on another documentation page. We should create a discussion topic to discuss this.

@lixueclaire
Contributor

> Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?
>
> The reader/writer implementations are quite different between C++ and Scala/Python. The C++ library has low-level to high-level readers/writers and is mainly implemented with Arrow, while the Scala library targets the Spark DataFrame API. I'm not sure we can present their status the way we do for the format implementation.

Maybe a top-level overview of reader/writer support across C++ and Scala/Python, emphasizing the integration with Arrow for C++ and Spark DataFrame for Scala, would be beneficial. We can include a status summary and provide a link to the API documentation for those seeking more in-depth information. Shall we proceed with this approach?

@acezen
Contributor Author

acezen commented Feb 27, 2024

> Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?
>
> The reader/writer implementations are quite different between C++ and Scala/Python. The C++ library has low-level to high-level readers/writers and is mainly implemented with Arrow, while the Scala library targets the Spark DataFrame API. I'm not sure we can present their status the way we do for the format implementation.
>
> Maybe a top-level overview of reader/writer support across C++ and Scala/Python, emphasizing the integration with Arrow for C++ and Spark DataFrame for Scala, would be beneficial. We can include a status summary and provide a link to the API documentation for those seeking more in-depth information. Shall we proceed with this approach?

Sounds reasonable. I can draft a reader/writer status overview with this approach.
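For illustration only, a rough reStructuredText sketch of what such an overview might look like, based solely on the points raised in this thread (low-level to high-level readers/writers on Arrow for C++, Spark DataFrame for Scala/PySpark); the layout, wording, and the PySpark row are assumptions rather than final content:

```rst
.. list-table:: Reader/writer support overview (illustrative draft)
   :header-rows: 1

   * - Library
     - Reader/writer support
     - Underlying integration
     - More details
   * - C++
     - Low-level to high-level readers/writers
     - Apache Arrow
     - See the C++ API documentation
   * - Spark (Scala)
     - DataFrame-based reader/writer
     - Spark DataFrame
     - See the Spark API documentation
   * - PySpark
     - DataFrame-based reader/writer
     - Spark DataFrame
     - See the PySpark API documentation
```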

Review comment on docs/format/status.rst (outdated, resolved)
@acezen
Contributor Author

acezen commented Feb 27, 2024

> Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?
>
> BTW, I think the format implementation is mainly about the meta-info implementation. The reader and writer status can go on another documentation page. We should create a discussion topic to discuss this.

Hi, @lixueclaire, after adding a draft of the status, I still think it's a little weird to put the reader/writer implementation and the format implementation together. So I suggest we add another document to describe the reader/writer implementation, and keep this change limited to the format implementation.

@lixueclaire
Contributor

> Hi, @acezen. Could we add some information about supported readers/writers? Also, highlight advanced features like validation and filter pushdown?
>
> BTW, I think the format implementation is mainly about the meta-info implementation. The reader and writer status can go on another documentation page. We should create a discussion topic to discuss this.
>
> Hi, @lixueclaire, after adding a draft of the status, I still think it's a little weird to put the reader/writer implementation and the format implementation together. So I suggest we add another document to describe the reader/writer implementation, and keep this change limited to the format implementation.

Understood. I notice that you put this page under format, and I think it is OK. We could include the status of reader/writer implementations in the libraries' documentation later.

@SemyonSinchenko
Member

GraphAr Spark: Java 1.8 and 11, Scala 2.12.x, Hadoop 3, Spark 3.2.2 and 3.3.4
GraphAr PySpark: Java 1.8 and 11, Scala 2.12.x, Hadoop 3, PySpark 3.2.2

(we did not test PySpark against 3.3.4 yet)
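As a sketch only, these tested combinations could be folded into the compatibility table on the status page roughly like this, again assuming reStructuredText; the column layout and table title are assumptions, and the GAR format version column is omitted because it is not stated in the comment above:

```rst
.. list-table:: Tested version combinations (from the comment above)
   :header-rows: 1

   * - Library
     - Spark versions
     - Scala version
     - Java versions
     - Hadoop version
   * - GraphAr Spark
     - 3.2.2 and 3.3.4
     - 2.12.x
     - 1.8 and 11
     - 3
   * - GraphAr PySpark
     - 3.2.2 (3.3.4 not tested yet)
     - 2.12.x
     - 1.8 and 11
     - 3
```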

@acezen
Contributor Author

acezen commented Feb 27, 2024

> GraphAr Spark: Java 1.8 and 11, Scala 2.12.x, Hadoop 3, Spark 3.2.2 and 3.3.4
> GraphAr PySpark: Java 1.8 and 11, Scala 2.12.x, Hadoop 3, PySpark 3.2.2
>
> (we did not test PySpark against 3.3.4 yet)

Updated, please take a look again.

Member

@SemyonSinchenko SemyonSinchenko left a comment


LGTM

@acezen acezen merged commit 4828ebb into apache:main Feb 27, 2024
2 checks passed
@acezen acezen deleted the 352-format-info-page branch February 27, 2024 09:31