[SPARK-46395][CORE] Assign Spark configs to groups for use in documentation #44755
### What changes were proposed in this pull request?

- Assign all config tables in the documentation to the new CSS class `spark-config`.
- Migrate the config table in `docs/sql-ref-ansi-compliance.md` from Markdown to HTML and assign it to this new CSS class as well.
- Limit the width of the config tables to the width of the main content, and force words to break and wrap if necessary.
- Remove a styling workaround for the documentation for `spark.worker.resourcesFile` that is no longer needed thanks to these changes.
- Remove some `.global` CSS rules that, due to their [specificity][specificity], interfere with our ability to assign simple rules that apply directly to elements.

[specificity]: https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity

### Why are the changes needed?

Many configs and config defaults have very long names that normally cannot wrap, which causes tables to overflow the viewport. An egregious example is `spark.scheduler.listenerbus.eventqueue.executorManagement.capacity`, which has a default of `spark.scheduler.listenerbus.eventqueue.capacity`. This change forces these long strings to break and wrap, which keeps the table widths limited to the width of the overall content.

Because we are hard-coding the column widths, some tables will look slightly worse with this new layout due to extra whitespace. I couldn't figure out a practical way to prevent that while also solving the main problem of table overflow.

In #44755 or #44756 (whichever approach gets accepted), these config tables will be generated automatically. This will give us the opportunity to improve the styling further by setting the column width dynamically based on the content. (This should be possible in CSS, but table styling in CSS is limited and we cannot use properties like `max-width`.) We will also be able to insert [word break opportunities][wbo] so that config names wrap in a more visually pleasing manner.
[wbo]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr

### Does this PR introduce _any_ user-facing change?

Yes, it changes the presentation of tables, especially config tables, in the main documentation.

### How was this patch tested?

I built the docs and compared them visually across `master` (left) and this branch (right).

`sql-ref-ansi-compliance.html`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/1ad54764-e942-491a-8060-a23cdc2b1d3f" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/5e623d71-1f8b-41ca-b0bc-7166e9a2de7e" />

`configuration.html#scheduling`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/7a14c2d6-b9e6-4114-8902-96c80bebfc87" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/27dec813-f5a0-49de-a74d-f98b5c0f1606" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/76cfb54d-dad7-46e8-a29d-1f0d91962ce1" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/21457627-a7a4-4ede-8b8b-fcffabe1e24c" />

`configuration.html#barrier-execution-mode`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/e9b4118b-85eb-4ee5-9195-7e69e94e1008" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/2f3a1f0e-3d67-4352-b884-94b41c0dd6ea" />

`spark-standalone.html`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/c1923b0b-b57c-45c3-aff9-b8db1c0b39f6" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/6abb45e1-10e0-4a31-b1a3-91ef3a0478d1" />

`structured-streaming-kafka-integration.html#configuration`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/8505cda1-46fc-4b61-be22-16362bbf00fc" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/a5e351af-161a-442b-95a9-2501ec7934c9" />
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44955 from nchammas/table-styling.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@holdenk - This is the config documentation approach we discussed on the mailing list. (The alternative, YAML-based approach is over on #44756.) This PR just adds the fields and methods we need.

@holdenk - Friendly ping. Are you still interested in shepherding this work? No hard feelings if not.

Oh yes, sorry, it's been a busy quarter. Let me try to schedule some review time tomorrow / next week (and please do feel free to ping me again if I forget; work has just been very busy as of late).

I like it; one small note around the API change.

I'm missing the note. Did a line comment on the diff get swallowed up, perhaps?
Following up regarding your feedback on the API change, @holdenk. Do you recall what it was? |
```diff
   */
-  def getAllDefinedConfs: Seq[(String, String, String, String)] = {
+  def getAllDefinedConfs: Seq[(String, String, String, String, Set[String])] = {
```
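The diff above widens the tuple returned by `getAllDefinedConfs` so each entry also carries its documentation groups. A minimal self-contained sketch of the before/after shapes (the `ConfEntry` case class and the sample data are hypothetical stand-ins for illustration, not Spark's actual internals):

```scala
object ConfListing {
  // Hypothetical stand-in for a registered Spark config entry.
  final case class ConfEntry(
      key: String,
      defaultValue: String,
      doc: String,
      version: String,
      documentationGroups: Set[String])

  private val entries = Seq(
    ConfEntry(
      "spark.sql.autoBroadcastJoinThreshold", "10MB",
      "Maximum table size that will be broadcast in a join.", "1.1.0",
      Set("sql-tuning")))

  // Old shape: (key, default, doc, version).
  def getAllDefinedConfsOld: Seq[(String, String, String, String)] =
    entries.map(e => (e.key, e.defaultValue, e.doc, e.version))

  // New shape: the same tuple, widened with the entry's documentation groups.
  def getAllDefinedConfs: Seq[(String, String, String, String, Set[String])] =
    entries.map(e => (e.key, e.defaultValue, e.doc, e.version, e.documentationGroups))
}
```

Existing callers that destructure the four-element tuple would need updating, which is what the review comment below is about.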
This does change a public API's type signature. I'm not sure it's super important or widely used, but given that the stated goal of this PR is to improve the docs, do we need this part? Or could a new internal API be introduced if we need this functionality?
I think this is only used by us, and I also think it's the right place to make the change, judging from the docstring. We also have a rare chance to "get away" with breaking public APIs due to the upcoming 4.0 release.

But it is indeed a public API. Do you prefer I just create a new API? It would duplicate this existing API but include the new documentation groups as well. I was also thinking to have this return a `Seq` of a case class with descriptive attribute names, so we're not calling `._1` and the like everywhere.
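A rough sketch of the case-class alternative floated above, so call sites read `conf.key` rather than `._1` (the `DefinedConf` name and sample data are hypothetical, not the actual Spark API):

```scala
// Hypothetical named result type replacing the positional tuple.
final case class DefinedConf(
    key: String,
    defaultValue: String,
    doc: String,
    version: String,
    documentationGroups: Set[String])

object ConfApi {
  // Sample data standing in for the registered config entries.
  private val raw = Seq(
    ("spark.sql.files.openCostInBytes", "4194304",
      "Estimated cost to open a file, in bytes.", "2.0.0",
      Set("sql-performance")))

  def getAllDefinedConfs: Seq[DefinedConf] =
    raw.map { case (key, default, doc, version, groups) =>
      DefinedConf(key, default, doc, version, groups)
    }
}
```

Callers could then filter by group with `_.documentationGroups.contains(...)` instead of pattern-matching tuple positions.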
My bad, GitHub staged my review comment for some reason; it should be visible now.

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way; it's just a way of keeping the PR queue manageable.
### What changes were proposed in this pull request?

Enable Spark configs to be assigned to documentation groups via a new `documentationGroups` field on `ConfigEntry`. These groups will be used to automatically build config tables for display in our documentation.

Instead of having to maintain large blocks of HTML tables throughout our documentation, config tables can simply be included by group name. `sql-tuning-caching-data` would be the name of a configuration group defined directly on multiple instances of `ConfigEntry` to group them together, and from which the corresponding HTML table would be automatically generated for inclusion in our documentation.

This PR is a stripped-down version of #44300 to facilitate review of the core idea.
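To make the core idea concrete, here is a rough sketch of how a group name could be turned into an HTML config table tagged with the `spark-config` CSS class. The `ConfigTableGen` object, the `Entry` stand-in, and the sample config row are invented for illustration; the actual generator proposed in #44300 may differ:

```scala
object ConfigTableGen {
  // Hypothetical stand-in for ConfigEntry with the proposed field.
  final case class Entry(
      key: String,
      default: String,
      doc: String,
      documentationGroups: Set[String] = Set.empty)

  // Render all entries belonging to `group` as rows of an HTML table,
  // tagged with the `spark-config` CSS class used by the docs.
  def tableFor(group: String, entries: Seq[Entry]): String = {
    val rows = entries
      .filter(_.documentationGroups.contains(group))
      .map(e => s"<tr><td>${e.key}</td><td>${e.default}</td><td>${e.doc}</td></tr>")
    (s"""<table class="spark-config">""" +: rows :+ "</table>").mkString("\n")
  }
}
```

A config would opt into a table simply by listing the group name in its `documentationGroups`, so the docs page only needs to reference the group.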
### Why are the changes needed?

Using this approach we can accomplish several goals at once (e.g. for configs like `spark.sql.files.openCostInBytes` and `spark.sql.autoBroadcastJoinThreshold`).

### Does this PR introduce _any_ user-facing change?

No, not for users.

For contributors, the documentation build will change so that config tables are generated only when Spark is built. For fast documentation builds -- like what we currently have via `SKIP_API=1` -- Spark won't be built, and placeholders will be inserted in place of every config table. This will be addressed in a future PR.

### How was this patch tested?

No testing beyond what was already done in #44300, pending review.

### Was this patch authored or co-authored using generative AI tooling?

No.