[docs] document LIR attribution #30899

mgree · 2024-12-23T20:35:21Z

Documents the LIR mapping introspection source (#29848).

Preview at https://preview.materialize.com/materialize/30899/transform-data/troubleshooting/.

Motivation

This PR adds a known-desirable feature.
https://github.com/MaterializeInc/database-issues/issues/6551

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

kay-kim

Thank you for this! Left some trivial suggestions (feel free to ignore).
I can pick up after vacation w.r.t. table rendering.

kay-kim · 2024-12-23T21:37:43Z

doc/user/content/transform-data/troubleshooting.md

+```sql
+SELECT mo.name AS name, global_id, lir_id, parent_lir_id, REPEAT(' ', nesting * 2) || operator AS operator,
+       SUM(duration_ns)/1000 * '1 microsecond'::INTERVAL AS duration, SUM(count) AS count
+    FROM           mz_introspection.mz_lir_mapping mlm


Trivial nit (feel free to disregard). Do we want the FROM to either left-align with SELECT or right align with the 'LEFT JOIN'/'JOIN' ?

Have zero opinion as I've seen various alignments when using JOINS and I don't think we have a company style yet. But, this one seems to differ from the others.

Fixed it so FROM aligns with the SELECT, like all the other queries I wrote. I'm happy to have these reformatted, no strong feelings.
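As an aside for readers of this thread, here is a minimal Python sketch of what the two computed columns in the quoted query do: `REPEAT(' ', nesting * 2) || operator` indents each operator by two spaces per nesting level, and `SUM(duration_ns)/1000 * '1 microsecond'::INTERVAL` turns summed nanoseconds into an interval. The sample rows and helper names are hypothetical, and integer division is an assumption, since the excerpt does not show the column type.

```python
from datetime import timedelta

# Hypothetical sample rows: (operator, nesting, duration_ns). Not real output.
rows = [
    ("TopK", 0, 5_000_000),
    ("Join", 1, 1_500_000),
    ("Get", 2, 250_000),
]


def indent_operator(operator: str, nesting: int) -> str:
    """Mimic REPEAT(' ', nesting * 2) || operator."""
    return " " * (nesting * 2) + operator


def ns_to_interval(duration_ns: int) -> timedelta:
    """Mimic duration_ns / 1000 * '1 microsecond'::INTERVAL (integer division assumed)."""
    return timedelta(microseconds=duration_ns // 1000)


for operator, nesting, duration_ns in rows:
    # Left-pad the indented operator name so the durations line up in a column.
    print(f"{indent_operator(operator, nesting):<10} {ns_to_interval(duration_ns)}")
```

The leading spaces produced by `indent_operator` are exactly what a plain Markdown table drops, which is the rendering problem raised in the next thread.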

kay-kim · 2024-12-23T22:35:18Z

doc/user/content/transform-data/troubleshooting.md

+Running this query on an auction generator will produce results that look something like the following (though the specific numbers will vary, of course):
+
+
+| name         | global_id | lir_id | parent_lir_id | operator                         | duration        | count  |


So, a markdown table like this won't preserve the spacing (I know, with all your nice indentation logic ... come on, markdown) :shakes-fist:
https://preview.materialize.com/materialize/30899/transform-data/troubleshooting/#attributing-computation-time

When I get back, I can separate these into a data file and a table, where in the data file we can use the ```mzsql annotation to maintain spacing.

😭 Thanks!

I could also probably just force `&nbsp;` or a Unicode non-breaking space in there or something, though that's a hell of a kludge. Awkward either way.

Added a patch to use data files, as well as some tweaks to improve skimmability (basically, added bullet points, since bulleted lists are easier to skip over than paragraphs: people might have to read a paragraph before deciding whether to skip it).

Thank you!!!
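For comparison with the data-file approach that was adopted, here is a minimal sketch of the non-breaking-space kludge floated above: swap each leading space for U+00A0 so a Markdown table cell keeps its indentation. The function name and the sample cell are hypothetical; this is not what the PR does.

```python
NBSP = "\u00a0"  # Unicode non-breaking space


def preserve_indent(cell: str) -> str:
    """Replace leading ASCII spaces with non-breaking spaces; leave the rest untouched."""
    stripped = cell.lstrip(" ")
    return NBSP * (len(cell) - len(stripped)) + stripped


# Hypothetical cell value with four spaces of indentation.
print(repr(preserve_indent("    Join a b")))
```

Only the leading run is replaced, so ordinary spaces inside the cell still wrap normally.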

ala2134 · 2025-01-03T21:27:50Z

doc/user/content/transform-data/troubleshooting.md

+
+You can [`EXPLAIN`](/sql/explain-plan/) a query to see how it will be
+run as a dataflow. In particular, `EXPLAIN PHYSICAL PLAN` will show
+the concrete, fully optimized plan that Materialize will run.  (That


Do we typically use (...) when we're making an aside? I guess I'm curious why that wouldn't just be its own sentence.

Common in academic writing, but that ivory-tower tone is hardly worth emulating; I dropped the parens.

ala2134 · 2025-01-03T21:29:18Z

doc/user/content/transform-data/troubleshooting.md

+and other internal views to discover which parts of your query are
+computationally expensive (e.g.,
+[`mz_introspection.mz_compute_operator_durations_histogram`](/sql/system-catalog/mz_introspection/#mz_compute_operator_durations_histogram), [`mz_introspection.mz_scheduling_elapsed`](/sql/system-catalog/mz_introspection/#mz_scheduling_elapsed))
+or consuming excessive memory (e.g., ).


Is this intentionally blank after the (e.g. )?

ala2134 · 2025-01-03T21:30:52Z

doc/user/content/transform-data/troubleshooting.md

+
+### Attributing computation time
+
+One way to understand which parts of your query are 'expensive' is to


When you say 'expensive', I presume you mean computationally expensive thus resource/$$ expensive right?

Yes; revised.

ala2134 · 2025-01-03T21:37:13Z

doc/user/content/transform-data/troubleshooting.md

+
+[Worker skew](/transform-data/dataflow-troubleshooting/#is-work-distributed-equally-across-workers) occurs when your data do not end up getting evenly
+partitioned between workers.  Worker skew can only happen when your
+cluster has more than one worker. (You can query


Similar nit as my first comment about () instead of its own standalone sentence. 😅

Revised as well.
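To make the worker-skew definition in the quoted text concrete, here is a small Python sketch that compares each worker's elapsed time to the average across workers. The ratio and the sample numbers are assumptions for illustration, not the query from the docs page.

```python
def skew_ratios(elapsed_ns_by_worker: dict[int, int]) -> dict[int, float]:
    """Return each worker's elapsed time divided by the mean across all workers."""
    if len(elapsed_ns_by_worker) < 2:
        # With a single worker there is nothing to be skewed against.
        return {worker: 1.0 for worker in elapsed_ns_by_worker}
    mean = sum(elapsed_ns_by_worker.values()) / len(elapsed_ns_by_worker)
    if mean == 0:
        # No work recorded at all, so treat every worker as balanced.
        return {worker: 1.0 for worker in elapsed_ns_by_worker}
    return {worker: ns / mean for worker, ns in elapsed_ns_by_worker.items()}


# Hypothetical numbers: worker 0 did three times the work of worker 1.
print(skew_ratios({0: 300, 1: 100}))
```

A ratio well above 1 for one worker suggests that worker received a disproportionate share of the data.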

Co-authored-by: Kay Kim <kaykim00@gmail.com>

kay-kim · 2025-01-09T19:45:48Z

doc/user/content/transform-data/troubleshooting.md

+
+```sql
+  SELECT mo.name AS name, mlm.global_id AS global_id, lir_id, parent_lir_id, REPEAT(' ', nesting * 2) || operator AS operator,
+         levels, to_cut, hint, pg_size_pretty(savings)


I moved the hint before pg_size_pretty ... so that the hint shows in the page. I could also rename columns so that they all fit in ... but, eh.

Honestly, there's something to be said for not even selecting out levels or to_cut, since they feel pretty inward looking.

kay-kim · 2025-01-09T19:46:45Z

doc/user/content/transform-data/troubleshooting.md

+- The `duration` column shows that the `TopK` operator is where we spend the
+  bulk of the query's computation time.
+
+- Creating an index on a view executes the underlying view query. As such, the


Wasn't sure if this is why we have the 2 global_ids, but ... wanted some explanation since we just state the index has the 2 ... and people might think it's just because of our WHERE clause.

I like the changes; I replaced 'executed' with 'started', since 'execution' feels like it implies termination.

What else do we need here to be ready to go?

kay-kim

LGTM! Thank you!

mgree added the A-docs (Area: documentation), A-optimization (Area: query optimization and transformation), and T-observability labels Dec 23, 2024

mgree requested a review from ala2134 December 23, 2024 20:35

mgree requested a review from a team as a code owner December 23, 2024 20:35

kay-kim reviewed Dec 23, 2024

ala2134 reviewed Jan 3, 2025

mgree and others added 6 commits January 9, 2025 11:21

explain LIR attribution

a965ba5

kay's ce

16c6cdb

Co-authored-by: Kay Kim <kaykim00@gmail.com>

kay's fancy tip template

edb480e

Co-authored-by: Kay Kim <kaykim00@gmail.com>

add WHERE clause, some clarifications; thanks @kay-kim!

20e1ba7

improve worker skew query, clarify

e4552bc

fix formatting

1925ce3

mgree force-pushed the docs-lir-troubleshooting branch from 169cd80 to 1925ce3 Compare January 9, 2025 16:40

mgree and others added 5 commits January 9, 2025 11:46

address @ala2134's comments

444e0ec

another parenthetical

864eff2

docs: Use data files to preserve spacing in tables

e2119fd

wording tweaks

77c1bde

linting errors

3375a81

kay-kim reviewed Jan 9, 2025

kay-kim and others added 2 commits January 9, 2025 15:09

use alias + tweak 2 global ids explanation

c4f8ad5

word choice nit s/executed/started/

ee071f3

kay-kim approved these changes Jan 9, 2025

kay-kim enabled auto-merge (squash) January 9, 2025 21:53

kay-kim merged commit 960265c into MaterializeInc:main Jan 9, 2025
11 checks passed

mgree mentioned this pull request Jan 15, 2025

[docs] Explain each LIR operator #31054

Open

5 tasks
