feat: Add `get_chain_root_span` utility for langchain instrumentation #1054

anticorrelator · 2024-10-07T15:36:31Z

partially resolves #4158
resolves #1052

Adds a utility to get the most recent chain span that can be found traversing up the trace tree. This will help grab the most appropriate span for sending user feedback.

RogerHYang · 2024-10-07T15:50:09Z

Can you please add a test similar to this one, and make sure it's working with concurrency? Thanks!

python/instrumentation/openinference-instrumentation-langchain/tests/test_instrumentor.py

...eninference-instrumentation-langchain/src/openinference/instrumentation/langchain/_tracer.py

RogerHYang · 2024-10-10T19:36:14Z

python/instrumentation/openinference-instrumentation-langchain/tests/test_instrumentor.py

+    ), "Did not capture all root spans during execution"
+
+    assert (
+        len(set(id(span) for span in root_spans_during_execution)) == 2 * n


Shouldn't the number of root spans be n, i.e. one for each run, since there are n runs?

Otherwise this test is not really testing the tree climbing procedure.

Since we get ReadableSpan from in_memory_span_exporter, it's easy to determine which is root, i.e. when span.parent is None.

Should we also check to see if root span_id is actually correct, in addition to counting them?

Otherwise it could just be any span (if we refactor the code), and we wouldn't be sure.

Should we run the same test for the threaded version?

Right now the threaded test above is different.

Ideally we should ensure our implementation works for both types of concurrency.

Suggested change

-        len(set(id(span) for span in root_spans_during_execution)) == 2 * n
+        len(set(id(span) for span in root_spans_during_execution)) == n
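The checks requested in this comment (one root per run, roots identified by a missing parent, and root IDs verified rather than merely counted) could be sketched roughly as below. This is only an illustration: `FakeSpan`, `run_once`, and `check_roots` are hypothetical stand-ins, not the repository's test code or OpenTelemetry's `ReadableSpan`, and the threads merely simulate n concurrent runs.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for an exported span (not ReadableSpan)."""

    span_id: str
    parent_id: Optional[str]  # None marks a root, mirroring `span.parent is None`


def run_once(
    exported: List[FakeSpan],
    expected_roots: List[str],
    lock: threading.Lock,
) -> None:
    # One simulated run emits a root span and one child span.
    root = FakeSpan(span_id=uuid.uuid4().hex, parent_id=None)
    child = FakeSpan(span_id=uuid.uuid4().hex, parent_id=root.span_id)
    with lock:
        exported.extend([child, root])
        expected_roots.append(root.span_id)


def check_roots(n: int) -> bool:
    exported: List[FakeSpan] = []
    expected_roots: List[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_once, args=(exported, expected_roots, lock))
        for _ in range(n)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # A root is any span without a parent.
    roots = {span.span_id for span in exported if span.parent_id is None}
    # Count the roots AND confirm they are exactly the expected IDs.
    return len(roots) == n and roots == set(expected_roots)
```

Comparing the set of root IDs against the expected IDs, rather than only counting them, is what would catch a refactor that returns some arbitrary span.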

The RunnableLambdas create their own chain, so they are their own root

> The RunnableLambdas create their own chain, so they are their own root

If that's the case then this is not really testing the tree climbing procedure. Is that correct?

It's not, though if, as you suggested, we yield the ancestor chains, we can test how many there are. I didn't end up doing that since it feels like a confusing interface.

This is not related to yielding the ancestor chain. All I'm pointing out is that we are not testing what we have implemented.

I know that, I'm suggesting yielding the ancestor chain can be an easy way to do that

> I know that

So are you suggesting that the logic you have implemented so far doesn't need to be tested?

I’m not saying it has to be tested, but I’ve noticed the logic isn’t being tested. Since we seem to agree on that observation, I’d like to understand why leaving it untested is considered the better option.

I am specifically offering an alternative to directly testing whether it's the right ancestor: if we change this implementation to yield ancestors, we can do (I think) a good enough test by counting how many ancestors are yielded each time instead of checking the ancestor directly.

Regarding needing to test it: I can go either way on it, I'm not positive this test is needed though I agree it would be nice. I'm using tests as a guide here to scaffold the implementation, not necessarily be exhaustive.

> I'm not positive this test is needed though I agree it would be nice. I'm using tests as a guide here to scaffold the implementation, not necessarily be exhaustive.

While I understand the test may not seem essential right now, it’s important to consider that this repo is updated only occasionally. Given the infrequent changes and the high likelihood that any future updates may be handled by someone new, having tests that are both thorough and precise can be incredibly beneficial. These tests can serve as reliable documentation and help prevent potential issues when the codebase is revisited later.

> I am specifically offering an alternative strategy to testing

As an aside, I previously mentioned that the self.run_map from the base instrumentor already stores the family tree of UUIDs. You can leverage this to simplify both the implementation and the testing, as it’s the ultimate source of truth for the family tree.


RogerHYang · 2024-10-10T20:13:36Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

@@ -64,6 +64,20 @@ def _uninstrument(self, **kwargs: Any) -> None:
    def get_span(self, run_id: UUID) -> Optional[Span]:
        return self._tracer.get_span(run_id) if self._tracer else None

+    def get_root_chain_span(self, run_id: UUID) -> Optional[Span]:


I think by making the following signature change, we could make this more general and at the same time simplify our internal logic, given the following compositional equivalence. This simplifies our backend logic because we only need a Dict[UUID, Optional[UUID]] to track whose parent is whom, and we can just call on the preexisting self.run_map.

get_root_chain_span(run_id) is get_span(get_root_chain_run_id(run_id))

We just have to also update get_span to take None as argument, which is a trivial change.

Suggested change

-    def get_root_chain_span(self, run_id: UUID) -> Optional[Span]:
+    def get_root_chain_run_id(self, run_id: UUID) -> Optional[UUID]:

FYI, self.run_map seems to be all you need.
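To make the compositional equivalence concrete, here is a minimal sketch under simplifying assumptions: `parents` stands in for the `Dict[UUID, Optional[UUID]]` mentioned above, spans are plain strings, and the climb ignores run type (so it finds the tree root, not specifically a chain). The function names are illustrative and are not the instrumentor's actual methods.

```python
from typing import Dict, Optional
from uuid import UUID, uuid4


def get_root_run_id(
    parents: Dict[UUID, Optional[UUID]], run_id: UUID
) -> Optional[UUID]:
    """Climb the parent map to the topmost known run; None if run_id is unknown."""
    if run_id not in parents:
        return None
    current = run_id
    # Stop when the current run has no recorded parent.
    while (parent := parents.get(current)) is not None:
        current = parent
    return current


def get_span(spans: Dict[UUID, str], run_id: Optional[UUID]) -> Optional[str]:
    """Lookup that tolerates None so it composes with get_root_run_id."""
    return spans.get(run_id) if run_id is not None else None


# Hypothetical three-level tree: root -> mid -> leaf.
root, mid, leaf = uuid4(), uuid4(), uuid4()
parents: Dict[UUID, Optional[UUID]] = {root: None, mid: root, leaf: mid}
spans: Dict[UUID, str] = {root: "root-span", mid: "mid-span", leaf: "leaf-span"}

# The root span is just the composition of the two lookups.
root_span = get_span(spans, get_root_run_id(parents, leaf))
```

With this shape, only the parent map needs to be maintained; the span lookup stays a separate, already existing concern.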

RogerHYang · 2024-10-10T20:23:40Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

+            if span_id in tracer._root_span_ids:
+                return span
+
+            span = tracer._parent_span_by_span_id.get(span_id)  # get parent span


Strictly speaking, we should not be accessing private attributes in this scope.


RogerHYang · 2024-10-16T15:01:14Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

@@ -64,6 +64,19 @@ def _uninstrument(self, **kwargs: Any) -> None:
    def get_span(self, run_id: UUID) -> Optional[Span]:
        return self._tracer.get_span(run_id) if self._tracer else None

+    def get_root_chain_spans(self, run_id: UUID) -> Optional[List[Span]]:


Unless you disagree, “root” typically refers to just a single node, so naming the function that way would be misleading if it intends to return a list. Based on my reading, this function returns all ancestors that are chains, and skips those that are not, so if the final root of the tree is not a chain, it is actually not returned.

RogerHYang · 2024-10-16T15:07:47Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

+                root_chain_spans.append(span)
+
+            span = tracer._parent_span_by_span_id.get(span_id)  # get parent span
+        return root_chain_spans if root_chain_spans else None


Nit: might as well just return an empty list, since it makes no practical difference but simplifies the types.

Suggested change

-        return root_chain_spans if root_chain_spans else None
+        return root_chain_spans
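A small sketch of why the empty list simplifies the types: with an `Optional[List[...]]` return, every caller needs a `None` check before using the result, whereas a plain list can be used directly. The functions below are hypothetical and use strings in place of spans.

```python
from typing import List, Optional


def ancestors_or_none(found: List[str]) -> Optional[List[str]]:
    # Original shape: an empty result collapses to None.
    return found if found else None


def ancestors(found: List[str]) -> List[str]:
    # Suggested shape: an empty result is just an empty list.
    return found


def count_optional(found: List[str]) -> int:
    result = ancestors_or_none(found)
    # The caller must narrow the Optional before using it.
    return len(result) if result is not None else 0


def count_plain(found: List[str]) -> int:
    # No narrowing needed; iteration and len() work on an empty list.
    return len(ancestors(found))
```

Both callers compute the same answer, but the second needs no branch and no `Optional` in its annotations.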

RogerHYang · 2024-10-16T15:09:57Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

+            span_id = span.get_span_context().span_id
+            tracer = self._tracer
+            assert tracer
+            if span_id in tracer._root_span_ids:


As mentioned earlier, in this function you could use the existing tracer.run_map from the base Tracer to simplify your implementation, rather than accessing private attributes in this scope. Since tracer.run_map already provides all the necessary functionality, there’s no need to re-implement everything from scratch: the parts highlighted by arrows in the screenshot below would give you the parent_run_id and run_type that you're using for your tree traversal algorithm.

Let's give it one stab to have one tree data structure to rule them all. It probably is worth it if we can do that.

RogerHYang

Approved to unblock. Concerns are noted in comments

mikeldking · 2024-10-16T17:41:54Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

@@ -64,6 +64,19 @@ def _uninstrument(self, **kwargs: Any) -> None:
    def get_span(self, run_id: UUID) -> Optional[Span]:
        return self._tracer.get_span(run_id) if self._tracer else None

+    def get_root_chain_spans(self, run_id: UUID) -> Optional[List[Span]]:


Tying this to "chain" might not be the best first approach. What about just a get_ancestors that traverses up the run tree for now? Then we can at least say the last one is "likely" to be a root.

I would also add a docstring here to explain it.

The rationale for not tying this to "chain" is that we might have things just be "agent" for disambiguation.
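A rough sketch of what such a `get_ancestors` traversal might look like, assuming a run map keyed by `str(run_id)` whose entries expose `parent_run_id` and `run_type`. `FakeRun` and `get_ancestor_run_ids` are hypothetical stand-ins for illustration; this version returns run IDs rather than spans and is not the merged implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class FakeRun:
    """Hypothetical stand-in for a tracer run record."""

    id: UUID
    parent_run_id: Optional[UUID]
    run_type: str


def get_ancestor_run_ids(run_map: Dict[str, FakeRun], run_id: UUID) -> List[UUID]:
    """Return ancestor run IDs, nearest parent first; empty if none are known."""
    ancestors: List[UUID] = []
    run = run_map.get(str(run_id))
    while run is not None and run.parent_run_id is not None:
        ancestors.append(run.parent_run_id)
        run = run_map.get(str(run.parent_run_id))
    return ancestors


# Hypothetical tree whose root is an "agent" rather than a "chain".
root = FakeRun(id=uuid4(), parent_run_id=None, run_type="agent")
chain = FakeRun(id=uuid4(), parent_run_id=root.id, run_type="chain")
llm = FakeRun(id=uuid4(), parent_run_id=chain.id, run_type="llm")
run_map = {str(run.id): run for run in (root, chain, llm)}

llm_ancestors = get_ancestor_run_ids(run_map, llm.id)
```

Because the walk does not filter on `run_type`, the last element is the topmost known run whatever its type, which is why it can only be described as "likely" the root.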

mikeldking · 2024-10-16T17:46:20Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

+            span_id = span.get_span_context().span_id
+            tracer = self._tracer
+            assert tracer
+            if span_id in tracer._root_span_ids:


Let's give it one stab to have one tree data structure to rule them all. It probably is worth it if we can do that.

mikeldking · 2024-10-17T03:33:50Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

@@ -64,6 +64,26 @@ def _uninstrument(self, **kwargs: Any) -> None:
    def get_span(self, run_id: UUID) -> Optional[Span]:
        return self._tracer.get_span(run_id) if self._tracer else None

+    def get_ancestors(self, run_id: UUID) -> Optional[List[Span]]:


Suggested change

def get_ancestors(self, run_id: UUID) -> Optional[List[Span]]:

def get_ancestor_spans(self, run_id: UUID) -> Optional[List[Span]]:

just to be explicit about the return type

mikeldking · 2024-10-17T03:35:17Z

...ninference-instrumentation-langchain/src/openinference/instrumentation/langchain/__init__.py

+
+            run = tracer.run_map.get(str(run_id))
+            run_id = run.parent_run_id
+        return ancestors if ancestors else None


For simplicity of the typing (and maybe it's just my FP brain), I think an empty array might be preferable to represent "none".

Use ContextVars to track langchain root spans

c200b4e

anticorrelator requested a review from a team as a code owner October 7, 2024 15:36

dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Oct 7, 2024

Do not explicitly manage a stack

a6d6702

anticorrelator added 3 commits October 7, 2024 11:51

Ruff 🐶

1f43388

Use separate reset token store

9734103

Use better contextvar type annotation

298318a

anticorrelator force-pushed the dustin/track-langchain-root-spans branch from aefa72c to 298318a Compare October 7, 2024 16:10

anticorrelator added 2 commits October 7, 2024 12:14

Flesh out type annotation for Token object

29263e1

Add langchain instrumentor tests

1af67a4

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Oct 7, 2024

anticorrelator added 2 commits October 7, 2024 17:40

Use better type annotation

89a0d45

Refactor tests

a0b14b4

dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 8, 2024

anticorrelator added 4 commits October 8, 2024 17:39

Refactor root chain span propagation

3ddc798

Ruff 🐶

50ffb5a

Update tests

45d1a0e

Fix type annotations

39bd069

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Oct 9, 2024

anticorrelator added 3 commits October 9, 2024 02:12

Remove unused type: ignore

141ee9f

Simplify test to not use sequences

6c02b33

Explicitly define type annotations for RunnableLambda

965ce0f

RogerHYang reviewed Oct 9, 2024

anticorrelator added 3 commits October 10, 2024 01:11

Use a RunnableSequence for more robust testing

e1609ac

Remove root span tracking from Span attributes

aea1ae7

Track span tree manually

71a77b7

Remove references to extra span attribute

dd41d7c

RogerHYang reviewed Oct 10, 2024

anticorrelator added 4 commits October 15, 2024 10:47

Remove redundant logic

73a40b8

Properly test concurrency

5880442

Remove unused variable

ca6d3ba

Test root chain tree walking

30ddc8f

RogerHYang reviewed Oct 16, 2024

RogerHYang approved these changes Oct 16, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 16, 2024

anticorrelator added 3 commits October 16, 2024 15:23

Get all ancestors

5bcc238

Only use run map

611ed88

Remove old bookkeeping

7a35bec

mikeldking approved these changes Oct 17, 2024

anticorrelator added 4 commits October 17, 2024 10:14

Fix type annotations

6b82c94

Add docstring

a3e45e3

Ignore unused ignores

26dbe41

Update return types for consistency

4a9e7b2

anticorrelator merged commit 4337aa1 into main Oct 17, 2024
3 checks passed

anticorrelator deleted the dustin/track-langchain-root-spans branch October 17, 2024 20:38

github-actions bot mentioned this pull request Oct 17, 2024

chore(main): release python-openinference-instrumentation-langchain 0.1.29 #1047

Merged



