
Monocle User Guide

Monocle Concepts

Traces

Traces are the full view of a single end-to-end application KPI, for example a chatbot application providing a response to an end user's question. Traces consist of various metadata about the application run, including status, start time, duration, inputs/outputs etc. They also include a list of individual steps, aka "spans", with details about each step. It's typically the workflow code components of an application that generate the traces for application runs.

Spans

Spans are the individual steps executed by the application to perform a genAI-related task, for example the app retrieving vectors from a DB or querying an LLM for inference. A span includes the type of operation, start time, duration and metadata relevant to that step, e.g. the model name, parameters and model endpoint/server for an inference request.

Setup Monocle

  • You can install Monocle library releases from PyPI
    > pip install monocle_apptrace
  • For Azure support (to upload traces to Azure), install with the azure extra:
    > pip install monocle_apptrace[azure]
  • For AWS support (to upload traces to AWS), install with the aws extra:
    > pip install monocle_apptrace[aws]
  • You can also build and install the Monocle library locally from source
    > pip install .
  • Install the optional test dependencies listed under dev in pyproject.toml in editable mode
    > pip install -e ".[dev]"

Using Monocle with your application to generate traces

Enable Monocle tracing

You need to import the monocle_apptrace package and invoke the API setup_monocle_telemetry(workflow_name=<workflow-name>) to enable tracing. The workflow name is what you define to identify the given application workflow, for example "customer-chatbot". Monocle will include this name in every trace. The trace output will include a list of spans. You can print the output on the console or send it to an HTTP endpoint, as sketched below.
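
For example, a minimal sketch of both options (the OTLP HTTP exporter and its endpoint URL are illustrative assumptions, not part of Monocle; any OpenTelemetry-compatible span exporter can be plugged in through the span_processors argument):

from monocle_apptrace.instrumentor import setup_monocle_telemetry
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
# Assumes the opentelemetry-exporter-otlp-proto-http package is installed
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

setup_monocle_telemetry(
    workflow_name="customer-chatbot",
    span_processors=[
        # print spans on the console
        BatchSpanProcessor(ConsoleSpanExporter()),
        # and/or ship them to an HTTP endpoint (an OTLP/HTTP collector here)
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")),
    ])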

Using Monocle's out-of-the-box support for genAI technology components

The Monocle community has done the hard work of figuring out what to trace and how to extract relevant details from multiple genAI technology components. For example, if you have a Python app coded using LlamaIndex and models hosted by OpenAI, Monocle can seamlessly trace your app. All you need to do is enable Monocle tracing.

Example - Enable Monocle tracing in your application

from monocle_apptrace.instrumentor import setup_monocle_telemetry
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate

# Call the setup Monocle telemetry method
setup_monocle_telemetry(workflow_name = "simple_math_app")

llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

chain = LLMChain(llm=llm, prompt=prompt)
chain.invoke({"number":2})

# Request callbacks: finally, let's pass request `callbacks` to achieve the same result
# (StdOutCallbackHandler is a standard LangChain handler, used here for illustration)
from langchain_core.callbacks import StdOutCallbackHandler
handler = StdOutCallbackHandler()
chain = LLMChain(llm=llm, prompt=prompt)
chain.invoke({"number":2}, {"callbacks":[handler]})

Accessing Monocle traces

By default, Monocle generates traces in a JSON file created in the local directory where the application is running. The default file name is monocle_trace_{workflow_name}_{trace_id}_{timestamp}.json, where trace_id is a unique id generated by Monocle for every trace. Please refer to the Trace span JSON section below for the format. The file path and format can be changed by passing those properties as arguments to setup_monocle_telemetry(). For example,

setup_monocle_telemetry(workflow_name = "simple_math_app",
    span_processors=[BatchSpanProcessor(FileSpanExporter(
        out_path = "/tmp",
        file_prefix = "map_app_prod_trace_",
        time_format = "%Y-%m-%d"))
    ])

To print the trace on the console, use ConsoleSpanExporter() instead of FileSpanExporter().
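
For example, a minimal sketch reusing the workflow name from the earlier example:

setup_monocle_telemetry(workflow_name = "simple_math_app",
    span_processors=[BatchSpanProcessor(ConsoleSpanExporter())])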

For Azure: Install the Azure support as shown in the setup section, then use AzureBlobSpanExporter() to upload the traces to Azure.

For AWS: Install the AWS support as shown in the setup section, then use S3SpanExporter() to upload the traces to an S3 bucket.
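
A minimal sketch of wiring up the AWS exporter (the import path below is an assumption; check your installed monocle_apptrace release for the exact module, and note that bucket and credential settings are environment-specific):

# Assumed import path, shown for illustration only
from monocle_apptrace.exporters.aws.s3_exporter import S3SpanExporter

setup_monocle_telemetry(workflow_name = "simple_math_app",
    span_processors=[BatchSpanProcessor(S3SpanExporter())])
# The bucket name and credentials are assumed to come from your AWS configuration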

Leveraging Monocle's extensibility to handle customization

When the out-of-the-box features from app frameworks are not sufficient, app developers have to add custom code. For example, you might extend an LLM class in LlamaIndex to use a model hosted in NVIDIA Triton. This new class is not known to Monocle. You can specify this new class's methods as part of the Monocle setup API, and Monocle will be able to trace them (see the example below).

Default configuration of instrumented methods in Monocle

Default configuration files define the instrumented methods and the span names corresponding to them, for each framework respectively.

The following configuration instruments invoke(..) of RunnableSequence, aka a chain or workflow in LangChain parlance, to emit the span.

    {
        "package": "langchain.schema.runnable",
        "object": "RunnableSequence",
        "method": "invoke",
        "span_name": "langchain.workflow",
        "wrapper": task_wrapper
    }

Example - Monitoring custom methods with Monocle

from monocle_apptrace.instrumentor import setup_monocle_telemetry
from monocle_apptrace.wrapper import WrapperMethod,task_wrapper,atask_wrapper
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# extend the default wrapped methods list as follows
app_name = "simple_math_app"
setup_monocle_telemetry(
        workflow_name=app_name,
        span_processors=[BatchSpanProcessor(ConsoleSpanExporter())],
        wrapper_methods=[
            WrapperMethod(
                package="langchain.schema.runnable",
                object_name="RunnableParallel",
                method="invoke",
                span_name="langchain.workflow",
                wrapper=task_wrapper),
            WrapperMethod(
                package="langchain.schema.runnable",
                object_name="RunnableParallel",
                method="ainvoke",
                span_name="langchain.workflow",
                wrapper=atask_wrapper)
        ])

Going beyond supported genAI components

  • If you are using an application framework, model hosting service/infra etc. that's not currently supported by Monocle, please submit a GitHub issue to request that support.
  • The Monocle community is working on an SDK that will enable applications to generate their own traces.

Understanding the trace output

Trace span json

Monocle generates spans which adhere to the OpenTelemetry Tracing API format. The trace output is an array of spans. Each trace has a unique trace_id, and every span in the trace carries that trace_id. Note that the trace_id groups related spans and is auto-generated within Monocle.

Span JSON Description
{
"name": "langchain.workflow", span name; configurable in __init__.py or in setup_monocle_telemetry(...)
"context": { this gets autogenerated
  "trace_id": "0xe5269f0e534efa098b240f974220d6b7",
  "span_id": "0x30b13075eca52f44",
  "trace_state": "[]"
  },
"kind": "SpanKind.INTERNAL", an enum that describes what this span is about. Default value is SpanKind.INTERNAL, as the current enums do not cover ML apps
"parent_id": null, if null, this is the root span
"start_time": "2024-07-16T17:05:15.544861Z",
"end_time": "2024-07-16T17:05:43.502007Z",
"status": {
  "status_code": "UNSET" status of the span, set to OK or ERROR. Default is UNSET
  },
"attributes": {
  "workflow_name": "ml_rag_app", the name of the service set in setup_monocle_telemetry(...) during initialization of instrumentation
  "workflow_type": "workflow.langchain" type of framework that generated this span
  },
"events": [ captures the log records
  {
   "name": "input", name of the event. If the span is about an LLM, this will be 'input'. For vector store retrieval, this would be 'context_input'
   "timestamp": "2024-07-16T17:05:15.544874Z",
   "attributes": { captures the 'input' attributes. The attributes change based on the workflow of the ML framework being used
    "question": "What is Task Decomposition?", represents the LLM query
    "q_a_pairs": "..." represents questions and answers for few-shot LLM prompting
   }
  },
  {
   "name": "output", represents the 'output' event of the LLM
   "timestamp": "2024-07-16T17:05:43.501996Z",
   "attributes": {
    "response": "Task Decomposition is ..." response to the LLM query
   }
  }
  ],
"links": [], unused. Ideally this links other causally-related spans, but as spans are grouped by trace_id and parent_id links to the parent span, this is unused
"resource": { represents the service name or the server/machine/container which generated the span
  "attributes": {
   "service.name": "ml_rag_app" only service.name is populated; it defaults to the value of 'workflow_name'
  },
  "schema_url": "" unused
 }
}
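
Since the trace files are plain JSON, you can inspect them programmatically. A minimal sketch (assuming the default file naming and the span layout described above; the field accesses mirror the annotated example):

import glob
import json

# Pick up trace files written to the current directory by the default exporter
for path in glob.glob("monocle_trace_*.json"):
    with open(path) as f:
        spans = json.load(f)  # the trace output is an array of spans
    for span in spans:
        print(span["name"], span["start_time"], span["status"]["status_code"])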