Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support query tracing in the Postgres filter #18657

Closed
ahachete opened this issue Oct 17, 2021 · 7 comments
Closed

Support query tracing in the Postgres filter #18657

ahachete opened this issue Oct 17, 2021 · 7 comments
Labels
area/postgres area/tracing enhancement Feature requests. Not bugs or questions. stale stalebot believes this issue/PR has not been touched recently

Comments

@ahachete
Copy link

Support query tracing in the Postgres filter: RFC: record all or a % of the queries that traverses the Postgres filter and send them as traces to an external collector

Description:
This is a RFC / feature request, if it is feasible.

There are many potential very interesting uses for capturing the queries that are sent to a database like Postgres. They may be used for performance analysis (including throughput and latency); audit and security analysis; load replay; and potentially many others. The main goal of this issue is to enhance the functionality of the existing Postgres filter to capture those queries and send them to an external system for processing.

It is a goal to not impact (measurably) the performance on the Postgres filter due to this process. For this reason, minimal to no processing of the queries should happen. In particular, query parsing should not be performed. Query fingerprinting should also not be performed.

Essentially, this would be similar to logging al the queries, and send the logs via an external logging mechanism; except that it is proposed to send them as traces, including the duration of the query.

It may work as follows: for every Postgres wire protocol message that contains a query (either simple protocol; or extended protocol, including subsequent bind message(s) to also capture the values of the parameters) a trace would be generated. The trace would contain a single span, including among other usual trace/span fields:

  • The query text
  • The parameters and the values, if it is an extended protocol query
  • The timestamp when the query was received at Envoy
  • The duration of the query, as observed by Envoy (when the first reply packet from the query is sent downstream)

These traces may be then sent externally to a collector. Ideally, this could be done over the Open Telemetry Protocol in gRPC format. But given that there's no OTLP protocol support in Envoy (yet: related issues #13801 and #9958); and that there's definitely zipkin support in Envoy; it is proposed to send the traces via zipkin (which is a supported receiver in OpenTelemetry collector contrib), reusing as much code as possible.

There should be options to limit the number of queries logged. However, it should be designed to support high volumes of query logging, as that's required for some of the use cases (in particular, workload replaying). A initial few options (which could potentially be combined) could be:

  • A percentage, used for sampling.
  • Only trace queries whose duration is above a given threshold.
  • Only trace queries if their response is an error.

It is a non-goal to support context propagation --which would be hard, the Postgres protocol doesn't provision anything for this, and applications that support it use non-standardized mechanisms based on SQL comments embedded into the queries.

@ahachete ahachete added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Oct 17, 2021
@rojkov
Copy link
Member

rojkov commented Oct 18, 2021

cc @fabriziomello @cpakulski @dio

@rojkov rojkov added area/postgres area/tracing and removed triage Issue requires triage labels Oct 18, 2021
@cpakulski
Copy link
Contributor

Agree. That would be useful not only for postgres, but other filters as well. There is opened issue #13085 which I am slowly working on and will probably address described use case.

@ahachete
Copy link
Author

Sounds great what you comment about #13085. I just have two related questions, then:

  • This issue relates more to logging, whereas here I'm considering submitting a trace. While a trace can be much more generic, and indeed contain logs, this one is conceptually simple and essentially just adds the duration (and the trace structure). Would this fit in the ongoing work?
  • What's the planned method/technology for logging in that issue? Since logging and tracing use different protocols, and zipkin is proposed here (just an option, others could be used), which would be the technology used for sending to an external location the data?

@cpakulski
Copy link
Contributor

I was thinking about extending access logging, which sends logs to various sinks: a file or grpc server. File sink obviously accepts just lines, but grpc uses more structured approach and can represent database specific structure, like DB name, query, result, etc.
That would however not cover zipkin format.

@ahachete
Copy link
Author

grpc server

If you are planning to send logs via gRPC, then I'll be supporter no. 1 :) Actually, the only reason I suggested zipkin is because there's already zipkin code in Envoy and may be re-used easily (possibly gRPC too).

My question would be: if you plan on sending logs via gRPC, "which" gRPC are you planning to use? I'd recommend not to reinvent the wheel and create a new format; but rather instead to use OTLP/gRPC for that, as OpenTelemetry is becoming more and more standardized. And actually it's quite simple.

As for traces vs logs, I believe a query executed is more of a trace (as it also has a duration; and if one day Postgres would understand this format it could add further spans reflecting internal execution steps of the query) than a log, but honestly I'd be happy with either.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Nov 21, 2021
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/postgres area/tracing enhancement Feature requests. Not bugs or questions. stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

No branches or pull requests

3 participants