Add persistent storage to exporter #1278
Comments
We can probably leave this to each exporter since they might want to use other formats?
Yes, it makes sense for every exporter to use its own format.
What is a |
I should have clarified this earlier, |
To avoid data loss from transient errors, enable OpenTelemetry exporters to store failed telemetry and retry sending it at a later time. The idea is based on the persistent storage option available in the Application Insights SDK and on @reyang's design principles discussed in OpenCensus Python. Adding persistent storage to exporters will cover the following scenarios.
When the SDK fails to export data to the backend system because of networking issues, we need to keep it from eating up all the memory by either discarding excess data (depending on the case, either the latest or the oldest) or storing it locally (e.g. file, log, reliable pipe, ETW).
In case of application exit/restart/crash, we want to reduce data loss. Some data loss is unavoidable given that we're not a fully transactional system (e.g. your code writes traces to a queue and the process gets killed before the queue item is processed, so the data is lost), but the ability to store things locally and pick them up later (after a machine or application restart) would be useful in some cases.
Console applications (backend jobs, periodic tasks, command line tools) might need to store traces during the exit grace period, since sending all the data over the network might not be possible within that period.
There are cases where developers need more reliability, for example auditing logs and QoS logs. We might need to provide an alternative path so developers can trade performance for reliability (e.g. bypass the queue and synchronously persist the log to local storage, or even transmit the data across the network).
The design principles:
Storage folder:
Storage folder = transmission root folder/application folder
transmission root folder = HOME directory of the CURRENT USER, or the path explicitly specified by the user
application folder name = SHA256 hash of User identity that runs the application's process + Path of current executable
... datetimestamp (ISO 8601)-GUID on it. Example: 2020-09-15T210909.267417-21ae34ceb5ee46888f04f9ceb437eec6.blob
... RetryAfter sent from the service and transmit data.
Store data:
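As a rough illustration of the folder layout and file naming described above, here is a minimal Python sketch of the store step. It is not taken from any exporter. The function names (`application_folder_name`, `storage_folder`, `blob_file_name`, `store`) are hypothetical, and several details are assumptions: the user identity and executable path are concatenated with no separator before hashing, timestamps are in UTC, and the write goes through a temporary file followed by a rename.

```python
import getpass
import hashlib
import sys
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def application_folder_name(user: str, executable: str) -> str:
    # SHA256 hash of the user identity plus the executable path.
    # Plain concatenation with no separator is an assumption.
    return hashlib.sha256((user + executable).encode("utf-8")).hexdigest()


def storage_folder(root: Optional[Path] = None) -> Path:
    # Root defaults to the current user's HOME unless a path is given.
    root = root if root is not None else Path.home()
    return root / application_folder_name(getpass.getuser(), sys.executable)


def blob_file_name(now: Optional[datetime] = None) -> str:
    # ISO 8601 style timestamp (colons dropped, as in the example) plus a GUID.
    now = now if now is not None else datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H%M%S.%f")
    return f"{stamp}-{uuid.uuid4().hex}.blob"


def store(payload: bytes, folder: Path) -> Path:
    # Write to a temporary name first, then rename, so a reader that
    # only looks for *.blob never picks up a half-written file.
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / blob_file_name()
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(payload)
    tmp.replace(target)
    return target
```

A caller would typically do something like `store(serialized_batch, storage_folder())` after a failed export, where `serialized_batch` is whatever byte format the exporter chooses.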
Read data:
A thread wakes up at a configurable interval (for example, every 30 seconds), reads data from the folder, and re-transmits it to the backend.
Delete data:
Delete the data only after transmission succeeds or the data has expired.
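The read and delete rules can be sketched in Python as below. This is an illustration under stated assumptions, not a proposed implementation: `retry_once`, `run_forever`, and the `transmit` callback are hypothetical names, expiry is judged from the file's modification time, and the 48 hour default retention is a placeholder value.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def retry_once(
    folder: Path,
    transmit: Callable[[bytes], bool],
    max_age_seconds: float = 48 * 3600,  # placeholder retention period
    now: Optional[float] = None,
) -> int:
    """Re-transmit stored blobs; return how many were sent successfully."""
    now = time.time() if now is None else now
    sent = 0
    # File names start with a timestamp, so sorting yields oldest first.
    for blob in sorted(folder.glob("*.blob")):
        if now - blob.stat().st_mtime > max_age_seconds:
            blob.unlink()  # expired: delete without sending
            continue
        if transmit(blob.read_bytes()):
            blob.unlink()  # delete only after a successful transmission
            sent += 1
        # On failure the blob stays on disk for the next wake-up.
    return sent


def run_forever(
    folder: Path,
    transmit: Callable[[bytes], bool],
    interval_seconds: float = 30.0,
) -> None:
    # Intended to run on a background thread.
    while True:
        retry_once(folder, transmit)
        time.sleep(interval_seconds)
```

Keeping a single pass in `retry_once` separate from the sleeping loop makes the delete rules easy to test without waiting on a timer.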