Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pooled objects in attibutes (working with Microsoft.Extensions.ComplianceRedaction) #5276

Open
NatMarchand opened this issue Jan 29, 2024 · 13 comments
Labels
bug Something isn't working logs Logging signal related

Comments

@NatMarchand
Copy link

Bug Report

Currently working on net8.0

    <PackageReference Include="Microsoft.Extensions.Compliance.Redaction" Version="8.1.0" />
    <PackageReference Include="Microsoft.Extensions.Telemetry" Version="8.1.0" />
    <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.7.0" />
    <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.7.0" />

Symptom

When using redaction and batch otlp protocol, redacted values are not stored.

What is the expected behavior?

Expected: the redacted value

What is the actual behavior?

Actual: no value store (the key is even not sent)

Reproduce

Little hard to reproduce as it needs a sink to send logs to. However, I've found the issue: when using redaction, the redacted values are stored as JustInTimeRedactor (see) to avoid allocations. When using batch exporter, as it might take time to send telemetry and as JustInTimeRedactor is pooled, the value can either not be available or reused by something else.

Additional Context

I've found a kind of workaround which is adding a Processor which "freezes" pooled values.

public class FreezePooledValuesProcessor : BaseProcessor<LogRecord>
{
    public override void OnStart(LogRecord data)
    {
        base.OnStart(data);
    }

    public override void OnEnd(LogRecord data)
    {
        if (data.Attributes is not null)
        {
            data.Attributes = data.Attributes
                .Select(kvp => kvp.Value is IResettable resettable ? KeyValuePair.Create<string, object?>(kvp.Key, resettable.ToString()) : kvp)
                .ToList();
        }

        base.OnEnd(data);
    }
}
@NatMarchand NatMarchand added the bug Something isn't working label Jan 29, 2024
@NatMarchand NatMarchand changed the title Pooled objects in state (Microsoft.Extensions.ComplianceRedaction) Pooled objects in attibutes (working with Microsoft.Extensions.ComplianceRedaction) Jan 29, 2024
@cijothomas
Copy link
Member

You can use a OTEL Collector and OTLP Exporter to show the repro. You may use this example app - https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/examples/Console/TestOtlpExporter.cs#L16

The SDK's ILogger integration is expected to copy all attributes in the sync path itself, so if this is not working, it is likely a bug. But need to see a minimal repro to confirm.

@NatMarchand
Copy link
Author

Here's a repro:
https://github.com/NatMarchand/repro-dotnet-log-redactor/blob/main/ConsoleApp1/Program.cs
On my computer, I often (not every time) get a NullReferenceException in Microsoft.Extensions.Telemetry.dll!Microsoft.Extensions.Logging.JustInTimeRedactor.ToString
as the inner _redactor is null because the JustInTimeRedactor is reset when it gets back to the pool.
The issue is on a separated thread OpenTelemetry-BatchExportProcessor-OtlpLogExporter and the stack trace is the following:

Microsoft.Extensions.Telemetry.dll!Microsoft.Extensions.Logging.JustInTimeRedactor.ToString(string format, System.IFormatProvider formatProvider) Line 53
Microsoft.Extensions.Telemetry.dll!Microsoft.Extensions.Logging.JustInTimeRedactor.ToString() Line 45
OpenTelemetry.Exporter.OpenTelemetryProtocol.dll!OpenTelemetry.Internal.TagTransformer<OpenTelemetry.Proto.Common.V1.KeyValue>.TryTransformTag(System.Collections.Generic.KeyValuePair<string, object> tag, out OpenTelemetry.Proto.Common.V1.KeyValue result, int? maxLength) Line 73
OpenTelemetry.Exporter.OpenTelemetryProtocol.dll!OpenTelemetry.Exporter.OpenTelemetryProtocol.Implementation.OtlpLogRecordTransformer.ToOtlpLog(OpenTelemetry.Logs.LogRecord logRecord) Line 152
OpenTelemetry.Exporter.OpenTelemetryProtocol.dll!OpenTelemetry.Exporter.OpenTelemetryProtocol.Implementation.OtlpLogRecordTransformer.BuildExportRequest(OpenTelemetry.Proto.Resource.V1.Resource processResource, OpenTelemetry.Batch<OpenTelemetry.Logs.LogRecord> logRecordBatch) Line 42
OpenTelemetry.Exporter.OpenTelemetryProtocol.dll!OpenTelemetry.Exporter.OtlpLogExporter.Export(OpenTelemetry.Batch<OpenTelemetry.Logs.LogRecord> logRecordBatch) Line 55
OpenTelemetry.dll!OpenTelemetry.BatchExportProcessor<OpenTelemetry.Logs.LogRecord>.ExporterProc()	

@cijothomas
Copy link
Member

On my computer, I often (not every time) get a NullReferenceException in Microsoft.Extensions.Telemetry.dll!Microsoft.Extensions.Logging.JustInTimeRedactor.ToString

This is something to be reported to the owners of Microsoft.Extensions.Telemetry itself.

The issue is on a separated thread OpenTelemetry-BatchExportProcessor-OtlpLogExporter and the stack trace is the following:

Are you seeing this exception escaping or its caught by OTLPExporter and proceeds without crashing the app?

https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/Shared/TagTransformer.cs#L63-L74 is the place where the exception is thrown (and caught correctly as well). I am guessing the issue is with the JustInTimeRedactor itself, that it is incorrectly assuming the state can remain after the Log call is over..... (I don't have cycles to take a deeper look to confirm this...)

@CodeBlanch who fixed similar issue in the OTel SDK itself.

@cijothomas cijothomas added question Further information is requested logs Logging signal related and removed bug Something isn't working labels Feb 21, 2024
@cijothomas
Copy link
Member

I removed the bug tag. When I saw "pool", I initially assumed the LogRecord pooling in this repo, but since then, it is clear that you were referring to the pooling done inside JustInTimeRedactor.

@NatMarchand
Copy link
Author

NatMarchand commented Feb 22, 2024

This is something to be reported to the owners of Microsoft.Extensions.Telemetry itself.

I'll try to report the issue on the other repo, but as you said at first glance it's ok to assume that after leaving Log in OpenTelemetryLogger the attributes are safe to collect as they should be "frozen" or projected to something safe. Like I do in my workaround.

Are you seeing this exception escaping or its caught by OTLPExporter and proceeds without crashing the app?

I catch it because for debugging purpose I configured VS to catch any CLR exception. With regular settings, it indeed doesn't break. However, in such case I don't have the attribute flowing in my log pipeline.

@joperezr
Copy link

joperezr commented Apr 4, 2024

Hey @cijothomas, thanks for looking into this. As explained by @geeknoid in this comment we believe that the issue lies in OTel here as it shouldn't try to call ToString on object references that it keeps around as it cannot be guaranteed that doing so is safe (objects might be disposed, nulled out, or simply in a state that is no longer valid).

@cijothomas
Copy link
Member

Hey @cijothomas, thanks for looking into this. As explained by @geeknoid in this comment we believe that the issue lies in OTel here as it shouldn't try to call ToString on object references that it keeps around as it cannot be guaranteed that doing so is safe (objects might be disposed, nulled out, or simply in a state that is no longer valid).

As suggested here, OTel SDK is expected to read everything off the state in the sync path itself. If it is not working, then it is a bug in OTel .NET sdk. Would you make a minimal repro app to confirm if this is indeed an issue in OTel .NET?
OTel code section doing the buffering: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Logs/LogRecord.cs#L475-L485

@NatMarchand
Copy link
Author

NatMarchand commented Apr 9, 2024

Unless I'm missing something, I don't see how the BufferLogAttributes() method can prevent the issue.

I've read the associated issue mentioned in the method comment but we are not in the same case.
In 2905, the AttributesData property needed to be enumerated and copied because it was of type HostingRequestStartingLog and the enumeration need to be sync. However all the object in the KeyValuePair<string,object> are objects which does not have lifetime scope. Most of them are just good ol' primitive types.

Here the difference is that it is the Value property of KeyValuePair<string,object> which is scoped and the method don't freeze or copy its state.

Something like this would prevent the issue instead:

private void BufferLogAttributes()
{
    var attributes = this.AttributeData;
    if (attributes == null || attributes == this.AttributeStorage)
    {
        return;
    }

    var attributeStorage = this.AttributeStorage ??= new List<KeyValuePair<string, object?>>(attributes.Count);

    foreach(var (key, value) in attributes)
    {
        if(value is IDisposable || value is IResettable)
        {
            value = value.ToString();
        }
        attributeStorage.Add(KeyValuePair<string,object?>.Create(key, value));
    }

    this.AttributeData = attributeStorage;
}

@KalleOlaviNiemitalo
Copy link

I don't believe value is IDisposable || value is IResettable will recognize all limited-lifetime types that can be logged.

@cijothomas cijothomas added the bug Something isn't working label Apr 9, 2024
@cijothomas
Copy link
Member

Thanks!
I have re-added bug label, will reachout to @CodeBlanch to comment further.

@KalleOlaviNiemitalo
Copy link

My employer has some logging code that creates logging-specific wrappers for application-specific objects. These objects have limited lifetime and are not safe to access across threads. The wrappers exist to provide ToString() for logs but skip all string formatting if ToString() is not called. These wrappers do not implement IDisposable, because they just reference the application-specific objects rather than own those; and they do not implement Microsoft.Extensions.ObjectPool.IResettable, because they antedate .NET 8.0.

@reyang reyang removed the question Further information is requested label May 7, 2024
@CodeBlanch
Copy link
Member

Whole lot to unpack here! I'll do my best 🤣

High level I do agree OTel should fully capture the log data. Probably mistakes were made along the way in our design/API but we are where are and now we have to do something here which will be non-breaking.

I want to start with the proposed workaround.

        if (value is IDisposable || value is IResettable)
        {
            value = value.ToString();
        }

IResettable is part of Microsoft.Extensions.ObjectPool. We would have to discuss, but I doubt very much we are going to entertain adding a Microsoft.Extensions.ObjectPool dependency to OTel SDK.

What if we just did this?

        if (value is IDisposable)
        {
            value = value.ToString();
        }

Perhaps we could do this. I have some concerns with it being a breaking change. But what would this really achieve?

If we look at JustInTimeRedactor:

https://github.com/dotnet/extensions/blob/6e0195db97d66cdf082e32799858c34bf5fd0cdb/src/Libraries/Microsoft.Extensions.Telemetry/Logging/JustInTimeRedactor.cs#L14-L17

The whole point of that thing is to implement ISpanFormattable and let supporting components access\write the redacted string without allocation(s).

If we force a ToString() what is the point of JustInTimeRedactor and the pooling altogether? We'll always allocate a string. It just becomes an over-optimization and wasted cycles at that point 😄

Let's say we went ahead and made this change anyway. How would that deploy? Microsoft.Extensions.Telemetry isn't opinionated about what version (if any) of the OTel SDK is used. It doesn't have a way to make sure users are on a good version. Users would somehow need to sort out they need a sufficient version of OTel compatible with Microsoft.Extensions.Telemetry.

Here is an idea for a workaround we could do in extensions:

https://github.com/dotnet/extensions/compare/main...CodeBlanch:redaction-pool-tweak?expand=1

Right now Microsoft.Extensions.Telemetry assumes all logger providers will fully capture the log data. What that change does is essentially bypass the pool if log data is NOT captured. That will work with all OTel SDK versions and potentially any other logger providers out there which also might misbehave. It makes it defensive.

Let's say we start there to stop the bleeding. That workaround will fix the redacted data being missing and preserve the ISpanFormattable to skip string allocations but it will cause extensions to always allocate a JustInTimeRedactor.

Could we do more?

I think it would be great if OTel allowed users/libraries/extensions to participate in its buffering/pooling. Very tricky to do. I took at stab at making it work using IDisposable and ICloneable interfaces:

The idea is OTel would detect values which are IDisposable & ICloneable and take a clone of them. Then it will dispose the clone when it is done with it. This is sort of a mechanism for things to opt-into transferring ownership to OTel. Also hoping the combination of IDisposable & ICloneable is rare enough it won't break anyone.

@michaeldimoudis
Copy link

michaeldimoudis commented Sep 19, 2024

Hi I just ran into this issue using Microsoft.Extensions.Telemetry and Microsoft.Extensions.Compliance.Redaction to redact and hmac sensitive data, and the values are not coming through to Honeycomb (in our case).

Creating a processor I see the values are actually there. Do you suggest any workaround or any ETA when this will be fixed?

Thanks in advance

Edit: I got around this issue with a workaround, by using a Processor and intercepting the redacted attribute (we only have a handful), removing it from Attributes and re-adding it as a ToString() value. Hopefully the SDK can natively support this soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working logs Logging signal related
Projects
None yet
Development

No branches or pull requests

7 participants