Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windowseventlogreceiver application channel Failed to open publisher when running as service #21491

Closed
ShiranAvidov opened this issue May 4, 2023 · 14 comments · Fixed by #28587

Comments

@ShiranAvidov
Copy link
Contributor

Component(s)

pkg/stanza, receiver/windowseventlog

What happened?

Description

I am running otel contrib exe as a service in Windows, which collects logs from the event viewer (application, system and security) and writes them to a file. I get this error (shown in event viewer under application):
windows/operator.go:224 Failed to open publisher: failed to open publisher handle: The system cannot find the file specified.: writing log entry to pipeline without metadata
Its worth mentioning that if I run otel contrib exe via terminal (not as service) I do not get this error.

Steps to Reproduce

Running otel contrib exe as service:
New-Service -Name 'MyService' -BinaryPathName "<<OTEL_CONTRIB_EXE>> --config <<OTEL_CONFIG>>"
The error will be shown in event viewer under application and in exporter (in my case a file).

Expected Result

No error.

Actual Result

`The description for Event ID 3 from source MyService cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

1.683206406737252e+09 error windows/operator.go:224 Failed to open publisher: failed to open publisher handle: The system cannot find the file specified.: writing log entry to pipeline without metadata {"kind": "receiver", "name": "windowseventlog/application/localhost_windows_system", "data_type": "logs", "operator_id": "windows_eventlog_input", "operator_type": "windows_eventlog_input"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).processEvent
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:224
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).read
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:195
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).readToEnd
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:179
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).readOnInterval
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:167
`

Collector version

v0.75.0

Environment information

Environment

OS: Windows Server 2019

OpenTelemetry Collector configuration

receivers:
  windowseventlog/application/localhost_windows_system:
    channel: Application
    attributes:
      type: agent-windows
  windowseventlog/security/localhost_windows_system:
    channel: Security
    attributes:
      type: agent-windows
  windowseventlog/system/localhost_windows_system:
    channel: System
    attributes:
      type: agent-windows
processors:
  resourcedetection/system:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]
exporters:
  logging:
  file:
    path: C:\Users\logzio\Documents\test.txt
service:
  pipelines:
    logs:
      receivers:
        - windowseventlog/application/localhost_windows_system
        - windowseventlog/security/localhost_windows_system
        - windowseventlog/system/localhost_windows_system
      processors:
        - resourcedetection/system
      exporters: [file]
  telemetry:
    logs:
      level: "debug"

Log output

The description for Event ID 3 from source MyService cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

1.683206406737252e+09	error	windows/operator.go:224	Failed to open publisher: failed to open publisher handle: The system cannot find the file specified.: writing log entry to pipeline without metadata	{"kind": "receiver", "name": "windowseventlog/application/localhost_windows_system", "data_type": "logs", "operator_id": "windows_eventlog_input", "operator_type": "windows_eventlog_input"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).processEvent
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:224
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).read
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:195
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).readToEnd
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:179
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).readOnInterval
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.75.0/operator/input/windows/operator.go:167

Additional context

No response

@ShiranAvidov ShiranAvidov added bug Something isn't working needs triage New item requiring triage labels May 4, 2023
@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme
Copy link
Contributor

atoulme commented May 8, 2023

By default, the collector detects it's running on Windows and will actively register as a service. I don't think you need to run it as a service explicitly. That might explain the problem you're seeing.

@atoulme atoulme removed the needs triage New item requiring triage label May 8, 2023
@kago-dk
Copy link

kago-dk commented May 9, 2023

@atoulme Do you know where I find more details about "the collector detects it's running on Windows and will actively register as a service."?

@ShiranAvidov
Copy link
Contributor Author

@atoulme The service should be in Services? When I run it regularly (otelcol-contrib.exe --config cofig.yaml) I can't see any new service in Services...

@andrzej-stencel
Copy link
Member

By default, the collector detects it's running on Windows and will actively register as a service. I don't think you need to run it as a service explicitly. That might explain the problem you're seeing.

I don't think this is the correct way to describe what's happening. The way I'd describe it is:

The OpenTelemetry Collector binary automatically detects whether it is being run as a Windows Service or from an interactive session and behaves a little differently.

This is probably the (non-root) cause of the problem. Now we probably need to dig deeper to find why this bug occurs when the binary is run as a Windows Service.

@ShiranAvidov
Copy link
Contributor Author

Another thing I noticed is that the error occurs only when I use Application channel. The service writes application events and collects them - maybe this is why we get errors...

@ShiranAvidov
Copy link
Contributor Author

ShiranAvidov commented Jul 19, 2023

@atoulme I noticed when running as a service, all debug logs are written as events in Application channel, which get processed by the receiver, which writes debug logs, which get processed again and so on. The pr I opened will fix all these issues, by excluding the service itself from processing its events (the provider is the service name). That way I will no longer get this error, and this loop of events will not happen anymore.

@rlarranaga
Copy link

Hi @ShiranAvidov ,

I was testing the log filter on windows, and I ran into an issue where messages are not filtered. Looking in the logs, I encountered the exact same error message you describe here.

Because the message states there is no metadata being added to the log, It made
to my surprise, filtering started to work.

I might have misunderstood your comments above, but your PR simply excludes otelcol as a log source to prevent these messages from being forwarded, correct? Would your PR also resolve the issue with filtering events? Or is there a deeper issue with the windowseventlogreceiver that needs to be looked at?

Thanks

@ShiranAvidov
Copy link
Contributor Author

Hey @rlarranaga, yep the PR excludes logs by provider name. I did not make any changes to the receiver's logic, so there is definitely an issue. I did some tests and I got the error from the otelcol-contrib debug logs. When I ran other services I did not get the error. I will test it more to be sure I don't get the error anymore.
Regarding your issue, do you run the otelcol-contrib as a service?

@rlarranaga
Copy link

Hi @ShiranAvidov

I am actually running a downstream project from open-telemetry collector - A repackaged version from one of the centralized log vendors). My main objectives in posting this question here are:

  1. Confirming that the issue filtering windows events properly is correlated with this issue, and confirming that the issue is present in opentelemetry-collector-contrib.

  2. My main concerns with the PR are that, if these problems are indeed correlated, removing the provider masks a functional issue without resolving it. So i was interested in understanding what is the context behind doing that.

I do run the Open telemetry collector as a service. But i see differences in functional behavior when the collector runs from the command prompt, versus when it runs as a service (namely, the event filtering functionality is not working when running as service).

Since you are running the plain vanilla collector, would you mind testing event filtering when the collector runs as a service? I currently do not have another machine available to do so. I can send a couple of examples if needed.

Thanks!

@atoulme
Copy link
Contributor

atoulme commented Sep 29, 2023

@pjanotti offered to take a look. Assigning to him

@pjanotti
Copy link
Contributor

pjanotti commented Oct 3, 2023

Summary of my investigation:

  1. The issue is not directly caused by the collector running as a service. Running as a service triggers the issue because it generates events from a provider that is "broken". You can get similar results by running the collector as a normal process and by logging events that are being captured for which the provider metadata is not available.

  2. I couldn't repro the "The description for Event ID 3 from source MyService cannot be found. Either the component that raises this event ...". My guess is that this was with an old version of the collector. I tried to run v0.75.0, but, it failed against the configuration on the description. Anyway, this won't matter much to the other findings.

  3. This should be handled by the installer, when the collector is installed as service, (see manual workaround below): the current code from core avoids the mentioned on item 2 (above), but, doesn't setup a proper event provider. This makes reading the event log a bit harder:
    a. On eventwvr the "General" tab doesn't show the text, only on the "Detailed" tab that is available.
    b. On Powershell the Message property is empty.

  4. The code assumes that if the event has a provider, that the same provider is installed and functional. That is incorrect. It is expected that some providers may go corrupt, e.g.: message file was deleted but provider wasn't removed. This seems the case of warning, not error, and either logged only once or damped. Related optimization: provider metadata can be stored to avoid the native call for every event.

  5. For events without metadata the conversion to XML is broken: the EventData element is being incorrectly handled and the useful information doesn't reach the log file configured on the description.

Temporary workaround for item 3 (which avoids the error when "capturing own events"):

eventcreate /t information /id 1 /l application /d "Creating event provider for '<MyServiceName>'" /so <MyServiceName>

[EDIT: item 3 actually shouldn't be fixed via code, but, whoever is installing the service should create the proper event source]

djaglowski pushed a commit that referenced this issue Oct 13, 2023
**Description:**

Cache the publisher event to:

1. Avoid logging the same error message every time one event from the
given source is logged.
2. Avoid opening and closing the event publisher for every single event.

**Link to tracking Issue:**

[Item 4 described on the
investigation](#21491 (comment))
for issue #21491.

**Testing:**

* Go tests for `pkg/stanza` and `receiver/windowseventlogreceiver` on
Windows box.
* Ran the contrib build locally to validate the change.
* Can't run the full make locally: misspell is failing on Windows
because the command line is too long.

**Documentation:**

Let me know if changing the severity of the log message requires a
changelog update.
JaredTan95 pushed a commit to openinsight-proj/opentelemetry-collector-contrib that referenced this issue Oct 18, 2023
…-telemetry#27658)

**Description:**

Cache the publisher event to:

1. Avoid logging the same error message every time one event from the
given source is logged.
2. Avoid opening and closing the event publisher for every single event.

**Link to tracking Issue:**

[Item 4 described on the
investigation](open-telemetry#21491 (comment))
for issue open-telemetry#21491.

**Testing:**

* Go tests for `pkg/stanza` and `receiver/windowseventlogreceiver` on
Windows box.
* Ran the contrib build locally to validate the change.
* Can't run the full make locally: misspell is failing on Windows
because the command line is too long.

**Documentation:**

Let me know if changing the severity of the log message requires a
changelog update.
djaglowski added a commit that referenced this issue Oct 27, 2023
…in EventXML (Windows) (#28587)

**Description:**
The [XML schema for Windows events supports `Data` elements without the
`Name`
attribute](https://learn.microsoft.com/en-us/windows/win32/wes/eventschema-datafieldtype-complextype),
however, the current implementation doesn't capture `Data` elements
without the `Name` attribute.

Capturing such elements is specially important for events for which the
publisher metadata is invalid. These elements contain the data that will
give a user a much better chance of actually understanding the event,
see
[here](#21491 (comment))
for an example.

I'm adding also the optional `Binary` element. Although this element
typically requires knowledge of the actual data type it is representing
sometimes it can be useful together with the data elements.

I consider this to be a breaking change because it modifies the layout
of the event generated by the package. It isn't an addition, the old
representation is changed, please refer to the changes in tests to see
the difference.

**Link to tracking Issue:**
This is the last pending item to fix #24493, #21491 ([item
5](#21491)).

**Testing:**
- Local run of the affected receiver and package
- "Run Windows" on my fork

**Documentation:**
N/A

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
@crobert-1
Copy link
Member

Fixed by #28587

jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…-telemetry#27658)

**Description:**

Cache the publisher event to:

1. Avoid logging the same error message every time one event from the
given source is logged.
2. Avoid opening and closing the event publisher for every single event.

**Link to tracking Issue:**

[Item 4 described on the
investigation](open-telemetry#21491 (comment))
for issue open-telemetry#21491.

**Testing:**

* Go tests for `pkg/stanza` and `receiver/windowseventlogreceiver` on
Windows box.
* Ran the contrib build locally to validate the change.
* Can't run the full make locally: misspell is failing on Windows
because the command line is too long.

**Documentation:**

Let me know if changing the severity of the log message requires a
changelog update.
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…in EventXML (Windows) (open-telemetry#28587)

**Description:**
The [XML schema for Windows events supports `Data` elements without the
`Name`
attribute](https://learn.microsoft.com/en-us/windows/win32/wes/eventschema-datafieldtype-complextype),
however, the current implementation doesn't capture `Data` elements
without the `Name` attribute.

Capturing such elements is specially important for events for which the
publisher metadata is invalid. These elements contain the data that will
give a user a much better chance of actually understanding the event,
see
[here](open-telemetry#21491 (comment))
for an example.

I'm adding also the optional `Binary` element. Although this element
typically requires knowledge of the actual data type it is representing
sometimes it can be useful together with the data elements.

I consider this to be a breaking change because it modifies the layout
of the event generated by the package. It isn't an addition, the old
representation is changed, please refer to the changes in tests to see
the difference.

**Link to tracking Issue:**
This is the last pending item to fix open-telemetry#24493, open-telemetry#21491 ([item
5](open-telemetry#21491)).

**Testing:**
- Local run of the affected receiver and package
- "Run Windows" on my fork

**Documentation:**
N/A

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this issue Nov 24, 2023
…in EventXML (Windows) (open-telemetry#28587)

**Description:**
The [XML schema for Windows events supports `Data` elements without the
`Name`
attribute](https://learn.microsoft.com/en-us/windows/win32/wes/eventschema-datafieldtype-complextype),
however, the current implementation doesn't capture `Data` elements
without the `Name` attribute.

Capturing such elements is specially important for events for which the
publisher metadata is invalid. These elements contain the data that will
give a user a much better chance of actually understanding the event,
see
[here](open-telemetry#21491 (comment))
for an example.

I'm adding also the optional `Binary` element. Although this element
typically requires knowledge of the actual data type it is representing
sometimes it can be useful together with the data elements.

I consider this to be a breaking change because it modifies the layout
of the event generated by the package. It isn't an addition, the old
representation is changed, please refer to the changes in tests to see
the difference.

**Link to tracking Issue:**
This is the last pending item to fix open-telemetry#24493, open-telemetry#21491 ([item
5](open-telemetry#21491)).

**Testing:**
- Local run of the affected receiver and package
- "Run Windows" on my fork

**Documentation:**
N/A

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants