Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable full DNS names #260

Closed
wants to merge 3 commits into from
Closed

Conversation

kubakukis14
Copy link

Description: The limit for DNS hostnames being sent to the reducer increased from 80 characters to the full 256, as according to RFC 1035.

Link to tracking Issue: #256

Testing: automatic tests

Documentation: none

Copy link

linux-foundation-easycla bot commented Apr 6, 2024

CLA Not Signed

@kubakukis14 kubakukis14 marked this pull request as draft April 6, 2024 23:22
@yonch
Copy link
Contributor

yonch commented Apr 9, 2024

Hi @kubakukis14 , awesome you're tackling this! I see the PR is still in draft but thought I'd see if I can help out.

You probably noticed this, but just to document here, that the bpf code copies a fixed amount of bytes from the DNS packets into perf rings, and then the userspace collector parses those bytes. So this change does not need to modify eBPF code.

I don't remember whether this change would require percolating through to and within the reducer. If running this patch locally doesn't seem to output the full length from the reducer, that might be where to look.

@yonch
Copy link
Contributor

yonch commented Apr 9, 2024

I also triggered a build run, if helpful to your efforts

@kubakukis14
Copy link
Author

Hey @yonch, I ran into some more problems trying to test the patch out, I was hoping you could help. Assume I use the build from the unchanged public repo for the following (other than the updated java and gradle versions for building).

First is running kernel collector tests. I have build the vagrant box for ubuntu jammy64 and the necessary test targets, and ran the header tests. Those had no problem running. However running the the kernel collector test (./run-test.sh --kernel-collector-test ubuntu jammy64 according to https://github.com/open-telemetry/opentelemetry-network/blob/main/test/kernel/README.md#kernel_collector_test), I experienced timeouts. I am attaching the log for reference.
kernel-log.txt

The other problem relates to manual testing. I was hoping to run the ebpf collector pipeline in the ubuntu vagrant devbox, and get some deployments with long names to generate tcp traffic, yet I can't seem to run the ~/k8s/deploy.sh script to set up the pipeline (according to https://github.com/open-telemetry/opentelemetry-network/blob/main/dev/devbox/README.md#deploying-the-opentelemetry-ebpf-pipeline-and-optional-workloads-in-a-kubernetes-cluster-in-a-devbox-vm). I am guessing it is because the helm chart has been updated, but it was not reflected in the setup script. Are there any steps I am missing, or is there any other chart I can use?

+ microk8s helm install ebpf-net -f ebpf-net.yaml -f ebpf-net-logging-exporter.yaml --set=splunkObservability.realm=REALM --set=splunkObservability.accessToken=TOKEN splunk-otel-collector-chart/splunk-otel-collector
Error: INSTALLATION FAILED: values don't meet the specifications of the schema(s) in the following chart(s):
splunk-otel-collector:
- (root): Additional property networkExplorer is not allowed

I am not sure if I have the right approach for this, could you please give me some pointers? What is the usual scenario for developing and testing new patches for the kernel collector? Sorry for dumping this here and thank you in advance, setting it all up has been a little frustrating.

@kubakukis14
Copy link
Author

@yonch After enabling extraction of full dns names in previous commits for the kernel collector, they had now been cut off from the right. After walking through the reducer pipeline I've managed to find the spot where the truncation is happening. The place is

if (std::tie(role_a, az_a) <= std::tie(role_b, az_b)) {
, during the creation of agg_root. Since agg_root::role2_t (short_string<80>) is defined in generated code, I'm not quite sure how to change it, and what changing it would imply, so I'm going to need some help with this. How do you think I should proceed?

Here's some debug prints I put around the suspected code:

role_a: nc, role_b: tcp-server-service-abcdefghijklmnopqrstuvwxyzabcdef0123456789.ebpf-net-ns.svc.cluster.local, az_a: (null), az_b: (null)
role1: nc, role2: tcp-server-service-abcdefghijklmnopqrstuvwxyzabcdef0123456789.ebpf-net-ns.svc.cl, az1: , az2:

I believe enlarging the role field should solve the issue.

On another note, this account will likely not get CLA authorised, so once we figure out this issue I'd like to move to a different account and PR and archive this convo.

@yonch
Copy link
Contributor

yonch commented May 3, 2024

Thank you for persevering, sorry for missing your comment.

The specification for generated code is in the render directory. This might be the line:

The render language generated both messages, and the code and data structures to handle them (serialization/deserialization). Changing render on the aggregation core would change only components internal to the aggregation core so is safe, go ahead see if it increases the length throughout (iirc it should).

Hope that’s what you were looking for.. please lmk if I didn’t get it right.

Signed-off-by: Jakub Ráček <kubik54321@gmail.com>
@kubakukis14
Copy link
Author

Thanks @yonch, that was exactly what I was looking for. I managed to get it working and tested and moved it under my CLA authorized account, here: #266

@yonch
Copy link
Contributor

yonch commented May 21, 2024

superseded by #266, closing

@yonch yonch closed this May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants