BUG - Duplicate trace ids & span by different hosts in distributed system #95

Closed
fularac opened this issue Apr 9, 2024 · 3 comments

fularac commented Apr 9, 2024

We've rolled this out in production! Since then, we've noticed duplicate trace ID and trace parent ID combinations. Our logs show that this happens with incoming requests that have no traceparent header, and that the duplicates come from separate hosts. Our current configuration launches 20 hosts with 36 workers each, all at around the same time. This affects 0.003% of requests over a 4-hour period, but it is still concerning.

I suspect the seed passed to math.randomseed is not random enough to prevent the same seed from being used on different hosts.

math.randomseed(ngx.time() + ngx.worker.pid())

Problems I see with using ngx.time() + ngx.worker.pid():

  • ngx.time() returns whole seconds: https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#ngxtime

    Returns the elapsed seconds from the epoch for the current time stamp from the Nginx cached time (no syscall involved unlike Lua's date library).

  • The PIDs across our hosts will have fairly similar values, as the hosts all launch from the same state.
  • Launching multiple hosts with N workers each creates a window of roughly N seconds in which the random generator can be seeded identically on different hosts. In my case, with 36 workers per host, that is a 36-second window.
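
The collision described in the list above can be reproduced in plain Lua. This is a minimal sketch, not the library's code: seed_for and draw are hypothetical helpers, and the timestamps and PIDs are made-up values chosen so that a one-second difference in start time cancels a difference of one in the PID.

```lua
-- Hypothetical helper: mirrors the seed expression from the issue,
-- with time and PID passed in so it runs outside OpenResty.
local function seed_for(time_sec, pid)
    return time_sec + pid
end

-- Seed the generator, then draw n values and join them for comparison.
local function draw(seed, n)
    math.randomseed(seed)
    local out = {}
    for i = 1, n do
        out[i] = math.random(0, 65535)
    end
    return table.concat(out, ",")
end

-- Host A: worker with PID 1201 starts at second 1712650000 (made-up values).
-- Host B: worker with PID 1200 starts one second later.
local seed_a = seed_for(1712650000, 1201)
local seed_b = seed_for(1712650001, 1200)

local seq_a = draw(seed_a, 4)
local seq_b = draw(seed_b, 4)

print(seed_a == seed_b) -- true: the two sums are identical
print(seq_a == seq_b)   -- true: identical seeds give identical sequences
```

Any IDs derived from these two generators would match for as long as both workers draw the same number of values.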

Some improvements/fixes that would help with this issue:

  • Use a higher-resolution timestamp, e.g. ngx.now() (or something with nanosecond resolution?)
  • Use a host-level random generator as the source of the seed
  • Expose a way to set the math.randomseed seed for the utility module
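
As one way to picture the second suggestion above, here is a rough sketch that seeds from the operating system's random source. It is an assumption-laden illustration rather than the fix that was eventually committed: urandom_seed and seed_rng are hypothetical names, the code assumes a host that provides /dev/urandom, and it falls back to a caller-supplied seed when that file cannot be read.

```lua
-- Hypothetical helper: read 4 bytes from the OS random source and
-- fold them into a non-negative integer below 2^31.
local function urandom_seed()
    local f = io.open("/dev/urandom", "rb")
    if not f then
        return nil
    end
    local bytes = f:read(4)
    f:close()
    if not bytes or #bytes < 4 then
        return nil
    end
    local b1, b2, b3, b4 = bytes:byte(1, 4)
    -- Mask the top byte to 7 bits so the value fits a signed 32-bit integer.
    return (b1 % 128) * 16777216 + b2 * 65536 + b3 * 256 + b4
end

-- Hypothetical helper: seed from the OS source when possible, otherwise
-- use fallback_seed (for example, the time-plus-PID value from the issue).
local function seed_rng(fallback_seed)
    local seed = urandom_seed() or fallback_seed
    math.randomseed(seed)
    return seed
end

local seed = seed_rng(os.time())
print(seed)
```

In OpenResty this would presumably run once per worker, for instance from init_worker_by_lua_block. A 31-bit seed makes collisions across a few hundred workers far less likely than a time-plus-PID sum, but it does not rule them out.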
@yangxikun
Owner

Thanks! I will fix this bug using random_seed() from skywalking-nginx-lua.

yangxikun added a commit that referenced this issue Apr 14, 2024
@fularac
Author

fularac commented Apr 22, 2024

@yangxikun can we cut a version for this change?

@yangxikun
Owner
