Optional pid as tag #1843
Instead of making
You can make the index deterministic by sorting somehow (such as by pid). This should address your use case and solve the cardinality problem.
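A minimal Go sketch of the suggested deterministic index, assuming the plugin has the PIDs of all processes matching one selector available at each collection interval. `assignIndexTags` is a hypothetical helper, not part of Telegraf. It only shows that sorting by PID yields a stable 0..n-1 index for a given set of processes.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// assignIndexTags (hypothetical) sorts the PIDs of all processes matching
// the same selector and maps each PID to a small index tag value.
// The index range is bounded by how many matching processes run at once,
// not by how many distinct PIDs have ever existed.
func assignIndexTags(pids []int32) map[int32]string {
	sorted := make([]int32, len(pids))
	copy(sorted, pids) // copy so the caller's slice is not reordered
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	tags := make(map[int32]string, len(sorted))
	for i, pid := range sorted {
		tags[pid] = strconv.Itoa(i) // tag values are strings in InfluxDB
	}
	return tags
}

func main() {
	// Three same-named processes, discovered in arbitrary order.
	tags := assignIndexTags([]int32{4120, 987, 2311})
	fmt.Println(tags[987], tags[2311], tags[4120]) // prints "0 1 2"
}
```

One trade-off of sorting by PID: when a process restarts and receives a new PID, the other processes' indexes may shift, so an index value identifies a slot rather than a particular process.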
@phemmer I think that's a great idea! I'll modify the PR soon to implement your idea. Thanks!
I think I would prefer to just add the PID rather than track an index. It seems to me that, in order to keep the data unique, the index cardinality would need to be just as large as the PID cardinality.
This PR will fix #1668.
@sparrc The cardinality will be much smaller by keeping an index. For example, if you're tracking a single process, the index will always be
Where did 65500 series come from? Are you saying that the PID is changing very frequently?
Every time the process restarts, it gets a new PID. Over the lifetime of your InfluxDB instance, this can accumulate into a very large number of series.
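To illustrate the cardinality argument, here is a small Go simulation under simplifying assumptions: a single monitored process, a new PID on every restart (assumed sequential here, whereas real PIDs are allocated by the OS), and a series key reduced to the measurement plus one tag. `seriesKey` and `countSeries` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
)

// seriesKey is a simplified stand-in for an InfluxDB series key:
// measurement plus tag set. Fields do not contribute to it.
func seriesKey(measurement, tagKey, tagValue string) string {
	return measurement + "," + tagKey + "=" + tagValue
}

// countSeries simulates one process that restarts `restarts` times
// (restarts+1 runs in total), getting a new hypothetical PID each run.
// It returns the number of distinct series with a pid tag versus an index tag.
func countSeries(restarts, firstPID int) (pidSeries, indexSeries int) {
	byPID := make(map[string]struct{})
	byIndex := make(map[string]struct{})
	for run := 0; run <= restarts; run++ {
		pid := firstPID + run // every restart yields a new PID
		byPID[seriesKey("procstat", "pid", strconv.Itoa(pid))] = struct{}{}
		// With only one matching process, its sorted index is always 0.
		byIndex[seriesKey("procstat", "index", "0")] = struct{}{}
	}
	return len(byPID), len(byIndex)
}

func main() {
	p, idx := countSeries(1000, 3000)
	fmt.Println(p, idx) // prints "1001 1"
}
```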
Right... I see what you mean, but it does seem like a band-aid that won't really matter once influxdata/influxdb#7151 is finished.
I've read #1460 and understand that storing pid as a tag leads to performance problems. However, there are scenarios (such as ours) where having pid as a tag is desirable.
We have a number of long-running processes, all with the same name, which we monitor with procstat. All of our data goes into InfluxDB under a 1-day retention policy and is downsampled into a 1-week retention policy. Therefore, at any point in time, the number of unique pids we store is relatively low.
Not having pid as a tag makes downsampling very difficult, if not impossible. InfluxDB doesn't allow grouping by fields, so computing something like max(memory_rss) per minute for each process, which requires grouping by pid, isn't possible.
This also has ripple effects, making the data difficult to work with in Grafana.
Therefore, I've made it possible, via the configuration, to store pid as a tag instead of a field.
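A rough sketch of what the opt-in behavior could look like, written in Go since Telegraf is a Go project. `procstatConfig`, `PidTag`, and `buildPoint` are hypothetical simplifications, not the plugin's actual code or option names. The point is only that one boolean decides whether the PID lands in the tag set (which affects series cardinality) or in the field set (which does not).

```go
package main

import "fmt"

// procstatConfig is a hypothetical, simplified configuration struct;
// PidTag models the opt-in switch described in this PR.
type procstatConfig struct {
	PidTag bool
}

// buildPoint returns the tag set and field set for one process sample.
// By default the PID is a field; with PidTag enabled it becomes a tag,
// so downsampling queries can GROUP BY it.
func buildPoint(cfg procstatConfig, name string, pid int32, memoryRSS uint64) (map[string]string, map[string]interface{}) {
	tags := map[string]string{"process_name": name}
	fields := map[string]interface{}{"memory_rss": memoryRSS}
	if cfg.PidTag {
		tags["pid"] = fmt.Sprint(pid) // tag values must be strings
	} else {
		fields["pid"] = pid
	}
	return tags, fields
}

func main() {
	tags, fields := buildPoint(procstatConfig{PidTag: true}, "worker", 4120, 52428800)
	fmt.Println(tags["pid"], len(fields)) // prints "4120 1"
}
```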