Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trie: stop indexing empty directory nodes #445

Merged
merged 2 commits into from
Jan 17, 2022

Conversation

bom-d-van
Copy link
Member

@bom-d-van bom-d-van commented Jan 11, 2022

There is a strange bug in trie that seems related to indexing empty directory nodes. The actual root cause
is still unknown to me and I fail to reproduce the issue.
In our production, we notice the issue associates
with a cron job for removing files (which would causes the generation of empty directories).

Considering that empty directories doesn't produce much value in go-carbon/graphite, excluding it from
index should cause no harm. And this change also appears to resolve the issue on our production.

Edit:

The bug has been identified and fixed in this commit. But I'm still keeping the changes of excluding empty directories from the trie index as overall, it should be a good change:

There are two bugs fixed in this commit:

  1. Empty directories not properly renewed during indexing (and could be
    trimmed at every two file list scans).
    During insert, trie.insert failed to bump up trie nodes properly if it's
    inserting directory. This means that if the directory being inserted is
    empty and already exists in the trie tree, when concurrent index is
    enabled, the directory nodes might be pruned and then re-inserted in the
    next file list scan. And then on and on. This issue it self is not a
    serious concern if bug #2 below doesn't exist.

  2. Query logics do not handle well for empty directories.
    The symptoms are that if the trie index tree contains empty directories,
    and if a query happens to matching it, it would causes the query state
    stack (matchers) jump to the wrong index, and lead to incorrect matches of
    the metrics.

Context:

On our Graphite production, for cleanup purpose, we have daily cron jobs
removing stale graphite metric files and empty directories in the whisper tree.
The cronjob that removes files is run at 5AM and empty directory removal at 7AM.
This means the above bugs have a time window of 1-2 hours being triggered. And
because empty directories are not handled properly due to bug #1, the issue is
triggered every 2 file list scans. Essentially, if the query matched an empty
directory node and there are other metrics listed after the empty directory path,
trie query can't return proper result. In our case, it's returning missing
results, in theory, it could also return incorrect results.

In theory, these fixes are not needed because we stop indexing empty directory
nodes in commit 67446d3. But it's nice to figure
out the root cause and resolve the issue properly!

Took me almost a week to figure it out! But I'm happy. Tears in rain.

There is a strange bug in trie that seems related to indexing empty directory nodes. The actual root cause
is still unknown to me and I fail to reproduce the issue. In our production, we notice the issue associates
with a cron job for removing files (which would causes the generation of empty directories).

Considering that empty directories doesnt produce much value in go-carbon/graphite, excluding it from
index should cause no harm. And this change also appears to resolve the issue on our production.
@bom-d-van bom-d-van force-pushed the trie/stop-indexing-empty-dirs branch from 6e517d1 to 67446d3 Compare January 11, 2022 08:33
There are two bugs fixed in this commit:
  1. Empty directories not properly renewed during indexing (and could be
     trimmed at every two file list scans).
     During insert, trie.insert failed to bump up trie nodes properly if it's
     inserting directory. This means that if the directory being inserted is
     empty and already exists in the trie tree, when concurrent index is
     enabled, the directory nodes might be pruned and then re-inserted in the
     next file list scan. And then on and on. This issue it self is not a
     serious concern if bug #2 below doesn't exist.

  2. Query logics do not handle well for empty directories.
     The symptoms are that if the trie index tree contains empty directories,
     and if a query happens to matching it, it would causes the query state
     stack (matchers) jump to the wrong index, and lead to incorrect matches of
     the metrics.

Context:

On our Graphite production, for cleanup purpose, we have daily cron jobs
removing stale graphite metric files and empty directories in the whisper tree.
The cronjob that removes files is run at 5AM and empty directory removal at 7AM.
This means the above bugs have a time window of 1-2 hours being triggered. And
because empty directories are not handled properly due to bug #1, the issue is
triggered every 2 file list scans. Essentially, if the query matched an empty
directory node and there are other metrics listed after the empty directory path,
trie query can't return proper result. In our case, it's returning missing
results, in theory, it could also return incorrect results.

In theory, these fixes are not needed because we stop indexing empty directory
nodes in commit 67446d3. But it's nice to figure
out the root cause and resolve the issue properly!

Took me almost a week to figure it out! But I'm happy. Tears in rain.
@bom-d-van bom-d-van merged commit 30538ce into master Jan 17, 2022
@bom-d-van bom-d-van deleted the trie/stop-indexing-empty-dirs branch January 18, 2022 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant