Delete empty shard folders when DB is deleted #54

Merged 2 commits on Aug 26, 2022

danydrouin
Contributor

Deleting a DB currently removes only the index data files in the shard. It doesn't delete the folders themselves, which may lead to an inode leak on certain file systems (such as Spectrum Scale).

After recursing through the files, the folder should also be deleted.

@rnewson
Contributor

rnewson commented Aug 24, 2022

I think it's safe, but we've never tried cleaning up directories as we go. I worry about what happens if something is removing an index and its newly empty parent directories at the same time as a request to create an index that overlaps with one of those directories. Can that happen? I think yes. Cleanup is one 'thread', so we can discount collisions between multiple cleanups (which likely wouldn't be a problem anyway), but index creation is done by the managerservice, so delete and create can happen concurrently. If it does, does the index creation spuriously fail because cleanup deleted a directory we just made (but was empty at the time)? I think so, and we must guard against that if so.
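
The failure mode described above can be replayed deterministically by running one problematic schedule step by step in a single thread. This is only a minimal sketch, not clouseau code: the object name, the directory names, and the `segments_1` file name are made up for illustration.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}

object CleanupRaceSketch {
  // Replays one interleaving of "create index" and "cleanup" in sequence.
  // Returns true if the index file could be written, false if its parent
  // directory had already been removed.
  def createAfterCleanup(root: Path): Boolean = {
    val dbDir = root.resolve("shards").resolve("00000000-1fffffff").resolve("db.123")
    // Step 1 (index creation): make the directory for the new index.
    Files.createDirectories(dbDir)
    // Step 2 (cleanup, interleaved): the directory is still empty, so removal succeeds.
    Files.delete(dbDir)
    // Step 3 (index creation resumes): the parent directory no longer exists.
    try {
      Files.createFile(dbDir.resolve("segments_1"))
      true
    } catch {
      case _: IOException => false
    }
  }
}
```

In a real deployment the three steps would run on different actors, so the ordering is a matter of timing rather than something this sketch can prove or rule out.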

@theburge
Contributor

@rnewson Yes, the raciness is a bother here (as it is for the "CouchDB DB recovery" code that renames files and resides in IndexCleanupService -- i.e., a separate actor than that which would create directories, etc.).

One much more insidious possibility is that we unlink a directory that another bit of the code has open and is perhaps about to use with openat or similar, such that newly created files are immediately unreachable. If that did occur, it would be pretty serious because the LRU would then refer to something that didn't exist in filesystem terms.

@danydrouin
Contributor Author

Hi @rnewson / @theburge, my changes don't impact the database recovery scenario, since that uses the Rename event.

What I noticed is that creating a new DB also calls the same cleanup. It does nothing, since none of the folders exist yet (no index has been created). https://github.com/apache/couchdb/blob/main/src/dreyfus/src/dreyfus_index_manager.erl#L123-L125

If I create a DB and an index, then delete this DB and recreate it with an index, it results in a new shard folder. Same DB name was used but it's uniquely identified with a suffix in the shard folder.

// first create db and index
[actor:2] INFO clouseau.cleanup - Removing shards/40000000-5fffffff/test1.1661353443
// delete db
[actor:4] INFO clouseau.cleanup - Removing shards/40000000-5fffffff/test1.1661353443

// second create db and index (create calls cleanup)
[actor:3] INFO clouseau.cleanup - Removing shards/40000000-5fffffff/test1.1661353706

I believe my change is safe. But the ultimate test would be to get it tested.
If I can get a build of clouseau going, I could include the jar in a new CouchDB Docker image and test it in my environment.

I'm not set up locally to build clouseau, I'm afraid :(
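
To make the naming in the log lines above concrete, here is a small sketch that splits a shard path into its range, database name, and suffix. It assumes the three-segment shape `shards/<range>/<dbname>.<suffix>` seen in those log lines; `ShardPathSketch` and `ShardPath` are hypothetical names, and database names containing `/` are deliberately not handled.

```scala
object ShardPathSketch {
  final case class ShardPath(range: String, dbName: String, suffix: String)

  // Parses "shards/<range>/<dbname>.<suffix>"; returns None for any other shape.
  def parse(path: String): Option[ShardPath] = path.split('/') match {
    case Array("shards", range, last) =>
      // The suffix is whatever follows the final dot of the last segment.
      val dot = last.lastIndexOf('.')
      if (dot > 0 && dot < last.length - 1)
        Some(ShardPath(range, last.substring(0, dot), last.substring(dot + 1)))
      else None
    case _ => None
  }
}
```

Parsing the two paths from the log gives the same range and the same `dbName` (`test1`) but different suffixes, which is why the recreated database lands in a new folder.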

@theburge
Contributor

@danydrouin

> Same DB name was used but it's uniquely identified with a suffix in the shard folder.

That's generally true, but not necessarily in all cases. For our purposes (as those running a service using this code and with additional tools, etc.), we cannot necessarily rely on it for reasoning about the safety of the change.

In any case, the recursivelyDelete() function is called in two cases: database deletion and removal of unused indexes. It's the latter case that really opens up the possibility of races, because a user updating their indexes programmatically might trigger the cleanup and then create new indexes rapidly in sequence (from their perspective, although it may end up being resequenced in actual execution).

> But the ultimate test would be to get it tested.

If only. :) Proving it works in some number of trials doesn't prove freedom from races.

For a potential race condition we'd need to either successfully explore all possible interleavings (challenging!) or reason it through (with enough confidence that we can decide one way or another).
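
As an illustration of what "exploring all interleavings" means, the sketch below enumerates every schedule of a three-step cleanup and a three-step index creation against a toy in-memory filesystem. Everything here is an assumption made for the example: the step lists, the path names, and in particular the idea that directory creation happens in two separate steps (ensure parent, then make child). It is a model for reasoning, not a description of clouseau's actual code paths.

```scala
object InterleavingSketch {
  // In-memory stand-in for a filesystem: the set of paths that exist.
  type Fs = Set[String]
  // A step yields Some(next state) on success or None on a hard failure.
  type Step = Fs => Option[Fs]

  private def isEmptyDir(fs: Fs, dir: String): Boolean =
    fs.contains(dir) && !fs.exists(_.startsWith(dir + "/"))

  // rmdir-like semantics: removes the directory only if it is empty.
  private def rmdir(dir: String): Step =
    fs => Some(if (isEmptyDir(fs, dir)) fs - dir else fs)

  // Cleanup of an orphaned index "db/old", optionally also removing "db".
  def cleanup(removeDbDir: Boolean): List[Step] = {
    val base = List[Step](fs => Some(fs - "db/old/seg"), rmdir("db/old"))
    if (removeDbDir) base :+ rmdir("db") else base
  }

  // Creation of a new index "db/new" with a two-step directory creation.
  val create: List[Step] = List(
    fs => Some(fs + "db"),                                              // ensure parent
    fs => if (fs.contains("db")) Some(fs + "db/new") else None,         // make child
    fs => if (fs.contains("db/new")) Some(fs + "db/new/seg") else None  // write file
  )

  // All merges of two lists that keep each list's own order.
  def interleavings[A](xs: List[A], ys: List[A]): List[List[A]] = (xs, ys) match {
    case (Nil, _) => List(ys)
    case (_, Nil) => List(xs)
    case (x :: xt, y :: yt) =>
      interleavings(xt, ys).map(x :: _) ++ interleavings(xs, yt).map(y :: _)
  }

  // Returns (number of failing schedules, total number of schedules).
  def explore(removeDbDir: Boolean): (Int, Int) = {
    val start: Fs = Set("db", "db/old", "db/old/seg")
    val all = interleavings(cleanup(removeDbDir), create)
    val failed = all.count { schedule =>
      schedule.foldLeft(Option(start))((fs, step) => fs.flatMap(step)).isEmpty
    }
    (failed, all.size)
  }
}
```

Under this toy model, `explore(true)` reports 3 failing schedules out of 20, all of them cases where the parent directory is removed between "ensure parent" and "make child", while `explore(false)` reports 0 out of 10. A real system has far more steps and actors, which is what makes exhaustive exploration challenging.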

@danydrouin
Contributor Author

@theburge What if I change recursivelyDelete() to delete the folders only in the database deletion case (CleanupPathMsg), and keep it intact for the removal of unused indexes, which happens via CleanupDbMsg?

Would that be safer?
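
The proposal above could look roughly like the following sketch. The `deleteDir` parameter name matches the one that appears later in the review, but the surrounding object and the two `on...` entry points are hypothetical stand-ins for the handlers of `CleanupPathMsg` and `CleanupDbMsg`, not the actual clouseau implementation.

```scala
import java.io.File

object ScopedDeleteSketch {
  // Deletes every file under fileOrDir; removes directories only if deleteDir is true.
  def recursivelyDelete(fileOrDir: File, deleteDir: Boolean): Unit = {
    if (fileOrDir.isDirectory) {
      // listFiles() returns null on an I/O error, so fall back to an empty array.
      val children = Option(fileOrDir.listFiles()).getOrElse(Array.empty[File])
      children.foreach(recursivelyDelete(_, deleteDir))
      if (deleteDir) fileOrDir.delete()
    } else if (fileOrDir.isFile) {
      fileOrDir.delete()
    }
  }

  // Database deletion: remove the files and the now-empty folders.
  def onCleanupPath(dbDir: File): Unit = recursivelyDelete(dbDir, deleteDir = true)

  // Removal of unused indexes: keep the previous behaviour and leave folders in place.
  def onCleanupDb(indexDir: File): Unit = recursivelyDelete(indexDir, deleteDir = false)
}
```

Keeping the flag at the call sites means the racy case (index cleanup) behaves exactly as before, while only database deletion gains the folder removal.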

@theburge
Contributor

@danydrouin

> @theburge What if I change recursivelyDelete() to delete the folders only in the database deletion case (CleanupPathMsg), and keep it intact for the removal of unused indexes, which happens via CleanupDbMsg?
>
> Would that be safer?

Yeah, that would be safer. It shouldn't be possible to hit any race window there in normal usage. :)

@rnewson Any thoughts on/objection to that change?

@danydrouin danydrouin closed this Aug 25, 2022
@danydrouin danydrouin reopened this Aug 25, 2022
@rnewson
Contributor

rnewson commented Aug 25, 2022

That sounds smart to me.

@danydrouin
Contributor Author

Please review the latest commits. The changes are in.

if (fileOrDir.isDirectory) {
for (file <- fileOrDir.listFiles)
recursivelyDelete(file)
fileOrDir.delete
recursivelyDelete(file, deleteDir)
Contributor

I think this should actually always specify deleteDir = true, regardless of the recursivelyDelete caller-passed value. (At least in this case.)

The key issue arises from deleting the top-level directory (i.e., ${SEARCH_ROOT}/shards/${RANGE}/${DB_NAME_AND_SUFFIX}) if it temporarily becomes empty during a user changing a ddoc (e.g., when clouseau is cleaning up an orphaned index signature concurrently with trying to create a directory for a new index signature). However, we do typically want to clean up the empty ${DDOC_INDEX_SIG} directories because deletion only takes place if a signature has changed.

Passing deleteDir = false ensures that any user changing their ddocs burns through inodes by leaving the subdirectories associated with the orphaned signatures on disk.

However, this whole thing is subtle, and I'm happy to hear it argued differently. :)
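
One possible reading of the suggestion above is sketched here: the recursive call always removes nested directories, and only the removal of the top-level directory depends on the caller. The object and the `deleteTop` parameter name are illustrative assumptions, and this variant is not necessarily what was merged.

```scala
import java.io.File

object SubdirDeleteSketch {
  // Removes everything below `top`; `top` itself is removed only if deleteTop is true.
  def recursivelyDelete(top: File, deleteTop: Boolean): Unit = {
    if (top.isDirectory) {
      val children = Option(top.listFiles()).getOrElse(Array.empty[File])
      // Nested directories (for example per-signature folders) are always removed.
      children.foreach(recursivelyDelete(_, deleteTop = true))
      if (deleteTop) top.delete()
    } else if (top.isFile) {
      top.delete()
    }
  }
}
```

With `deleteTop = false` the database-level directory survives, so a concurrent index creation still finds its parent, while orphaned subdirectories no longer hold on to inodes.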

Contributor

Thinking about this also made me raise #55 because I think this may be a damned if we do/damned if we don't situation: regardless of the value of deleteDir when recursing, I believe it's possible for CleanupDbMsg to actually delete newly created Lucene files.

Contributor Author

I guess it depends on the use case.

For our application, we create and delete many DBs. This leaves a huge number of directories behind, hence the inode leak.

We hardly ever update our design docs and index definitions, so in our particular case CleanupDbMsg is less of an issue.

At this stage I'm more interested in having CleanupPathMsg actually clean up the leftovers. This is why I chose to set deleteDir = false for the CleanupDbMsg scenario, so as not to change its current behavior (even if it's problematic today). I do understand the need to fix CleanupDbMsg as well, which can probably be done via #55 at a later stage.
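
To gauge how many leftover directories a deployment has accumulated, a rough count of empty directories under the search root can help. The helper below is a hypothetical diagnostic sketch, not part of clouseau. Note that a directory that cannot be listed is counted as empty, because `listFiles()` returns null in that case.

```scala
import java.io.File

object EmptyDirSketch {
  // Counts directories at or below `dir` that contain no entries at all.
  def countEmptyDirs(dir: File): Int = {
    val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
    val self = if (dir.isDirectory && children.isEmpty) 1 else 0
    self + children.filter(_.isDirectory).map(countEmptyDirs).sum
  }
}
```

Each empty directory still occupies one inode, so on a file system with a fixed inode budget this count is a reasonable proxy for the leak described above.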

Contributor

> I guess it depends on the use case.

Not really, it's partly about correctness and partly about maintaining stable behaviour for large-scale deployments (as with the one I'm considering).

I understand that your current use case is different, but effectively the change introduces a regression for users who do update their index definitions.

Contributor Author

I'm failing to understand why you say this will cause a regression for "updates" to index definitions.

I didn't change the CleanupDbMsg scenario at all. It behaves as it did before: it will NOT delete the directories.

Only CleanupPathMsg is changed to delete the directories, and from my testing it is only called when a DB is deleted.

Contributor

You're right, sorry. (I lost track of the fact that the subdirectories weren't deleted before.)

I'm okay with this!

Contributor Author

@theburge Thanks, glad we are all clear now.

Can you approve and run the workflow? Thanks!

@theburge theburge merged commit 45c155a into cloudant-labs:master Aug 26, 2022
@danydrouin
Contributor Author

@theburge Can we get a new release build with these changes? Thanks!

@theburge
Contributor
