Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "ENSURE: empty page must be defined as empty type" #2504

Conversation

oleksii-datsiuk
Copy link
Contributor

@oleksii-datsiuk oleksii-datsiuk commented Jun 10, 2024

This PR fixes "ENSURE: empty page must be defined as empty type" error which is described in this issue: #2503

@oleksii-datsiuk
Copy link
Contributor Author

In addition to fixing the problem described in the issue, this PR has 2 more improvements:

  1. In Snapshot.NewPage() there is comment:
    // there is need for _header.Savepoint() because changes here will incremental and will be persist later
    // if any problem occurs here, rollback will catch this changes"
    However there is no savepoint in this function (and according to the history never was). So I added savepoint here. But not exactly in the place where comment is written, but in the "else" block inside this function, where savepoint seems to be important.

  2. I made WalIndexService.CurrentReadVersion synchronized. Its backing field (_currentReadVersion) is accessed (read and modified) in the same class in Clear() and ConfirmTransaction() functions. These functions are atomic by using _indexLock. Especially ConfirmTransaction is interesting, because it first increases _currentReadVersion and then adds some data based on this increased value to the WAL index. So it seems unsafe to be able to access WalIndexService.CurrentReadVersion property while ConfirmTransaction() is still being executed.

@@ -38,7 +38,22 @@ public WalIndexService(DiskService disk, LockService locker)
/// <summary>
/// Get current read version for all new transactions
/// </summary>
public int CurrentReadVersion => _currentReadVersion;
public int CurrentReadVersion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the lock in this property? It's read-only so it shouldn't matter how many threads access it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property itself is read-only, but its backing field is updatable. As I wrote in the comment above:

I made WalIndexService.CurrentReadVersion synchronized. Its backing field (_currentReadVersion) is accessed (read and modified) in the same class in Clear() and ConfirmTransaction() functions. These functions are atomic by using _indexLock. Especially ConfirmTransaction is interesting, because it first increases _currentReadVersion and then adds some data based on this increased value to the WAL index. So it seems unsafe to be able to access WalIndexService.CurrentReadVersion property while ConfirmTransaction() is still being executed.

@mbdavid mbdavid merged commit ca3f905 into mbdavid:master Jul 3, 2024
1 check passed
@mbdavid
Copy link
Owner

mbdavid commented Jul 3, 2024

Hi @oleksii-datsiuk,

Today I managed to do the tests I needed to better understand the problem. I could see the competition problem occurring with your example. I want to thank you again for your attention and effort, as I know it is a very complex job. The whole community thanks you!

@oleksii-datsiuk
Copy link
Contributor Author

Hi @mbdavid

Thank you very much for your appreciation! In fact, we worked on this issue together with Volodymyr Dombrovsky (https://github.com/dombrovsky), so we'll share thanks together ))

Honestly, we had almost no choice other than to investigate this problem and try to fix it. Our product strongly depends on LiteDB, and the more clients are using it, the more we see problems with DB corruption.

After creating this PR I also have found that the fix provided here solves not only scenarios described in original issue (#2503), but also some other related scenarios that finally may cause DB corruption. So for us it was highly important to have this fix. Right now we are using our own fork from version 5.0.17 where we apply our fixes. We had some additional problems with 5.0.19, so we are not risking to switch to the latest version right now.

Btw, I'm not sure that you noticed one more issue #2480 and PR #2481, as it was merged not by you. There we have discovered another scenario in which we were getting DB corruption quite often. My PR prevents DB corruption by preventing usage of disposed snapshots, but maybe there could be some better solution that will also allow problematic use-case to work properly (see issue description).

We also would like to thank you for the great product. We hope that all critical issues will soon be stabilized, and we all will enjoy it to the fullest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants