
README.md: introduce known issue section #675

Merged (1 commit) on Jan 12, 2024

@fuweid fuweid commented Jan 12, 2024

Users might run into data corruption issues caused by the underlying filesystem. Fixing filesystem issues is out of scope for the bbolt maintainers, but a section that tracks known issues can help users and contributors analyse the root cause when they run into data corruption.

Closes: #562

Thanks @fyfyrchik for providing the details!

README.md Outdated
Comment on lines 937 to 944
## Known Issue

- The Linux kernel introduced the [ext4: fast commits](https://lwn.net/Articles/842385/)
feature in v5.10. It's a new, lighter-weight journaling method to reduce
unrelated IO for fsync/fdatasync. However, `fast commits` is a new feature and
users have run into data loss after power failure. The data loss can corrupt
boltdb. If you enable the fast commits feature on an ext4 filesystem, please ensure
that the kernel includes the related fast commit fixes. Details in [issue 562](https://github.com/etcd-io/bbolt/issues/562).
Member

  • Please add an item to the table of contents.
  • Please also link to the fast commit issue, and clarify which Linux kernel versions contain the fix, or provide a related link for users' reference.

Member Author

Updated. Please take a look. Thanks

README.md Outdated
@@ -69,6 +69,7 @@ New minor versions may add additional features to the API.
- [LMDB](#lmdb)
- [Caveats & Limitations](#caveats--limitations)
- [Reading the Source](#reading-the-source)
- [Known Issue](#known-issue)
Member

Suggested change
- [Known Issue](#known-issue)
- [Known Issues](#known-issues)

README.md Outdated
Comment on lines 938 to 951
## Known Issue

- The Linux kernel introduced the [ext4: fast commits](https://lwn.net/Articles/842385/)
feature in v5.10. It's a new, lighter-weight journaling method to reduce
unrelated IO for fsync/fdatasync. However, `fast commits` is a new feature and
users have run into data loss after power failure. The data loss can corrupt
boltdb. There are some related patches.

* [ext4: fast commit may miss tracking unwritten range during ftruncate](https://lore.kernel.org/linux-ext4/20211223032337.5198-3-yinxin.x@bytedance.com/)
* [ext4: fast commit may not fallback for ineligible commit](https://lore.kernel.org/lkml/202201091544.W5HHEXAp-lkp@intel.com/T/#ma0768815e4b5f671e9e451d578256ef9a76fe30e)

These patches were merged into [kernel v5.17](https://lore.kernel.org/lkml/YdyxjTFaLWif6BCM@mit.edu/).
If you enable the fast commits feature on an ext4 filesystem, please ensure that the
kernel includes the related fast commit fixes. Details in [issue 562](https://github.com/etcd-io/bbolt/issues/562).
Member

Suggested change
## Known issues
- bbolt might run into a data corruption issue on Linux when the
[ext4: fast commit](https://lwn.net/Articles/842385/) feature, which was introduced in
Linux kernel version v5.10, is enabled. The fixes were included in
Linux kernel version v5.17; please refer to the links below:
* [ext4: fast commit may miss tracking unwritten range during ftruncate](https://lore.kernel.org/linux-ext4/20211223032337.5198-3-yinxin.x@bytedance.com/)
* [ext4: fast commit may not fallback for ineligible commit](https://lore.kernel.org/lkml/202201091544.W5HHEXAp-lkp@intel.com/T/#ma0768815e4b5f671e9e451d578256ef9a76fe30e)
* [ext4 updates for 5.17](https://lore.kernel.org/lkml/YdyxjTFaLWif6BCM@mit.edu/)
Please also refer to the discussion in https://github.com/etcd-io/bbolt/issues/562.

Users might run into data corruption issues caused by the underlying filesystem.
Fixing filesystem issues is out of scope for the bbolt maintainers, but
a section that tracks known issues can help users and contributors analyse
the root cause when they run into data corruption.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@ahrtr ahrtr merged commit 273dc4e into etcd-io:main Jan 12, 2024
15 checks passed
@fuweid fuweid deleted the update-unknown-issues branch January 12, 2024 15:51
Member Author

fuweid commented Jan 12, 2024

Thanks @ahrtr

Member

ahrtr commented Feb 21, 2024

Based on the Linux kernel version history, both 5.10 and 5.15 are LTS releases, but 5.17 isn't. So I think we should clearly document the exact 5.10.x and 5.15.x patches in which the fix is included.
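
As a rough illustration of the version concern above, the following Go sketch checks whether a kernel release string (as printed by `uname -r`) falls in the mainline range between the release that introduced fast commit (v5.10) and the release that received the fixes (v5.17). The function names are hypothetical and not part of bbolt. The check deliberately ignores stable/LTS backports, so a 5.10.x or 5.15.x kernel reported as in range may already carry the fixes, and it does not detect whether fast commit is actually enabled on the filesystem.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor numbers from a kernel release
// string such as "5.15.0-91-generic". (Hypothetical helper for illustration.)
func parseMajorMinor(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor field may carry a suffix when no patch level is present,
	// e.g. "5.17-rc1", so only the leading digits are parsed.
	field := parts[1]
	end := 0
	for end < len(field) && field[end] >= '0' && field[end] <= '9' {
		end++
	}
	minor, err = strconv.Atoi(field[:end])
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// inAffectedMainlineRange reports whether the release is >= 5.10 and < 5.17.
// Assumption: only mainline version numbers are compared; backported fixes in
// stable or distribution kernels are not taken into account.
func inAffectedMainlineRange(release string) (bool, error) {
	major, minor, err := parseMajorMinor(release)
	if err != nil {
		return false, err
	}
	return major == 5 && minor >= 10 && minor < 17, nil
}

func main() {
	for _, r := range []string{"5.9.16", "5.15.0-91-generic", "5.17.0", "6.1.0"} {
		affected, err := inAffectedMainlineRange(r)
		if err != nil {
			fmt.Println(r, "error:", err)
			continue
		}
		fmt.Printf("%s in 5.10..5.16 mainline range: %v\n", r, affected)
	}
}
```

A result of `true` is only a prompt to look closer: the exact 5.10.x and 5.15.x releases that contain the fixes would still need to be confirmed against the kernel changelogs.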

Successfully merging this pull request may close these issues.

DB cannot be opened after node hard reset