[Bug]: reading a chunk ends up in DataFileRead hanging, other chunks work fine #7054
Comments
We also ran into this DataFileRead hang issue a couple of days ago and had to perform a restore. @fordfrog this might be unrelated, but did you upgrade to version 2.15.2 recently without running this script as per the release notes? Just wondering if this is somehow related.
in fact 2.15.2 is the first version that i used (so i didn't use the script). i'm in the process of migrating a table with close to 5 billion ticks to a hypertable, which has 71 chunks as of now, and so far this is the only issue i have encountered.
also, not sure if that is important, but i have the database server replicated to another instance of postgresql and there the cluster was fine. so i used the cluster from the replicated database to recover the records (dump and then insert back to the production database after removing the broken chunk).
@fordfrog as can be seen in the logs, someone ran a
Why was the VF command initiated? It's generally not a recommended practice to use VF.
that was after the chunk was already broken. it was an attempt to see whether a complete rewrite of the chunk would help... but it froze, like any other operation on the chunk.
Are you self-hosting? When the process freezes in I/O and cannot be killed, I would think that my disk is dying. This is probably too low-level to be related to the TimescaleDB extension. Anything interesting in
i am self-hosting. this issue occurred only on that chunk. no issues with the disks otherwise. no reports of issues with disks or filesystem so far.
@fordfrog it looks like the Linux OOM killer is killing your expensive processes. Did you check your kernel logs?
i was killing some hanging processes myself, as things that should take a few minutes were taking hours... after recreating that chunk with the same data there were no issues, so i suppose it was not the data but rather some corruption of the chunk, or maybe a filesystem (btrfs) issue with a blocked read.
What type of bug is this?
Data corruption, Performance issue
What subsystems and features are affected?
Command processing, Partitioning
What happened?
i can't read from a single chunk in my database. other chunks work fine. anything (select, vacuum, compress, ...) that i try to do on the chunk ends up hanging in DataFileRead. the process then can't even be killed and the whole database has to be restarted, and even the restart does not go cleanly because of the hanging process.
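For anyone trying to confirm the same symptom, here is a minimal sketch of how backends stuck in this wait could be listed. It relies only on standard `pg_stat_activity` columns (`wait_event_type = 'IO'`, `wait_event = 'DataFileRead'`). The helper name `long_waiters`, the 10-minute threshold, and the assumption that rows arrive as dicts keyed by column name are illustrative choices, not something taken from this report.

```python
from datetime import datetime, timedelta, timezone

# Query against the standard pg_stat_activity view; run it with any client.
STUCK_IO_QUERY = """
SELECT pid, state, wait_event_type, wait_event, query_start, query
FROM pg_stat_activity
WHERE wait_event_type = 'IO' AND wait_event = 'DataFileRead'
ORDER BY query_start
"""


def long_waiters(rows, now, min_age=timedelta(minutes=10)):
    """Return (pid, age) for backends waiting in DataFileRead for at least min_age.

    `rows` is assumed to be an iterable of dicts keyed by column name
    (hypothetical shape; adapt to whatever your database driver returns).
    """
    result = []
    for row in rows:
        # Skip anything that is not blocked on a data file read.
        if row["wait_event"] != "DataFileRead":
            continue
        age = now - row["query_start"]
        if age >= min_age:
            result.append((row["pid"], age))
    return result
```

A backend that shows up here for hours while the same query finishes quickly on other chunks matches what is described above. Polling the view periodically and comparing the reported ages would show whether the wait ever makes progress.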
TimescaleDB version affected
2.15.2
PostgreSQL version used
16.3
What operating system did you use?
gentoo linux
What installation method did you use?
Source
What platform did you run on?
Not applicable
Relevant log output and stack trace
How can we reproduce the bug?
as other chunks work fine, i suspect some corruption and i have no idea how to reproduce it. the log shows what was going on with the chunk during the last week. i recall that today i noticed a vacuum that had been hanging on the chunk for two days or so, so i killed it. but the chunk was probably already broken at that time.
EDIT: i just found out that dropping the broken chunk and re-inserting the data that belong to the chunk should recreate that chunk.
EDIT2: i managed to get rid of the chunk, though the database got stuck again, and i restored the data from backup. still checking the data...