Running influxd.exe backup -portable leaves <x>.tmp folders in data directory on Windows #16289

Closed
KoosBusters opened this issue Dec 19, 2019 · 19 comments · Fixed by #22551 or #22562

@KoosBusters

Steps to reproduce:
List the minimal actions needed to reproduce the behavior.

  • Installation of Influxdb on Windows 2012 R2.
  • Run from elevated command prompt:
    influxd.exe backup -portable c:\backupdir

Expected behavior:
A backup of Influxdb in the c:\backupdir

Actual behavior:
A backup of Influxdb in the c:\backupdir, but also x.tmp folders in the "data" directory.
x increases for each backup you run.
The x.tmp folders contain a copy of the tsm files.

You cannot delete these folders while Influxdb is running (files are in use).
When you restart Influxdb the folders are automatically cleaned/removed.
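To see how much space the leftover folders take up before deciding on a restart, something like the following could be used. This is only a sketch: the helper names (`dirSize`, `tmpDirUsage`, `demoUsage`) and the `mydb/autogen/7` layout in the demo are made up for illustration, and it simply assumes that any directory whose name ends in `.tmp` is a leftover snapshot directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// dirSize returns the total size in bytes of the regular files under dir.
func dirSize(dir string) (int64, error) {
	var total int64
	err := filepath.WalkDir(dir, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

// tmpDirUsage maps every directory under root whose name ends in ".tmp"
// to the number of bytes it holds.
func tmpDirUsage(root string) (map[string]int64, error) {
	usage := make(map[string]int64)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() || !strings.HasSuffix(d.Name(), ".tmp") {
			return nil
		}
		size, err := dirSize(path)
		if err != nil {
			return err
		}
		usage[path] = size
		return filepath.SkipDir // contents already counted by dirSize
	})
	return usage, err
}

// demoUsage builds a throwaway tree (hypothetical names) with one live file
// and one leftover copy in 2.tmp, then reports the .tmp count and total size.
func demoUsage() (int, int64, error) {
	root, err := os.MkdirTemp("", "influx-data")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	shard := filepath.Join(root, "mydb", "autogen", "7")
	if err := os.MkdirAll(filepath.Join(shard, "2.tmp"), 0o755); err != nil {
		return 0, 0, err
	}
	files := []string{filepath.Join(shard, "a.tsm"), filepath.Join(shard, "2.tmp", "a.tsm")}
	for _, p := range files {
		if err := os.WriteFile(p, []byte("12345"), 0o644); err != nil {
			return 0, 0, err
		}
	}
	usage, err := tmpDirUsage(root)
	var total int64
	for _, n := range usage {
		total += n
	}
	return len(usage), total, err
}

func main() {
	count, total, err := demoUsage()
	fmt.Println(count, total, err)
}
```

In practice you would call `tmpDirUsage` with the path of the InfluxDB "data" directory instead of the demo tree.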


@russorat
Contributor

russorat commented Jan 3, 2020

@KoosBusters thanks for the issue. My guess is that these get cleaned up eventually and restarting influxdb just kicks that process off sooner. Are these .tmp folders causing issues for you and if so, how?

@KoosBusters
Author

KoosBusters commented Jan 3, 2020

@russorat thanks for looking at this issue.

We run daily backups of the full database data and move the backup files to an off-site backup location. However, the temp folders are never removed during normal operation of the database; they are removed only when we restart influxdb. As a result, after running for about 2-3 weeks without a restart, the server runs low on disk space, because the temp folders and files together take up several times more disk space than the database data itself. A restart of influxdb fixes this, but it means we have to restart our production environment at regular intervals.

Additional information that might help:
Our normal production load includes saving near-real-time data to the database every second, so if the cleanup waits for a moment without activity, it will never find one during our normal operations.

We do not have any retention set in influx itself, so no data is deleted automatically. If the temp cleanup is part of the retention process, that process may never run in our db.

ydkn added a commit to ydkn/influxdb that referenced this issue Jun 2, 2020
run cleanup after snapshot creation
if no parallel snapshot is in progress

fixes issue influxdata#16289
@gboor

gboor commented Oct 21, 2020

I can confirm that this issue is not Windows-specific, even though it's tagged os/windows. I just got burned by this on an Influx instance running in a Kubernetes cluster on Linux.

@stelb

stelb commented Oct 22, 2020

I have the same problem on Linux: .tmp directories are not cleaned up at all after startup; only a restart removes them.
At least 1.8.2 and 1.8.3 are affected.

@cbonte
Contributor

cbonte commented Jan 8, 2021

Being affected by the issue, I've made some tests and it seems it's fixed in the upcoming 1.8.4 version.
Commits related to the fix:

  • dfa6aa8 fix(tsm1): "snapshot in progress" error during backup
  • 07a9c0e fix(tsm1): "snapshot in progress" error during backup
  • 196f600 fix(tsm1): "snapshot in progress" error during backup
  • 0b1ee04 fix(tsm1): "snapshot in progress" error during backup

@oori
Contributor

oori commented Apr 24, 2021

Release 1.8.5 is supposed to resolve this, but I've just tested it on Windows Server and still see the same error and .tmp issue with portable backups.

@davidby-influx
Contributor

davidby-influx commented Apr 26, 2021

@oori - Do you have any logs from the backup you could attach here to help? Thanks! Also, the directory location of the .tmp files would be helpful.

@oori
Contributor

oori commented Apr 27, 2021

@davidby-influx thanks for the quick follow up!
This is a Windows Server 2019 (std), running InfluxDB 1.8.5. The backups are fine, just the .tmp filling up disk space.
Here's the backup log from last night:

ts=2021-04-25T20:30:02.380809Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\124\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\124\\2.tmp\\000000009-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:02.746288Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\126\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\126\\2.tmp\\000000009-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:03.091602Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\127\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\127\\2.tmp\\000000009-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:03.442681Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\128\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\128\\2.tmp\\000000009-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:03.801436Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\129\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\129\\2.tmp\\000000009-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:04.146619Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\130\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\130\\2.tmp\\000000010-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:04.536502Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\131\\1.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\131\\1.tmp\\000000001-000000001.tsm: Access is denied."
ts=2021-04-25T20:30:04.904410Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\7\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\7\\2.tmp\\000000074-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:05.303087Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\13\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\13\\2.tmp\\000000109-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:05.753939Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\21\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\21\\2.tmp\\000000109-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:06.177588Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\29\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\29\\2.tmp\\000000109-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:06.633096Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\37\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\37\\2.tmp\\000000109-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:07.058710Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\45\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\45\\2.tmp\\000000110-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:07.465970Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\53\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\53\\2.tmp\\000000110-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:07.987648Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\61\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\61\\2.tmp\\000000111-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:08.526707Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\69\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\69\\2.tmp\\000000112-000000003.tsm: Access is denied."
ts=2021-04-25T20:30:08.916714Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\77\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\77\\2.tmp\\000000110-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:09.300913Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\85\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\85\\2.tmp\\000000126-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:09.706904Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\93\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\93\\2.tmp\\000000114-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:10.107110Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\101\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\101\\2.tmp\\000000115-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:10.510890Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\109\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\109\\2.tmp\\000000119-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:10.978031Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\117\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\117\\2.tmp\\000000113-000000002.tsm: Access is denied."
ts=2021-04-25T20:30:11.488333Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\125\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\GEMS\\autogen\\125\\2.tmp\\000000032-000000003.tsm: Access is denied."
ts=2021-04-26T20:30:02.062000Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\124\\3.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\124\\3.tmp\\000000009-000000002.tsm: Access is denied."
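For anyone who wants to collect the affected directories from a log like this, here is a rough sketch of pulling the `path=` value out of each warning. The function name `leftoverDir` is made up, and the code assumes the doubled backslashes in the log are escaping and that the path contains no spaces; it is not based on any InfluxDB tooling.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleLine is the first warning copied from the log above.
const sampleLine = `ts=2021-04-25T20:30:02.380809Z lvl=warn msg="backup could not remove temporary snapshot directory" log_id=0TiNbayG000 engine=tsm1 path=c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\124\\2.tmp error="remove c:\\GEMS\\data\\influxdb\\data\\_internal\\monitor\\124\\2.tmp\\000000009-000000002.tsm: Access is denied."`

// leftoverDir returns the snapshot directory named in a "could not remove"
// warning, or false if the line is not such a warning.
func leftoverDir(line string) (string, bool) {
	const marker = `msg="backup could not remove temporary snapshot directory"`
	if !strings.Contains(line, marker) {
		return "", false
	}
	const key = " path="
	i := strings.Index(line, key)
	if i < 0 {
		return "", false
	}
	rest := line[i+len(key):]
	// Assumption: the value is unquoted and ends at the next space.
	if j := strings.IndexByte(rest, ' '); j >= 0 {
		rest = rest[:j]
	}
	// Assumption: "\\" in the log is an escaped single backslash.
	return strings.ReplaceAll(rest, `\\`, `\`), true
}

func main() {
	if dir, ok := leftoverDir(sampleLine); ok {
		fmt.Println(dir)
	}
}
```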

@davidby-influx
Contributor

@oori - So this seems to be either a Windows directory permissions problem or a file-left-open problem. We have other code where we must explicitly close files on Windows before operations such as rename, which Linux permits on open files.

The goodish news is that this probably isn't a Linux problem anymore (i.e. our fix was algorithmically correct in finding and trying to delete the files), only a Windows problem. We will triage this and drop it in the work queue appropriately.

@sonejostudios

sonejostudios commented Aug 26, 2021

I think I've got a similar issue on Linux with Influx 1.8.0, but in my case it is NOT a permission problem.
https://community.influxdata.com/t/data-size-questions-influx-much-bigger-than-raw-data/21355/6

Restarting influx did the job! I will upgrade and hope this will work without restarting.

@davidby-influx
Contributor

@sonejostudios - this has been fixed on Linux for 1.8.5 and later builds.

@sonejostudios

sonejostudios commented Aug 27, 2021

@davidby-influx Ok, thanks for the info!

I upgraded to 1.8.9 but got a starting script issue which I could solve with this thread: https://community.influxdata.com/t/influxdb-1-8-7-will-not-start/20965/25

@lesam
Contributor

lesam commented Sep 7, 2021

See also #16289

@lesam
Contributor

lesam commented Sep 13, 2021

See also #21470 which has a potential suggested fix

@davidby-influx
Contributor

@lesam - Is the delete failure of the snapshot hard links because the file is still open through the original locations by the Influx engine? In Windows, DeleteFile() fails if a file is open, which is a property of the file, not the directory entry through which it was opened. Only if a file was opened with the dwShareMode parameter including the FILE_SHARE_DELETE flag can a process delete it while another process has it open.

See here for a discussion of FILE_SHARE_DELETE support in Go (tldr; not likely anytime soon).
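The behaviour in question can be probed with a small standalone program. The sketch below is not InfluxDB code: it creates a file, hard-links it, keeps the original open, and then tries to remove the link. The file names are invented. On Linux the removal is expected to succeed; on Windows, if the reasoning above holds, it should fail with an access error because the open handle was not opened with FILE_SHARE_DELETE.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// onWindows is true when the program is built for Windows.
const onWindows = runtime.GOOS == "windows"

// removeLinkWhileOpen returns the error (if any) from removing a hard link
// while the original file is still open, plus any error from the setup.
func removeLinkWhileOpen() (removeErr, setupErr error) {
	dir, err := os.MkdirTemp("", "linkdemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir) // deferred first, so it runs after f.Close below

	orig := filepath.Join(dir, "000000001-000000001.tsm")
	if err := os.WriteFile(orig, []byte("data"), 0o644); err != nil {
		return nil, err
	}

	// The "snapshot": a second directory entry for the same file.
	link := filepath.Join(dir, "snapshot-000000001-000000001.tsm")
	if err := os.Link(orig, link); err != nil {
		return nil, err
	}

	// Stand-in for the engine keeping the original file open.
	f, err := os.Open(orig)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	return os.Remove(link), nil
}

func main() {
	removeErr, setupErr := removeLinkWhileOpen()
	if setupErr != nil {
		fmt.Println("setup failed:", setupErr)
		return
	}
	fmt.Printf("windows=%v, remove link while original is open: %v\n", onWindows, removeErr)
}
```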

@davidby-influx
Contributor

@lesam and I discussed options to fix this, as well as other problems that this may be causing. For instance, this may cause deletion failures when the compactor attempts to delete a file that is also linked in a temp directory and opened by backup.

Initially brainstormed options to fix this:

  • use copy instead of hard linking to preserve snapshots of files. Imposes large costs on file copy, disk space, etc. Two options:
    • Uniform on all platforms,
    • Only on Windows.
  • Return open file handles or TSMReaders from snapshots, rather than file names. Also increase the reference count on the files during backup. This will keep the files from deletion during backup. We will need to make sure other engine code, notably the compactor, is not blocked on this in any way, and that file deletes requested by the engine are eventually carried out when the backup is complete.
  • Create our own Windows FileIO package that supports FILE_SHARE_DELETE. This has been one strategy recommended in the Go community (see above discussions in the golang forums). The downside is maintenance and versioning.
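To make the second option more concrete, here is a toy sketch of reference counting with deferred deletion. It is not the engine's actual file store: `trackedFile` and its methods are hypothetical, and the `remove` callback is injected so the example runs without touching disk. A delete requested while a backup holds a reference is only recorded, and is carried out when the last reference is released.

```go
package main

import (
	"fmt"
	"sync"
)

// trackedFile is a hypothetical stand-in for a file the engine manages.
type trackedFile struct {
	mu            sync.Mutex
	path          string
	refs          int
	deletePending bool
	remove        func(path string) error // e.g. os.Remove in real use
}

// Ref marks the file as in use (for example by a running backup).
func (f *trackedFile) Ref() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.refs++
}

// Delete removes the file now if nobody holds a reference; otherwise it
// records the request and returns without blocking the caller.
func (f *trackedFile) Delete() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.refs > 0 {
		f.deletePending = true
		return nil
	}
	return f.remove(f.path)
}

// Unref releases one reference and carries out a pending delete when the
// last reference goes away.
func (f *trackedFile) Unref() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.refs--
	if f.refs == 0 && f.deletePending {
		f.deletePending = false
		return f.remove(f.path)
	}
	return nil
}

// demoDeferredDelete reports whether the file was removed while the backup
// still held it, and whether it was removed once the backup released it.
func demoDeferredDelete() (duringBackup, afterBackup bool) {
	removed := false
	f := &trackedFile{
		path:   "000000001-000000001.tsm",
		remove: func(string) error { removed = true; return nil },
	}
	f.Ref()        // backup starts using the file
	_ = f.Delete() // compactor asks for deletion; only recorded
	duringBackup = removed
	_ = f.Unref() // backup finishes; pending delete runs now
	afterBackup = removed
	return duringBackup, afterBackup
}

func main() {
	during, after := demoDeferredDelete()
	fmt.Println("removed during backup:", during, "removed after backup:", after)
}
```

Whether the compactor can safely continue after a delete that was only recorded is exactly the open question raised in the option itself; the sketch does not answer it.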

@davidby-influx
Contributor

davidby-influx commented Sep 17, 2021

Thinking aloud:

The second option above, returning open file handles, will require modifications to the Influxdata tar package, which operates on filenames. tar.Stream, tar.StreamFile, tar.SinceFilterTarFile, and tar.StreamRenameFile will all require revision. Perhaps returning both file handles and file names would minimize the changes, allowing current code to filter by name while avoiding the calls to open the files.
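As a rough illustration of streaming from a handle plus a name, the sketch below uses only the Go standard library's archive/tar, not the Influxdata tar package mentioned above. `streamOpenFile` and `demoStream` are invented names, and the archive entry name is a placeholder.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// streamOpenFile writes an already-open file into tw under the given name,
// so the caller never has to reopen the file by path.
func streamOpenFile(tw *tar.Writer, f *os.File, name string) error {
	fi, err := f.Stat()
	if err != nil {
		return err
	}
	hdr, err := tar.FileInfoHeader(fi, "")
	if err != nil {
		return err
	}
	hdr.Name = name // name is kept separately so callers can still filter on it
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	// Copy exactly the size recorded in the header.
	_, err = io.CopyN(tw, f, hdr.Size)
	return err
}

// demoStream writes one small file into an in-memory archive and reads it
// back, returning the entry name and contents.
func demoStream() (string, string, error) {
	dir, err := os.MkdirTemp("", "tardemo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "000000001-000000001.tsm")
	if err := os.WriteFile(path, []byte("tsm-bytes"), 0o644); err != nil {
		return "", "", err
	}
	f, err := os.Open(path)
	if err != nil {
		return "", "", err
	}
	defer f.Close()

	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	if err := streamOpenFile(tw, f, "db/rp/1/000000001-000000001.tsm"); err != nil {
		return "", "", err
	}
	if err := tw.Close(); err != nil {
		return "", "", err
	}

	tr := tar.NewReader(&buf)
	hdr, err := tr.Next()
	if err != nil {
		return "", "", err
	}
	data, err := io.ReadAll(tr)
	return hdr.Name, string(data), err
}

func main() {
	name, body, err := demoStream()
	fmt.Println(name, body, err)
}
```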

@davidby-influx
Contributor

@lesam - Evaluation of the options:

  1. Copy is clean, but expensive. A few lines of code change, possibly only on Windows, and everything should work.
  2. Handing back TSMReaders and keeping the reference count incremented during back-up is going to hang the compactor until the back-up is done. While I originally favored this option, the complexity and testing required for the change are daunting.
  3. Use a custom OpenFile that sets FILE_SHARE_DELETE; while simple, the maintenance burden and opportunity for future developer error are high. Obviously, support for this as an option in Go would be the best solution, but this does not seem likely.
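For reference, option 1 (copy on Windows, hard link elsewhere) could look roughly like this. It is a minimal sketch under assumptions, not the code from the eventual fix: `snapshotFile`, `copyFile`, and the `alwaysCopy` switch (added only so the copy path can be exercised on any OS) are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"runtime"
)

// copyFile copies src to a new file dst and fails if dst already exists.
func copyFile(src, dst string) (err error) {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return err
	}
	defer func() {
		// Report a failed Close unless an earlier error already occurred.
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()
	_, err = io.Copy(out, in)
	return err
}

// snapshotFile preserves src at dst: a full copy on Windows (or when
// alwaysCopy is set), a cheap hard link everywhere else.
func snapshotFile(src, dst string, alwaysCopy bool) error {
	if alwaysCopy || runtime.GOOS == "windows" {
		return copyFile(src, dst)
	}
	return os.Link(src, dst)
}

// demoSnapshot copies a file into a 1.tmp directory while the original is
// held open, then removes 1.tmp; it returns the copied contents and the
// result of the cleanup.
func demoSnapshot() (string, error) {
	dir, err := os.MkdirTemp("", "snapdemo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "000000001-000000001.tsm")
	if err := os.WriteFile(src, []byte("data"), 0o644); err != nil {
		return "", err
	}
	live, err := os.Open(src) // stand-in for the engine's open handle
	if err != nil {
		return "", err
	}
	defer live.Close()

	tmp := filepath.Join(dir, "1.tmp")
	if err := os.Mkdir(tmp, 0o755); err != nil {
		return "", err
	}
	dst := filepath.Join(tmp, filepath.Base(src))
	if err := snapshotFile(src, dst, true); err != nil {
		return "", err
	}
	data, err := os.ReadFile(dst)
	if err != nil {
		return "", err
	}
	// The copy is an independent file, so cleanup does not depend on the
	// original being closed.
	return string(data), os.RemoveAll(tmp)
}

func main() {
	data, err := demoSnapshot()
	fmt.Println(data, err)
}
```

The trade-off is the one already noted: the copy costs extra I/O and disk space for the duration of the backup, in exchange for a temporary directory that can be removed regardless of what the engine still has open.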

davidby-influx added a commit that referenced this issue Sep 21, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289
danxmoran pushed a commit that referenced this issue Sep 22, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289
davidby-influx added a commit that referenced this issue Sep 22, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289
@davidby-influx davidby-influx linked a pull request Sep 22, 2021 that will close this issue
3 tasks
davidby-influx added a commit that referenced this issue Sep 22, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)
davidby-influx added a commit that referenced this issue Sep 22, 2021
fix(restore): enforce the -db parameter when -newdb used

closes #15901

(cherry picked from commit 1dde65b)

closes #22560

fix: for Windows, copy snapshot files being backed up

On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)

closes #22559
davidby-influx added a commit that referenced this issue Sep 22, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)

closes #22557
davidby-influx added a commit that referenced this issue Sep 22, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)

closes #22557

(cherry picked from commit 47007f6)
davidby-influx added a commit that referenced this issue Sep 22, 2021
… (#22563)

On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)

closes #22557

(cherry picked from commit 47007f6)

closes #22558
davidby-influx added a commit that referenced this issue Oct 14, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)
davidby-influx added a commit that referenced this issue Oct 14, 2021
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes #16289

(cherry picked from commit 3702fe8)

Closes #22556