Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Get-DbaDbBackupHistory (Break in LSN chain) #7515

Open
1 of 4 tasks
saxet77 opened this issue Jul 8, 2021 · 44 comments
Open
1 of 4 tasks

[Bug]: Get-DbaDbBackupHistory (Break in LSN chain) #7515

saxet77 opened this issue Jul 8, 2021 · 44 comments
Assignees
Labels
bugs life pending team feedback Waiting for feedback from other maintainers/contributors

Comments

@saxet77
Copy link

saxet77 commented Jul 8, 2021

Report

Host used

  • powershell.exe
  • ISE
  • VS Code
  • Other (please specify)

Errors Received


WARNING: [21:10:15][Restore-DbaDatabase] Database TODELETE failed testing,  skipping

Steps to Reproduce

# Get backup history from the SourceInstance msdb database
$ListBackupFile = Get-DbaDbBackupHistory -SqlInstance Server1 -Database Database1 -last 
  
 # Generate the PIT restore script
 $RestoreScript = $ListBackupFile | Restore-DbaDatabase -SqlInstance Server2 -DatabaseName TODELETE -RestoreTime '2021-07-07 01:17:15' -WithReplace -OutputScriptOnly -ReplaceDbNameInFile -TrustDbBackupHistory;

$ListBackupFile
SqlInstance                 Database   Type TotalSize   DeviceType Start                   Duration End                    
-----------                 --------   ---- ---------   ---------- -----                   -------- ---                    

#### Expected Behavior

Restore Script provide by SSMS 'Restore/Database/Script'

Actual Behavior

Restore Script provide by DBATools (missing 1 log file)

Environmental information

🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
Please provide the output of the below script

Powershell Version = 5.1.14393.3866
dbatools lastest installed = 1.0.169
Culture of OS = 
LCID             Name             DisplayName                                                                                                                                                                                                                              
----             ----             -----------                                                                                                                                                                                                                              
1036             fr-FR            French (France)

SQL Server:

Microsoft SQL Server 2017 (RTM-CU23) (KB5000685) - 14.0.3381.3 (X64)   Feb  9 2021 12:08:50   Copyright (C) 2017 Microsoft Corporation  Developer Edition (64-bit) on Windows Server 2016 Datacenter 10.0 <X64> (Build 14393: ) (Hypervisor) 
us_english
@saxet77 saxet77 added bugs life triage required New issue that has not been reviewed by maintainers labels Jul 8, 2021
@saxet77
Copy link
Author

saxet77 commented Jul 8, 2021

Expected Behavior generate from the SSMS 'Database/Restore/Script'

Actual behavior generated by DBATools (seems missing 1 log file)

@saxet77 saxet77 changed the title [Bug] [Bug]: Get-DbaDbBackupHistory (Break in LSN chain) Jul 8, 2021
@saxet77
Copy link
Author

saxet77 commented Jul 8, 2021

Capture d’écran 2021-07-08 à 15 20 56

Include the msdb..backupset table extract for the database/log backup LSN.
DatabaseBackupLSN are the same for the backup database and the missing backup log.
Peharps it's the problem ...

@andreasjordan
Copy link
Contributor

@saxet77 - Can you please verify that you get the correct result from Get-DbaDbBackupHistory if you add -IncludeCopyOnly, so using $ListBackupFile = Get-DbaDbBackupHistory -SqlInstance Server1 -Database Database1 -last -IncludeCopyOnly?

@Stuart-Moore - Can you tell me more about this part?
https://github.com/sqlcollaborative/dbatools/blob/d81f07cc176102dda4c48ae620953d0f62ed3cb6/functions/Get-DbaDbBackupHistory.ps1#L340-L345
Especially the difference in the Where-Object: [bigint]$_.DatabaseBackupLSN -eq [bigint]$fullDb.CheckPointLSN.

@andreasjordan
Copy link
Contributor

About your situation:
The full backup finished after the log backup started, but before the log backup finished. That's why the missing log backup has a different DatabaseBackupLSN than the other log backup and then CheckpointLSN of the full backup.
Because of that, the missing log backup is not been taken into account.

@saxet77
Copy link
Author

saxet77 commented Jul 9, 2021

@andreasjordan, I will use -IncludeCopyOnly in my script.

You think it's need a fix on the Get-DbaDbBackupHistory cmdlet ?

@andreasjordan andreasjordan added pending team feedback Waiting for feedback from other maintainers/contributors confirmed Label to be used by maintainers that confirm a bug does exist for the given issue pending OP feedback Waiting for feedback from the OP of the issue and removed triage required New issue that has not been reviewed by maintainers pending team feedback Waiting for feedback from other maintainers/contributors labels Jul 9, 2021
@andreasjordan
Copy link
Contributor

So here the $FilteredBackupHistory (that is what is in your $backupinformation) is still including the "missing log backup":
https://github.com/sqlcollaborative/dbatools/blob/879021234ae8730f4550d75f1ee43397b6b673a7/functions/Restore-DbaDatabase.ps1#L705

Test-DbaBackupInformation is then building an internal $DbHistory which is passed to Test-DbaLsnChain and failes there because of the missing log backup.

Ok, I'll have a deeper look into Test-DbaBackupInformation...

@andreasjordan andreasjordan added pending team feedback Waiting for feedback from other maintainers/contributors and removed pending OP feedback Waiting for feedback from the OP of the issue labels Jul 9, 2021
@Stuart-Moore
Copy link
Contributor

@andreasjordan
DatabaseBackupLSN is the LSN of the last 'real' full backup taken before the current backup was taken.

CheckpointLSN is usuallly the same as FirstLSN and is the LSN to which the backup would be restored to. It's rare to not have them be the same but it can happen

Reading those 2 lines I can't remember why they're different. Again, i expect it was another edge case that cropped up

@saxet77 is it possible for you to share you backup history? Then we can wrap it into the tests to make it easier to play with/replicate

If you could just run:

| Export-CliXml -Path c:\somewhere\file.xml

and upload it to here, then we can rehydrate it and work with. If you want to obfuscate server and db names that's fine. Really just want the LSNs and the other metadata.

@wsmelton wsmelton removed pending team feedback Waiting for feedback from other maintainers/contributors confirmed Label to be used by maintainers that confirm a bug does exist for the given issue labels Jul 11, 2021
@Stuart-Moore
Copy link
Contributor

sorry, mucked up the Markdown in the last comment. Pleae could you upload:
$ListBackupFile | Export-CliXml -Depth 5 -Path c:\somewhere\file.xml

I can fake something if needed, but having 'real' data will make sure we have a proper test to stop it coming back again

@Stuart-Moore
Copy link
Contributor

Going go need a bit more info please, was concentrating on the restore side to much. Could you dump out the raw backup history for me please? Will need something like:
Get-DbaDbBackupHistory -sqlinstance x -database y -since '10/07/2021 21:10' | Export-CliXml -path -depth 5 ./some.xml

Sorry for the delay, time is being eaten up by a large migration atm.

@Stuart-Moore
Copy link
Contributor

Think i now have a fix for this, but it's failing a single test. Need to rebuild my local test rig to work out why.

@Stuart-Moore
Copy link
Contributor

Ok, so this is getting annoying. Passes fine locally:
image

But fails in Appveyor.

Trying to rdp in and it appears to not be liking the password. Is it not just the one in appveyor.yml anymore? (@andreasjordan or @potatoqualitee can you help here?)

@potatoqualitee
Copy link
Member

oh sorry, ill msg you on twitter, stuart

@Stuart-Moore
Copy link
Contributor

Gah, it's a bug with 2008r2! That's why it works fine for me.

For some reason 2008r2 returns 3 rows of history, others return 4. Both result sets are restoreable though, so that's all that really matters. will fix up tests and push it back up

@potatoqualitee
Copy link
Member

haha awesome work! 🚒

@Stuart-Moore
Copy link
Contributor

Yeah, that was confusing the heck out of me. The good news is that the filtering of all back up history is done by select-dbabackupinformation, so we should have consistency between get-dbadbbackuphistory and the restore stack.

@saxet77 Really sorry for the delay on this one. Would you be able to take this new branch for a spin and see if it fixes your issue. I think it should, but would be great to know it does.

@potatoqualitee
Copy link
Member

Thanks, @Stuart-Moore ! I'll wait for your PR to release 1.1.0 💯

@saxet77
Copy link
Author

saxet77 commented Jul 22, 2021

Hi @Stuart-Moore, no problem for the delay.
Not very familiar with GitHub "how can I take this new branch" to validate the fix on my side ?
Thks

@potatoqualitee
Copy link
Member

potatoqualitee commented Jul 23, 2021

hey @saxet77 - please close out powershell entirely then Import-module C:\wherever\to\the\dir\dbatools.psd1 -Force

without a force, it can look like the new module files are imported, but they arent.

@potatoqualitee
Copy link
Member

bummer. thank you for the files, tho. they may be able to help stuart 💯

@Stuart-Moore
Copy link
Contributor

Take 2. Would you mind taking this new version for a spin. It seems to work on the 'msdb' I've hacked together from the files, but need to know it works with a proper one please

https://github.com/sqlcollaborative/dbatools/archive/refs/heads/lsnbreak.zip

@saxet77
Copy link
Author

saxet77 commented Jul 27, 2021

Sorry ;(
Same error message with this new version.

@github-actions github-actions bot added the stale Stale Bot label used to identify issues that are no longer active and bot has closed them label Oct 26, 2021
@andreasjordan
Copy link
Contributor

@Stuart-Moore Do you have an eye on this?

@github-actions github-actions bot removed the stale Stale Bot label used to identify issues that are no longer active and bot has closed them label Oct 31, 2021
@dataplat dataplat deleted a comment from github-actions bot Nov 1, 2021
@andreasjordan
Copy link
Contributor

Do we still want to try to solve this? Or is the root cause a very rare situation and not worth the effort? Will close this in a few days if there are no dissenting votes.

@potatoqualitee
Copy link
Member

I'll defer to @Stuart-Moore

@github-actions github-actions bot added the stale Stale Bot label used to identify issues that are no longer active and bot has closed them label Mar 22, 2022
@dataplat dataplat deleted a comment from github-actions bot May 16, 2022
@potatoqualitee
Copy link
Member

sometimes stuart gets free cycles, let's hold off to closed this until that time comes up.

@andreasjordan andreasjordan added the pending team feedback Waiting for feedback from other maintainers/contributors label Oct 25, 2022
@andreasjordan
Copy link
Contributor

I will close thi now, but can reopen if needed.

@wsmelton
Copy link
Member

This is still a known issue and should not be closed.

@andreasjordan
Copy link
Contributor

andreasjordan commented Jun 23, 2023

I finally can reprodure the issue (update: only once and havn't saved it) and will start working on this.

Here is my script (that does not work):

$sqlInstance = 'SQL01'
$database = 'BackupTest7515'

$null = New-DbaDatabase -SqlInstance $sqlInstance -Name $database 
Invoke-DbaQuery -SqlInstance $sqlInstance -Database $database -Query "CREATE TABLE JustWasteSpace (id int IDENTITY PRIMARY KEY, space CHAR(100) DEFAULT 'SPACE')"
Invoke-DbaQuery -SqlInstance $sqlInstance -Database $database -Query "INSERT INTO JustWasteSpace DEFAULT VALUES"

$null = New-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Full"
$null = New-DbaAgentJobStep -SqlInstance $sqlInstance -Job "$database-Full" -StepName Backup -Subsystem TransactSql -Command "BACKUP DATABASE $database TO DISK='$database.bak'; INSERT INTO $database.dbo.JustWasteSpace DEFAULT VALUES"
$null = New-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Log"
$null = New-DbaAgentJobStep -SqlInstance $sqlInstance -Job "$database-Log" -StepName Backup -Subsystem TransactSql -Command "INSERT INTO $database.dbo.JustWasteSpace DEFAULT VALUES; BACKUP LOG $database TO DISK='$database.trn'"

$null = Start-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Full" -Wait
$null = Start-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Log" -Wait

# use 1..24 for a big database
1..10 | % { Invoke-DbaQuery -SqlInstance $sqlInstance -Database $database -Query "INSERT INTO JustWasteSpace SELECT space FROM JustWasteSpace" }

# Now we have a log of data to back up, both in the data file and the log file
# Goal is that the log backup starts while full backup is running and finishes after the full backup finished
$null = Start-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Full"
Start-Sleep -Seconds 1
$null = Start-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Log" -Wait
Start-Sleep -Seconds 1
$null = Start-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Log" -Wait


# get the backup history (complete and last)
$fullHistory = Get-DbaDbBackupHistory -SqlInstance $sqlInstance -Database $database 
$fullHistory | ogv

$lastHistory = Get-DbaDbBackupHistory -SqlInstance $sqlInstance -Database $database -Last
$lastHistory | ogv

$msdbHistory = Invoke-DbaQuery -SqlInstance $sqlInstance -Query "select backup_set_id, first_lsn, last_lsn, checkpoint_lsn, database_backup_lsn, backup_start_date, backup_finish_date, [type] from msdb..backupset where database_name = '$database'"
$msdbHistory | ogv



# try to use the last history to restore the database
$restore = $lastHistory | Restore-DbaDatabase -SqlInstance $sqlInstance -DatabaseName "$database-Restore" -ReplaceDbNameInFile


# cleanup
$null = Remove-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Full" -Confirm:$false
$null = Remove-DbaAgentJob -SqlInstance $sqlInstance -Job "$database-Log" -Confirm:$false
$null = Remove-DbaDbBackupRestoreHistory -SqlInstance $sqlInstance -Database $database -Confirm:$false
$null = Remove-DbaDatabase -SqlInstance $sqlInstance -Database $database -Confirm:$false
$null = Remove-DbaDbBackupRestoreHistory -SqlInstance $sqlInstance -Database "$database-Restore" -Confirm:$false
$null = Remove-DbaDatabase -SqlInstance $sqlInstance -Database "$database-Restore" -Confirm:$false

Goal is to have a situation where the end of the log backup is after the end of the full backup. This works, but it also needs to have a higher end-lsn than the full backup. And this does not work.

Have to stop for today, but will work again on this in the next day.

If anyone can help in setting up this situation, I would be glad to hear from you.

@andreasjordan
Copy link
Contributor

Capture d’écran 2021-07-08 à 15 20 56

Include the msdb..backupset table extract for the database/log backup LSN. DatabaseBackupLSN are the same for the backup database and the missing backup log. Peharps it's the problem ...

After spending some more time on this I think this picture is a key to understanding the problem.

Let me describe the situation and kind of shorten the LSN numbers for better readability:

  • The full backup starts at LSN 449 and ends at LSN 464, and has a checkpoint LSN 449 (!)
  • The first log backup starts at LSN 461 and ends at LSN 467 - so it overlaps the end of the full backup both in time and LSN. (I still can not reproduce this situation). As the database backup LSN it has the checkpoint LSN of a privious full backup. I agree with the author of this issue that this is the key, will go into detail later.
  • The second log backup starts at LSN 467 and ends at LSN 485. The database backup LSN is 449 and is equal to the checkpoint LSN of the full backup.

As @Stuart-Moore and also https://sqlbak.com/academy/backup-chain describe it, the database backup LSN tell us, to which full backup the log backup belongs. So this is coded in line 361 of Get-DbaDbBackupHistory: [bigint]$_.DatabaseBackupLSN -eq [bigint]$fullDb.CheckPointLSN. This is the reason that the first log backup is not beeing restored, although the last LSN is greater than the last LSN of the full backup and thus has transactions that need to be applied.

I managed to manipulate the database backup LSN of the first log backup in my lab and this way the first log backup is not returnd by the command. But I also tested with SSMS and SSMS still returns the first log backup in the list of needed restores. But SSMS only gets all the entries from msdb..backupset and so I don't see the code that returns the list. Maybe they only test based on first LSN and last LSN.

If we want to do it like SSMS, then we would need to speak with a nice person at Microsoft and ask for help.

Or we say that the database backup LSN of the first log backup is wrong (= it's a bug in SQL Server - that we can not reproduce) and our code is right.

Maybe this helps the other contributors (@Stuart-Moore / @wsmelton / @potatoqualitee ) to decide what to do.

@andreasjordan andreasjordan removed the stale Stale Bot label used to identify issues that are no longer active and bot has closed them label Jun 26, 2023
@andreasjordan
Copy link
Contributor

One more update: I could reproduce half of the situation: The first log backup has a lower database backup LSN than expected, but the same end LSN as the full backup. So this log backup is not needed for the restore.

As expected dbatools only uses the second log backup and can restore that without problem. SSMS uses first log backup as well, which is not needed and only extra time spend - but without any error, so use would not notice.

Here are the screenshots:
image

image

image

So do we want the same filtering as SSMS? Maybe they filter on last LSN of log backup must be greater or equal than last LSN of full backup.

@wsmelton
Copy link
Member

wsmelton commented Jun 26, 2023

You can trace the t-sql that SSMS is executing if you can get the same backup result set. Microsoft is not hiding it from view.

I think it would interesting to know what query filter they have. If it doesn't show one then they are doing it in the application code and it is hidden logic.

@andreasjordan
Copy link
Contributor

I did start a trace, but found nothing useful. So they do it in the application. Which makes sense as I can select and de-select individual backups. So they only get the complete history once and then slice and dice in the GUI.

@andreasjordan
Copy link
Contributor

Let me link this blog post as it gives some insight as well: https://www.brentozar.com/archive/2015/12/what-happens-to-transaction-log-backups-during-full-backups/

@katamoto75
Copy link

katamoto75 commented Aug 22, 2024

We have the same issue with the missing last log file in Get-DbaDbBackupHistory. This is because we offloaded backups to our secondary replica and had to use copy_only for full backups. In this case, DatabaseBackupLSN never updates and is always less than CheckpointLSN. I discovered this while using Restore-DbaDatabase with -RestoreTime and not getting all the expected data.
This can be easily reproduced by performing a copy only full backup followed by a log backup. There's actually no need for an AG

@AdamJKoehler
Copy link
Contributor

AdamJKoehler commented Sep 10, 2024

We have this same problem where a log backup that is needed for restore is tagged as a break in the LSN Chain. Restore-DbaDatabasefails this process, but if we go through the process described in Invoke-DbaAdvancedRestoreand use the pipeline for the restore, it works without a problem. I found the problem to be down in Test-DBALsnChainon line 99. It needs additional checks to see if the FirstLSN of the log backup being checked is greater than the FULL backup's CheckPointLSN.

Our scenario is just like what's referenced in Brent Ozar's link above.

@AdamJKoehler
Copy link
Contributor

AdamJKoehler commented Sep 11, 2024

We have this same problem where a log backup that is needed for restore is tagged as a break in the LSN Chain. Restore-DbaDatabasefails this process, but if we go through the process described in Invoke-DbaAdvancedRestoreand use the pipeline for the restore, it works without a problem. I found the problem to be down in Test-DBALsnChainon line 99. It needs additional checks to see if the FirstLSN of the log backup being checked is greater than the FULL backup's CheckPointLSN.

Our scenario is just like what's referenced in Brent Ozar's link above.

@wsmelton @potatoqualitee @Stuart-Moore @andreasjordan fyi..

#9467 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugs life pending team feedback Waiting for feedback from other maintainers/contributors
Projects
None yet
Development

No branches or pull requests

7 participants