Examine index recovery #17205

Open: kows wants to merge 3 commits into v13/contrib from bugfix/examine-index-recovery

@kows kows commented Oct 7, 2024

I think enabling this flag (tryFixMainIndexIfCorrupt) was forgotten, which leaves the newly implemented index recovery inactive?

#16163
https://github.com/Shazwazza/Examine/blob/release/3.0/src/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs#L76

UPDATE: I have only tested this by replacing this registration in the startup of a client's project.
I was able to inject the factory into a controller and saw that the flag was true, so construction should be fine.


github-actions bot commented Oct 7, 2024

Hi there @kows, thank you for this contribution! 👍

While we wait for one of the Core Collaborators team to have a look at your work, we wanted to let you know that we have a checklist for some of the things we will consider during review:

  • It's clear what problem this is solving, there's a connected issue or a description of what the changes do and how to test them
  • The automated tests all pass (see "Checks" tab on this PR)
  • The level of security for this contribution is the same or improved
  • The level of performance for this contribution is the same or improved
  • Avoids creating breaking changes; note that behavioral changes might also be perceived as breaking
  • If this is a new feature, Umbraco HQ provided guidance on the implementation beforehand
  • 💡 The contribution looks original and the contributor is presumably allowed to share it

Don't worry if you got something wrong. We like to think of a pull request as the start of a conversation; we're happy to provide guidance on improving your contribution.

If you realize that you might want to make some changes then you can do that by adding new commits to the branch you created for this work and pushing new commits. They should then automatically show up as updates to this pull request.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@mikecp
Contributor

mikecp commented Oct 8, 2024

Hi @kows ,

Thank you for spotting this issue and providing a fix 👍
I noticed that you provided the fix on the V13/dev branch, while we expect contributions from the community to be pushed to our contrib branch. It would be great if you could adapt this, and then someone from the core collaborators team will have a look at it 😊

Cheers!

@kows kows force-pushed the bugfix/examine-index-recovery branch from 7a315a0 to d0530fc October 8, 2024 10:41
@kows kows changed the base branch from v13/dev to v13/contrib October 8, 2024 10:42
@kows kows force-pushed the bugfix/examine-index-recovery branch from da458ef to 4f0f2a9 October 8, 2024 10:44
s.GetRequiredService<IApplicationRoot>().ApplicationRoot,
s.GetRequiredService<ILockFactory>(),
s.GetRequiredService<ILoggerFactory>(),
true,
Contributor

@bielu bielu Oct 8, 2024

That should be configurable, as this flag might affect the general performance of the site if any of the indexes gets corrupted.
To add some detail, it is based on this:
https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/CheckIndex.html

Author

Maybe so, but in its current state an application gets stuck in a faulty state, so I believe this should be on by default.
The 13.5 release with the Examine update actually changes nothing, since the much-wanted fix is disabled.

Something like Umbraco.CMS.Examine.EnableRecovery (default on), or any other suggestions?

Contributor

@kows I am happy now with it :)

@readingdancer
Contributor

@kows & @bielu - We also have the same indexing issue with U10, and @Shazwazza told me at the US Festival last week that I would need to install Examine 3.3.0, which we have done.

Are you saying that his fix will not actually work without this flag? And is it possible to port this change back to U10, 11 & 12 so it works for everyone? I know others are continuing to have this issue.

@kows
Author

kows commented Oct 9, 2024

If you want to test it, I think you can replace the registration in your own project by adding the same block to your startup after the Umbraco registrations:

services.RemoveAll<SyncedFileSystemDirectoryFactory>();
services.AddSingleton<SyncedFileSystemDirectoryFactory>(
    s =>
    {
        ...
    });

@Shazwazza
Contributor

This was not forgotten, see https://shazwazza.com/post/an-examine-fix-for-umbraco-index-corruption/

There's also a new option to fix a corrupted index but this is not enabled by default since it can mean a loss of documents.

I understand that this is subtle, and I will need to add docs on it. There is a reason this is not enabled by default: it will mean a loss of documents. When it auto-fixes the index (which is already corrupted), the only way it can do that is by removing the documents that are not listed in known index files. Since we don't know why the index was corrupted (e.g. process termination, network issues, timing), we don't know whether the auto-fix will be good or bad.
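A toy model makes this trade-off concrete. It is loosely modelled on how Lucene's CheckIndex repairs an index by dropping the segments it cannot read, together with every document in them. `Segment` and `FixIndex` are hypothetical names, and real Lucene works on segment files on disk rather than an in-memory list, so treat this as an illustration rather than Examine's code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of one index segment (not a Lucene type).
public sealed record Segment(string Name, int DocCount, bool Readable);

public static class LossyFixSketch
{
    // Keeps only the readable segments, mimicking a CheckIndex-style repair,
    // and reports how many documents were dropped along with the bad segments.
    public static (IReadOnlyList<Segment> Kept, int DocsLost) FixIndex(IEnumerable<Segment> segments)
    {
        var all = segments.ToList();
        var kept = all.Where(s => s.Readable).ToList();
        int lost = all.Where(s => !s.Readable).Sum(s => s.DocCount);
        return (kept, lost);
    }
}
```

With segments `_0` (1,200 documents, readable), `_1` (300 documents, unreadable) and `_2` (50 documents, readable), the sketch keeps two segments and reports 300 lost documents: the index opens again, but those documents are missing until they are re-indexed.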

I can add more documentation to this class https://shazwazza.github.io/Examine/api/Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.html but essentially it now jumps through a lot of hoops, which are documented here https://shazwazza.com/post/an-examine-fix-for-umbraco-index-corruption/

Turning this option on is not necessary, but it can help in the one circumstance where there is no recoverable option (i.e. neither the main nor the local index is healthy, but one exists). By default, when there is no recoverable option and this flag is not turned on, an index rebuild takes place.

Understanding the solution:
The SyncedFileSystemDirectoryFactory has been updated to:

  • Check the health of the main index if it exists ('slow drive').
  • Check the health of the local index if it exists ('fast drive').
  • If the main index is unhealthy or doesn't exist and the local index is healthy, it will synchronize the local index to the main index. This can occur only if a site hasn't moved to a new worker.
  • If the main index is unhealthy and the local index doesn't exist or is unhealthy, then it will delete the main (corrupted) index.
  • Once health checks are done, the index from main is always synced to local. If the main index was deleted due to corruption, this will mean that the local index is empty and an index rebuild will occur.
    This change will attempt to keep any healthy index that is available (main vs local), but if nothing can be read, the indexes will be deleted and an index rebuild will occur.
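The decision flow in the list above can be sketched as a pure function. This is a minimal sketch, not Examine's implementation: `IndexState`, `RecoveryPlan` and `PlanStartup` are hypothetical names, and it assumes the `tryFixMainIndexIfCorrupt` path is only taken when neither index is healthy and that the fix succeeds.

```csharp
// Hypothetical model of what the health check finds at each location.
public enum IndexState { Missing, Healthy, Corrupt }

// Hypothetical summary of the actions taken before the final main -> local sync.
public readonly record struct RecoveryPlan(
    bool SyncLocalToMain,   // restore the main ('slow drive') index from local
    bool FixMain,           // run the lossy auto-fix on the corrupt main index
    bool DeleteMain,        // delete the corrupt main index
    bool RebuildRequired);  // local ends up empty, so a full rebuild follows

public static class IndexRecoverySketch
{
    public static RecoveryPlan PlanStartup(
        IndexState main, IndexState local, bool tryFixMainIndexIfCorrupt)
    {
        // Healthy main index: nothing to repair, main is synced to local.
        if (main == IndexState.Healthy)
            return new RecoveryPlan(false, false, false, false);

        // Main unhealthy or missing but local healthy: keep the local copy.
        if (local == IndexState.Healthy)
            return new RecoveryPlan(true, false, false, false);

        // Neither copy is healthy. A corrupt main index is either fixed
        // (possibly dropping documents) or deleted, which forces a rebuild.
        if (main == IndexState.Corrupt)
            return tryFixMainIndexIfCorrupt
                ? new RecoveryPlan(false, true, false, false)
                : new RecoveryPlan(false, false, true, true);

        // Main missing and local missing or corrupt: the empty main index
        // is synced over local, so a rebuild occurs.
        return new RecoveryPlan(false, false, false, true);
    }
}
```

For example, `PlanStartup(IndexState.Corrupt, IndexState.Corrupt, false)` returns a plan with `DeleteMain` and `RebuildRequired` set, which is the default rebuild outcome described above.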

@kows
Author

kows commented Oct 9, 2024

Then the described behavior is not working as intended, as people are still reporting issues, and I have had this occur myself with:

  • a single App Service
  • Umbraco 13.5
  • indexes corrupted after an internal Azure machine move
  • I would have expected a rebuild here, but the site was functional with no index data

@Shazwazza
Contributor

@kows did your flag to force recover the indexes work? I would need some help in replicating this and it needs to be done in tests. There are several tests and test cases for different scenarios for SyncedFileSystemDirectoryFactory:

https://github.com/Shazwazza/Examine/blob/release/3.0/src/Examine.Test/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactoryTests.cs

Any help is appreciated. I've put a lot of effort into making this work for Umbraco (this should exist in Umbraco, not Examine) but unfortunately never heard back from anyone regarding testing, etc. If we can replicate the issue in tests, then it's an easy fix.

@Shazwazza
Contributor

@kows stack traces are helpful; I've only seen a handful and they are all the same. Now a new one has popped up, #16163 (comment), which I've not seen before. Please keep the info regarding steps to replicate, stack traces, etc. on this thread: #16163

@kows
Author

kows commented Oct 9, 2024

I didn't notice this reply at first.
I wasn't able to test that, since it is a production site, so the usual clearing of the indexes plus a reboot got everything up and running again.
If it happens again, I can make a copy of the local umbraco\Data\TEMP\ExamineIndexes if that aids debugging.
Getting a copy of the main index will be harder, because the Kudu tools run separately (SCM separation flag), and changing that setting restarts the site, which could already trigger a rebuild.

@mikecp
Contributor

mikecp commented Oct 11, 2024

Great constructive conversation going on here, thanks all!! 👍👍

I cross-checked this with HQ, and they would prefer the default value to be set on false, so it would be great if you could do that small update @kows 😁, and then it will be ready to merge!

What HQ also mentioned is that this fix will obviously come in the next V13 patch, but there is no specific timeframe defined for that yet.

In the meantime, they suggest the following workaround, I hope this will help (disclaimer: copy-pasting so no guarantee of it working at once 😅).

"Delete the registration of SyncedFileSystemDirectoryFactory and add a new one with the fix. That should be possible from a composer. Basically same lines as in the PR"

public class MyComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        builder.Services.RemoveAll<SyncedFileSystemDirectoryFactory>();
        builder.Services.AddSingleton<SyncedFileSystemDirectoryFactory>(
            s =>
            {
                var tempDir = UmbracoTempEnvFileSystemDirectoryFactory.GetTempPath(
                    s.GetRequiredService<IApplicationIdentifier>(), s.GetRequiredService<IHostingEnvironment>());
                return ActivatorUtilities.CreateInstance<SyncedFileSystemDirectoryFactory>(
                    s,
                    new object[]
                    {
                        new DirectoryInfo(tempDir),
                        s.GetRequiredService<IApplicationRoot>().ApplicationRoot,
                        s.GetRequiredService<ILockFactory>(),
                        s.GetRequiredService<ILoggerFactory>(),
                        true, // tryFixMainIndexIfCorrupt: opt into the lossy auto-fix of a corrupt main index
                    });
            });
    }
}

@Shazwazza
Contributor

Hi everyone - some good news. I managed to replicate another issue, which is now fixed. I can't exactly replicate the one with the 'invalid deletion count', but I think it is directly related to the new fix.

I've published a beta here https://www.nuget.org/packages/Examine/3.3.1-beta.1

If anyone can please test, that would be great. This version will end up being 3.4.0, as it comes with performance improvements too, and a small breaking change which reduces the default max results from 500 to 100 (if you need more, then use QueryOptions).

@nul800sebastiaan
Member

Hi there @kows!

Just a hint: as you may know, this PR qualifies for Umbraco's Hacktoberfest participation for which you can earn rewards.
If you make one more contribution in the next few days, it would be great to see your contributions qualify! You can do it 😉

@Shazwazza
Contributor

Examine 3.4.0 has been published https://github.com/Shazwazza/Examine/releases/tag/v3.4.0

@umbrabot

umbrabot commented Dec 4, 2024

Hi there @kows!

First of all: A big #H5YR for making an Umbraco related contribution during Hacktoberfest! We are very thankful for the huge amount of PRs submitted, and all the amazing work you've been doing 🥇

Due to the amazing work you and others in the community have been doing, we've had a bit of a hard time keeping up. 😅 While all of the PRs for Hacktoberfest might not have been merged yet, you still qualify for receiving some Umbraco swag, congratulations! 🎉

In the spirit of Hacktoberfest we've prepared some exclusive Umbraco swag for all our contributors - including you!
This year's swag is a custom designed notebook and custom Umbraco Hacktoberfest sticker:

As an alternative choice, you can opt-out of receiving anything and ask us to help improve the planet instead by planting a tree on your behalf. 🌳

Receive your swag or plant a tree! 👈 Please follow this link to fill out and submit the form before December 25th, 2024, 23:59:00 UTC.

Following this date we'll be sending out all the swag, but please note that it might not reach your doorstep for a few weeks/months, so please bear with us and be patient 🙏

The only thing left to say is thank you so much for participating in Hacktoberfest! We really appreciate the help!

Kind regards,
The various Umbraco teams.

7 participants