Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 #16163

Open
konius opened this issue Apr 26, 2024 · 55 comments

@konius

konius commented Apr 26, 2024

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

11.3.1

Bug summary

The Examine index becomes corrupt: the Examine dashboard can no longer be viewed or managed, and any content that reads the index for display purposes comes back empty.

This happens on version 11.3.2, and also on 13.1.1; the only solution is to completely delete the Examine folder and restart the application.

Issue is already discussed on Our.

Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 (resource: BufferedChecksumIndexInput(SimpleFSIndexInput(path="C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_vd")))
   at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
   at Lucene.Net.Index.IndexFileDeleter..ctor(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, InfoStream infoStream, IndexWriter writer, Boolean initialIndexExists)
   at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
   at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Umbraco.Cms.Infrastructure.Examine.ConfigurationEnabledDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass1_0.<.ctor>b__0()
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
--- End of stack trace from previous location ---
   at System.Lazy`1.CreateValue()
   at Examine.Lucene.Providers.LuceneIndex.PerformIndexItemsInternal(IEnumerable`1 valueSets, CancellationToken cancellationToken)
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass49_0.<PerformIndexItems>b__0()
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass73_0.<QueueTask>b__0(Task x)
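
For context, the message appears to come from a sanity check that Lucene runs while reading the `segments_N` commit file: each segment records a document count and a deletion count, and a deletion count outside the range 0..docCount is reported as corruption. A minimal Python sketch of that check follows; the names are hypothetical and this is an illustration, not the Lucene.Net source.

```python
class CorruptIndexError(Exception):
    """Stand-in for Lucene.Net's CorruptIndexException (hypothetical name)."""


def check_deletion_count(del_count: int, doc_count: int, resource: str) -> None:
    """Raise if a segment claims more deletions than it has documents."""
    if del_count < 0 or del_count > doc_count:
        raise CorruptIndexError(
            f"invalid deletion count: {del_count} vs docCount={doc_count} "
            f"(resource: {resource})"
        )


# The values from the trace above: 2 deletions recorded for a 1-document segment.
try:
    check_deletion_count(2, 1, "segments_vd")
except CorruptIndexError as exc:
    print(exc)
```

Read this way, the error says the commit file and the segment data it describes no longer agree, which would be consistent with files from two different index states being mixed together.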

image

Specifics

For an unknown reason the index becomes corrupt and bricks the back office dashboard.

Application is hosted on Azure and config is applied as per this guide: https://docs.umbraco.com/umbraco-cms/v/10.latest-lts/fundamentals/setup/server-setup/azure-web-apps

Steps to reproduce

N/A

Expected result / actual result

Expected to be able to at least view the dashboard and rebuild indexes if they get corrupt.


Hi there @konius!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

  • We'll assess whether this issue relates to something that has already been fixed in a later version of the release that it has been raised for.
  • If it's a bug, is it related to a release that we are actively supporting or is it related to a release that's in the end-of-life or security-only phase?
  • We'll replicate the issue to ensure that the problem is as described.
  • We'll decide whether the behavior is an issue or if the behavior is intended.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@shallett-ghd

We've been dealing with this issue on Umbraco 10.6.1. It has been impossible to replicate in development environments, but it does present itself on high-traffic client sites with a lot of content.

@Migaroez
Contributor

Might be related to
#15783

@TQ-Benji

TQ-Benji commented May 2, 2024

Had a similar issue on Umbraco Cloud using 10.8.5. The workaround was to delete the indexes via Kudu and restart the site - not ideal
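
The workaround described in this thread (delete the index files, then restart the site) can be scripted. Below is a minimal Python sketch, assuming the index location shown in the stack traces (`umbraco\Data\TEMP\ExamineIndexes` under the site root) and assuming the site is stopped before the files are removed. The function name and its arguments are hypothetical, and restarting the app is left to whatever tooling you already use.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def delete_examine_indexes(site_root: str, index_name: Optional[str] = None) -> List[str]:
    """Delete one Examine index folder, or all of them, and return the names removed.

    Assumes the layout seen in the stack traces in this thread:
    <site_root>/umbraco/Data/TEMP/ExamineIndexes/<IndexName>
    """
    base = Path(site_root) / "umbraco" / "Data" / "TEMP" / "ExamineIndexes"
    if not base.is_dir():
        return []
    if index_name:
        targets = [base / index_name]
    else:
        targets = sorted(p for p in base.iterdir() if p.is_dir())
    removed = []
    for target in targets:
        if target.is_dir():
            shutil.rmtree(target)  # removes the folder and every file in it
            removed.append(target.name)
    return removed
```

For example, `delete_examine_indexes(r"C:\home\site\wwwroot", "MembersIndex")` would remove only the members index. Note that a deleted index has to be rebuilt after the restart, so this remains a workaround rather than a fix.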

@sytexa-julia

Just started seeing this after we rolled out our upgrade to 13.x. Strangely it's been working fine in App Service for over a year.

@marius-ruhrmann-syzygy

Same problem here.

  • Hosted on Azure in windows environment
  • Member index is corrupted
  • Quick fix from @TQ-Benji works, but it would be cool to have a fix very soon!

@kows

kows commented May 15, 2024

still happens on 13.3.0, single instance app service.

@readingdancer
Contributor

readingdancer commented May 17, 2024

We have the same issue on a client site running on 10.8.5

One thing I think would be a great improvement is for the Umbraco back office to actually respond when an error like this occurs. I looked into this the last time it happened, and either the API just doesn't reply or it replies with an error. Either way, the UI just continues to look like it's waiting to load the page. It would be much better if this was handled and an appropriate error message was displayed, like:

It looks like your indexes are _____ed ( broken ) , the best thing to do is log into Kudu and delete them all and restart your server, or if you are not technical, contact your technical contact and ask them to do it ( again ) for you.

Or words to that effect ;-)

@sniffdk
Contributor

sniffdk commented May 21, 2024

We are running into this problem as well on a 13.1.0 installation running on a single Azure App Service.
The error seems to start at random; deleting the examine folder and restarting the site fixes it, but not for long.

We can't even use the api to rebuild the index, as that will throw the same "invalid deletion count" error.

This should definitely have a higher priority in getting fixed 🙏

@benbarnett02

Just adding my 2 cents - Same issue continuing on 13.3.2.

Running on Azure App Service (free plan), Azure SQL, Azure storage account for media & imagesharp stuff.

@Marcin-Niznik

same problem Umbraco v12.3.10 on Umbraco Cloud, ContentDeliveryAPI Index gets corrupted.

@AdamKronquistToxic

We have a similar problem on Umbraco 8, 10 and 13.
The website's search function stops working due to a corrupt index.

When searching
Searchfield

Response
When searching

Trying to enter Examine Management

Logs for Examine Management

The solution is to delete the TEMP folder in Kudu, restart the project and rebuild InternalIndex.

It's a quickfix but is not a long-term solution. For some of our customers, the problem recurs at weekly intervals.
1. Warning.txt
2. Error.txt
3. Error.txt
Error in Log Viewer when searching.txt

@sputnik-liam

Experiencing this constantly running Umbraco 13.3.2 deployed to Azure using recommended config.
Lucene.Net.Index.CorruptIndexException: invalid deletion count: 171 vs docCount=1

It's especially troublesome as we don't have direct access to every customer's infrastructure.

@seanrockster

We're having this issue as well, and it seems to be random. We had it a few weeks ago, and removing the index files and restarting the webapp seemed to resolve it. However, we're now having the same issue again: we've released some features and updates (same Umbraco version, 12.3.7) and we're suffering again. This time, removing the index files and restarting has had no effect.

Unfortunately our client relies on the member search because we have a custom index and searcher with member properties they need to search on - the standard searcher doesn't search on custom properties btw.

We're seeing different errors.

  1. we're seeing the issue mentioned in this ticket
  2. we're seeing log errors where Examine is actually looking for file names that do not exist, e.g. it's looking for _1k.si, but the MembersIndex folder doesn't have that file; it has _1q.si

System.IO.FileNotFoundException: Could not find file 'C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\_1k.si'.
File name: 'C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\_1k.si'
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
   at Lucene.Net.Store.MMapDirectory.OpenInput(String name, IOContext context)
   at Lucene.Net.Store.Directory.OpenChecksumInput(String name, IOContext context)
   at Lucene.Net.Codecs.Lucene46.Lucene46SegmentInfoReader.Read(Directory dir, String segment, IOContext context)
   at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
   at Lucene.Net.Index.SegmentInfos.FindSegmentsFileAnonymousClass.DoBody(String segmentFileName)
   at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
--- End of stack trace from previous location ---
   at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
   at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run()
   at Lucene.Net.Index.SegmentInfos.Read(Directory directory)
   at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
   at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Umbraco.Cms.Infrastructure.Examine.ConfigurationEnabledDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass1_0.<.ctor>b__0()
   at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
--- End of stack trace from previous location ---
   at System.Lazy`1.CreateValue()
   at Examine.Lucene.Providers.LuceneIndex.GetLuceneDirectory()
   at Examine.Lucene.Providers.LuceneIndex.IndexReady()
   at Examine.Lucene.Providers.LuceneIndex.PerformIndexItemsInternal(IEnumerable`1 valueSets, CancellationToken cancellationToken)
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass49_0.<PerformIndexItems>b__0()
   at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass73_0.<QueueTask>b__0(Task x)
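
One detail that may help when comparing the file names in these errors: Lucene names segment files with a base-36 counter (`_1k`, `_1q`) and commit files with a base-36 generation (`segments_vd`). The Python sketch below decodes them so you can tell which of two names is newer. The helper name is made up for illustration.

```python
def lucene_name_to_number(file_name: str) -> int:
    """Decode the base-36 counter in names like '_1k.si' or 'segments_vd'."""
    stem = file_name.split(".", 1)[0]           # drop an extension such as '.si'
    if stem.startswith("segments_"):
        counter = stem[len("segments_"):]       # commit generation
    elif stem.startswith("_"):
        counter = stem[1:].split("_", 1)[0]     # segment counter
    else:
        raise ValueError(f"not a recognised Lucene file name: {file_name!r}")
    return int(counter, 36)


print(lucene_name_to_number("_1k.si"))  # 56
print(lucene_name_to_number("_1q.si"))  # 62
```

In the error above, the commit being read refers to segment 56 while the folder holds segment 62, which would be consistent with an older commit file being read against newer segment files.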

@seanrockster

Deleting all of the examine indexes and restarting the webapp restored the indexes correctly.

@sniffdk
Contributor

sniffdk commented Jun 20, 2024

@seanrockster good to hear

But this seems to be a very common problem, would love to get some kind of HQ feedback here 😅

@sputnik-liam

@seanrockster this is a known quick fix, but it doesn't solve the underlying problem, which has been a documented issue for a long time, and there's been little to no formal communication from Umbraco on this.

@sniffdk
Contributor

sniffdk commented Jun 20, 2024

Tagging @nul800sebastiaan 🙈

@authoritymikael

authoritymikael commented Jun 24, 2024

We got this on Umbraco Cloud and CMS 9.5.4. Deleting the examine indexes did it. But we're waiting for a permanent fix.

@AdamKronquistToxic

We have experienced this on 8.18.14, 10.8.6 and 13.3.2.

@kevinstampe

kevinstampe commented Jun 25, 2024

Also happening on 13.3.1, hosted in azure app service with azure sql db

EDIT:
Trying out this
Shazwazza/Examine#382 (comment)

@Migaroez
Contributor

Hey y'all, we are aware that these issues exist, but we have strong suspicions these things are popping up due to misconfiguration of Azure web apps/load balancing/slot swapping. To be able to determine whether these issues are related to misconfiguration, bugs in Examine, or bugs in Umbraco, we are trying to build a troubleshooting guide; this will take some time still. In the meantime I advise you to read up on the issue @kevinstampe linked over on the Examine repo (Shazwazza/Examine#382 (comment))
and the docs concerning azure/examine
https://docs.umbraco.com/umbraco-cms/fundamentals/setup/server-setup/load-balancing/azure-web-apps#lucene-examine-configuration
https://docs.umbraco.com/umbraco-cms/fundamentals/setup/server-setup/azure-web-apps#what-are-azure-web-apps

You can check a few of these configurations with @warrenbuckley's RuntimeValidators
https://github.com/Gibe/Umbraco.Community.RuntimeValidators

Hope to get back to you soon.

@sniffdk
Contributor

sniffdk commented Jun 25, 2024

@Migaroez As @paulsterling writes here: Shazwazza/Examine#382 (comment) this is also an issue on a default Umbraco Cloud v. 13.4.0 configuration.
From your statement, that would mean that the built-in configuration in Umbraco Cloud is faulty as well?

@eqtr-ab

eqtr-ab commented Jul 3, 2024

+1 for another site hosted in Umbraco Cloud encountering this same problem, Umbraco v13.3.2.

@mbogunovic

Okay everyone, as @Shazwazza mentioned on Shazwazza/Examine#382, this could be due to:

  1. The Examine config being left at its default (see https://docs.umbraco.com/umbraco-cms/reference/configuration/examinesettings) while we are all trying to use a multi-instance load-balanced setup.
  2. I will try to apply the setting below, and if I never come back to this post, it means it works... 👯
     "Examine": {
       "LuceneDirectoryFactory": "TempFileSystemDirectoryFactory"
     }
  3. IN THEORY: this should tell Examine to create a different cache path that takes the hosting environment name into account, so indexing doesn't get mixed up between two environments, causing errors and inconsistencies.

Cheers!
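
The snippet in the comment above is abbreviated. In `appsettings.json` the `Examine` section is nested under `Umbraco:CMS`, per the Examine settings page linked above. A sketch of the full fragment, assuming no other Umbraco settings need to change; the value shown is the one from the comment, and `SyncedTempFileSystemDirectoryFactory` is the other value discussed in this thread:

```json
{
  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "TempFileSystemDirectoryFactory"
      }
    }
  }
}
```

Which value is appropriate depends on the hosting setup, so check it against the Azure and load-balancing guides linked earlier rather than copying it as-is.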

@seanrockster

We've upgraded to 13.4 and we are still having this issue. This is impacting our business, not good. @UmbracoHQ

@seanrockster

It's logging this error every second, and the members index refuses to rebuild.

System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_3'.
at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
at Lucene.Net.Store.Directory.Copy(Directory to, String src, String dest, IOContext context)
--- End of stack trace from previous location ---
at Lucene.Net.Store.Directory.Copy(Directory to, String src, String dest, IOContext context)
at Lucene.Net.Replicator.IndexReplicationHandler.RevisionReady(String version, IDictionary`2 revisionFiles, IDictionary`2 copiedFiles, IDictionary`2 sourceDirectory)
at Lucene.Net.Replicator.ReplicationClient.DoUpdate()
at Lucene.Net.Replicator.ReplicationClient.ReplicationThread.Run()

@seanrockster

@Shazwazza that didn't work and brought the site down with some error - i'll check the log for the exact error but had to revert to TempFileSystemDirectoryFactory

@seanrockster

Not sure why, but I think the DB was timing out and that was the cause of the 500; I'm getting a timeout when I run the app via Kudu. However, after 4-5 restarts the app starts again. That did not happen on v12, as we restarted it often to rebuild the member indexes. The Azure DB is S6 400 DTU and it smashes it, maxing out the DTUs while starting.

@Shazwazza
Contributor

@seanrockster Yes, there is a reason why the DB will get smashed. I've spoken about this issue at length in a couple of Code Garden presentations, and it is also why I created ExamineX, so that you can have your indexes persisted in a managed service instead of in local Lucene files. This avoids index rebuilds, issues with Lucene files in Azure, etc...

To re-cap this #15783 (comment):

The 'Synced' directory is there to avoid the performance implications of rebuilding indexes on startup when a site is moved to another worker in Azure. That is the only reason it exists, and it can only be used on your Primary node. If you aren't load balancing and don't care about this performance hit, then change it to Temp (local-only storage). If you think that there isn't any overhead, think about this: if you scale out to 5+ nodes in a load-balancing setup, that means that 5x nodes will be performing index rebuilds around the same time, so your DB is going to get pummeled by queries to build all of those new indexes. The performance hit isn't the index building - it is the DB queries, and this can lead to DB locks and to the dreaded SQL Lock Timeout issue in the back office. Plus, if search is critical to your front-end, then for a while after your site has started up there won't be any index, which means there won't be any search until the background processing is done. Many of these reasons are why ExamineX was created.

  • UmbracoTempEnvFileSystemDirectoryFactory was introduced in Umbraco here: Add custom Examine FileSystemDirectoryFactory using Umbraco SiteName #15571 (not sure what version that is released in); otherwise, if you are on an older version, then TempFileSystemDirectoryFactory can be used. This will result in indexes being rebuilt any time your site is moved to another worker (see above).
  • The SyncedFileSystemDirectoryFactory attempts to work around this challenge where indexes are just local files. It attempts to synchronize a copy of the indexes in temp storage to main file storage on Azure so that when a site is moved to another worker, it can sync from main storage to local storage to work from, in order to avoid the index rebuild.

The problem with SyncedFileSystemDirectoryFactory is that this implementation doesn't take into account what happens if the index files in your main storage become corrupted, which can happen for a number of reasons: misconfiguration, network latency, process termination, etc... I've been helping the Umbraco team with a fix for this and will publish an Examine release for that next week.

@sniffdk
Contributor

sniffdk commented Jul 19, 2024

Ohh my, this is such a mess 😬
I do look forward to a SomewhatFixedUmbracoSyncTempEnvFileSystemDirectoryFactory or whatever naming is gonna rival the best Microsoft can come up with 😄

@Shazwazza thank you for taking the time to help out on this 🙏
Even though this really is a responsibility of HQ, as I see it. They ship a product, that is clearly broken even when running on their own infrastructure.

Perhaps ExamineX should have a better bundling option on Cloud setups going forward 🤔

@paulsterling
Contributor

paulsterling commented Jul 19, 2024

Yes...that's my vote > to offer Examine X as part of Umbraco Cloud! It does solve the fundamental issues with Lucene and Azure App Services. Whatever is done with current Examine will be a workaround at best.

I've tagged that as a request in umbraco/Umbraco.Cloud.Issues#110

@Shazwazza
Contributor

The updates for SyncedFileSystemDirectoryFactory can be found here: Shazwazza/Examine#387. Essentially, this will just allow the site to recover if the main or local index has become corrupt for whatever reason. It may mean that index rebuilds occur in these scenarios, but at least the site will boot up. I've also added an option (not enabled by default) to run a repair on the index. This may result in documents being deleted, but in some cases those documents may have been legitimate deletions before the index was committed. That would mean that index rebuilds don't occur, but there is potentially document loss in the index - which is why I'll keep it optional. Hoping to get a release out this week.
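
The recovery behaviour described here can be pictured as "try to open the index; if it is corrupt, replace it with an empty one and let it be rebuilt". The Python sketch below is a toy model of that idea only: `CorruptIndexError`, `open_index`, and the returned flag are hypothetical and do not mirror Examine's actual code.

```python
import shutil
from pathlib import Path
from typing import Callable, Tuple


class CorruptIndexError(Exception):
    """Stand-in for a corrupt-index failure (hypothetical)."""


def open_or_recreate(
    index_dir: Path, open_index: Callable[[Path], object]
) -> Tuple[object, bool]:
    """Open the index; on corruption, wipe the folder and open a fresh, empty one.

    Returns (index, needs_rebuild). needs_rebuild is True when the old files were
    discarded, which is the case where a full re-index has to follow.
    """
    try:
        return open_index(index_dir), False
    except CorruptIndexError:
        shutil.rmtree(index_dir, ignore_errors=True)
        index_dir.mkdir(parents=True, exist_ok=True)
        return open_index(index_dir), True
```

The trade-off is the one stated in the comment: the site starts, but a discarded index means a rebuild, with the database load that comes with it.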

@ProudSebastiaan

@Shazwazza I see it hasn't been released yet; do you have an update as to when you expect to have this fix deployed? We have a customer that has been having corrupt index errors for over a year now without us being able to fix the problem. Lots of headaches. This would be a lifesaver for us.

@Shazwazza
Contributor

Examine 3.3.0 has been published, release notes are here https://github.com/Shazwazza/Examine/releases/tag/v3.3.0

@Mikkel-Veyhe-jepsen

We're still experiencing this issue after having just updated to U13.5.1; our understanding was that Examine 3.3.0 was part of Umbraco 13.5.
The site is hosted in Azure, no load balancing, using the SyncedTempFileSystemDirectoryFactory. We had internal, external and a custom siteSearch index fail this morning.
Quickfix of deleting indexes and restarting still works to fix it temporarily.

@sniffdk
Contributor

sniffdk commented Oct 2, 2024

Hi @AndyButland
This issue is not resolved yet.
We are still experiencing this issue on Umbraco 13.4.1 and Examine 3.3.0 🤷

This is the error we see in the log:
image

Or is that because we use the SyncedTempFileSystemDirectoryFactory as opposed to the SyncedFileSystemDirectoryFactory mentioned in @Shazwazza 's post ?

Edit: I see now, going through the code, that setting the "LuceneDirectoryFactory" app setting to "SyncedTempFileSystemDirectoryFactory" does indeed actually end up using the SyncedFileSystemDirectoryFactory 👍

@AndyButland
Contributor

I agree... sorry, it seems I inadvertently closed it when referencing this issue in a related PR from another project.

@InfiniteSpirals

Not sure if this will help anyone, but on investigating our own instance of this issue, I stumbled across the fact that the Azure app service plan had been set up with Zone Redundancy, which requires a minimum of 3x instance count. This wasn't made clear when provisioning the service - so the production site was running on multiple instances without proper load balancing configuration. I'm sure this contributed in some way to all the indexing issues we were having. I wonder if others are running on multiple workers without knowing. Checking the umbracoServer table for active instances would show this if so.

@Shazwazza
Contributor

@sniffdk That is a different stack trace than any previous one. invalid deletion count is new to me. If you have a way to replicate this please send me the info. (i.e. a copy of your indexes)

@kows

kows commented Oct 9, 2024

@Shazwazza as requested.

My trace after Azure machine move:
Starts with
"MembersIndex" index is corrupt, a new one will be created

Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 (resource: BufferedChecksumIndexInput(MMapIndexInput(path="C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_eos")))
   at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
   at Lucene.Net.Index.IndexFileDeleter..ctor(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, InfoStream infoStream, IndexWriter writer, Boolean initialIndexExists)
   at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
   at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.GetIndexWriter(Directory mainDir, OpenMode openMode)
   at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.TryGetIndexWriter(OpenMode openMode, Directory luceneDirectory, Boolean createNewIfCorrupt, String indexName, IndexWriter& indexWriter)

Then
An error occurred processing the index batch.

Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 (resource: BufferedChecksumIndexInput(MMapIndexInput(path="C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_eos")))
at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
at Lucene.Net.Index.IndexFileDeleter..ctor(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, InfoStream infoStream, IndexWriter writer, Boolean initialIndexExists)
at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.GetIndexWriter(Directory mainDir, OpenMode openMode)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.TryGetIndexWriter(OpenMode openMode, Directory luceneDirectory, Boolean createNewIfCorrupt, String indexName, IndexWriter& indexWriter)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.TryCreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock, Directory& directory)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Umbraco.Cms.Infrastructure.Examine.ConfigurationEnabledDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass1_0.<.ctor>b__0()
at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
--- End of stack trace from previous location ---
at System.Lazy`1.CreateValue()
at Examine.Lucene.Providers.LuceneIndex.IndexReady()
at Examine.Lucene.Providers.LuceneIndex.PerformIndexItemsInternal(IEnumerable`1 valueSets, CancellationToken cancellationToken)
at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass50_0.<PerformIndexItems>b__0()
at Examine.Lucene.Providers.LuceneIndex.<>c__DisplayClass75_0.<QueueTask>b__0(Task x)

@readingdancer
Contributor

readingdancer commented Oct 9, 2024

Hi @Shazwazza

As always, it's much appreciated that you spend your time working on Examine. The last time the index failed for us was on Oct 7th, and we had the following errors. You can see that three different errors are being logged per request; we are catching all errors and pushing them to Slack, hence the screenshots. I hope it's useful:

image

Stack Trace
at Lucene.Net.Index.SegmentInfos.Read(Directory directory, String segmentFileName)
at Lucene.Net.Index.IndexFileDeleter..ctor(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, InfoStream infoStream, IndexWriter writer, Boolean initialIndexExists)
at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Examine.Lucene.Directories.DirectoryFactoryBase.Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Umbraco.Cms.Infrastructure.Examine.ConfigurationEnabledDirectoryFactory.CreateDirectory...

image

Stack Trace
This is the same as the trace shown above.

image

After this I updated Examine to 3.3.0 and so far, we haven't yet had the index fail again. However from experience this can be a matter of days or a few weeks. If it happens again, I'll try and remember to back up the logs before deleting them.

@Shazwazza
Contributor

@kows the log message:

"MembersIndex" index is corrupt, a new one will be created

is expected if in fact the index is corrupt. The log message is from here https://github.com/Shazwazza/Examine/blob/3e8bf11de0ecef7f77bbbb5cec14a095b37c7bfc/src/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs#L224

When a new one is created and it's empty, Umbraco will re-populate it, but that doesn't kick in until a minute or two after site startup.

I'll revisit test cases, and add more scenarios until I can determine how to replicate in tests. Thanks for the info so far.

@readingdancer are you sure you have upgraded to 3.3.0? The stack trace you have provided doesn't show the stack that should be there, it should look like this on 3.3.0

at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.GetIndexWriter(Directory mainDir, OpenMode openMode)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.TryGetIndexWriter(OpenMode openMode, Directory luceneDirectory, Boolean createNewIfCorrupt, String indexName, IndexWriter& indexWriter)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.TryCreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock, Directory& directory)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)
at Examine.Lucene.Directories.DirectoryFactoryBase.<>c__DisplayClass2_0.<Examine.Lucene.Directories.IDirectoryFactory.CreateDirectory>b__0(String s)

Your stack trace only has

at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
at Examine.Lucene.Directories.SyncedFileSystemDirectoryFactory.CreateDirectory(LuceneIndex luceneIndex, Boolean forceUnlock)

@Shazwazza
Contributor

@kows + @readingdancer + everyone - some good news. I managed to replicate another issue, which is now fixed. I can't exactly replicate the one with the 'invalid deletion count', but I think it is directly related to the new fix.

I've published a beta here https://www.nuget.org/packages/Examine/3.3.1-beta.1

If anyone can please test, that would be great. This version will end up being 3.4.0, as it comes with performance improvements too and a small breaking change which reduces the default max results from 500 to 100 (if you need more, then use QueryOptions).

@chrisrandledev

We're also experiencing it with Umbraco v10.8.3, and I note that there is a mention of it in this Lucene issue, moved to GitHub here

@Shazwazza
Contributor

That Lucene issue is only talking about the description of the error. Has anyone tested the beta release I mentioned above? Without feedback and testing, there is no way I can know if this is resolving the problem. In that case, I'll assume that no feedback means success and will publish the release.

@Shazwazza
Contributor

Examine 3.4.0 has been published https://github.com/Shazwazza/Examine/releases/tag/v3.4.0

@somoreingold

@Shazwazza thanks!!! I've had the beta running for a bit, but the nature of this issue is that it's very sporadic/intermittent. So I think only time will tell if your edits fixed it. Not sure if it's better to keep this issue open for a while, or close it and have people report again if the issue still persists. Thoughts?

@Shazwazza
Contributor

Sure, I think if folks can update to 3.4.0 and see how it goes, then report back here. With any luck maybe we can close this in November.

@tommilleruk

Any update on this? We're experiencing the issue on Umbraco Cloud CMS Version 12.2.0

@Shazwazza
Contributor

@tommilleruk yes, see above. Upgrade to examine 3.4.0 and tell us how it goes.

@somoreingold

Just upgraded today, I'll report back if i see issues. So far the only thing to note is making sure to change any code that may be affected by the described breaking change (default result total decreased to 100 max).

@binraider

binraider commented Nov 25, 2024

I have just pushed the 3.4.0 upgrade to production and it has not fixed the problem. I am thinking of changing the target Azure web apps from Windows to Linux to see if that helps.
