Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 #16163
Comments
Hi there @konius! Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better. We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.
We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions. Thanks, from your friendly Umbraco GitHub bot 🤖 🙂 |
We've been dealing with this issue on Umbraco 10.6.1. It has been impossible to replicate in development environments but does present itself on high-traffic client sites with a lot of content. |
Might be related to |
Had a similar issue on Umbraco Cloud using 10.8.5. The workaround was to delete the indexes via Kudu and restart the site - not ideal |
Just started seeing this after we rolled out our upgrade to 13.x. Strangely it's been working fine in App Service for over a year. |
Same problem here.
|
still happens on 13.3.0, single instance app service. |
We have the same issue on a client site running on 10.8.5. One thing I think would be a great improvement would be for the Umbraco back office to actually respond when an error like this occurs. I looked into this the last time it happened, and either the API just doesn't reply, or it replies with an error. Either way, the UI just continues to look like it's waiting to load the page. It would be much better if this was handled and an appropriate error message was displayed, like:
Or words to that effect ;-) |
We are running into this problem as well on a 13.1.0 installation running on a single Azure App Service. We can't even use the api to rebuild the index, as that will throw the same "invalid deletion count" error. This should definitely have a higher priority in getting fixed 🙏 |
Just adding my 2 cents - Same issue continuing on 13.3.2. Running on Azure App Service (free plan), Azure SQL, Azure storage account for media & imagesharp stuff. |
same problem Umbraco v12.3.10 on Umbraco Cloud, ContentDeliveryAPI Index gets corrupted. |
We have a similar problem on Umbraco 8, 10 and 13 when trying to enter Examine Management. The solution is to delete the TEMP folder in Kudu, restart the project and rebuild InternalIndex. It's a quick fix but not a long-term solution. For some of our customers, the problem recurs at weekly intervals. |
Experiencing this constantly running Umbraco 13.3.2 deployed to Azure using the recommended config. It's especially troublesome as we don't have direct access to every customer's infrastructure. |
We're having this issue as well, and it seems to be random. We had it a few weeks ago, and removing the index files and restarting the webapp seemed to resolve it. However, we're now having the same issue again: we've released some features and updates (same Umbraco version, 12.3.7) and we're suffering again. This time, removing the index files and restarting has had no effect. Unfortunately our client relies on the member search because we have a custom index and searcher with member properties they need to search on - the standard searcher doesn't search on custom properties btw. We're seeing different errors.
|
Deleting all of the Examine indexes and restarting the webapp restored the indexes correctly. |
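The workaround several commenters describe (stop the site, delete the Examine index folders, restart so the indexes rebuild) could be scripted roughly as follows. This is only a sketch: the function name `clear_examine_indexes` is hypothetical, and the folder layout is assumed from the path quoted elsewhere in this thread (`...\umbraco\Data\TEMP\ExamineIndexes\MembersIndex`). It does not address the underlying corruption.

```python
import shutil
from pathlib import Path
from typing import List


def clear_examine_indexes(index_root: Path) -> List[str]:
    """Delete each index folder directly under index_root.

    Returns the names of the folders that were removed. The root folder
    itself is kept so the application can recreate its indexes on restart.
    """
    removed: List[str] = []
    if not index_root.is_dir():
        # Nothing to clean up (wrong path, or indexes already gone).
        return removed
    for child in sorted(index_root.iterdir()):
        if child.is_dir():
            shutil.rmtree(child)  # removes segment files, lock files, etc.
            removed.append(child.name)
    return removed
```

On an Azure App Service this would be run from Kudu with the site stopped, for example against `C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes` (an assumed location; check where your own indexes are stored before deleting anything).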
@seanrockster good to hear But this seem to be a very common problem, would love to get some kind of HQ feedback here 😅 |
@seanrockster this is a known quick fix, but it doesn't solve the underlying problem, which has been a documented issue for a long time, and there's been little to no formal communication from Umbraco on this. |
Tagging @nul800sebastiaan 🙈 |
We got this on Umbraco Cloud and CMS 9.5.4. Deleting the Examine indexes fixed it, but we're waiting for a permanent fix. |
We have experienced this on 8.18.14, 10.8.6 and 13.3.2. |
Also happening on 13.3.1, hosted in azure app service with azure sql db EDIT: |
Hey y'all, we are aware that these issues exist, but we have strong suspicions these things are popping up due to misconfiguration of Azure web apps/load balancing/slot swapping. To be able to determine whether these issues are related to misconfiguration/bugs in Examine/bugs in Umbraco, we are trying to build a troubleshooting guide; this will take some time still. In the meantime I advise you to read up on the issue @kevinstampe linked over on the Examine repo (Shazwazza/Examine#382 (comment)). You can check a few of these configurations with @warrenbuckley's RuntimeValidators. Hope to get back to you soon. |
@Migaroez As @paulsterling writes here: Shazwazza/Examine#382 (comment) this is also an issue on a default Umbraco Cloud v. 13.4.0 configuration. |
+1 for another site hosted in Umbraco Cloud encountering this same problem, Umbraco v13.3.2. |
Okay everyone, as @Shazwazza mentioned on Shazwazza/Examine#382. This could be due to:
Cheers! |
We've upgraded to 13.4 and we are still having this issue. This is impacting our business, not good. @UmbracoHQ |
Its logging this error every second, and the members index refuses to rebuild. System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\MembersIndex\segments_3'. |
@Shazwazza that didn't work and brought the site down with some error - i'll check the log for the exact error but had to revert to TempFileSystemDirectoryFactory |
Not sure why but I think the db was timing out and that was the cause of the 500; I'm getting a timeout when I run the app via Kudu. However after 4-5 restarts the app starts again. That did not happen on v12, as we restarted it often to rebuild the member indexes. The Azure db is S6 400 DTU and it smashes it, maxing out the DTUs while starting. |
@seanrockster Yes there is a reason why the DB will get smashed. I've spoken about this issue at length in a couple Code Garden presentations and is also why I created ExamineX so that you can have your indexes persisted in a managed service instead of in local Lucene files, this avoids index rebuilds, issues with Lucene files in Azure, etc... To re-cap this #15783 (comment):
The problem with SyncedFileSystemDirectoryFactory is that this implementation doesn't take into account what happens if the index files in your main storage become corrupted which can happen for a number of reasons - misconfiguration, network latency, process termination, etc... I've been helping the Umbraco team with a fix for this and will publish an Examine release for that next week. |
Ohh my, this is such a mess 😬 @Shazwazza thank you for taking the time to help out on this 🙏 Perhaps ExamineX should have a better bundling option on Cloud setups going forward 🤔 |
Yes...that's my vote > to offer Examine X as part of Umbraco Cloud! It does solve the fundamental issues with Lucene and Azure App Services. Whatever is done with current Examine will be a workaround at best. I've tagged that as a request in umbraco/Umbraco.Cloud.Issues#110 |
The updates for SyncedFileSystemDirectoryFactory can be found here Shazwazza/Examine#387. Essentially, this will just allow the site to recover if the main or local index has become corrupt for whatever reason. It may mean that index rebuilds occur in these scenarios but at least the site will boot up. I've also added an option (not enabled by default) to run a repair on the index. This may result in documents being deleted, but in some cases, those documents may have been legitimate deletions before the index was committed. That would mean that index rebuilds don't occur, at the cost of potential document loss in the index - which is why I'll keep it optional. Hoping to get a release out this week. |
@Shazwazza I see it hasn't been released yet, do you have an update as to when you expect to have this fix deployed? We have a customer that has been having corrupt index errors for over a year now without us being able to fix the problem. Lots of headache. This would be a lifesaver for us. |
Examine 3.3.0 has been published, release notes are here https://github.com/Shazwazza/Examine/releases/tag/v3.3.0 |
We're still experiencing this issue having just updated to U13.5.1, our understanding was that Examine 3.3.0 was part of Umbraco 13.5. |
Hi @AndyButland This is the error we see in the log: Or is that because we use the SyncedTempFileSystemDirectoryFactory as opposed to the SyncedFileSystemDirectoryFactory mentioned in @Shazwazza 's post ? Edit: I see now, going through the code, that setting the "LuceneDirectoryFactory" app setting to "SyncedTempFileSystemDirectoryFactory" does indeed actually end up using the SyncedFileSystemDirectoryFactory 👍 |
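To confirm which directory factory a deployment is actually configured with, the setting can be read straight from the app settings file. The sketch below assumes the setting lives under `Umbraco:CMS:Examine:LuceneDirectoryFactory` in `appsettings.json` (the section path is an assumption taken from the Umbraco Azure hosting docs, the thread itself only names the `LuceneDirectoryFactory` app setting), and the helper name is hypothetical.

```python
import json
from typing import Optional

# Assumed configuration path; adjust if your settings are structured differently.
SETTING_PATH = ("Umbraco", "CMS", "Examine", "LuceneDirectoryFactory")


def lucene_directory_factory(appsettings_text: str) -> Optional[str]:
    """Return the configured LuceneDirectoryFactory value, or None if unset."""
    node = json.loads(appsettings_text)
    for key in SETTING_PATH:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else None
```

Note that on Azure the same key can also be supplied as an App Service application setting, which overrides the file, so a `None` result here does not prove the setting is absent at runtime.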
I agree... sorry, it seems I inadvertently closed it when referencing this issue in a related PR from another project. |
Not sure if this will help anyone, but on investigating our own instance of this issue, I stumbled across the fact that the Azure app service plan had been set up with Zone Redundancy, which requires a minimum of 3x instance count. This wasn't made clear when provisioning the service - so the production site was running on multiple instances without proper load balancing configuration. I'm sure this contributed in some way to all the indexing issues we were having. I wonder if others are running on multiple workers without knowing. Checking the umbracoServer table for active instances would show this if so. |
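The check suggested above (looking in the `umbracoServer` table for more active instances than expected) might look like the sketch below. The column names `computerName`, `address` and `isActive` are assumptions about the table schema and should be verified against your own database; `sqlite3` is used only so the example is self-contained, and against Azure SQL the same query would be run through whatever client you normally use.

```python
import sqlite3
from typing import List, Tuple


def active_servers(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """List (computerName, address) for every server row marked active.

    Assumes an umbracoServer table with computerName, address and
    isActive columns (schema not confirmed by this thread).
    """
    rows = conn.execute(
        "SELECT computerName, address FROM umbracoServer "
        "WHERE isActive = 1 ORDER BY computerName"
    ).fetchall()
    return [(name, address) for name, address in rows]
```

If this returns more than one row for a site that is meant to run as a single instance, that would be consistent with the unintended multi-worker setup described in the comment above.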
@sniffdk That is a different stack trace than any previous one. |
@Shazwazza as requested. My trace after Azure machine move:
Then
|
Hi @Shazwazza As always it's much appreciated that you spend your time working on Examine. The last time the index failed for us was on Oct 7th and we had the following errors. You can see three different errors are being logged per request; we are catching all errors and pushing them to Slack, hence the screenshots. I hope it's useful: Stack Trace Stack Trace After this I updated Examine to 3.3.0 and so far we haven't had the index fail again. However, from experience this can be a matter of days or a few weeks. If it happens again, I'll try to remember to back up the logs before deleting them. |
@kows the log message:
is expected if in fact the index is corrupt. The log message is from here https://github.com/Shazwazza/Examine/blob/3e8bf11de0ecef7f77bbbb5cec14a095b37c7bfc/src/Examine.Lucene/Directories/SyncedFileSystemDirectoryFactory.cs#L224 When a new one is created and it's empty, Umbraco will re-populate it, but that doesn't kick in until a minute or two after site startup. I'll revisit test cases, and add more scenarios until I can determine how to replicate in tests. Thanks for the info so far. @readingdancer are you sure you have upgraded to 3.3.0? The stack trace you have provided doesn't show the stack that should be there; it should look like this on 3.3.0
Your stack trace only has
|
@kows + @readingdancer + everyone - some good news. I managed to replicate another issue, which is now fixed. I can't exactly replicate the one with the 'invalid deletion count' but I think it is directly related to the new fix. I've published a beta here https://www.nuget.org/packages/Examine/3.3.1-beta.1 If anyone can please test, that would be great. This version will end up being 3.4.0 as it comes with performance improvements too and a small breaking change which reduces the default max results from 500 to 100 (if you need more, then use QueryOptions) |
We're also experiencing it with Umbraco v10.8.3, and I note there is mention of it in this Lucene issue, moved to GitHub here |
That Lucene issue is only talking about the description of the error. Has anyone tested the beta release I mentioned above? Without feedback and testing, there is no way I can know if this is resolving the problem. In that case, I'll assume that no feedback means success and will publish the release. |
Examine 3.4.0 has been published https://github.com/Shazwazza/Examine/releases/tag/v3.4.0 |
@Shazwazza thanks!!! I've had the beta running for a bit, but the nature of this issue is that it's very sporadic/intermittent. So I think only time will tell if your edits fixed it. Not sure if it's better to keep this issue open for a while, or close it and have people report again if the issue still persists. Thoughts? |
Sure, I think if folks can update to 3.4.0 and see how it goes, then report back here. With any luck maybe we can close this in November. |
Any update on this? We're experiencing the issue on Umbraco Cloud CMS Version 12.2.0 |
@tommilleruk yes, see above. Upgrade to examine 3.4.0 and tell us how it goes. |
Just upgraded today, I'll report back if i see issues. So far the only thing to note is making sure to change any code that may be affected by the described breaking change (default result total decreased to 100 max). |
I have just pushed the 3.4.0 upgrade to production and it has not fixed the problem. I am thinking of changing the target azure webapps from windows to linux to see if that helps |
Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)
11.3.1
Bug summary
The Examine index becomes corrupt: the Examine dashboard can no longer be viewed or managed, and any content that reads from the index for display purposes comes back empty.
Happens on version 11.3.2, but also on 13.1.1; the only solution is to completely delete the Examine folder and restart the application.
Issue is already discussed on Our.
Specifics
For an unknown reason the index gets corrupt and bricks the back office dashboard.
Application is hosted on Azure and config is applied as per this guide: https://docs.umbraco.com/umbraco-cms/v/10.latest-lts/fundamentals/setup/server-setup/azure-web-apps
Steps to reproduce
N/A
Expected result / actual result
Expected to be able to at least view the dashboard and rebuild indexes if they get corrupt.