Remove azure directory #226
Conversation
…ob-v1 # Conflicts: # build/build.xml
…ob-v1 # Conflicts: # src/Examine.AzureDirectory/AzureDirectory.cs # src/Examine.AzureDirectory/Properties/AssemblyInfo.cs # src/Examine/Properties/AssemblyInfo.cs
…ons. Refactor to simplify code/ improve code reuse. Change framework to 4.6.1 for azuredirectory, test and demo projects as required by blob storage package
…h namespace. Integration test is not very reliable with the azure storage emulator.
Use IndexReader in lucene searcher - based on Shannon comments, add check sync on create input/output, add lock on rebuilding of cache
…giri-1.11.1' into feature/v1/azure-blob-1-fixes # Conflicts: # build/build.xml
…lob already started
# Conflicts: # build/build.xml # src/Examine.AzureDirectory/AzureDirectoryFactory.cs # src/Examine/Logging/LogEntry.cs # src/Examine/LuceneEngine/Providers/LuceneIndex.cs
update also target framework
# Conflicts: # src/Examine.AzureDirectory/AzureDirectoryFactory.cs
…lob-1-fixes # Conflicts: # src/Examine.Test/App.config # src/Examine.Test/Examine.Test.csproj # src/Examine.Test/packages.config # src/Examine.Web.Demo/Web.config # src/Examine/Examine.csproj # src/Examine/Properties/AssemblyInfo.cs
Thanks, have left a bunch of questions/comments.
@@ -0,0 +1,47 @@
<?xml version="1.0" encoding="utf-8"?>
What is this file doing?
@Shazwazza that was added by @nzdev, if I remember correctly. I will check :)
@Shazwazza I think @nzdev is using some static code analysis software, and the build was failing for him without that file.
It would be good to remove this file
namespace Examine.LuceneEngine.DeletePolicies
{
    public class NoDeletionPolicy : IndexDeletionPolicy
I'm unsure if these new bits are actually needed, versus just using the built-in SnapshotDeletionPolicy. For example, here are some docs (there are plenty more out there) on hot backups with older Lucene, and 4.8 has a built-in Replicator. Just wanted to mention this since SnapshotDeletionPolicy potentially solves these issues: https://freecontent.manning.com/hot-backups-with-lucene/
@Shazwazza that one will actually still remove files, so it will cause an end-of-file exception. So sadly we need to deal with it this way, as we are not making backups; we are actively using the indexes.
But for 4.8 I will look into the Replicator, as it is actually more or less what I am doing now :)
I had a read of that document along with https://cwiki.apache.org/confluence/display/solr/SolrReplication. Using the SnapshotDeletionPolicy sounded like the way to go to me. It allows for hot backups. It wouldn't be a duplex R/W approach; instead, use the on-commit hook to choose how many versions to hold and send to blob storage. At the same time you could queue a message to signal the other instances to sync index files. This way you would not need to check the storage account files on every query.
@nzdev you forgot about one thing: Solr is using that internally. We have a delay between push and pull, so that will cause EOF :) Sadly that approach would be perfect for doing syncTemp or backups, but not live synchronization :)
As long as we don't want to build a messaging system between instances, there is no really good way of doing that based on SnapshotDeletionPolicy, from my point of view ;/
I think it could be made to work, as the SnapshotDeletionPolicy can prevent files from being deleted, so there is no risk of EOF. Each IndexCommit tracks the changed files. We could then write the files plus the list of changes to blob storage and have the replicas/clients either poll for changes or check on index reader creation. The Replicator looks like the way to go for 4.8 though, but ultimately it will be using the same approach as Solr. Have a read of the "How does it work" section of https://cwiki.apache.org/confluence/display/solr/SolrReplication. The delay between push and pull can be handled by versioning segments.gen in blob storage.
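To make the suggestion concrete, here is a minimal sketch of a hot backup using Lucene.NET 3.x's SnapshotDeletionPolicy. This is not code from this PR; the `copyFile` callback and the `HotBackupSketch` wrapper are hypothetical, and it assumes the standard Lucene.NET 3.0.3 API where `Snapshot()` pins a commit's files against deletion until `Release()` is called.

```csharp
// Hypothetical sketch: snapshot the current commit so its files cannot be
// deleted while they are copied (e.g. uploaded to blob storage).
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class HotBackupSketch
{
    public static void CopyCurrentCommit(Directory dir, Action<string> copyFile)
    {
        var snapshotter = new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        using (var writer = new IndexWriter(dir,
                   new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
                   snapshotter, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Files referenced by this commit are now protected from deletion,
            // even if the writer keeps committing in the meantime.
            IndexCommit commit = snapshotter.Snapshot();
            try
            {
                foreach (var fileName in commit.FileNames)
                    copyFile(fileName); // e.g. upload to blob storage
            }
            finally
            {
                snapshotter.Release(); // allow normal file deletion again
            }
        }
    }
}
```

Replicas could then pull the copied files and the commit's file list, which avoids checking the storage account on every query.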
@nzdev hmmm, I did read that, but yeah, it is something that will need further investigation. So far I have a working production site based on the code that is already used in that repo:
https://github.com/bielu/Examine.Directories
So if we make changes there, I guess we will need to modify the remote directory and test again :)
@nzdev @Shazwazza I looked into that more deeply, and there are a few limitations:
- all writers have to be closed when restoring (so, when restoring from blob storage)
- based on point 1, we are still stuck with shuffling folders at restore time (so the event on the Lucene index is still needed)
- hmmm, with NoDeletionPolicy I hit an interesting issue (some documents are still present even though they no longer exist in the newest commit)
I started looking into the snapshot policy as it would make for a much faster restore, since it would only duplicate the last commit, but from what I am seeing we are stuck with the same issue. I will investigate further this weekend, as there might be a better solution.
namespace Examine.LuceneEngine.MergePolicies
{
    public class NoMergePolicy : MergePolicy
See other comment about SnapshotDeletionPolicy. Could that 'just work' by acquiring a snapshot for syncing indexes?
#endif

if (!RunAsync)
{
    var msg = "Indexing Error Occurred: " + e.Message;
    if (e.InnerException != null)
        msg += ". ERROR: " + e.InnerException.Message;
    //if(this._directory is ExamineDirectory examineDirectory && examineDirectory.IsReadOnly)
Do we need this comment?
@Shazwazza will remove that asap :)
}

_writer = null;
Maybe a note here to explain what this is doing.
if (_nrtWriter != null)
{
    if (_directory != null && _nrtWriter.Directory != _directory)
What is this code doing?
@Shazwazza I added a comment, and because you commented I noticed I forgot to update that part to use WriterTracker.Current.MapAsOld( instead of a direct dispose :)
There is a lot of edge-case logic in here to support scenarios that are totally outside the core of Examine, and I worry that these changes will bleed into creating strange underlying issues and code maintenance questions (i.e. why is this code here if it never needs to exist to support what Examine itself is doing?). My other worry is that porting this to 2.0.0 on netcore will be problematic. We don't have any WriterTracker anymore; there is a single writer for an index, and it is managed by the index and disposed when it shuts down. Maybe there's a better way to support all of the changes you want to make by exposing protected properties, so that you can subclass some of these objects to control things the way you want. I would very much prefer not to have code like this in here and instead make the code more extensible, so the logic can be changed outside of this library. Does that make sense?
@Shazwazza in 2.0.0 we don't need that at all, as Lucene 4.8 supports the Replicator, so we don't really need to play around with swapping indexes :). So we can discuss what we can do there :)
MaybeReopen();

}
catch (Exception)
What exception is actually thrown? That's the one that should be caught.
@Shazwazza I think I was actually planning to add a logger there and forgot; will do that soon :)
@Shazwazza I will add logging there, but I can't really remember what exception I hit here 🤦♂️
@@ -41,9 +44,71 @@ public IndexWriter GetWriter(Directory dir, bool throwIfEmpty)

public IndexWriter GetWriter(Directory dir, Func<Directory, IndexWriter> factory)
{
    var resolved = _writers.GetOrAdd(dir.GetLockId(), s => factory(dir));
    return resolved;
    lock (_locker)
_writers is a ConcurrentDictionary, so why do we need another lock? If we do need it because of the logic in MapAsOld, we can probably just make it a plain Dictionary and manage the locking ourselves?
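To illustrate the suggestion, here is a minimal sketch (type and member names are hypothetical, not the actual WriterTracker code) of a tracker that uses a plain Dictionary guarded by one lock. Once several entries must be mutated atomically, a single lock is simpler and safer than mixing a ConcurrentDictionary with an extra lock, and it also guarantees the factory runs at most once per key.

```csharp
// Hypothetical sketch: one lock guards all dictionary access, so GetOrAdd
// and bulk mutations (like a MapAsOld-style sweep) stay consistent.
using System;
using System.Collections.Generic;

public class TrackerSketch<TKey, TValue>
{
    private readonly object _locker = new object();
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        lock (_locker)
        {
            if (!_items.TryGetValue(key, out var value))
            {
                value = factory(key); // runs at most once per key, under the lock
                _items[key] = value;
            }
            return value;
        }
    }

    public bool Remove(TKey key)
    {
        lock (_locker)
        {
            return _items.Remove(key);
        }
    }
}
```

Note that ConcurrentDictionary.GetOrAdd does not give this guarantee: its value factory can be invoked more than once under contention, which matters when the factory creates an IndexWriter.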
@Shazwazza I had some weird issue with just the ConcurrentDictionary, so I added the lock, but I will play around with a plain Dictionary :)
}

public void MapAsOld(Directory dir)
Some code docs explaining what this does would be good.
@Shazwazza will add :)
Comment added, please let me know if you need a more descriptive comment.
Have added a few more comments on code
writer = new ExamineIndexWriter(d, FieldAnalyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
}

if (examineDirectory.GetMergeScheduler() != null)
Minor, but calling a method multiple times shouldn't be done, since that normally incurs some performance cost. Better to store the result of GetMergeScheduler() and reuse it.
    }
}

public ExamineDirectory ExamineDir { get; set; }
Making this settable is problematic; it needs to be read-only and injected.
/// Called on commit
/// </summary>
/// <param name="action"></param>
public void SetOnCommitAction(Action<ExamineIndexWriter> action)
All of these Set* methods mean that you can mutate this instance at runtime, which may be problematic. Since this needs to be injected into the ctor of the index, all of these things should be injected into the ctor of this object too. This is a new object, so we don't need to work around breaking changes and introduce spaghetti code. I feel like this whole class can be simplified: it takes its requirements in the ctor and exposes them as properties (not methods).
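A minimal sketch of that shape (the type and member names here are invented for illustration, not the PR's actual class): everything arrives through the constructor, and the properties have no setters, so the instance cannot be mutated after it has been handed to the index.

```csharp
// Hypothetical sketch: an immutable options object. Dependencies come in
// via the ctor and are exposed as get-only properties, so there is no
// Set* surface to mutate at runtime.
using System;

public class WriterOptionsSketch
{
    public WriterOptionsSketch(Action<object> onCommit, bool isReadOnly)
    {
        OnCommit = onCommit ?? throw new ArgumentNullException(nameof(onCommit));
        IsReadOnly = isReadOnly;
    }

    // Get-only: fixed at construction, no SetOnCommitAction needed.
    public Action<object> OnCommit { get; }
    public bool IsReadOnly { get; }
}
```

Anything that genuinely must vary per call can be a parameter of the method that uses it, rather than mutable state on the options object.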
Any thoughts on this?
@Shazwazza I am going to look into that soon. I need to figure out a better way of handling this, as I also need to introduce a background queue and figure out the best way of handling that :)
I think at this point I wouldn't be able to progress with this approach without some sort of distributed log, which I would say is more of a server implementation issue (think Solr). I'd suggest closing this PR.
@nzdev yeah, it is something to close.
@Shazwazza as agreed, the PR has been changed to remove the Azure directory. I am going to create a new repository, Examine.Directories, today, and I will play there with the Azure and S3 directories :)