Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review tests for Lucene.Net assembly #259

Open
9 of 11 tasks
clambertus opened this issue Oct 11, 2019 · 19 comments · Fixed by #411, #420, #425, #435 or #414
Open
9 of 11 tasks

Review tests for Lucene.Net assembly #259

clambertus opened this issue Oct 11, 2019 · 19 comments · Fixed by #411, #420, #425, #435 or #414
Labels
Lucene.Net Test pri:high up-for-grabs This issue is open to be worked on by anyone
Milestone

Comments

@clambertus
Copy link

clambertus commented Oct 11, 2019

Occasionally, tests are still being found with important lines that have been commented because the functionality didn't exist at the time they were ported. Additionally, some test conditions have been changed from what they were in Java (although in some cases this is a necessary change due to a difference in platforms and in other cases it is a bug). We need some assurance that none of our green tests are false positives.

A line-by-line review would be best, but at the very least we should be checking that the implementations are complete and the test conditions are the same as in Java Lucene 4.8.0. Checking at the method level to ensure they all exist and have the right attributes has been completed already on Lucene.Net (core).

The tests in question need to be analyzed to ensure:

  1. That the test conditions weren't changed from what they were in Lucene (without some very good reason)
  2. That none of the asserts or parts of the test setup were simply commented out to make the test pass rather than fixing the underlying problem
  3. (for tests other than Lucene.Net (core)): The test methods/classes are not skipped in Lucene.NET unless they were skipped in Lucene or have some known problem

There may be other unforeseen issues with the tests (such as an incorrect translation of the line), as well, which is why a line-by-line comparison would be best.

Some Other Things to Look For

  1. Missing [Test] attributes. Note that in Lucene tests are run by naming convention for any parameterless method named starting with test, but in .NET the attribute is required.
  2. Formatting of arrays over multiple lines
  3. Stripped comments which would provide needed context
  4. Partially implemented classes/tests
  5. Method calls such as Convert.ToString(int) that require an explicit CultureInfo.Invariant parameter to match Java
  6. APIs that accept/return object that are using value types (boxing/unboxing)
  7. APIs that require casting in order to use them, which is something we should strive to solve
  8. APIs that accept/return non-generic collections, which we should strive to make generic

While it isn't the most important part of the task, the tests are also the easiest place to spot usability issues with the API, so if any are discovered (that aren't already marked LUCENENET TODO) we should open new issues for them as well.

The test projects that need review are:

JIRA link - [LUCENENET-632] created by nightowl888

@clambertus clambertus added Lucene.Net Test up-for-grabs This issue is open to be worked on by anyone labels May 5, 2020
@NightOwl888 NightOwl888 added this to the Lucene.Net 4.8.0 milestone May 7, 2020
@rclabo
Copy link
Contributor

rclabo commented Feb 1, 2021

The first paragraph and the 2nd paragraph don't seem quite congruent. Is the primary issue here that some lines of the tests may be commented out because the functionality didn't exist at the time the test was ported. If so, then I suggest the course of action is to visually inspect all tests for the projects listed to ensure no needed lines of code are commented out. If any commented lines are found then uncomment them and run the test to ensure it still passes. If that approach is sufficient for this issue then I'm willing to do that work and this issue can be assigned to me.

@NightOwl888
Copy link
Contributor

@rclabo

I have updated the text of the original task to make it clearer. I have also broken down the task by test project. If necessary, we can further break it down by namespace to divide up the work between multiple people.

Let me know if you are still on board with doing (at least some of) this task.

@rclabo
Copy link
Contributor

rclabo commented Feb 1, 2021

Let me do a dry run on some of the tests in Lucene.Net.Tests._A-D to get a sense of the time requirement for this. Lucene.Net has 7735 tests, which is no small amount. I need some sense of the time it's gonna take before I can know what I can sign up for. I'll report back here after I review some and see what level of effort is required.

@NightOwl888
Copy link
Contributor

I will begin working on the Lucene.Net.Tests.Facet tests.

@NightOwl888 NightOwl888 self-assigned this Feb 4, 2021
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 4, 2021
…riter::TestCommitUserData(): Corrected issue with asserting dictionary contents rather than checking for null on the exception message (apache#259)
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 4, 2021
…Class(): Added missing line to set the merge policy to NoMergePolicy.COMPOUND_FILES (apache#259)
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 4, 2021
…ValueSourceAnonymousInnerClassHelper: Completed implementation of GetHashCode() (apache#259)
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 4, 2021
…hod calls, marked fields readonly on anonymous classes (apache#259)
@NightOwl888
Copy link
Contributor

@rclabo

During #411 I found 2 API issues and 4 bugs (places where the code diverged from Java or were converted incorrectly to .NET). I also timed my progress and it took an average of about 3 minutes per test to do a line-by-line review. For someone less familiar with the project, that number can probably be adjusted to 3.5-4 minutes per test for an estimate of the remaining projects.

NightOwl888 added a commit that referenced this issue Feb 4, 2021
…riter::TestCommitUserData(): Corrected issue with asserting dictionary contents rather than checking for null on the exception message (#259)
NightOwl888 added a commit that referenced this issue Feb 4, 2021
…Class(): Added missing line to set the merge policy to NoMergePolicy.COMPOUND_FILES (#259)
NightOwl888 added a commit that referenced this issue Feb 4, 2021
…ValueSourceAnonymousInnerClassHelper: Completed implementation of GetHashCode() (#259)
NightOwl888 added a commit that referenced this issue Feb 4, 2021
…hod calls, marked fields readonly on anonymous classes (#259)
@NightOwl888 NightOwl888 linked a pull request Feb 4, 2021 that will close this issue
@rclabo
Copy link
Contributor

rclabo commented Feb 4, 2021

@NightOwl888 thanks, that's excellent info. I think I will learn in the test reviewing process. I see that you can check off the test set you reviewed in the list contained in this issue. That's cool.

@NightOwl888
Copy link
Contributor

I will begin working on the Lucene.Net.Tests.Join tests.

@NightOwl888 NightOwl888 removed their assignment Feb 6, 2021
NightOwl888 pushed a commit that referenced this issue Feb 12, 2021
 (#420)

* Reviewed classification tests. Made minor changes and uncommented 1 test and verified it passes.  See issue #259

* Finished my work ;-)
@NightOwl888
Copy link
Contributor

I will begin working on the Lucene.Net.Tests.Queries tests.

@NightOwl888 NightOwl888 self-assigned this Feb 21, 2021
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 22, 2021
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 22, 2021
NightOwl888 added a commit to NightOwl888/lucenenet that referenced this issue Feb 22, 2021
@NightOwl888
Copy link
Contributor

@rclabo - #446 has been merged, so we can resume reviewing Lucene.Net (core) tests.

@rclabo
Copy link
Contributor

rclabo commented May 28, 2021

Thanks for the nudge. Also THANK YOU for the huge work you did in reviewing all the exceptions and the catch clauses to make sure they are faithfully following the behavior of Java Lucene. That was a huge amount of super important work.

@paulirwin
Copy link
Contributor

paulirwin commented Jan 11, 2024

I have begun reviewing the core T-Z tests. So far I've made it through Automaton and it was all good apart from some very minor formatting that I'll PR once complete.

  • Util.Automaton
  • Util.Fst
  • Util.Packed
  • Util B-TestF
  • Util TestI-TestQ
  • Util TestR-TestW

Update 1/12: Reviewed Util.Fst, also mostly a few minor formatting changes
Update 1/12: Reviewed Util.Packed, same
Update 1/15: Reviewed files with names starting with B-TestF in Util
Update 1/17: Reviewed through TestQ in Util
Update 1/18: Finished last segment of tests, PR coming soon

@rclabo
Copy link
Contributor

rclabo commented Jan 11, 2024

@paulirwin Thank you Paul, much appreciated! This is such a great project, so I get excited when I see others jumping in to help. Thank you!

@paulirwin
Copy link
Contributor

paulirwin commented Jan 11, 2024

@rclabo haha I'm jumping in again after a very long absence. My colleagues and I had done a lot of work to finish getting the core of 4.8 ported about 10 years ago, then another group came along with a branch and that ended up being the current work we see today. We used my branch to create an app that supported ~1B document indexes under heavy load in production, so it was pretty rock solid at the time, but was stuck on .NET Framework so I'm happy to see the state of things today and being able to build this on arm64 macOS in .NET 6! I haven't had a need for Lucene.net in a while so I have been pretty quiet the last several years. Now, though, I have a project that is a local emulator for Azure Search which uses 4.8 beta, as well as I'm getting into RAG with LLMs so I'd love to see if I can help move the project along to be able to use the new vector database fields in modern Java Lucene.

The tests we ported at the time can be referenced if desired: https://github.com/paulirwin/lucene.net/tree/branch_4x

@rclabo
Copy link
Contributor

rclabo commented Jan 11, 2024

@paulirwin Super cool. I'm also very interested in LLMs and RAGs. Well, welcome back. Lucene and the community have missed you! :-)

@NightOwl888
Copy link
Contributor

@paulirwin - Thanks for your help on this.

This issue is certainly a good place to start after being inactive for awhile. But do note we have set up a Slack channel #lucenenet-dev where you can review some (more) of the recent activity or discuss other aspects of the project that are off-topic for a specific GitHub issue. The dev mailing list works, also, but I find it is easier to share code on Slack.

Apache docs for Slack

paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 18, 2024
- This addresses some formatting disparities between the Java and C#
tests to help make them more line-to-line matching.
- Comments added in places where our tests differ from upstream.
- A unit test assertion bug on arm64 with division by zero is fixed.
- The `build` file has been chmod'ed +x for ease of use
- Rider IDE files added
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 18, 2024
- This addresses some formatting disparities between the Java and C#
  tests to help make them more line-to-line matching.
- Comments added in places where our tests differ from upstream.
- RandomizedTesting.Generators extension method import moved behind a
  `#if !NET6_0_OR_GREATER` check, as the namespace was being ignored on
  .NET 6+ in a few files since .NET 6 added a Random.NextInt64 method.
  This will help prevent other contributors from thinking this namespace
  is not needed.
- A unit test assertion bug on arm64 with division by zero is fixed.
- The `build` file has been chmod'ed +x for ease of use
- Rider IDE files added
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 22, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 22, 2024
paulirwin added a commit that referenced this issue Jan 22, 2024
* Tests: Review T-Z Tests, #259

- This addresses some formatting disparities between the Java and C#
  tests to help make them more line-to-line matching.
- Comments added in places where our tests differ from upstream.
- RandomizedTesting.Generators extension method import moved behind a
  `#if !NET6_0_OR_GREATER` check, as the namespace was being ignored on
  .NET 6+ in a few files since .NET 6 added a Random.NextInt64 method.
  This will help prevent other contributors from thinking this namespace
  is not needed.
- A unit test assertion bug on arm64 with division by zero is fixed.
- The `build` file has been chmod'ed +x for ease of use
- Rider IDE files added

* Update AssertSorted method to better match Lucene, reduce allocations, #259

* Add feature for .NET 6+ Random methods and change #if to use it, #259

* Fix XML docs for TestBytesRefHash, #259

* Update .editorconfig to use new lines at end of file
@paulirwin
Copy link
Contributor

paulirwin commented Jan 22, 2024

T-Z done and merged, moving backwards to the J-S monster (1,158 tests):

  • Search/Payloads
  • Search/Similarities
  • Search/Spans
  • Search/B-TestB
  • Search/TestC
  • Search/TestD-TestE
  • Search/TestF-TestL
  • Search/TestM-TestN
  • Search/TestP-TestQ
  • Search/TestR-TestSh
  • Search/TestSi-TestSu
  • Search/TestT-TestW
  • Store/TestB-TestH
  • Store/TestL-TestW
  • Support/Codecs
  • Support/Diagnostics
  • Support/Document/Extensions
  • Support/IO
  • Support/Threading
  • Support/Util
  • Support

paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 23, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 25, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 25, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 25, 2024
paulirwin added a commit that referenced this issue Mar 10, 2024
* Formatting and remove extraneous comment in Search/Payloads, #259

* Array formatting in Search/Similarities to match upstream

* Search/Spans formatting cleanup, use singletons for parameterless/captureless anonymous classes

* Code cleanup in Search/B-TestB tests

* Search/TestC code cleanup and allocation improvements

* Search/TestD-TestE code cleanup

* Test review TestF-TestL, #259

* Test review for rest of Search, #259

* Test review of Store, #259

* Finish J-S test cleanup

* Fix .NET FX build failure due to missing import

* Use CompareToOrdinal instead of string.Compare

* Change Array.Empty<T> to Arrays.Empty<T> until #916 is done in a separate PR

* Null-safe disposal of resources in TestSpansAdvanced

* Make TestSpansAdvanced.AssertHits static to match upstream Java code

* Use Arrays.Empty<T> in TestBoolean2 instead of Array.Empty<T>

* Revert static instance of MockScorer in TestCachingCollector

* Revert static instances of NoOpCollector in TestCachingCollector

* Fix seealso cref to CheckHits.CheckNoMatchExplanations

* Fix ticks math in RandomGen.LuceneDate

* Fix XML doc comment paragraph tag in TestFieldCacheRangeFilter

* Revert static instance of AnalyzerAnonymousClass in TestPhraseQuery

* Comment out TestScorerPerf.terms to prevent unused warning

* Remove use of Convert.ToInt32 which boxes in CheckHits

* Specify culture for Convert.ToInt32

* Fix spelling of TransactionalThreadInterrupt class

* Add back redundant override of TestRegexps in case some test runners don't report failures correctly

* Remove unused ConcurrentDictionaryWrapper type
@paulirwin paulirwin mentioned this issue Mar 10, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment