Category filtering weird behavior #302

Closed
nikcio opened this issue Oct 19, 2022 · 1 comment · Fixed by #304

nikcio (Contributor) commented Oct 19, 2022

When creating a query with a category filter such as searcher.CreateQuery("Content"), I get no results back even though my items were indexed with the category Content. I found that this is caused by the capital C: if I query with searcher.CreateQuery("content"), I get the expected results. But shouldn't the first query return the correct results, and not only the second one?

Example:

Let's take a test from the source code:

public void NativeQuery_Single_Word()
{
    var analyzer = new StandardAnalyzer(LuceneInfo.CurrentVersion);
    using (var luceneDir = new RandomIdRAMDirectory())
    using (var indexer = GetTestIndex(
        luceneDir,
        analyzer,
        new FieldDefinitionCollection(new FieldDefinition("parentID", FieldDefinitionTypes.Integer))))
    {
        indexer.IndexItems(new[] {
            ValueSet.FromObject(1.ToString(), "content",
                new { nodeName = "location 1", bodyText = "Zanzibar is in Africa"}),
            ValueSet.FromObject(2.ToString(), "content",
                new { nodeName = "location 2", bodyText = "In Canada there is a town called Sydney in Nova Scotia"}),
            ValueSet.FromObject(3.ToString(), "content",
                new { nodeName = "location 3", bodyText = "Sydney is the capital of NSW in Australia"})
            });

        var searcher = indexer.Searcher;

        var query = searcher.CreateQuery("content").NativeQuery("sydney");

        Console.WriteLine(query);

        var results = query.Execute();

        Assert.AreEqual(2, results.TotalItemCount);
    }
}

If you change the category of the indexed items to cOntent, the test still passes, even without changing the category in the CreateQuery statement:

public void NativeQuery_Single_Word()
{
    var analyzer = new StandardAnalyzer(LuceneInfo.CurrentVersion);
    using (var luceneDir = new RandomIdRAMDirectory())
    using (var indexer = GetTestIndex(
        luceneDir,
        analyzer,
        new FieldDefinitionCollection(new FieldDefinition("parentID", FieldDefinitionTypes.Integer))))
    {
        indexer.IndexItems(new[] {
            ValueSet.FromObject(1.ToString(), "cOntent",
                new { nodeName = "location 1", bodyText = "Zanzibar is in Africa"}),
            ValueSet.FromObject(2.ToString(), "cOntent",
                new { nodeName = "location 2", bodyText = "In Canada there is a town called Sydney in Nova Scotia"}),
            ValueSet.FromObject(3.ToString(), "cOntent",
                new { nodeName = "location 3", bodyText = "Sydney is the capital of NSW in Australia"})
            });

        var searcher = indexer.Searcher;

        var query = searcher.CreateQuery("content").NativeQuery("sydney");

        Console.WriteLine(query);

        var results = query.Execute();

        Assert.AreEqual(2, results.TotalItemCount);
    }
}

But as soon as you change the category parameter in CreateQuery to match the actual category (to CreateQuery("cOntent")), the test fails.

Expected result

I expected the category to be either case sensitive or case insensitive, not forced to lowercase.

Workaround

To work around the issue, I added the field manually with the Field() statement. This seems to make the category identifier case insensitive, meaning that both content and cOntent return the expected results.

Example

public void NativeQuery_Single_Word()
{
    var analyzer = new StandardAnalyzer(LuceneInfo.CurrentVersion);
    using (var luceneDir = new RandomIdRAMDirectory())
    using (var indexer = GetTestIndex(
        luceneDir,
        analyzer,
        new FieldDefinitionCollection(new FieldDefinition("parentID", FieldDefinitionTypes.Integer))))
    {
        indexer.IndexItems(new[] {
            ValueSet.FromObject(1.ToString(), "cOntent",
                new { nodeName = "location 1", bodyText = "Zanzibar is in Africa"}),
            ValueSet.FromObject(2.ToString(), "cOntent",
                new { nodeName = "location 2", bodyText = "In Canada there is a town called Sydney in Nova Scotia"}),
            ValueSet.FromObject(3.ToString(), "cOntent",
                new { nodeName = "location 3", bodyText = "Sydney is the capital of NSW in Australia"})
            });

        var searcher = indexer.Searcher;

        var query = searcher.CreateQuery().Field(ExamineFieldNames.CategoryFieldName, "cOntent").And().NativeQuery("sydney");

        Console.WriteLine(query);

        var results = query.Execute();

        Assert.AreEqual(2, results.TotalItemCount);
    }
}

Shazwazza (Owner) commented

Hi, this all has to do with Lucene analysis. The StandardAnalyzer uses a LowerCase filter, which means that anything that goes into the index for the category field (so long as you haven't specified a custom analyzer for that field) is lowercased when it is analyzed. Analyzers work in the other direction as well: they change text not only on the way into the index but also in your query when it is parsed.
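
The symmetry described here can be illustrated with a toy model. This is not Lucene's or Examine's code: ToyAnalyzer is a hypothetical stand-in that only splits on spaces and lowercases each token, which is far simpler than what StandardAnalyzer really does. It only shows that the same transformation is applied to text on its way into the index and to text in a parsed query.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for an analyzer with a lowercase filter.
// The real StandardAnalyzer does much more than this.
public static class ToyAnalyzer
{
    public static string[] Analyze(string text) =>
        text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.ToLowerInvariant())
            .ToArray();
}

public static class Program
{
    public static void Main()
    {
        // Index time: the category value is analyzed before it is stored.
        string[] indexedTerms = ToyAnalyzer.Analyze("cOntent");

        // Query time: the same analyzer is applied to the query text.
        string[] queryTerms = ToyAnalyzer.Analyze("Content");

        Console.WriteLine(string.Join(",", indexedTerms));          // content
        Console.WriteLine(string.Join(",", queryTerms));            // content
        Console.WriteLine(indexedTerms.SequenceEqual(queryTerms));  // True
    }
}
```

In this model, any casing of the category matches as long as both sides pass through the analyzer.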

So the reason var query = searcher.CreateQuery("content").NativeQuery("sydney"); always works is that even if you index "cOntent", it is analyzed as "content", so this query matches.

CreateQuery("cOntent") fails because the query parser is probably not being used for that query under the hood, whereas the underlying mechanism for .Field(ExamineFieldNames.CategoryFieldName, "cOntent") does use the query parser, so the value ends up as "content".

Essentially, you've found a bug though. The mechanism for searching on category should also probably use the query parser.
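
A minimal sketch of the mismatch, under the assumption (stated as probable in this comment, not confirmed) that the category filter compares the raw string against the index terms while Field() analyzes its value first. ToyIndex and its methods are hypothetical names, and the model covers only the lowercase step of analysis.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy index: models only the lowercase step of analysis.
public sealed class ToyIndex
{
    private readonly HashSet<string> _categoryTerms =
        new HashSet<string>(StringComparer.Ordinal);

    private static string Analyze(string value) => value.ToLowerInvariant();

    // Values are analyzed on the way into the index.
    public void Add(string category) => _categoryTerms.Add(Analyze(category));

    // Models a lookup that bypasses analysis (the assumed cause of the bug).
    public bool MatchesRaw(string category) => _categoryTerms.Contains(category);

    // Models a lookup that analyzes the query value first (the suggested fix).
    public bool MatchesAnalyzed(string category) =>
        _categoryTerms.Contains(Analyze(category));
}

public static class Program
{
    public static void Main()
    {
        var index = new ToyIndex();
        index.Add("cOntent"); // stored as "content"

        Console.WriteLine(index.MatchesRaw("content"));       // True
        Console.WriteLine(index.MatchesRaw("cOntent"));       // False
        Console.WriteLine(index.MatchesAnalyzed("cOntent"));  // True
    }
}
```

In this model, lowercasing the category before the raw lookup has the same effect as the analyzed lookup, but that shortcut would only hold when the category field is analyzed by a lowercasing analyzer.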

nikcio added a commit to nikcio/Examine that referenced this issue Oct 24, 2022
Shazwazza added the bug label Oct 28, 2022
Shazwazza added a commit that referenced this issue Oct 28, 2022
fix: Fixed category not using queryparser (#302)