Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Fixes error message when lat and lng are unparseable #763

Merged

Conversation

ahlner
Copy link
Contributor

@ahlner ahlner commented Jan 15, 2023

Pull Request

Related issue

Fixes partially #3007

What does this PR do?

  • Changes function validate_geo_from_json to return a BadLatitudeAndLongitude if lat or lng is a string and not parseable to f64
  • implemented some unittests
  • Derived PartialEq for GeoError to use assert_eq! in tests

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@ahlner ahlner changed the title Fixes error message when lat and lng unparseable Fixes error message when lat and lng are unparseable Jan 15, 2023
@loiclec loiclec self-requested a review January 18, 2023 08:18
Copy link
Contributor

@loiclec loiclec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ahlner , thank you for the PR :)

After a bit more testing, I realised the bug lies somewhere else than in the validate_geo_from_json function.

On line 98-102, we have the following bit of code, giving us the internal id (u16) of the _geo field:

    // If the settings specifies that a _geo field must be used therefore we must check the
    // validity of it in all the documents of this batch and this is when we return `Some`.
    let geo_field_id = match documents_batch_index.id("_geo") {
        Some(geo_field_id) if index.sortable_fields(rtxn)?.contains("_geo") => Some(geo_field_id),
        _otherwise => None,
    };

But if _geo is only a filterableAttribute (i.e. not a sortableAttribute), then geo_field_id will be equal to None and the validate_geo_from_json function will not be called. This will cause indexing to progress further, until the extract_geo_points function is called (in milli/src/update/index_documents/extract/extract_geo_points.rs) where a second (redundant) verification is performed. This second verification is the one that returns the wrong error message. However, I don't think it's necessary to change anything in extract_geo_points, since it is not supposed to catch any more errors anyway.

The easiest/best fix is probably to replace the condition if index.sortable_fields(rtxn)?.contains("_geo") with one that checks whether the index contains _geo either in sortable_fields or in filterable_fields.

Let me know if you want to amend your PR or if you'd prefer us to make the fix :)

@ahlner
Copy link
Contributor Author

ahlner commented Jan 18, 2023

Hi @loiclec,

thank you for the answer. I would prefer to write a test like this benchmark: benches/indexing.rs and amend this PR with tests and a fix. Is there already a (integration) test for this functionality?

@loiclec
Copy link
Contributor

loiclec commented Jan 18, 2023

That sounds great! An integration test would be very appreciated 😄
I don't think there is one currently. A good place to write it would be in milli/src/index.rs. You can reference meilisearch/meilisearch#3007 in it.
The integration tests usually look like this:

#[test]
fn bug_3007() { // or a more descriptive name
    let index = TempIndex::new();

    index
        .update_settings(|settings| {
            // set the index's settings
        }).unwrap();

    // add some documents. Here you wouldn't unwrap, but check the content of the error instead.
    index.add_documents(documents!({ "a" : "id"})).unwrap();
}

We like to use inline snapshot tests with insta when we can, but you don't have to use it if you don't want to.

Adds a test for #3007: Wrong error message when lat and lng are
unparseable
@ahlner ahlner force-pushed the fix/3007_wrong_error_lat_lng_unparseable branch from 023f926 to 0e1ae75 Compare January 18, 2023 12:25
@ahlner ahlner requested a review from loiclec January 18, 2023 12:25
@ahlner
Copy link
Contributor Author

ahlner commented Jan 18, 2023

Hi @loiclec,

I've reworked the fix, please review again ;)

@ahlner
Copy link
Contributor Author

ahlner commented Jan 18, 2023

The test "update::index_documents::tests::index_all_flavour_of_geo" is failing now. The format of the longitude ({ "id": 0, "_geo": { "lat": 31, "lng": [42] } }) is now unacceptable - validate_geo_from_json is now used and it doesn't accept this format. Either we change for this format validate_geo_from_json or the format is declared technically invalid. Your decision, I'm ready for all crimes. :D

@ahlner
Copy link
Contributor Author

ahlner commented Jan 18, 2023

Another test fails now: update::index_documents::tests::geo_error

UserError(InvalidGeoField(MissingLatitude { document_id: Number(0) })) vs
UserError(InvalidGeoField(MissingLatitude { document_id: String("\"0\"") }))

The first one comes from (the wrong check) extract_geo_points where the document_id is a Value::Number()

Now it comes from validate_geo_from_json where the document_id is a Value::String() from the debug-impl of DocumentId.

@loiclec Can we change the type in [DocumentId](https://github.com/meilisearch/milli/blob/0c7d1f761e5db6d086f27d3f0f47a97c7f4a5f08/milli/src/update/index_documents/enrich.rs#L245) in Value::Number a slice of u16 or whatever? It seems to be a number but is held in this case as a string.

I've found a more convenient way to refactor debug_id to deserialize the id in the same way as if it's loaded (and not auto-generated).

@ahlner ahlner force-pushed the fix/3007_wrong_error_lat_lng_unparseable branch 2 times, most recently from 5ff0178 to 1dcd538 Compare January 18, 2023 21:25
@loiclec
Copy link
Contributor

loiclec commented Jan 19, 2023

Oh, that's interesting! As far as I can tell, it's a bug that:

{ "id": 0, "_geo": { "lat": 31, "lng": [42] } }

is still sometimes accepted (doc link). But I'll ping @gmourier to make sure, and also @irevoire to judge the impact of it on the dump feature. IMO, it's the perfect opportunity to fix it since we are going to release meilisearch v1.0 very soon :-)

Then, after you delete the whole index_all_flavour_of_geo test, the PR is ready to be merged 😄

@ahlner ahlner force-pushed the fix/3007_wrong_error_lat_lng_unparseable branch from 1dcd538 to a647b4d Compare January 19, 2023 09:05
@ahlner
Copy link
Contributor Author

ahlner commented Jan 19, 2023

Ok, perfekt. There was an formatting issue, I've (hopefully) fixed that.

@ahlner ahlner force-pushed the fix/3007_wrong_error_lat_lng_unparseable branch from a647b4d to a2cd721 Compare January 19, 2023 09:10
@curquiza
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jan 19, 2023
@bors
Copy link
Contributor

bors bot commented Jan 19, 2023

Copy link
Contributor

@loiclec loiclec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR! :)
bors merge

@loiclec loiclec added the API breaking The related changes break the milli API label Jan 19, 2023
@irevoire
Copy link
Member

Let me know if I'm wrong @loiclec, but
image

Is not fixed in this PR, right?

@bors
Copy link
Contributor

bors bot commented Jan 19, 2023

@bors bors bot merged commit 3521a3a into meilisearch:main Jan 19, 2023
@ahlner ahlner deleted the fix/3007_wrong_error_lat_lng_unparseable branch January 19, 2023 15:36
@loiclec
Copy link
Contributor

loiclec commented Jan 23, 2023

@irevoire no, it is fixed :)
Screenshot 2023-01-23 at 09 20 08

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API breaking The related changes break the milli API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants