Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Nested fields #458

Merged
merged 3 commits into from
Apr 7, 2022
Merged

Nested fields #458

merged 3 commits into from
Apr 7, 2022

Conversation

irevoire
Copy link
Member

@irevoire irevoire commented Mar 1, 2022

For the following document:

{
  "id": 1,
  "person": {
    "name": "tamo",
    "age": 25,
  }
}

Suppose the user sets person as a filterable attribute. We need to store person in the filterable obviously. But we also need to keep track of person.name and person.age somewhere.
That’s where I changed a little bit the logic of the engine.

Currently, we have a function called faceted_field that returns the union of the filterable and sortable.
I renamed this function in user_defined_faceted_field. And now, when we finish indexing documents, we look at all the fields and see if they « match » a user_defined_faceted_field.
So in our case:

  • does id match person: 🔴
  • does person.name match person: 🟢
  • does person.age match person: 🟢

And thus, we insert in the database the following faceted fields: person, person.name, person.age.

The good thing about that solution is that we generate everything during the indexing phase, and then during the search, we can access our field without recomputing too much globbing.


Now the bad thing is that I had to create a new db.

And if that was only one db, that would be ok, but actually, I need to do the same for the:

  • Displayed attributes
  • Attributes to retrieve
  • Attributes to highlight
  • Attribute to crop

@Kerollmops
Do you think there is a better way to do it?
Apart from all the code, can we have a problem because we have too many dbs?

@irevoire irevoire force-pushed the nested_fields branch 4 times, most recently from 64a6154 to 0031e6f Compare March 17, 2022 16:13
@irevoire irevoire force-pushed the nested_fields branch 3 times, most recently from 7e3811d to 1a44e33 Compare March 24, 2022 12:54
@Kerollmops
Copy link
Member

@irevoire irevoire requested a review from Kerollmops April 6, 2022 16:40
@irevoire irevoire marked this pull request as ready for review April 6, 2022 16:41
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that declaring these new databases isn't an issue, I find that this is a good way to do that. I haven't finished my review, I need to continue on the transform.rs file but I prefer publishing that than nothing.

milli/Cargo.toml Outdated Show resolved Hide resolved
milli/src/lib.rs Show resolved Hide resolved
milli/src/update/index_documents/mod.rs Outdated Show resolved Hide resolved
milli/src/update/index_documents/transform.rs Outdated Show resolved Hide resolved
milli/src/update/index_documents/transform.rs Outdated Show resolved Hide resolved
milli/src/update/index_documents/transform.rs Show resolved Hide resolved
@irevoire irevoire added the DB breaking The related changes break the DB label Apr 6, 2022
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please publish the benchmarks you ran recently here? And remove all of the unwraps (or at least the problematic ones), this is probably something that you forgot to change 😃

milli/src/update/index_documents/transform.rs Outdated Show resolved Hide resolved
/// Generate the `TransformOutput` based on the given sorter that can be generated from any
/// format like CSV, JSON or JSON stream. This sorter must contain a key that is the document
/// id for the user side and the value must be an obkv where keys are valid fields ids.
pub(crate) fn output_from_sorter<F>(
self,
wtxn: &mut heed::RwTxn,
progress_callback: F,
_progress_callback: F,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could probably be an issue if we no more call this callback, don't you think? It is even easier to use it now that we have only have one loop, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the call back in the loop that compute the field distribution. Not sure it's the best but it's something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should remove this callback entirely actually? Wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, remove it if you don't use it, it would be better :)

milli/src/update/index_documents/transform.rs Outdated Show resolved Hide resolved
milli/src/update/index_documents/transform.rs Outdated Show resolved Hide resolved
milli/src/update/index_documents/transform.rs Outdated Show resolved Hide resolved
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this PR, it look good to me! Thanks for having added the flatten-serde-json crate in the repo along with the fuzzer! Now, we will see what people think about it in the RC0+ of Meilisearch 😃

bors merge

@bors
Copy link
Contributor

bors bot commented Apr 7, 2022

@bors bors bot merged commit 80ae020 into main Apr 7, 2022
@bors bors bot deleted the nested_fields branch April 7, 2022 17:09
bors bot added a commit that referenced this pull request Apr 7, 2022
487: Update version (v0.26.0) r=Kerollmops a=curquiza

breaking because of #458 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
DB breaking The related changes break the DB
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants