[Feature Request] Local Copy of the Open Food Facts (OFF) DB #604
Comments
What's TF-IDF?
Sorry, that's Term Frequency - Inverse Document Frequency. It's probably a good starting point for ranking in a situation like this: it gives each item matching a search query a score, and it incorporates a notion of how valuable each search term is (if multiple search terms are provided). It may not play so nicely with typeahead, but it's probably a good initial direction to investigate.
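For anyone who wants something concrete, here is a minimal sketch of the TF-IDF ranking described above, applied to product names. It is only an illustration, not what Waistline or OFF actually does: the function names (`tokenize`, `build_index`, `search`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(names):
    # One term-count table per product name, plus document frequencies:
    # df[term] = number of names that contain the term at least once.
    docs = [Counter(tokenize(name)) for name in names]
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    return docs, df


def search(query, names, top_k=5):
    docs, df = build_index(names)
    n = len(docs)
    scored = []
    for name, doc in zip(names, docs):
        length = sum(doc.values()) or 1
        total = 0.0
        for term in tokenize(query):
            if term in doc:
                tf = doc[term] / length                        # term frequency
                idf = math.log((1 + n) / (1 + df[term])) + 1   # smoothed IDF
                total += tf * idf
        if total > 0:
            scored.append((total, name))
    # Highest score first; rare query terms contribute more than common ones.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


products = ["Oat milk", "Whole milk", "Oat bran cereal", "Dark chocolate"]
print(search("oat milk", products))
```

Because every query term has to match a whole token, a half-typed word scores zero, which is the typeahead limitation mentioned above; prefix matching would need to be layered on top.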
I don't like the idea of bloating the APK. I'd rather any offline DB was part of a separate download that the user can opt into. Apart from that it all sounds like a good idea to me.
Got it - I'd avoided that because then we'd need to host it. However, perhaps the OFF data manipulation + output could live in a separate GitHub project. That way GitHub hosts the output, and the app could download it from that hosted URL. I'll try to play around with this when I get a chance.
To me, the biggest reason why the OFF search in Waistline is so bad, and why I rarely use it, is the data quality and quantity issues. You only find what you're looking for if (a) the item exists in the OFF database at all, and (b) the product name or brand in OFF actually matches the one printed on the item (i.e. the search term you enter). All too often, these are not a given. That's why I always prefer to search for products via the barcode, and only resort to text-based search if it's absolutely necessary. I really like your idea, but I don't think it would fix the underlying problem of the OFF database. Sorry for being so pessimistic, but I'm afraid this whole thing will turn out to be a huge amount of work that in the end leads to no substantial improvement in the user experience.
By the way, it looks like the OFF search API has a |
I use the search all the time, I don't think it's that bad. Sometimes the data isn't quite right but that's the same when you scan a barcode. I agree that an offline database isn't going to make a huge difference overall, but if someone else wants to do it and it doesn't negatively impact my use of the app then I'm happy to include it. We're already using
Sure, if someone else wants to implement it and it's an optional download, then it's fine. I'm not advocating against this being added. But I do think that it would require a lot of effort, and the result might not actually be too different from what we have now, so you have been warned ;) Also, there are a few problems with the approach outlined in the original comment:
Oh, that's interesting. But it's currently set to |
Yes I think I would want an option to disable this feature, although it could also be used to typeahead in your local DB as well as the local OFF DB.
Yeah we can play around with it.
That's a very valid concern, and I agree with the risk. I built offline search functionality, and was able to see a real improvement for my queries. However, the technical complexity may not be worth it. I'm treating this as an experiment with a high chance of failure :)
Great points - given this would be an optional download, size becomes less of a concern. I'm investigating an alternative approach - having a much larger file that contains everything needed for both searching and the item detail view (i.e. all nutritional information). This avoids the complexity of a second fetch. The data grows to ~160MB (after stripping out unneeded fields, using Parquet, and gzipping). Large, but probably acceptable for people who want this functionality. I'll continue playing around with this and see how the integration would look.
+1, I think that's a great idea.
Quick update here - I've been playing around with getting a local index, but Cordova doesn't make it easy. Since we want to do something smart, IndexedDB isn't really sufficient for our needs. I'm instead looking into creating a new OFF API that would support this use case. It would also hopefully help other developers. I'll circle back here if I have any luck. However, for the moment I'll close this issue.
ahaha :-)
@teolemon - ahaha, small world! :) For anyone else following along - the new API is a WIP at: https://github.com/openfoodfacts/openfoodfacts-search
@teolemon Tell me more
People are pissed, cf Twitter
That's a crazy move, I expect they'll make a u-turn on it. Hopefully more users will move to free alternatives like Waistline.
I will certainly tell my friends about the My Fitness Pal Apocalypse and use that to pitch Waistline to them 😄
Hey, this appears to be not actively developed, but I would like to add (as a user):
FYI, the new search API is now live at https://search.openfoodfacts.org/docs
Is this a breaking change for existing apps?
The two existing APIs will co-exist, but it's recommended to update.
I suspect one feature that is holding back adoption of Waistline is the search functionality. The OFF Search API has a few limitations:
We could solve these concerns through an offline version of the DB.
My main concern with doing this is the increase in APK size. I've written a simple Python script that takes the OFF CSV and populates a smaller CSV with only the necessary fields (product name, serving size, calories - only what's displayed in the search results view). It does this only for items that contain calories + serving size, resulting in ~500k items. Compressed, this is ~10MB, but it will grow over time as the DB grows.
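To make the filtering step concrete, here is a rough sketch of that kind of script. It is not the script I actually ran: the function name `slim_export` is made up for illustration, and the column names (`code`, `product_name`, `serving_size`, `energy-kcal_100g`) and the tab delimiter are assumptions about the OFF export that should be checked against the real file.

```python
import csv
import io

# Assumed OFF export column names; verify against the actual header row.
KEEP = ["code", "product_name", "serving_size", "energy-kcal_100g"]
REQUIRED = ["serving_size", "energy-kcal_100g"]


def slim_export(src, dst, delimiter="\t"):
    """Copy only the KEEP columns for rows where every REQUIRED field is set.

    src and dst are text file objects; returns the number of rows written.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.DictWriter(
        dst, fieldnames=KEEP, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        # Skip products that lack calories or a serving size.
        if all((row.get(col) or "").strip() for col in REQUIRED):
            writer.writerow({col: (row.get(col) or "").strip() for col in KEEP})
            kept += 1
    return kept


# Tiny in-memory example standing in for the real export.
sample = io.StringIO(
    "code\tproduct_name\tserving_size\tenergy-kcal_100g\tbrands\n"
    "1\tOat milk\t250 ml\t45\tX\n"
    "2\tMystery item\t\t\tY\n"
)
out = io.StringIO()
print(slim_export(sample, out), "row(s) kept")
```

For the real export, the same function can be given files opened with `open(path, newline="", encoding="utf-8")`, and the output can be written through `gzip.open(path, "wt", newline="", encoding="utf-8")` to get the compressed file.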
Detailed nutritional information could be populated through a separate call if the item detail view is displayed, or the item is added to the diary.
There are some downsides:
However, the improved search functionality seems worth those downsides.
If there's sufficient interest here, I can try putting together a PR when I can find some spare time (but I can't commit to it right now).
The work here is non-trivial and would appear to involve: