This issue was moved to a discussion.

Define the scope of offline data #2461

Closed
1 task
teolemon opened this issue Jun 30, 2022 · 5 comments

Comments

@teolemon
Member

teolemon commented Jun 30, 2022

What

  • On the old iOS version, we stored: product name, brand, nutriscore, nova, ecoscore for every product we have in the db.
  • Define the scope of offline data for the new version, keeping in mind that we should probably let the user decide, with a sane default

Per @AshAman999's computation in #2447

All 2.4M barcodes

My estimates were about the same here as well (17 MB). Besides https://fr.openfoodfacts.org/api/v2/search?fields=code&page_size=100, I thought of:
  • Creating a server-side solution to dump all barcodes
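
To see why a server-side dump is attractive, here is a rough sketch of what fetching every barcode through the search endpoint above would involve. The `page` query parameter and the helper names are assumptions made for illustration; only the base URL, `fields=code` and `page_size=100` come from the comment.

```python
import math
from urllib.parse import urlencode

# Base URL taken from the comment above.
BASE_URL = "https://fr.openfoodfacts.org/api/v2/search"


def page_url(page: int, page_size: int = 100) -> str:
    """Build the URL of one result page ("page" is an assumed parameter)."""
    query = urlencode({"fields": "code", "page_size": page_size, "page": page})
    return f"{BASE_URL}?{query}"


def pages_needed(total_products: int, page_size: int = 100) -> int:
    """Number of requests needed to list every barcode once."""
    return math.ceil(total_products / page_size)
```

With 2.4M barcodes and 100 results per page, `pages_needed(2_400_000)` comes to 24,000 requests, which is the kind of cost a single server-side dump would avoid.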

Products (everything including KP, compressed)

Around 7 kB for each product
75 MB for 10k products
750 MB for 100k products

Images

https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.100.jpg
https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.full.jpg

107 MB for 10k (front image)
https://squoosh.app/editor
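
As a back-of-the-envelope check, the figures above can be turned into per-item sizes and scaled to other product counts. This is only a sketch: decimal units (1 MB = 1,000,000 bytes) are assumed, and the function names are made up for illustration.

```python
def bytes_per_item(total_mb: float, item_count: int) -> float:
    """Average size of one item, assuming 1 MB = 1,000,000 bytes."""
    return total_mb * 1_000_000 / item_count


def total_mb(bytes_each: float, item_count: int) -> float:
    """Total size in MB for item_count items of bytes_each bytes."""
    return bytes_each * item_count / 1_000_000


# Figures quoted in the thread.
per_barcode = bytes_per_item(17, 2_400_000)  # barcode list
per_product = bytes_per_item(75, 10_000)     # compressed product data
per_image = bytes_per_item(107, 10_000)      # front image
```

Under these assumptions, the quoted totals imply roughly 7 bytes per barcode, 7.5 kB per product and 10.7 kB per front image, so images would dominate unless they are excluded or recompressed.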

Part of

@monsieurtanuki
Contributor

@teolemon That would mean extracting those product fields from the server:

  • NAME
  • BRANDS
  • BARCODE
  • ATTRIBUTE_GROUPS

That can be implemented in pure SQL with the following tables:

-- SQLite syntax is assumed here
create table offline_product(
    id integer primary key autoincrement,
    barcode text unique not null,
    brands text,
    name text);

create table offline_attribute(
    id integer primary key autoincrement,
    -- attribute identifier, e.g. "nutriscore"
    text_id text unique not null);

create table offline_product_attribute(
    product_id integer not null references offline_product(id),
    attribute_id integer not null references offline_attribute(id),
    score real not null,
    primary key (product_id, attribute_id));

Or something more compact, like a dedicated table with all the attributes as columns.
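
A minimal sketch of how the three-table layout could be exercised, using Python's built-in `sqlite3` module with an in-memory database. SQLite itself, the helper names, the barcode and the scores are all assumptions for illustration; only the table and column names come from the comment above.

```python
import sqlite3

# Same three tables as above, written in SQLite syntax (an assumption).
SCHEMA = """
create table offline_product(
    id integer primary key autoincrement,
    barcode text unique not null,
    brands text,
    name text);
create table offline_attribute(
    id integer primary key autoincrement,
    text_id text unique not null);
create table offline_product_attribute(
    product_id integer not null references offline_product(id),
    attribute_id integer not null references offline_attribute(id),
    score real not null,
    primary key (product_id, attribute_id));
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the offline schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_product(db, barcode, brands, name, scores):
    """Insert one product and its attribute scores (hypothetical helper)."""
    cur = db.execute(
        "insert into offline_product(barcode, brands, name) values (?, ?, ?)",
        (barcode, brands, name),
    )
    product_id = cur.lastrowid
    for text_id, score in scores.items():
        # Register the attribute once, then look up its id.
        db.execute(
            "insert or ignore into offline_attribute(text_id) values (?)",
            (text_id,),
        )
        (attribute_id,) = db.execute(
            "select id from offline_attribute where text_id = ?", (text_id,)
        ).fetchone()
        db.execute(
            "insert into offline_product_attribute values (?, ?, ?)",
            (product_id, attribute_id, score),
        )
    return product_id


def scores_for(db, barcode):
    """Return {attribute text_id: score} for one barcode."""
    rows = db.execute(
        """select a.text_id, pa.score
           from offline_product p
           join offline_product_attribute pa on pa.product_id = p.id
           join offline_attribute a on a.id = pa.attribute_id
           where p.barcode = ?""",
        (barcode,),
    )
    return dict(rows.fetchall())
```

For example, after `add_product(db, "0000000000000", "Some brand", "Some product", {"nutriscore": 80.0})`, calling `scores_for(db, "0000000000000")` returns the stored scores without decoding any JSON.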

The thing is, caching tons of products locally is a good idea, but you'll get very poor performance if you keep JSON there. What would a typical query be?
For the moment we're dumb in Smoothie: we just ask for a barcode and we get the corresponding JSON product. The barcode is the primary key, fair enough.
What's the purpose of the offline database? If it's the same getProductFromBarcode, we can keep JSON.
If it's "get me other products from the same brand / the same category / that suit me better", we need to create other table columns. If we don't, each query will have to JSON-decode the whole database.
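
The trade-off can be sketched with two hypothetical tables: one storing the raw JSON keyed by barcode, the other storing a few fields as columns. Table names, helper names and the sample products are invented for illustration, and the JSON keys `code`, `brands` and `product_name` are assumed field names.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Variant 1: one JSON blob per barcode.
db.execute("create table product_json(barcode text primary key, json text not null)")
# Variant 2: dedicated, indexable columns.
db.execute("create table product_columns(barcode text primary key, brands text, name text)")
db.execute("create index product_columns_brands on product_columns(brands)")

# Hypothetical sample data.
SAMPLE = [
    {"code": "0000000000001", "brands": "Brand A", "product_name": "Cookies"},
    {"code": "0000000000002", "brands": "Brand B", "product_name": "Juice"},
    {"code": "0000000000003", "brands": "Brand A", "product_name": "Crackers"},
]
for p in SAMPLE:
    db.execute("insert into product_json values (?, ?)", (p["code"], json.dumps(p)))
    db.execute(
        "insert into product_columns values (?, ?, ?)",
        (p["code"], p["brands"], p["product_name"]),
    )


def product_from_barcode(barcode):
    """Primary-key lookup: decodes a single JSON blob."""
    row = db.execute(
        "select json from product_json where barcode = ?", (barcode,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def same_brand_json(brand):
    """Brand query on the JSON table: every row must be decoded."""
    codes = []
    for (blob,) in db.execute("select json from product_json"):
        product = json.loads(blob)
        if product["brands"] == brand:
            codes.append(product["code"])
    return sorted(codes)


def same_brand_columns(brand):
    """Brand query on the column table: the index does the filtering."""
    rows = db.execute(
        "select barcode from product_columns where brands = ? order by barcode",
        (brand,),
    )
    return [code for (code,) in rows]
```

Both brand queries return the same barcodes, but `same_brand_json` touches and decodes every stored product, while `same_brand_columns` lets the database filter on an indexed column.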

We would be ignoring these fields:

  • NUTRISCORE (duplicated with ATTRIBUTE_GROUPS)
  • FRONT_IMAGE
  • IMAGE_FRONT_SMALL_URL
  • IMAGE_FRONT_URL
  • IMAGE_INGREDIENTS_URL
  • IMAGE_NUTRITION_URL
  • IMAGE_PACKAGING_URL
  • SELECTED_IMAGE
  • QUANTITY
  • SERVING_SIZE
  • STORES
  • PACKAGING_QUANTITY
  • PACKAGING
  • PACKAGING_TAGS
  • PACKAGING_TEXT_IN_LANGUAGES
  • PACKAGING_TEXT_ALL_LANGUAGES
  • NO_NUTRITION_DATA
  • NUTRIMENTS
  • NUTRIENT_LEVELS
  • NUTRIMENT_ENERGY_UNIT
  • ADDITIVES
  • INGREDIENTS_ANALYSIS_TAGS
  • INGREDIENTS_TEXT
  • LABELS_TAGS
  • LABELS_TAGS_IN_LANGUAGES
  • ENVIRONMENT_IMPACT_LEVELS
  • COMPARED_TO_CATEGORY
  • CATEGORIES_TAGS
  • CATEGORIES_TAGS_IN_LANGUAGES
  • LANGUAGE
  • STATES_TAGS
  • ECOSCORE_DATA
  • ECOSCORE_GRADE
  • ECOSCORE_SCORE
  • KNOWLEDGE_PANELS
  • COUNTRIES
  • COUNTRIES_TAGS
  • COUNTRIES_TAGS_IN_LANGUAGES
  • EMB_CODES

@monsieurtanuki
Contributor

I'm about to start a new project called "fast food":

  • experimental flutter project
  • access to offline food data in read-only mode
  • the simplest UI: no camera, no barcode scan
  • the most compact SQL database and the best performance

Creating a separate project sounds like a good idea to me:

  • no interference with the rest of Smoothie
  • best conditions to measure performance
  • both the failed and the successful attempts can inspire Smoothie

@teolemon
Member Author

teolemon commented Jul 2, 2022

Note that @AshAman999 is working on this as part of his Google Summer of Code project: https://wiki.openfoodfacts.org/GSOC_2022_-_Offline_Smoothie

@monsieurtanuki
Contributor

@teolemon @AshAman999 Oops, then I'll stop.
I would suggest doing it in a separate project first, and focusing on the read-only mode first.

@teolemon teolemon added this to the Offline Scan milestone Jul 4, 2022
@teolemon
Member Author

teolemon commented Jul 7, 2022

Per @AshAman999's computation in #2447

All 2.4M barcodes

My estimates were about the same here as well (17 MB). Besides https://fr.openfoodfacts.org/api/v2/search?fields=code&page_size=100, I thought of:
  • Creating a server-side solution to dump all barcodes

Products (everything including KP, compressed)

Around 7 kB for each product
75 MB for 10k products
750 MB for 100k products

Images

https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.100.jpg
https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.full.jpg

107 MB for 10k (front image)
https://squoosh.app/editor

@openfoodfacts openfoodfacts locked and limited conversation to collaborators Jul 19, 2024
@teolemon teolemon converted this issue into discussion #5492 Jul 19, 2024
@github-project-automation github-project-automation bot moved this from 💬 To discuss and validate to 🎊 Done in 🤳🥫 The Open Food Facts mobile app (Android & iOS) Jul 19, 2024
