Generate iD presets #4718

Closed
bhousel opened this issue Nov 20, 2020 · 26 comments
Labels
enhancement Actionable - add an enhancement to the source code

Comments

@bhousel
Member

bhousel commented Nov 20, 2020

Followup from openstreetmap/iD#8175
iD has spun off their presets into some separate projects. From what I can tell:

This is great for the name-suggestion-index project because now we can depend on those other packages instead of iD. We can use those to build and distribute the NSI presets as a compatible add-on to iD, and we don't have to be on iD's release schedule (iD can fetch the latest compatible NSI preset pack at runtime).

@bhousel bhousel added the enhancement Actionable - add an enhancement to the source code label Nov 20, 2020
@westnordost
Contributor

As presets from NSI have now been removed from id-tagging-schema/dist/presets.json, there is at the moment no place to source NSI presets from.

@bryceco

bryceco commented Dec 4, 2020

When the new nsi-presets.json becomes available, will it include the imageURL field? Currently I can't figure out where iD is pulling that information from. Do we have a timeline for this, so that we data consumers know whether to wait or roll our own solution?

Not trying to pressure you, I'm just looking for clarity on the situation.

@westnordost
Contributor

westnordost commented Dec 4, 2020

Do you plan to package the brand images in GoMap!!, @bryceco ? I looked into that myself today, wrote a script that would download them all. It's about 8MB for all the images in a size of 50x50px. (Ergo it would be around 32MB for 100x100 and about 128MB for the size in which iD queries them on demand - 200x200px)
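
The size estimates above follow from total size growing roughly with pixel count, that is, with the square of the side length. A minimal sketch of that arithmetic, assuming size scales purely with area (compressed image formats will not follow this exactly); `estimateTotalMB` is a hypothetical helper name, not part of any project mentioned here.

```javascript
// Rough estimate only: assumes total size scales with pixel count
// (side length squared). Real compressed images will deviate from this.
function estimateTotalMB(baseMB, basePx, targetPx) {
  const scale = (targetPx / basePx) ** 2;
  return baseMB * scale;
}

console.log(estimateTotalMB(8, 50, 100)); // 32
console.log(estimateTotalMB(8, 50, 200)); // 128
```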

@bryceco

bryceco commented Dec 4, 2020

My existing code downloads brand images on demand and caches them on disk, so I'd start with that. I hadn't considered packaging them with the app which is currently < 20MB, but that's a reasonable approach as well.

@bhousel
Member Author

bhousel commented Dec 4, 2020

I'm also considering cutting the files up into regions, so for example someone working in North America doesn't need to download the European presets, etc. I don't know how big the files will end up being, but NSI has grown a lot in the past year.

@bryceco

bryceco commented Dec 4, 2020

For me the NSI will be packaged with the app, so splitting it into regions isn't necessary. Maybe dynamic updating will be supported once the update mechanism is known to be stable.

bhousel added a commit that referenced this issue Dec 4, 2020
@westnordost
Contributor

westnordost commented Dec 4, 2020

I'm also considering cutting the files up into regions

Loading all the presets + default translation(s) currently takes 4s in StreetComplete (done in a background thread of course); the biggest part of that is, of course, the brand names. With the upcoming version of osmfeatures, loading takes even longer, about 7s (because various indices are built on initialization in order to speed up accessing the presets by term and/or tags).

So, this might be advisable. I'd even recommend splitting it up by country. However, this should be done later, because it will also increase complexity. For example brands that exist both in US and DE (but no other parts of the world) will then appear in two separate presets.jsons and thus whoever parses this needs to take care to properly merge the parsed brand features - not to duplicate them.
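
The merge step described above could look roughly like this. It is only a sketch under assumptions: no per-country files exist yet, so the shape used here (each file a plain object keyed by preset id, with a brand carrying the same id in every file it appears in) and the name `mergePresetFiles` are hypothetical.

```javascript
// Merge several per-region preset objects (id -> preset) into one Map,
// keeping a single entry per id so shared brands are not duplicated.
// Assumed input shape: each file is a plain object keyed by preset id.
function mergePresetFiles(files) {
  const merged = new Map();
  for (const presets of files) {
    for (const [id, preset] of Object.entries(presets)) {
      if (!merged.has(id)) merged.set(id, preset);
    }
  }
  return merged;
}

// Made-up example data: one brand present in both files, one only in "us".
const us = {
  'shop/example/sharedbrand-abc123': { name: 'Shared Brand' },
  'shop/example/usonly-000001': { name: 'US Only' }
};
const de = {
  'shop/example/sharedbrand-abc123': { name: 'Shared Brand' }
};

console.log(mergePresetFiles([us, de]).size); // 2
```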

@bhousel
Member Author

bhousel commented Dec 4, 2020

I'm working on this today so we'll see where the file sizes end up. NSI has > 10k presets in it now (the actual number is a bit smaller, as they haven't all been mapped to wikidata identifiers yet):

Screen Shot 2020-12-04 at 12 55 31 PM

@bhousel bhousel closed this as completed in 281bf2b Dec 5, 2020
@bhousel
Member Author

bhousel commented Dec 5, 2020

I did this 🎊
They are generated under dist/presets/

Screen Shot 2020-12-05 at 12 01 56 AM

@bryceco

bryceco commented Dec 5, 2020

Thanks for the fast turnaround Bryan!

@bryceco

bryceco commented Dec 5, 2020

What is the meaning of the trailing hex number?

"amenity/fast_food/burger/mcdonalds-658eea": {

@bryceco

bryceco commented Dec 5, 2020

Some entries have an odd location:

"locationSet": {"include": ["001"]},
"locationSet": {"include": ["150"]},

@westnordost
Contributor

westnordost commented Dec 5, 2020

Some entries have an odd location

The schema is documented here: https://github.com/ideditor/schema-builder

locationSet with include and exclude replace countryCodes and notCountryCodes. The entries in include and exclude may now not only include ISO 3166-1 alpha 2 codes but also UN M.49 numeric codes, coordinates (with 25km radius), links to geojson files that contain boundaries, maybe more. Basically anything that the javascript library country-coder supports, even emoji flags.

The osmfeatures library and thus StreetComplete doesn't support any of this.
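
For consumers that need to tell these entry kinds apart, a rough heuristic might look like the following. This is a sketch based only on the kinds listed above, not the country-coder or location-conflation API; `classifyLocation` and its return labels are made up for illustration.

```javascript
// Heuristic classification of a single locationSet entry.
// Not an official API: the labels returned here are hypothetical.
function classifyLocation(entry) {
  if (Array.isArray(entry)) return 'point';               // a coordinate pair
  if (typeof entry !== 'string') return 'unknown';
  if (/\.geojson$/i.test(entry)) return 'geojson';        // custom boundary file
  if (/^\d{3}$/.test(entry)) return 'm49';                // UN M.49 numeric code
  if (/^[a-z]{2}$/i.test(entry)) return 'iso3166-1-alpha2';
  return 'other';                                         // anything else country-coder may accept
}

console.log(classifyLocation('001'));                 // m49
console.log(classifyLocation('us'));                  // iso3166-1-alpha2
console.log(classifyLocation('mississippi.geojson')); // geojson
```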

@westnordost
Contributor

@bhousel
Regarding for example

  "amenity/atm/firststatebank-2790cb": {
    "name": "First State Bank (Mississippi)",
    "locationSet": {"include": ["mississippi.geojson"]},

This could be understood more easily if ISO 3166-2 codes were used instead, i.e. "locationSet": {"include": ["US-MS"]}

@bhousel
Member Author

bhousel commented Dec 5, 2020

This could be understood more easily if ISO 3166-2 codes were used instead, i.e. "locationSet": {"include": ["US-MS"]}

We support whatever country-coder supports, and people can supply their own geojson shape for anything beyond that. Check out location-conflation for more info, and this webpage where you can play around with them.

I don't think country-coder will be tracking boundaries down to the state level anytime soon, though we did discuss it a bit here: rapideditor/country-coder#26

I also think it's totally ok to just drop presets that you can't support because they contain odd locations. The vast majority of stuff in NSI is just a normal country code include or world 001.

You'll miss out on stuff like Quebec KFC/PFK or Price Chopper Kansas City, but these are the edge cases that I built location-conflation for.

@bhousel
Member Author

bhousel commented Dec 5, 2020

What is the meaning of the trailing hex number?

The identifiers are supposed to be unique across a key/value/locationSet and stable as long as those things don't change.
They disambiguate between a whole bunch of different McDonalds concepts:

Or Tesco:

The identifiers also help us a lot because they are stable enough that we can push them to Wikidata, establishing the link from there back to our project.

More on identifiers here: #3995
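
For consumers that want to separate the readable part of an identifier from its suffix, a small sketch follows. It assumes the suffix is always six lowercase hex digits after the last hyphen, which matches the two identifiers quoted in this thread but is not confirmed here as a rule; `splitPresetId` is a hypothetical helper.

```javascript
// Split an NSI preset id into its readable path and trailing hex suffix.
// Assumption: the suffix is six hex digits, as in the ids quoted in this thread.
function splitPresetId(id) {
  const match = /^(.*)-([0-9a-f]{6})$/.exec(id);
  if (!match) return { path: id, hash: null };
  return { path: match[1], hash: match[2] };
}

const parts = splitPresetId('amenity/fast_food/burger/mcdonalds-658eea');
console.log(parts.path); // amenity/fast_food/burger/mcdonalds
console.log(parts.hash); // 658eea
```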

@bryceco

bryceco commented Dec 5, 2020

Okay everything makes sense and looks good.

I also think it's totally ok to just drop presets that you can't support because they contain odd locations. The vast majority of stuff in NSI is just a normal country code include or world 001.

Yeah, my initial approach will be to convert NSI locations back to the old "countryCodes" format:

  • I don't have a good way to interpret the more detailed location info
  • I want to have a single code path that works with both presets and NSI
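
A conversion along those lines might be sketched as follows. The details are assumptions rather than anything GoMap actually does: worldwide ("001") is treated as "no restriction", a missing include list is treated as worldwide, codes are lowercased, and any preset whose locationSet uses other M.49 regions, coordinates, or geojson files is reported as unsupported so the caller can drop it. `toCountryCodes` is a hypothetical name.

```javascript
const isAlpha2 = (s) => typeof s === 'string' && /^[a-z]{2}$/i.test(s);

// Convert a locationSet to the old countryCodes/notCountryCodes shape.
// Returns null when the locationSet cannot be expressed that way.
function toCountryCodes(locationSet) {
  const include = (locationSet && locationSet.include) || ['001']; // assumed default: worldwide
  const exclude = (locationSet && locationSet.exclude) || [];
  if (!exclude.every(isAlpha2)) return null;

  const result = {};
  if (exclude.length) result.notCountryCodes = exclude.map((c) => c.toLowerCase());
  if (include.includes('001')) return result;    // worldwide: no countryCodes restriction
  if (!include.every(isAlpha2)) return null;     // e.g. "150", points, *.geojson
  result.countryCodes = include.map((c) => c.toLowerCase());
  return result;
}

console.log(toCountryCodes({ include: ['us', 'ca'] }));             // { countryCodes: [ 'us', 'ca' ] }
console.log(toCountryCodes({ include: ['150'] }));                  // null
console.log(toCountryCodes({ include: ['mississippi.geojson'] }));  // null
```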

@bryceco

bryceco commented Dec 6, 2020

Do you plan to package the brand images in GoMap!!, @bryceco ? I looked into that myself today, wrote a script that would download them all.

Because a large number of the images are now SVG, and iOS doesn't support SVG natively, I'm thinking I'm probably going to try packaging the images along with the presets. Can you point me to your download script? I wrote my own, but it doesn't know how to set the correct file extension when it's ambiguous (bryceco@yahoo.com if you'd prefer to communicate directly).

@bryceco

bryceco commented Dec 6, 2020

I'm still unclear about where the imageURL originates. But I do notice that every URL with a width has width=150. Is this something we should consider a placeholder that we can change?

@kymckay
Collaborator

kymckay commented Dec 6, 2020

I'm still unclear about where the imageURL originates.

@bryceco Think this may be what you're looking for

// P18 - Image (use this for flags)
// P154 - Logo Image
const imageProp = (meta.what === 'flag' ? 'P18' : 'P154');
let imageFile = getClaimValue(entity, imageProp);
if (imageFile) {
  const re = /\.svg$/i;
  if (re.test(imageFile)) {
    imageFile = imageFile.replace(/\s/g, '_');   // 'Flag of Alaska.svg' -> 'Flag_of_Alaska.svg'
    const hash = crypto.createHash('md5').update(imageFile).digest('hex');
    const x = hash.slice(0, 1);
    const xx = hash.slice(0, 2);
    target.logos.wikidata = `https://upload.wikimedia.org/wikipedia/commons/${x}/${xx}/${imageFile}`;
  } else {
    target.logos.wikidata = 'https://commons.wikimedia.org/w/index.php?' +
      utilQsString({ title: `Special:Redirect/file/${imageFile}`, width: 150 });
  }
}

@westnordost
Contributor

Yeah, my initial approach will be to convert NSI locations back to the old "countryCodes"

Note that the id-tagging-schema presets also use locationSet now. (In theory, not sure if there are any presets that are actually limited to certain countries.)

Due to the fact that a large number of the images are now SVG, and iOS doesn't support SVG natively, I'm thinking I'm probably going to try packaging the images along with the presets.

This is the download script: https://github.com/westnordost/StreetComplete/blob/existance/buildSrc/src/main/java/DownloadBrandLogosTask.kt (replace existance with master if the link does not work anymore).
So you see, nothing elaborate about it. Parses the JSON and then downloads each imageURL, uses the id as file name.

The image URLs linked in the nsi-id-presets.json all link to a brand picture of about 150x150 up to 200x200px. If you download them all, this should be about 128MB. This is why the script modifies the URL in order to download the small (~50x50px) version of the brand logos (~8MB in total).
I'm also interested to see your final download script for the brand images, maybe you can link it here when you are done with it.

@bhousel
Member Author

bhousel commented Dec 6, 2020

Thanks @kymckay for the code pointer!
Yes the urls also all end up in that dist/wikidata.json file, which is our cache of useful wikidata values that we use to build the presets. I'm not sure whether that would be helpful to @bryceco or @westnordost ?

The wiki commons URLs use that Special:Redirect API to resize them to whatever size you want. You could run the SVG files through it too to have them rasterized (and that's actually what we were doing up until a few days ago when I added that SVG code in #4758)
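
Changing the requested size then comes down to rewriting the width query parameter. A minimal sketch, assuming the imageURL has the Special:Redirect/file form shown in the code excerpt earlier in this thread; `withWidth` is a hypothetical helper and `Example.png` a placeholder file name. Direct upload.wikimedia.org links (the SVG case) carry no width parameter and are returned unchanged.

```javascript
// Return the same image URL with a different requested width.
// Only URLs whose title parameter is a Special:Redirect/file/... page are changed.
function withWidth(imageURL, width) {
  const url = new URL(imageURL);
  const title = url.searchParams.get('title') || '';
  if (!title.startsWith('Special:Redirect/file/')) return imageURL;
  url.searchParams.set('width', String(width));
  return url.toString();
}

const original =
  'https://commons.wikimedia.org/w/index.php?title=Special%3ARedirect%2Ffile%2FExample.png&width=150';
const small = withWidth(original, 50);
console.log(new URL(small).searchParams.get('width')); // 50
```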

I'm just now realizing I could have used the Special:Redirect URL to avoid doing the md5sum stuff to get the svg file location. They are super handy but I can't find the documentation for them anywhere. @nyurik clued me into these, so maybe he knows where the docs are for them?

@bryceco

bryceco commented Dec 7, 2020

My download script is here: https://github.com/bryceco/GoMap/blob/master/src/presets/getBrandIcons.py
Taking the URLs as they stand my total size is 95MB.
If I rasterize the SVGs at 180x180 it's 78MB (there are some huge SVGs).
Using a lossy image packer gets it to 68MB.
Still too big, but I haven't yet decided what to do about it. I might have to just shrink them to 60x60 (my display size) and take the visual hit on retina displays.

@bryceco

bryceco commented Dec 10, 2020

Loading all the presets + default translation(s) currently takes 4s in StreetComplete

It doesn't seem like it should take that long. My testing on various devices, covering loading, translating, and building indices:

  • Last year's iPhone Pro: 0.2 seconds.
  • 2013 iPhone 5S (A7 chip): 3.2 seconds
  • 1st Gen 2015 iPad Mini (32-bit A5 chip): 7-8 seconds

@westnordost
Contributor

I noticed IO is slowed down quite a lot when the debugger is connected. Without the debugger, to load presets + default translation + NSI:

  • Samsung S10e (specs comparable with iPhone XS): 1.3 seconds.

Still quite a bit slower though. Do you parse the json with a stream parser or a document parser?

@bryceco

bryceco commented Mar 7, 2021

I’m using a document parser

4 participants