It should be possible to prevent the use of the fallback snapshot when using the CLI #259

nicholas-plutoflume · 2022-03-17T17:39:21Z

When you initialise the cache, either through the CLI or through update(fetch_now=True), it calls it on a TLDExtract object that has the default suffix list and fallback_to_snapshot set to True. This causes problems if you later try to use a TLDExtract object that points to the same cache directory, but with fallback_to_snapshot set to False.
Because fallback_to_snapshot is one of the hashed arguments, it will generate a totally different cache path, and miss the cache set previously through update.

I believe it would make sense to make one of two modifications:

Allow the user to explicitly pass through fallback_to_snapshot to the update function / CLI.
Or, ignore the fallback_to_snapshot parameter when constructing the cache path.

~~I believe option 2 is better for the following reasons:~~

It doesn't make sense to call update in a scenario where the intent is to use fallback_to_snapshot, because you can use tldextract without pre-seeding the cache, and just use the snapshot instead.
~~The pivotal element should only be the set of public suffix urls. If those change, it makes sense to use a different cache file for lookups.~~

~~To summarise (assuming the prefix list urls are the same):~~

~~If fallback_to_snapshot is turned on or off after you cache items it won't matter because you've already cached the suffix lists, and that's where it should get the data from.~~
~~I can't think of a scenario where you would have the cache off and have two different TLDExtract objects (with the same prefix list urls) where for one the fallback is on, for the other it is off.~~

~~I'll make a PR to fix this using method 2. But I'm curious to hear if you agree with my analysis.~~
~

The text was updated successfully, but these errors were encountered:

nicholas-plutoflume · 2022-03-17T17:56:35Z

Actually, I'm not sure I agree with what I just wrote.

If you run something with the fallback turned on, it makes sense that if you later run an app with the fallback turned off, you wouldn't want it grabbing items from the cache which have previously had the fallback turned on.

Instead I propose passing the fallback option through the CLI. This allows you to explicitly state when populating the cache whether it should be allowed to use the fallback as a source of truth.

john-kurkowski · 2022-03-19T19:43:17Z

Related: #251

nicholas-plutoflume · 2022-03-21T11:51:06Z

Interesting, hadn't read that issue.
I'd argue it's worth adding that CLI option, because with the change in caching behaviour previously working code no longer works.

Previously, it was totally valid to run tldextract --update --private_domains, then in your code create the object with TLDExtract(suffix_list_urls=None, fallback_to_snapshot=False).
The intention there was to prevent the code from ever using the internet or snapshot if the cache isn't populated.

But under version 3.0, because it considers the fallback_to_snapshot as True when running --update, you then have to initialise your object with fallback_to_snapshot=True, even if you really, really don't want it to fallback to the snapshot. With the PR I linked this should be fixed, because you can now populate your cache with the same options as the object that you later create, preemptively preventing you from ever falling back on the snapshot.

But then again I don't shoulder the burden of maintaining any extra command-line options, so I can't exactly judge 😅

john-kurkowski · 2022-05-03T19:36:42Z

Wow, thank you for thinking through that! I'm sold.

I did previously consider maintaining additional CLI args a burden. Perhaps it's actually a fun challenge. Like automatically generating the interface, if it became truly burdensome. But that's way in the distance. Your fix today is simple.

nicholas-plutoflume mentioned this issue Mar 17, 2022

Add CLI option to not fall back to the snapshot #260

Merged

nicholas-plutoflume changed the title ~~fallback_to_snapshot should not influence the cache file name~~ It should be possible to prevent the use of the fallback snapshot when using the CLI Mar 17, 2022

john-kurkowski closed this as completed in #260 May 3, 2022

john-kurkowski mentioned this issue May 3, 2022

control fallback_to_snapshot behavior from cli #251

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

It should be possible to prevent the use of the fallback snapshot when using the CLI #259

It should be possible to prevent the use of the fallback snapshot when using the CLI #259

nicholas-plutoflume commented Mar 17, 2022 •

edited

Loading

nicholas-plutoflume commented Mar 17, 2022 •

edited

Loading

john-kurkowski commented Mar 19, 2022

nicholas-plutoflume commented Mar 21, 2022 •

edited

Loading

john-kurkowski commented May 3, 2022

It should be possible to prevent the use of the fallback snapshot when using the CLI #259

It should be possible to prevent the use of the fallback snapshot when using the CLI #259

Comments

nicholas-plutoflume commented Mar 17, 2022 • edited Loading

nicholas-plutoflume commented Mar 17, 2022 • edited Loading

john-kurkowski commented Mar 19, 2022

nicholas-plutoflume commented Mar 21, 2022 • edited Loading

john-kurkowski commented May 3, 2022

nicholas-plutoflume commented Mar 17, 2022 •

edited

Loading

nicholas-plutoflume commented Mar 17, 2022 •

edited

Loading

nicholas-plutoflume commented Mar 21, 2022 •

edited

Loading