It should be possible to prevent the use of the fallback snapshot when using the CLI #259
Comments
Actually, I'm not sure I agree with what I just wrote. If you run something with the fallback turned on, it makes sense that if you later run an app with the fallback turned off, you wouldn't want it grabbing items from the cache which have previously had the fallback turned on. Instead I propose passing the fallback option through the CLI. This allows you to explicitly state when populating the cache whether it should be allowed to use the fallback as a source of truth.
Related: #251
Interesting, hadn't read that issue. Previously, it was totally valid to run But under version 3.0, because it considers the But then again I don't shoulder the burden of maintaining any extra command-line options, so I can't exactly judge 😅
Wow, thank you for thinking through that! I'm sold. I did previously consider maintaining additional CLI args a burden. Perhaps it's actually a fun challenge. Like automatically generating the interface, if it became truly burdensome. But that's way in the distance. Your fix today is simple.
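The CLI option proposed in the comments above could be sketched roughly as follows. This is only an illustration built on `argparse`: the program name and the `--no_fallback_to_snapshot` flag name are assumptions made for this example, not something the thread specifies.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the program and flag names are illustrative only.
    parser = argparse.ArgumentParser(prog="tldextract-sketch")
    parser.add_argument(
        "-u", "--update", action="store_true",
        help="fetch the suffix list and populate the cache",
    )
    # The fallback stays on by default; passing the flag turns it off.
    parser.add_argument(
        "--no_fallback_to_snapshot",
        dest="fallback_to_snapshot",
        action="store_false",
        default=True,
        help="do not use the bundled snapshot as a source of truth",
    )
    return parser


args = build_parser().parse_args(["--update", "--no_fallback_to_snapshot"])
defaults = build_parser().parse_args([])
print(args.update, args.fallback_to_snapshot, defaults.fallback_to_snapshot)
```

With a flag like this, whoever populates the cache states explicitly whether the snapshot may be used, which is the behaviour the comment asks for.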
When you initialise the cache, either through the CLI or through `update(fetch_now=True)`, it calls it on a `TLDExtract` object that has the default suffix list and `fallback_to_snapshot` set to `True`. This causes problems if you later try to use a `TLDExtract` object that points to the same cache directory, but with `fallback_to_snapshot` set to `False`.

Because `fallback_to_snapshot` is one of the hashed arguments, it will generate a totally different cache path, and miss the cache set previously through `update`.

I believe it would make sense to make one of two modifications:

1. Add `fallback_to_snapshot` to the `update` function / CLI.
2. Ignore the `fallback_to_snapshot` parameter when constructing the cache path.

I believe option 2 is better for the following reasons:

- It doesn't make sense to call `update` in a scenario where the intent is to use `fallback_to_snapshot`, because you can use `tldextract` without pre-seeding the cache, and just use the snapshot instead.
- The pivotal element should only be the set of public suffix URLs. If those change, it makes sense to use a different cache file for lookups.

To summarise (assuming the suffix list URLs are the same):

- If `fallback_to_snapshot` is turned on or off after you cache items, it won't matter, because you've already cached the suffix lists, and that's where it should get the data from.
- I can't think of a scenario where you would have the cache off and have two different `TLDExtract` objects (with the same suffix list URLs) where for one the fallback is on, for the other it is off.

I'll make a PR to fix this using method 2. But I'm curious to hear if you agree with my analysis.
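To make the cache miss described above concrete, here is a minimal sketch of a cache path derived from hashed arguments. It is not the library's actual code: the `cache_path` helper, the MD5-of-`repr` scheme, the `.json` suffix, and the example URL are all assumptions chosen for illustration.

```python
import hashlib
import os


def cache_path(cache_dir, kwargs, hashed_argnames):
    # Hypothetical helper: only the named arguments feed the cache key.
    key_args = {name: kwargs[name] for name in hashed_argnames}
    key = repr(sorted(key_args.items())).encode("utf-8")
    return os.path.join(cache_dir, hashlib.md5(key).hexdigest() + ".json")


# Same suffix list URL, differing only in the fallback setting.
kwargs_on = {
    "urls": ("https://publicsuffix.org/list/public_suffix_list.dat",),
    "fallback_to_snapshot": True,
}
kwargs_off = dict(kwargs_on, fallback_to_snapshot=False)

# Current behaviour as described: the flag is part of the key.
with_flag = ["urls", "fallback_to_snapshot"]
# Option 2: only the suffix list URLs decide the path.
urls_only = ["urls"]

paths_differ = (
    cache_path("cache", kwargs_on, with_flag)
    != cache_path("cache", kwargs_off, with_flag)
)
paths_match = (
    cache_path("cache", kwargs_on, urls_only)
    == cache_path("cache", kwargs_off, urls_only)
)
print(paths_differ, paths_match)
```

Under these assumptions, hashing the flag sends the two objects to different files, while keying on the URLs alone lets both reuse the entry written by `update`.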