consider that TLDExtract cache will be used by default when evaluating WPAD #74
Comments
I resolved the issue in my project by overriding the TLDExtract DiskCache class with my own cache implementation. It reads from a user-defined file and keeps the data. It also allows the file contents to be set. I also added a check, run at a fixed time interval, for whether the file contents have changed. In our case, this lets us specify the file where the TLD data is stored. The stored data can then be accessed with a privileged-writer, unprivileged-reader pattern, since the readers never try to write to the file. Fetching and maintaining the data can be outsourced to a different module. I will try to contribute such an implementation to tldextract, and then contribute here based on that.
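A minimal sketch of the reader side of that pattern, using only the standard library. The class name `ReadOnlyFileCache`, its `get` method, and the assumption that the file holds JSON are all hypothetical; this does not claim to match the interface of tldextract's DiskCache, which a real replacement would have to follow.

```python
import json
import os
import time


class ReadOnlyFileCache:
    """Hypothetical reader for a JSON file that another, privileged process maintains.

    The reader never writes to the file. It re-checks the file's modification
    time at most once per `check_interval` seconds and reloads only on change.
    """

    def __init__(self, path, check_interval=60.0):
        self.path = path
        self.check_interval = check_interval  # seconds between mtime checks
        self._data = None
        self._mtime = None
        self._last_check = float("-inf")  # forces a load on first access

    def get(self):
        """Return the parsed file contents, reloading them if the file changed."""
        now = time.monotonic()
        if self._data is None or now - self._last_check >= self.check_interval:
            self._last_check = now
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self._mtime:
                # Open read-only: unprivileged readers need no write access.
                with open(self.path, encoding="utf-8") as fh:
                    self._data = json.load(fh)
                self._mtime = mtime
        return self._data
```

Because the reader only ever opens the file for reading, the file can be owned by the process that fetches and maintains the TLD data, and the interval bounds how often each reader touches the filesystem.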
Does this relate to #64? The intent was for PyPAC to use tldextract such that it never goes online and never updates its TLD list.
We got a crash in our project because we package everything with PyInstaller, and the directory structure is different. That is why we had to come up with a hack.
Hi @KarelChanivecky, we're facing the exact same issue in an application packaged with PyInstaller. Could you share your workaround/hack, please?
@Guts First, try setting the TLDEXTRACT_CACHE env var to a dir containing a list of top-level domains, before any pypac imports. That dir should contain a JSON file with the list of top-level domains; you can find reputable lists online. If that does not work, you can always create your own cache implementation and set it to:
Your cache implementation should match the interface of DiskCache defined here:
@KarelChanivecky thanks for your quick reply and the hint. Actually, I'm trying something else: embedding the TLD files into the final package and setting the path using the env var.
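The approach described in the last two comments could be sketched roughly as follows. This assumes the TLD files were added to the PyInstaller bundle (for example with its `--add-data` option) under a directory named `tld_cache`; that directory name and both helper functions are hypothetical. `sys.frozen` and `sys._MEIPASS` are the attributes PyInstaller sets at run time in a packaged app.

```python
import os
import sys
from pathlib import Path


def bundled_tld_cache_dir(dirname="tld_cache", dev_base=None):
    """Return the directory expected to hold the bundled TLD files.

    In a PyInstaller build, sys.frozen is set and sys._MEIPASS points at the
    bundle's data directory. Otherwise fall back to `dev_base`
    (default: the current working directory).
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        base = Path(sys._MEIPASS)
    else:
        base = Path(dev_base) if dev_base is not None else Path.cwd()
    return base / dirname


def configure_tldextract_cache(dirname="tld_cache", dev_base=None):
    """Point the TLDEXTRACT_CACHE env var at the bundled directory."""
    cache_dir = bundled_tld_cache_dir(dirname, dev_base)
    os.environ["TLDEXTRACT_CACHE"] = str(cache_dir)
    return cache_dir


# As suggested above, this has to run before any pypac import:
# configure_tldextract_cache()
# import pypac
```

Whether the files in that directory are laid out the way tldextract expects still has to be checked against the tldextract version in use.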
TLDExtract performs an HTTP query to fetch valid top-level domains. This is fine, except that this library will mostly be run within a domain where a proxy is enforced.
Enterprises that enforce proxying are also likely to block requests that are not dispatched per policy. For this reason, it doesn't make sense to dispatch an HTTP request in order to evaluate the proxy, as the proxy URL is more likely than not to be needed to dispatch such a request.
The base case for the library should therefore be that TLDExtract will not be able to dispatch this request and will fall back to the file with the TLDs.
Hence this library should:
Some of the options used by TLDExtract are not bad at all; however, they cannot accommodate all cases. For example, within a PyInstaller executable, the package directory is the location of the executable itself. When an application is distributed at scale, it may designate a specific directory for such uses. Applying one of the recommendations would therefore be meaningful, and would spare the implementer a deep dive into foreign code.