None of the cache directory is writable #61
Comments
Hi, thanks for reporting this issue. OK, there is an option for setting this directory when you use the class itself within your Python code. I missed this and forgot to add this feature as a parameter to the Thanks! |
@lipoja do you mind sharing how to do it in Python code? I saw in the code that URLExtract takes a CacheFile type argument. |
It can be set during object initialization: |
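For readers following along, here is a minimal sketch of what setting the directory at initialization might look like. It assumes the "cache_dir" keyword mentioned later in this thread is accepted by the URLExtract constructor. `default_cache_dir` and `build_extractor` are hypothetical helper names, and the system temp directory is only an example location.

```python
import tempfile


def default_cache_dir():
    # The system temp directory is usually writable (e.g. /tmp on Linux).
    return tempfile.gettempdir()


def build_extractor(cache_dir=None):
    # Imported lazily so this module still loads where urlextract is absent.
    from urlextract import URLExtract  # third-party: pip install urlextract

    # Assumption: the constructor accepts cache_dir and uses it for the
    # TLD cache file instead of the package's own data directory.
    return URLExtract(cache_dir=cache_dir or default_cache_dir())
```

With urlextract installed, `build_extractor()` should return an extractor whose cache lives in the temp directory, which can then be used as usual, for example with `find_urls(text)`.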
I am developing a tool with PyInstaller, using the noconsole option, and it gives cache problems. |
@B16f00t Does it crash? Is there any error message shown? Or does it just not work at all? |
My code:
ERROR: |
@B16f00t Turn on logging and see if there are any other messages. It looks like the path "D:\" does not exist, or is not writable for the user. Let me know if it logs some other errors (or info messages), thanks. |
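A small sketch of the two checks suggested above: turning on debug logging and testing whether a directory exists and is writable. It uses only the standard library. `check_dir` is a hypothetical helper, not part of URLExtract, and the temp directory stands in for whatever cache path is actually in use.

```python
import logging
import os
import tempfile


def check_dir(path):
    """Report whether path is an existing directory the current user can write to."""
    exists = os.path.isdir(path)
    writable = exists and os.access(path, os.W_OK)
    return {"path": path, "exists": exists, "writable": writable}


# Show DEBUG and above from all loggers, including library loggers.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger(__name__).debug("cache dir check: %s", check_dir(tempfile.gettempdir()))
```

If "exists" comes back False for the path in the error message, the problem is the path itself rather than permissions.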
Hi @lipoja, I have the same issue. When the script is uploaded as an AWS Lambda, it fails. It seems we set the "cache_dir" in the initialization step, but the "cache_dir" wasn't given in
def _load_cached_tlds(self):
    """
    Loads TLDs from cached file to set.
    :return: Set of current TLDs
    :rtype: set
    """
    # check if cached file is readable
    if not os.access(self._tld_list_path, os.R_OK):
        self._logger.error("Cached file is not readable for current "
                           "user. ({})".format(self._tld_list_path))
        raise CacheFileError(
            "Cached file is not readable for current user."
        )
    set_of_tlds = set()
    with filelock.FileLock(self._get_cache_lock_file_path()):
Line 204 may have the same problem as well. Thank you. |
Hi @lipoja, is there a fix for this yet? There's a Below are my observations before I saw that branch ... There appear to be two intermingled issues here:
I was going to suggest that we shouldn't bother getting a lock on the distributed file, but I see that you allow the module to update the distributed file. I feel like that's an anti-pattern and we should have a cache file somewhere in user-space (or even /tmp if we can't write somewhere more permanent). The default file should be a read-only failover. Unless that is changed, then the only solution is to create a lock file somewhere we are (reasonably) always allowed to write. Given it's a lock file and thus very ephemeral we could just write that to /tmp or the OS equivalent. Or we could allow the user to specify a |
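The fallback idea above could be sketched roughly as follows: try a list of candidate directories and, if none is writable, fall back to the OS temp directory for the cache and lock files. This is only an illustration of the suggestion, not the library's actual logic. `first_writable_dir`, `lock_file_path`, and the lock file name are hypothetical.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first existing, writable directory, else the OS temp dir."""
    for path in candidates:
        # W_OK to create files, X_OK to enter the directory.
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return tempfile.gettempdir()


def lock_file_path(candidates, name="tlds.lock"):
    # Hypothetical lock file name; the lock is short-lived, so temp is fine.
    return os.path.join(first_writable_dir(candidates), name)
```

The package's own data directory would never need to be in the candidate list, so the distributed file stays read-only.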
Case in point why we should not be modifying a distributed file:
|
I've grabbed the
|
Any update on this bug? Would love to use this in AWS. Are folks just using a package from the fix_cache branch with Rick's workaround? Is there a way to pip install that branch? |
@paulfdietrich said:
You can use this, but I'm not going to be tracking any changes to the official repo. Caveat emptor.
|
Hi, thank you both for your patience and also for the time spent on reporting this issue. I kept the solution that was in the Thank you! |
I think this can be solved by adding tlds-alpha-by-domain.txt to .gitignore. I did that already so it should be fine once your .gitignore is updated from latest master branch. |
This fix is released on PyPI. If somebody has a chance to run it on AWS, it would be great. |
… used in lambda, so change the logic to regular expressions Please refer lipoja/URLExtract#61
We are using URLExtract in one of our Python projects. The script works fine locally, but when uploaded as an AWS Lambda it fails because none of the cache directories is writable.
Ideally, there should have been a way to provide the cache directory path as a URL parameter in the URLExtract constructor itself and the default should be whatever is currently