Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No more weird caching issues. #2

Merged
merged 6 commits into from
Oct 26, 2018
Merged

No more weird caching issues. #2

merged 6 commits into from
Oct 26, 2018

Conversation

brycedrennan
Copy link

@brycedrennan brycedrennan commented Sep 10, 2018

Our pull request to the parent library has been ignored. Lets deploy to our private repo.

john-kurkowski#144

This is already deployed to pypi.cu and in use now.
http://pypi.cu/root/circleup/tldextract/3.0.0.circleup

Copy link

@jessevogt jessevogt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not comfortable with forking at this time since we would either lose access to new fixes from upstream OR would need to take on the work of manually keeping in sync. I would rather investigate implementing these fixes on our side external to tldextract.

@brycedrennan
Copy link
Author

I do wish I had explored that route more as that would have been easier to maintain. There is some reason to think its not possible though because of how the library is designed.

I still think we should fork, as we have done in other cases because:

  • This solves a problem that has come up repeatedly for multiple people on our team and is very confusing when it happens. The problem is that whether you choose to include private domains is cached "forever". If the next call to the library specifies something different than the first call, then it's silently ignored, and incorrect results are returned. This is totally unexpected and hard to debug. It also totally precludes different parts of code using different lists for different purposes.
  • I think its unlikely we'll need the upstream changes. The library is an overcomplicated way of downloading a text file of regular expressions. It's already doing what we need.
  • It's not clear there will even be upstream changes. There have been no commits for 9 months. No response to this PR.
  • These changes have been live for a month.

Copy link

@jessevogt jessevogt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If backing out these changes and fixing via the method I mentioned in my initial comment is not an option, please update readme for this repo to mention how/when this fork happened + additional relevant background around the changes we introduced. We should either remove or update the badges as well since they still point to the upstream version.

if not os.path.isfile(cache_path):
result = func(*args, **kwargs)
with open(cache_path, 'w') as cache_file:
json.dump(result, cache_file)

with open(cache_path) as cache_file:
return json.load(cache_file)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the read also be protected by the lock?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch will fix

@jessevogt
Copy link

Pleas also update permissions on this repo. I am currently seeing the following message:

image

@brycedrennan
Copy link
Author

good feedback. will do.

@brycedrennan
Copy link
Author

@jessevogt It locks for reads now and the team has access to the repo.

@brycedrennan brycedrennan merged commit f337143 into master Oct 26, 2018
@brycedrennan brycedrennan deleted the better-cache branch October 26, 2018 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants