Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache attributes #220

Merged
merged 15 commits into from
Jan 2, 2018
Merged

Cache attributes #220

merged 15 commits into from
Jan 2, 2018

Conversation

alimanfoo
Copy link
Member

@alimanfoo alimanfoo commented Dec 24, 2017

This PR adds support for caching user attributes, enabled by default. Resolves #218.

TODO:

@alimanfoo alimanfoo added this to the v2.2 milestone Dec 24, 2017
@alimanfoo
Copy link
Member Author

cc @jhamman, @mrocklin

zarr/attrs.py Outdated
self.synchronizer = synchronizer

def _get(self):
if self.key in self.store:
d = json.loads(text_type(self.store[self.key], 'ascii'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general note, in the future for remote datasets we might want to minimize multiple touches of self.store. I can imagine wanting to change things like this to code like the following instead:

try:
    data = self.store[self.key[
except KeyError:
    d = {}
else:
    d = json.loads(text_type(data))

This is just a general comment though, not something that should necessarily be done here. I'll bring it up again if we find this a challenge when profiling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, asking for permission is a hard habit to break :-)

@jhamman
Copy link
Member

jhamman commented Dec 26, 2017

@alimanfoo - is it possible or could it be possible to load all the attributes of an array/store at once? Caching is great but accessing each attribute individually will still be pretty costly.

@alimanfoo
Copy link
Member Author

alimanfoo commented Dec 26, 2017 via email

@jakirkham
Copy link
Member

Is there a way to toggle caching on an existing object? Also is there a way to clear the cache?

@alimanfoo
Copy link
Member Author

Btw I've also now exposed a put() method on the Attributes class in the public API, which should be even more efficient than update() in the case where you know you want to initially set or completely replace all attributes. put() requires only a single write operation (overwrites whatever .zattrs is there) whereas update() requires a read and a write operation (retrieves .zattrs then applies update before writing).

@alimanfoo
Copy link
Member Author

@jakirkham you can turn caching off with o.attrs.cache = False. Also I've added o.attrs.refresh() to force reloading the cached attributes from the store.

@jakirkham
Copy link
Member

Thanks @alimanfoo. That sounds great.

@alimanfoo alimanfoo merged commit 5af7ace into master Jan 2, 2018
@alimanfoo alimanfoo deleted the cache-attrs-20171224b branch January 2, 2018 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache attributes DOC: Attributes class
4 participants