Dataset aliases #1075
Looks good though as you say, needs a lot more testing. The unit tests on travis suggest something went wrong in groupby:
I suspect Once the unit tests pass, we'll get the notebooks passing before adding more unit tests... |
f65f3b5 to 3970188
3970188 to acbbfc9
Ready for initial review, need to add to the |
By far the most important implications of this PR are regarding the changes to
As is often the case it comes down to naming. I quite liked the idea of We decided that the current best suggestion is
One possible disadvantage is that all dimensions will have |
@@ -677,7 +677,7 @@ def dimensions(self, selection='all', label=False):
         else:
             raise KeyError("Invalid selection %r, valid selections include"
                            "'all', 'value' and 'key' dimensions" % repr(selection))
-        return [(dim.name if label == 'long' else dim.alias)
+        return [(dim.name if label == 'long' else dim.key)
Now that we have label='key', I am wondering if the alternative isn't label='long' but label='name'?
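To make the naming question concrete, here is a toy sketch in which the `label` argument simply names the attribute to read. `Dim` and `dimension_labels` are hypothetical stand-ins invented for illustration; they are not the HoloViews `Dimension` class or the `dimensions` method touched by the diff.

```python
from collections import namedtuple

# Hypothetical stand-in for a dimension with a long name and a short key.
Dim = namedtuple('Dim', ['name', 'key'])

def dimension_labels(dims, label='name'):
    """Return the requested attribute of each dimension (toy helper)."""
    if label not in ('name', 'key'):
        raise ValueError("label must be 'name' or 'key', got %r" % (label,))
    # With label='name' / label='key' the option mirrors the attribute name.
    return [getattr(dim, label) for dim in dims]

dims = [Dim(name='Frequency spectrum', key='Spectrum')]
names = dimension_labels(dims, label='name')
keys = dimension_labels(dims, label='key')
```

With this spelling there is nothing extra to remember: the option value and the attribute it returns share one name, which is the appeal of `label='name'` over `label='long'`.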
Note that if we make 1 - If 2 - I don't think having multiple approaches is necessarily an issue: the Option 2. is probably the most problematic as you need to remember the order of the tuple is This leads me to the following recommendation: 1 - Of course if we don't want to support option 3. in this revised list, we can leave |
One annoyance is that the order between options 2 and 4 is switched: 2 - Dimension(('Spectrum', 'Frequency spectrum')) : Tuple format. I now wish I had reversed the tuple so that these would be equivalent: 2 - Dimension(('Frequency spectrum', 'Spectrum')) : Tuple format. I can still switch the tuple ordering around but it would break backwards compatibility. At least if I made a matching reversal in I am tempted by this change as it would look more consistent and I never really recommended the tuple format precisely because you have to always keep those declarations consistent (and remember the right order). One other approach would be to have
Dimension(hv.util.Aliases(Spectrum='Frequency spectrum').Spectrum)
from collections import namedtuple
alias = namedtuple('alias', ['name','key']) # Might as well be in HoloViews!
Dimension(alias('Frequency spectrum', 'Spectrum')) # Explicit and consistently ordered.
Dimension(hv.util.alias('Frequency spectrum', 'Spectrum'))
Dimension(hv.util.alias(name='Frequency spectrum', key='Spectrum')) # Equivalent
I hope you can follow what I'm proposing - I think I like it!
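A standalone sketch of the named-tuple idea, using only the standard library. The `alias` type is defined locally here; `hv.util.alias` is only a proposal in this thread, so nothing below should be read as an existing HoloViews API.

```python
from collections import namedtuple

# Local stand-in for the proposed hv.util.alias (hypothetical).
alias = namedtuple('alias', ['name', 'key'])

positional = alias('Frequency spectrum', 'Spectrum')
keyword = alias(key='Spectrum', name='Frequency spectrum')

# Field order is fixed by the namedtuple definition, so the keyword form
# can be written in any order and still compares equal to the positional form.
assert positional == keyword

# It is still a plain tuple underneath, so code that unpacks tuples keeps working.
name, key = positional
```

The keyword form removes the need to remember which element comes first, which is the main complaint about the bare tuple format.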
As one last note, you can disable implicit named tuples (mapping based on order) by defining
def alias(**kwargs):
    if set(kwargs.keys()) != set(['name', 'key']):
        raise KeyError('Alias requires a name and a key')
    nt = namedtuple('alias', ['name','key'])
    return nt(name=kwargs['name'], key=kwargs['key'])
This disables the top version of option 3. above so you would have to be explicit and use:
Dimension(hv.util.alias(name='Frequency spectrum', key='Spectrum'))
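The behaviour of that keyword-only helper can be checked in isolation. This sketch repeats the function as a local definition (it is a proposal here, not a shipped HoloViews function) and records what happens for positional and incomplete calls.

```python
from collections import namedtuple

def alias(**kwargs):
    # Exactly the two fields 'name' and 'key' must be supplied, by keyword.
    if set(kwargs) != {'name', 'key'}:
        raise KeyError('Alias requires a name and a key')
    nt = namedtuple('alias', ['name', 'key'])
    return nt(name=kwargs['name'], key=kwargs['key'])

ok = alias(name='Frequency spectrum', key='Spectrum')

# Positional use is rejected by Python itself, because the function
# only accepts keyword arguments.
try:
    alias('Frequency spectrum', 'Spectrum')
    positional_allowed = True
except TypeError:
    positional_allowed = False

# A missing field is rejected by the explicit check.
try:
    alias(name='Frequency spectrum')
    partial_allowed = True
except KeyError:
    partial_allowed = False
```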
I think it's perfectly natural to declare the name of the data in your datasource first and the label second, so I'm hesitant about making the proposed change. The real issue, as you point out above, is that ideally we'd have declared a
Overall I feel explaining that you supply an alias for the name of your data in the datasource as a tuple is much more intuitive than having to grasp aliases and the difference between
In a simple example I feel this:
df = pd.DataFrame({'x': range(10), 'y': np.random.rand(10)})
hv.Curve(df, kdims=[('x', 'X-Label')], vdims=[('y', 'Y-label')])
is much clearer than this:
df = pd.DataFrame({'x': range(10), 'y': np.random.rand(10)})
hv.Curve(df, kdims=[hv.util.alias(name='X-label', key='x')], vdims=[hv.util.alias(name='Y-label', key='y')])
If you really do think this is unclear I will investigate whether we can't have
The recommended thing would be:
al = hv.util.Aliases(x='X-label', y='Y-label')
df = pd.DataFrame({'x': range(10), 'y': np.random.rand(10)})
hv.Curve(df, kdims=[al.x], vdims=[al.y])
I only showed the underlying named-tuple approach to illustrate what would be happening behind the scenes.
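One way to picture what "behind the scenes" could mean is the rough sketch below. `AliasesSketch` and the local `alias` named tuple are hypothetical stand-ins written for this illustration; they are not the HoloViews implementation of `hv.util.Aliases`, and the real object may store something different.

```python
from collections import namedtuple

# Hypothetical named tuple, following the proposal earlier in the thread.
alias = namedtuple('alias', ['name', 'key'])

class AliasesSketch:
    """Illustrative stand-in for hv.util.Aliases (not the real class)."""

    def __init__(self, **labels):
        # Each keyword is the short key used in the data source;
        # its value is the long name shown to the reader.
        for key, name in labels.items():
            setattr(self, key, alias(name=name, key=key))

al = AliasesSketch(x='X-label', y='Y-label')
```

Here `al.x` evaluates to `alias(name='X-label', key='x')`, so the pairing of key and long name is declared once and the ordering question never reaches the call site.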
Opened a separate branch to check whether we can't just switch them. Overall it doesn't seem too bad: https://travis-ci.org/ioam/holoviews/builds/194476686 so perhaps it's doable with a bit more work. |
I agree -- people usually want long/complex/typeset names for the viewable labels in figures, and rarely want long/complex/encoded names as Python attributes or Pandas column names. So the current aliases support seems backwards, and if we're ever to have a chance to fix that, it seems like now is the time. |
I'm also in agreement. My main concerns are:
Just to make sure we are on the same page - the idea would be that |
Hopefully we can maintain compatibility here, but there are definitely a few issues to figure out. Currently the
Agreed, not hugely attached to either, but I don't think description is quite right because it's mostly used as a readable label while description sounds like something longer. @jbednar Any other suggestions? |
Seems like a label to me. |
I like |
I've discussed this again with Philipp and I think this might come together quite nicely! In particular, I think it would be good to use
You could use the following to define the dimension name/label:
al = hv.util.Aliases(x='X-label', y='Y-label')
Or dictionaries for the kwargs, e.g.:
al = hv.util.Aliases(x=dict(label='X-label', range=(0,1)))
Now when you refer to
All this would need to be documented in a new aliases tutorial (that should probably discuss
One of the goals of the new tutorial would be to lay out the recommended approach for different cases, e.g. library authors, users trying to define things once quickly, and users who expect to be reusing dimension definitions over and over again.
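A possible reading of the two declaration styles, sketched under the assumption that a bare string is shorthand for a dictionary with only a `label` entry. `AliasSpecsSketch` is a hypothetical name used for illustration; how HoloViews actually stores or consumes these specifications is not stated in this thread.

```python
class AliasSpecsSketch:
    """Illustrative only; not the HoloViews implementation."""

    def __init__(self, **specs):
        for name, spec in specs.items():
            # Assumption: a bare string means {'label': <string>}.
            params = dict(spec) if isinstance(spec, dict) else {'label': spec}
            # The keyword itself supplies the short dimension name.
            params['name'] = name
            setattr(self, name, params)

al = AliasSpecsSketch(x=dict(label='X-label', range=(0, 1)), y='Y-label')
```

Under this assumption, `al.x` carries the label plus any extra dimension parameters, while `al.y` carries only a name and a label, so both styles end up in one uniform shape.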
Closing in favor of #1083, turns out it was fairly easy to switch the name and the alias. |
Addresses #1001. Needs more checking and a lot of unit tests.