Feature Requests for DimStack: 1) @rtransform 2) enum as dimension 3) Common dimensions 4) to DataFrame #410
Thanks, these are interesting suggestions! There is already a It should be easier to add layers to dimstacks, and your macro ideas are pretty cool. But personally I would prefer to make the macro-free syntax better before diving into macros. So, as a plan of action:
How does that sound? (As for enums, I'm not sure how that would work... the dimension object is pretty tightly integrated here, and the fact that they are type wrappers is how everything compiles away. There is also the constraint that a dimension used in a
Ultimately I would like to extend something like these methods for But we could already make
on enums. I can write
I wanted a way to have a struct field type linked to a dimension by giving it a similar name and restricting its possible values to those of the dimension (PK-FK). This seems a good way. Unless you know something better?
macro-free syntax - suggestions
Allow new layers to be created.
orthogonal dimensions
is equivalent to More generally for DA1 ... DA4 defined over some combination of X,Y,Z an expression like More difficult
This is imperative:
    s[:four] = 4s[:one]
DimensionalData.jl is written largely in a functional style because, besides array indexing and metadata, everything is immutable. This means it will work on GPU, which is a core design goal. But you can't directly change any of the objects like that. What you can already do is this (I think?):
    s2 = merge(s, rebuild(4s[:one]; name=:four))
    # or
    s2 = merge(s, DimStack((; four=4s[:one])))
But there is probably an easier way.
What we need to move this issue forward is getting your ideas written out in terms of how they currently work, so we can point out the real weaknesses in the existing syntax and take small, actionable steps towards improvement. To be very clear: if you want these features you will need to do this work (but it will be very much appreciated by me and other users of this package). This is one of over 30 packages for me, and bugfixes and core functionality have to take priority over features, so I won't personally have time to write this out until I have a direct need for it.
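The merge-based approach suggested above can be sketched as follows. This is only an illustration: the dimensions, layer names and data are made up, and it assumes that `merge` accepts two `DimStack`s in the installed version of DimensionalData.jl, as the comment above tentatively suggests.

```julia
using DimensionalData

# Illustrative data: two layers sharing the same X and Y dimensions
x, y = X(1:3), Y(1:2)
a1 = DimArray(ones(3, 2), (x, y))
a2 = DimArray(fill(2.0, 3, 2), (x, y))
s = DimStack((one=a1, two=a2))

# Derive a new layer from an existing one without mutating `s`
four = 4 .* s[:one]

# Build a new stack that holds the old layers plus the new one
s2 = merge(s, DimStack((; four)))
```

The original stack `s` is left untouched; `s2` is a new object carrying the extra layer.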
Thanks Rafael. I'll have a go. One last comment (then I'll shut up). Re GPUs - I'm a novice but a common practice seems to be declaring arrays with blank data before populating them. With a DimStack you can set a cell-tuple e.g.
No worries, don't shut up, this is a super useful discussion. I just want to be clear that more comes of this if you put in the time to map out a plan with clear pointers to the current shortfalls than if you post big ideas far from the current implementation for me to implement, because I'm really unlikely to have time to think through the design for that. But I can fix But about the stack. A stack is essentially a Arrays are the privileged mutable part of this package. What you really seem to want is a We can instead make the immutable syntax better. For example, this really should work and is a tiny fix:
    s = merge(s, (one=d1,))
You can't change fields of a NamedTuple but here (I think) the fields are pointers / references to the underlying Arrays.
Well, you can always update a whole array with a broadcast if it already exists:
    s[:one] .= d1
But this is attempting to change a pointer to point to a different array, by running
    s[:one] = d1
That's just not possible: we need to make a new
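The distinction between overwriting a layer's values and replacing the layer itself can be sketched like this. The data is invented, and the sketch assumes (as the comments above do) that `s[:one]` returns a `DimArray` sharing storage with the stack's layer and that `merge` works on two stacks.

```julia
using DimensionalData

x = X(1:3)
s = DimStack((one=DimArray(zeros(3), (x,)),))
d1 = DimArray([1.0, 2.0, 3.0], (x,))

# In-place update: the layer's existing array is overwritten element by element
layer = s[:one]
layer .= d1

# Replacing the layer with a *different* array means building a new stack;
# the rightmost value wins for the duplicated key, as with NamedTuple merge
s_new = merge(s, DimStack((one=2 .* d1,)))
```

The first form keeps the same array and only changes its contents; the second leaves `s` alone and returns a new stack.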
Would you consider allowing broadcast_dims to be applied to Arrays and NamedTuples, with dimension indices coming from the array or tuple values? Something like
Use case: create a DimArray to hold connections across different Servers and Databases.
The LH_Dimension function is messy because I have to create a name for each dimension, which has to be a variable name in a NamedTuple (maybe there's a way around this I'm not aware of).
Closing this as too long and complicated to be actionable. If you have any single contained feature requests, please write them up one at a time. Closing for now.
1 @rtransform
@rtransform in DataFramesMeta.jl supports row level expressions that create new columns in an existing dataframe.
Example
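A minimal illustration of `@rtransform`, using a made-up data frame and column names: each expression is evaluated row by row and the result is stored in a new column of a copy of the data frame.

```julia
using DataFrames, DataFramesMeta

# Hypothetical input table
df = DataFrame(a=[1, 2, 3], b=[10, 20, 30])

# Row-wise expression: column :c is computed from :a and :b for each row
df2 = @rtransform(df, :c = :a + :b)
```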
Could a similar macro, @layer_transform, add new layers to an existing DimStack? The DimStack documentation example has three layers. With such a macro it could be created as:
2 enum dimensions
Could an enum be used as a dimension? Example
The advantage of this is an enum can then be used as a structure field type. e.g.
The possible values of x1 and x2 are then restricted to the values of the dimension. (A Primary Key - Foreign Key constraint).
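One way to approximate this with current features, offered only as a sketch: keep the enum as the struct field type and use the enum's names, converted to `Symbol`s, as the values of a named dimension. The `Fruit` enum, the `Order` struct and the `prices` array are all hypothetical, and this is a workaround, not an enum-aware dimension.

```julia
using DimensionalData

# Hypothetical enum whose members double as dimension values
@enum Fruit apple banana cherry

# A named dimension whose lookup values are the enum names as Symbols
fruit_dim = Dim{:fruit}(collect(Symbol.(instances(Fruit))))
prices = DimArray([0.5, 0.25, 3.0], (fruit_dim,))

# The field type restricts `item` to valid members of the dimension
struct Order
    item::Fruit
    quantity::Int
end

# Look up the price by converting the enum value back to its Symbol
price(o::Order) = prices[Dim{:fruit}(At(Symbol(o.item)))]
total(o::Order) = o.quantity * price(o)
```

Here `Order(banana, 4)` is accepted, while any value outside `Fruit` is rejected by the field type, which gives the primary key / foreign key style constraint described above.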
3 Automatic joining on common dimensions.
Consider the DimStack above, and another DimStack sX defined over dimension X with layers l1 and l2.
Since sX has dimension X, which is common to sXY, could the layers of sX be used within sXY? Something like:
And could the layers of sXY be used in sX within aggregation statements, as the result would be aggregated over dimension Y?
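Part of this can already be done layer by layer with `broadcast_dims` and reductions over a named dimension. The sketch below uses invented layer names and data; it shows combining an X-only layer with an X-Y layer, and aggregating an X-Y layer over Y before using it alongside an X-only layer.

```julia
using DimensionalData

x, y = X(1:3), Y(1:2)
sXY = DimStack((a=DimArray(ones(3, 2), (x, y)),))
sX = DimStack((l1=DimArray([10.0, 20.0, 30.0], (x,)),))

# Use a layer of sX "within" sXY: the arrays are matched on their common dimension X
combined = broadcast_dims(+, sXY[:a], sX[:l1])

# Use a layer of sXY "within" sX: aggregate over Y first, then drop the reduced dimension
a_over_y = dropdims(sum(sXY[:a]; dims=Y); dims=Y)
ratio = sX[:l1] ./ a_over_y
```

This does not give the automatic joining asked about above, but it shows which explicit steps such a feature would need to hide.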
4 to and from DataFrame
The DataFrame functions stack and unstack pivot columns to rows and vice versa. Similarly, could there be a function like
That takes a DataFrame and a list of columns that become dimensions, with the remaining columns becoming layers?
and
That converts all dimensions and layers to DataFrame columns with the number of rows being the product of the dimensions.
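For the stack-to-DataFrame direction, DimensionalData.jl implements the Tables.jl interface, so passing a stack to the `DataFrame` constructor should already give this layout. The sketch below uses made-up layers; the exact column names and order may depend on the package version.

```julia
using DimensionalData, DataFrames

x, y = X(1:3), Y(1:2)
s = DimStack((
    one=DimArray(rand(3, 2), (x, y)),
    two=DimArray(rand(3, 2), (x, y)),
))

# One row per combination of dimension values, one column per dimension and per layer
df = DataFrame(s)
```

With a 3-element X and a 2-element Y, `df` has 6 rows. The reverse direction (DataFrame to DimStack) is not covered by this sketch.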