ENH: automatic rpy2 instance conversion #7385
def __getattribute__(self, attr):
    if attr == 'assign':
        return _assign
    return robj.r.__getattribute__(attr)
It might be better to use the interface provided, i.e., instead of robj.r.__getattribute__(attr), just do getattr(robj.r, attr). Same for the methods below: just call their respective top-level functions or behavior as you would if you were a user. Sometimes Python itself performs ops on the result of a special method call, e.g., for rich comparisons Python will automatically compare the ids of two objects if both of their comparison methods of the same name return NotImplemented. This is done internally in Python, but if you directly call the method like __eq__ you don't get this convenience.
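To make the point about special methods concrete, here is a minimal sketch using a toy class (Celsius is a made-up name for illustration, not part of pandas or rpy2). It shows that calling __eq__ directly exposes the raw NotImplemented sentinel, while the == operator lets Python apply its fallback.

```python
class Celsius:
    """Toy value type whose __eq__ only understands other Celsius objects."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if not isinstance(other, Celsius):
            # Signal "I don't know how to compare"; Python then tries the
            # other operand and finally falls back to an identity check.
            return NotImplemented
        return self.degrees == other.degrees


a = Celsius(20)

# Calling the special method directly returns the raw sentinel.
direct = a.__eq__("20")

# Going through the operator lets Python handle the fallback.
via_operator = (a == "20")

# Attribute lookup works the same way: getattr is the public interface.
same_attr = getattr(a, "degrees") == a.__getattribute__("degrees")
```

Here direct is the NotImplemented object, whereas via_operator is a plain bool, which is usually what calling code expects.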
Thanks, modified.
See also comment of @sinhrks here: #7309 (comment):
I think we have to decide where we want this conversion machinery to live (because now you have one in ipython magic (but that is moved to rpy2), rpy2 and pandas):
I'm leaving town for a week, so I'll pick this up next weekend, but wanted to let folks know that rpy2 needs to have some machinery for R -> python conversion (obviously), and so it makes the most sense to me to have the code live there, and I'm pretty sure any reasonable patch would be happily accepted. You can see that the rmagic code (in the process of being deprecated in IPython, now living in rpy2.ipython) hands all conversion over to ro.conversion.ri2ro. So, to do this in rpy2, the idea would be to make pandas2ri.activate() set up better conversion in the dynamically patched ri2ro function. I actually opened up an issue about this, as my memory was that things were better than they currently are! I haven't had time to go digging though: https://bitbucket.org/lgautier/rpy2/issue/206/numpy2ri-pandas2ri-no-longer-properly For what it's worth, I think if we have pandas installed (and invoke pandas2ri.activate()), a pandas.Series is a much better choice for conversion of R lists and vectors than a numpy object, as you get a proper index. |
(...)
So am I. |
Maybe interface and conversion logic should be discussed separately.
Conversion Functions
Currently pandas conversion looks better to me. I agree it should be merged in the future, and it should be decided in which module the conversion functions are maintained. I think the conversion relies more on the type of
Conversion Interface
In my use case, sometimes I want to handle
@sinhrks @jorisvandenbossche what's the status on this? |
Saw this question was unanswered while checking into another issue. It should be noted that @lgautier fixed an issue with already existing code to convert pandas DataFrames automatically into rpy2 wrapped function calls. The logic for the direction rpy2 has moved is that conversion to (other) python objects has been deprecated in favor of rpy2 proxy objects (wrapped R objects) supporting the array interface so numpy calls work directly on rpy2.robjects objects. And if you want a true It's less obvious how to do that in pandas as there's nothing equivalent to the standard array / buffer API for tables of data. The other piece is that we've been talking about moving to a generics approach to handling conversion on the rpy2 end in the future. So, that's the state of things on the rpy2 side. Probably in any case it's good to have the code that inspects the guts of R objects live in rpy2. If folks want to coordinate, that'd be great. In particular, no one has asked for anything on the rpy2 side, right? |
Conversion functions @davclark Do you mean that the future of the @sinhrks I think you could also say the conversion depends more on the internals of the rpy2 objects and so rpy2 version, and should only use public pandas API. But if more contributors of pandas are interested in keeping this up to date, it is maybe easier to do it here. @davclark What do you think of the conversion interface issue raised by @sinhrks above? |
The functionality of My feeling is that advanced users like @sinhrks would be better served by using the conversion functions directly ( @sinhrks - is there a reason that simply using the functions directly doesn't work for you? Can someone provide a conceptual diff on those @jorisvandenbossche, sorry if I came across as snarky. Does someone want to provide a PR against rpy2? We had a strange |
@davclark Ah, I didn't interpret you as snarky! Sorry if I implied that I did :-) Your input is certainly valued!
@davclark Ah, what I meant is I want to perform automatic conversion in separate ways, sometimes numpy and otherwise pandas, etc. And I'm not willing to call each raw function like And agreed to
Thanks @sinhrks. That clarifies your concerns. It strikes me that this might be best expressed via a context manager... Can you provide the two use-cases or user models that would differentiate between the rpy2 model and the pandas model? It would be good to be clear on that as we coordinate. |
"Automatic" conversion that changes its conversion logic is possible with the existing conversion infrastructure in rpy2. You just have to write your own conversion logic and register it. Should you want your own conversion rules that disregard the existing conversion, this is also possible. As a module owner you can decide how it should be done: this is between you and your users. In the present case, it may be worth looking at how the existing conversion in The case of explicitly parallel and active conversion rules is not very well addressed by the current design in rpy2 (as it is using the fact that imported modules are singletons, and the active conversion is always at
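As a rough illustration of "write your own conversion logic and register it", here is a sketch built on the standard library's functools.singledispatch. This is not rpy2's conversion API: to_python and FakeRVector are hypothetical names, and single dispatch stands in for whatever generics mechanism a real implementation would choose.

```python
from functools import singledispatch


@singledispatch
def to_python(obj):
    """Default rule: hand the object back unchanged."""
    return obj


class FakeRVector:
    """Hypothetical stand-in for a wrapped R vector that carries names."""

    def __init__(self, values, names):
        self.values = list(values)
        self.names = list(names)


@to_python.register(FakeRVector)
def _(obj):
    # Custom rule registered by a "module owner": keep the names as keys
    # instead of dropping them.
    return dict(zip(obj.names, obj.values))


converted = to_python(FakeRVector([1, 2], ["a", "b"]))
untouched = to_python(3.5)
```

A module that wants different rules can register its own handler for the same type, or keep an entirely separate dispatcher so that it does not disturb rules other code relies on.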
Using a context manager would be an elegant idea. The only potential issue would arise if several threads are used, as the conversion system would be modified "globally", even if encapsulated in a context.
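One way to picture the context-manager idea together with the threading concern is the sketch below. It assumes a made-up registry (conversion, convert, and current_rule are hypothetical names, not rpy2 functions) and stores the active rule in thread-local state, so a change made inside a with block is not visible to other threads.

```python
import threading
from contextlib import contextmanager

# Per-thread storage for the active conversion rule (hypothetical design).
_state = threading.local()


def _default_rule(obj):
    """Fallback rule: return the object unchanged."""
    return obj


def current_rule():
    """Return the rule active in the calling thread."""
    return getattr(_state, "rule", _default_rule)


@contextmanager
def conversion(rule):
    """Temporarily activate `rule` in this thread, then restore the old one."""
    previous = current_rule()
    _state.rule = rule
    try:
        yield
    finally:
        _state.rule = previous


def convert(obj):
    """Convert `obj` with whatever rule is active in the calling thread."""
    return current_rule()(obj)
```

Inside `with conversion(str):` a call to convert(1) goes through str, while a thread started within that block still sees the default rule. Whether per-thread state is the right trade-off depends on how the real conversion system is shared between threads.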
Just to touch base, I'm spending some time with @mrocklin thinking about how to do general conversion. He and some folks at Continuum are working on a project you've likely heard of called blaze, which in particular contains a simple conversion system called into that exercises @mrocklin's multiple dispatch mechanism. There's a related package called dynd, which we're looking at as a way to sensibly handle things like missing data for conversion to R. We're also discussing difficulties that arise with multi-indices. But he seems willing to break out into as a separate project, and this could perhaps be a way to coordinate conversion between data-frame (and other) packages like pytables, pandas, R, SQL, etc. In any case, I'd still love to hear a bit more about what kind of API people would like to see.
@sinhrks can you rebase / update what is the status of this? |
@sinhrks what's the status of this? |
@jreback @jorisvandenbossche Based on #9187, direct conversion is maintained in |
see #9602 we are deprecating in 0.16.0. and redirecting to |
Derived from #7309. Create a wrapper for robjects.r in pandas.rpy.common to perform automatic pandas DataFrame and Series conversion. Series will be converted to R data.frame to preserve rownames (index).
If it looks OK, I'll modify the doc (#7309) based on the following API.
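The wrapper pattern described here can be sketched without rpy2 or pandas installed. In the example below, ConvertingProxy and FakeR are hypothetical stand-ins (the real PR wraps robjects.r and converts DataFrame and Series objects); the sketch only shows how a proxy might convert arguments before forwarding a call.

```python
class ConvertingProxy:
    """Forward attribute access to `target`, converting call arguments first."""

    def __init__(self, target, convert):
        self._target = target
        self._convert = convert

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself, so the
        # public getattr interface is used on the wrapped object.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            args = [self._convert(a) for a in args]
            kwargs = {k: self._convert(v) for k, v in kwargs.items()}
            return attr(*args, **kwargs)

        return wrapper


class FakeR:
    """Hypothetical backend standing in for robjects.r."""

    def describe(self, x):
        return type(x).__name__


def tuple_to_list(value):
    """Toy conversion rule: tuples become lists, everything else passes."""
    return list(value) if isinstance(value, tuple) else value


r = ConvertingProxy(FakeR(), convert=tuple_to_list)
described_tuple = r.describe((1, 2))
described_int = r.describe(5)
```

The backend receives a list when the caller passes a tuple, which mirrors how the proposed wrapper would hand R a converted object instead of the original pandas one.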