Issue
I am using rpy2 through rmagic to interleave R code with Python 3 code in a Jupyter notebook. A simple code cell such as this:
%%R -i df -o df_out
df_out <- df
returns a data frame with some column names changed, e.g. CTB-102L5.4 becomes CTB.102L5.4. I think this is related to read.table or similar (as per this answer). However, I didn't find a way to specify this in the rmagic extension.
The only workaround I could think of is to change the column names before passing them to R and revert them once the dataframe is back in Python, but I'd like to find a better solution.
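For reference, the rename-and-revert workaround described above could be sketched roughly as follows. The helper names make_safe_names and restore_names, and the col0, col1, ... naming scheme, are hypothetical choices for illustration and not part of rpy2 or pandas; the only assumption is that such names are already valid in R and so pass through unchanged.

```python
def make_safe_names(columns):
    """Return (safe_names, restore_map) for a list of column names.

    The placeholder names col0, col1, ... are an arbitrary choice:
    they are already valid R names, so R has no reason to alter them.
    """
    safe_names = ["col%d" % i for i, _ in enumerate(columns)]
    restore_map = dict(zip(safe_names, columns))
    return safe_names, restore_map


def restore_names(columns, restore_map):
    """Map placeholder names back to the originals.

    Names not found in the map (e.g. columns created on the R side)
    are returned unchanged.
    """
    return [restore_map.get(name, name) for name in columns]


original = ["CTB-102L5.4", "gene_id"]
safe, restore_map = make_safe_names(original)
restored = restore_names(safe + ["new_col"], restore_map)
```

With a pandas DataFrame, one would assign df.columns = safe before the %%R cell and df_out.columns = restore_names(list(df_out.columns), restore_map) afterwards.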
Solution
Whenever the parameter -i <name> is used to "import" a Python object into R, conversion rules are applied (see here). The default converter ends up calling R's function data.frame, which sanitizes the column names (parameter check.names=TRUE by default, see https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/data.frame) into syntactically valid names that can be used as symbols without quoting. In your example, CTB-102L5.4 would otherwise be parsed as the expression CTB - 102L5.4.
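To make the renaming concrete, here is a rough pure-Python approximation of the sanitization that check.names=TRUE triggers (via R's make.names). It is only a sketch: the function name approx_make_names is made up for illustration, it assumes ASCII names, and it ignores R's handling of reserved words and of duplicated names.

```python
import re


def approx_make_names(name):
    """Approximate R's make.names() for a single ASCII name."""
    # Characters other than letters, digits, dots and underscores
    # are replaced with a dot.
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid name starts with a letter, or with a dot that is not
    # followed by a digit; otherwise "X" is prepended.
    if not re.match(r"([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


sanitized = approx_make_names("CTB-102L5.4")
```

Here the hyphen is the only invalid character, so sanitized is "CTB.102L5.4", matching the renaming reported in the question.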
This default behaviour is not necessarily desirable in every situation, and a custom converter can be passed to the R magic %%R.
The documentation contains a short introduction to writing custom conversion rules (https://rpy2.github.io/doc/v2.9.x/html/robjects_convert.html).
Assuming that your input is a pandas DataFrame, you could proceed as follows:
1- Implement a variant of py2ri_pandasdataframe that does not sanitize names. Ideally this would be done by just setting check.names to FALSE, although that is currently not possible (see https://bitbucket.org/rpy2/rpy2/issues/455/add-parameter-to-dataframe-to-allow).
from rpy2.robjects import pandas2ri

def my_py2ri_pandasdataframe(obj):
    # Start from rpy2's default pandas-to-R conversion (which sanitizes names)
    res = pandas2ri.py2ri_pandasdataframe(obj)
    # Set the column names in `res` to the original column names in `obj`
    # (left as an exercise for the reader)
    return res
2- Create a custom converter derived from the IPython converter:
import pandas
from rpy2.ipython import rmagic
from rpy2.robjects.conversion import Converter, localconverter
my_dataf_converter = Converter('my converter')
my_dataf_converter.py2ri.register(pandas.DataFrame,
                                  my_py2ri_pandasdataframe)
my_converter = rmagic.converter + my_dataf_converter
3- Use %%R with --converter=my_converter.
Answered By - lgautier