rstudio / reticulate

R Interface to Python

Home Page: https://rstudio.github.io/reticulate

License: Apache License 2.0

R 62.51% Python 3.17% C++ 33.77% Shell 0.10% C 0.09% CSS 0.17% Dockerfile 0.19%

reticulate's Introduction

The reticulate package provides a comprehensive set of tools for interoperability between Python and R. The package includes facilities for:

Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).
Flexible binding to different versions of Python including virtual environments and Conda environments.

Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability. If you are an R developer who uses Python for some of your work, or a member of a data science team that uses both languages, reticulate can dramatically streamline your workflow!

Getting started

Installation

Install the reticulate package from CRAN as follows:

install.packages("reticulate")

Python version

By default, reticulate uses an isolated Python virtual environment named "r-reticulate".

The use_python() function enables you to specify an alternate Python, for example:

library(reticulate)
use_python("/usr/local/bin/python")

The use_virtualenv() and use_condaenv() functions enable you to specify versions of Python in virtual or Conda environments, for example:

library(reticulate)
use_virtualenv("myenv")

See the article on Python Version Configuration for additional details.

Python packages

You can install any required Python packages using standard shell tools like pip and conda. Alternatively, reticulate includes a set of functions for managing and installing packages within virtualenvs and Conda environments. See the article on Installing Python Packages for additional details.

Calling Python

There are a variety of ways to integrate Python code into your R projects:

Python in R Markdown — A new Python language engine for R Markdown that supports bi-directional communication between R and Python (R chunks can access Python objects and vice-versa).
Importing Python modules — The import() function enables you to import any Python module and call its functions directly from R.
Sourcing Python scripts — The source_python() function enables you to source a Python script the same way you would source() an R script (Python functions and objects defined within the script become directly available to the R session).
Python REPL — The repl_python() function creates an interactive Python console within R. Objects you create within Python are available to your R session (and vice-versa).

Each of these techniques is explained in more detail below.

Python in R Markdown

The reticulate package includes a Python engine for R Markdown with the following features:

Run Python chunks in a single Python session embedded within your R session (shared variables/state between Python chunks)
Printing of Python output, including graphical output from matplotlib.
Access to objects created within Python chunks from R using the py object (e.g. py$x would access an x variable created within Python from R).
Access to objects created within R chunks from Python using the r object (e.g. r.x would access an x variable created within R from Python).

Built in conversion for many Python object types is provided, including NumPy arrays and Pandas data frames. For example, you can use Pandas to read and manipulate data then easily plot the Pandas data frame using ggplot2:

Note that the reticulate Python engine is enabled by default within R Markdown whenever reticulate is installed.

See the R Markdown Python Engine documentation for additional details.

Importing Python modules

You can use the import() function to import any Python module and call it from R. For example, this code imports the Python os module and calls the listdir() function:

library(reticulate)
os <- import("os")
os$listdir(".")

 [1] ".git"             ".gitignore"       ".Rbuildignore"    ".RData"
 [5] ".Rhistory"        ".Rproj.user"      ".travis.yml"      "appveyor.yml"
 [9] "DESCRIPTION"      "docs"             "external"         "index.html"
[13] "index.Rmd"        "inst"             "issues"           "LICENSE"
[17] "man"              "NAMESPACE"        "NEWS.md"          "pkgdown"
[21] "R"                "README.md"        "reticulate.Rproj" "src"
[25] "tests"            "vignettes"

Functions and other data within Python modules and classes can be accessed via the $ operator (analogous to the way you would interact with an R list, environment, or reference class).

Imported Python modules support code completion and inline help:

See Calling Python from R for additional details on interacting with Python objects from within R.

Sourcing Python scripts

You can source any Python script just as you would source an R script using the source_python() function. For example, if you had the following Python script flights.py:

import pandas
def read_flights(file):
  flights = pandas.read_csv(file)
  flights = flights[flights['dest'] == "ORD"]
  flights = flights[['carrier', 'dep_delay', 'arr_delay']]
  flights = flights.dropna()
  return flights

Then you can source the script and call the read_flights() function as follows:

source_python("flights.py")
flights <- read_flights("flights.csv")

library(ggplot2)
ggplot(flights, aes(carrier, arr_delay)) + geom_point() + geom_jitter()

See the source_python() documentation for additional details on sourcing Python code.

Python REPL

If you want to work with Python interactively you can call the repl_python() function, which provides a Python REPL embedded within your R session. Objects created within the Python REPL can be accessed from R using the py object exported from reticulate. For example:

Enter exit within the Python REPL to return to the R prompt.

Note that Python code can also access objects from within the R session using the r object (e.g. r.flights). See the repl_python() documentation for additional details on using the embedded Python REPL.

Type conversions

When calling into Python, R data types are automatically converted to their equivalent Python types. When values are returned from Python to R they are converted back to R types. Types are converted as follows:

R	Python	Examples
Single-element atomic vector	Scalar	`1`, `1L`, `TRUE`, `"foo"`
Unnamed list or multi-element atomic vector	List	`c(1.0, 2.0, 3.0)`, `c(1L, 2L, 3L)`
Named list	Dict	`list(a = 1L, b = 2.0)`, `dict(x = x_data)`
Matrix/Array	NumPy ndarray	`matrix(c(1,2,3,4), nrow = 2, ncol = 2)`
Data Frame	Pandas DataFrame	`data.frame(x = c(1,2,3), y = c("a", "b", "c"))`
Function	Python function	`function(x) x + 1`
NULL, TRUE, FALSE	None, True, False	`NULL`, `TRUE`, `FALSE`

If a Python object of a custom class is returned then an R reference to that object is returned. You can call methods and access properties of the object just as if it was an instance of an R reference class.
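To check which Python type a converted value arrives as, a small helper can be defined in a Python script and called from R after source_python(). The sketch below is only illustrative: `describe_type()` is a hypothetical name, and the example values are hand-written Python stand-ins for the R expressions in the table, not values produced by reticulate.

```python
def describe_type(value):
    """Return the name of the Python type that `value` arrived as."""
    return type(value).__name__


# Hand-written Python stand-ins for some of the R examples in the table above.
examples = {
    "single-element vector": 1.0,             # R: 1
    "multi-element vector": [1.0, 2.0, 3.0],  # R: c(1.0, 2.0, 3.0)
    "named list": {"a": 1, "b": 2.0},         # R: list(a = 1L, b = 2.0)
    "NULL": None,                             # R: NULL
    "TRUE": True,                             # R: TRUE
}

type_names = {label: describe_type(value) for label, value in examples.items()}
```

Calling `describe_type()` from R with different arguments is a quick way to confirm a conversion before relying on it.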

Learning more

The following articles cover the various aspects of using reticulate:

Calling Python from R — Describes the various ways to access Python objects from R as well as functions available for more advanced interactions and conversion behavior.
R Markdown Python Engine — Provides details on using Python chunks within R Markdown documents, including how to call Python code from R chunks and vice-versa.
Python Version Configuration — Describes facilities for determining which version of Python is used by reticulate within an R session.
Installing Python Packages — Documentation on installing Python packages from PyPI or Conda, and managing package installations using virtualenvs and Conda environments.
Using reticulate in an R Package — Guidelines and best practices for using reticulate in an R package.
Arrays in R and Python — Advanced discussion of the differences between arrays in R and Python and the implications for conversion and interoperability.
Python Primer — Introduction to Python for R users.

Why reticulate?

From the Wikipedia article on the reticulated python:

The reticulated python is a species of python found in Southeast Asia. They are the world’s longest snakes and longest reptiles…The specific name, reticulatus, is Latin meaning “net-like”, or reticulated, and is a reference to the complex colour pattern.

From the Merriam-Webster definition of reticulate:

1: resembling a net or network; especially : having veins, fibers, or lines crossing a reticulate leaf. 2: being or involving evolutionary change dependent on genetic recombination involving diverse interbreeding populations.

The package enables you to reticulate Python code into R, creating a new breed of project that weaves together the two languages.

reticulate's Issues

append all Python class bases to R class attribute

When executing a Python function with reticulate, class information from the Python object is appended to the resulting R object's class attribute. This is done in the C++ function py_ref.

If I understand this code correctly, py_ref grabs the __class__ attribute of the python object (the class of which this is an instance), as well as the parent class from which this class inherits, via the class object's .__bases__ attribute. However this is only done for the immediate parent class, not for the full inheritance.

I am working on a port of the GPflow Python module, and want to map its nice kernel creation syntax to R. Kernel objects can be added, multiplied etc., and the GPflow module enables this by defining addition for all objects of class Kern. For example, in Python:

import GPflow as gp
k1 = gp.kernels.Constant(1)
k2 = gp.kernels.Bias(1)
k3 = gp.kernels.Cosine(1)
K = k1 + k2 + k3

The full class inheritance for these objects is:
k1: object>Parentable>Parameterized>Kern>Static>Constant
k2: object>Parentable>Parameterized>Kern>Static>Constant>Bias
k3: object>Parentable>Parameterized>Kern>Stationary>Cosine
The + operator is a wrapper around GPflow.kernels.Add((k1, k2, k3)), defined in the Kern class; it works for any class instance inheriting from Kern.

To enable the same syntax in an R port using reticulate, it would be really helpful for GPflow.kernels.Kern to be listed in the class attribute for the corresponding R objects, so we can do S3 dispatch. But because only the immediate parent class is returned this isn't currently possible. I.e. in R:

library(reticulate)
gp <- import('GPflow')
k1 <- gp$kernels$Constant(1)
k2 <- gp$kernels$Bias(1)
k3 <- gp$kernels$Cosine(1)
class(k1); class(k2); class(k3)

[1] "GPflow.kernels.Constant" "GPflow.kernels.Static"   "python.builtin.object"  
[1] "GPflow.kernels.Bias"     "GPflow.kernels.Constant" "python.builtin.object"
[1] "GPflow.kernels.Cosine"     "GPflow.kernels.Stationary" "python.builtin.object"

A fix would be to have this section of py_ref recurse through the parent classes, finding their parents (via the .__bases__ attribute) and adding them to the R object's class attribute, until hitting either object or an empty string (object's parent class).

As far as I can tell, there is no sane way of doing this from the reticulate-wrapping GPflow-specific R package.

If that sounds like behaviour you would like (or tolerate) in reticulate, I'd be happy to do a PR (though my C++ is decidedly rusty so very happy if someone else wants to do it!).
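On the Python side, the full ancestry is already available through the method resolution order, so a recursive walk over `__bases__` may not be needed. The sketch below uses hypothetical stand-in classes that mirror the hierarchy listed above (it does not import GPflow) and is not reticulate's code:

```python
import inspect


# Hypothetical stand-ins mirroring the inheritance chain described above.
class Parentable: pass
class Parameterized(Parentable): pass
class Kern(Parameterized): pass
class Static(Kern): pass
class Constant(Static): pass
class Bias(Constant): pass


def ancestry(obj):
    """Return the names of obj's class and all ancestors, most specific first."""
    return [cls.__name__ for cls in inspect.getmro(type(obj))]


bias_classes = ancestry(Bias())
```

With every ancestor listed, `Kern` would appear for any kernel object, which is what S3 dispatch on the R side would need.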

call `getitem` when python class is indexed with `[]`

Is there a way to access a class's __getitem__ method using [] notation from within R?

For example:

library(reticulate)
pptx <- import("pptx")
prs <- pptx$Presentation()
title_slide_layout <- prs$slide_layouts[0L]

This fails with the error:

Error in prs$slide_layouts[0L] : 
  object of type 'environment' is not subsettable

My current workaround is to call __getitem__ directly like so:

title_slide_layout <- prs$slide_layouts$`__getitem__`(0L)

Something like this seems to work OK. I'm not a confident enough programmer to know if this would work well in reticulate:

`[.python.builtin.object` <- function(obj, i) {
  obj$`__getitem__`(as.integer(i))
}
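For reference, this is the Python behaviour the workaround relies on: square-bracket indexing is dispatched to `__getitem__`. The class below is a hypothetical stand-in, not the real python-pptx collection:

```python
class SlideLayouts:
    """Hypothetical stand-in for an indexable collection."""

    def __init__(self, names):
        self._names = list(names)

    def __getitem__(self, index):
        # Called by Python whenever the object is indexed with [].
        return self._names[index]


layouts = SlideLayouts(["Title Slide", "Title and Content"])
via_brackets = layouts[0]             # bracket syntax
via_method = layouts.__getitem__(0)   # the explicit call used in the workaround
```

Python indexing is zero-based, which is why the R examples above pass 0L for the first element.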

Amazing package!

reticulate cannot parse windows registry key to get versions

I have an Anaconda Python installation.
My registry key tree under HCU/Software is Python/ContinuumAnalytics/Anaconda_4.1.1_64-bit/...
When I try to install tensorflow, I get the following error and message.
I am sad :(
tensorflow::install_tensorflow("conda")
Error in if (arch == "32") arch <- "i386" else if (arch == "64") arch <- "x64" :
missing value where TRUE/FALSE needed
In addition: Warning message:
In read_python_versions_from_registry("HCU", key = "ContinuumAnalytics", :
Unexpected format for Anaconda version: Anaconda_4.1.1_64-bit
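To illustrate the kind of parsing that would accept this key name, here is a minimal sketch. It is not reticulate's implementation (which is written in R); the pattern and the `parse_version_key()` helper are assumptions made for illustration. Only the "32" to "i386" and "64" to "x64" mapping is taken from the error message above.

```python
import re

# Assumed key shape: optional "Anaconda_" prefix, a dotted version,
# and an optional "_32-bit" / "_64-bit" suffix.
KEY_PATTERN = re.compile(
    r"^(?:Anaconda_)?(?P<version>\d+(?:\.\d+)*)(?:_(?P<bits>32|64)-bit)?$"
)


def parse_version_key(key):
    """Return (version, arch) for a registry key name, or None if it does not match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    arch = {"32": "i386", "64": "x64", None: None}[match.group("bits")]
    return match.group("version"), arch


parsed = parse_version_key("Anaconda_4.1.1_64-bit")
```

Returning None for unrecognised names lets the caller skip the entry instead of comparing a missing value.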

Numpy Arrays and R Arrays are printed differently

Say you have a (4,2) array in R, and you convert to Python then print. This works as expected.

> R_4_2 <- array(1:prod(4,2), dim = c(4,2))
#     [,1] [,2]
#[1,]    1    5
#[2,]    2    6
#[3,]    3    7
#[4,]    4    8
> dim(R_4_2)
#[1] 4 2
> Np_4_2 <- np$array(R_4_2)
> Np_4_2
#[[1 5]
# [2 6]
# [3 7]
# [4 8]]
> Np_4_2$shape
#(4, 2)

Now say we'd like to extend our original R array so that, instead of being a single (4,2) matrix, it holds 3 (4,2) matrices. In R we tack on the extra dimension to the end of the shape array.

> R_4_2_3 <- array(1:prod(4,2,3), dim = c(4,2,3))
#, , 1
#
#     [,1] [,2]
#[1,]    1    5
#[2,]    2    6
#[3,]    3    7
#[4,]    4    8
#
#, , 2
#
#     [,1] [,2]
#[1,]    9   13
#[2,]   10   14
#[3,]   11   15
#[4,]   12   16
#
#, , 3
#
#     [,1] [,2]
#[1,]   17   21
#[2,]   18   22
#[3,]   19   23
#[4,]   20   24
> dim(R_4_2_3)
#[1] 4 2 3

This time when we convert to NumPy the shape is the same, but instead of printing 3 (4,2) matrices, we print 4 (2,3) matrices.

> Np_4_2_3 <- np$array(R_4_2_3)
> Np_4_2_3$shape
#(4, 2, 3)
> Np_4_2_3
#[[[ 1  9 17]
#  [ 5 13 21]]
#
# [[ 2 10 18]
#  [ 6 14 22]]
#
# [[ 3 11 19]
#  [ 7 15 23]]
#
# [[ 4 12 20]
#  [ 8 16 24]]]

The equivalent shape in NumPy that would print out the same way as R would be (3,4,2). I'm not sure what the best way to solve this is. Could we change the default behavior of arrays in R?
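The difference comes from which index is treated as the "slice" axis when printing: R prints one matrix per value of the last index, while the nested Python form groups by the first index. A pure-Python sketch of the same reindexing follows; the helper names are hypothetical and NumPy is not used, so the blocks are plain nested lists.

```python
def from_column_major(values, shape):
    """Arrange flat column-major (R-style) data as nested lists indexed [i][j][k]."""
    n0, n1, n2 = shape
    return [
        [[values[i + n0 * j + n0 * n1 * k] for k in range(n2)] for j in range(n1)]
        for i in range(n0)
    ]


def move_last_axis_first(nested):
    """Reindex [i][j][k] data as [k][i][j], so each block is one R-style matrix slice."""
    n0, n1, n2 = len(nested), len(nested[0]), len(nested[0][0])
    return [
        [[nested[i][j][k] for j in range(n1)] for i in range(n0)]
        for k in range(n2)
    ]


values = list(range(1, 25))                       # same data as array(1:24, dim = c(4, 2, 3))
as_converted = from_column_major(values, (4, 2, 3))
r_style = move_last_axis_first(as_converted)      # 3 blocks, each a (4,2) matrix
```

With NumPy, the equivalent reindexing should be `np.transpose(a, (2, 0, 1))`, which changes the shape to (3, 4, 2) without changing which value sits at which R index.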

RStudio crashes when browsing certain methods

First of all, thank you so much for a fantastic package!

I am using reticulate to call a Python storage module for Azure. This works perfectly. However, I have discovered that RStudio crashes for certain methods when they are displayed in the dropdown menu. In my case, it seems to happen only for methods containing the word "copy". Browsing copy methods from the azure module crashes RStudio (examples 1 and 2). Browsing copy methods from the shutil module gives an error message (example 3). I am unable to reproduce this for copy methods from Python's copy module, which are displayed correctly.

I have tried both the CRAN-version and Github-version of reticulate. Please see attached examples.

Example 1 (Crashes when browsing CopyProperties of asb)

library(reticulate)
asb  <- import("azure.storage.blob")

# This is the method which crashes when browsing
asb$CopyProperties

Example 2 (Crashes when browsing method copy_blob)

library(reticulate)

asb <- import("azure.storage.blob")

# Setup credentials (Disguised for the bug report)
#account_name <- '###'
#account_key  <- '###'

# Create blob object
bb <- asb$BlockBlobService(account_name = account_name, account_key = account_key)

# This is the method which crashes when browsing
bb$copy_blob

Example 3 (Does not actually crash RStudio but gives an error)

library(reticulate)

shutil <- import("shutil")

# This is the method which gives an error when browsing
shutil$copy

Returns the following error when browsing shutil$copy

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them
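The error message points at the cause: `shutil.copy` has a keyword-only parameter, which the older argspec API cannot describe. As a sketch of the alternative the message suggests, both `inspect.signature()` and `inspect.getfullargspec()` from the standard library report such parameters; the `parameter_kinds()` helper is a hypothetical name.

```python
import inspect
import shutil


def parameter_kinds(func):
    """List (name, kind) pairs for func, including keyword-only parameters."""
    signature = inspect.signature(func)
    return [(name, param.kind.name) for name, param in signature.parameters.items()]


copy_params = parameter_kinds(shutil.copy)
keyword_only = inspect.getfullargspec(shutil.copy).kwonlyargs
```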

Session info

R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian (Bokmål)_Norway.1252   
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C                              
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reticulate_0.7.0.9002

loaded via a namespace (and not attached):
[1] tools_3.3.3  Rcpp_0.12.10

SSL errors when importing a Python module in RStudio's R session

OS: Linux Mint 18.1, an Ubuntu 16.04 derivative.

Python Distribution: Intel Python 3.5.3 (This is basically conda with python compiled using Intel MKL, otherwise identical in behavior to conda).

Python Library required to import: docker (https://docker-py.readthedocs.io/en/stable/)

I first created a conda env for docker

$> conda create -n docker python=3.5 numpy

Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /home/bhaskar/.conda/envs/docker:

The following NEW packages will be INSTALLED:

    mkl:        2017.0.3-0   
    numpy:      1.13.1-py35_0
    openssl:    1.0.2l-0     
    pip:        9.0.1-py35_1 
    python:     3.5.3-1      
    readline:   6.2-2        
    setuptools: 27.2.0-py35_0
    sqlite:     3.13.0-0     
    tk:         8.5.18-0     
    wheel:      0.29.0-py35_0
    xz:         5.2.2-1      
    zlib:       1.2.8-3

$> source activate docker
$> pip install docker

Next, I tested this from the command line:

$> R
> library(reticulate)
> docker <- import("docker")

This works and I can access the Python SDK from this point onwards, but when I do the R part in an RStudio R session:

docker <- import("docker")

I get:

Error in py_module_import(module, convert = convert) : 
  AttributeError: module 'websocket._ssl_compat' has no attribute 'ssl'

The rstudio bin is started from the command-line after activating the conda environment. Here's the output of py_config()

python:         /home/bhaskar/.conda/envs/docker/bin/python
libpython:      /home/bhaskar/.conda/envs/docker/lib/libpython3.5m.so
pythonhome:     /home/bhaskar/.conda/envs/docker:/home/bhaskar/.conda/envs/docker
version:        3.5.3 |Intel Corporation| (default, Apr 27 2017, 18:08:47)  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
numpy:          /home/bhaskar/.conda/envs/docker/lib/python3.5/site-packages/numpy
numpy_version:  1.12.1
docker:         /home/bhaskar/.conda/envs/docker/lib/python3.5/site-packages/docker

python versions found: 
 /home/bhaskar/.conda/envs/docker/bin/python
 /usr/bin/python
 /usr/bin/python3

I even tried with use_condaenv("docker") but again no luck.

After this I also installed the 'websocket' python module, which I don't think is necessary but just to rule that possibility out

$> pip install websocket
$> r
> library(reticulate)
> c <- import("docker")

And this time I got a different error

docker <- import("docker")
Error in py_module_import(module, convert = convert) : 
  ImportError: /home/bhaskar/.conda/envs/docker2/lib/python3.5/lib-dynload/_ssl.cpython-35m-x86_64-linux-gnu.so: undefined symbol: SSLv2_method

FWIW I get no error when running the above from command line with or without 'websocket' being explicitly installed.

Obligatory sessionInfo()

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reticulate_0.9 magrittr_1.5  

loaded via a namespace (and not attached):
[1] compiler_3.4.1    clisymbols_1.2.0  tools_3.4.1       withr_1.0.2      
[5] yaml_2.1.14       Rcpp_0.12.11

Attempting to access non-existent key in 'dict' segfaults

Repro:

library(reticulate)
d <- dict()
d$hello

Trace:

* thread #1: tid = 0x269028, 0x00000001188871fe libpython2.7.dylib`PyObject_GetAttrString + 13, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x00000001188871fe libpython2.7.dylib`PyObject_GetAttrString + 13
    frame #1: 0x0000000118887263 libpython2.7.dylib`PyObject_HasAttrString + 11
    frame #2: 0x00000001185d32eb reticulate.so`py_ref(libpython::_object*, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 107
    frame #3: 0x00000001185d8ee2 reticulate.so`py_dict_get_item(PyObjectRef, Rcpp::RObject_Impl<Rcpp::PreserveStorage>) + 194
    frame #4: 0x00000001185c935c reticulate.so`reticulate_py_dict_get_item + 156
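For comparison, this is how plain Python treats a missing key, which is presumably the behaviour one would expect to see surfaced in R instead of a crash (an error or a NULL-like value). The snippet uses only built-in dict operations:

```python
d = {}

# Indexing a missing key raises KeyError rather than returning a value.
try:
    d["hello"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# dict.get returns None, or a supplied default, for a missing key.
missing = d.get("hello")
fallback = d.get("hello", "n/a")
```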

Document the conversion between R and Python functions/closures

See title

Can I close a Python session?

I must say that I am new to Python but quite experienced in R, so bear with me if this is a simple Python question.

I create a python session by using py_run_file() which works fine. But I would like to close / delete the session and free the associated space - how can I do that?

Thanks,

Rainer

incorrect signature when function parameters include list literals

For the following function wrapper invocation:

reticulate::py_function_wrapper("tf$contrib$keras$layers$Conv2D")

I get this signature:

Conv2D <- function(1L, 1L), activation = NULL, use_bias = TRUE, kernel_initializer = "glorot_uniform", bias_initializer = "zeros", kernel_regularizer = NULL, bias_regularizer = NULL, activity_regularizer = NULL, kernel_constraint = NULL, bias_constraint = NULL, ...) {
}

It looks like the first occurrence of a list literal results in the function signature up to that point getting consumed.

@terrytangyuan
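One way to avoid text-level parsing of defaults is to read them as Python objects via `inspect.signature()` and render each one separately. The sketch below is not how `py_function_wrapper()` works; `to_r_literal()`, `r_signature()` and `conv2d_like()` are hypothetical, and rendering sequences as `c(...)` is an assumption.

```python
import inspect


def to_r_literal(value):
    """Render a simple Python default value as R source text."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):   # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return f"{value}L"
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "c(" + ", ".join(to_r_literal(item) for item in value) + ")"
    raise TypeError(f"unsupported default: {value!r}")


def r_signature(func):
    """Build an R function header from func's parameters and their defaults."""
    parts = []
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            parts.append(name)
        else:
            parts.append(f"{name} = {to_r_literal(param.default)}")
    return "function(" + ", ".join(parts) + ")"


def conv2d_like(filters, kernel_size, strides=(1, 1), activation=None,
                use_bias=True, kernel_initializer="glorot_uniform"):
    """Hypothetical function whose defaults mirror part of the signature above."""


header = r_signature(conv2d_like)
```

Because each default is handled as a value, a tuple such as `(1, 1)` cannot swallow the parameters that precede it.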

Output Not Captured

See this part of the code: https://github.com/rstudio/tflearn/blob/master/tests/testthat/test-hooks.R#L17
where the output doesn't seem to be captured using py_capture_stdout. Is this the correct way to use it? The capture.output from R only captures the output for the return value of linear_dnn_combined_regression. Is py_capture_stdout able to capture logging output from TF?
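One possible factor, assuming TF's messages go through Python's logging module (not verified here): a `logging.StreamHandler` created without a stream writes to stderr, so capturing stdout alone would miss it. A standard-library sketch of that distinction, with a hypothetical logger name:

```python
import contextlib
import io
import logging

logger = logging.getLogger("capture_demo")   # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False                      # keep the demo independent of root handlers

stdout_buffer = io.StringIO()
stderr_buffer = io.StringIO()

with contextlib.redirect_stdout(stdout_buffer), contextlib.redirect_stderr(stderr_buffer):
    handler = logging.StreamHandler()         # no stream given: binds to the current sys.stderr
    logger.addHandler(handler)
    print("result")                           # written to stdout
    logger.info("step 1 done")                # written to the handler's stream (stderr)
    logger.removeHandler(handler)

captured_stdout = stdout_buffer.getvalue()
captured_stderr = stderr_buffer.getvalue()
```

Note that a handler keeps the stream it was created with, so a handler installed before the capture starts would not be redirected at all.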

Wrappers for python packages?

This may not be the best channel to ask this, but...
Do you think a package like this makes sense: https://github.com/dfalbel/rsk
It imports all functions from scikit-learn using reticulate and exposes them in R functional style.

Thanks

Case where default value of an argument is some object/methods from a module

reticulate::py_function_wrapper("tf$contrib$layers$sparse_column_with_keys") yields:

sparse_column_with_keys <- function(column_name, keys, default_value = -1L, combiner = "sum", dtype = tf.string) {
  tf$contrib$layers$sparse_column_with_keys(
    column_name = column_name,
    keys = keys,
    default_value = default_value,
    combiner = combiner,
    dtype = dtype
  )
}

Currently dtype's default value is tf.string. It would be nice to generate tf$string automatically, like the following:

sparse_column_with_keys <- function(column_name, keys, default_value = -1L, combiner = "sum", dtype = tf$string) {
...  # same as above
}
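As a rough sketch of the requested rewrite, a dotted Python reference can be turned into R's `$` accessor form by splitting on dots. `dotted_to_r()` is a hypothetical helper, and it assumes every component is a plain identifier that is also a valid R name.

```python
def dotted_to_r(reference):
    """Rewrite a dotted Python reference such as 'tf.string' in R's `$` accessor form."""
    parts = reference.split(".")
    if not all(part.isidentifier() for part in parts):
        raise ValueError(f"not a dotted name: {reference!r}")
    return "$".join(parts)


r_default = dotted_to_r("tf.string")
```

Rejecting anything that is not a dotted name keeps numeric defaults such as -1 from being rewritten by mistake.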

Support for Sparse Matrices

Currently the x object accepted by the Keras fit function is a

Vector, matrix, or array of training data...

It would be great if there is also support for sparse matrices, as these X matrices can grow very large on disk, although they can be very sparse. For example, the Nietzsche example runs fine on my 32GB RAM laptop, but I run into RAM problems even with twice as many rows. And this is a classic case for a sparse matrix.

@gsimchoni I moved this issue from the Keras repo over here because this is where it would need to be addressed.

Pandas DataFrame and R data.frame translation

Wonderful work so far! Just dreaming of the future...

doc refers to missing function: import_from_package()

The helpfile for import() etc. has this under details:

The import_from_package function imports a Python module defined within an R package (the module must be located within the inst/python directory of the R package).

but import_from_package() isn't defined - should this be import_from_path() instead?

Support for Python Nix environments

Would it be possible to add support for Nix environments? Unlike conda, Nix provides a way to specify the dependencies all the way down to libc in a consistent reproducible manner. On an older system like RHEL-6.x which is still widely used in enterprise environments conda packages break fairly often because of some system shared library being too old. Nix was a lifesaver for us when we needed complex packages like Tensorflow or Keras.

The official documentation:
http://python-on-nix.readthedocs.io/en/latest/index.html
in particular
http://python-on-nix.readthedocs.io/en/latest/tutorial.html#declarative-environment-using-myenvfun
also
http://datakurre.pandala.org/2015/10/nix-for-python-developers.html

The main site https://nixos.org/nix/

Cannot install with Microsoft R Open 3.3.3 under Rstudio

When I use the following line to install the "reticulate" package under RStudio, it shows me the following error message:
install.packages("reticulate")
Installing package into ‘C:/Users/M248168/RStudio/Library’
(as ‘lib’ is unspecified)
Package which is only available in source form, and may need
compilation of C/C++/Fortran: ‘reticulate’
These will not be installed

py_config unable to find vcruntime140.dll

I'm trying to access a Conda environment through RStudio, but am getting an error message I don't understand.

I've installed the reticulate package under RStudio 1.0.136 (R version 3.3.0) on a Windows 7 machine. Because of the way my computer is configured, the default commands for the use_condaenv() function don't work, but I can run it without errors like this:

use_condaenv("dl3.5", conda = "c:/Users/xarxziux/AppData/Local/Continuum/Anaconda3/Scripts/conda.exe", required = TRUE)

After a few seconds, the R prompt returns with no warnings or errors. However if I now try to verify the command with the py_config() function, I get the following dialog box:

Clicking on this box, I get the following error message in the terminal:

Error in py_initialize(config$python, config$libpython, config$pythonhome,  : 
c:/Users/xarxziux/AppData/Local/Continuum/Anaconda3/envs/dl3.5/python35.dll - The specified module could not be found.

The directory C:/Users/xarxziux/AppData/Local/Continuum/Anaconda3/envs/dl3.5/ contains both vcruntime140.dll and python35.dll, but the function cannot find either. Do you have any idea what is going on here, or if there's anything else I can check?

can't find python27.dll when python installed for all users on windows

It's in windows\system and we aren't looking for it there.

Possible issue with Python to R object conversion

I don't know enough about Python to understand what is going on here. In this example, it looks like the object returned by Python is an array of arrays, which after conversion to R becomes a list of lists. However, based on how I'm doing it here, it looks like the information contained in the Python array is not reflected in the R object.

### Attach/Load required packages
library(reticulate)
use_python("/anaconda/bin/python")
library(kerasR)
require(grid)

### load the dataset but only keep the top n words, zero the rest

mod <- new.env(parent=emptyenv())
mod$keras.datasets <- reticulate::import("keras.datasets", convert = FALSE)
mod$np <- reticulate::import("numpy")

int32 <- function(x) {
  mod$np$int32(x)
}

load_imdb2 <- function(num_words = NULL, skip_top = 0, maxlen = NULL, seed = 113,
                      start_char = 1, oov_char = 2, index_from = 3) {
  
  z <- mod$keras.datasets$imdb$load_data(path="imdb_full.pkl",
                                         num_words = num_words,
                                         skip_top = int32(skip_top),
                                         maxlen = maxlen,
                                         seed = int32(seed),
                                         start_char = int32(start_char),
                                         oov_char = int32(oov_char),
                                         index_from = int32(index_from))
  z
}


imdb2 <- load_imdb2(num_words = 5000)
imdb2_r <- py_to_r(imdb2)
imdb2

((array([ [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32],
[1, 194, 1153, 194, 2, 78, 228, 5, 6, 1463, 4369, 2, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 2, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 2, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 2, 2, 349, 2637, 148, 605, 2, 2, 15, 123, 125, 68, 2, 2, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 2, 5, 2, 656, 245, 2350, 5, 4, 2, 131, 152, 491, 18, 2, 32, 2, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95], ...

Now, if you look inside imdb2_r, it is a list of lists containing only 0/1's (none of the numbers above are reflected in this object, as far as I can see).

Thanks,
A.

Importing keras fails due to sys.stderr.write in __init__.py

Trying to import the keras library fails, supposedly because it contains a call to sys.stderr.write in __init__.py

keras <- import( "keras" )
Error in py_module_import(module, convert = convert) :
AttributeError: 'NoneType' object has no attribute 'write'

Detailed traceback:
File "C:\Users\MHELDM1\AppData\Local\conda\conda\envs\tf\lib\site-packages\keras\__init__.py", line 3, in <module>
from . import activations
File "C:\Users\MHELDM1\AppData\Local\conda\conda\envs\tf\lib\site-packages\keras\activations.py", line 4, in <module>
from . import backend as K
File "C:\Users\MHELDM~1\AppData\Local\conda\conda\envs\tf\lib\site-packages\keras\backend\__init__.py", line 72, in <module>
sys.stderr.write('Using TensorFlow backend.\n')

Commenting out the call to sys.stderr.write in __init__.py solves the problem. Might there be an issue with how sys.stdout and sys.stderr are handled in the initialization phase of a Python module?
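The traceback above is what Python raises whenever code writes to a stream that is None. The following is a minimal sketch of that failure and of a guard against it, assuming the embedding host leaves sys.stderr unset. `safe_write` and `unguarded_write` are hypothetical helpers for illustration, not part of Keras or reticulate.

```python
import io


def safe_write(stream, message):
    """Write message to stream; return False when the stream is missing."""
    if stream is None:
        # An embedded interpreter may start with sys.stderr set to None.
        return False
    stream.write(message)
    return True


def unguarded_write(stream, message):
    """Mimic the unguarded call, which fails like the traceback above."""
    stream.write(message)


buffer = io.StringIO()
wrote = safe_write(buffer, "Using TensorFlow backend.\n")
skipped = safe_write(None, "Using TensorFlow backend.\n")

try:
    unguarded_write(None, "Using TensorFlow backend.\n")
    error_text = ""
except AttributeError as err:
    error_text = str(err)
```

If the host process is the cause, the same guard could be applied once at startup by assigning a real stream to sys.stderr before importing the module.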

conda env support

This package is brilliant. One of the ways that we've seen users leverage python in DS is by using the conda package manager, specifically conda environments. Does this package have the ability to say something like "enable conda environment foo" and have your R code leverage that particular build/stack?

Missing /

A / is missing between "/usr/local/Cellar/gcc/6.3.0_1" and "bin" in the final link command, so the build fails:

`devtools::install_github("rstudio/reticulate")

Using GitHub PAT from envvar GITHUB_PAT
Downloading GitHub repo rstudio/reticulate@master
from URL https://api.github.com/repos/rstudio/reticulate/zipball/master
Installing reticulate
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD
INSTALL
'/private/var/folders/vq/2fchgnps1hxd5j_1jlkvvvnwmh074h/T/RtmpIYx0Ik/devtools73a2a512344/rstudio-reticulate-fcca860'
--library='/Users/dpaulsen/Library/R/3.3/library' --install-tests

installing source package ‘reticulate’ ...
** libs
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Users/dpaulsen/Library/R/3.3/library/Rcpp/include" -fPIC -Wall -mtune=core2 -g -O2 -c RcppExports.cpp -o RcppExports.o
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Users/dpaulsen/Library/R/3.3/library/Rcpp/include" -fPIC -Wall -mtune=core2 -g -O2 -c event_loop.cpp -o event_loop.o
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Users/dpaulsen/Library/R/3.3/library/Rcpp/include" -fPIC -Wall -mtune=core2 -g -O2 -c libpython.cpp -o libpython.o
/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Users/dpaulsen/Library/R/3.3/library/Rcpp/include" -fPIC -Wall -mtune=core2 -g -O2 -c python.cpp -o python.o
/usr/local/Cellar/gcc/6.3.0_1bin/g++-6 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o reticulate.so RcppExports.o event_loop.o libpython.o python.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
/bin/sh: /usr/local/Cellar/gcc/6.3.0_1bin/g++-6: No such file or directory
make: *** [reticulate.so] Error 127
ERROR: compilation failed for package ‘reticulate’
removing ‘/Users/dpaulsen/Library/R/3.3/library/reticulate’
Error: Command failed (1)
`

Mapping string/unicode dtypes in numpy arrays

Not trying to bring up #2 again, but how much work would it be to support just character arrays (not arbitrary python objects)? I'm trying to wrap https://github.com/ratal/mdfreader to work with industrial sensor log data and sometimes strings come up.

Extending kerasR using reticulate

If possible, I'd appreciate any pointers on the error message below. I'm trying to extend the kerasR package by adding a function that is not currently supported (keras.preprocessing.image.ImageDataGenerator). The error seems to occur when calling the .flow(X, y) method (a tuple is expected, but the output appears to be an iterator/generator). I'm using reticulate for this, and after many days I still cannot figure out what I'm doing wrong.

Thanks

### Attach/Load required packages
library(reticulate)
use_python("/anaconda/bin/python")
library(kerasR)

### Get cifar10 data and plot sample instances
cifar <- load_cifar10()
X_train <- cifar$X_train
Y_train <- cifar$Y_train
X_test <- cifar$X_test
Y_test <- cifar$Y_test

# Set parameters
batch_size = 32
num_classes = 10
epochs = 200

# Pre-process data
Y_train <- to_categorical(Y_train, num_classes)
Y_test <- to_categorical(Y_test, num_classes)
X_train <- X_train / 255
X_test <- X_test / 255

# Create model
model <- Sequential()
model$add(Conv2D(filters = 32, kernel_size = c(3, 3), padding  = "same", input_shape = c(32, 32, 3)))
model$add(Activation("relu"))
model$add(Conv2D(filters = 32, kernel_size = c(3, 3)))
model$add(Activation("relu"))
model$add(MaxPooling2D(pool_size= c(2, 2)))
model$add(Dropout(0.25))
model$add(Conv2D(filters = 64, kernel_size = c(3, 3), padding  = "same"))
model$add(Activation("relu"))
model$add(Conv2D(filters = 64, kernel_size = c(3, 3)))
model$add(Activation("relu"))
model$add(MaxPooling2D(pool_size= c(2, 2)))
model$add(Dropout(0.25))
model$add(Flatten())
model$add(Dense(units = 512))
model$add(Activation("relu"))
model$add(Dropout(0.50))
model$add(Dense(units = num_classes))
model$add(Activation("softmax"))

# Initiate RMSprop optimizer
opt <- RMSprop(lr = 0.0001, decay = 1e-6)

# Compile model
keras_compile(model, loss = "categorical_crossentropy", optimizer = opt, metrics = "accuracy")
#model$count_params()

### ImageDataGenerator
mod <- new.env(parent=emptyenv())
mod$keras.preprocessing.image <- reticulate::import("keras.preprocessing.image")
image_data_generator <- function(featurewise_center = FALSE,
                                samplewise_center = FALSE,
                                featurewise_std_normalization = FALSE,
                                samplewise_std_normalization = FALSE,
                                zca_whitening = FALSE,
                                rotation_range = 0.,
                                width_shift_range = 0.,
                                height_shift_range = 0.,
                                shear_range = 0.,
                                zoom_range = 0.,
                                channel_shift_range = 0.,
                                fill_mode = "nearest",
                                cval = 0.,
                                horizontal_flip = FALSE,
                                vertical_flip = FALSE,
                                rescale = NULL,
                                preprocessing_function = NULL,
                                data_format = "channels_last") {

 mod$keras.preprocessing.image$ImageDataGenerator(

   featurewise_center = featurewise_center,
   samplewise_center = samplewise_center,
   featurewise_std_normalization = featurewise_std_normalization,
   samplewise_std_normalization = samplewise_std_normalization,
   zca_whitening = zca_whitening,
   rotation_range = rotation_range,
   width_shift_range = width_shift_range,
   height_shift_range = height_shift_range,
   shear_range = shear_range,
   zoom_range = zoom_range,
   channel_shift_range = channel_shift_range,
   fill_mode = fill_mode,
   cval = cval,
   horizontal_flip = horizontal_flip,
   vertical_flip = vertical_flip,
   rescale = rescale,
   preprocessing_function = preprocessing_function,
   data_format = data_format

 )
}

# This will do preprocessing and realtime data augmentation:
datagen <- image_data_generator(
 featurewise_center = FALSE,  
 samplewise_center = FALSE,  
 featurewise_std_normalization = FALSE,  
 samplewise_std_normalization = FALSE, 
 zca_whitening = FALSE, 
 rotation_range = 0.,  
 width_shift_range = 0.1, 
 height_shift_range = 0.1,  
 horizontal_flip = TRUE,  
 vertical_flip = FALSE) 

# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen$fit(X_train)

model$fit_generator(datagen$flow(X_train, Y_train,
                                batch_size = batch_size),
                   steps_per_epoch = floor(dim(X_train)[1] / batch_size),
                   epochs = epochs,
                   validation_data = list(X_test, Y_test))

Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
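The error message describes a contract: every item the generator produces must be an (x, y) or (x, y, sample_weight) tuple. The sketch below illustrates that contract in plain Python. `batch_generator` and `check_batch` are hypothetical names written for this example and are not Keras functions; the check only mirrors the wording of the message above.

```python
import itertools


def batch_generator(xs, ys, batch_size):
    """Yield (x_batch, y_batch) tuples forever, cycling over the data."""
    n = len(xs)
    start = 0
    while True:
        end = start + batch_size
        yield xs[start:end], ys[start:end]
        start = end if end < n else 0


def check_batch(batch):
    """Reject anything that is not an (x, y) or (x, y, sample_weight) tuple."""
    if not isinstance(batch, tuple) or len(batch) not in (2, 3):
        raise ValueError(
            "output of generator should be a tuple (x, y, sample_weight) "
            "or (x, y). Found: %r" % (batch,)
        )
    return batch


gen = batch_generator(list(range(6)), list("abcdef"), batch_size=4)
first_three = [check_batch(b) for b in itertools.islice(gen, 3)]
```

A generator that arrives on the consuming side as None, as in the report, fails the first condition of `check_batch`.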

Missing some part of doc for certain parameter

devtools::load_all("~/reticulate")
library(tensorflow)
reticulate::py_function_wrapper("tf$estimator$Estimator")

generates:

#' Estimator class to train and evaluate TensorFlow models.
#' 
#' The `Estimator` object wraps a model which is specified by a `model_fn`,
#' which, given inputs and a number of other parameters, returns the ops
#' necessary to perform training, evaluation, or predictions. All outputs (checkpoints, event files, etc.) are written to `model_dir`, or a
#' subdirectory thereof. If `model_dir` is not set, a temporary directory is
#' used. The `config` argument can be passed `RunConfig` object containing information
#' about the execution environment. It is passed on to the `model_fn`, if the
#' `model_fn` has a parameter named "config" (and input functions in the same
#' manner). If the `config` parameter is not passed, it is instantiated by the
#' `Estimator`. Not passing config means that defaults useful for local execution
#' are used. `Estimator` makes config available to the model (for instance, to
#' allow specialization based on the number of workers available), and also uses
#' some of its fields to control internals, especially regarding checkpointing.
#' The `params` argument contains hyperparameters. It is passed to the
#' `model_fn`, if the `model_fn` has a parameter named "params", and to the input
#' functions in the same manner. `Estimator` only passes params along, it does
#' not inspect it. The structure of `params` is therefore entirely up to the
#' developer. NULL of `Estimator`'s methods can be overridden in subclasses (its
#' constructor enforces this). Subclasses should use `model_fn` to configure
#' the base class, and may add methods implementing specialized functionality.
#' 
#' @param model_fn Model function. Follows the signature:     * Args:       * `features`: single `Tensor` or `dict` of `Tensor`s              (depending on data passed to `train`),       * `labels`: `Tensor` or `dict` of `Tensor`s (for multi-head              models). If mode is `ModeKeys.PREDICT`, `labels=NULL` will be              passed. If the `model_fn`'s signature does not accept              `mode`, the `model_fn` must still be able to handle              `labels=NULL`.       * `mode`: Optional. Specifies if this training, evaluation or              prediction. See `ModeKeys`.       * `params`: Optional `dict` of hyperparameters.  Will receive what              is passed to Estimator in `params` parameter. This allows              to configure Estimators from hyper parameter tuning.       * `config`: Optional configuration object. Will receive what is passed              to Estimator in `config` parameter, or the default `config`.              Allows updating things in your model_fn based on configuration              such as `num_ps_replicas`, or `model_dir`.
#' @param model_dir Directory to save model parameters, graph and etc. This can     also be used to load checkpoints from the directory into a estimator to     continue training a previously saved model. If `NULL`, the model_dir in     `config` will be used if set. If both are set, they must be same. If     both are `NULL`, a temporary directory will be used.
#' @param config Configuration object.
#' @param params `dict` of hyper parameters that will be passed into `model_fn`.           Keys are names of parameters, values are basic python types.
#' 
#' @export
Estimator <- function(model_fn, model_dir = NULL, config = NULL, params = NULL) {
  tf$estimator$Estimator(
    model_fn = model_fn,
    model_dir = model_dir,
    config = config,
    params = params
  )
}

However, it seems like the parameter model_fn is missing a lot of information. Original doc here.

Use of endsWith not compatible with R 3.2

See: https://www.r-project.org/nosvn/R.check/r-oldrel-windows-ix86+x86_64/reticulate-00check.html

Potential unstable detection of Python modules

I am having a weird problem in RStudio where sometimes it cannot find a particular module in a Python package, for example:

 Error in py_module_import(module, convert = convert) : 
  ImportError: No module named tensorflow.contrib.learn

 Error in py_module_import(module, convert = convert) : 
  ImportError: No module named tensorflow.python.ops.random_ops

This error seems random. I had to reopen RStudio three or more times before it finally found all the modules correctly.

Is there any potential piece of code that would cause this instability?

Note that the Python modules can be found via python on the console, and this problem only appears in RStudio (Rscript runs fine).

Importing submodules?

Suppose I have a package pack with a submodule mod that has a function func. To call func in python:

from pack import mod
mod.func()

How do I import mod in reticulate, and run func?
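On the Python side, a submodule can also be imported by its dotted name, which is the same form used elsewhere on this page in reticulate::import("keras.preprocessing.image"). A minimal sketch, using the standard library's os.path as a stand-in for the hypothetical pack.mod:

```python
import importlib

# Import the submodule by its dotted name, as a string.
path_mod = importlib.import_module("os.path")

# The conventional spelling of the same import.
from os import path as path_direct

joined = path_mod.join("pack", "mod.py")
```

Both names refer to the same module object, so functions are called on it in the usual way.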

Feature request: pdb.pm() support

I'd like to be able to use the python pdb module from R, but it appears that tracebacks are currently lost. This is important, since it can be difficult to reproduce the context of an error outside of R.

Desired behavior in Python

>>> import pdb
>>> abs(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'NoneType'
>>> pdb.pm()  # Start post-mortem debugger.
<stdin>(1)<module>()
(Pdb) 
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 'pdb': <module 'pdb' from '/usr/lib/python2.7/pdb.py'>, '__doc__': None, '__package__': None}
>>>

Current behavior in R

> library(reticulate)
> py <- import_builtins()
> pdb <- import('pdb')
> py$abs(NULL)
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: bad operand type for abs(): 'NoneType'
> pdb$pm()
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  AttributeError: 'module' object has no attribute 'last_traceback'

Detailed traceback: 
  File "/usr/lib/python2.7/pdb.py", line 1270, in pm
    post_mortem(sys.last_traceback)
>
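The AttributeError arises because sys.last_traceback is only set by the interactive interpreter when it prints an unhandled exception. A possible workaround, sketched here under the assumption that the failing call can be wrapped, is to capture the traceback explicitly and hand it to pdb.post_mortem. `capture_traceback` is a hypothetical helper.

```python
import sys
import traceback


def capture_traceback(func, *args):
    """Call func and return the traceback object if it raises, else None."""
    try:
        func(*args)
    except Exception:
        # sys.exc_info() is valid inside the handler, unlike
        # sys.last_traceback, which only the interactive prompt sets.
        return sys.exc_info()[2]
    return None


tb = capture_traceback(abs, None)
summary = traceback.format_tb(tb)

# To debug interactively, pass the saved traceback explicitly:
#   import pdb; pdb.post_mortem(tb)
```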

Thanks for a great package!

What is under the hood?

Hello,

Is it possible to get some (minimal) documentation on how exactly reticulate works under the hood?

(Congratulations on developing reticulate, it is an impressive package :) )

Can't load venvs that aren't in ~/.virtualenvs

I've placed my virtual environments in a different folder (~/virtualenvs, as I don't want them hidden) from the default (~/.virtualenvs), but I'm having problems getting them to load in reticulate.

Running use_virtualenv("~/virtualenvs/foo", required=T) works without error, but the environment evidently hasn't loaded: packages that are only installed in this venv fail to load.

Running py_config() only displays the virtual environments that I still have in the default folder, but after running the use_virtualenv() call py_discover_config() correctly indicates that it has found the virtual env.

My WORKON_HOME is correctly set to ~/virtualenvs but it doesn't appear that this is being picked up by reticulate. From the code, it seems that this is hardcoded in R/config.R on line 110, rather than picking up WORKON_HOME.
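The lookup the report asks for can be sketched as follows: prefer WORKON_HOME and fall back to ~/.virtualenvs. This is an illustration of the suggested behavior only, not reticulate's implementation; `virtualenv_root` and `virtualenv_path` are hypothetical names.

```python
import os


def virtualenv_root(environ):
    """Return the directory that holds virtualenvs.

    Prefers WORKON_HOME (the virtualenvwrapper convention) and falls back
    to ~/.virtualenvs when the variable is unset or empty.
    """
    root = environ.get("WORKON_HOME") or "~/.virtualenvs"
    return os.path.expanduser(root)


def virtualenv_path(name, environ):
    """Resolve a bare environment name against the virtualenv root."""
    return os.path.join(virtualenv_root(environ), name)


custom = virtualenv_path("foo", {"WORKON_HOME": "/srv/virtualenvs"})
default = virtualenv_path("foo", {})
```

Passing the environment as an argument, rather than reading os.environ directly, keeps the lookup easy to test.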

[Off-Topic] Use sympy to provide symbolic computation in R

First, I want to thank Allaire and the RStudio people for all their significant work, including this remarkable package. It is also good to know that reticulate is already on CRAN.

An idea I have in mind is to bring symbolic computation to R, possibly by wrapping the sympy Python library. R is not designed for symbolic computation, but that need not stop an attempt to add it. One example I previously looked at is the SymPy package for Julia, but I think reticulate makes a more seamless integration of sympy into R possible. The R wrapper might even be more convenient than the Python package itself: thanks to R's lazy evaluation, we could use some syntax magic to define a symbolic expression without declaring variables, and so on.

In fact, reticulate already provides a seamless interface to Python functions. We could further override R's binary operators and math functions using group generics so that they work with sympy variables. We could also provide ways to convert a symbolic expression to an R function (as the Julia package SymPy does), along with other convenience functions for symbolic computation.
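The two mechanisms mentioned here, overloading operators to build expressions and converting an expression to a function, can be illustrated with a toy sketch in plain Python. It does not use sympy, and `Expr`, `wrap`, `symbol`, and `to_function` are hypothetical names invented for this example.

```python
import operator


class Expr:
    """A tiny expression tree built through operator overloading."""

    def __init__(self, op, args):
        self.op = op      # "sym", "const", "+" or "*"
        self.args = args  # symbol name, constant value, or child nodes

    def __add__(self, other):
        return Expr("+", (self, wrap(other)))

    def __mul__(self, other):
        return Expr("*", (self, wrap(other)))

    # Addition and multiplication commute, so reuse the same methods.
    __radd__ = __add__
    __rmul__ = __mul__

    def __str__(self):
        if self.op in ("sym", "const"):
            return str(self.args)
        left, right = self.args
        return "(%s %s %s)" % (left, self.op, right)


def wrap(value):
    """Turn plain numbers into constant nodes; pass Expr through."""
    return value if isinstance(value, Expr) else Expr("const", value)


def symbol(name):
    """Create a symbol node with the given name."""
    return Expr("sym", name)


def to_function(expr):
    """Convert an expression into a callable taking symbols as keywords."""
    ops = {"+": operator.add, "*": operator.mul}

    def evaluate(node, env):
        if node.op == "sym":
            return env[node.args]
        if node.op == "const":
            return node.args
        left, right = node.args
        return ops[node.op](evaluate(left, env), evaluate(right, env))

    return lambda **env: evaluate(expr, env)


x, y = symbol("x"), symbol("y")
expr = 2 * x + y * y
f = to_function(expr)
```

An R wrapper would play the role of `Expr` through group generics, with the actual algebra delegated to sympy.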

I have done some experiments with the idea (see https://github.com/Marlin-Na/symbolicR), but due to limited time and experience, I have not managed to produce a relatively complete prototype so far.

But my question is:

What do you think about the idea of bringing sympy to R with reticulate? Or did you have such a plan when you created the reticulate package?

Thanks!

How to deal with infinite iterators / generators ?

Some Python functions return infinite iterators / generators ( keras::flow_images_from_data(), for example).
For such iterators / generators, reticulate::iterate() gets into an infinite loop, and it cannot be used in R's for loops as in native Python's for loops.

An obvious solution to the problem is to use Python's built-in next() function, but it seems a bit awkward because next is a keyword in R and you must quote it to call from R.

Do you have any better idea for the problem ?

library(reticulate)

py <- import_builtins()

# create an infinite generator
main <- py_run_string("
def foo():
    n = 0
    while True:
        yield n
        n += 1
")
it <- main$foo()

# This gets into an infinite loop
for(i in iterate(it)) {
  print(i)
  if(i >= 10) break
}

# It works, but seems a bit awkward
while(TRUE) {
  i <- py$`next`(it)
  print(i)
  if(i >= 10) break
}
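For comparison, this is how bounded consumption of an infinite generator is usually written on the Python side, using the standard itertools module. It is a sketch of the Python idiom only and does not describe any reticulate function.

```python
import itertools


def foo():
    """The same infinite generator as in the report above."""
    n = 0
    while True:
        yield n
        n += 1


# Take a fixed number of items without exhausting the generator.
first_five = list(itertools.islice(foo(), 5))

# Or stop on a condition, mirroring the "if(i >= 10) break" loop.
up_to_ten = list(itertools.takewhile(lambda i: i <= 10, foo()))
```

Both helpers pull items lazily, one at a time, which is what an eager conversion of the whole iterator cannot do.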

Change path to Python binary

Dear reticulate-developers,

thanks for developing such a fantastic package!

Is it possible to change the path to another Python binary after already having run py_config or py_run_string?
In my case I would like to point reticulate to the QGIS Python binary (which is a bit of a hassle under Windows). This works, i.e. reticulate accepts the QGIS Python binary (usually found under C:\OSGEO4~1\bin\python.exe).

However, before running any RQGIS function I would like to check whether the correct Python binary is in use. If the user has not set up the QGIS Python environment correctly, running py_config might point to another Python binary such as Anaconda Python ("...\AppData\Local\CONTIN~~1\ANACON~~1\python.exe"). In this case, I would like to switch to the QGIS Python binary. I tried use_python but it didn't work. Alternatively, is there a way to find out which Python binary reticulate would use at a given moment without actually setting it?

Python object instantiation fails for classes defined in an R session

Thank you for creating an amazing package!

It would be more useful if we could easily extend Python classes in existing packages.
I tried to create a new class with Python's builtin type() function, but the class caused a segfault when it was instantiated.

library(reticulate)
py <- import("__builtin__")
Foo1 <- py$type("Foo1", tuple(), dict(a=1, f=function(self, x) x+1))
Foo1
#> <class 'Foo1'>
foo1 <- py_call(Foo1, list())
#> *** caught segfault ***
#>address 0x8, cause 'memory not mapped'
#>
#>Traceback:
#> 1: .Call("reticulate_py_call", PACKAGE = "reticulate", x, args,     keywords)
#> 2: py_call(Foo1, list())

Instantiation works if the class is created directly in Python's __main__ module:

main <- py_run_string("
Foo2 = type('Foo2', (), {'a': 1, 'f': lambda self, x: x+1})
")
Foo2 <- environment(main$Foo2)$attrib
Foo2
#> <class '__main__.Foo2'>
foo2 <- py_call(Foo2, list())
foo2$a
#> [1] 1
foo2$f(1)
#> [1] 2
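For reference, the three-argument form of type() used above behaves as follows in pure Python. The sketch also shows the subclassing case that motivates the report; `Bar` and its method `g` are hypothetical additions.

```python
# Build a class dynamically: name, tuple of base classes, attribute dict.
Foo = type("Foo", (), {"a": 1, "f": lambda self, x: x + 1})

# Extending an existing class works the same way: list it as a base.
Bar = type("Bar", (Foo,), {"g": lambda self, x: self.f(x) * 2})

foo = Foo()
bar = Bar()
```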

My environment is Ubuntu 16.04 LTS and the following:

py_config()
#> python:         /usr/bin/python
#> libpython:      /usr/lib/python2.7/config-x86_64-linux-gnu/libpython2.7.so
#> pythonhome:     /usr:/usr
#> version:        2.7.12 (default, Nov 19 2016, 06:48:10)  [GCC 5.4.0 20160609]
#> numpy:          /home/nakamura/.local/lib/python2.7/site-packages/numpy
#> numpy_version:  1.12.0
#> __builtin__:    __builtin__
#> available:      TRUE

#> python versions found: 
#> /usr/bin/python
#> /usr/bin/python3

devtools::session_info()
#> Session info -------------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.2.3 (2015-12-10)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language ja                          
#>  collate  ja_JP.UTF-8                 
#>  tz       Japan                       
#>  date     2017-02-09                  
#> 
#> Packages -----------------------------------------------------------------------
#>  package    * version date       source                             
#>  Rcpp         0.12.9  2017-01-14 cran (@0.12.9)                     
#>  devtools     1.12.0  2016-12-05 CRAN (R 3.2.3)                     
#>  digest       0.6.9   2016-01-08 CRAN (R 3.2.3)                     
#>  memoise      1.0.0   2016-01-29 CRAN (R 3.2.3)                     
#>  reticulate * 0.6.0   2017-02-08 Github (rstudio/reticulate@6478043)
#>  withr        1.0.2   2016-06-20 CRAN (R 3.2.3)

4 Byte Unicode (UCS4) Broken

(Pardon my ignorance of both Python and C++, but...)

I installed reticulate on a Raspberry Pi (raspbian distro -- a debian variant) and when attempting to import I get the following:

> picamera <- import("picamera")
Error in py_initialize(config$python, config$libpython, config$pythonhome,  : 
  PyUnicodeUCS2_FromString - /usr/lib/python2.7/config-arm-linux-gnueabihf/libpython2.7.so: undefined symbol: PyUnicodeUCS2_FromString

It seems that the Python 2.7 that comes on this distro is compiled to use 4 byte unicode characters (UCS4, rather than UCS2)

When altering the constant in libpython.cpp (master...trestletech:master) I'm then able to import successfully.
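Whether a Python 2 interpreter is a narrow (UCS2) or wide (UCS4) build can be read from sys.maxunicode. The sketch below classifies that value; `unicode_build` is a hypothetical helper, and on Python 3.3 and later the value is always 0x10FFFF.

```python
import sys


def unicode_build(maxunicode):
    """Classify a Python build from its sys.maxunicode value."""
    if maxunicode == 0xFFFF:
        return "UCS2"  # narrow build: 2-byte code units
    if maxunicode == 0x10FFFF:
        return "UCS4"  # wide build: 4-byte code points
    raise ValueError("unexpected sys.maxunicode: %r" % (maxunicode,))


this_build = unicode_build(sys.maxunicode)
```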

Travis documentation

I've been building a package using reticulate and am writing up some notes on how to integrate reticulate-based packages with Travis, since I expect many R developers will only be familiar with R travis procedures and not Python. Any interest in me formalising it, formatting it in the same fashion as reticulate's existing package documentation, and submitting a PR?

Feature request: integrate py$help() with R's help()

Currently python help messages can be printed to the console via

library(reticulate)
py <- import_builtins()
py$help(py$format)  # Help on `format`, for example.

However, some Python objects have very long documentation. It would be great if the output of py$help() could be shown in the RStudio Help window.
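One possible starting point, sketched here as an assumption rather than a design for reticulate: Python's pydoc module can return the help text as a string instead of printing it, so the text could then be routed to any viewer.

```python
import pydoc

# render_doc returns the help text instead of printing it;
# plain() strips the overstrike sequences used for bold in a pager.
help_text = pydoc.plain(pydoc.render_doc(format))
first_line = help_text.splitlines()[0]
```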

Does this seem feasible? I'm happy to attempt a PR if anyone could provide hints.

Thanks for a great package!

Create a dictionary of numpy arrays

Is there a way to create a dictionary of numpy arrays? I tried the following, but it didn't work as expected:

> dict(length=matrix(iris$Sepal.Length))
function (x)  .Primitive("length")
Error in py_dict(keys, values, convert = convert) : 
  Unable to convert R object to Python type

Basically, I am looking for a way to convert an R data.frame to a dictionary of numpy arrays

Error: R is already initialized

This is a minor issue I ran into when executing an R script that uses reticulate, in a project that uses rpy2 to interface with R from python elsewhere.

Posting here in case others come across the same issue.

I have a test function in R called validate_setup:

library(reticulate)
validate_setup <- function(verbose = TRUE, test_packages = c('numpy', 'cohorts')) {
  # report on py_config
  if (verbose)
     print(reticulate::py_config())

  # check availability of package(s)
  for (pkg in test_packages) {
    if (verbose)
       print(paste0('checking that ', pkg, ' is available'))
    if (!reticulate::py_available(pkg))
       stop(paste0('python package could not be found: ', pkg))
  }

  # check python initialized
  if (verbose)
     print('checking that python is initialized')
  reticulate::py_config()
  reticulate:::ensure_python_initialized()

  # load package
  for (pkg in test_packages) {
    if (verbose)
      print(paste0('loading ', pkg))
    dummy <- reticulate::import(pkg, convert = FALSE)
  }
  print("setup OK")
}

I also have a trivial command-line wrapper script to execute this called run_validate_setup.R

I can execute this in one of two Python environments:

testenv1: includes only cohorts:

 conda create -n testenv1 python=3 scipy numpy pandas
 source activate testenv1
 pip install cohorts
 Rscript run_validate_setup.R

testenv2: includes cohorts & rpy2:

 conda create -n testenv2 --clone testenv1
 source activate testenv2
 pip install rpy2
 Rscript run_validate_setup.R

In the first environment, I have no problems:

(testenv1) jacquelineburos@jacki1:~/projects/test-reticulate$ Rscript run_validate_setup.R
python:         /home/jacquelineburos/miniconda3/envs/testenv1/bin/python
libpython:      /home/jacquelineburos/miniconda3/envs/testenv1/lib/libpython3.6m.so
pythonhome:     /home/jacquelineburos/miniconda3/envs/testenv1:/home/jacquelineburos/miniconda3/envs/testenv1
version:        3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58)  [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
numpy:          /home/jacquelineburos/miniconda3/envs/testenv1/lib/python3.6/site-packages/numpy
numpy_version:  1.12.1

python versions found:
 /home/jacquelineburos/miniconda3/envs/testenv1/bin/python
 /usr/bin/python
 /usr/bin/python3
[1] "checking that numpy is available"
[1] "checking that cohorts is available"
[1] "checking that python is initialized"
[1] "loading numpy"
[1] "loading cohorts"
/home/jacquelineburos/miniconda3/envs/testenv1/lib/python3.6/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated since IPython 4.0. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)
[1] "setup OK"
[1] "Finished"

However, in the second I get an error message: R is already initialized

(testenv2) jacquelineburos@jacki1:~/projects/test-reticulate$ Rscript run_validate_setup.R
python:         /home/jacquelineburos/miniconda3/envs/testenv2/bin/python
libpython:      /home/jacquelineburos/miniconda3/envs/testenv2/lib/libpython3.6m.so
pythonhome:     /home/jacquelineburos/miniconda3/envs/testenv2:/home/jacquelineburos/miniconda3/envs/testenv2
version:        3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58)  [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
numpy:          /home/jacquelineburos/miniconda3/envs/testenv2/lib/python3.6/site-packages/numpy
numpy_version:  1.12.1

python versions found:
 /home/jacquelineburos/miniconda3/envs/testenv2/bin/python
 /usr/bin/python
 /usr/bin/python3
[1] "checking that numpy is available"
[1] "checking that cohorts is available"
[1] "checking that python is initialized"
[1] "loading numpy"
[1] "loading cohorts"
/home/jacquelineburos/miniconda3/envs/testenv2/lib/python3.6/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated since IPython 4.0. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)
R is already initialized

It's interesting to me that I only get this error when loading cohorts, not numpy, and that it manifests when running reticulate::import rather than during the earlier validation steps.

Not surprisingly, if I remove rpy2 from the testenv2 condaenv, the error is resolved.

I recognize that this is a pretty marginal edge case, but again posting here in case it's helpful.
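A setup check could detect the conflicting package up front without importing it. The sketch below uses importlib.util.find_spec from the standard library; `is_installed` is a hypothetical helper, and whether such a check belongs in validate_setup is an assumption.

```python
import importlib.util


def is_installed(module_name):
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


has_json = is_installed("json")
has_missing = is_installed("surely_not_a_real_module_xyz")
```

Calling `is_installed("rpy2")` in each environment would distinguish testenv1 from testenv2 before any import is attempted.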

Provide 'as.dict' helper for explicitly constructing dictionary from list / environment?

Some Python code I'm translating to R looks like this:

def feature_columns_to_placeholders(feature_columns, batch_size=None):
  return {
    column.name: tf.placeholder(tf.float32, [batch_size])
    for column in feature_columns
  }

Note that the dictionary key column.name is actually evaluated and its result is used as the key, which is out-of-line with how R does things. For that reason, it would be useful to be able to construct a dictionary directly from an R list or environment, so we could write the roughly equivalent R code:

placeholders <- lapply(feature_columns, function(column) {
  tf$placeholder(tf$float32, list(batch_size))
})
as.dict(placeholders)
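The same dictionary can be built from a list of keys and a list of values, which is closer to what a helper taking an R list would do. A minimal sketch: `Column` is a hypothetical stand-in for a feature column, and a tuple replaces tf.placeholder so the example runs without TensorFlow.

```python
class Column:
    """Hypothetical stand-in for a feature column with a name attribute."""

    def __init__(self, name):
        self.name = name


feature_columns = [Column("age"), Column("income")]

# Keys are computed at run time, as in the comprehension above.
keys = [column.name for column in feature_columns]
values = [("placeholder", column.name) for column in feature_columns]

# Pairing a list of keys with a list of values gives the same dictionary.
placeholders = dict(zip(keys, values))
same = {c.name: ("placeholder", c.name) for c in feature_columns}
```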

Inconsistency between tf.matmul in Python and tf$matmul in R

From @G-Lynn on February 13, 2017 23:31

Attached is a simple example of an issue in using matmul to multiply an array of matrices in R. I think the issue is a discrepancy in the way that the shape argument works in the R and python version.

I am trying to use the tf$matmul function to multiply an array of matrices by an array of vectors. As an example, I am trying to multiply [1, 2; 3, 4] * [1; 2] and [5,6; 7,8] * [3; 4]. The example works as expected in Python, but in the TensorFlow API for R, a dimension error is generated.

In Python, the code for the example is:
############# Beginning of Python code
import numpy as np
import tensorflow as tf
a = tf.constant(np.arange(1, 9, dtype=np.int32),shape=[2, 2, 2]) #create array of 2 matrices each 2x2: (1,2; 3,4) and (5,6; 7,8)
b = tf.constant(np.arange(1, 5, dtype=np.int32),shape=[2, 2, 1]) #multiply each of the matrices by a 2x1 vector (1,2)' and (3,4)'
sess = tf.Session()
sess.run(a)
c = tf.matmul(a, b)
sess.run(a)
sess.run(b)
sess.run(c) #the answer is the correct set of 2x1 vectors (5,11)' and (39,53)'
############# End of Python Code

When I try to implement this same example in R, an error is produced due to a difference in the way the dimensions of the arrays are indexed in the shape argument.

##################### Begin R Code
rm(list = ls())
devtools::install_github("rstudio/tensorflow")
library(tensorflow)

#Create an array of 2 matrices
A = list(matrix(1:4, nrow=2, byrow=T), matrix(5:8, nrow=2, byrow=T))
A = array(unlist(A), dim=c(2,2,2) ) #2 matrices of dimension 2x2
B = array(1:4, dim = c(2,1,2) ) #2 vectors of 2x1
A_tf = tf$constant(A, dtype="float64", shape=c(2,2,2))
B_tf = tf$constant(B, dtype="float64", shape=c(2,1,2))
sess = tf$Session()
sess$run(A_tf)
sess$run(B_tf)
sess$run(tf$matmul(A_tf,B_tf))

################## End R code

I believe the error is because of a discrepancy between the way the shape argument works in tf$constant (R) and the way it works in tf.constant (Python). In R, the number of elements in the array is the last argument in shape, so that shape = c(2,1,3) means an array with 3 2x1 vectors. In the Python implementation, the number of array elements is the first argument, so that shape=[3, 2, 1] means 3 vectors of 2x1.

When the function tf$matmul(A_tf,B_tf) is called, I think the difference in indexing the shapes of the array is causing an error.
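The batch-first convention described for Python can be checked without TensorFlow. The sketch below multiplies the two matrices and vectors from the example using nested lists; `matmul` and `batched_matmul` are hypothetical helpers written for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def batched_matmul(batch_a, batch_b):
    """Multiply matrices pairwise along the leading (batch) dimension."""
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]


# Shape [2, 2, 2]: two 2x2 matrices, batch index first.
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
# Shape [2, 2, 1]: two 2x1 column vectors.
b = [[[1], [2]], [[3], [4]]]

c = batched_matmul(a, b)
```

The result has shape [2, 2, 1] and holds the vectors (5,11)' and (39,53)' quoted in the Python example.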

Thanks for your attention.

Copied from original issue: rstudio/tensorflow#88

undefined symbol: mkl_pds_lp64_pardiso_getenv_f when using anaconda python

The package installs fine but can't initialize python:

> reticulate:::ensure_python_initialized()
Error in eval(substitute(expr), envir, enclos) : 
  ImportError: /home/joe/anaconda3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so: undefined symbol: mkl_pds_lp64_pardiso_getenv_f

Detailed traceback: 
  File "/home/joe/anaconda3/lib/python3.6/site-packages/numpy/__init__.py", line 146, in <module>
    from . import add_newdocs
  File "/home/joe/anaconda3/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/home/joe/anaconda3/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/home/joe/anaconda3/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/home/joe/anaconda3/lib/python3.6/site-packages/numpy/core/__init__.py", line 14, in <module>
    from . import multiarray

All works fine with the stock ubuntu python though, thanks for a very nice package!

use_python appears to have no effect

Please excuse my poor or incorrect usage, if any, in the following.

> library(reticulate)
> py_config()
python:         /home/<my id>/anaconda3/bin/python
libpython:      /home/<my id>/anaconda3/lib/libpython3.5m.so
pythonhome:     /home/<my id>/anaconda3:/home/<my id>/anaconda3
version:        3.5.2 |Anaconda custom (64-bit)| (default, Oct 20 2016, 03:10:33)  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
numpy:          /home/<my id>/anaconda3/lib/python3.5/site-packages/numpy
numpy_version:  1.11.2

python versions found: 
 /usr/bin/python
 /usr/bin/python3
 /home/<my id>/anaconda3/bin/python

Attempting to change python version:

> use_python("/usr/bin/python", required = T)

Observe - no apparent change:

> py_config()
python:         /home/<my id>/anaconda3/bin/python
libpython:      /home/<my id>/anaconda3/lib/libpython3.5m.so
pythonhome:     /home/<my id>/anaconda3:/home/<my id>/anaconda3
version:        3.5.2 |Anaconda custom (64-bit)| (default, Oct 20 2016, 03:10:33)  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
numpy:          /home/<my id>/anaconda3/lib/python3.5/site-packages/numpy
numpy_version:  1.11.2

python versions found: 
 /usr/bin/python
 /usr/bin/python3
 /home/<my id>/anaconda3/bin/python
> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          3.3                         
year           2017                        
month          03                          
day            06                          
svn rev        72310                       
language       R                           
version.string R version 3.3.3 (2017-03-06)
nickname       Another Canoe               
>

Shouldn't the result of use_python be reflected in the result of py_config()?

Please let me know if you need more info, thanks.

Best,
CB

"import" function cannot make use of bash environment variable

This issue is related to rstudio/tensorflow#121

I'm running on Ubuntu 16.04 with CUDA 8.0 and cuDNN 5.1 installed.

I was trying to run the R tensorflow package. I had it installed correctly (and working), and then I upgraded the underlying Python TensorFlow package to support GPU. After the upgrade with pip install --upgrade, import("tensorflow") works correctly if I run R directly from bash, but in RStudio import("tensorflow") returns this error:

Error in py_module_import(module, convert = convert) : 
  ImportError: Traceback (most recent call last):
  File "/home/user/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/user/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/user/anaconda3/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/user/anaconda3/envs/r-tensorflow/lib/python3.6/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/user/anaconda3/envs/r-tensorflow/lib/python3.6/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.8.0: cannot op

The libcublas.so.8.0 file is stored in /usr/local/cuda-8.0/lib64, and I have my LD_LIBRARY_PATH environment variable pointing to it.

I suspected that it might be due to RStudio not having the correct environment variables set up as in my ~/.bashrc, but

system("python -c 'import tensorflow as tf; sess = tf.InteractiveSession()'")

within RStudio would show the correct initialization message.

I also tried to manually set up all the environment variables (e.g. LD_LIBRARY_PATH, CUDA_HOME, PATH) through Sys.setenv, and it still doesn't work.
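
One possible explanation, offered as an assumption rather than a confirmed diagnosis: on Linux the dynamic loader reads LD_LIBRARY_PATH when a process starts, so setting it later from inside an already running R session would not change where shared libraries are searched for. The sketch below is a small diagnostic that can be run in the embedded Python session. The helper name `check_shared_library` is hypothetical; it reports what the process sees in the environment and whether the library can actually be loaded.

```python
import ctypes
import os


def check_shared_library(name, env_var="LD_LIBRARY_PATH"):
    """Report the value of env_var in this process and whether `name` can be loaded."""
    report = {"env": os.environ.get(env_var), "loadable": False, "error": None}
    try:
        ctypes.CDLL(name)  # asks the dynamic loader to open the library
        report["loadable"] = True
    except OSError as exc:
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    # Library name taken from the traceback above.
    print(check_shared_library("libcublas.so.8.0"))
```

If the variable is visible but the library still fails to load, that would be consistent with the search path having been fixed before the variable was set.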

Would you mind helping me with this? I think it would unlock a large performance gain for TensorFlow in RStudio.

Use in Jupyter notebooks?

I'm trying to use reticulate in Jupyter notebooks. I'm new to both.

First, I don't seem to be able to set which python to use.
Second, I don't seem to be able to capture print output.
Thoughts?

printing dictionary with integer keys crashes R

Observed on Windows w/ Python 3.5:

py_run_string("x = {0:1, 1:4}")$x

On Mac w/ Python 2.7 an error indicating dictionaries with integer keys aren't supported is thrown.
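
Until integer keys are handled, a possible workaround is to convert the keys to strings on the Python side before the dictionary reaches R. This is a minimal sketch under the assumption that string keys convert cleanly. `stringify_keys` is a hypothetical helper, not part of reticulate.

```python
def stringify_keys(d):
    """Return a copy of d with every key converted to str, recursing into nested dicts."""
    return {
        str(key): stringify_keys(value) if isinstance(value, dict) else value
        for key, value in d.items()
    }


x = {0: 1, 1: 4}
x_safe = stringify_keys(x)
print(x_safe)  # {'0': 1, '1': 4}
```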

Mapping uint32/ulong in numpy arrays

These came up and py_ref_to_r complains. Could we map these to double around here https://github.com/rstudio/reticulate/blob/master/src/python.cpp#L396?
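
A quick NumPy check of what mapping to double would preserve. This is a sketch of the numeric behaviour only, not of the reticulate conversion code. It assumes a platform where ulong is 64 bits, as on most Linux and macOS builds; where ulong is 32 bits, the uint32 case applies instead.

```python
import numpy as np

# Every uint32 value fits in float64's 53-bit significand, so the
# conversion to double and back is exact, including the maximum value.
u32 = np.array([0, 1, 2**32 - 1], dtype=np.uint32)
u32_roundtrip_exact = bool(
    np.array_equal(u32.astype(np.float64).astype(np.uint32), u32)
)

# A 64-bit unsigned value above 2**53 may not be representable as a
# double: 2**53 + 1 rounds to 2**53 during the conversion.
u64 = np.array([2**53 + 1], dtype=np.uint64)
u64_roundtrip_exact = bool(
    np.array_equal(u64.astype(np.float64).astype(np.uint64), u64)
)

print(u32_roundtrip_exact, u64_roundtrip_exact)  # True False
```

So a double mapping looks lossless for uint32, while 64-bit values above 2**53 would be rounded.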
