Git Product home page Git Product logo

connections's Introduction

connections

Lifecycle: experimental R-CMD-check Codecov test coverage CRAN status

The main goal of connections is to integrate DBI-compliant packages with the RStudio IDE’s Connection Pane. Packages such as RPostgres, RSQLite, RMariaDB and bigrquery connect R to those databases, but do not provide a direct integration with the Connections Pane. connections reads the configuration of the connection and creates the integration with RStudio.

A second goal is to provide integration with the pins package. The connections package allows you to pin database connections and dplyr table objects.

Installation

Install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("rstudio/connections")

Functions

The two main functions added by connections are:

  • connection_open() - Opens the database connection. Use instead of dbConnect(), but use the exact same arguments. It also automatically starts the Connections pane.
  • connection_close() - Closes the database connection.
library(connections)
library(RSQLite)

con <- connection_open(SQLite(), "local.sqlite")


The connection can now be closed by using the appropriate button in the Connections pane, or by using connection_close()

connection_close(con)


The connection code is parsed when connecting to the database, and it is visible once the connection is closed.

Uploading and referencing tables with dplyr

connections integrates with dplyr by supporting the following two functions:

  • tbl() - To create a pointer to a table or view within the database.
  • copy_to() - To copy data from the R session to the database.

The version of copy_to() inside connections automatically updates the Connections pane, so the new table automatically shows up.

con <- connection_open(SQLite(), "local.sqlite")

copy_to(con, mtcars, temporary = FALSE, overwrite = TRUE)
#> # Source:   table<mtcars> [?? x 11]
#> # Database: sqlite 3.41.2 [/Users/edgar/r_projects/connections/local.sqlite]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ more rows

To use an existing table inside the database use tbl().

db_mtcars <- tbl(con, "mtcars")


The tbl() function opens the rest of the already available dplyr database integration.

db_mtcars %>%
  group_by(am) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))
#> # Source:   SQL [2 x 2]
#> # Database: sqlite 3.41.2 [/Users/edgar/r_projects/connections/local.sqlite]
#>      am avg_mpg
#>   <dbl>   <dbl>
#> 1     0    17.1
#> 2     1    24.4

pins

The connections package integrates with pins. It adds the ability to “pin” database connections and queries. It follows the same approach as the vetiver package. connections now has two new functions:

  • connection_pin_write()
  • connection_pin_read()

Pin a database connection

The connection_pin_write() function does not save the R object. It records the code necessary to recreate the connection.

library(pins)
board <- board_folder("~/pins")

connection_pin_write(board, con, name = "my_conn")
#> Creating new version '20231218T234042Z-8d9ce'
#> Writing to pin 'my_conn'


If you wish to see the code that connections will use when recreating the conneciton from the pin, you can use connection_code():

connection_code(con)
#> library(connections)
#> library(RSQLite)
#> con <- connection_open(SQLite(), "local.sqlite")

connection_pin_read() will replay the exact same code used to initially connect to the database. Assign the output to a variable, such as con1. The variable will work just like any connection variable.

con1 <- connection_pin_read(board, "my_conn")

The con1 variable is now a regular database connection variable.

db_mtcars <- tbl(con1, "mtcars") %>%
  group_by(am) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

db_mtcars
#> # Source:   SQL [2 x 2]
#> # Database: sqlite 3.41.2 [/Users/edgar/r_projects/connections/local.sqlite]
#>      am avg_mpg
#>   <dbl>   <dbl>
#> 1     0    17.1
#> 2     1    24.4

Pin a dplyr database query

When dplyr works with database data, the resulting query is not executed until the data is explicitly collected into R, or when printing the top results to the R Console. The pin records two things:

  • The dplyr R object that contains all of the transformations. It does not save the actual results.

  • The necessary information to recreate the database connection. This is to make sure that the data is being retrieved from the original database connection.

connection_pin_write(board, db_mtcars, name = "avg_mpg")
#> Creating new version '20231218T234042Z-ae23e'
#> Writing to pin 'avg_mpg'


connection_pin_read() will connect to the database, and return the dplyr object. Without assigning it to a variable, the pin will immediately print the results of the database. Those results are being processed at the time connection_pin_read() runs.

connection_pin_read(board, "avg_mpg")
#> # Source:   SQL [2 x 2]
#> # Database: sqlite 3.41.2 [/Users/edgar/r_projects/connections/local.sqlite]
#>      am avg_mpg
#>   <dbl>   <dbl>
#> 1     0    17.1
#> 2     1    24.4

Full pins example

The way pins integrates with databases, via the connections package, allows to open the connection from a pin, and pipe all of the subsequent code into a new pin. Afterwards, that pin can be used to collect or to continue using the dplyr object.

board <- board_folder("~/pins")

con <- connection_pin_read(board, "my_conn")

tbl_summary <- con %>%
  tbl("mtcars") %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))


connection_pin_write(board, tbl_summary, name = "cyl_mpg")
#> Creating new version '20231218T234042Z-835f4'
#> Writing to pin 'cyl_mpg'

connection_close(con)

connection_pin_read(board, "cyl_mpg")
#> # Source:   SQL [3 x 2]
#> # Database: sqlite 3.41.2 [/Users/edgar/r_projects/connections/local.sqlite]
#>     cyl avg_mpg
#>   <dbl>   <dbl>
#> 1     4    26.7
#> 2     6    19.7
#> 3     8    15.1


Back-end examples

There are a couple of examples of how the Connections pane will look when opening the connection via connections.

BigQuery, via bigrquery

library(connections)
library(bigrquery)

con <- connection_open(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "austin_311",
  billing = "my_project_billing",
  use_legacy_sql = FALSE
)


connection_close(con)


PostgreSQL, via RPostgres

library(connections)
library(RPostgres)
con <- connection_open(
  Postgres(),
  host = "localhost",
  dbname = "datawarehouse",
  user = "[user id]",
  password = "[password]",
  bigint = "integer",
  port = "5432"
)


DBI connections

It is possible to integrate DBI connections not opened via connection_open(). To do that, use connection_view() and pass it the variable containing the existing database connection.

library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

connection_view(con)


Changes to the database will not automatically load in the Connections pane. The connection_update() function will refresh the pane with the latest.

dbWriteTable(con, "mtcars", mtcars)

connection_update(con)


connection_close(con)


connections's People

Contributors

edgararuiz avatar edgararuiz-zz avatar hadley avatar javierluraschi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

connections's Issues

bigrquery() dataset specification

Hi,

Great package! Recently started exploring this and find it useful when working with multiple different datasets located on different platforms (BGQ, Redshift, Postgres, etc.).

A question in relation to BGQ. In the bigrquery package these days (2.0.0 and beyond), I can do without specifying a dataset to connect to. All I need to do is to put down the dataset name when connecting the tables:

tbl_a <- con %>% tbl("dataset_a.table_a")

It looks like connection_open() right now is forcing me to choose. Since I have many datasets within a BGQ project, is there a way to do without specifying 10 connections for 10 different datasets within the same project?

Thanks for your time.
T

Missing DBI methods

Certain DBI functions don't have methods for connConnection objects. I've found the following:

  • dbListTables()
  • dbRemoveTable()
  • dbDisconnect() - although connections provides connection_close(), which is the preferred method of disconnect

Is it possible to pin an SQL statement?

Hi,

Wonderful package!

I was wondering if it is possible to pin an explicit SQL statement, similar to the example of pinning a dplyr database command

Example use cases would be

  • a query that has been optimized by hand for speed with domain knowledge
  • a query that involves manipulating postgres jsonb objects

thanks

Iain

Obscured connection details are odd in the connections pane

If you obscure connection details, by using config for example, then the resulting connections pane can be a bit odd. The following connection uses config to store the credentials:

con <- connection_open(
  RPostgres::Postgres(),
  host =  get("host"),
  user = get("user"),
  password = get("pwd"),
  port = get("port"),
  dbname = get("database")",
  bigint = "integer"
)

Then this connection is made, the following shows up in the RStudio IDE:

image

Notice that get("database") persists in the connections pane. The name of the database is "retail", as evidenced when the following code is used:

con <- dbConnect(
  odbc::odbc(),
  driver = "PostgreSQL",
  host =  get("host"),
  user = get("user"),
  password = get("pwd"),
  port = get("port"),
  database = get("database"),
  bigint = "integer"
)

image

In this case, the actual database name ("retail") is shown in the connections pane.

NoSQL connections

Hi folks,
Any plans to support NoSQL via nodbi connections? I believe it would be a very nice addition.
Thank you

`connection_view()` not displaying tables in the DB

Minimal example:

tmp <- tempfile()
dir.create(tmp)
dbplyr::nycflights13_sqlite(path = tmp)
con <- DBI::dbConnect(RSQLite::SQLite(), database = file.path(tmp, "nycflights13.sqlite"))
connections::connection_view(con)

The connection is properly created and appears in the Connections Pane, but you can't interact with it.

How can I execute instant SQL query on connected database?

Hi,

Thanks for providing such a useful package. I successfully connected the the database using snippet file (refer: https://rstudio.github.io/rstudio-extensions/rstudio-connections.html ). But, cannot find a way to execute query on connected database. How can I execute instant SQL query on connected database?

I referred several blogs where people can use SQL button to execute instant queries (refer : https://blog.rstudio.com/2018/10/02/rstudio-1-2-preview-sql/ ) after connecting to SQL database.

Could you please help executing instant queries using connections package?
Thanks,
Prasad Manedeshmukh.

Dropped from CRAN

Looks like the package was dropped from CRAN a couple of weeks ago for unaddressed check failures. I use the package for teaching to wrap RPostgres and a couple of students ran into install problems. Would be nice to have it on CRAN despite the experimental status, although obviously installing from GitHub works.

Feature/Functon Request: connection_rm() or connection_close(con, rm_pointer = TRUE)

Hello,

I have been testing out the connections packages at work. I was able to create a connection in the connections pane with connection_open() and then close the connection with connection_close(). The issue is that there is a reference that remains to the connection in the connections pane, and when you click on it, it has all the connection code including the secrets that you use to make the connection.

Leaving secrets readily available is a little uncomfortable for work stuff, which is why I didn't paste a screen shot. If you reference your README.md you will see what I mean in your connection_close() of your SQLite database.

Solution #1: create a connection_rm(con) that removes the reference to the database after you have closed it.
Solution #2: add a parameter to the connection_close(con, rm_pointer = TRUE) to the current function that completely clears the connection reference from the connections pane.
Solution #3: add a parameter to the connection_open(con, don't_save = TRUE) that mimics the behavior of the work around solution below where none of the code is saved even though the reference to the connection remains.


Work around solution (solution #3 above) Postgres:

  1. First create a connection with RPostgres: pg_con <- RPostgres::dbConnect(...)
  2. Use connection_view(pg_con) to add the connection to the connections pane.
  3. Use connection_close(pg_con) to close the connection. In this case, none of the R code used to make the connection is saved in the reference that remains in the connections pane.

If I was a better tidyverse coder, I would supply a pull request, sorry!

Thanks!

`connection_open()` could show more information in the display name

Here's the display name shown by opening the same postgres connection using ODBC vs connection_open():

image

Or a similar but SQLite connection:

image

tmp <- tempfile()
dir.create(tmp)
dbplyr::nycflights13_sqlite(path = tmp)
con <- connections::connection_open(RSQLite::SQLite(), file.path(tmp, "nycflights13.sqlite"))
con <- DBI::dbConnect(odbc::odbc(), driver = "SQLite", database = file.path(tmp, "nycflights13.sqlite"))

Broken README link

The link to connections from the README returns a 404:

[Connection Pane](https://db.rstudio.com/rstudio/connections/)

Are connection pins encrypted?

Hello,

First this package is really cool! I love that you can pin a connection and then recall that connection, as this would be very useful when running code on a schedule.

My question is, are the pins pointing to a connection encrypted? For work this would be best practices to have the details of the connection obscured.

Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.