crrri

Lifecycle: experimental Codecov test coverage CRAN status R build status

Work in progress

The goal of crrri is to provide a native Chrome Remote Interface in R using the Chrome Debugging Protocol. This is a low-level implementation of the protocol heavily inspired by the chrome-remote-interface JavaScript library written by Andrea Cardaci.

This package is intended for R package developers who need to orchestrate Chromium/Chrome: with crrri, you can easily interact with (headless) Chromium/Chrome using R. We worked hard to provide the simplest possible API. However, you will still have to do the bulk of the work and learn how the Chrome DevTools Protocol works. Interacting with Chromium/Chrome using the DevTools Protocol is a highly technical and error-prone task: you will be close to the metal and have full power (be cautious!).

This package is built on top of the websocket and promises packages. The default design of the crrri functions is asynchronous: they return promises. You can also use crrri with callbacks if you prefer.

We are highly indebted to Miles McBain for his seminal work on chradle that inspired us. Many thanks!

System requirements

First of all, you do not need a node.js configuration because crrri is fully written in R.

You only need a recent version of Chromium or Chrome. A standalone version works perfectly well on Windows. By default, crrri will try to find a Chrome binary on your system using the find_chrome_binary() function. You can tell crrri to use a specific version by setting the HEADLESS_CHROME environment variable to the path of Chromium or Chrome (this is the same environment variable that is used in decapitated). You can check that it is set correctly by executing Sys.getenv("HEADLESS_CHROME") in your R console.

Otherwise, you can also use the bin argument of the Chrome class new() method to provide the path directly.

chrome <- Chrome$new(bin = "<path-to-chrome-binary->")

Note that if you don’t know where your binary is, you can call the find_chrome_binary() function directly; it will try to guess where your binary is (you might need to install the package).

These two calls are equivalent:

chrome <- Chrome$new(bin = find_chrome_binary())
# the default
chrome <- Chrome$new(bin = NULL)
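
As a minimal sketch of the environment-variable route, the helper below reads HEADLESS_CHROME and returns the path only when it points to an existing file. The name configured_chrome() is hypothetical and not part of crrri; it uses base R only.

```r
# Hypothetical helper (not part of crrri): return the path stored in
# HEADLESS_CHROME, or NULL when the variable is unset or the file is missing.
configured_chrome <- function() {
  path <- Sys.getenv("HEADLESS_CHROME", unset = "")
  if (nzchar(path) && file.exists(path)) path else NULL
}
```

Since bin = NULL is the default described above, the result could be passed as Chrome$new(bin = configured_chrome()), which falls back to the automatic lookup when nothing valid is configured.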

Installation

You can install the development version of crrri from GitHub with:

remotes::install_github('rlesur/crrri')

Using crrri interactively

The crrri package is a low-level interface and is not intended to be used interactively: the goal of crrri is to provide R developers with a set of classes and helper functions to build higher-level functions.

However, you can discover headless Chrome automation interactively in your R session using crrri. This will help you to learn the Chrome DevTools Protocol, the crrri design and develop higher level functions.

A short tour

Assuming that you have configured the HEADLESS_CHROME environment variable (see above), you can start headless Chrome:

library(crrri)
chrome <- Chrome$new()

The Chrome class constructor is a synchronous function. That means the R session is on hold until the command terminates.

The $connect() method of the Chrome class will connect the R session to headless Chrome. As the connection process can take some time, the R session does not hold[1]: this is an asynchronous function. This function returns a promise which is fulfilled when R is connected to Chrome.

However, you can pass a callback function to the $connect() method using its callback argument. In this case, the returned object will be a connection object:

client <- chrome$connect(callback = function(client) {
  client$inspect()
})

The $inspect() method of the connection object opens the Chrome DevTools Inspector in RStudio (>= 1.2.1335) or in your default web browser (you can have some trouble if the inspector is not opened in Chromium/Chrome). It is convenient if you need to inspect the content of a web page because all that you need is in RStudio.

DevTools Inspector in RStudio viewer

In order to discover the Chrome DevTools Protocol commands and events listeners, it is recommended to extract one of the domains[2] from the connection object:

Page <- client$Page

The Page object represents the Page domain. It possesses methods to send commands or listen to specific events.

For instance, you can send to Chromium/Chrome the Page.navigate command as follows:

Page$navigate(url = "http://r-project.org")

Once the page is loaded by headless Chrome, RStudio looks like this:

R Project website in headless Chrome

You will see in the R console:

<Promise [pending]>

This is a promise object that is fulfilled when Chromium/Chrome sends back to R a message telling that the command was well-received. This comes from the fact that the Page$navigate() function is also asynchronous. All the asynchronous methods possess a callback argument. When the R session receives the result of the command from Chrome, R executes this callback function passing the result object to this function. For instance, you can execute:

Page$navigate(url = "https://ropensci.org/", callback = function(result) {
  cat("The R session has received this result from Chrome!\n")
  print(result)
})

Once the page is loaded, you will see both the web page and the result object in RStudio:

rOpenSci website in headless Chrome

To inspect the result of a command you can pass the print function to the callback argument:

Page$navigate(url = "https://ropensci.org/", callback = print)
#> $frameId
#> [1] "3BB38B10082F28A946332100964486EC"
#> 
#> $loaderId
#> [1] "9DCF07625678433563CB03FFF1E8A6AB"

The result object sent back from Chrome is also the value of the promise once fulfilled. Recall that if you do not use a callback function, you get a promise:

async_result <- Page$navigate(url = "http://r-project.org")

You can print the value of this promise once fulfilled with:

async_result %...>% print()
#> $frameId
#> [1] "3BB38B10082F28A946332100964486EC"
#> 
#> $loaderId
#> [1] "7B2383E8F2F39273E18E4D918F1852A0"

As you can see, this leads to the same result as with a callback function.

To sum up, these two forms perform the same actions:

Page$navigate(url = "http://r-project.org", callback = print)
Page$navigate(url = "http://r-project.org") %...>% print()

If you interact with headless Chrome in the R console using crrri, these two forms are equivalent.
However, if you want to use crrri to develop higher level functions, the most reliable way is to use promises.
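
The equivalence of the two forms can be explored without Chrome. The sketch below uses a stand-in asynchronous function, fake_command(), which is hypothetical and only mimics the shape of a crrri command; it relies on the promises and later packages.

```r
library(promises)

# Hypothetical stand-in for a crrri command: resolves with `value`
# after a short delay, like a reply coming back from Chrome.
fake_command <- function(value, callback = NULL) {
  p <- promise(function(resolve, reject) {
    later::later(function() resolve(value), delay = 0.1)
  })
  if (is.null(callback)) p else then(p, onFulfilled = callback)
}

seen <- character()
record <- function(result) seen <<- c(seen, result)

fake_command("from callback", callback = record)  # callback form
fake_command("from promise") %...>% record()      # promise form

# Run the event loop until both results have arrived (or 5 seconds pass).
# An idle interactive R console runs this loop for you.
deadline <- Sys.time() + 5
while (length(seen) < 2 && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}
```

Both calls end up invoking record() with the resolved value; the promise form has the advantage that further steps can be chained onto it.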

Do not forget to close headless Chrome with:

chrome$close()

Since the RStudio viewer has lost the connection, you will see this screen in RStudio:

headless Chrome closed

Now, you can take some time to discover all the commands and events of the Chrome DevTools Protocol. The following examples will introduce some of them.

Domains, commands and events listeners

While working interactively, you can obtain the list of available domains in your version of Chromium/Chrome.
First, launch Chromium/Chrome and connect the R session to headless Chromium/Chrome:

chrome <- Chrome$new()
client <- chrome$connect(~ .x$inspect())

Once connected, you just have to print the connection object to get information about the connection and the available domains:

client
#> <CDP CONNECTION>
#> connected to: http://localhost:9222/
#>  target type: "page"
#>    target ID: "9A576420CADEA9A514C5F027D30B410D"
#> <DOMAINS>
#> 
#> Accessibility (experimental)
#> 
#> Animation (experimental)
#> 
#> ApplicationCache (experimental)
#> 
#> Audits (experimental): Audits domain allows investigation of page violations and possible improvements.
#> 
#> Browser: The Browser domain defines methods and events for browser managing.
#> 
#> CacheStorage (experimental)
#> 
#> Cast (experimental): A domain for interacting with Cast, Presentation API, and Remote Playback API functionalities.
...

This information is retrieved directly from Chromium/Chrome: you may obtain different information depending on the Chromium/Chrome version.

In the most recent versions of the Chrome DevTools Protocol, more than 40 domains are available. A domain is a set of commands and event listeners.

In order to work with a domain, it is recommended to extract it from the connection object. For instance, if you want to access the Runtime domain, execute:

Runtime <- client$Runtime

If you print this object, this will open the online documentation about this domain in your browser:

Runtime # opens the online documentation in a browser

Presentations about crrri

  • uros2019 - 20/05/2019 (slides)
  • useR!2019 - 12/07/2019 (slides)

Examples

Generate a PDF

Here is an example that produces a PDF of the R Project website:

library(promises)
library(crrri)
library(jsonlite)

perform_with_chrome(function(client) {
  Page <- client$Page

  Page$enable() %...>% { # await enablement of the Page domain
    Page$navigate(url = "https://www.r-project.org/") 
    Page$loadEventFired() # await the load event
  } %...>% {
    Page$printToPDF() 
  } %...>% { # await PDF reception
    .$data %>% base64_dec() %>% writeBin("r_project.pdf") 
  }
})

All the functions of the crrri package (commands and event listeners) return promises (as defined in the promises package) by default. When building higher-level functions, do not forget that you have to deal with promises (they will keep you from falling into Callback Hell).

For instance, you can write a save_url_as_pdf() function as follows:

save_url_as_pdf <- function(url) {
  function(client) {
    Page <- client$Page

    Page$enable() %...>% {
      Page$navigate(url = url)
      Page$loadEventFired()
    } %...>% {
      Page$printToPDF()
    } %...>% {
      .$data %>%
        jsonlite::base64_dec() %>%
        writeBin(paste0(httr::parse_url(url)$hostname, ".pdf"))
    }
  }
}

You can pass several functions to perform_with_chrome():

save_as_pdf <- function(...) {
  list(...) %>%
    purrr::map(save_url_as_pdf) %>%
    perform_with_chrome(.list = .)
}

You have created a save_as_pdf() function that can handle multiple URLs:

save_as_pdf("http://r-project.org", "https://ropensci.org/", "https://rstudio.com")

Transpose chrome-remote-interface JS scripts: dump the DOM

With crrri, you should be able to transpose, with minimal effort, JS scripts written with the chrome-remote-interface node.js module.

For instance, take this JS script that dumps the DOM:

const CDP = require('chrome-remote-interface');

CDP(async(client) => {
    const {Network, Page, Runtime} = client;
    try {
        await Network.enable();
        await Page.enable();
        await Network.setCacheDisabled({cacheDisabled: true});
        await Page.navigate({url: 'https://github.com'});
        await Page.loadEventFired();
        const result = await Runtime.evaluate({
            expression: 'document.documentElement.outerHTML'
        });
        const html = result.result.value;
        console.log(html);
    } catch (err) {
        console.error(err);
    } finally {
        client.close();
    }
}).on('error', (err) => {
    console.error(err);
});

Using crrri, you can write:

library(promises)
library(crrri)

async_dump_DOM <- function(client) {
  Network <- client$Network
  Page <- client$Page
  Runtime <- client$Runtime
  Network$enable() %...>% { 
    Page$enable()
  } %...>% {
    Network$setCacheDisabled(cacheDisabled = TRUE)
  } %...>% {
    Page$navigate(url = 'https://github.com')
  } %...>% {
    Page$loadEventFired()
  } %...>% {
    Runtime$evaluate(
      expression = 'document.documentElement.outerHTML'
    )
  } %...>% (function(result) {
    html <- result$result$value
    cat(html, "\n")
  }) 
}

perform_with_chrome(async_dump_DOM)

If you want to write a higher-level function that dumps the DOM, you can embed the main part of this script in a function:

dump_DOM <- function(url, file = "") {
  perform_with_chrome(function(client) {
    Network <- client$Network
    Page <- client$Page
    Runtime <- client$Runtime
    Network$enable() %...>% { 
      Page$enable()
    } %...>% {
      Network$setCacheDisabled(cacheDisabled = TRUE)
    } %...>% {
      Page$navigate(url = url)
    } %...>% {
      Page$loadEventFired()
    } %...>% {
      Runtime$evaluate(
        expression = 'document.documentElement.outerHTML'
      )
    } %...>% (function(result) {
      html <- result$result$value
      cat(html, "\n", file = file)
    }) 
  })
}

Now, you can use it for dumping David Gohel’s blog:

dump_DOM(url = "http://www.ardata.fr/post/")
# or to a file
dump_DOM(url = "http://www.ardata.fr/post/", file = "export-ardata-blog.html")

You can find many other examples in the wiki of the chrome-remote-interface module.

Development

Logging Messages

In crrri, there are two types of messages:

  • Those sent during connection/disconnection (mainly for crrri debugging)
  • Those tracking the exchanges between the R websocket client and the remote websocket server. The latter are essential for R developers building higher-level packages, both during development and for debugging.

crrri uses debugme to print these messages. It is disabled by default, so you won’t see any messages. As a user, we think that is fine. However, if you are a developer, you would expect some information about what is going on.

You need to add "crrri" to the DEBUGME environment variable before loading the package to activate the messaging feature. Currently, crrri has only one level of message. Also, debugme is a Suggested dependency, so you may need to install it manually if it is not already installed.
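
As a minimal sketch, the helper below adds "crrri" to DEBUGME while keeping any packages already listed there. The name enable_crrri_debug() is hypothetical (not part of crrri or debugme), and the code assumes DEBUGME holds a comma-separated list of package names.

```r
# Hypothetical helper: append "crrri" to DEBUGME without discarding
# packages that are already listed (assumes a comma-separated list).
enable_crrri_debug <- function() {
  current <- Sys.getenv("DEBUGME", unset = "")
  pkgs <- unique(c(strsplit(current, ",", fixed = TRUE)[[1]], "crrri"))
  pkgs <- pkgs[nzchar(pkgs)]  # drop empty entries
  Sys.setenv(DEBUGME = paste(pkgs, collapse = ","))
  invisible(Sys.getenv("DEBUGME"))
}
```

Call enable_crrri_debug() before library(crrri), since the variable has to be set before the package is loaded.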

Credits

Andrea Cardaci for chrome-remote-interface.

Miles McBain for chradle.

Bob Rudis for decapitated.

  1. Most R users will probably find this behavior weird, but it is extremely powerful!

  2. A domain is a set of commands, event listeners and types.

crrri's People

Contributors

cderv, colinfay, rlesur

crrri's Issues

check if bin exist chr_init

if not, an error should be thrown as soon as we know it will fail.
Right now, it fails late

chr_connect()

throws

Cannot find the websocket entrypoint of Chrome headless.
Closing headless Chrome...
<Promise [rejected: simpleError]>
Warning messages:
1: In open.connection(con, "rb") :
  InternetOpenUrl failed: 'Impossible d'établir une connexion avec le serveur'
2: In chr_kill(chr_process, work_dir) : attempt to apply non-function
Unhandled promise error: Command not found

I would expect a nice message about google-chrome not found

Improve verbosity for user about what is happening

We have DEBUGME output for developer. See what we can do to improve verbosity on what is going on for the user and where to put it.

Async mode and promises are not the easiest to follow, so we may improve the verbosity.

Idea on how

  • A wrapper with a global option to deactivate easily
get_verbose <- function(msg) {
    is_verbose <- getOption("crrri.verbose", FALSE)
    if (is_verbose) message(msg)
}

Use rlang to manage environment stuff

Example:

  • get("method_to_be_sent", envir = parent.env(environment())) will become rlang::env_get(nm = "method_to_be_sent", inherit = TRUE)

As we are importing rlang, I think it could be better to use it where needed to take advantage of consistency and helper functions.

Create other Remotes classes

It shouldn't be difficult now to create classes inheriting from CDPRemote for Opera, Node.js, Safari and Edge.

Add a wrapper for base64 encoded data

Currently, this is possible using jsonlite::base64_dec but you need to know how to write this

Page.printToPDF() %...T>% { 
        .$result$data %>% base64_dec() %>% writeBin(paste0(names(url), ".pdf")) 
      }
  Page.captureScreenshot(format = "png", fromSurface = TRUE) %...T>% {
    .$result$data %>% jsonlite::base64_dec() %>% writeBin("test.png")
  }

Something like crrri:::write_base64

write_base64 <- function(promise, con) {
    promise %...T>% {
        .$result$data %>% jsonlite::base64_dec() %>% writeBin(con)
    }
}

or without pipe

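write_base64 <- function(promise, con) {
    # side effect: decode the base64 payload and write it to `con`
    write <- promises::then(promise, onFulfilled = function(value) {
        raw <- jsonlite::base64_dec(value$result$data)
        writeBin(raw, con)
    })
    # once the file is written, hand back the original promise
    promises::then(write, onFulfilled = function(value) promise)
}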

It would be a side effect function, (like purrr::walk) and return the promise from the LHS.
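
The decoding step itself can be tried without Chrome. In this sketch, write_base64_data() is a hypothetical name and the payload is a toy string standing in for the base64 data returned by a command such as printToPDF; only jsonlite::base64_enc(), jsonlite::base64_dec() and base R are used.

```r
# Hypothetical helper: decode a base64 string and write the bytes to `con`.
# Returns its input invisibly, in the spirit of a side-effect function.
write_base64_data <- function(data, con) {
  writeBin(jsonlite::base64_dec(data), con)
  invisible(data)
}

# Toy payload standing in for the base64 data sent back by Chrome
payload <- jsonlite::base64_enc(charToRaw("hello crrri"))
out <- tempfile(fileext = ".bin")
write_base64_data(payload, out)
```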

Add tests in the package

Next very important task !

  • Add and improve manual test because it is useful to have at least those
  • Add non regression test where we can to have some coverage and a better fail safe.

Offer functions to setup and find chrome

inspired by

  • pagedown::find_chrome() (would remove a suggest just for that in vignette)
  • decapitated 📦 that offers to download a portable chrome. I use that and it works very well.

For crrri, as a low level 📦 we could also assume that installation and configuration must be done by the user without any help from crrri. Just some documentation in README.

Better Identify file created automatically

I find it difficult to identify easily which files are created by the generator and which files are for our functions. We can't use subdirectories inside the R folder so it needs to be in the name.

This is small improvement I know, but what do you think ?

We could either reverse the naming scheme from the generated files:

  • ***_commands ➡️ command_***
  • ***_events ➡️ events_***

That way we'll have files regrouped in explorer.

Other solution:

  • add a new prefix like DP_*** to identify files with functions from devtools protocol.
  • add a prefix like crrri_*** to files that are not generated automatically...

What do you think?

Give information about the connection state to Chrome

We create a connection using chrome <- chr_connect(). If we use it and later call chr_disconnect(chrome), the connection is closed. Currently, the chrome object does not carry this new information. It would be nice to know quickly whether the chrome connection we created is still usable.

A parallel can be made with DBI and the object con. After dbDisconnect the con has a new state DISCONNECTED

Add a setTimeout mechanism

It seems useful to have a timeout to reject a promise. Currently, the use seems to be like in the readme example:

promise_race(
    timeout(delay),
    chrome %>% 
      Page.enable() %>%
      Page.navigate(url = url) %>% 
      Page.frameStoppedLoading(frameId = ~ .res$frameId) %>%  
      Page.printToPDF() %...T>% { 
        .$result$data %>% base64_dec() %>% writeBin(paste0(names(url), ".pdf")) 
      }
  )

we may be able to provide a wrapper function to provide this feature more easily.
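
One possible shape for such a wrapper, sketched with the promises and later packages only. The implementation of timeout() shown here is an assumption, and with_timeout() is a hypothetical name; neither is presented as crrri's actual code.

```r
library(promises)

# Assumed implementation: a promise that rejects after `delay` seconds.
timeout <- function(delay) {
  promise(function(resolve, reject) {
    later::later(
      function() reject(simpleError(sprintf("timed out after %s seconds", delay))),
      delay = delay
    )
  })
}

# Hypothetical wrapper: settles like `p`, unless `p` takes longer than `delay`.
with_timeout <- function(p, delay) {
  promise_race(timeout(delay), p)
}

# A slow promise (1 second) raced against a 0.1 second timeout
slow <- promise(function(resolve, reject) {
  later::later(function() resolve("done"), delay = 1)
})
outcome <- NULL
then(
  with_timeout(slow, delay = 0.1),
  onFulfilled = function(value) outcome <<- value,
  onRejected = function(error) outcome <<- conditionMessage(error)
)

# Run the event loop until the race is settled (or 5 seconds pass)
deadline <- Sys.time() + 5
while (is.null(outcome) && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}
```

Here the timeout wins the race, so outcome holds the rejection message rather than "done".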

IDEA: add a onFinally arg in hold

That would be passed to promises::finally to execute before returning the promise value

promises::then(
    promise,
    onFulfilled = function(value) {
      state$pending <- FALSE
      state$fulfilled <- TRUE
      state$value <- value
    },
    onRejected = function(error) {
      state$pending <- FALSE
      state$fulfilled <- FALSE
      state$reason <- error
    }
  ) %>%
  promises::finally(onFinally = fun)

I have thought about that while working on chrome_execute thinking that chrome could be closed at the very end with

invisible(hold(results_available, timeout = total_timeout, onFinally = ~ chrome$close()))

Thoughts ?

Add removeListener for event emitter

Not all the functions of an event emitter object have been implemented yet, in particular removeListener.
See spec: https://nodejs.org/api/events.html#events_emitter_removelistener_eventname_listener

Removal is already built into once(), because the listener is removed at its first occurrence. But when a listener is registered with on(), there is not yet a way to remove it.

Easy way :

Consider one listener per event only. Removing the listener is equivalent to removing the event.

Hard way:

Consider several listeners per event. removeListener should be able to remove the correct listener.

Ideas:

  • Currently callbacks objects return a rm function at registration.
  • This function could be kept somewhere (in a list or env) and called when needed to remove the listener.
  • This would require a key so that the correct listener is removed. 🤔 use digest::digest() ?

examples

myEmitter <- EventEmitter$new()
myEmitter$on('event',
    function() {
        message("an event occured")
    }
)

how to remove this listener with anonymous function ?

  • myEmitter$removeListener('event', function() { message("an event occured") }) ?

Only allow it with named function?

myEmitter <- EventEmitter$new()
my_fun <-  function() {
        message("an event occured")
    }
myEmitter$on('event', my_fun)
  • myEmitter$removeListener('event', "my_fun") ?
  • myEmitter$removeListener('event', my_fun) ?
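
A third option, following the key idea listed above, is to have on() return a key and pass that key to removeListener(), which also works for anonymous functions. The sketch below is a toy emitter in base R; new_emitter() and its internals are hypothetical and are not the EventEmitter class of crrri.

```r
# Toy emitter (hypothetical, base R only): on() returns a key that
# removeListener() uses to find the right listener.
new_emitter <- function() {
  registry <- list()  # key -> list(event = , fn = )
  next_id <- 0L

  on <- function(event, fn) {
    next_id <<- next_id + 1L
    key <- as.character(next_id)
    registry[[key]] <<- list(event = event, fn = fn)
    key
  }

  remove_listener <- function(key) {
    registry[[key]] <<- NULL
    invisible(NULL)
  }

  # Calls every listener registered for `event`; returns how many were called.
  emit <- function(event, ...) {
    called <- 0L
    for (entry in registry) {
      if (identical(entry$event, event)) {
        entry$fn(...)
        called <- called + 1L
      }
    }
    called
  }

  list(on = on, removeListener = remove_listener, emit = emit)
}

myEmitter <- new_emitter()
key <- myEmitter$on("event", function() message("an event occured"))
myEmitter$emit("event")        # the listener runs
myEmitter$removeListener(key)
myEmitter$emit("event")        # nothing is called any more
```

A sequential counter is used as the key here; a hash of the function would be another choice, but two identical anonymous functions would then share a key.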

Rethink API toward OOP ?

OOP was on the table but not chosen for this first draft. Mentioned in #1

Advantage of OOP

  • More secure
  • More suited for low-level api
  • Adapted to Domain and methods per domain (ex Page$captureSnapshot())

Advantage of current approach

  • The closest to the protocol and how it is used in JS
  • Better documentation in R
  • Easier to generate "automagically" 😉
  • works well with Promises and Websockets

Main Question about OOP :

  • How would it work with promises and websocket ?

Leaving as is for now - we'll see on the run how this first draft is doing.

[Idea] navigateToFile

Right now, the Page object lets you navigate to a URL, and if you want to navigate to a file you have to call Page$navigate(url = sprintf("file://%s", normalizePath(file_path))).

It would be nice to have a $navigateToFile method that takes a relative path, normalizes it, and does the sprintf("file://%s").
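
The path handling such a method would need can be sketched in base R. The helper name as_file_url() is hypothetical; the extra slash for Windows drive letters is an assumption about what the browser expects, and special characters such as spaces are not percent-encoded here.

```r
# Hypothetical helper: turn a (possibly relative) path into a file:// URL.
as_file_url <- function(file_path) {
  path <- normalizePath(file_path, winslash = "/", mustWork = TRUE)
  # Windows paths start with a drive letter, so add the leading slash
  if (!startsWith(path, "/")) path <- paste0("/", path)
  paste0("file://", path)
}

# Example with a temporary HTML file
page <- tempfile(fileext = ".html")
writeLines("<p>hello</p>", page)
url <- as_file_url(page)
```

The resulting url could then be passed to Page$navigate(url = url).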

Wrong code to stop httpuv server

In utils.R, is_available_port function, this line:
on.exit(srv$stop())

should be:
on.exit(httpuv::stopServer(srv))

otherwise, it gives error $ can't be used on atomic something, because srv is a string.

After fixing this one, I got an error that says:
Error: 'current_env' is not an exported object from 'namespace:rlang'
Called from: getExportedValue(pkg, name)

No idea how to fix.

Export all promises functions or Depends ?

Currently, we need systematically to do

library(crrri)
library(promises)

in order to access finally(), for example, or promise_race().

crrri works heavily with promises and require this other 📦 . Should we

  • leave as is
  • make crrri depends on promises
  • export a few more functions we know are required for good use of crrri (like finally)

Confusion between arguments and values in events man pages ?

Hi,

When I look at the man page of a function of type "Event", such as Network.requestIntercepted, it seems to me that the elements described in Arguments are in fact the elements returned as Values. In this case, for example, the interceptionId element is returned after the event is fired, but it is not an argument of Network.requestIntercepted.

It's possible I'm completely misunderstanding, my apologies if this is the case.

Implement an API for event listeners

Implementing event listeners is not so easy.

Assume that we want to navigate on a website and ensure that the frame is loaded. Since multiple frames can be opened, we want to ensure that the main frame is loaded.
In order to achieve this task, we have to send Page.navigate, retrieve the frameId of the response and register a listener on the event Page.frameStoppedLoading for the given frameId.

With the current version of crrri, we have to write this (ugly) code:

library(crrri)
library(promises)
con <- chr_connect()

page <- con %>% Page.enable() # activate the events

google_loaded <- page %>%
  Page.navigate(url="http://google.fr") %>% 
  then(onFulfilled = function(value) {promise(function(resolve, reject) {
    ws <- value$cnx$ws
    ws$onMessage(function(event) {
      data <- jsonlite::fromJSON(event$data)
      if (!is.null(data$method) & !is.null(data$params$frameId))
        if (data$method == "Page.frameStoppedLoading" & data$params$frameId == value$result$frameId)
          resolve(list(cnx = value$cnx, result = data$params))
    })
  })})

So, we need to implement high level functions in order to register callbacks on events.
For instance, we could do that:

Page.navigate(url="http://google.fr") %>%
  Page.frameStoppedLoading(frameId = ~frameId) # or frameId = ~ result$frameId

I wonder what would be the best API?

Fix ids generation for commands

When I run the example in the README:

r_project <- 
  chrome %>% 
  Page.enable() %>%
  Page.navigate(url = "https://www.r-project.org/")

The same id is used for the Page.enable and for the Page.navigate commands.

Use of Network.setRequestInterception

Hi,

If I want to check a page and get some information about network operations, I can do something like the following:

url <- "https://www.r-project.org/"

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.responseReceived() %...T>% {
      print("received")
      print(.$result$response$url)
    }
)

However, I'd like to use Network.setRequestInterception to be able to capture only certain requests. I tried to do it this way, but it doesn't seem to work :

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
    Network.requestIntercepted() %...T>% {
      print("intercepted")
    }
)

Would you have any idea of what I'm doing wrong ?

Thanks !

do not write work dir for chrome in R work dir?

Currently, work_dir is generated with a random name in the current working directory. Does it need to be in the R working directory, or can we move it?

I think it would be better to put this folder either in the R session's temporary directory, if no persistence is needed, or in a user application directory (using rappdirs).

I am thinking about this because, after playing with the package a few times, I have plenty of folders to remove that are mixed in with my other folders.

A temporary directory would be nice, but it was tried for decapitated and there seem to be issues with it. See
https://github.com/hrbrmstr/decapitated#working-around-headless-chrome--os-security-restrictions
We could also borrow decapitated's choice and write everything to a special folder in the user's home directory.

Otherwise, I can work on something with rappdirs.
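
A rough sketch of the two locations discussed here, using base R only. The function name chrome_work_dir() is hypothetical, and tools::R_user_dir() (available since R 4.0.0) is swapped in for rappdirs to keep the example dependency-free. Whether Chrome accepts a data directory in these locations on every OS is not verified here.

```r
# Hypothetical helper: create a uniquely named Chrome data directory either
# in the session's temporary directory or in a per-user cache directory.
chrome_work_dir <- function(persistent = FALSE) {
  base <- if (persistent) {
    tools::R_user_dir("crrri", which = "cache")  # requires R >= 4.0.0
  } else {
    tempdir()
  }
  dir.create(base, recursive = TRUE, showWarnings = FALSE)
  work_dir <- tempfile(pattern = "chrome-data-dir-", tmpdir = base)
  dir.create(work_dir)
  work_dir
}

work_dir <- chrome_work_dir()  # removed automatically with the R session
```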

Clarify generator code with templating like logic ?

I think it is not a heavy dependency and that it could help clarify the generator code.
Clarifying the generation for R file could help for maintenance in the long run.
I'll try to look into it and PR something to see how it looks.

Errors for Browser domain methods

In the current version, the chr_connect() function connects to the websocket entrypoint page found at http://localhost:9222/json, see

crrri/R/chr_connect.R

Lines 241 to 246 in c873002

open_debuggers <- tryCatch(
jsonlite::read_json(sprintf("http://localhost:%s/json", debug_port), simplifyVector = TRUE),
error = function(e) list()
)
address <- open_debuggers$webSocketDebuggerUrl[open_debuggers$type == "page"]

However, methods of the Browser domain can only be sent at the browser websocket entrypoint that can be found at http://localhost:9222/json/version

I don't know whether other domains are concerned.

Change the name of the await() function

I regret the name I gave to the await() function.

In JS, await is a keyword that can only be used inside an async function (an async function is a function that always returns a promise). Using await outside an async function is illegal.
I think that makes sense.

Here, the await() function is a wrapper over later::run_now(). For instance, the httpuv package also wraps later::run_now() with the httpuv::service() function.

I think we should rename the await() function.
The only proposal I have would be hold(promise). We could add a delay argument here: hold(promise, delay = 30).
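
A minimal sketch of what such a hold(promise, delay) could look like, assuming it simply runs the later event loop until the promise settles or the delay expires. This is an illustration built on promises::then() and later::run_now(), not the package's actual implementation.

```r
# Sketch of hold(): block the R session until `promise` settles,
# giving up after `delay` seconds.
hold <- function(promise, delay = 30) {
  state <- new.env()
  state$pending <- TRUE
  promises::then(
    promise,
    onFulfilled = function(value) {
      state$pending <- FALSE
      state$value <- value
    },
    onRejected = function(error) {
      state$pending <- FALSE
      state$reason <- error
    }
  )
  deadline <- Sys.time() + delay
  while (state$pending && Sys.time() < deadline) {
    later::run_now(timeoutSecs = 0.05)  # process pending callbacks
  }
  if (state$pending) stop("the promise is still pending after ", delay, " seconds")
  if (!is.null(state$reason)) stop(state$reason)
  state$value
}

# Example: a promise fulfilled after 0.1 second
p <- promises::promise(function(resolve, reject) {
  later::later(function() resolve(42), delay = 0.1)
})
answer <- hold(p, delay = 5)
```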

Implement a verbose option

In the current version, a lot of messages are written to the log. It could be annoying for higher level development. We have to implement a verbose option.

Can't access `.$result$root$nodeId` from `DOM.getDocument()`

The following code aims to dump the DOM of a web page :

url <- "https://www.r-project.org/"
chrome <- chr_connect(bin = "google-chrome", headless = FALSE)
gd <- chrome %>% 
  Page.enable() %>% 
  DOM.enable() %>%
  Page.navigate(url) %>% 
  Page.loadEventFired() %>%
  DOM.getDocument()

gd %>%   
  DOM.getOuterHTML(nodeId = .$result$root$nodeId)

But it fails at DOM.getOuterHTML(nodeId = .$result$root$nodeId) with Error: Invalid parameters(code -32602)

It does work when giving a nodeId directly :

gd %>%   
  DOM.getOuterHTML(nodeId = 1)

And the .$result$root$nodeId value is correct :

gd %...>% {
  print(.$result$root$nodeId)
}

IDEA: Separate EventEmitter Class in an independent package

As discussed before, the EventEmitter feature is generic and does not yet exist in the R ecosystem. We could separate this class to make it portable so that it could live in its own 📦

No timing on this, but I am opening this issue to record that we thought about it.

Offer examples based on CRI wiki

There are a lot of examples in the CRI wiki.

The aim is to be able to reproduce them all (more or less) using crrri.

How to do it is still to be determined. Ideas:

  • Another repo like knitr examples
  • A demo folder, from R package structure, and as httr
  • Vignettes - but with a workaround to use promises
  • Just a folder in the GitHub repo, ignored by .Rbuildignore

We'll begin with a choice easy to change if needed.

WIP: Rewrite crrri based on EventEmitter - a puppeteer-like

This issue is there to follow the work on rewriting crrri to change the API toward a more puppeteer-like 📦 .

This follows and relates to #8, #15, #27 .

The first idea is to have a 📦 that does not use promises at all.

The steps are in order :

There is still a choice to make on which puppeteer features we would like included in crrri. The puppeteer code is rather complex, with a mix of classes inheriting from EventEmitter and use of promises.
This is something to discuss.

add an option to deactivate chrome echo_cmd

I think this is not required in all cases, and it would allow deactivating echoing during tests.

We are talking about this

tryCatch(processx::process$new(bin, chrome_args, echo_cmd = TRUE),

and not use TRUE but getOption("crrri.echo_cmd", TRUE) or a more generic crrri.verbose option.

Debugme log in double sometimes

Following #10, there seems to be an issue with how !DEBUG lines are called.

> chrome <- chr_connect() 
crrri Trying to launch Chrome  
crrri Trying to launch Chrome in headless mode ... +2ms 
Running "C:/Users/chris/Documents/Chrome/chrome-win32/chrome.exe" \
  --no-first-run --headless "--user-data-dir=chrome-data-dir-jtfwycqz" \
  "--remote-debugging-port=9222" --disable-gpu --no-sandbox
crrri +-Chrome succesfully launched  +15ms 
crrri Chrome succesfully launched in headless mode. +1ms 
crrri +-It should be accessible at http://localhost: +0ms 
crrri It should be accessible at http://localhost:9222 +1ms 
crrri Trying to find  +1ms 
crrri Trying to find http://localhost:9222 +1ms 
crrri +-attempt  +0ms 
crrri attempt 1... +1ms 
crrri +- +310ms 
crrri ... http://localhost:9222 found +1ms 
crrri Retrieving Chrome websocket entrypoint at http://localhost: +1ms 
crrri Retrieving Chrome websocket entrypoint at http://localhost:9222/json ... +1ms 
crrri +-...found websocket entrypoint  +1387ms 
crrri ...found websocket entrypoint ws://localhost:9222/devtools/page/AB2AC45BB31BB240C8B28DCF725F331B +1ms 
crrri Configuring the websocket connexion... +1ms 
crrri Configuring the websocket connexion... +1ms 
crrri +-...websocket connexion configured. +6ms 
crrri ...websocket connexion configured. +0ms 
crrri Connecting R to Chrome... +4ms 
crrri Connecting R to Chrome... +0ms 
crrri ...R succesfully connected to headless Chrome through DevTools Protocol. +1110ms 
crrri ...R succesfully connected to headless Chrome through DevTools Protocol. +1ms 

Some messages are duplicated and I don't know why... Some are printed before the value in backticks is evaluated. It is not really important for how the 📦 works, but it is an improvement to look into in the long run. We should also see if it happens to other people too...

There are other solutions to this if debugme has an issue...
