cloudera-parcel's Introduction

Hi, I'm @chezou 👋

I'm a Staff Software Engineer at Treasure Data.

I'm curious about Machine Learning and MLOps.

See details at: https://chezo.uno/

I'm an O'Reilly author writing about ML and MLOps.

I also maintain data-related open-source projects, notably tabula-py, a Python library that extracts tables from PDFs.

cloudera-parcel's Issues

Error with `broom::tidy`

I tried to run broom::tidy on CDSW, but it failed because libicui18n.so.55 is missing on the workers. The same code works in the Docker container and conda environment I use when creating the parcel.

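library(sparklyr)
library(dplyr)

# `config` is assumed to be defined earlier, e.g. via spark_config()
sc <- spark_connect(master = "yarn-client", config = config)

# Sanity check: a trivial closure runs on the workers
sdf_len(sc, 5, repartition = 1) %>%
  spark_apply(function(e) I(e))

iris_tbl <- sdf_copy_to(sc, iris)

# Fit one linear model per species; this is the call that fails
spark_apply(
  iris_tbl,
  function(e) broom::tidy(lm(Petal_Length ~ Petal_Width, e)),
  names = c("term", "estimate", "std.error", "statistic", "p.value"),
  group_by = "Species")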
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is waiting using lock for RScript to complete
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is starting rscript
17/08/18 01:32:47 INFO sparklyr: Gateway (9751) is waiting for sparklyr client to connect to port 8880
17/08/18 01:32:47 INFO sparklyr: Worker (9751) using source file /data1/yarn/nm/usercache/clouderanA/appcache/application_1500605980576_8118/container_1500605980576_8118_01_000002/tmp/sparkworker/7f429444-529f-4110-836f-931d0966a220/sparkworker.R
17/08/18 01:32:47 INFO sparklyr: Worker (9751) launching command /opt/cloudera/parcels/CONDAR/lib/conda-R/bin/Rscript --vanilla <source-file> 9751 FALSE;8880;localhost
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var RHOME and value /opt/cloudera/parcels/CONDAR/lib/conda-R
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var R_INCLUDE_DIR and value /opt/cloudera/parcels/CONDAR/lib/conda-R/lib/R/include
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var R_HOME and value /opt/cloudera/parcels/CONDAR/lib/conda-R/lib/R
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var R_SHARE_DIR and value /opt/cloudera/parcels/CONDAR/lib/conda-R/lib/R/share
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is starting R process
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is starting 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connecting to backend using port 8880 
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) accepted connection
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) is waiting for sparklyr client to connect to port 8880
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is querying ports from backend using port 8880 
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) received command 0
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) found requested session matches current session
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) is creating backend and allocating system resources
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) created the backend
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) is waiting for r process to end
17/08/18 01:32:48 INFO sparklyr: RScript (9751) found redirect gateway port 8880 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connected to backend 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connecting to backend session 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connected to backend session 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) created connection 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connected 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) retrieved worker context id 4 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) retrieved worker context 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) using bundle /tmp/RtmpM4zrrj/packages.tar 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) updated .libPaths with bundle packages 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) working over grouped data 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) found 3 rows 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) retrieved 3 rows 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) computing closure 
17/08/18 01:32:49 ERROR sparklyr: RScript (9751) list(message = "unable to load shared object '/data1/yarn/nm/usercache/clouderanA/appcache/application_1500605980576_8118/container_1500605980576_8118_01_000002/sparklyr-bundle/stringi/libs/stringi.so':\n  libicui18n.so.55: cannot open shared object file: No such file or directory", call = dyn.load(file, DLLpath = DLLpath, ...)) 
17/08/18 01:32:49 ERROR sparklyr: RScript (9751) collected callstack: 
18: stop(e)
17: value[[3L]](cond)
16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
15: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatch(loadNamespace(name), error = function(e) stop(e))
13: getNamespace(ns) 
17/08/18 01:32:49 INFO sparklyr: Gateway (9751) is terminating backend
17/08/18 01:32:49 INFO sparklyr: Worker (9751) completed wait using lock for RScript
17/08/18 01:32:49 INFO sparklyr: Gateway (9751) is shutting down with expected SocketException
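
The error in the log says the dynamic loader could not find libicui18n.so.55 while loading stringi.so on the worker. One way to confirm which shared libraries are unresolved on a given node is to filter the output of ldd. A minimal sketch, where missing_libs is a hypothetical helper name and the bundle path in the usage comment is a placeholder, not a path taken from the cluster:

```shell
# Print the names of shared libraries that ldd reports as unresolved.
# Reads ldd-style output on stdin, so it can be tried without a real binary.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Hypothetical usage on a worker node (path is a placeholder):
#   ldd /path/to/sparklyr-bundle/stringi/libs/stringi.so | missing_libs
```

If the helper prints libicui18n.so.55 on the worker but nothing inside the Docker container, that would suggest the ICU version stringi was built against is not shipped in the parcel.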

Support for Apache Arrow

Work to support Apache Arrow in sparklyr is under way; see https://arrow.apache.org/blog/2019/01/25/r-spark-improvements/.

It would therefore be ideal to support Arrow in this parcel as well, with something like:

sudo yum install -y https://packages.red-data-tools.org/centos/red-data-tools-release-latest.noarch.rpm
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/red-data-tools.repo
sudo yum install -y --enablerepo=red-data-tools arrow-devel

Ideally, use the following mirror instead, which will only be updated to the latest version that sparklyr supports:

sudo yum install -y https://arrowlib.rstudio.com/centos/red-data-tools-release-latest.noarch.rpm
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/red-data-tools.repo
sudo yum install -y --enablerepo=red-data-tools arrow-devel
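
The sed step in both variants replaces the literal $releasever placeholder in the repo file with 6, presumably so that yum resolves the CentOS 6 packages regardless of what the host reports. A minimal sketch of that substitution applied to a sample line; pin_releasever is a hypothetical helper name, and the baseurl value is made up for illustration and is not the real content of red-data-tools.repo:

```shell
# Same substitution as in the commands above, but reading from stdin instead of
# editing /etc/yum.repos.d/red-data-tools.repo in place.
pin_releasever() {
  sed 's/\$releasever/6/g'
}

# Hypothetical repo line, for illustration only.
echo 'baseurl=https://example.invalid/centos/$releasever/$basearch/' | pin_releasever
# prints: baseurl=https://example.invalid/centos/6/$basearch/
```

Only $releasever is rewritten; other yum variables such as $basearch are left for yum to expand.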
