revolutionanalytics / rhdfs
A package that allows R developers to use Hadoop HDFS
Hi,
I set up R and Hadoop using the Cloudera QuickStart VM, CDH 5.3.
R version 3.1.2; VirtualBox Manager 4.3.20 running on Mac OS X 10.7.5.
I followed the blog
http://www.r-bloggers.com/integration-of-r-rstudio-and-hadoop-in-a-virtualbox-cloudera-demo-vm-on-mac-os-x/
to set up R and Hadoop, and turned off MR2/YARN; I am using MR1 instead.
Everything seems to work fine except the from.dfs function.
I am using this simple example in R:
library(rmr2)  # provides to.dfs, mapreduce, from.dfs
small.ints <- to.dfs(1:1000)
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))
df <- as.data.frame(from.dfs(out))
from.dfs produces the error below. If you could be of any help, I'd greatly appreciate it. Thank you very much. -EK
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/128432
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/422
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/122
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
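One way to narrow this down (a hedged diagnostic, not a confirmed fix): check what mapreduce() actually wrote before calling from.dfs(). The trace shows streaming's dumptb being handed bare names (128432, 422, 122) instead of files under the job's output path, so comparing the two is informative:

# Diagnostic sketch: list the job output that from.dfs() should be reading.
# Assumes rhdfs is loaded and hdfs.init() has been run.
out.path <- out()   # a "big data" object returns its HDFS path when called
print(out.path)
hdfs.ls(out.path)   # the part-00000... files should appear here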
Opened on behalf of @yoonus786
Hi, I am able to run hdfs.init() on the master node, but I get an error when running it on a slave node.
hdfs.init()
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.IOException: failure to login
Please let me know what the problem might be.
Thanks,
Yoonus
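A hedged guess, not a confirmed fix: "failure to login" from Hadoop's UserGroupInformation often means the JVM on that node cannot resolve the current OS user (or, on a secured cluster, has no credentials). One cheap thing to try is setting HADOOP_USER_NAME before initializing (the user name below is hypothetical):

# Sketch: tell the Hadoop client which user to act as before hdfs.init().
Sys.setenv(HADOOP_USER_NAME = "hdfs")   # hypothetical user name
library(rhdfs)
hdfs.init()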
Hi,
I am getting an error while loading rhdfs in R.
I can load library(rJava) successfully.
I cannot solve this; could someone help? Thanks in advance.
library("rhdfs")
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for ‘rhdfs’
I tried to reconfigure rJava, but it did not change anything.
canil@ubuntu:/$ sudo R CMD javareconf
Java interpreter : /usr/bin/java
Java version : 1.7.0_13
Java home path : /usr/lib/jvm/java-7-oracle/jre
Java compiler : /usr/bin/javac
Java headers gen.: /usr/bin/javah
Java archive tool: /usr/bin/jar
NOTE: Your JVM has a bogus java.library.path system property!
Trying a heuristic via sun.boot.library.path to find jvm library...
Java library path:
JNI linker flags : -L$(JAVA_HOME)/lib/amd64 -L$(JAVA_HOME)/lib/amd64/server -ljvm
JNI cpp flags : -I$(JAVA_HOME)/../include -I$(JAVA_HOME)/../include/linux
Updating Java configuration in /etc/R
Done.
And I did already set HADOOP_CMD to the path of my hadoop binary, like so:
export HADOOP_CMD=/usr/local/hadoop/bin/hadoop
What should I do ?
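A common workaround (a sketch, assuming the shell variable simply is not visible to the R process, e.g. because R or RStudio was started outside that shell): set HADOOP_CMD inside the R session itself, before loading the package.

# Sketch: make sure rhdfs sees HADOOP_CMD regardless of how R was launched.
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
library(rhdfs)
hdfs.init()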
I am facing an issue when interacting with HDFS from the R shell. rhdfs is properly installed. Does rhdfs support Kerberos?
The underlying cluster uses Pivotal HD as the Hadoop distribution and is secured using Kerberos. The error message is below.
14/01/30 04:47:50 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
14/01/30 04:47:50 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
14/01/30 04:47:50 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
Environment:
HortonWorks 2.1 cluster integrated with Kerberos and Active Directory
R version: 3.1.3
Issue:
I am trying to run a simple MR job using R on a Kerberos enabled Hadoop cluster. The R code is given below:
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.4.0.2.1.5.0-695.jar")
Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf")
library(rhdfs)
library(rmr2)
hdfs.init()
ints = to.dfs(1:100)
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
Up to this point the MapReduce job runs successfully, but when I try to access the results with the following command, an error is thrown:
from.dfs(calc)
The error is "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 8 elements".
The same error is thrown while accessing output of any MR job [wordcount, pi value].
The traceback() function displays the following:
7: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
flush = flush, encoding = encoding, skipNul = skipNul)
6: read.table(textConnection(hdfs("ls", fname, intern = TRUE)),
skip = 1, col.names = c("permissions", "links", "owner",
"group", "size", "date", "time", "path"), stringsAsFactors = FALSE)
5: hdfs.ls(fname)
4: part.list(fname)
3: lapply(src, function(x) system(paste(hadoop.streaming(), "dumptb",
rmr.normalize.path(x), ">>", rmr.normalize.path(dest))))
2: dumptb(part.list(fname), tmp)
1: from.dfs(calc)
Please let me know how to resolve this issue.
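The traceback is informative here: rmr2 parses the text output of hadoop fs -ls into exactly eight columns (permissions through path), so any stray line on stdout, for example a JVM or security warning on a Kerberos-enabled cluster, breaks the parse with exactly this "did not have 8 elements" error. A quick check (a sketch; the path is an example):

# Sketch: inspect the raw listing that rmr2 will try to parse into 8 columns.
raw <- system2("hadoop", c("fs", "-ls", "/tmp"), stdout = TRUE, stderr = TRUE)
print(raw)   # look for warning lines mixed in with the listing rows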
Hi Antonio,
I've faced a scenario where I call a mapreduce (rmr) job from a shell script inside a mapper (this is how Oozie launches a shell action).
Here is the flow:
Oozie Launcher Job -> Launcher map-only task where the shell script (Rscript myscript.r) executes -> StreamJob -> Mappers/Reducers
myscript.r begins with:
Sys.setenv(JAVA_HOME="/usr/jdk64/jdk1.7.0_67")
Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
Sys.setenv(HADOOP_HOME="/usr/hdp/2.2.6.0-2800/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/hdp/2.2.6.0-2800/hadoop-mapreduce/hadoop-streaming.jar")
library("rhdfs")
library("rmr2")
hdfs.init()
library(Matrix)
Launcher Job (the job that launches the map-only task):
In this map task, the shell script is executed as a system call (Rscript myscript.r). Here HADOOP_CMD is set correctly, but an error from the mr function is logged (probably because the streaming mapper fails when an hdfs function is called from inside the mr function). Launcher (mapper) log:
Loading required package: methods
Loading required package: rJava
HADOOP_CMD=/usr/bin/hadoop
Be sure to run hdfs.init()
Please review your hadoop settings. See help(hadoop.settings)
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 15
Calls: getTags -> mapreduce -> mr
The rmr streaming call starts a StreamJob, which can have mappers and reducers. StreamJob (mapper) log:
Log Type: stderr
Log Upload Time: Thu Aug 06 19:24:41 -0400 2015
Log Length: 2722
Loading objects:
Loading objects:
backend.parameters
combine
Please review your hadoop settings. See help(hadoop.settings)
combine.file
combine.line
debug
default.input.format
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Loading required package: methods
Loading required package: rJava
Loading required package: rhdfs
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(X[[i]], ...) : can't load rhdfs
Loading required package: rmr2
Loading required package: Matrix
However, calling myscript.r from the command line works fine.
Here is my question:
Should rmr propagate the environment variables in this case, or should it be the environment's responsibility to provide the value of HADOOP_CMD?
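One hedged workaround (an assumption about your setup, not a confirmed rmr behavior): Hadoop streaming accepts -cmdenv NAME=VALUE to export a variable into the streaming tasks, and rmr2's backend.parameters argument can inject extra streaming options, so the variable can be pushed through explicitly (my.input and my.map are placeholders for the real job):

# Sketch: forward HADOOP_CMD into the streaming mapper/reducer environment.
result <- mapreduce(
  input = my.input,
  map = my.map,
  backend.parameters = list(
    hadoop = list(cmdenv = "HADOOP_CMD=/usr/bin/hadoop")
  )
)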
Hi,
I have many huge CSV files (more than 20 GB each) on my Hortonworks HDP 2.0.6.0 GA cluster.
I use the following code to read a file from HDFS:
Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar")
Sys.setenv(HADOOP_COMMON_LIB_NATIVE_DIR="/usr/lib/hadoop/lib/native/")
library(rmr2);
library(rhdfs);
library(lubridate);
hdfs.init();
f = hdfs.file("/etl/rawdata/201202.csv","r",buffersize=104857600);
m = hdfs.read(f);
c = rawToChar(m);
data = read.table(textConnection(c), sep = ",");
When I use dim(data) to verify, it shows the following:
[1] 1523 7
But it should actually be "134279407" rows instead of "1523".
I found that the value of m shown in RStudio is "raw [1:131072] 50 72 69 49 ...", and there is
a thread in the hadoop-hdfs-user mailing list ("why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?").
Ref.
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201403.mbox/%3CCAGkDawm2ivCB+rNaMi1CvqpuWbQ6hWeb06YAkPmnOx=8PqbNGQ@mail.gmail.com%3E
Is it a bug of hdfs.read() in rhdfs-1.0.8?
Best Regards,
James Chang
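A hedged note on the behavior: a single hdfs.read() call returns at most one buffer (131072 bytes = 2^17 here), so the read has to be looped rather than done once. A minimal sketch, assuming hdfs.read() returns NULL at end of file (and noting that pulling a 20 GB file fully into R memory is usually a bad idea):

# Sketch: accumulate chunks until hdfs.read() returns NULL.
f <- hdfs.file("/etl/rawdata/201202.csv", "r", buffersize = 104857600)
chunks <- list()
repeat {
  m <- hdfs.read(f)
  if (is.null(m)) break
  chunks[[length(chunks) + 1]] <- m
}
hdfs.close(f)
data <- read.table(textConnection(rawToChar(do.call(c, chunks))), sep = ",")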
I know this has to do with a difference between the Java versions at compile time and runtime; however, I think I have set all the environment variables properly, so I don't really know what is still causing this issue.
$ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
$ javac -version
javac 1.7.0_79
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home
$ hadoop version
Hadoop 2.7.1
> Sys.getenv("JAVA_HOME")
[1] "/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home"
> library(rhdfs)
Loading required package: rJava
HADOOP_CMD=/usr/local/Cellar/hadoop/2.7.1/bin/hadoop
Be sure to run hdfs.init()
Warning message:
package ‘rJava’ was built under R version 3.1.3
> hdfs.init()
Error in .jnew("org/apache/hadoop/conf/Configuration") :
java.lang.UnsupportedClassVersionError: org/apache/hadoop/conf/Configuration : Unsupported major.minor version 51.0
I also set JAVA_HOME in Hadoop's hadoop-env.sh to 1.7.0:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home
I would really appreciate it if someone could point out what's going on here.
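"Unsupported major.minor version 51.0" means the Hadoop classes were compiled for Java 7 but the JVM that rJava loaded is older (Java 6). A hedged diagnostic sketch: check the version of the JVM actually running inside R, which need not match the shell's java:

# Sketch: print the Java version rJava loaded; it must report 1.7+ here.
library(rJava)
.jinit()
print(.jcall("java/lang/System", "S", "getProperty", "java.version"))

If it reports 1.6, a usual next step is rerunning R CMD javareconf with JAVA_HOME pointing at the JDK 7 install and then reinstalling rJava.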
Hi,
I have run kmeans.R and the job finished successfully.
But the output file cannot be read or reached; it always tells me the operation is not permitted. I even tried sudo.
The output is in /tmp/Rtmpm1Mc34/filef281df27152.
How can I see the output, from R or from the terminal?
And secondly, how can I set the output file in HDFS? I mean, when I do hdfs.ls(".") it should be listed.
13/02/24 21:57:18 INFO streaming.StreamJob: Job complete: job_201302242120_0006
13/02/24 21:57:18 INFO streaming.StreamJob: Output: /tmp/Rtmpm1Mc3/filef281df27152
Thanks in advance
(I hope this is the right place this time.)
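A hedged note on both questions: the path that StreamJob prints is an HDFS path, not a local filesystem path, which is why sudo on the local disk cannot find it. From R it can be read back or listed directly, and mapreduce() accepts an output argument to choose the location up front (the output path below is hypothetical):

# Sketch: the job output lives in HDFS, not on local disk.
from.dfs("/tmp/Rtmpm1Mc3/filef281df27152")   # read it back into R
hdfs.ls("/tmp/Rtmpm1Mc3/filef281df27152")    # or list it via rhdfs
# To pick the location yourself:
# res <- mapreduce(input = ..., output = "/user/me/kmeans.out", map = ...)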
library(rhdfs, quietly=TRUE)
HADOOP_CMD=/usr/bin/hadoop
Be sure to run hdfs.init()
[186946@01HW524744 hadd]$ sudo R CMD INSTALL rhdfs_1.0.8.tar.gz
I also tried:
[186946@01HW524744 hadd]$ sudo -E R CMD INSTALL rhdfs_1.0.8.tar.gz
HADOOP_CMD is set in .bashrc:
[186946@01HW524744 hadd]$ echo $HADOOP_CMD
Please help.
I cannot read the output of a mapreduce job.
The code:
data=to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
print(res())
[1] "/tmp/Rtmpr5Xv1g/file34916a6426bf"
And then....
from.dfs(res)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs
...
...
Finally,
hdfs.ls("/tmp/Rtmpr5Xv1g/file34916a6426bf")
  permission owner  group      size modtime          file
1 -rw------- daniel supergroup    0 2013-05-13 18:24 /tmp/Rtmpr5Xv1g/file34916a6426bf/_SUCCESS
2 drwxrwxrwt daniel supergroup    0 2013-05-13 18:23 /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs
3 -rw------- daniel supergroup  448 2013-05-13 18:24 /tmp/Rtmpr5Xv1g/file34916a6426bf/part-00000
4 -rw------- daniel supergroup  122 2013-05-13 18:23 /tmp/Rtmpr5Xv1g/file34916a6426bf/part-00001
Note that /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs is a directory.
Why does the program try to read "_logs" as a file when it is a directory?
Thanks in advance
Alfonso
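A hedged explanation and workaround (an assumption based on the listing above, not a confirmed fix): from.dfs dumps every entry of the output directory, and the MR1 job-history directory _logs is not a typed-bytes part file, so the dump fails on it. MR1 can be told not to write _logs at all:

# Sketch: disable the per-job "_logs" history directory in the output.
res <- mapreduce(
  input = data,
  map = function(k, v) cbind(v, 2 * v),
  backend.parameters = list(
    hadoop = list(D = "hadoop.job.history.user.location=none")
  )
)
from.dfs(res)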
Hello, could someone help me with this error? from.dfs does not return the MapReduce results to me, and an error is raised. Any idea what the fault may be?
This is the code I'm using:
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-7-openjdk-i386")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar")
library('rmr2')
data=to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
from.dfs(res)
This is what appears on the console:
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-7-openjdk-i386")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar")
library('rmr2')
data=to.dfs(1:10)
OpenJDK Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
17/04/10 18:12:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/10 18:12:46 INFO compress.CodecPool: Got brand-new compressor [.deflate]
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
OpenJDK Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
17/04/10 18:12:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/10 18:12:50 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [/tmp/hadoop-unjar5133688999707817678/] [] /tmp/streamjob6112579330814301418.jar tmpDir=null
17/04/10 18:12:51 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.24:8050
17/04/10 18:12:52 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.24:8050
17/04/10 18:12:53 INFO mapred.FileInputFormat: Total input paths to process : 1
17/04/10 18:12:53 INFO mapreduce.JobSubmitter: number of splits:2
17/04/10 18:12:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491851865431_0014
17/04/10 18:12:54 INFO impl.YarnClientImpl: Submitted application application_1491851865431_0014
17/04/10 18:12:54 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1491851865431_0014/
17/04/10 18:12:54 INFO mapreduce.Job: Running job: job_1491851865431_0014
17/04/10 18:13:02 INFO mapreduce.Job: Job job_1491851865431_0014 running in uber mode : false
17/04/10 18:13:02 INFO mapreduce.Job: map 0% reduce 0%
17/04/10 18:13:10 INFO mapreduce.Job: map 50% reduce 0%
17/04/10 18:13:11 INFO mapreduce.Job: map 100% reduce 0%
17/04/10 18:13:12 INFO mapreduce.Job: Job job_1491851865431_0014 completed successfully
17/04/10 18:13:13 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=220440
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=973
HDFS: Number of bytes written=244
HDFS: Number of read operations=14
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=13405
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13405
Total vcore-seconds taken by all map tasks=13405
Total megabyte-seconds taken by all map tasks=13726720
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=180
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=124
CPU time spent (ms)=2450
Physical memory (bytes) snapshot=296972288
Virtual memory (bytes) snapshot=1424506880
Total committed heap usage (bytes)=217579520
File Input Format Counters
Bytes Read=793
File Output Format Counters
Bytes Written=244
17/04/10 18:13:13 INFO streaming.StreamJob: Output directory: /tmp/file649a194b8a0e
from.dfs(res)
OpenJDK Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
17/04/10 18:13:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/10 18:13:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
Line 1 does not have 8 elements
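A hedged observation: this is the same eight-column parse failure as in the Kerberos report above, and the stack-guard warning the JVM keeps printing is a plausible stray line in the captured listing. Clearing that warning at its source, as the log itself suggests, is one concrete step (a sketch; needs write access to the library):

# Sketch: fix the executable-stack flag on the native Hadoop library so the
# JVM stops printing stack-guard warnings into captured command output.
system("execstack -c /usr/local/hadoop/lib/native/libhadoop.so.1.0.0")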
From the wiki:
model <- 3
modelfilename <- "my_smart_unique_name"
modelfile <- hdfs.file(modelfilename, "w")
hdfs.write(model, modelfile)
[1] TRUE
hdfs.close(modelfile)
Error in fh$sync : no field, method or inner class called 'sync'
Is 2.0 support in the works?
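Context, as far as I can tell (an assumption about the cause, not a confirmed fix): Hadoop 2 removed FSDataOutputStream.sync() in favor of hflush()/hsync(), so hdfs.close(), which calls fh$sync, fails against a 2.x client. Until the package is updated, a sketch of closing the stream through rJava directly:

# Sketch: flush and close the underlying Java stream, bypassing the
# sync() call that no longer exists in Hadoop 2.
library(rJava)
.jcall(modelfile$fh, "V", "hflush")
.jcall(modelfile$fh, "V", "close")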
hdfs.init()
modelfilename <- "<PATH_TO_DIRECTORY>"
modelfile = hdfs.read(modelfilename, "r")
m <- hdfs.read(modelfile)
head(m)
Console:
> modelfilename <- "<PATH_TO_DIRECTORY>"
> modelfile = hdfs.read(modelfilename, "r")
Error in con$fh : $ operator is invalid for atomic vectors
> m <- hdfs.read(modelfile)
> head(m)
[1] 06 f7 9f 04 50 28
>
Access to the data is OK, but there is an error on con$fh.
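For what it's worth, the "$ operator is invalid for atomic vectors" error fits passing a path string (an atomic character vector) where hdfs.read() expects the connection object returned by hdfs.file(). A sketch of the intended usage (the path placeholder is kept from the report):

# Sketch: open a handle with hdfs.file(), then read from that handle.
f <- hdfs.file("<PATH_TO_FILE>", "r")
m <- hdfs.read(f)
head(m)
hdfs.close(f)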
Hi,
sudo R CMD INSTALL /data/tarfiles/rhdfs_1.0.8.tar.gz
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: loading failed
I tried opening R and running Sys.getenv("HADOOP_CMD"), and it does give me the right Hadoop path, so the error message seems really strange to me.
Can anybody help?
Max
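A hedged suggestion (an assumption that sudo's fresh environment drops HADOOP_CMD during the install-time load test, even though your own shell has it): installing from inside an R session sidesteps that, because the variable can be set in the same process first:

# Sketch: set the variable, then install the source tarball from within R.
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")   # hypothetical path
install.packages("/data/tarfiles/rhdfs_1.0.8.tar.gz", repos = NULL, type = "source")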