spark-kafka-cassandra-applying-lambda-architecture's People

Contributors

aalkilani


spark-kafka-cassandra-applying-lambda-architecture's Issues

Make sure you port forward port 8988 to the guest

If you are following

https://app.pluralsight.com/player?course=spark-kafka-cassandra-applying-lambda-architecture&author=ahmad-alkilani&name=spark-kafka-cassandra-applying-lambda-architecture-m1&clip=4&mode=live

Make sure that you configure your Vagrantfile to include:
Vagrant.configure("2") do |config|
#include this
config.vm.network "forwarded_port", guest: 8988, host: 8988
end

Also, after rebooting, make sure all the processes are running:

docker ps -a
If any of them show as exited, start them again:
docker start zeppelin
Repeat for each of the Docker containers.

Kafka receiver demo

Hi,

When I run the Spark Streaming Kafka receiver demo in Zeppelin, I get an error related to "Logging". I have read that Spark 1.6.2 and 1.6.3 don't support it (http://stackoverflow.com/questions/38893655/spark-twitter-streaming-exception-org-apache-spark-logging-classnotfound).

java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:81)
... 46 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 59 more

I have imported the Kafka dependencies and added them to the interpreter. Any ideas?

Thanks
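For reference, the receiver in this demo boils down to a KafkaUtils.createStream call. The sketch below is a minimal, self-contained illustration assuming a Spark 1.6.x runtime with the matching spark-streaming-kafka_2.11:1.6.3 artifact loaded in the interpreter (org.apache.spark.Logging exists in Spark 1.x but not in Spark 2.x, so this NoClassDefFoundError usually means the Kafka integration artifact and the Spark runtime behind the interpreter were built against different major versions); the ZooKeeper address, consumer group, and topic name are placeholders, not the course's exact values.

// Minimal receiver-based Kafka DStream sketch (assumes Spark 1.6.x + spark-streaming-kafka_2.11:1.6.3).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-receiver-sketch").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(5))

// ZooKeeper quorum, consumer group, and topic -> receiver-thread map are placeholders.
val kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "lambda-group", Map("weblogs-text" -> 1))

kafkaStream.map(_._2).print() // the second tuple element is the message payload
ssc.start()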

Chapter 3 - "Saving to HDFS and Executing...." location 6:20mins

Hi Ahmad, I am not sure if this is the right place to raise issues. I have been following your Lambda Spark course on Pluralsight and I am stuck at the point shown in the title. When I try to execute the statement mentioned,

cd /pluralsight/spark/
./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

I get an Exception and my start-up fails.

18/02/09 09:16:44 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: c: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: c: at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.<init>(Path.java:171) at org.apache.hadoop.fs.Path.<init>(Path.java:93) at org.apache.hadoop.fs.Globber.glob(Globber.java:211) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:912) at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.RDD.foreach(RDD.scala:910) at batch.BatchJob$.main(BatchJob.scala:27) at batch.BatchJob.main(BatchJob.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558) Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 2: c: at java.net.URI$Parser.fail(URI.java:2848) at java.net.URI$Parser.failExpecting(URI.java:2854) at java.net.URI$Parser.parse(URI.java:3057) at java.net.URI.<init>(URI.java:746) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 31 more

I have spent quite a bit of time trying to get to the bottom of it, but so far no luck. I have managed to pull the attached logs from the Hadoop web service running in my VM; the URL is here:

http://lambda-pluralsight:8042/node/containerlogs/container_1518176371217_0003_01_000001/vagrant/stderr/?start=0

Logs for container_1518176371217_0003_01_000001.html.pdf

Also, I've tried to start the application in debug mode (on port 5005 or 7777, as I found in some online examples), but when I start IntelliJ in remote debug mode I get a Connection Refused error message.

Any help or pointers would be much appreciated. My email is,
[email protected]
Kind Regards,
Robert.
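For context on the URISyntaxException above: Hadoop's Path parser is treating something that begins with a Windows drive letter (c:) as a URI with scheme "c". Below is a minimal sketch of what fully-qualified input paths look like from the driver's point of view; the object name and file names are hypothetical placeholders, not the course's actual code or paths.

// Sketch only: fully-qualified URIs avoid Hadoop reading a Windows drive letter
// ("c:...") as a URI scheme. File names below are placeholders.
import org.apache.spark.{SparkConf, SparkContext}

object PathSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("path-sketch"))

    // A file shared into the VM via the /vagrant mount (note file:// plus an absolute path).
    val local = sc.textFile("file:///vagrant/data/weblogs.tsv")

    // The same kind of data addressed through HDFS instead.
    val onHdfs = sc.textFile("hdfs://lambda-pluralsight:9000/lambda/weblogs.tsv")

    // Any action here (e.g. local.count()) would actually hit the file system.
    sc.stop()
  }
}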

Error when Create Kafka Receiver on Zeppelin

Hi Ahmad,
When I run the Kafka receiver on Zeppelin, I get an error.

My dependencies:
org.apache.kafka:kafka_2.11:0.8.2.1
org.apache.spark:spark-streaming-kafka-assembly_2.11:1.6.3
com.twitter:algebird-core_2.11:0.12.3

Error Message:
java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91)

I have already tried making the dependency versions the same as in the video, but I get the same result.
I have also tried vagrant reload and vagrant reload --provision, with the same result.

Unable to login into VM

Hi, whenever I try to log in to the VM using vagrant ssh, it asks for a password and throws the following error.

$ vagrant ssh
Warning: Identity file C:/Users/Pramod not accessible: No such file or directory.
[email protected]'s password:
bash: Sripada/.vagrant.d/boxes/aalkilani-VAGRANTSLASH-spark-kafka-cassandra-applying-lambda-architecture/0.0.6/virtualbox/vagrant_private_key: No such file or directory

Has anyone faced this issue? Does anyone know the machine's password, and what could the problem have been?

Error executing BatchJob and save data to Cassandra

Hi,

I have a problem when Zeppelin executes the batch job and saves to Cassandra (module 'Persisting with Cassandra', clip 'Spark Batch Cassandra Views: Demo'). When I run the code, the result is always an error, but the error log is not printed. I have already restarted Docker and reloaded Vagrant, but it's still the same.

I think it's a dependency problem.

My dependencies:
com.twitter:algebird-core_2.11:0.12.3
org.apache.kafka:kafka_2.11:0.8.2.1
org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2
com.datastax.spark:spark-cassandra-connector_2.11:1.6.1
com.datastax.cassandra:cassandra-driver-core:3.0.1

Does anyone have a solution?
Thanks!
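For reference, writing a DataFrame to Cassandra through the DataStax connector looks roughly like the sketch below. This is a hedged illustration, not the course's exact code: the keyspace and table names are placeholders, it assumes spark.cassandra.connection.host is configured on the interpreter and the target table already exists, and the connector artifact has to match the Spark and Scala versions actually running (mixing a Spark 2.x streaming-kafka artifact with a 1.6.1 connector, as in the list above, is a likely source of trouble).

// Sketch: persisting a batch view to Cassandra via the DataStax Spark connector.
// Keyspace/table names are placeholders.
import org.apache.spark.sql.SaveMode

val batchView = sqlContext.table("activity_by_product") // placeholder source table

batchView.write
  .format("org.apache.spark.sql.cassandra")
  .mode(SaveMode.Append)
  .options(Map("keyspace" -> "lambda", "table" -> "activity_by_product"))
  .save()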

An established connection was aborted by the software in your host machine

Do you have any ideas on that issue?

Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: An established connection was aborted by the software in your host machine; Host Details : local host is: "SurfacePro4/192.168.64.42"; destination host is: "lambda-pluralsight":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:73)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:329)
at batch.BatchJob$.main(BatchJob.scala:65)
at batch.BatchJob.main(BatchJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:457)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

Vagrant up throws error

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' could not be found. Attempting to find and install...
    default: Box Provider: virtualbox
    default: Box Version: >= 0
The box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' could not be found or
could not be accessed in the remote catalog. If this is a private
box on HashiCorp's Atlas, please verify you're logged in via
`vagrant login`. Also, please double-check the name. The expanded
URL and error message are shown below:

URL: ["https://atlas.hashicorp.com/aalkilani/spark-kafka-cassandra-applying-lambda-architecture"]
Error:

Environment:
OS X 10.12.1
$ vagrant --version
Vagrant 1.8.7

spark-kafka-cassandra-applying-lambda-architecture IO issue for Parquet

Hi Ahmad,

I've got an IO issue with your second example in IntelliJ when I try to save the result to HDFS in Parquet format:
activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs://lambda-pluralsight:9000/lambda/batch1")

Here is the exception:

17/03/28 10:02:36 INFO FileInputFormat: Total input paths to process : 1
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: Connection reset by peer; Host Details : local host is: "MacBook-Air-de-Nicolas.local/192.168.5.100"; destination host is: "lambda-pluralsight":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:73)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:329)
at batch.BatchJob$.main(BatchJob.scala:64)
at batch.BatchJob.main(BatchJob.scala)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:457)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

I've checked on the box that lambda-pluralsight is correctly defined and that Hadoop is responding on port 9000. Do you have an idea where I'm going wrong? Thanks for your help.
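One way to narrow down connection problems like this (and the similar "connection aborted" / EOFException reports above) is to talk to the NameNode directly with the Hadoop FileSystem API from the same JVM that runs the job. A small sketch, assuming the Hadoop client is already on the classpath via the Spark dependency; the object name and the path being checked are placeholders.

// Connectivity smoke test against the NameNode (sketch; /lambda is a placeholder path).
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsSmokeTest {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new URI("hdfs://lambda-pluralsight:9000"), new Configuration())
    println(s"/lambda exists: ${fs.exists(new Path("/lambda"))}")
    fs.close()
  }
}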

Error launching spark-submit

Hi,

I just downloaded the latest version of the box (with the fixes for the nodemanager service and the environment variables for the Java and Hadoop homes), but I get the exception below when launching the job with spark-submit.

I tried copying the 1.6.1 file manually, but then I got another exception (ClassNotFound, which seems related to the Scala version), so I assume the correct one is spark-assembly-1.6.3-hadoop2.7.0.jar.

I'm using macOS 10.12.1.

Thanks

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://lambda-pluralsight:9000/spark/spark-assembly-1.6.1-hadoop2.6.0.jar
at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:134)
at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:467)
at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2193)
at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2189)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2195)
at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:601)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:327)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:407)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:446)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:444)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:444)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:727)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Applying the Lambda Architecture with Spark, Kafka, and Cassandra

cygwin_error_cmd.docx
In this video, the author, Ahmad Alkilani, asks us to execute a command for cloning the repository:

git clone https://github.com/aalkilani/spark-kafta-cassandra-applying-lambda-architecture.git

The error below came up after entering my email address and password:

Cloning into 'spark-kafta-cassandra-applying-lambda-architecture'...
remote: Repository not found.
fatal: repository 'https://github.com/aalkilani/spark-kafta-cassandra-applying-lambda-architecture.git/' not found

Please help with this ASAP. Error screenshot attached.

Error running batch job with spark-submit

Hi,

I failed to run the batch job on YARN in the demo "Saving to HDFS and Executing on YARN"; here's the error log.

16/12/01 11:56:16 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1480592911767_0001 failed 2 times due to AM Container for appattempt_1480592911767_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://lambda-pluralsight:8088/cluster/app/application_1480592911767_0001Then, click on links to logs of each attempt.

Diagnostics: Exception from container-launch.

Container id: container_1480592911767_0001_02_000001

Exit code: 1

Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

Failing this attempt. Failing the application.

ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1480593276668
final status: FAILED
tracking URL: http://lambda-pluralsight:8088/cluster/app/application_1480592911767_0001
user: vagrant

16/12/01 11:56:16 WARN yarn.Client: Failed to cleanup staging dir .sparkStaging/application_1480592911767_0001

java.net.ConnectException: Call From lambda-pluralsight/127.0.0.1 to lambda-pluralsight:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2113)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:167)
at org.apache.spark.deploy.yarn.Client.monitorApplication(Client.scala:977)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1031)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
at org.apache.hadoop.ipc.Client.call(Client.java:1446)
... 31 more

Exception in thread "main" org.apache.spark.SparkException: Application application_1480592911767_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/01 11:56:17 INFO util.ShutdownHookManager: Shutdown hook called
16/12/01 11:56:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9fe8fe45-851a-41ad-8409-3daf17e08a5d

And the log shows

java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

Thanks
Gary

vagrant up is not working-(Errno::ECONNABORTED) -`readpartial': An established connection was aborted by the software in your host machine.

Vagrant up is not working. Can anyone please help?

==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Warning: Connection reset. Retrying...
**C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:54:in readpartial': An established connection was aborted by the software in your host machine. (Errno::ECONNABORTED)** from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:54:in block (2 levels) in negotiate!'
from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:52:in loop' from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:52:in block in negotiate!'
from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:50:in loop' from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:50:in negotiate!'
from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/server_version.rb:32:in initialize' from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/session.rb:84:in new'
from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh/transport/session.rb:84:in initialize' from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh.rb:233:in new'
from C:/devtools/Vagrant/embedded/gems/gems/net-ssh-4.1.0/lib/net/ssh.rb:233:in start' from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/communicators/ssh/communicator.rb:397:in block (2 levels) in connect'
from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:88:in block in timeout' from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in block in catch'
from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in catch' from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in catch'
from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:103:in timeout' from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/communicators/ssh/communicator.rb:371:in block in connect'
from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/util/retryable.rb:17:in retryable' from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/communicators/ssh/communicator.rb:370:in connect'
from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/communicators/ssh/communicator.rb:68:in block in wait_for_ready' from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:88:in block in timeout'
from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in block in catch' from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in catch'
from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in catch' from C:/devtools/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:103:in timeout'
from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/plugins/communicators/ssh/communicator.rb:46:in wait_for_ready' from C:/devtools/Vagrant/embedded/gems/gems/vagrant-1.9.4/lib/vagrant/action/builtin/wait_for_communicator.rb:16:in block in call'

Chapter 5 - state management using... - Zeppelin error - missing parameter type

Hi there,
I am trying to run the example shown in Chapter 5 - "state management using...." and have pasted the code I am running below. When I try to execute the paragraph in Zeppelin I get the following error,

inputPath: String = file:///vagrant/input
textDStream: org.apache.spark.streaming.dstream.DStream[String] = org.apache.spark.streaming.dstream.MappedDStream@7b181d30
activityStream: org.apache.spark.streaming.dstream.DStream[Activity] = org.apache.spark.streaming.dstream.TransformedDStream@7d753e31
<console>:60: error: missing parameter type
            .map { r => ((r.getString(0), r.getLong(1)),
                   ^

I can't see the issue and have spent some time looking at it and double-checking that I copied the example from the lesson correctly. Any pointers/help would be appreciated.

Code entered into the Paragraph

val inputPath = "file:///vagrant/input"
val textDStream = ssc.textFileStream(inputPath)
val activityStream = textDStream.transform( input => {
input.flatMap { line =>
val record = line.split("\t")
val MS_IN_MIN = 1000 * 60
if (record.length == 7)
Some(Activity(record(0).toLong / MS_IN_MIN * MS_IN_MIN, record(1), record(2), record(3), record(4), record(5), record(6)))
else
None
}
} )

activityStream.transform( rdd => {
val df = rdd.toDF()
df.registerTempTable("activity")
val activityByProduct = sqlContext.sql("""SELECT
product,
timestamp_hour,
sum(case when action = 'purchase' then 1 else 0 end) as purchase_count,
sum(case when action = 'add_to_cart' then 1 else 0 end) as add_to_cart_count,
sum(case when action = 'page_view' then 1 else 0 end) as page_view_count
from activity
group by product, timestamp_hour """)

activityByProduct
        .map { r => ((r.getString(0), r.getLong(1)),
          ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4))
        ) }

} ).updateStateByKey((newItemsPerKey: Seq[ActivityByProduct], currentState: Option[(Long, Long, Long, Long)]) => {
var (prevTimestamp, purchase_count, add_to_cart_count, page_view_count) = currentState.getOrElse((System.currentTimeMillis(), 0L, 0L, 0L))
var result : Option[(Long, Long, Long, Long)] = null

    if (newItemsPerKey.isEmpty){
      if(System.currentTimeMillis() - prevTimestamp > 30000 + 4)
        result = None
      else
        result = Some((prevTimestamp, purchase_count, add_to_cart_count, page_view_count))
    } else {

      newItemsPerKey.foreach(a => {
        purchase_count += a.purchase_count
        add_to_cart_count += a.add_to_cart_count
        page_view_count += a.page_view_count
      })

      result = Some((System.currentTimeMillis(), purchase_count, add_to_cart_count, page_view_count))
    }

    result
  })

statefulActivityByProduct.foreachRDD(rdd => {
rdd.map(item => ActivityByProduct(item._1._1, item._1._2, item._2._1, item._2._2, item._2._3))
.toDF().registerTempTable("statefulActivityByProduct")
})

ssc.start()
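For comparison with the "missing parameter type" error above, here is a minimal sketch of just the mapping step with the element type pinned down by going through the DataFrame's underlying RDD[Row]. This is an illustrative workaround sketch under that assumption, not necessarily the course's exact code; it relies on the activityByProduct DataFrame and the ActivityByProduct case class from the pasted code above.

// Sketch of the mapping step only. activityByProduct is the DataFrame produced by the
// SQL above; .rdd yields an RDD[Row], so the closure parameter has a known type and
// the "missing parameter type" error cannot occur here.
import org.apache.spark.sql.Row

val keyedActivityByProduct = activityByProduct.rdd.map { r: Row =>
  ((r.getString(0), r.getLong(1)),
    ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4)))
}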

stdin is not tty

There's an error message in the vagrant up output on Mac; see below.
The machine boots and I can ssh to it, but the error appears every time.

==> default: Checking if box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' is up to date...
==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> default: flag to force provisioning. Provisioners marked to run always will still run.
==> default: Running provisioner: docker-images (shell)...
    default: Running: inline script
**==> default: stdin: is not a tty**
==> default: zookeeper
==> default: spark-1.6.3
==> default: cassandra
==> default: zeppelin
==> default: kafka

vagrant up gives an error

Hi Ahmad,

it seems that vagrant up gives the following error:
default: Box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' could not be found.

Facing a heap issue in BatchJob.scala script

Hello,

I am having a heap memory issue in this part of the course. My only solution was to downgrade to version 1.5 of Spark. But then, I had other errors...

JDK: 1.8
Scala: 2.11.7
Spark: 1.6

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/30 17:06:54 INFO SparkContext: Running Spark version 1.6.0
20/04/30 17:06:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/30 17:06:55 INFO SecurityManager: Changing view acls to: Nolwen.Brosson
20/04/30 17:06:55 INFO SecurityManager: Changing modify acls to: Nolwen.Brosson
20/04/30 17:06:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Nolwen.Brosson); users with modify permissions: Set(Nolwen.Brosson)
20/04/30 17:06:56 INFO Utils: Successfully started service 'sparkDriver' on port 57698.
20/04/30 17:06:57 INFO Slf4jLogger: Slf4jLogger started
20/04/30 17:06:57 INFO Remoting: Starting remoting
20/04/30 17:06:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:57711]
20/04/30 17:06:57 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57711.
20/04/30 17:06:57 INFO SparkEnv: Registering MapOutputTracker
20/04/30 17:06:57 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
    at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
    at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
    at batch.BatchJob$.main(BatchJob.scala:23)
    at batch.BatchJob.main(BatchJob.scala)
20/04/30 17:06:57 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
    at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
    at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
    at batch.BatchJob$.main(BatchJob.scala:23)
    at batch.BatchJob.main(BatchJob.scala)

I have already tried different configurations:
JAVA_OPTS=-Xms128m -Xmx512m in my system environment variables

In Intellij:

  • I have added -Dspark.driver.memory=2g in the VM line in run/edit conf/...

  • I also have set -server -Xss1m -Xmx2048m in JVM options in file/settings/scala compiler

  • Finally, in help/ memory settings/ Maximum Heap size, the value is 2048.

But nothing works... Any ideas on what I could do?

Can you confirm that this code should not be run inside the virtual machine that we set up at the beginning of the course? Also, can you confirm that a manual installation of Spark on my computer is not necessary at all?
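For reference, the check that fails here (Spark 1.6's UnifiedMemoryManager) compares the driver JVM's Runtime.getRuntime.maxMemory against about 450 MB (the 4.718592E8 in the message), so it is the heap of the JVM actually running the driver that has to grow. Below is a small check to see what that JVM reports, assuming a local-mode run from the IDE with -Xmx set in the run configuration's VM options; this is an illustrative sketch, not the course's official setup.

// Run with e.g. -Xmx2g in the IntelliJ run configuration's VM options.
// Setting spark.driver.memory in code has no effect here, because in local mode
// the driver JVM is already started by the time SparkConf is read.
import org.apache.spark.{SparkConf, SparkContext}

object HeapCheck {
  def main(args: Array[String]): Unit = {
    println(s"Driver max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")

    val conf = new SparkConf().setAppName("heap-check-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf) // fails with "Please use a larger heap size" if -Xmx is too small
    sc.stop()
  }
}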

Error when running Streaming SQL in Zeppelin

This is related to the video "Streaming Aggregations with Zeppelin: Demo"

When I run the following code in Apache Zeppelin:

%sql
select from_unixtime(timestamp_hour/1000, "MM-dd HH:mm:00") as timestamphour, purchase_count, add_to_cart_count, page_view_count
from activityByProduct

I receive the following error, or nothing.

java.lang.NoSuchMethodException: org.apache.spark.io.LZ4CompressionCodec.<init>(org.apache.spark.SparkConf)
	at java.lang.Class.getConstructor0(Class.java:3082)
	at java.lang.Class.getConstructor(Class.java:1825)
	at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:72)
	at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:66)
	at org.apache.spark.sql.execution.SparkPlan.org$apache$spark$sql$execution$SparkPlan$$decodeUnsafeRows(SparkPlan.scala:265)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$1.apply(SparkPlan.scala:351)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$1.apply(SparkPlan.scala:350)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:350)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1925)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1924)
	at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2562)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
	at sun.reflect.GeneratedMethodAccessor114.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:214)
	at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:129)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Any hints on fixing it?

Thanks!

Error when saving to hdfs

Hey,

I wonder why the following exception is thrown when executing the line below. I can browse HDFS from the NameNode UI at http://lambda-pluralsight:50070
activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs://lambda-pluralsight:9000/lambda/batch1")

17/01/28 21:52:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 17/01/28 21:52:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 125.8 KB) 17/01/28 21:52:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:10795 (size: 18.2 KB, free: 2.4 GB) 17/01/28 21:52:22 INFO SparkContext: Created broadcast 0 from textFile at BatchJob.scala:36 17/01/28 21:52:25 INFO FileInputFormat: Total input paths to process : 1 Exception in thread "main" java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "asheikh-QAL51/127.0.0.1"; destination host is: "lambda-pluralsight":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1351) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:73) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139) at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:329) at batch.BatchJob$.main(BatchJob.scala:67) at batch.BatchJob.main(BatchJob.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891) 17/01/28 21:52:25 INFO SparkContext: Invoking stop() from shutdown hook 17/01/28 21:52:25 INFO SparkUI: Stopped Spark web UI at http://10.20.6.174:4041 17/01/28 21:52:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/01/28 21:52:25 INFO MemoryStore: MemoryStore cleared 17/01/28 21:52:25 INFO BlockManager: BlockManager stopped 17/01/28 21:52:25 INFO BlockManagerMaster: BlockManagerMaster stopped 17/01/28 21:52:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 17/01/28 21:52:25 INFO SparkContext: Successfully stopped SparkContext 17/01/28 21:52:25 INFO ShutdownHookManager: Shutdown hook called 17/01/28 21:52:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-e725e6ec-9bf0-4151-8624-9e4e588b532c

Module 2 - Log producer demo

Make sure you also include this line in spark-lambda.iml:

<?xml version="1.0" encoding="UTF-8"?>
<module type="JAVA_MODULE" version="4">
    <!-- ...opening <component> element and other entries elided... -->
    <!-- include this -->
    <orderEntry type="library" name="Maven: com.typesafe:config:1.3.0" level="project" />
  </component>
</module>
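For context, the com.typesafe:config artifact referenced above is what the log producer uses to read its settings. A minimal usage sketch follows; the configuration key and object name are hypothetical and only illustrate the ConfigFactory.load() pattern, they are not the course's actual code.

// Minimal Typesafe Config usage sketch (the "clickstream.records-per-file" key is hypothetical).
import com.typesafe.config.ConfigFactory

object ConfigSketch {
  def main(args: Array[String]): Unit = {
    val config = ConfigFactory.load() // reads application.conf / reference.conf from the classpath

    val recordsPerFile =
      if (config.hasPath("clickstream.records-per-file")) config.getInt("clickstream.records-per-file")
      else 100

    println(s"records per file: $recordsPerFile")
  }
}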

Kafka exited after vagrant up

I don't know if anyone has the same problem, but in my case, after I run vagrant up, Kafka is not running and docker ps -a shows the Kafka service as exited. I have to manually run "docker restart kafka" to get it running.

Any ideas?

vagrant up gives error "is_port_open.rb:21:in `initialize' ...

Hi

When running "vagrant up" I got the error "is_port_open.rb:21:in `initialize': ... connect(2) for "0.0.0.0" port 10008 (Errno::EADDRNOTAVAIL)"

I got around this problem by adding ', host_ip: "127.0.0.1"' to all the forwarded_port lines in Vagrantfile

For example, the first line is

config.vm.network "forwarded_port", guest: 10008, host: 10008, host_ip: "127.0.0.1"

I think this is a problem with Windows 10 hosts

vagrant up - unable to unpackage box properly error

Hello,

Could you please help me get up and running with the VirtualBox VM? I've run out of ideas on how to fix it myself. When I run vagrant up I get the following (my host is Windows 8.1):

$ vagrant.exe up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
==> default: Loading metadata for box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture'
default: URL: https://atlas.hashicorp.com/aalkilani/spark-kafka-cassandra-applying-lambda-architecture
==> default: Adding box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' (v0.0.6) for provider: virtualbox
default: Downloading: https://atlas.hashicorp.com/aalkilani/boxes/spark-kafka-cassandra-applying-lambda-architecture/versions/0.0.6/providers/virtualbox.box
default:
The box failed to unpackage properly. Please verify that the box
file you're trying to add is not corrupted and try again. The
output from attempting to unpackage (if any):

x ./box-disk1.vmdk: Write failed
x ./box.ovf
x ./Vagrantfile
x ./vagrant_private_key: Write failed
bsdtar.EXE: Error exit delayed from previous errors.

Chapter 5 - Advanced Streaming Operations: Evaluating Approximation Performance with Zeppelin: Demo

Hi,

I added the dependency "com.twitter:algebird-core_2.11:0.11.0" to use HyperLogLog. After saving that new dependency and switching to the notebook, I got an error when running the paragraph:

 import com.twitter.algebird.{HyperLogLogMonoid, HLL}
  import org.apache.spark.streaming.State
  
  // serializable objects through adding 'case'
  case object functions {
      def mapVisitorsStateFunc = (k: (String, Long), v: Option[HLL], state: State[HLL]) => {
        val currentVisitorHLL = state.getOption().getOrElse(new HyperLogLogMonoid(12).zero)
        val newVisitorHLL = v match {
          case Some(visitorHLL) => currentVisitorHLL + visitorHLL
          case None => currentVisitorHLL
        }
        state.update(newVisitorHLL)
        val output = newVisitorHLL.approximateSize.estimate
        output
      }
  }

<console>:28: error: object algebird is not a member of package com.twitter
import com.twitter.algebird.{HyperLogLogMonoid, HLL}

Please help me to solve this problem. Thanks!
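Once the algebird-core artifact actually resolves in the interpreter, the HyperLogLog pieces used above can be exercised on their own. Below is a small standalone sketch (plain Scala, no streaming), assuming algebird-core_2.11 is on the classpath; the visitor ids are made up.

// Standalone HyperLogLog sketch with Algebird (no Spark required).
import com.twitter.algebird.{HLL, HyperLogLogMonoid}

val hllMonoid = new HyperLogLogMonoid(12) // 2^12 registers, roughly 1.6% standard error

val visitors = Seq("visitor-1", "visitor-2", "visitor-3", "visitor-1")
val combined: HLL = visitors
  .map(v => hllMonoid.create(v.getBytes("UTF-8")))
  .reduce(_ + _)

println(s"approximate distinct visitors: ${combined.approximateSize.estimate}") // ~3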

VM box fails to start with vagrant up

I followed the steps for a windows 7 OS.. I do have many VM's loaded not using Vagrant for other projects. However then I use vagrant up... it goes through the download process and then tries to spin up the machine... then this.

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'aalkilani/spark-kafka-cassandra-applying-lambda
-architecture'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'aalkilani/spark-kafka-cassandra-applying-lambda-ar
chitecture' is up to date...
==> default: Setting the name of the VM: vagrant_default_1490300145891_1425
==> default: Destroying VM and associated drives...
C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:21:in `initialize': The requested address is not valid in its context. - connect(2) for "0.0.0.0" port 10008 (Errno::EADDRNOTAVAIL)
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:21:in `new'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:21:in `block in is_port_open?'
        from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:88:in `block in timeout'
        from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in `block in catch'
        from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in `catch'
        from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:32:in `catch'
        from C:/HashiCorp/Vagrant/embedded/lib/ruby/2.2.0/timeout.rb:103:in `timeout'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/is_port_open.rb:19:in `is_port_open?'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:248:in `port_check'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:121:in `[]'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:121:in `block in handle'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:257:in `block in with_forwarded_ports'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:253:in `each'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:253:in `with_forwarded_ports'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:98:in `handle'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:42:in `block in call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/environment.rb:567:in `lock'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_forwarded_port_collisions.rb:41:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/prepare_forwarded_port_collision_params.rb:30:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/env_set.rb:19:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/provision.rb:80:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/clear_forwarded_ports.rb:15:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/set_name.rb:50:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/clean_machine_folder.rb:17:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_accessible.rb:18:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `block in run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in `busy'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `block in run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in `busy'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `block in run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in `busy'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/box_check_outdated.rb:78:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/config_validate.rb:25:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_virtualbox.rb:17:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/match_mac_address.rb:19:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/discard_state.rb:15:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/import.rb:74:in `import'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/import.rb:13:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/prepare_clone_snapshot.rb:17:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/prepare_clone.rb:15:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/customize.rb:40:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_accessible.rb:18:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `block in run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in `busy'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/config_validate.rb:25:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/handle_box.rb:56:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:95:in `block in finalize_action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `block in run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in `busy'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builtin/call.rb:53:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/plugins/providers/virtualbox/action/check_virtualbox.rb:17:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/warden.rb:34:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/builder.rb:116:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `block in run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/util/busy.rb:19:in `busy'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/action/runner.rb:66:in `run'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:225:in `action_raw'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:200:in `block in action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/environment.rb:567:in `lock'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:186:in `call'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/machine.rb:186:in `action'
        from C:/HashiCorp/Vagrant/embedded/gems/gems/vagrant-1.9.3/lib/vagrant/batch_action.rb:82:in `block (2 levels) in run'

Has anyone else had this issue? I've spent a few days trying to debug it.

I can import the VM manually and start it up in VirtualBox, but I'm sure the VM settings won't have the configuration required for the course.
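
For what it's worth, the trace above fails inside Vagrant's forwarded-port collision check (util/is_port_open.rb), before VirtualBox is even involved. Vagrant 1.9.3 is known to have had trouble with that check on some Windows hosts, so upgrading to a newer Vagrant release is worth trying first. Another commonly reported workaround is binding the forwarded ports to the loopback address in the Vagrantfile. The sketch below is only an illustration: the guest/host port numbers are placeholders taken from the error message, not the course's actual mappings.

Vagrant.configure("2") do |config|
  # Bind the forward (and Vagrant's port check) to localhost instead of 0.0.0.0;
  # auto_correct lets Vagrant pick a different host port on a genuine collision.
  config.vm.network "forwarded_port", guest: 10008, host: 10008, host_ip: "127.0.0.1", auto_correct: true
end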

Job remains in ACCEPTED state after submitting to Spark on YARN

Hi,

I ran the following command on the VirtualBox VM to submit the batch job to Spark, as explained in the video.

These are the steps I followed from my Mac OS X host:

  1. vagrant ssh
  2. cd /pluralsight/spark (HADOOP_CONF_DIR is set to "/pluralsight/hadoop_conf")
  3. Ran "./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar"
  4. The state of the submitted job remains ACCEPTED.

./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar
16/12/13 14:36:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/13 14:36:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/12/13 14:36:55 INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers
16/12/13 14:36:55 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/12/13 14:36:55 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/12/13 14:36:55 INFO yarn.Client: Setting up container launch context for our AM
16/12/13 14:36:55 INFO yarn.Client: Setting up the launch environment for our AM container
16/12/13 14:36:56 INFO yarn.Client: Preparing resources for our AM container
16/12/13 14:36:56 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/spark/spark-assembly-1.6.1-hadoop2.6.0.jar
16/12/13 14:36:56 INFO yarn.Client: Uploading resource file:/vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar -> hdfs://lambda-pluralsight:9000/user/vagrant/.sparkStaging/application_1481639716980_0001/spark-lambda-1.0-SNAPSHOT-shaded.jar
16/12/13 14:37:00 INFO yarn.Client: Uploading resource file:/tmp/spark-28c1f1e5-d3df-4afc-8311-f966b490c8a0/__spark_conf__75654060557689293.zip -> hdfs://lambda-pluralsight:9000/user/vagrant/.sparkStaging/application_1481639716980_0001/__spark_conf__75654060557689293.zip
16/12/13 14:37:00 INFO spark.SecurityManager: Changing view acls to: vagrant
16/12/13 14:37:00 INFO spark.SecurityManager: Changing modify acls to: vagrant
16/12/13 14:37:00 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vagrant); users with modify permissions: Set(vagrant)
16/12/13 14:37:00 INFO yarn.Client: Submitting application 1 to ResourceManager
16/12/13 14:37:01 INFO impl.YarnClientImpl: Submitted application application_1481639716980_0001
16/12/13 14:37:02 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)
16/12/13 14:37:02 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1481639821142
final status: UNDEFINED
tracking URL: http://lambda-pluralsight:8088/proxy/application_1481639716980_0001/
user: vagrant
16/12/13 14:37:03 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)
16/12/13 14:37:04 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)
16/12/13 14:37:05 INFO yarn.Client: Application report for application_1481639716980_0001 (state: ACCEPTED)

The job stays in the ACCEPTED state like this indefinitely; I waited for about 15 minutes.

Also, when I open http://127.0.0.1:8088/cluster/scheduler?openQueues=default I see the following information:

Queue State: 	RUNNING
Used Capacity: 	0.0%
Absolute Used Capacity: 	0.0%
Absolute Capacity: 	100.0%
Absolute Max Capacity: 	100.0%
Used Resources: 	<memory:0, vCores:0>
Num Schedulable Applications: 	1
Num Non-Schedulable Applications: 	0
Num Containers: 	0
Max Applications: 	10000
Max Applications Per User: 	10000
Max Application Master Resources: 	<memory:0, vCores:0>
Used Application Master Resources: 	<memory:2048, vCores:1>
Max Application Master Resources Per User: 	<memory:0, vCores:0>
Configured Capacity: 	100.0%
Configured Max Capacity: 	100.0%
Configured Minimum User Limit Percent: 	100%
Configured User Limit Factor: 	1.0
Accessible Node Labels: 	*
Preemption: 	disabled

I am not sure what is wrong. Please help me figure out how to proceed.

~Ashish
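
One line in the submit log above likely explains this: "Requesting a new application from cluster with 0 NodeManagers". The scheduler page agrees (Num Containers: 0, Used Resources <memory:0, vCores:0>), so YARN accepts the application but has no NodeManager on which to allocate the ApplicationMaster container, and the job sits in ACCEPTED forever. A quick check from inside the VM (a hedged sketch; the exact daemon layout and script locations in this box may differ):

jps                # should show a NodeManager process alongside the ResourceManager
yarn node -list    # should report at least one node in RUNNING state

If no NodeManager shows up, restarting the VM (vagrant reload) or restarting the YARN daemons inside it should bring one back, after which the same spark-submit command should move past ACCEPTED.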

The command 'vagrant up' failed.

I am following the instructions in the Pluralsight course.
The command 'vagrant up' failed.
My VirtualBox version is 5.2.12.
What could be the issue?

[screenshot attached]
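
Since the error itself is only visible in the screenshot, this is just a guess, but a common cause of 'vagrant up' failing outright is a Vagrant/VirtualBox version mismatch: each Vagrant release only supports the VirtualBox versions that existed when it shipped, and VirtualBox 5.2 support arrived in relatively recent Vagrant releases (around 2.0.1). Comparing the two versions is a quick first step:

vagrant --version
VBoxManage --version

If your Vagrant predates VirtualBox 5.2 support, upgrading Vagrant (or temporarily installing a VirtualBox 5.1.x release) may resolve it. If the versions are compatible, pasting the error text rather than a screenshot would make this easier to diagnose.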

Error running Kafka

Kafka is running in Docker, but when I checked the logs I found some errors.

[2016-11-28 05:03:05,887] INFO Opening socket connection to server lambda-pluralsight/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-11-28 05:03:05,888] WARN Session 0x158a9476ad20001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

What is the problem? Can I ignore it and continue with the course, or should I rebuild the Vagrant environment?
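
Those messages mean the Kafka broker cannot reach ZooKeeper on 127.0.0.1:2181, which usually just means the ZooKeeper process or container was not up (or not yet up) when Kafka started. Kafka keeps retrying, so if ZooKeeper comes up shortly afterwards the broker recovers and the warnings can be ignored; if the errors keep repeating, ZooKeeper really is down and Kafka will not work until it is started. A quick, hedged check from inside the VM (the container name below is a placeholder; use the real one your docker listing reports):

echo ruok | nc 127.0.0.1 2181        # a healthy ZooKeeper replies "imok" (assumes nc is installed in the VM)
docker logs <zookeeper-container>    # placeholder name; shows why the ZooKeeper container stopped, if it did

If ZooKeeper is not answering, starting its container again and then re-checking the Kafka logs is usually enough; there is no need to reinstall the whole Vagrant box for this.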
