conductor / kangaroo Goto Github PK
View Code? Open in Web Editor NEWHadoop utilities for Kafka, S3, and more
License: Apache License 2.0
Hadoop utilities for Kafka, S3, and more
License: Apache License 2.0
Hi, I tried to implement the kangaroo in kafka 0.8.2.2, it seems not working.
When I tried to debug it, it cannot find the partitions. But I am sure, I have set it up correctly.
Configuration conf = getConf();
//initializeConfig(conf);
// Create a new job
final Job job = Job.getInstance(conf, "TestMr");
job.setJarByClass(App.class);
// Set the InputFormat
job.setInputFormatClass(KafkaInputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(BytesWritable.class);
job.setMapperClass(KafkaMapper.class);
job.setReducerClass(DnaReducer.class);
FileSystem hdfs = FileSystem.get(conf);
job.setNumReduceTasks(1000);
if (hdfs.exists(new Path("out")))
hdfs.delete(new Path("out"), true);
// Set your Zookeeper connection string
KafkaInputFormat.setZkConnect(job, "place_holder_for_kafka_server:2181");
// Set the topic you want to consume
KafkaInputFormat.setZkRoot(job, "/");
KafkaInputFormat.setTopic(job, "topic");
//KafkaInputFormat.setMaxSplitsPerPartition(job, 5);
// Set the consumer group associated with this job
KafkaInputFormat.setConsumerGroup(job, "mygroup");
FileOutputFormat.setOutputPath(job, new Path("out"));
// Set the mapper that will consume the data
// (Optional) Only commit offsets if the job is successful
if (job.waitForCompletion(true)) {
final ZkUtils zk = new ZkUtils(job.getConfiguration());
zk.commit("mygroup", "topic");
zk.close();
}
}
catch (Exception ex)
{
ex.printStackTrace();
}
when I implement the MR, there is no error, and it seems cannot find any partition from kafka.
Kafka 0.8 has new Zookeeper Data Structures and as a result ZKUtils is not working with 0.8+. See: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
Using zookeeper for offset management is deprecated. I'm wondering if there are any thoughts on this or maybe even work already done. If not, I'm willing to give this a go.
Hello,
I would like to use kangaroo in one of my projects.
I was wondering if kangaroo is currently published on a maven repository, is it? If not, is it planned to be?
Thanks!
I have found that Kangaroo is missing offsets and starts reading partition from the beginning when Leader for a partition gets changed.
That happens, because offsets are being stored under /kangaroo-consumers/<consumer_group>/<offsets or offsets-temp>/<topic>/<brokerId>-<partitionId>
.
Since Leader re-election is quite usual situation for Kafka, I propose to remove brokerId
in getBrokerPartition()
from com.conductor.kafka.Partition
:
public String getBrokerPartition() {
return String.format("%d-%d", broker.getId(), partId);
}
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.