Comments (5)

skanda83 commented on August 17, 2024

Jeff,

I was able to make secor push files to HDFS just by making the following changes:

  1. Add hadoop-client dependency to pom.xml
  2. Use secor.s3.filesystem=hdfs
  3. Use secor.s3.bucket=namenode-host:8020/dir_path
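To show why step 3 packs the NameNode host and port into the "bucket" value, here is a minimal sketch. It assumes the uploader prepends the configured filesystem scheme and `://` to `secor.s3.bucket`, the same way the hard-coded `s3n://` line in Uploader.java does. `HdfsBucketUri` and `toUri` are hypothetical names used only for illustration.

```java
import java.net.URI;

/** Hypothetical helper: reads the secor.s3.bucket value from step 3 as an HDFS URI. */
final class HdfsBucketUri {

    // Assumption: the destination is built as <filesystem>://<bucket>, mirroring
    // the "s3n://" + bucket concatenation in Uploader.java.
    static URI toUri(String fileSystem, String bucket) {
        return URI.create(fileSystem + "://" + bucket);
    }

    public static void main(String[] args) {
        URI uri = toUri("hdfs", "namenode-host:8020/dir_path");
        System.out.println("scheme=" + uri.getScheme()); // hdfs
        System.out.println("host=" + uri.getHost());     // namenode-host (the NameNode)
        System.out.println("port=" + uri.getPort());     // 8020 (NameNode RPC port)
        System.out.println("path=" + uri.getPath());     // /dir_path (target directory)
    }
}
```

In other words, the "bucket" becomes the URI authority plus the base directory, which is why the `hdfs` scheme from step 2 is enough to redirect uploads, provided `hadoop-client` (step 1) is on the classpath.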

HenryCaiHaiying commented on August 17, 2024

This is a very legitimate use case. I think adding support for HDFS ingestion is not hard at all.

In Uploader.java, currently this is hard-coded as:
String s3Prefix = "s3n://" + mConfig.getS3Bucket() + "/" + mConfig.getS3Path();

If you make the destination filesystem a configurable parameter, that would probably just work, because underneath we already use Hadoop's FileSystem.moveFromLocalFile method to do the S3 file move anyway.
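A minimal sketch of that change, assuming the property names from skanda83's comment (`secor.s3.filesystem`, `secor.s3.bucket`) plus `secor.s3.path` as the source of `getS3Path()`. It uses a plain `java.util.Properties` in place of Secor's own config class, and `UploadPrefix`/`buildPrefix` are hypothetical names, not Secor's actual API. Defaulting the scheme to `s3n` keeps existing S3 setups behaving exactly as the hard-coded line does.

```java
import java.util.Properties;

/** Hypothetical sketch: upload prefix with a configurable filesystem scheme. */
final class UploadPrefix {

    // The "s3n" default mirrors the original hard-coded scheme, so configs
    // that never set secor.s3.filesystem keep uploading to S3.
    static String buildPrefix(Properties config) {
        String fileSystem = config.getProperty("secor.s3.filesystem", "s3n");
        String bucket = config.getProperty("secor.s3.bucket");
        String path = config.getProperty("secor.s3.path", "");
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("secor.s3.bucket must be set");
        }
        return fileSystem + "://" + bucket + "/" + path;
    }

    public static void main(String[] args) {
        // Placeholder values: "my-bucket" and "raw_logs" are illustrative only.
        Properties s3 = new Properties();
        s3.setProperty("secor.s3.bucket", "my-bucket");
        s3.setProperty("secor.s3.path", "raw_logs");
        System.out.println(buildPrefix(s3));   // s3n://my-bucket/raw_logs

        Properties hdfs = new Properties();
        hdfs.setProperty("secor.s3.filesystem", "hdfs");
        hdfs.setProperty("secor.s3.bucket", "namenode-host:8020/dir_path");
        hdfs.setProperty("secor.s3.path", "raw_logs");
        System.out.println(buildPrefix(hdfs)); // hdfs://namenode-host:8020/dir_path/raw_logs
    }
}
```

Since Hadoop's FileSystem resolves the implementation from the URI scheme, the move call itself should not need to change; only the prefix does.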

banks commented on August 17, 2024

@SEJeff out of interest, any reason you can't just use one of the other tools like LinkedIn's Camus (or the newer Gobblin: https://github.com/linkedin/gobblin)?

I'm not saying it's an invalid request for Secor to support that as an option; there is certainly more to like about Secor than just its lack of a Hadoop dependency. But it seems like there are already viable alternatives if you have an HDFS target.

See "Hadoop Integration" here: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

SEJeff commented on August 17, 2024

Yeah, Camus and Gobblin have a lot more requirements and aren't as straightforward as Secor. Also, Camus expects you to write your decoding logic and whatnot in Java. It's entirely doable, but it isn't preferable for my use case. I'm honestly unimpressed with the Confluent tools for this.

Of the various Kafka mirroring tools, Secor is, to me, one of the better ones from a design standpoint. I would like to be able to use it in an environment that doesn't use any cloud services whatsoever. We do have a large-ish HDFS install, and I was hoping we could use that.

If you fundamentally disagree, feel free to close this issue. However, it would be really nice to use Secor rather than other solutions.

HenryCaiHaiying commented on August 17, 2024

Jeff,

If you want to explore, you can definitely add an HDFSUploader for Secor. We use Hadoop's FileSystem object to upload the sequence files, so it should be quite straightforward to upload the local sequence files to HDFS as well.

(In reply to Jeff Schroeder's comment of Mon, Nov 23, 2015 at 10:24 AM.)
