
Comments (5)

zackdever commented on June 30, 2024

We handle this by setting secor.message.parser.class in the configs to a custom parser we wrote. As I understand it, the preference is for really generic parsers to be included in Secor, but it might be nice if there were some place outside of Secor to collect everyone's more specific parsers.

from secor.

HenryCaiHaiying commented on June 30, 2024

Secor's design philosophy is to be a simple data ingestion pipeline that
gets Kafka data into S3 as fast as possible and acts as the source of
truth for Kafka data on S3. The more data transformation you add to Secor,
the more delay you add and the more points of failure you may introduce.
Once the data is on S3, you have a variety of options for writing data
transformation logic to post-process it.

On Fri, Aug 14, 2015 at 12:25 PM, Zack Dever wrote:

We handle this by setting secor.message.parser.class in the configs to a
custom parser we wrote. As I understand it, the preference is for really
generic parsers to be included in Secor, but it might be nice if there were
some place outside of Secor to collect everyone's more specific parsers.




pgarbacki commented on June 30, 2024

+1 to what @HenryCaiHaiying wrote. There are better tools for stream processing, such as Storm or Samza.

@zackdever back in the day I created the secor-contrib repo. I think it is a good place for less generic parsers. https://github.com/pinterest/secor-contrib


ashubhumca commented on June 30, 2024

Thanks to all of you for your replies. I agree with all the points you have mentioned.
The code I've added for my custom requirement shouldn't cause any issue for the overall design of Secor. I understand that Secor is a general tool for data ingestion, but sometimes there are very basic transformation requirements, such as projection or formatting, that are not CPU intensive. By default, no transformation is applied unless the user specifies one. Users should be aware of the performance impact of whatever transformation logic they choose to apply. This is just an extension to Secor's features.

Here is the basic idea:

One transformation interface:

    package com.pinterest.secor.transformer;

    public interface MessageTransformer {

        public byte[] transform(byte[] message);

    }

Then default transformation class:

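    package com.pinterest.secor.transformer;

    import com.pinterest.secor.common.SecorConfig;

    // Pass-through implementation: returns every message unchanged.
    public class DefaultMessageTransformer implements MessageTransformer {

        protected SecorConfig mConfig;

        public DefaultMessageTransformer(SecorConfig config) {
            mConfig = config;
        }

        @Override
        public byte[] transform(byte[] message) {
            return message;
        }

    }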

This class does nothing to the message and will be set in all the config properties by default. For any custom transformation, users will have to plug in their own transformation class.
Please share your thoughts.
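
To make the plug-in idea concrete, here is a minimal, self-contained sketch of what a user-supplied transformer and its loading might look like. It is an illustration under assumptions, not the code that went into Secor: `TrimmingMessageTransformer` and `TransformerLoader` are hypothetical names, the interface is redeclared locally without its package, and the `SecorConfig` constructor argument is left out so the sketch compiles without Secor on the classpath.

```java
import java.nio.charset.StandardCharsets;

// Local copy of the interface proposed above (package omitted for the sketch).
interface MessageTransformer {
    byte[] transform(byte[] message);
}

// Hypothetical custom transformer: treats the message as UTF-8 text and
// strips leading and trailing whitespace, a cheap "formatting" step.
class TrimmingMessageTransformer implements MessageTransformer {
    @Override
    public byte[] transform(byte[] message) {
        String text = new String(message, StandardCharsets.UTF_8);
        return text.trim().getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical loader: instantiates a transformer from a class name, the way
// a value read from a config property could be turned into an object.
final class TransformerLoader {
    private TransformerLoader() {
    }

    static MessageTransformer load(String className) throws ReflectiveOperationException {
        Class<? extends MessageTransformer> clazz =
                Class.forName(className).asSubclass(MessageTransformer.class);
        // Assumes a no-argument constructor; the proposal above passes SecorConfig instead.
        return clazz.getDeclaredConstructor().newInstance();
    }
}
```

A caller would pass the configured class name to `TransformerLoader.load(...)` once at startup and then call `transform` on each message before it is written out. Keeping the signature at `byte[]` in and `byte[]` out means the pass-through default adds no copying or decoding cost.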

Thanks and Regards,
Ashish


ashubhumca commented on June 30, 2024

Support has been added.

