
kite-examples's People

Contributors

epishkin, esammer, markgrover, rdblue, tomwheeler, tomwhite


kite-examples's Issues

Question about kite.repo.uri

Hi everyone

I want to adapt the json example provided, but I got this error:

15/05/20 08:09:47 INFO conf.FlumeConfiguration: Processing:UFOKiteDS
15/05/20 08:09:47 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [UFOAgent]
15/05/20 08:09:47 INFO node.AbstractConfigurationProvider: Creating channels
15/05/20 08:09:47 INFO channel.DefaultChannelFactory: Creating instance of channel archivo type file
15/05/20 08:09:47 INFO node.AbstractConfigurationProvider: Created channel archivo
15/05/20 08:09:47 INFO source.DefaultSourceFactory: Creating instance of source UFODir, type spooldir
15/05/20 08:09:47 INFO interceptor.StaticInterceptor: Creating StaticInterceptor: preserveExisting=true,key=flume.avro.schema.url,value=file:/home/itam/schemas/ufos.avsc
15/05/20 08:09:47 INFO api.MorphlineContext: Importing commands
15/05/20 08:09:52 INFO api.MorphlineContext: Done importing commands
15/05/20 08:09:52 INFO sink.DefaultSinkFactory: Creating instance of sink: UFOKiteDS, type: org.apache.flume.sink.kite.DatasetSink
15/05/20 08:09:52 ERROR node.AbstractConfigurationProvider: Sink UFOKiteDS has been removed due to an error during configuration
java.lang.IllegalArgumentException
        at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
        at org.kitesdk.data.URIBuilder.<init>(URIBuilder.java:106)
        at org.kitesdk.data.URIBuilder.<init>(URIBuilder.java:90)
        at org.apache.flume.sink.kite.DatasetSink.configure(DatasetSink.java:188)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/05/20 08:09:52 INFO node.AbstractConfigurationProvider: Channel archivo connected to [UFODir]
15/05/20 08:09:52 INFO node.Application: Starting new configuration:{ sourceRunners:{UFODir=EventDrivenSourceRunner: { source:Spool Directory source UFODir: { spoolDir: /opt/ufos } }} sinkRunners:{} channels:{archivo=FileChannel archivo { dataDirs: [/opt/ufos/log/data] }} }
15/05/20 08:09:52 INFO node.Application: Starting Channel archivo
15/05/20 08:09:52 INFO file.FileChannel: Starting FileChannel archivo { dataDirs: [/opt/ufos/log/data] }...
15/05/20 08:09:52 INFO file.Log: Encryption is not enabled

I ran the flume-agent with:

flume-ng agent -n UFOAgent -Xmx100m --conf ingestion -f ingestion/spooldir_example.conf

The spooldir_example.conf is:


# Components
UFOAgent.sources = UFODir
UFOAgent.channels = archivo
UFOAgent.sinks = UFOKiteDS

# Channel
UFOAgent.channels.archivo.type = file
UFOAgent.channels.archivo.checkpointDir = /opt/ufos/log/checkpoint/
UFOAgent.channels.archivo.dataDirs = /opt/ufos/log/data/

# Source
UFOAgent.sources.UFODir.type = spooldir
UFOAgent.sources.UFODir.channels = archivo
UFOAgent.sources.UFODir.spoolDir = /opt/ufos
UFOAgent.sources.UFODir.fileHeader = true
UFOAgent.sources.UFODir.deletePolicy = immediate

# Interceptor
UFOAgent.sources.UFODir.interceptors = attach-schema morphline

UFOAgent.sources.UFODir.interceptors.attach-schema.type = static
UFOAgent.sources.UFODir.interceptors.attach-schema.key = flume.avro.schema.url
UFOAgent.sources.UFODir.interceptors.attach-schema.value = file:/home/itam/schemas/ufos.avsc

UFOAgent.sources.UFODir.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
UFOAgent.sources.UFODir.interceptors.morphline.morphlineFile = /home/itam/ingestion/morphline.conf
UFOAgent.sources.UFODir.interceptors.morphline.morphlineId = convertUFOFileToAvro


# Sink
UFOAgent.sinks.UFOKiteDS.type = org.apache.flume.sink.kite.DatasetSink
UFOAgent.sinks.UFOKiteDS.channel = archivo
UFOAgent.sinks.UFOKiteDS.kite.repo.uri = dataset:hive
UFOAgent.sinks.UFOKiteDS.kite.dataset.name = ufos
UFOAgent.sinks.UFOKiteDS.kite.batchSize = 10
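
For anyone hitting the same IllegalArgumentException: Flume's DatasetSink distinguishes repository URIs (`repo:` scheme) from dataset URIs (`dataset:` scheme), and `kite.repo.uri` expects the former. The following is a sketch of the two accepted forms, assuming a default Hive metastore; it is not a verified fix for this exact setup:

```properties
# Sketch, not a verified fix: kite.repo.uri takes a repository URI
# (repo: scheme); "dataset:hive" is a dataset-URI scheme and fails
# URIBuilder's precondition check.
UFOAgent.sinks.UFOKiteDS.kite.repo.uri = repo:hive
UFOAgent.sinks.UFOKiteDS.kite.dataset.name = ufos

# Newer Flume releases accept a single dataset URI instead:
# UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos
```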

I created the dataset as follows:

kite-dataset create ufos --schema /home/itam/schemas/ufos.avsc --format avro

Finally, the morphline.conf is:

morphlines: [
  {
    id: convertUFOFileToAvro
    importCommands: ["com.cloudera.**", "org.kitesdk.**"]
    commands: [
      {
        tryRules {
          catchExceptions : false
          throwExceptionIfAllRulesFailed : true
          rules : [
            # first rule of tryRules cmd: parse the TSV line and convert it to Avro
            {
              commands : [
                {
                  readCSV: {
                    separator : "\t"
                    columns : [Timestamp, City, State, Shape, Duration, Summary, Posted]
                    trim: true
                    charset : UTF-8
                    quoteChar : "\""
                  }
                }
                {
                  toAvro {
                    schemaFile: /home/itam/schemas/ufos.avsc
                  }
                }
                {
                  writeAvroToByteArray: {
                    format: containerlessBinary
                  }
                }
              ]
            }
            # next rule of tryRules cmd: drop records the first rule could not parse
            {
              commands : [
                { dropRecord {} }
              ]
            }
          ]
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

What am I doing wrong?

FileNotFoundException when morphline files are in src/main/resources

Hi everybody, I'm trying to write some unit tests for my morphline files, and I'm facing a problem related to their location.

The files are under the "src/main/resources" folder, so after compilation they end up in "target/classes", but the AbstractMorphlineTest class looks for them in "target/test-classes". I understand that the obvious solution is to move my files to "src/test/resources", but I wonder if there is any way to override this setting.

Thanks in advance

P.S. I also looked for AbstractMorphlineTest in order to make some changes and submit a PR, but I'm not able to find it in any repo.
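
One way to make the main resources visible on the test classpath, without moving the files, is Maven's standard `testResources` mechanism. A sketch, assuming the default project layout; note that declaring `testResources` replaces the default list, so `src/test/resources` must be re-listed:

```xml
<!-- Sketch: expose src/main/resources to tests as well. Keep the
     src/test/resources entry, since an explicit <testResources>
     block overrides Maven's default. -->
<build>
  <testResources>
    <testResource>
      <directory>src/test/resources</directory>
    </testResource>
    <testResource>
      <directory>src/main/resources</directory>
    </testResource>
  </testResources>
</build>
```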

Cannot build datasets with Maven

Hi,
I'm trying to implement the demos, but when I create the datasets with the mvn command, I get an error:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] demo ............................................... SUCCESS [  5.167 s]
[INFO] demo-core .......................................... SKIPPED
[INFO] demo-crunch ........................................ SKIPPED
[INFO] demo-logging-webapp ................................ SKIPPED
[INFO] demo-reports-webapp ................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.426 s
[INFO] Finished at: 2016-06-02T11:06:06+02:00
[INFO] Final Memory: 35M/1370M
[INFO] ------------------------------------------------------------------------
Exception in thread "Thread-2" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ShutdownHookManager$2
        at org.apache.hadoop.util.ShutdownHookManager.getShutdownHooksInOrder(ShutdownHookManager.java:124)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:52)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ShutdownHookManager$2
        at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:50)
        at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:271)
        at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:247)
        at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:239)
        ... 2 more

The dataset is never created.

We don't use the Cloudera VM; we use Cloudera Enterprise in cluster mode.

Thanks in advance.
Martin
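
A note for readers: this NoClassDefFoundError is thrown from Hadoop's JVM shutdown hook after Maven has already closed the plugin's classloader, so it often masks the real reason the dataset was not created rather than being the root cause. One speculative workaround (an assumption, not a documented fix) is to put the Hadoop client classes on the Kite plugin's own classpath so the hook's inner classes can resolve:

```xml
<!-- Hypothetical sketch: add hadoop-client as a plugin dependency so the
     shutdown hook's inner classes are loadable; ${hadoop.version} is a
     placeholder for the cluster's Hadoop version. -->
<plugin>
  <groupId>org.kitesdk</groupId>
  <artifactId>kite-maven-plugin</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</plugin>
```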
