tmalaska
Name: Ted Malaska
Type: User
Company: Cloudera
An analysis of adverse drug event data using Hadoop, R, and Gephi
Examples for training
This tool looks through your HDFS folders to either identify or delete files with no data in them.
Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ...
This runs a map-only job to check that each row has the correct number of columns.
The ability to rebalance on clusters that have HBase by selecting folders to rebalance
Connecting the power of the D3 graphing library to CDH (HDFS, HBase and Impala)
FooBar
An upgraded, extended FairScheduler that takes sub-groups into account.
A simple program to put files from a directory into HDFS, with added options for defining how that action happens
This is a FixedLengthInputFormat for Hadoop MapReduce.
This is a working example of how to use Flume 1.1.0 to load files into Hadoop.
This is a layer on top of the Flume NettyAvroRpcClient that allows for multiple connections to a server.
Blah Blah
Just for fun; do not use in the real world. :)
A teaching example of KMeans implemented with Giraph running on CDH 4.3
A simple example of using Giraph to root nodes in a tree
Advanced common functionality for Hadoop
This will contain implementations that copy records from a table with fewer regions than the final table.
Reads an HBase table and writes it out as Text, Seq, Avro, or Parquet
This is a simple example showing how a single HBase "get" can retrieve the top N {items,amount} in order of decreasing amount
HBase.MCC (HBase Multi Cluster Client). The goal is to support always-up solutions with HBase through multiple clusters
This is a tool for testing and managing many repeated, large bulk loads on HBase
Generation tool that generates DDLs and simple data load scripts.
This is a demo/training application that shows how easy it is to do operations like ingestion, aggregation, and change data capture using tools like Kafka, Spark Streaming, Flume, Kudu, Solr, HBase, and HDFS.
Fast scalable time series database
Kite SDK
This is a single MapReduce job that will append a unique sequence number to the front of every row in a source file.
A simple MR job where you can declare the number of mappers and reducers and how long they sleep.