snowplow-archive / snowplow.github.com
Legacy Snowplow website, switched off 25 April 2017
Home Page: http://snowplowanalytics.com
Spot trend -> drill down -> identify pattern -> update LookML -> validate
Think of an example we can do with Snowplow data
Alex is going to be speaking at Budapest DW Forum 2014
On Data warehouse day (Thursday 5th), Alex will be speaking about:
With the release of Amazon Kinesis in late 2013, the Snowplow team set themselves the challenge of porting Snowplow's Hadoop-based architecture to Kinesis. Alex Dean from Snowplow will share their experiences porting Snowplow to Kinesis, including: "hero" use cases for event streaming (with a live demo); building a lambda architecture with Kinesis and EMR; moving from a batch to streaming mindset.
Since its inception, the Snowplow open source event analytics platform (https://github.com/snowplow/snowplow) has always been tightly coupled to the batch-based Hadoop ecosystem, and Elastic MapReduce in particular. With the release of Amazon Kinesis in late 2013, we set ourselves the challenge of porting Snowplow to Kinesis, to give our users access to their Snowplow event stream in near-real-time.
With this porting process now complete, Alex Dean, Snowplow Analytics co-founder and technical lead, will share Snowplow's experiences in adopting stream processing as a complementary architecture to Hadoop and batch-based processing. In particular, Alex will explore:
On June 6th, Friday (Training day), Alex will be giving a half-day workshop:
Abstract
Hadoop is everywhere these days, but it can seem like a complex, intimidating ecosystem to those who have yet to jump in. In this hands-on workshop, Alex Dean, co-founder of Snowplow Analytics, will take you "from zero to Hadoop", showing you how to run a variety of simple (but powerful) Hadoop jobs on Elastic MapReduce, Amazon's hosted Hadoop service. Alex will start with a no-nonsense overview of what Hadoop is, explaining its strengths and weaknesses and why it's such a powerful platform for data warehouse practitioners. Then Alex will help get you set up with EMR and Amazon S3, before leading you through a very simple job in Pig, a simple language for writing Hadoop jobs. After this we will move on to writing a more advanced job in Scalding, Twitter's Scala API for writing Hadoop jobs. For our final job, we will consolidate everything we have learnt by building a multi-step job flexing Pig, Scalding and Apache HBase, the Hadoop database.
In detail
Agenda
Introducing Hadoop
Our simple job:
Setting up EMR, S3 and our local client tools
Writing our Pig Latin script
Running and inspecting our results
Scalding:
Introduction to Scalding and Cascading
Writing our Scalding app
Running and inspecting our results
Putting it all together:
Introduction to HBase
Writing our second Pig Latin script
Updating our Scalding app
Running and inspecting our results
Conclusions & next steps
Maruku is no longer being actively supported.
See https://help.github.com/articles/migrating-your-pages-site-from-maruku for details.
YAML colour scheme is particularly painful
Need to nail down our messaging really tightly, and put up customer testimonials
Have to go to the wiki to find e.g. the data flowchart
I'm getting this:
vagrant@precise64:/vagrant/snowplow.github.com$ jekyll serve
Configuration file: /vagrant/snowplow.github.com/_config.yml
Source: /vagrant/snowplow.github.com
Destination: /vagrant/snowplow.github.com/_site
Generating... Liquid Exception: incompatible character encodings: ISO-8859-1 and UTF-8 in atom.xml
error: incompatible character encodings: ISO-8859-1 and UTF-8. Use --trace to view backtrace
vagrant@precise64:/vagrant/snowplow.github.com$ jekyll serve --trace
Configuration file: /vagrant/snowplow.github.com/_config.yml
Source: /vagrant/snowplow.github.com
Destination: /vagrant/snowplow.github.com/_site
Generating... Liquid Exception: incompatible character encodings: ISO-8859-1 and UTF-8 in atom.xml
/home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/tags/for.rb:117:in `block (2 levels) in render': incompatible character encodings: ISO-8859-1 and UTF-8 (Encoding::CompatibilityError)
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/tags/for.rb:105:in `each'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/tags/for.rb:105:in `each_with_index'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/tags/for.rb:105:in `block in render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/context.rb:105:in `stack'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/tags/for.rb:104:in `render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/block.rb:106:in `block in render_all'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/block.rb:93:in `each'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/block.rb:93:in `render_all'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/block.rb:82:in `render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/template.rb:124:in `render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/liquid-2.5.5/lib/liquid/template.rb:132:in `render!'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/convertible.rb:88:in `render_liquid'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/convertible.rb:150:in `do_layout'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/page.rb:115:in `render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/site.rb:239:in `block in render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/site.rb:238:in `each'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/site.rb:238:in `render'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/site.rb:39:in `process'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/command.rb:18:in `process_site'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/commands/build.rb:23:in `build'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/lib/jekyll/commands/build.rb:7:in `process'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/jekyll-1.4.3/bin/jekyll:97:in `block (2 levels) in <top (required)>'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/command.rb:180:in `call'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/command.rb:180:in `call'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/command.rb:155:in `run'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/runner.rb:422:in `run_active_command'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/runner.rb:82:in `run!'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/delegates.rb:12:in `run!'
from /home/vagrant/.rvm/gems/ruby-1.9.3-p484@global/gems/commander-4.1.6/lib/commander/import.rb:10:in `block in <top (required)>'
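The trace above does not say which input triggered the encoding clash. One possible contributor, not confirmed by this report, is a source file whose bytes are not valid UTF-8. A minimal sketch for checking that, assuming the site sources use the suffixes listed below (the function name and the suffix list are illustrative, not part of Jekyll):

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_non_utf8_files(
    root: str,
    suffixes: Iterable[str] = (".md", ".markdown", ".html", ".xml", ".yml"),
) -> List[Tuple[str, int]]:
    """Return (path, byte offset) for each matching file under root that is not valid UTF-8.

    The suffix list is an assumption about what the site contains.
    """
    wanted = tuple(suffixes)
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        try:
            # Decoding the raw bytes strictly raises on the first invalid sequence.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            offenders.append((str(path), err.start))
    return offenders


if __name__ == "__main__":
    for name, offset in find_non_utf8_files("."):
        print(f"{name}: invalid UTF-8 at byte {offset}")
```

If this reports nothing, the files themselves are probably fine and the cause lies elsewhere, for example in the environment's locale settings.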
Use it as an opportunity to talk about what we're trying to achieve in general, reiterate what is broken about analytics today and link to relevant blog posts etc.
Describe how to create goals at a session vs visitor level via derived tables
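The difference between the two levels can be sketched in plain Python as a stand-in for the derived-table logic. The event tuples, the goal event name and the `goal_rates` helper are all hypothetical; a real implementation would live in SQL behind the derived tables.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event row: (visitor_id, session_index, event_name).
Event = Tuple[str, int, str]


def goal_rates(events: Iterable[Event], goal_event: str) -> Dict[str, float]:
    """Compute the share of sessions and the share of visitors that hit the goal."""
    sessions: Set[Tuple[str, int]] = set()
    converted_sessions: Set[Tuple[str, int]] = set()
    for visitor, session, name in events:
        key = (visitor, session)
        sessions.add(key)
        if name == goal_event:
            converted_sessions.add(key)
    # A visitor counts as converted if any one of their sessions converted.
    visitors = {visitor for visitor, _ in sessions}
    converted_visitors = {visitor for visitor, _ in converted_sessions}
    return {
        "session_rate": len(converted_sessions) / len(sessions) if sessions else 0.0,
        "visitor_rate": len(converted_visitors) / len(visitors) if visitors else 0.0,
    }


if __name__ == "__main__":
    sample = [
        ("a", 1, "page_view"),
        ("a", 2, "signup"),
        ("b", 1, "page_view"),
    ]
    print(goal_rates(sample, "signup"))
```

In the sample, one of three sessions converts but one of two visitors does, which is why the same goal gives different rates at the two levels.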
Eric's testimonial (we can clean up the English):
"Snowplow was a game changer here at Viadeo. In few months we drop Google Analytics and Omniture to fully rely on Snowplow in production and handle about 10 Millions analytics events per day. We like the versatility of the infrastructure - we use our internal Hadoop Spark infrastructure to create deep BI reports merging analytics data with our back-end business logs. In the same time we love the simplicity and the open-innovation approach offers by Redshift storage: today at Viadeo every single engineer or product manager is able to setup a rich metrics dashboard in few minutes."
Spec for the engineer role, listing the qualities that we look for. Explain that we are building the team in London at the moment, but will look at working with remote staff or opening an office in the US in due course.
Having a web analytics platform with summarized data, like Google Analytics, can provide immense value to many organizations, but the future of web analytics is not summarized data, it's large amounts of unstructured and highly granular enriched data that can provide even more value in the hands of the right team.
Snowplow provides me complete ownership of my clickstream data with total flexibility at an affordable cost. I also see it as an insurance policy for the future.
Snowplow democratized clickstream data
Technology partners
Implementation partners
Supporting organisations
Need to port existing model to Scala and then serve via a Play app
Then email Karl Pawlewicz about it, so that he can promote it at his end.