rudderlabs / rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

Home Page: https://www.rudderstack.com/

License: Other

Go 85.12% Lua 0.03% Dockerfile 0.03% Shell 0.07% Makefile 0.09% PLpgSQL 0.14% HTML 14.52%
golang hybrid-cloud privacy security warehouse-management data-warehouse rudderstack customer-data customer-data-pipeline customer-data-platform

rudder-server's Introduction

📖 Just launched Data Learning Center - Resources on data engineering and data infrastructure

The Customer Data Platform for Developers

Website | Documentation | Changelog | Blog | Slack | Twitter


As the leading open source Customer Data Platform (CDP), RudderStack provides data pipelines that make it easy to collect data from every application, website and SaaS platform, then activate it in your warehouse and business tools.

With RudderStack, you can build customer data pipelines that connect your whole customer data stack and then make them smarter by triggering enrichment and activation in customer tools based on analysis in your data warehouse. Its easy-to-use SDKs and event source integrations, Cloud Extract integrations, transformations, and expansive library of destination and warehouse integrations make building customer data pipelines for both event streaming and cloud-to-warehouse ELT simple.

Try RudderStack Cloud Free - a free tier of RudderStack Cloud. Click here to start building a smarter customer data pipeline today, with RudderStack Cloud.

Key features

  • Warehouse-first: RudderStack treats your data warehouse as a first class citizen among destinations, with advanced features and configurable, near real-time sync.

  • Developer-focused: RudderStack is built API-first. It integrates seamlessly with the tools that the developers already use and love.

  • High Availability: RudderStack comes with at least 99.99% uptime. We have built a sophisticated error handling and retry system that ensures that your data will be delivered even in the event of network partitions or destination downtime.

  • Privacy and Security: You can collect and store your customer data without sending everything to a third-party vendor. With RudderStack, you get fine-grained control over what data to forward to which analytical tool.

  • Unlimited Events: Event volume-based pricing of most of the commercial systems is broken. With RudderStack Open Source, you can collect as much data as possible without worrying about overrunning your event budgets.

  • Segment API-compatible: RudderStack is fully compatible with the Segment API. So you don't need to change your app if you are using Segment; just integrate the RudderStack SDKs into your app and your events will keep flowing to the destinations (including data warehouses) as before.

  • Production-ready: Companies like Mattermost, IFTTT, Torpedo, Grofers, 1mg, Nana, OnceHub, and dozens of large companies use RudderStack for collecting their events.

  • Seamless Integration: RudderStack currently supports integration with over 90 popular tool and warehouse destinations.

  • User-specified Transformation: RudderStack offers a powerful JavaScript-based event transformation framework which lets you enhance or transform your event data by combining it with your other internal data. Furthermore, as RudderStack runs inside your cloud or on-premise environment, you can easily access your production data to join with the event data.

Get started

The easiest way to experience RudderStack is to sign up for RudderStack Cloud Free - a completely free tier of RudderStack Cloud.

You can also set up RudderStack on your platform of choice with these two easy steps:

Step 1: Set up RudderStack

Note: If you are planning to use RudderStack in production, we STRONGLY recommend using our Kubernetes Helm charts. We update our Docker images with bug fixes much more frequently than our GitHub repo.

Step 2: Verify the installation

Once you have installed RudderStack, send test events to verify the setup.

Architecture

RudderStack is an independent, stand-alone system with a dependency only on the database (PostgreSQL). Its backend is written in Go with a rich UI written in React.js.

A high-level view of RudderStack's architecture is shown below:

Architecture

For more details on the various architectural components, refer to our documentation.

Contribute

We would love to see you contribute to RudderStack. Get more information on how to contribute here.

License

RudderStack server is released under the Elastic License 2.0.

rudder-server's People

Contributors

achettyiitr, ameypv-rudder, arajguha, atzoum, bipin-rudder, bonapartepc, chandumlg, cisse21, dependabot[bot], devops-github-rudderstack, dhawal1248, fracasula, fxenik, gane5hvarma, jayachand, kuldeep0020, lokey, lvrach, psrikanth88, ruchiramoitra, saivarunr, sanpj2292, saurav-malani, shrouti1507, sidddddarth, sivashanmukh, snarkychef, soumyadebm, sumanthpuram, utsabc

rudder-server's Issues

Screen event name missing in destinations

Hi, when I send a screen event via the HTTP API, the screen name is missing in both Amplitude and Snowflake destinations.

In Amplitude, the following fields are incorrect:

key                    rudder      segment
display_name           screenview  Viewed <name> Screen
event_type             screenview  Viewed <name> Screen
event_properties.name  (empty)     <name>

In Snowflake, the Name column in the Screens table is null.

Client IP in destination is incorrect

Hi, it looks like the client IP address in my destinations (Amplitude and Snowflake) matches the IP of the rudder-server node rather than the actual client.
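One common cause of this symptom is that the server sits behind a load balancer or ingress and reads only the TCP peer address. Whether that is what happens in rudder-server is not established here. The following is a minimal sketch of how a gateway could derive the client IP, assuming a trusted proxy sets the X-Forwarded-For header; the names clientIP and newReq are hypothetical and not taken from the codebase.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// clientIP returns the left-most X-Forwarded-For entry when the header is
// present, and otherwise the host part of the TCP peer address.
// Trusting the header is only reasonable behind a proxy you control.
func clientIP(r *http.Request) string {
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		first := strings.TrimSpace(strings.Split(xff, ",")[0])
		if first != "" {
			return first
		}
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		// RemoteAddr had no port; return it unchanged.
		return r.RemoteAddr
	}
	return host
}

// newReq builds a bare request for illustration (hypothetical helper).
func newReq(remoteAddr, xff string) *http.Request {
	r := &http.Request{Header: http.Header{}, RemoteAddr: remoteAddr}
	if xff != "" {
		r.Header.Set("X-Forwarded-For", xff)
	}
	return r
}

func main() {
	// Behind a proxy: the header carries the original client address.
	fmt.Println(clientIP(newReq("10.0.0.7:41234", "203.0.113.9, 10.0.0.1")))
	// Direct connection: fall back to the peer address.
	fmt.Println(clientIP(newReq("10.0.0.7:41234", "")))
}
```

If the header is absent, the peer address is all the server can see, which in a Kubernetes deployment is often a node or proxy address.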

Tested with the latest version, deployed via rudderstack-helm. As an aside, is there a way to check the version or git hash of a running rudder-server instance?

Rudder as a datasource for Tableau?

Hi Rudder team,

I've got a Tableau install on-prem. I'm looking for an event processing service that will bucket events sent from my app and then present the bucketed/transformed data to Tableau as a data source. Is Rudder a good fit for this?

Test Segment's Python SDK with Rudder

Bring Up Rudder

  1. Go to the dashboard https://app.rudderlabs.com and set up your account. Copy your <workspace_token> from the top of the home page.
  2. git clone --single-branch --branch segment-api https://github.com/rudderlabs/rudder-server.git
  3. git submodule init
  4. git submodule update
  5. cd rudder-transformer
  6. git checkout --track origin/segment-api
  7. replace <workspace_token> in build/docker.env with the above token.
  8. docker-compose up

Setup Source/Dest

  1. Log in to app.rudderlabs.com
  2. Create one source (Android or iOS) and configure a Google Analytics destination with the tracking ID

Check Segment's Python SDK

  1. git clone https://github.com/segmentio/analytics-python.git
  2. In analytics/__init__.py replace "host=None" with "host='http://localhost:8080'"
  3. python simulator.py --type track --event track --writeKey <write_key> --userId abcd --anonymousId abcd
    (replace write_key with correct key from dashboard)

Test Segment's Go SDK with Rudder

Bring Up Rudder

  1. Go to the dashboard https://app.rudderlabs.com and set up your account. Copy your <workspace_token> from the top of the home page.
  2. git clone --single-branch --branch segment-api https://github.com/rudderlabs/rudder-server.git
  3. git submodule init
  4. git submodule update
  5. cd rudder-transformer
  6. git checkout --track origin/segment-api
  7. replace <workspace_token> in build/docker.env with the above token.
  8. docker-compose up

Setup Source/Dest

  1. Log in to app.rudderlabs.com
  2. Create one source (Android or iOS) and configure a Google Analytics destination with the tracking ID

Check Segment's Go SDK

Need to fix this
https://github.com/segmentio/analytics-go

How to take BackUp on S3

Hi,
I just read that event data is deleted from PostgreSQL after the event is sent to the destination.
I want to back up the events on my own S3 bucket.
Let me know how to customise that.

"High" CPU usage at idle with official Docker setup

Hello, I tried setting up rudderstack with docker following the official documentation (https://docs.rudderstack.com/get-started/installing-and-setting-up-rudderstack/docker)
The only change I made to the docker-compose file was changing WORKSPACE_TOKEN with the correct value.
The server starts and the events get dispatched correctly, however the CPU usage at idle, with no events whatsoever is about 20-25% on my machine (i7-8550U 3.9GHz). I got similar results on a different machine too. I am running Linux and Docker 19.03.12.
All containers are on the "latest" tag, like in the provided docker-compose file (rudderlabs/rudder-server's hash is b0cf66d1817c and rudderlabs/rudder-transformer's hash is 4bb81602b25f)
Is this normal? If so, is there a way to reduce the CPU usage by tweaking the config file?

Thanks in advance!

Bugsnag Configuration

I think the Bugsnag API Key should be removed from the code and be externalised.
Should everyone trying the project get their own key?

TimescaleDB support?

Hi, I was wondering if there were plans to support TimescaleDB. TimescaleDB is implemented as a PostgreSQL extension, so it might be a matter of tweaking your PostgreSQL connector to be TimescaleDB-aware. It looks like this is the approach Grafana may have taken.

[Image: UI when adding a PostgreSQL source in Grafana]

Thanks!

Create CrashReporting abstraction

We are initializing Bugsnag from our main package. This should go through an abstraction so that custom plugins for Sentry, New Relic, etc. can be written.
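A minimal sketch of what such an abstraction might look like. The CrashReporter interface, the logReporter stand-in, and runGuarded are all hypothetical names chosen for illustration; none of them come from rudder-server, and a real plugin would wrap the vendor SDK behind the same interface.

```go
package main

import "fmt"

// CrashReporter is a hypothetical interface that vendor plugins
// (Bugsnag, Sentry, New Relic, ...) would implement.
type CrashReporter interface {
	// Notify reports an error built from a recovered panic.
	Notify(err error)
}

// logReporter is a stand-in backend that only records what it receives.
type logReporter struct {
	reported []string
}

func (l *logReporter) Notify(err error) {
	l.reported = append(l.reported, err.Error())
}

// runGuarded runs fn and forwards any panic to the configured reporter,
// so callers never touch a vendor SDK directly.
// In this sketch the panic is swallowed after it is reported.
func runGuarded(r CrashReporter, fn func()) {
	defer func() {
		if rec := recover(); rec != nil {
			r.Notify(fmt.Errorf("panic: %v", rec))
		}
	}()
	fn()
}

func main() {
	rep := &logReporter{}
	runGuarded(rep, func() { panic("boom") })
	fmt.Println(rep.reported)
}
```

With this shape, the main package would only choose which CrashReporter to construct, for example from configuration, and the rest of the code would depend on the interface.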

Amplitude error: Cannot read property 'name' of undefined

Hi, I have a Java source (using rudder-sdk-java) connected to an Amplitude destination, and I'm getting the following error in the destination:

Source ID                    Attempt No.  Job State  Error Code  Error Response
1fIIb8hISkVlmckcwy54umPI0A0  1            aborted    400         { "error": "Cannot read property 'name' of undefined" }

Add TODOs Badge to README

Hi there! I wanted to propose adding the following badge to the README to indicate how many TODO comments are in this codebase:

TODOs

The badge links to tickgit.com which is a free service that indexes and displays TODO comments in public github repos. It can help surface latent work and be a way for contributors to find areas of code to improve, that might not be otherwise documented.

The markdown is:

[![TODOs](https://badgen.net/https/api.tickgit.com/badgen/github.com/rudderlabs/rudder-server)](https://www.tickgit.com/browse?repo=github.com/rudderlabs/rudder-server)

Thanks for considering, feel free to close this issue if it's not appropriate or you prefer not to!

(full disclosure, I am the creator/maintainer of tickgit)

Running rudder-server without a rudderstack account?

Hello there!

It's a bit counter-intuitive to me that self-hosting this application would require creating an account on the project's web page. I'm wondering whether there is a method for running the rudder server without an account-based workspace token, or otherwise to know what the purpose of this token is. The docs don't really explain its purpose, but they all state that it's a required part of setup.

Keeping in mind that I haven't read the code yet nor do I have a specific example of a failure mode, the need for an externally-generated key makes me a bit apprehensive about placing potentially sensitive information through the system. Just to give an idea of where my concerns are coming from.

Thanks for any clarification!

Support writeKey in event body

Segment's documentation recommends setting an Authorization header

writeKey, _, ok := req.request.BasicAuth()

However, there is at least one other way to supply a write key that is supported by Segment's API.

  1. Supplying a write key in the writeKey property of the event body will register as a valid event of the specified type. The writeKey in the event body will override the write key in the Authorization header if present.

  2. Also... Each item in a Batch may contain a different write key from its parent. iirc, Segment will use the batch write key by default and overrides with event specific write key for each event if present.
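The precedence described in the two points above can be sketched as follows. This is an illustration only: the struct shapes and the function names resolveWriteKey and keysForBatch are assumptions, and the header key is expected to come from a call like the BasicAuth() line quoted earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event holds only the field this sketch needs.
type event struct {
	WriteKey string `json:"writeKey"`
}

// batch is an assumed shape for a batch request body.
type batch struct {
	WriteKey string  `json:"writeKey"`
	Batch    []event `json:"batch"`
}

// resolveWriteKey returns the first non-empty key, most specific first:
// per-event key, then body-level key, then the Authorization header key.
func resolveWriteKey(eventKey, bodyKey, headerKey string) string {
	for _, k := range []string{eventKey, bodyKey, headerKey} {
		if k != "" {
			return k
		}
	}
	return ""
}

// keysForBatch resolves one write key per event in a batch body.
func keysForBatch(body []byte, headerKey string) ([]string, error) {
	var b batch
	if err := json.Unmarshal(body, &b); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(b.Batch))
	for _, e := range b.Batch {
		keys = append(keys, resolveWriteKey(e.WriteKey, b.WriteKey, headerKey))
	}
	return keys, nil
}

func main() {
	body := []byte(`{"writeKey":"batch-key","batch":[{"writeKey":"event-key"},{}]}`)
	keys, err := keysForBatch(body, "header-key")
	fmt.Println(keys, err)
}
```

An empty result from resolveWriteKey would mean no key was supplied anywhere, which a gateway would presumably reject as a client error.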


This API has been designed for maximum interoperability; best to make all possible accommodations.
We never know who might have rolled their own client relying on some quirk like this (they exist).

Docker setup is broken

I tried to deploy Rudder using this tutorial: https://docs.rudderstack.com/get-started/installing-and-setting-up-rudderstack/docker
Running docker-compose up -d produced this error in the backend service:

--
-- wh_schemas
--

CREATE TABLE IF NOT EXISTS wh_schemas (
    id BIGSERIAL PRIMARY KEY,
    wh_upload_id BIGSERIAL,
    source_id VARCHAR(64) NOT NULL,
    namespace VARCHAR(64) NOT NULL,
    destination_id VARCHAR(64) NOT NULL,
    destination_type VARCHAR(64) NOT NULL,
    schema JSONB NOT NULL,
    error TEXT,
    created_at TIMESTAMP NOT NULL);

DROP INDEX IF EXISTS wh_schemas_source_destination_id_index;

CREATE INDEX IF NOT EXISTS wh_schemas_destination_id_namespace_index ON wh_schemas (destination_id, namespace); (details: read tcp 172.21.0.5:52578->172.21.0.4:5432: read: connection reset by peer)
        * driver: bad connection in line 0: SELECT pg_advisory_unlock($1)



goroutine 27 [running]:
github.com/bugsnag/bugsnag-go.AutoNotify(0xc0005a5d60, 0x3, 0x3)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/vendor/github.com/bugsnag/bugsnag-go/bugsnag.go:109 +0x2bc
panic(0x1449880, 0xc0001490c0)
        /root/.goenv/versions/1.13.8/src/runtime/panic.go:679 +0x1b2
github.com/rudderlabs/rudder-server/rruntime.Go.func1.1(0x1a22ea0, 0xc00023acf0)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:37 +0x33a
panic(0x1449880, 0xc0001490c0)
        /root/.goenv/versions/1.13.8/src/runtime/panic.go:679 +0x1b2
github.com/rudderlabs/rudder-server/warehouse.setupTables(0xc000678000)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/warehouse/warehouse.go:1491 +0xfb
github.com/rudderlabs/rudder-server/warehouse.Start()
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/warehouse/warehouse.go:1664 +0x119
main.startWarehouseService(...)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/main.go:153
main.main.func5()
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/main.go:352 +0x21
github.com/rudderlabs/rudder-server/rruntime.Go.func1(0x170eba0)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:40 +0x81
created by github.com/rudderlabs/rudder-server/rruntime.Go
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:26 +0x3f

Create a test that verifies MinIO/S3 destination is working

We are using Ginkgo for integration tests.
Right now, we have different test suites for the following:

  1. Events are being sent to router tables after all the required transformations
  2. Uploading of event schema to config-backend is working as expected
  3. Tables are being migrated/deleted/backed up after successful completion.

We need to add a test to see if MinIO destination is working as expected.

Following Setup Instructions for Docker Causes a Crash of the Backend

Following the Docker Setup Instructions will produce a docker image for backend that will crash due to sql.NullTime not existing.

It looks like sql.NullTime was added in golang 1.13, but the build/Dockerfile-dev is using golang 1.12.

I was able to get the backend to run by switching the base image to golang:1.14-alpine in build/Dockerfile-dev. That gets it running on my local machine, but I don't know whether everything works correctly with a newer version of Go.

Compatibility with CockroachDB

This is a duplicate of the closed issue #163

Nevertheless, I tried to run rudder-server with CockroachDB today but got an error -

ERROR   Rudder server needs postgres version >= 10. Exiting.

I understand from the CockroachDB docs that it is compatible with Postgres v9.5 onwards -
https://www.cockroachlabs.com/docs/v20.1/postgresql-compatibility.html

Does rudder-server use Postgres-specific features like stored procedures, functions or triggers? (If not, it would be nice to have the scale-out features of CRDB for the backend).

It's not an issue.

I have created an Angular service which can be used if someone is already using Angulartics2Segment in their Angular2+ project.
After injecting the rudder js script in index.html, with this service in place, I just needed to rename the injected class from Angulartics2Segment to AngularRudderService and import AngularRudderService everywhere.
It might be helpful if you could create an AngularticsRudder package which can be installed through npm.

import {Injectable} from '@angular/core';

@Injectable({
  providedIn: 'root'
})
export class AngularRudderService {
  constructor() {
  }

  pageTrack(path) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.page(path);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  eventTrack(action, properties) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.track(action, properties);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  setUserProperties(properties) {
    try {
      if (window.rudderanalytics) {
        if (properties.userId) {
          window.rudderanalytics.identify(properties.userId, properties);
        } else {
          window.rudderanalytics.identify(properties);
        }
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  setAlias(alias) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.alias(alias);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }
}

How to connect rudder-server with aws rds managed database instead of running database on container?

Hi,

I am trying to run the application backed by an AWS RDS managed database. I verified that the AWS Postgres database is accessible from my machine, but when I pass the DB endpoint in the environment variables of rudder-docker.yaml, it doesn't seem to work.

    entrypoint: sh -c '/wait-for aws-rds-postgres.amazonaws.com:5432 -- /rudder-server'
    ports:
      - "8080:8080"
    environment:
      - JOBS_DB_HOST=aws-rds-postgres.amazonaws.com
      - JOBS_DB_USER=rudder
      - JOBS_DB_PORT=5432
      - JOBS_DB_DB_NAME=jobsdb
      - JOBS_DB_PASSWORD=password

Please suggest what I need to do in order to run it in docker or kubernetes?

S3 Replay Docs

I'm not sure how I can replay my S3 backups for new destinations? Is this something that we can create docs for?

OnChange notifiers for backend config

Right now, backend polls the config-backend every n secs and forwards the latest config to all subscribers (eg. router, processor). We would eventually want to move to sockets to handle the changes instead of polling.

The backend-config module should only notify subscribers of changes. It should also expose an API that returns the complete config. This API would help subscribers that are coming online for the first time or that missed some changes.
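A rough sketch of the proposed behaviour, under stated assumptions: Config is a stand-in type, and Notifier with its Subscribe, Get, and Update methods is a hypothetical design, not the existing backend-config module. Update is what a poller or a socket handler would call when a new config arrives.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// Config is a stand-in for the real backend config; its shape is assumed.
type Config map[string]string

// Notifier holds the latest config and notifies subscribers only on change.
type Notifier struct {
	mu      sync.Mutex
	current Config
	subs    []chan Config
}

// Subscribe returns a channel that receives each changed config.
func (n *Notifier) Subscribe() <-chan Config {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan Config, 1)
	n.subs = append(n.subs, ch)
	return ch
}

// Get returns the complete current config, for subscribers that start
// late or missed an update.
func (n *Notifier) Get() Config {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.current
}

// Update stores next and notifies subscribers only if it differs from the
// current config. It reports whether a notification was sent.
func (n *Notifier) Update(next Config) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if reflect.DeepEqual(n.current, next) {
		return false
	}
	n.current = next
	for _, ch := range n.subs {
		// Drop any undelivered value so a slow subscriber sees the latest one.
		select {
		case <-ch:
		default:
		}
		ch <- next
	}
	return true
}

func main() {
	n := &Notifier{}
	ch := n.Subscribe()
	fmt.Println(n.Update(Config{"source": "enabled"})) // first config is a change
	fmt.Println(n.Update(Config{"source": "enabled"})) // identical config, no notification
	fmt.Println(<-ch, n.Get())
}
```

The one-slot buffer keeps Update from blocking on a slow subscriber, at the cost of subscribers only ever seeing the most recent config rather than every intermediate one.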

BigQuery Warehouse - DataSet Location

Hi !

I started using the rudder server locally but when adding the BigQuery destination I noticed that the "Location" configuration isn't being used ? (My DataSet is always created in the US)

When looking at the code I noticed that this location configuration wasn't used ? I've made a fix to create the Dataset with the correct location using this configuration so I could open a pull request with this change ?

Thanks :)

Gateway never returns 500

API Requests to rudder-server will only ever return a status code of 200 or 400.

if errorMessage != "" {
    logger.Debug(errorMessage)
    http.Error(w, errorMessage, 400)
} else {
    logger.Debug(respMessage)
    w.Write([]byte(respMessage))
}

This is problematic for at least 2 reasons

1) Lack of differentiation between 400 and 500 reduces visibility into service operation.

Services should differentiate between client errors and server errors. If the database becomes unavailable, you want the service to return 500 so that your alerting system can page the OPS team.

2) Clients treat 4xx and 5xx differently

Some clients will retry events when they receive a 500 and remove events when they receive a 400.

https://github.com/segmentio/analytics-android/blob/47f7341d81766b1b4a101ef69f491835d11f7532/analytics/src/main/java/com/segment/analytics/SegmentIntegration.java#L386-L394

The service must return 5xx on server error in order to minimize event loss during service outage.
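One way to get this differentiation is to classify errors before writing the response. The sketch below is an assumption about how that could look, not the gateway's actual code: errInvalidRequest, errStoreUnavailable, statusFor, and respond are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errInvalidRequest marks failures caused by the client (hypothetical sentinel).
var errInvalidRequest = errors.New("invalid request")

// errStoreUnavailable stands in for a server-side failure such as a
// database outage (hypothetical).
var errStoreUnavailable = errors.New("jobs database unavailable")

// statusFor maps an error to an HTTP status code: nil is 200, client
// errors are 400, and anything else is treated as a server error (500).
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errInvalidRequest):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

// respond writes either the success message or the classified error.
func respond(w http.ResponseWriter, respMessage string, err error) {
	if err != nil {
		http.Error(w, err.Error(), statusFor(err))
		return
	}
	_, _ = w.Write([]byte(respMessage))
}

func main() {
	badJSON := fmt.Errorf("malformed payload: %w", errInvalidRequest)
	fmt.Println(statusFor(nil), statusFor(badJSON), statusFor(errStoreUnavailable))
}
```

Defaulting unknown errors to 500 is the conservative choice for the retry behaviour described above: a client that retries on 5xx keeps the event, while one that receives 400 discards it.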

Anonymous ID appearing as null when making track call using Analytics.Net code

Anonymous ID appears as null when I make a track call using the Analytics.Net code.

I am using source code from - https://github.com/segmentio/Analytics.NET
If I don't pass this parameter, a new object is created at runtime with a null anonymous ID - https://github.com/segmentio/Analytics.NET/blob/master/Analytics/Model/BaseAction.cs#L34
Also, the SDK is not setting this Anonymous ID.
I am using it on the server side (.NET application).

@Team Please let me know by when I can expect it to be resolved
