pirsch's Introduction

Pirsch Analytics

This is the open-source core of Pirsch Analytics, a privacy-friendly web analytics solution. It started as an experiment to reliably analyze web traffic from the server side using Go.

Pirsch is made in the EU 🇪🇺 and hosted on German-owned 🇩🇪 servers at Hetzner. You can find an interactive demo of what the dashboard looks like today here.

How does it work?

Pirsch generates a unique fingerprint for each visitor. The fingerprint is a hash of the visitor's IP address, User-Agent, the date, and a salt. The tracking works without invading the visitor's privacy: it doesn't use cookies and no personal information is stored, making it GDPR-, CCPA-, and PECR-compliant. If used on the server side, Pirsch can track visitors who use ad blockers.

Learn more about privacy in our documentation.

Documentation

You can find our documentation here. The code reference can be found on go.dev.

Contributions

Contributions are welcome! Please open pull requests for your changes, and open tickets if you would like to discuss something or have a question.

To run the tests, you'll need a ClickHouse database and a schema called pirschtest. The user is "default" (no password).

Note that we only accept pull requests if you transfer the ownership of your contribution to us. As we also offer a managed commercial solution with this library at its core, we want to make sure we can keep control over the source code.

License

GNU AGPLv3

pirsch's People

Contributors

kugelschieber, lenovouser, vanderlindenjc

pirsch's Issues

Sessions are not counted correctly

SELECT count(DISTINCT("fingerprint")) "visitors",
count(DISTINCT("fingerprint", "session")) "sessions"
FROM hit
;

The two counts should differ, but they don't.
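To make the expected difference concrete, here is an in-memory Go sketch of what the query computes, assuming a hypothetical `hit` type whose session value is a plain string: a returning visitor adds a session but not a visitor, so sessions should be greater than or equal to visitors.

```go
package main

import "fmt"

// hit is a hypothetical, minimal row of the hit table.
type hit struct {
	Fingerprint string
	Session     string // assumed to be a string identifier here for brevity
}

// countVisitorsAndSessions mirrors the query in memory:
// visitors = distinct fingerprints, sessions = distinct (fingerprint, session) pairs.
func countVisitorsAndSessions(hits []hit) (visitors, sessions int) {
	v := make(map[string]struct{})
	s := make(map[[2]string]struct{})
	for _, h := range hits {
		v[h.Fingerprint] = struct{}{}
		s[[2]string{h.Fingerprint, h.Session}] = struct{}{}
	}
	return len(v), len(s)
}

func main() {
	hits := []hit{
		{"fp1", "s1"}, {"fp1", "s1"}, // one visit, two page views
		{"fp1", "s2"},                // same visitor returns later: new session
		{"fp2", "s3"},
	}
	fmt.Println(countVisitorsAndSessions(hits)) // 2 3
}
```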

Better bot spam protection

Add IP ranges to filter. I'm not sure this is actually that useful, since we filter out bot user agents anyway. It is useful, but keeping track of IPs is cumbersome and unreliable. I might use heuristics for this issue instead (for example, how many sessions a client has and whether that number is reasonable).
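A sketch of the IP range part using the standard library's `net/netip` package (Go 1.18+). The `ipFilter` type is hypothetical, and the ranges in the example are documentation prefixes used as placeholders, not a real bot list.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipFilter holds CIDR ranges whose traffic should be ignored.
type ipFilter struct {
	prefixes []netip.Prefix
}

// newIPFilter parses CIDR strings such as "192.0.2.0/24".
func newIPFilter(cidrs []string) (*ipFilter, error) {
	f := &ipFilter{}
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("invalid range %q: %w", c, err)
		}
		f.prefixes = append(f.prefixes, p.Masked())
	}
	return f, nil
}

// blocked reports whether ip falls inside any configured range.
// Addresses that cannot be parsed are not blocked.
func (f *ipFilter) blocked(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:192.0.2.1 like 192.0.2.1
	for _, p := range f.prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	f, err := newIPFilter([]string{"192.0.2.0/24", "2001:db8::/32"})
	if err != nil {
		panic(err)
	}
	fmt.Println(f.blocked("192.0.2.55"), f.blocked("198.51.100.1")) // true false
}
```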

Language can be null now

[ERROR] Error processing tracking data err=sql: Scan error on column index 2, name "language": converting NULL to string is unsupported

Visitors are counted twice

This happens for "today" and "yesterday": the number for yesterday is included twice (yesterday * 2 + today) in these statistics:

  • Pages
  • Browser
  • OS
  • Platform

Improve pirsch.js

The event handler might not fire, and some other improvements can be made.

Track (optional) status code

Track HTTP status codes to get statistics on failed requests. Should be optional and set when calling the Hit function.
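A sketch of how an optional status code could be passed in, assuming a hypothetical `HitOptions` struct where a zero value means "not set". Treating every code from 400 upwards as a failed request is also an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// HitOptions is a hypothetical options struct for the Hit function; only the
// status code is shown. StatusCode 0 means "not set".
type HitOptions struct {
	StatusCode int
}

// statusCodeOf returns the code to store and whether one was provided.
func statusCodeOf(opts *HitOptions) (int, bool) {
	if opts == nil || opts.StatusCode == 0 {
		return 0, false
	}
	return opts.StatusCode, true
}

// failed reports whether a status code counts as a failed request (4xx/5xx).
func failed(code int) bool {
	return code >= http.StatusBadRequest
}

func main() {
	code, ok := statusCodeOf(&HitOptions{StatusCode: http.StatusNotFound})
	fmt.Println(code, ok, failed(code)) // 404 true true
	_, ok = statusCodeOf(nil)
	fmt.Println(ok) // false
}
```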

Bots and User-Agents to filter

Go-http-client
Twingly
evc-batch
mailto
newspaper
FeedReader
Magic Browser
Ruby
Weavr
HackerNews
CFNetwork
SocialBeeAgent
Embed PHP library (?)
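A sketch of filtering the User-Agents listed above with a case-insensitive substring match; the matching rule and the `isBot` name are assumptions, and the uncertain "Embed PHP library" entry is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// botKeywords are the User-Agent fragments from the list above, lowercased.
var botKeywords = []string{
	"go-http-client", "twingly", "evc-batch", "mailto", "newspaper",
	"feedreader", "magic browser", "ruby", "weavr", "hackernews",
	"cfnetwork", "socialbeeagent",
}

// isBot reports whether the User-Agent contains any known bot fragment.
func isBot(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, k := range botKeywords {
		if strings.Contains(ua, k) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBot("Go-http-client/1.1"))                                                 // true
	fmt.Println(isBot("Mozilla/5.0 (X11; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0")) // false
}
```

Substring matching on short fragments such as `ruby` may also match legitimate User-Agents, so the list probably needs some care.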

Update existing entry in processor

If an entry already exists, update it. This makes it possible to run the processor multiple times a day, at any time, without messing up the data.
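A sketch of the update-or-insert behavior, assuming hypothetical `stat` rows identified by day and path: the processor replaces an existing value instead of adding a second row, so running it again for the same day gives the same result.

```go
package main

import "fmt"

// stat is a hypothetical processed statistics row, identified by Day + Path.
type stat struct {
	Day      string
	Path     string
	Visitors int
}

// upsert updates the row with the same Day and Path, or appends a new one.
func upsert(rows []stat, s stat) []stat {
	for i := range rows {
		if rows[i].Day == s.Day && rows[i].Path == s.Path {
			rows[i].Visitors = s.Visitors // entry exists: update it
			return rows
		}
	}
	return append(rows, s) // no entry yet: insert it
}

func main() {
	var rows []stat
	// Running the processor twice for the same day must not duplicate rows.
	for run := 0; run < 2; run++ {
		rows = upsert(rows, stat{"2021-03-01", "/", 42})
	}
	fmt.Println(len(rows), rows[0].Visitors) // 1 42
}
```

In PostgreSQL, the same idea maps to `INSERT ... ON CONFLICT ... DO UPDATE` on a unique key.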

Frontend Tracking

Add a simple JS script that calls an endpoint. Easy.

  • include screen size (window.screen)
  • make sure it's not blocked by the cache
  • run after DOMContentLoaded (event listener, and support older browsers)
  • endpoint must be configurable
  • opt-in
  • process screen size and add to analyzer

Funnels

Funnels can probably be implemented by combining multiple filtering steps. Example:

  • filter by route /checkout
  • filter by event "completed checkout" + previous filter
  • filter by conversion goal /checkout/confirmation/** + previous filters
  • calculate the loss in between (percentage)
  • show visitors from step 1 -> step n
200      --(-10%)--> 180       --(-50%)--> 90
Filter 1             Filter 2              Filter 3
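The loss between steps in the diagram is the change relative to the previous step. A small Go sketch (the `funnelLoss` name is hypothetical) reproduces the numbers above:

```go
package main

import "fmt"

// funnelLoss returns, for each step after the first, the percentage change
// in visitors relative to the previous step (negative means loss).
func funnelLoss(visitors []int) []float64 {
	if len(visitors) < 2 {
		return nil
	}
	loss := make([]float64, 0, len(visitors)-1)
	for i := 1; i < len(visitors); i++ {
		prev := visitors[i-1]
		if prev == 0 {
			loss = append(loss, 0) // nobody left to lose
			continue
		}
		loss = append(loss, float64(visitors[i]-prev)*100/float64(prev))
	}
	return loss
}

func main() {
	fmt.Println(funnelLoss([]int{200, 180, 90})) // [-10 -50]
}
```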

Refactor Data Model

This will be a huge refactoring to simplify storing/processing the statistics and to allow them to be filtered by tenant + page + period, where each part is optional. The other views can be aggregated from these or extracted from the hits directly (live data).

Ignore empty User-Agents

They are either

  • bots or
  • people who do not want to be tracked

So I think it's fine to ignore them.

Value too long

marvinblum    | panic: pq: value too long for type character varying(200)
marvinblum    | 
marvinblum    | goroutine 11 [running]:
marvinblum    | github.com/emvi/pirsch.panicOnErr(...)
marvinblum    | 	/go/pkg/mod/github.com/emvi/[email protected]/util.go:23
marvinblum    | github.com/emvi/pirsch.(*Tracker).aggregate(0xc00005c460, 0x99f8a0, 0xc000076680)
marvinblum    | 	/go/pkg/mod/github.com/emvi/[email protected]/tracker.go:169 +0x44b
marvinblum    | created by github.com/emvi/pirsch.(*Tracker).startWorker
marvinblum    | 	/go/pkg/mod/github.com/emvi/[email protected]/tracker.go:151 +0x9a

Bounce rates

Count every visitor who leaves after visiting the first page as bounced.
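A sketch of the resulting bounce rate, assuming page views have already been counted per visitor (keyed, for example, by fingerprint); the `bounceRate` name and the map input are hypothetical.

```go
package main

import "fmt"

// bounceRate returns the percentage of visitors who viewed exactly one page.
// pageViews maps a visitor key (e.g. a fingerprint) to its page view count.
func bounceRate(pageViews map[string]int) float64 {
	if len(pageViews) == 0 {
		return 0
	}
	bounces := 0
	for _, n := range pageViews {
		if n == 1 {
			bounces++ // left after the first page
		}
	}
	return float64(bounces) * 100 / float64(len(pageViews))
}

func main() {
	views := map[string]int{"a": 1, "b": 3, "c": 1, "d": 2}
	fmt.Println(bounceRate(views)) // 50
}
```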

MongoDB support

I would like to ask whether you are planning MongoDB support, or whether you would accept a PR that adds a MongoDB integration.

Session enhancement

Right now, a session cannot live longer than the maximum session lifetime. To change this the following two changes need to be made:

  1. use "time" instead of "session" to look up existing sessions so they don't run out
  2. update session time in cache when found

SELECT "session" FROM "hit" WHERE fingerprint = $1 AND "session" > $2 LIMIT 1

We probably need to add more tests to make sure sessions are continuous but can still run out at some point.

Add helper function for tenant ID

Creating new sql.NullInt64s everywhere is annoying. Add a function that creates one from an integer (0 = null, everything else = valid ID).

Add tenants

To track multiple applications/subdomains or to separate data in other ways.

Use different hashing algorithm/uuid for fingerprints

So that the fingerprint cannot be reversed.

This is not an issue as long as you (the user) don't try to reverse the hashes and no one gets access to your database. Consumer IPs usually change after a while, as ISPs use IP pools for their customers.

Active visitors

For the total number of active visitors, distinct fingerprints must be counted across all paths instead of grouping by path.
