posthog / charts-clickhouse Goto Github PK

Helm chart for deploying PostHog with ClickHouse on your K8s infrastructure

License: MIT License

Mustache 6.72% JavaScript 11.05% Shell 6.81% Python 40.97% Smarty 33.95% Dockerfile 0.48%

posthog kubernetes helm-chart clickhouse

charts-clickhouse's Introduction

⚠️ PostHog no longer supports Kubernetes deployments. ⚠️

As of May 2023, PostHog no longer support Kubernetes deployments. This decision doesn't impact open source ("Hobby") users on Docker Compose deployments.

What's next?

To continue using PostHog, you have two options:

Using PostHog Cloud (Recommended)

We strongly encourage users to move to PostHog Cloud wherever possible so that they always have the latest features and the full benefit of official support. It usually works out much cheaper, too.

PostHog Cloud is SOC 2 compliant and available with either EU or US hosting options. If you already have a self-hosted instance, you can migrate to PostHog Cloud. Alternatively, you can choose to migrate to a open source deployment instead.

Self-hosting PostHog

If you want to continue using a self-hosted PostHog deployment then you do so without support.

We strongly recommend following the official instructions to deploy PostHog if you must self-host. Most people who modify or use a non-standard way of running this chart run into issues, which we are unable to help with.

Security updates will continue until May 2024.

PostHog Helm Chart

🦔 PostHog is a developer-friendly, open-source product analytics suite.

This Helm chart bootstraps a PostHog installation on a Kubernetes cluster using the Helm package manager.

Prerequisites

Kubernetes >=1.24 <= 1.26
Helm >= 3.7.0

Installation

Deployment instructions for the major cloud service providers and on-premise deploys are available here.

Changelog

We documented detailed changes for each major release in our upgrade notes.

Development

We welcome all contributions to the community, but no longer offer any support.

Testing

This repo uses several types of test suite targeting different goals:

lint tests: to verify if the Helm templates can be rendered without errors
unit tests: to verify if the rendered Helm templates are as we expect
integration tests: to verify if applying the rendered Helm templates against a Kubernetes target cluster gives us the stack and PostHog installation we expect

Lint tests

We use helm lint that can be invoked via: helm lint --strict --set "cloud=local" charts/posthog

Unit tests

In order to run the test suite, you need to install the helm-unittest plugin. You can do that by running: helm plugin install https://github.com/quintush/helm-unittest --version 0.2.8

For more information about how it works and how to write test cases, please look at the upstream documentation or to the tests already available in this repo.

To run the test suite you can execute: helm unittest --helm3 --strict --file 'tests/*.yaml' --file 'tests/clickhouse-operator/*.yaml' charts/posthog

If you need to update the snapshots, execute:

helm unittest --helm3 --strict --file 'tests/*.yaml' --file 'tests/**/*.yaml' charts/posthog -u

Integration tests

kubetest: to verify if applying the rendered Helm templates against a Kubernetes target cluster gives us the stack we expect (example: are the disks encrypted? Can this pod communicate with this service?)
k6: HTTP test used to verify the reliability, performance and compliance of the PostHog installation (example: is the PostHog ingestion working correctly?)
e2e - k3s: to verify Helm install/upgrade commands on a local k3s cluster
e2e - Amazon Web Services, e2e - DigitalOcean, e2e - Google Cloud Platform: to verify Helm install command on the officially supported cloud platforms

Running k3s for tests locally

k6 test is using k3s for running things on localhost, which might be tricky to get running locally. Here's one method:

# Install k3s without system daemon
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_ENABLE=true sh
# Once done run k3s manually, disabling conflicting services
sudo k3s server --write-kubeconfig-mode 644 --disable traefik --docker --disable-network-policy

# Deploy the chart
helm upgrade --install -f ci/values/k3s.yaml --timeout 20m --create-namespace --namespace posthog posthog ./charts/posthog --wait --wait-for-jobs --debug

# Once done, prepare the data/environment
./ci/setup_ingestion_test.sh

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
export POSTHOG_API_ADDRESS=$(kubectl get svc -n posthog posthog-web -o jsonpath="{.spec.clusterIP}")
export POSTHOG_EVENTS_ADDRESS=$(kubectl get svc -n posthog posthog-events -o jsonpath="{.spec.clusterIP}")
export "POSTHOG_API_ENDPOINT=http://${POSTHOG_API_ADDRESS}:8000"
export "POSTHOG_EVENT_ENDPOINT=http://${POSTHOG_EVENTS_ADDRESS}:8000"
export "SKIP_SOURCE_IP_ADDRESS_CHECK=true"

# Run test
k6 run ci/k6/ingestion-test.js

Release

To release a new chart, bump the version in charts/posthog/Chart.yaml. We use Semantic Versioning:

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backwards compatible manner
PATCH version when you make backwards compatible bug fixes

Read API here as the chart values interface. When increasing the MAJOR version, ensure to add appropriate documentation to the Upgrade notes.

Charts are published on push to the main branch.

Note that development charts are also released on PRs such that changes can be tested as required before merge, e.g. changing staging/dev to use the chart for more end to end validation.

charts-clickhouse's People

Contributors

Stargazers

Watchers

charts-clickhouse's Issues

EPIC helm chart self-hosted deployment support

Tracking so we would know what users are struggling with. Please feel free to update/add

Install failed

cluster too small - we have cluster requirements now on all pages ( https://posthogusers.slack.com/archives/C01GLBKHKQT/p1632033205050900 (too small cluster) & #59 (too small cluster))
https://posthogusers.slack.com/archives/G01H7LGR7D0/p1629390081015800 (postgres helm repo, aws changing their docs underneath us)
https://posthogusers.slack.com/archives/CT7HXDEG3/p1631604962126200 (k8s 1.22 support, which is DONE now)
Reinstall (https://posthogusers.slack.com/archives/C01FKPGG5U6/p1631774038046900 (reinstall needs upgrade oddness & https://posthogusers.slack.com/archives/CT7HXDEG3/p1633603105399900 (2nd attempt, chi-clickhouse not coming up - upgrade fixes)

uninstall

https://posthogusers.slack.com/archives/C02E48PCE0K/p1631625238004000 (namespace deletion finalizers) => #172

Upgrade failed

PostHog/posthog#6025 (migrate job to delete + re-run upgrade)

TLS setup issues:

https://posthogusers.slack.com/archives/CT7HXDEG3/p1631975228176900 (confusion around what hostname should be in aws)
https://posthogusers.slack.com/archives/C01FKPGG5U6/p1631601029037600 (404s: 2 separate; maybe not tls for 2nd)
https://posthogusers.slack.com/archives/G01JXEDAL22/p1632726015182800 (???)
#133

Something broke in prod:

https://posthog.slack.com/archives/C021HLFPAA2/p1631220219019900 (plug-in server couldn't connect to Redis; restart helped) => /health endpoint issue

Kafka:

https://posthogusers.slack.com/archives/CT7HXDEG3/p1632561949288200 (kafka out of space & resizing didn't work as expected) & https://posthogusers.slack.com/archives/C01FKPGG5U6/p1632577233112700 + issues in comments => we have kafka runbook docs now + changing default sizes atm

Other:

dry-run crds missing failure - expected #91
Lookup usage / increase disk size: https://posthogusers.slack.com/archives/CT7HXDEG3/p1632290353227900

Add FAQ about Kafka disk full problem

Explain why we run into this & how to fix it. Look into how we can monitor disk size. Consider adding more info to values or suggestions for sizing.

E.g. happened here https://posthogusers.slack.com/archives/C026SNNE430/p1628192801014300

Ability to customize annotations for individual ingress routes

Is your feature request related to a problem?

When using PostHog in environments with high security requirements it would be nice to have different Ingress rules for the various routes. Namely, I want to be able to expose routes that are required for ingesting traffic to a publicly accessible Ingress Controller, while exposing the Web UI routes to only an internal Ingress Controller accessible over VPN.

Describe the solution you'd like

The ability to set annotations for the ingress configuration for each route.

In the end state in my particular configuration I'd like to have:

An Ingress that would be my public space, with my default public ingress and no additional annotations, that would cover all paths other than /
An Ingress for the / path that would have additional annotations that I can use to either change the ingress class to point to my internal ingress controller or adding extra Nginx annotations to whitelist a particular IP address.

I am a Helm amateur so I can't comment on the best way to do this, but even letting users override the Ingress template entirely would permit this functionality.

Describe alternatives you've considered

I have looked at existing PostHog functionality for this.

https://posthog.com/docs/self-host/configure/securing-posthog#handbook-content-menu-wrapper#restrict-access-by-ip

Organizationally we already do much of our access security by either using internal ingress controllers or whitelisting IPs on the Nginx side. It would be preferable to bring PostHog inline with that use case, and can be combined with PostHog's existing functionality to provide defence-in-depth from a networking access perspective.

Additional context

Working in healthcare, I've found it much easier to convince enterprise security teams to permit novel applications when you can seal it off in a box protected by a VPN!

Thank you for your feature request – we love each and every one!

no matches for kind "ClickHouseInstallation" in version "clickhouse.altinity.com/v1"

Bug description

When trying to install on GKE/GCP, I get the following error (note, I'm a helm newbie, so this may be a simple error on my side - but I think i've followed the docs correctly). I've tried older chart versions, like 3.0.3, but still get the same error.

$  helm --kube-context analytics --namespace posthog --debug install posthog posthog/posthog --values ./posthog-values-v2.yaml --atomic --create-namespace --dry-run
install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /home/mands/.cache/helm/repository/posthog-3.2.0.tgz
install.go:194: [debug] WARNING: This chart or one of its subcharts contains CRDs. Rendering may fail or contain inaccuracies.
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "ClickHouseInstallation" in version "clickhouse.altinity.com/v1"
helm.go:88: [debug] unable to recognize "": no matches for kind "ClickHouseInstallation" in version "clickhouse.altinity.com/v1"
unable to build kubernetes objects from release manifest
helm.sh/helm/v3/pkg/action.(*Install).Run
	helm.sh/helm/v3/pkg/action/install.go:265
main.runInstall
	helm.sh/helm/v3/cmd/helm/install.go:242
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:120
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/[email protected]/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/[email protected]/command.go:960
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/[email protected]/command.go:897
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_amd64.s:1371

Values file

cloud: "gcp"
ingress:
  hostname: xyz.example.com
  gcp:
    ip_name: "posthog"
  nginx:
    enabled: false
certManager:
  enabled: false
metrics:
  enabled: false

#clickhouseOperator:
#  enabled: false
#clickhouse:
#  enabled: false

Uncommenting the above sections in the yaml that disable ClickHouse allow the install operation to complete, but obviously without ClickHouse

Expected behavior

The dry-run completes and outputs the generates k8s objects

How to reproduce

Run the commands above

Environment

Deployment platform (gcp/aws/...): GCP
Chart version/commit: 3.2.0
Posthog version: default for chart version

Additional context

$ helm version
version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"clean", GoVersion:"go1.16.5"}

$ kubectl --context=analytics version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"archive", BuildDate:"2021-03-30T00:00:00Z", GoVersion:"go1.16", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.8-gke.900", GitCommit:"28ab8501be88ea42e897ca8514d7cd0b436253d9", GitTreeState:"clean", BuildDate:"2021-06-30T09:23:36Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}

Thank you for your bug report – we love squashing them!

Major upgrades need to follow upgrade-notes

Creating an issue as docs are moving to posthog.com, so I don't want to change it twice.

Add some bigger bolder signs around how users should verify that they are doing a major upgrade or not (ideally a single shell cmd) & if so direct them to upgrade notes

helm list

shows the chart version
we can check the tags to see where the chart is at https://github.com/PostHog/charts-clickhouse/tags

Automated test for deploying to a different namespace

Is your feature request related to a problem?

We have broken chart deployment to namespaces other than "posthog" various times.

Describe the solution you'd like

Add a test that tries to deploy with different namespace name & check everything came up.

Describe alternatives you've considered

as is; use a different namespace during release testing.

Additional context

Thank you for your feature request – we love each and every one!

Allow PostHog and ClickHouse to be in different Namespaces

Is your feature request related to a problem?

Hi, I'm not sure if this is a minor bug or a feature request, but I had a problem where the Kafka tables in ClickHouse weren't connecting because I had connected PostHog to an existing ClickHouse Cluster which happened to be in a different namespace. I manually dropped and recreated the Kafka tables with the namespace added to the URL, which fixed the issue.

Describe the solution you'd like

In the _helpers.tpl maybe it could default to always using the full form, for example:
Namespace: analytics
Release Name: prod
Before URL: prod-posthog-kafka:9092
After URL: prod-posthog-kafka.analytics:9092

I'm making an assumption here that, by changing it there, it would change the URL set by the migrate job that creates the tables.

Describe alternatives you've considered

I probably could have set this full form URL myself in the values.yml had I known that it would be a problem, so an alternative could be just to document it.

Additional context

I'm new to both PostHog and Kubernetes.
I've run into other issues before because of the way I've tried to set things up.

Thanks guys, love your work.

Update `clickhouse-operator` related resources to support k8s >= 1.22

We should update the clickhouse-operator related resources included in this chart as they are currently using deprecated k8s API that will be removed in version >= 1.22

Describe alternatives you've considered

Manually changing the API without upgrading the charts (see #126)

Additional context

W1019 11:43:08.443821   56647 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1019 11:43:08.601663   56647 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1019 11:43:08.831510   56647 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1019 11:43:08.871469   56647 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition

K8s v1.22 compatibility

we use v1beta1 quite a bit & it's not available going forward
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#customresourcedefinition-v122

no matches for kind "ClusterIssuer" in version "cert-manager.io/v1"

Bug description

When the cert-manager is enabled, an install or upgrade fails with the following error:

Error: UPGRADE FAILED: unable to recognize "": no matches for kind "ClusterIssuer" in version "cert-manager.io/v1"

values.yaml

cloud: "do"
ingress:
  hostname: <host<
  nginx:
    enabled: true
cert-manager:
  enabled: true

# cloud: "do"
# ingress:
#   nginx:
#     enabled: true
#     redirectToTLS: false
#   letsencrypt: false
# web:
#   secureCookies: false

# Note that this is experimental, please let us know how this worked for you.

# More storage space
clickhouseOperator:
  storage: 200Gi

postgresql:
  persistence:
    size: 60Gi

redis:
  master:
    size: 30Gi

kafka:
  persistence:
    size: 30Gi
  logRetentionBytes: _22_000_000_000

# Enable horizontal autoscaling for services
pgbouncer:
  hpa:
    enabled: true

web:
  hpa:
    enabled: true

beat:
  hpa:
    enabled: true

worker:
  hpa:
    enabled: true

plugins:
  hpa:
    enabled: true

Environment

Deployment platform (gcp/aws/...): do
Chart version/commit: 1.29.0
Posthog version: posthog-4.0.0

Change loadbalancers to nodeports to reduce costs

On DigitalOcean each load balancer is 10$/month, we currently have 3, but only need 1, the other two can be changed to node ports reducing the deployment cost by 20$/month

[GCP] Make sure the GCE ingress uses the defined readiness probe definition

Bug description

While working on the e2e test for GCP, I found the GCE Ingress (GLB) not picking up the health check information from the readiness probe definition. Instead, it is using the default / making the health check to fail (as the app returns an HTTP 302).

This StackOverflow thread might have more info.

Expected behavior

When we deploy PostHog to GCP, the ingress load balancer uses the HTTP /_health path as readiness probe.

Actual behavior

When we deploy PostHog to GCP, the ingress load balancer uses the HTTP / path as readiness probe.

How to reproduce

Install PostHog on GCP

IP address captured is worker node IP instead of user IP

Bug description

The IP address captured in the events is Kubernetes worker node instead of the user's IP address.

I had also tried enabling proxy protocol to ELB following this documentation https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/enable-proxy-protocol.html

Posthog is deployed on AWS EKS.

Expected behavior

IP address captured in the events should be user's public IP

Environment

Deployment platform: aws
Chart version/commit: 3.12.0
Posthog version: 1.28.0

AWS self-hosting costs analysis

Goal: help us and users understand hosting costs

Total: roughly 70$ + 15$ + 55$ = 140$/month

Quoting from https://aws.amazon.com/ec2/pricing/on-demand/

compute

2x m5.large (which is the default created when setting up a Kubernetes Cluster with eksctl. Have 2vCPU and 8GB of memory which works well for us and could be smaller in theory.

on-demand hourly in cheapest locations (e.g. US West Oregon) is $0.096, which makes *730h/month = 70$ (reserved instances are about 40% cheaper).

Data transfer

In is free, out is free up to 1GB/month, I assume we don't need more (except potentially when someone is using a plugin to send data somewhere else)

With the AWS Free Usage Tier, you get up to 15GBs of Data Transfer Out (to internet, other AWS regions, or CloudFront) free each month across regions (except AWS GovCloud regions) for one year. After first year, you get 1GB Data Transfer Out to internet free per month per region.

Storage

68GB in the chart defaults *0.10 = 7$
2x 80GB or snapshots *0.05 = 8$
^ which I'm not sure why are there https://eu-west-3.console.aws.amazon.com/ec2/v2/home?region=eu-west-3#Volumes:sort=desc:createTime

https://aws.amazon.com/ebs/pricing/

General Purpose SSD (gp3) - Storage | $0.08/GB-month
General Purpose SSD (gp2) Volumes | $0.10 per GB-month of provisioned storage. (which we got)
EBS Snapshots | $0.05 per GB-month of data stored

Elastic IP (1)

You can have one Elastic IP (EIP) address associated with a running instance at no charge

Load Balancers

We have 3 classic ones, so 73030.025 = 55$ + however much data was processed * 0.008/GB

https://aws.amazon.com/elasticloadbalancing/pricing/?nc=sn&loc=3

$0.025 per Classic Load Balancer-hour (or partial hour)
$0.008 per GB of data processed by a Classic Load Balancer

Chart-tests are flaky

chart-test checks i.e. our ci for making sure we didn't break the chart against various k8s versions are flaky & it would be great if they didn't fail if things are fine.

GCP deployment results in 502 http error

Bug description

Upon installing to GCP with the default values in the README.md, I get a 502 upon going to the main URL

helm command

$ helm --kube-context analytics --namespace posthog1 install posthog posthog/posthog --values ./posthog-values-v2.yaml --set email.password=$SENDGRID_API_KEY --atomic --create-namespace

values

cloud: "gcp"
ingress:
  hostname: xyz.example.com
  nginx:
    enabled: false
certManager:
  enabled: false
metrics:
  enabled: false
email:
  from_email: [email protected]
  host: smtp.sendgrid.net
  user: apikey

I created an External IP in the GCP console, as per the docs, which appears to be connected to the K8S node port

I have an Google managed SSL cert which seems to have provisioned, hence DNS is working

$ gcloud beta compute ssl-certificates list
NAME                                                TYPE          CREATION_TIMESTAMP             EXPIRE_TIME                    MANAGED_STATUS
k8s2-cr-mi7kcees-a0bsfajor9b9gf2a-19c53f806b9b9baf  SELF_MANAGED  2021-06-20T23:39:15.697-07:00  2021-09-18T22:31:28.000-07:00
mcrt-291b0d5a-888d-4ac4-8d00-6dc563537402           MANAGED	  2021-08-06T06:53:02.540-07:00  2021-11-04T06:53:05.000-07:00  ACTIVE
    xyz.example.com: ACTIVE

The posthog NodePort seems to be up

$ kubectl get svc posthog --namespace posthog1
NAME      TYPE       CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
posthog   NodePort   10.72.0.5    <none>        8000:30984/TCP   19m

Expected behavior

The posthog home page is returned

Environment

Deployment platform (gcp/aws/...): GCP
Chart version/commit: 3.3.0
Posthog version: default for chart

Additional context

I'm using posthog1 as a namespace, as I'm unable to delete the initial posthog namespace due to the ClickHouseInstallation operator not deleting when running helm uninstall

Thank you for your bug report – we love squashing them!

Install fails on non-posthog namespace

Bug description

I saw that your Clickhouse charts were now public, so I thought I'd have a go at installing them in our staging namespace to see how it works. I'm running into an issue, though, where the chart seems to expect the namespace to be posthog.

Expected behavior

I can install the chart in any namespace.

How to reproduce

helm upgrade -i --timeout 20m --namespace main-tools-staging -f values-staging.yaml staging-posthog-clickhouse path/to/my/chart
Get the error Error: UPGRADE FAILED: failed to create resource: namespaces "posthog" not found

My chart

apiVersion: v2
name: posthog-clickhouse
type: application
version: 0.1.0
appVersion: "1.26.0"
dependencies:
  - name: posthog
    version: 3.0.0
    repository: https://posthog.github.io/charts-clickhouse

Values

posthog-clickhouse:
  cloud: "gcp"
  certManager:
    enabled: false
  ingress:
    nginx:
      enabled: false
    hostname: posthog-clickhouse.staging.example.com
  hooks:
    migrate:
      hookAnnotation: "pre-install,pre-upgrade"
  clickhouseOperator:
    namespace: "main-tools-staging"

Environment

Deployment platform (gcp/aws/...): GCP
Chart version/commit: 3.0.0
Posthog version: 1.26

Failed updates leave migrations job around ocasionally

Bug description

Failed updates leave migrations job there, so we need to run

kubectl delete job -n posthog posthog-migrate

to clear the state

Expected behavior

clean state after upgrade rolls back

How to reproduce

run an upgrade where migration fails (e.g. --timeout too short)
see that migrations job sticks around

Thank you for your bug report – we love squashing them!

Failled vanilla install

Bug description

Vanilla installation fails with no custom values.yaml with

Error: failed to install CRD crds/clickhouse_installation_template.yaml: unable to recognize "": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"

Expected behavior

Install the chart

How to reproduce

Add repo, update repo
helm install --create-namespace --namespace posthog posthog posthog/posthog
Error: failed to install CRD crds/clickhouse_installation_template.yaml: unable to recognize "": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"

Environment

managed kubernetes in scaleway
latest in repo
default for chart

Additional context

Thank you for your bug report – we love squashing them!

Investigate downtime during deployment

Bug description

A customer running on AWS is reporting some downtime during helm operations. They are running with 2 replicas for web/events.

Expected behavior

No downtime expected

How to reproduce

"It appears that helm creates two new nodes, and terminates the old ones. But it terminates them before the new pods become ready / healthy. For example I can see this shortly after an upgrade, causing NGINX to show a 503 for all requests."

NAME                                                READY   STATUS    RESTARTS   AGE
posthog-web-c4bc4b487-4msjb                         0/1     Running   0          54s
posthog-web-c4bc4b487-cxzhn                         0/1     Running   0          54s

Environment

Deployment platform (gcp/aws/...): AWS (possibly everywhere)
Chart version/commit: latest
Posthog version: latest

Additional context

See private Slack channel

Document backup / retention methods

Is your feature request related to a problem?

We have just launched the Clickhouse stack and are now testing it in anger.

Previously, we were on the Postgres stack which had the DB in RDS (which has auto snapshot and retention settings).

We'd love to have the same level of backups / retention on the Clickhouse stack.

Describe the solution you'd like

We'd like to know how & what to backup on the Clickhouse stack (Clickhouse events? Postgres data?)

Describe alternatives you've considered

Not run any backups.

Additional context

Thank you for your feature request – we love each and every one!

Readme: What are the actual system requirements?

The DigitalOcean page says it's 8gb ram, 4cpu. That's a little expensive for a < 100 user website so I tried with 1gb instead - didn't install. 2gb installs, but keeps crashing. This is from the DigitalOcean Marketplace 1-click-install. Then I started wondering, if 2gb is crashing then surely 8gb can't really handle millions of users when the time comes?

Matomo has a clear page showing system requirements in different traffic situations. What are they for PostHog?

Helm charts enhancement

👋 Hi! I’m going to list here few random ideas on how we could improve our helm charts divided by topic:

📈 Scaling

we should support vertical and horizontal scaling of all our dependendencies: Kafka, ClickHouse and PostgreSQL
- vertical service a scale: this is usually an operation used as first mitigation in case of resource contention. It usually involves adding more CPU/memory/storage to a pod.
- horizontal service scale: this is usually an operation that can take some time (depending on the dataset) and usually requires dataset partitioning/sharding and a cluster rebalance operation.
related to ☝️ we should make sure we mount service data dir on top of resizable storage

🚨 Monitoring & Alerting

As part of the helm charts, we should ship a basic monitoring/alerting stack. I know we have some debugging information already built-in into PostHog and we could probably extend that but I don’t think it will covers most of the cases we might need (e.g. how can we troubleshoot a problem when a PostHog installation is down?)

📑 Documentation

We should document all the maintenance operations & alerts in a runbook.

Please share your ideas and I'll add them to this post. Thank you!

Add option (value) for specifying cluster issuer directly

In the current helm chart, either you use the CertManager in the chart, or you don't get SSL. It would be nice that we could specify the name of the cluster issuer / secret to the ingress in case there already is a CertManager installed in the cluster.

Describe the solution you'd like

Allow to specify cert-manager.io/cluster-issuer value in ingress when certManager: False

aws EKS fail

I deployed eks in AWS for several times and reported an error after waiting for a long time. What might be the reason?

Address deprecation warnings from chart install

Bug description

Deprecation warnings e.g.

W0715 21:23:44.131190    6536 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition

Expected behavior

We're using the right versions and logs aren't spammed with deprecation warnings

How to reproduce

helm install with certManager & nginx enabled

Environment

Chart version/commit: main

Thank you for your bug report – we love squashing them!

Email setup from the PostHog app

Is your feature request related to a problem?

Currently in order to have email available for password reset etc it needs to be set in the values.

Describe the solution you'd like

Ability to set the email config in the PostHog app.

Describe alternatives you've considered

As is.

Additional context

In order for the email to stick around during upgrades etc we'll need to store the account info in the (Postgres) DB as we can't create a k8s secret from the app afaik.

Thank you for your feature request – we love each and every one!

AWS documentation

Is your feature request related to a problem?

Let's add documentation on how to deploy to AWS.

Describe the solution you'd like

Skip cloudformation madness.

Instead:

Link to some guide on how to set up EKS
Give a guide for setting up on AWS (e.g. values, etc).

Describe alternatives you've considered

Cloudformation madness. We've seen this doesn't really work as the helm action and chart quickstart is quite raw.

Additional context

Thank you for your feature request – we love each and every one!

Evaluate the deprecation of the `beat` deployment

Is your feature request related to a problem?

We currently run two specific deployments for beat and workers. We could probably simplify this setup by deprecating the beat deployment and reconfigure workers to run ./bin/docker-worker-celery --with-scheduler instead of ./bin/docker-worker-celery.

I'm not aware of any downside doing that (as it's also how we run PostHog Cloud) but we should probably double check everything works as expected before moving forward.

Describe the solution you'd like

Less resources to maintain.

Describe alternatives you've considered

Do nothing.

Additional context

See conversation in our internal Slack

Generate clickhouse password dynamically

Is your feature request related to a problem?

Currently, the clickhouse password is static and publicly available in this repo

Describe the solution you'd like

Generate the data dynamically via secrets.

Describe alternatives you've considered

Make it a mandatory variable

Additional context

See Altinity/clickhouse-operator#700 for context on how to implement this.

Thank you for your feature request – we love each and every one!

Chart is currently broken

New installs broke after #28 (note - tests caught this, but we merged anyways)

See an attempt at fixing it over at #32. However this does not work. The new allow_experimental_window_functions were added in clickhouse 21.3, but 21.2 changed docker permissions in a way that clickhouse operator does not seem to support.

I've created an issue upstream Altinity/clickhouse-operator#734

Digital Ocean deployment post-install

Is your feature request related to a problem?

Currently the Digital Ocean 1-click install button deploys Posthog with TLS disabled. Furthermore the update button would likely revert any changes made.
Note to self: Wordpress has a section about configuring hostname and LetsEncrypt: https://marketplace.digitalocean.com/apps/wordpress (though oddly no info about that in their kubernetes one https://marketplace.digitalocean.com/apps/wordpress-kubernetes)

Also migrations might not be able to allocate (not enough memory)

Describe the solution you'd like

Description to users how to secure Posthog and update to configs so that the 1-click upgrades don't revert the changes.

Describe alternatives you've considered

Using parameters, but sadly DigitalOcean doesn't support it at this point.

Thank you for your feature request – we love each and every one!

Helm fails to render the chart when Redis is disabled

Bug description

I tried to install the Helm chart using an external Redis cluster and it failed with the following error message:

helm install -f posthog.yaml --timeout 20m -n applications posthog posthog/posthog --dry-run --debug --version=5.1.0
install.go:159: [debug] Original chart version: "5.1.0"
install.go:176: [debug] CHART PATH: /Users/aleksanderarruda/Library/Caches/helm/repository/posthog-5.1.0.tgz

install.go:184: [debug] WARNING: This chart or one of its subcharts contains CRDs. Rendering may fail or contain inaccuracies.
Error: YAML parse error on posthog/templates/plugins-deployment.yaml: error converting YAML to JSON: yaml: line 61: did not find expected key
helm.go:84: [debug] error converting YAML to JSON: yaml: line 61: did not find expected key

How to reproduce

Use the following values.yaml file:

cloud: "aws"

redis:
  enabled: false

  host: "redis.com"
  password: "doesntmatter"
  port: "6379"

Install the Helm chart (--dry-run should be enough):

helm install -f posthog.yaml --timeout 20m -n applications posthog-clickhouse posthog/posthog --dry-run --debug --version=5.1.0

Environment

Deployment platform: AWS
Chart version/commit: 5.1.0
Posthog version: 1.30.0

Default chart configuration to target more scalable needs (non-hobbyist)

Is your feature request related to a problem?

We don't really have good guides around how to get to a more scalable PostHog install and some stuff is hard to change later. Current docs only address values file changes: https://posthog.com/docs/self-host/deploy/configuration#scaling-up

Describe the solution you'd like

We'll have clear guides for these 3 use cases:

10$ / 20$ hobby version one can use for up to ? events and it's a bit slow to load potentially?
Potentially: 60$ small setup (on DigitalOcean only as GCP/AWS it's already more expensive, so why not go the scalable route?) scales up to ? events
(>100$) auto-scalable setup for any platform with guidance on cluster & node sizing. This is what the chart default values will target.

We'll need:

Resource limits #159
Stateful services default sizes to upsize
Doc to include cluster and node size recommendation (for auto-scalable and small setup)
Doc for how to get minimal / cheaper setup
stateless services autoscaling (hpa)
Doc for scaling up (links to migration guide)

Describe alternatives you've considered

Keep the minimal chart as the default.

Additional notes

If the auto-scalable default version close to the minimal price we can get then we just have two: hobbyist + scalable. If it is significantly more than we'll want to keep the DO 60$ option. If significantly more than 100$ we might want keep info for aws/gcp/azure for minimal setups too which are about 100$ iirc)

https://posthog.github.io/charts-clickhouse/ returns 404

Is your feature request related to a problem?

https://posthog.github.io/charts-clickhouse/ returns 404

Describe the solution you'd like

We see something similar to https://posthog.github.io/charts/ ideally with a PostHog logo too. We should also redirect folks from there to the clickhouse charts.

Describe alternatives you've considered

as is

Additional context

Thank you for your feature request – we love each and every one!

Posthog with existing Clickhouse

Bug description

I am already using Clickhouse for some projects of mine in the k8s cluster by using liwenhe1993 helm chart. So it's an obvious thing - avoiding installing an operator.

clickhouseOperator:
  enabled: false

clickhouse:
  enabled: true
  host: clickhouse.clickhouse.svc.cluster.local
  user: default
  password: pwd
  database: ph

Migration job pods are failing with error:

infi.clickhouse_orm.database.ServerError: Requested cluster 'posthog' not found (version 21.3.16.5 (official build)) (170)

Expected behavior

Migrations went well

How to reproduce

Disable click house operator
Configure all values required for existing clickhouse
Run helm installation helm upgrade -i -f values.yaml --timeout 20m --namespace posthog posthog posthog/posthog --atomic

Environment

k8s 1.22 EKS

Additional context

Database ph created successfully

Thanks.

e2e chart testing

Is your feature request related to a problem?

Unit testing is great and we're already working on it. Adding full e2e tests would give us even more confidence and to be able to move fast without breaking things.

Describe the solution you'd like

We create e2e tests for all our platforms (DO, AWS, Azure, GCP).
We'll have a cluster we continuously use only for these tests, we'll run max 1 concurrently per cluster. This means these won't be ran on each PR most likely but rather on master in PostHog repo, which helps also catch when PostHog app version breaks data ingestion completely. We'll use this for automated release branch testing too.

The test will:

send some data & check it's there
try to upgrade the existing deployment and check that the data is still there
nuke the cluster & try a new install

Describe alternatives you've considered

continue to manually test things

Additional context

We already have https://github.com/PostHog/vpc/blob/main/client_values/posthog/values.yaml & https://github.com/PostHog/posthog/actions/workflows/vpc-deploy.yml

Thank you for your feature request – we love each and every one!

Set pod priorities to reduce probability of ingestion being down

Is your feature request related to a problem?

Currently all pods are equal, so if anything happens the events pod might get killed first, but there are pods we care less about e.g. beat/worker even plugins as if events and kafka are up we can just catch up ingesting events later, but if events pod is down we lose data(events).

Describe the solution you'd like

Consider adding a PodDisruptionBudget for all deployments. The PDB should be enabled if there are more then 1 replicas configured (replicascount or hpa.minpods > 1) for the component. The PDB makes the components more reliable during node maintenance, scaling and eviction.

See the official k8s documentation for more info.

Describe alternatives you've considered

Do nothing. The user must create and maintain the PDB by themselves.

Additional context

I'm not sure how pod priorities work once we have multiple replicas (likely well if we do autoscaling and poorly if we don't).

PDB can be enabled if there are more than 1 replica or with a configuration option.

Upgrade redis subchart

We're currently 6 major versions behind upstream chart https://github.com/bitnami/charts/tree/master/bitnami/redis. Before we make the chart public, it might be worthwhile to get up to latest.

Feature(s) which are worthwhile for us: turning off replicas for small installations.

Upgrading for existing users: It seems like deleting statefulsets before upgrading might be needed.

Kafka retention policy in the chart

Is your feature request related to a problem?

We currently hardcode the retention bytes value which means if someone changes the kafka size they should change that too, which is an easy thing to forget.

Describe the solution you'd like

Do some math to figure out a good value in the helm templates for it.

Describe alternatives you've considered

as is

Additional context

There are two configs we can set for kafka (both set minimum values and data can't be deleted beforehand):

time - kafka docs, note that the default for bitnami is 168h readme
bytes - kafka docs note that the default for bitnami is _1073741824, i.e. 1GB(?) Note that there are some overhead data probably too (likely not much)

Note that the retention check loop by default is ran every 5min retention check interval, we can change it to be more frequent, but probably don't need to.

Concrete proposal:

size to be max(1GB, 90% of the volume size), for very small instances we'll use 1GB minimum, otherwise 90% of the volume size. Alternative if we want to limit the buffer size in GBs we could do sth like: max(1GB, min(5GB, 90% of the volume size)) for very small volumes we'll use 1GB minimum, otherwise growing up to 5GB max depending on the volume size.
time to 1h, we really don't want the retention to be blocked based on time. Because the install gets into a bad state, where no events are ingested while the disk is full until it's manually upsized. On the other hand we can't expect the plugin server to instantly handle all kafka messages, so we can set some minimum value here, e.g. 1h. If the disk is so small that we can't fit 1h of data on it then failing hard here is likely a good way to get self-hosted folks to upsize, because if we set it to be really small (e.g. 5min) we might just constantly be in a state where events are dropped and not notice vs an hour delay on seeing events seems like something one would ask about. Also based on the retention check loop 5 min doesn't seem like a good value. The bitnami chart has 168h minimum, so we most likely want to override this to something smaller.

cc @fuziontech & @guidoiaquinti would love your input.

Please remove cert-manager CRDs from crds folder

Is your feature request related to a problem?

We already have cert-manager in our cluster and the crds are installed by the cert-manager helm chart. By adding cert-manager crds into this helm chart it attempts to install them again even if the values file disables the install of cert-manager. There is a --skip-crds flag in helm, but by using that the Clickhouse CRDs would not be installed. The cert-manager chart already bundles the crds so I'm not sure why you need them copied into this chart. Can they be removed?

We're currently new to Clickhouse so allowing it to be installed with Posthog, but potentially the same issue would arise if we installed Clickhouse via a separate chart too.

Describe the solution you'd like

Allow the cert-manager chart to install the cert-manager crds. Remove them from this chart.

Describe alternatives you've considered

Copying your chart locally, deleting the crd and installing from my own local copy

Additional context

None

Thank you for your feature request – we love each and every one!

Digital Ocean deployment fails when using current required resources and resources aren't enforced

Bug description

There are 3 problems:

Digital-Ocean doesn't enforce any of the restrictions for the deployment (even though we get asked that)
When the deploy script fails the user just sees a "something went wrong" unhelpful page asking them to check if digitalocean is up
If pods get allocated randomly we might not have the 1000Mi available in any node, which Migrations currently require

Details

Looks like we run out of resources, in the Kubernetes Dashboard I can see https://cloud.digitalocean.com/kubernetes/clusters/fc09727b-3758-460c-a2dd-3abe435e83d4/db/7cfa7c2e956b6e62d85e8b711afb1e798ba9a49e/#/overview?namespace=_all

What they need

> kubectl get pod posthog-migrate-4b5j6 --namespace posthog -o yaml | grep memory
        memory: 1000Mi
        memory: 1000Mi
    message: '0/2 nodes are available: 2 Insufficient memory.'

> kubectl get pod posthog-posthog-postgresql-0 --namespace posthog -o yaml | grep cpu
        cpu: 250m
    message: '0/2 nodes are available: 2 Insufficient cpu.'

What we have available

kubectl describe node | grep cpu
                    beta.kubernetes.io/instance-type=s-1vcpu-2gb
                    node.kubernetes.io/instance-type=s-1vcpu-2gb
  cpu:                1
  cpu:                900m
  cpu                852m (94%)   102m (11%)
                    beta.kubernetes.io/instance-type=s-1vcpu-2gb
                    node.kubernetes.io/instance-type=s-1vcpu-2gb
  cpu:                1
  cpu:                900m
  cpu                752m (83%)   102m (11%)

> kubectl describe node | grep memory
  MemoryPressure       False   Mon, 12 Jul 2021 23:48:48 +0200   Mon, 12 Jul 2021 23:18:16 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  memory:             2043324Ki
  memory:             1574Mi
  memory             971Mi (61%)  440Mi (27%)
  MemoryPressure       False   Mon, 12 Jul 2021 23:49:01 +0200   Mon, 12 Jul 2021 23:18:32 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  memory:             2043324Ki
  memory:             1574Mi
  memory             921Mi (58%)  100Mi (6%)

So for CPU we have 900m allocatable for each node & used 852m & 752m, which means we don't have the 250m anymore for postgressql.
And for memory we ask for 1Mi for migrations, but based on the way things got allocated we have ~600Mi free on both nodes.

> kubectl get pods --namespace posthog -o wide
NAME                                                READY   STATUS      RESTARTS   AGE    IP             NODE                   NOMINATED NODE   READINESS GATES
chi-posthog-posthog-0-0-0                           1/1     Running     0          48s    10.244.0.181   pool-d6lzgpvf9-8pwo0   <none>           <none>
clickhouse-operator-6b54d6b5fb-949qp                2/2     Running     0          72s    10.244.0.232   pool-d6lzgpvf9-8pwo0   <none>           <none>
posthog-beat-84d8c69887-vxzjr                       1/1     Running     0          72s    10.244.1.90    pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-cert-manager-69f4ff7b57-hgbmh               1/1     Running     0          73s    10.244.1.101   pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-cert-manager-cainjector-6d95d46d98-bfvbq    1/1     Running     0          73s    10.244.1.102   pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-cert-manager-webhook-6469c785fc-trqkh       1/1     Running     0          73s    10.244.1.99    pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-ingress-nginx-admission-patch-mzhlx         0/1     Completed   0          108s   10.244.1.86    pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-ingress-nginx-controller-648b4f45ff-pwggp   1/1     Running     0          73s    10.244.0.226   pool-d6lzgpvf9-8pwo0   <none>           <none>
posthog-migrate-4b5j6                               0/1     Pending     0          15m    <none>         <none>                 <none>           <none>
posthog-pgbouncer-58578887cc-xbqc9                  1/1     Running     0          73s    10.244.0.129   pool-d6lzgpvf9-8pwo0   <none>           <none>
posthog-posthog-kafka-0                             1/1     Running     1          51s    10.244.1.98    pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-posthog-postgresql-0                        0/1     Pending     0          72s    <none>         <none>                 <none>           <none>
posthog-posthog-redis-master-0                      1/1     Running     0          72s    10.244.1.120   pool-d6lzgpvf9-8pwod   <none>           <none>
posthog-posthog-redis-slave-0                       1/1     Running     0          72s    10.244.0.136   pool-d6lzgpvf9-8pwo0   <none>           <none>
posthog-posthog-redis-slave-1                       1/1     Running     0          41s    10.244.0.184   pool-d6lzgpvf9-8pwo0   <none>           <none>
posthog-zookeeper-0                                 1/1     Running     0          72s    10.244.1.35    pool-d6lzgpvf9-8pwod   <none>           <none>

Our current resource restrictions on DO say we need 2 nodes & in total 4cpu and 4GB of memory here: https://github.com/digitalocean/marketplace-kubernetes/blob/58905fac864b050acda1a86e5b38df8b0d41d67d/stacks/posthog/do_config.yml#L6

But I was able to create the 1-click deployment with 2vCPU total

In terms of resource needs:

> kubectl describe node
... 
  Namespace                   Name                                                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                 ------------  ----------  ---------------  -------------  ---
  kube-system                 cilium-4nkwf                                         300m (33%)    0 (0%)      300Mi (19%)      0 (0%)         30m
  kube-system                 csi-do-node-49p42                                    0 (0%)        0 (0%)      70Mi (4%)        0 (0%)         30m
  kube-system                 do-node-agent-88pl5                                  102m (11%)    102m (11%)  80Mi (5%)        100Mi (6%)     30m
  kube-system                 kube-proxy-h695x                                     0 (0%)        0 (0%)      125Mi (7%)       0 (0%)         30m
  posthog                     posthog-beat-6b887b9f8b-zt2dt                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         55s
  posthog                     posthog-cert-manager-69f4ff7b57-8n2tk                0 (0%)        0 (0%)      0 (0%)           0 (0%)         55s
  posthog                     posthog-cert-manager-cainjector-6d95d46d98-m6bdp     0 (0%)        0 (0%)      0 (0%)           0 (0%)         56s
  posthog                     posthog-cert-manager-webhook-6469c785fc-fbpgc        0 (0%)        0 (0%)      0 (0%)           0 (0%)         55s
  posthog                     posthog-ingress-nginx-controller-648b4f45ff-7t948    100m (11%)    0 (0%)      90Mi (5%)        0 (0%)         55s
  posthog                     posthog-posthog-kafka-0                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         54s
  posthog                     posthog-posthog-postgresql-0                         250m (27%)    0 (0%)      256Mi (16%)      0 (0%)         54s
  posthog                     posthog-posthog-redis-slave-1                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         23s
...
  kube-system                 cilium-6mn4v                            300m (33%)    0 (0%)      300Mi (19%)      0 (0%)         31m
  kube-system                 cilium-operator-84dcdcbc66-dg5qr        0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 cilium-operator-84dcdcbc66-lntb4        0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 coredns-85d9ccbb46-6sh47                100m (11%)    0 (0%)      70Mi (4%)        170Mi (10%)    34m
  kube-system                 coredns-85d9ccbb46-8gq5s                100m (11%)    0 (0%)      70Mi (4%)        170Mi (10%)    34m
  kube-system                 csi-do-node-w4b9s                       0 (0%)        0 (0%)      70Mi (4%)        0 (0%)         31m
  kube-system                 do-node-agent-5rq6r                     102m (11%)    102m (11%)  80Mi (5%)        100Mi (6%)     31m
  kube-system                 kube-proxy-hdcd9                        0 (0%)        0 (0%)      125Mi (7%)       0 (0%)         31m
  posthog                     clickhouse-operator-6b54d6b5fb-7q2g8    0 (0%)        0 (0%)      0 (0%)           0 (0%)         54s
  posthog                     posthog-pgbouncer-7b57f8b67b-fbbcx      0 (0%)        0 (0%)      0 (0%)           0 (0%)         53s
  posthog                     posthog-posthog-redis-master-0          0 (0%)        0 (0%)      0 (0%)           0 (0%)         53s
  posthog                     posthog-posthog-redis-slave-0           0 (0%)        0 (0%)      0 (0%)           0 (0%)         52s
  posthog                     posthog-zookeeper-0                     250m (27%)    0 (0%)      256Mi (16%)      0 (0%)         53s

when I wait long enough to (i.e. wait out for the migrations to time out and deploy script to fail I'll see this fun picture)

Tested trying to see if they enforce any restrictions we set - nope, I was able to use a single node 1vCPU & 2Gi ram total cluster

Expected behavior

1-click deployment doesn't let us create broken deployments & we see something better than a "something went wrong" page for trying to view the cluster.

How to reproduce

https://marketplace.digitalocean.com/apps/posthog-1 click Install App
Select New Cluster
Opt for 1 dev node
Look at the pods e.g. from kubernetes dashboard or kubectl & see not everything was able to allocate

Thank you for your bug report – we love squashing them!

HTTPS certificate error

Bug description

I deployed a new postog in AWS and prompted success through helm chart, but the HTTPS certificate was wrong in the end. The certificate prompt was "kubernetes progress controller fake certificate"

If helm install or upgrade failed include your values.yaml (without sensitive data) and output from kubectl describe nodes.

values.yaml

cloud: "aws"
ingress:
hostname: posthog...*
nginx:
enabled: true
letsencrypt:
enabled: true
certManager:
enabled: true

clickhouseOperator:
storage: 100Gi

postgresql:
persistence:
size: 50Gi

kafka:
persistence:
size: 20Gi
logRetentionBytes: _8_000_000_000

web:
replicacount: 3

worker:
replicacount: 3

plugins:
replicacount: 3

How to reproduce

Environment

helm-chart & EKS

Additional context

Thank you for your bug report – we love squashing them!

Add FAQ about log spam

Is your feature request related to a problem?

Describe the solution you'd like

just add a FAQ so we know to ignore them in the future

Describe alternatives you've considered

nothing

Additional context

Thank you for your feature request – we love each and every one!

Dynamic DigitalOcean versioning

Is your feature request related to a problem?

After digitalocean/marketplace-kubernetes#233 we're really close to being in do marketplace. However the version of the chart is hardcoded there. Let's automate upgrading

Describe the solution you'd like

Use the latest version of chart when deploying/upgrading in the scripts

Describe alternatives you've considered

Automate creating PRs against linked repo - but this will suck resources from the maintainers.

Additional context

cc @tiina303 you wanted to own this?

Thank you for your feature request – we love each and every one!

Don't install unnecessary `StorageClass`es

Is your feature request related to a problem?

As part of the clickhouse_instance.yaml template definition, if we are installing the chart on GCP or AWS, we add (by default) an unnecessary StorageClass definition.

kubectl get storageclass

NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  15h
gp2-resizable   kubernetes.io/aws-ebs   Retain          Immediate              true                   15h

This StorageClass is then only referenced by ClickHouseInstallation and not by other stateful components. I think we should remove this unnecessary complication especially as it created confusions in the past. A StorageClass is also a k8s global resource, we might hit naming collisions, etc…

Unfortunately this will be a breaking change and it will require a safe migration strategy like:

make sure helm runs cleanly
make sure the default StorageClass has VolumeExpansion set to true
stop the ClickHouse pod (make sure we don't destroy the underlying PVC/PV)
use something like pv-migrate (see https://github.com/utkuozdemir/pv-migrate) to migrate the data from the source PV to the destination PV.
delete the ClickHouseInstallation definition
manually patch the PersistentVolume definition to use the new volumes
run helm upgrade with the new config to create a new ClickHouseInstallation definition

Describe the solution you’d like

By default, PostHog shouldn't require any custom StorageClass.

Describe alternatives you’ve considered

Do nothing.

Resizing Kafka on all platforms

Is your feature request related to a problem?

AWS

User ran out of kafka space on AWS. We tried kubectl edit pvc data-posthog-posthog-kafka-0 -n posthog but this failed since the default storage class on AWS is not resizable.

Describe the solution you'd like

AWS

Default to a resizable StorageClass

Describe alternatives you've considered

Nuke kafka from orbit - can cause data loss

Additional context

Thank you for your feature request – we love each and every one!

Add chart version to instance status page

Is your feature request related to a problem?

Having the chart version there in addition to other metrics can help us detect faster that someone's using old version.

Describe the solution you'd like

instance/status page has chart version too

Describe alternatives you've considered

As is, no chart version there

Additional context

Make sure this works reasonably on cloud.

Thank you for your feature request – we love each and every one!

Improved support for helm releases with a different name

Proposed change

When deploying to a name other than posthog, e.g. my-posthog-deployment, we had to add the following config:

  kafka:
    externalZookeeper:
      servers:
        - "my-posthog-deployment-posthog-zookeeper:2181"

It would be nice if this was handled automatically.

Alternative options

Perhaps just some documentation would be helpful; it took a little while to debug this.

Additional context

Document namespace deletion problem

Is your feature request related to a problem?

https://posthogusers.slack.com/archives/C01PPBY3G6Q/p1635344876020400?thread_ts=1635342422.018000&cid=C01PPBY3G6Q

Describe the solution you'd like

Document it in our troubleshooting page, but instead of edit it would be better to have the patch command & we should check if anything else is laying around too.

while we're at it documenting the migrations job sticking around too.

posthog / charts-clickhouse Goto Github PK

charts-clickhouse's Introduction

⚠️ PostHog no longer supports Kubernetes deployments. ⚠️

What's next?

Using PostHog Cloud (Recommended)

Self-hosting PostHog

PostHog Helm Chart

Prerequisites

Installation

Changelog

Development

Testing

Lint tests

Unit tests

Integration tests

Running k3s for tests locally

Release

charts-clickhouse's People

Contributors

Stargazers

Watchers

Forkers

charts-clickhouse's Issues

Install failed

uninstall

Upgrade failed

TLS setup issues:

Something broke in prod:

Kafka:

Other:

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Thank you for your feature request – we love each and every one!

Bug description

Values file

Expected behavior

How to reproduce

Environment

Additional context

Thank you for your bug report – we love squashing them!

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Thank you for your feature request – we love each and every one!

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Describe alternatives you've considered

Additional context

Bug description

Environment

Bug description

Expected behavior

Actual behavior

How to reproduce

Bug description

Expected behavior

Environment

compute

Data transfer

Storage

Elastic IP (1)

Load Balancers

Bug description

helm command

values

Expected behavior

Environment

Additional context

Thank you for your bug report – we love squashing them!

Bug description

Expected behavior

How to reproduce

My chart

Values

Environment