infrastructure's Introduction

Infrastructure as code

Treating Haiku's infrastructure as cattle instead of kittens since 2017.

Directories

  • docs - Full documentation on Haiku's infrastructure
  • containers - Manifests to build and deploy containers
  • deployments - Kubernetes manifests for Haiku infrastructure
  • playground - Things that we're experimenting with. Not used in production.

Architecture

[Architecture diagram]

Quickstart

This is the path of least resistance for new admins to do "things".

💥 DANGER
Never run kubectl delete on persistent volume claims you need! Running kubectl delete / kubectl delete -f on anything describing volume claims will cause Kubernetes to drop (delete) the backing persistent volumes. (AKA massive + rapid data loss)
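A quick way to check which persistent volumes exist and, if in doubt, switch one to the Retain reclaim policy so the underlying volume is kept even if its claim is deleted (the volume name is a placeholder):

kubectl get pv
kubectl patch pv (NAME) -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'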

Running through the Kubernetes Basics class is recommended!

Pre-requirements

  • Install kubectl
  • Export the Kubernetes configuration from Digital Ocean, import locally
    • If this is your first kubernetes cluster, just put it at ~/.kube/config
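With the Digital Ocean CLI, one way to do this (assuming doctl is installed and authenticated; the cluster name is a placeholder) is:

doctl kubernetes cluster kubeconfig save (CLUSTER-NAME)

This fetches the cluster's credentials and merges them into ~/.kube/config.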

Quick Commands

aka, I'm a sysadmin and a dog in a lab coat

Check your configured cluster

List the known Kubernetes Clusters (contexts) of my client:

kubectl config get-contexts

Change the Kubernetes Cluster my local client focuses on:

kubectl config use-context (NAME)

List Deployments

Deployments define the desired number of replica pods to run within the cluster.

kubectl get deployments

Scaling Deployments

If you want something to "stop running for a while", this is the easiest and safest way. NEVER run kubectl delete if you don't know what you're doing.

kubectl scale --replicas=0 deployments/(NAME)

List Pods

Pods are one or more tightly related containers running in Kubernetes. Deleting a pod will result in the related deployment recreating it.
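To list the pods in the current namespace:

kubectl get pods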

Entering a container

aka, the equivalent of docker exec -it (NAME) /bin/bash -l...

If the pod has one container:

kubectl exec -it pod/(NAME) -- /bin/bash -l

If the pod has multiple containers:

kubectl exec -it pod/(NAME) -c containername -- /bin/bash -l

Examining Stuff

kubectl describe pod/(NAME)
kubectl describe deployment/(NAME)

Initial Installation

  • Deploy ingress controller via instructions in deployments/ingress-controller/traefik
  • Deploy manifests in deployments for various services
  • Scale each deployment to 0 replicas
    • kubectl scale --replicas=0 deployments/(BLAH)
  • Populate persistent volumes for each application
    • see tools/migration_tools for some scripts to do this en masse via rsync
  • Once ready for switchover, adjust DNS to new load balancer
  • Scale up applications
    • kubectl scale --replicas=1 deployments/(BLAH)

Rolling Restarts

To perform a rolling restart of each deployment replica:

kubectl rollout restart deployment/(NAME)

Example

-n kube-system is the namespace. We run Traefik in a separate namespace since it's important.

Rolling restart of Traefik:

kubectl -n kube-system rollout restart daemonset/traefik-ingress-controller

Rolling Upgrade

Here we upgrade a container image from the command line. You can also update the matching yml document and run kubectl apply -f (thing).yml

Example

-n kube-system is the namespace. We run Traefik in a separate namespace since it's important.

Rolling upgrade of Traefik:

kubectl -n kube-system set image daemonset/traefik-ingress-controller traefik-ingress-lb=docker.io/traefik:v2.6

Accessing Services / Pods

You can port-forward / tunnel from various points within the Kubernetes cluster to your local desktop. This is really useful for troubleshooting or understanding issues better.

Listen on localhost port 8888, forwarding to port 9999 within the pod:

kubectl port-forward pod/(NAME) 8888:9999

Listen on localhost port 8080, forwarding to the named port web of the service:

kubectl port-forward service/(NAME) 8080:web

Pressing Ctrl+C will terminate the port-forwarding proxy.

Importing data

Restoring volume / database backups: See deployments/other/restore.yml*

Manual database import: cat coolstuff.sql | kubectl exec -i deployment/postgres -- psql -U postgres

Forcing CronJobs

We leverage multiple jobs to perform various automatic activities within kubernetes. Some example jobs include postgresql backups to s3, persistent volume backups to s3, and syncing various git repositories.

Once in a while, you may want to force these jobs to run before performing maintenance, or for testing purposes.

  • pgbackup - PostgreSQL backup jobs
  • pvbackup - Persistent volume backup jobs

There are several example restore jobs in deployments/other. These can be manually edited and applied to restore data. It's highly recommended to review these CAREFULLY before use, as a mistake could result in unintended data loss.

These restore jobs should be used on empty databases / persistent volumes only!

  1. Listing CronJobs
    $ kubectl get cronjobs
    NAME                        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    discourse-pgbackup          0 0 * * 1,4   False     0        2d14h           6d13h
    discourse-pvbackup          0 3 * * 3     False     0        11h             6d17h
    gerrit-github-sync          0 * * * *     False     0        38m             13d
    gerrit-pvbackup             0 1 * * 1,4   False     0        2d13h           8d
    haikudepotserver-pgbackup   0 0 * * 1,4   False     0        2d14h           3d21h
    .
    
  2. Forcing a CronJob to run. This is a great thing to do before any maintenance :-)
    $ kubectl create job --from=cronjob/discourse-pgbackup discourse-pgbackup-manual-220316
    
  3. Monitoring manual CronJob
    $ kubectl get jobs
    NAME                                 COMPLETIONS   DURATION   AGE
    discourse-pgbackup-manual-220316     1/1           1m         1m
    
    $ kubectl logs jobs/discourse-pgbackup-manual-220316
    Backup discourse...
    Backup complete!
    Encryption complete!
    Added `s3remote` successfully.
    `/tmp/discourse_2022-03-14.sql.xz.gpg` -> `s3remote/haiku-backups/pg-discourse/discourse_2022-03-14.sql.xz.gpg`
    Total: 0 B, Transferred: 136.45 MiB, Speed: 77.32 MiB/s
    Snapshot of discourse completed successfully! (haiku-backups/pg-discourse/discourse_2022-03-14.sql.xz.gpg)
    

Secrets

For obvious reasons, 🔑 secrets are omitted from this repository.

infrastructure's People

Contributors

forza-tng, gatak, hrithikkumar49, jessicah, kallisti5, korli, mmlr, nielx, petr-akhlamov, pulkomandy, waddlesplash

infrastructure's Issues

Implement automatic backups of Docker volumes

A simplistic backup script was created and placed in root's home directory on maui which, when executed, grabs the contents of /var/lib/docker/volumes and compresses it into an archive in /var/backups with a standardized name. (A rough sketch of the approach is shown after the list below.)

[root@maui ~]# ls -la /var/backups
total 4479488
drwxr-xr-x.  2 root root      4096 Feb 22 00:35 .
drwxr-xr-x. 20 root root      4096 Sep 20 20:17 ..
-rw-r--r--.  1 root root 854837532 Sep 20 20:23 docker_volumes-1505931571.tar.xz
-rw-r--r--.  1 root root 930442172 Nov  4 21:09 docker_volumes-1509825854.tar.xz
-rw-r--r--.  1 root root 931729732 Nov 14 19:49 docker_volumes-1510685040.tar.xz
-rw-r--r--.  1 root root 930946048 Jan  4 16:33 docker_volumes-1515079678.tar.xz
-rw-r--r--.  1 root root 939003176 Feb 22 00:41 docker_volumes-1519256121.tar.xz

This needs to be better in a few ways:

  • automatic grooming of old backups.
  • automatic (weekly?) push to Hetzner secure backups space.
  • automatic runs; currently it requires a manual trigger (hey, it's beta)
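For reference, a minimal sketch of what such a script might look like (the actual script on maui may differ; the 30-day retention window is only an illustration):

#!/bin/bash
# Archive all docker volumes into /var/backups with an epoch-stamped name.
set -e
STAMP=$(date +%s)
tar -cJf "/var/backups/docker_volumes-${STAMP}.tar.xz" -C /var/lib/docker volumes
# Grooming: drop archives older than 30 days.
find /var/backups -name 'docker_volumes-*.tar.xz' -mtime +30 -delete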

Support for GitHub notifications is broken

For a long time now, Haiku's infrastructure has not supported GitHub notifications for developer branches. Since creating branches directly is no longer allowed due to Gerrit, this needs to be fixed ASAP.

Status Page for Infrastructure

With all the different bits of web infrastructure all needing to be maintained/managed, it would be good if we had a status page so both users and admins can check the status of web infrastructure. Plus, for scheduled system maintenance, it would be good to have a location for sysadmins to issue maintenance notices.

Now, I did some research and there are two types of status pages:

  1. "Static" status pages: generated using a static site generator like Hugo. These can be deployed using our existing Netlify plans and thus are relatively simple to set up. Mainly involves admins sending alerts for downtime, maintenance.
    Options: https://github.com/cstate/cstate, https://marquez.co/statusfy, https://github.com/statsig-io/statuspage (with auto-monitoring)
  2. "Dynamic" status pages: dynamic pages, some even have auto-monitoring functions. May be more complex to set up.
    Options: https://upptime.js.org/, https://github.com/Monitorr/Monitorr, https://github.com/valeriansaliou/vigil

We'll have to determine which type of status page is best for us and whether we need any features like notifications through email, Matrix etc.

Rollout of kubernetes

As documented in #66, we need to grow beyond Docker Swarm. We have everything in place to begin moving to a Digital Ocean managed Kubernetes cluster.

Old ingress:
ingress.haiku-os.org -> limerick -> Docker Swarm -> Traefik v1.7

New ingress:
ingress.ams3.haiku-os.org -> DO Load Balancer -> Traefik v2.6 -> (pods)

Preparation:

  • Complete Traefik Ingress Controller
  • Complete Traefik TCP LB config for git / rsync / etc
  • Create scheduled downtime handler
  • Test longhorn storage RWX
  • Create migration script for repeat rsync from limerick to k8s
  • Test longhorn storage RWX with 3-k8s node recycle ❌ Not suitable #75
  • Test longhorn backup restoration ❌ Not suitable #75
  • Create scheduled CronJob to backup volumes to secure S3 bucket.
  • Test image updates and pod upgrades with ReadWriteOnce storage (vs ReadWriteMany)

Tasks:

  • Migrate SMTP, Migrate SMTP pv - Migrated via duplication
  • Migrate Gerrit, cgit, pv data - Done. Running on k8s
  • Deploy Postgresql14 database + Migrate roles - Done
  • Migrate Trac postgresql database, migrate trac
  • Migrate Pootle, i18n (@nielx working on)
  • Migrate Concourse
  • Migrate Redis, Discourse (BIG)
  • Migrate Haikuports, haikuports buildmaster, hpkgbouncer (BIG because of complexity + data size)
  • Migrate misc little things and tune

Future:

  • Investigate some RWX storage solutions. RWO limits our rolling / zero downtime update ability
  • Investigate running Gerrit with a scale > 1. Has locking issues when more than one instance runs on a single node.

IPv6 Support

Digital Ocean load balancers don't support IPv6. (Confirmed via their support)

Please vote for this one if you're interested: https://ideas.digitalocean.com/network/p/ipv6-for-load-balancers

As an alternative, Vultr (a competitor we evaluated) actually supports IPv6 on their load balancers. We went with Digital Ocean because of wider recognition, however with DO dragging on features like IPv6, maybe it's time to re-evaluate them?

[Change Request] Deploy HaikuDepot version 1.0.151

Description

Update HaikuDepot to version 1.0.151

How has the change been tested

Steps to implement the change

Note: Please mark changes from the default steps below in bold

  1. Verify that the image is available in the package registry.
  2. Start a job to backup the database:
    $ kubectl create job --from=cronjob/haikudepotserver-pgbackup haikudepotserver-pgbackup-manual-1.0.151
    
  3. Monitor the job to make sure it finishes correctly:
    $ kubectl logs -f jobs/haikudepotserver-pgbackup-manual-1.0.151
    Backup haikudepotserver...
    gpg: directory '/root/.gnupg' created
    gpg: keybox '/root/.gnupg/pubring.kbx' created
    Added `s3remote` successfully.
    `/tmp/haikudepotserver_2023-08-06.sql.xz.gpg` -> `s3remote/haiku-backups/pg-haikudepotserver/haikudepotserver_2023-08-06.sql.xz.gpg`
    Total: 0 B, Transferred: 245.32 MiB, Speed: 86.05 MiB/s
    Snapshot of haikudepotserver completed successfully! (haiku-backups/pg-haikudepotserver/haikudepotserver_2023-08-06.sql.xz.gpg)
    
  4. Apply any pre-deployment configuration changes (see section Configuration Changes)
  5. Update the version in the infrastructure repository in deployments/haikudepotserver.yml.
  6. Apply the update to the server:
    $ kubectl apply -f deployments/haikudepotserver.yml
    
  7. Apply any post-deployment configuration changes (see section Configuration Changes)
  8. Post-deployment checks (is the web service responding, can you refresh the data using the HaikuDepot app)
  9. Commit and push the updated deployment configuration to GitHub.
  10. Announce the update on the haiku-sysadmin and haiku mailing list.

Configuration Changes

Please list any configuration changes, and note whether they need to be done pre-deploy or post-deploy

None

Rollback Plan

If the update is unsuccessful, try rolling back the image with the following commands:

$ git restore deployments/haikudepotserver.yml
$ kubectl apply -f deployments/haikudepotserver.yml

If the update applied database transformations, or the database got corrupted in any other way, please also restore the database from the backup created as part of these update steps.

Investigate and monitor outage on Friday, June 9th.

We suffered an outage that @nielx resolved on June 9th, 2023. I feel like the root cause was a resource shortage, since we deployed the new Keycloak server a few days prior for SSO.

For now we're monitoring, however if it happens again we will likely need to grow our number of k8s nodes from 3 to 4 (incurring a ~$48 / month cost)

Grow beyond Docker Swarm

When we rolled our containerized infrastructure out in 2017, we chose Docker Swarm + Docker Compose due to its simplicity.

However, since then Kubernetes has grown by leaps and bounds to become the "standard" for more than 50% of cloud organizations. We use Rexray in our Docker Swarm to get "Kubernetes-like" managed persistent volumes... however Rexray is essentially a dead project, which creates risk.

The BIG downside is that cloud costs have been steadily increasing everywhere. A reasonably sized 3-node Kubernetes cluster for our workloads at Digital Ocean is around $144-$240 / month, and DO managed load balancers are likely extra as well. (Today we spend ~$124.03 / month on one single big node and a small IPFS gateway node in Germany.)

Pros:

  • More standardized stack
  • Limitless scaling. We could fire up a few additional K8S nodes before big releases like R1
  • Worker nodes can be rebooted with limited impact to running services
  • We gain 100% uptime upgrades (badly needed for things like software repositories... though it may not matter as much with IPFS?)

Cons:

  • Cost (more smaller nodes over one larger node)

Other solutions like Hashicorp Nomad have been suggested... but I don't know these solutions well enough to make an informed decision. AWS + GCP + Digital Ocean + Vultr all support managed K8S, and none of them support managed Nomad.

The cheaper solution would be going back to dedicated server instances or colocation running Kubernetes... but these all require manpower to manage (which we're short on).

tldr; Does the increase in price justify the improved ease of management?

Investigate internal oauth server

review.haiku-os.org is dependent on Github.

Since git access is critical to our infrastructure, we may want to investigate deploying our own oauth server at some point. We could begin to tie various services back to this common authentication gateway.

Github was an easy solution for the short term, but it makes us dependent on an external 3rd party service... are we ok with this?

concourse: re-runs builds each day even when there are no new commits on the git repos.

Not sure if this is the correct place to bring this up. I might be reading things the wrong way... and this is not actually an issue. If that's the case, I apologize in advance.

While nosing around https://ci.haiku-os.org/ and https://cgit.haiku-os.org/haiku/log/, I've noticed that... despite the last commit on the beta3 branch being from 2022-07-12, concourse seems to be creating new builds daily for that branch (both for 64 bits and 32 bits).

I admit that I might be reading it wrong, but logs like those for build 478 and older ones like build 445, to use two examples... seem to be rebuilding things for the same git commit reference.

As this seems quite wasteful to my untrained eye... I thought it made sense to report it. Again, if this is just "working as expected"... please disregard my intrusion.

Buildbot is *unbelievably* slow after adding the 4 new workers

Running htop on Maui shows that it uses around 5-10% CPU for the main process, so it must be blocked on something else. Even the backend is slow: it was taking 30 seconds to start the next task after completing the prior one in some cases.

It looks like we still use SQLite for the database: https://github.com/haiku/infrastructure/blob/master/data/buildbot/master.cfg#L59

And per the documentation, it looks like quite a lot of stuff is stored in the database, and virtually all pages and build status data are stored in and accessed from it: http://docs.buildbot.net/latest/developer/database.html

Indeed, they have some tickets about it: buildbot/buildbot#3002

It looks like they support using a Postgres database instead. We should do that ASAP.
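A rough sketch of the first step, assuming we stand up a dedicated PostgreSQL database and role for Buildbot and then point master.cfg's db_url at it (the role name and password below are placeholders, not our actual configuration):

# Create a dedicated role and database for the buildbot master
psql -U postgres -c "CREATE ROLE buildbot WITH LOGIN PASSWORD 'changeme';"
psql -U postgres -c "CREATE DATABASE buildbot OWNER buildbot;"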

Install Matomo for web analytics

https://github.com/matomo-org/docker

We should probably install it as https://metrics.haiku-os.org/matomo/, if we put it as analytics.haiku-os.org then some adblockers will block it for being called "analytics" (lol.)

It needs MySQL and serves via FastCGI, which is a bit of a pain. Maybe we should add a "generic nginx" container instead of spinning up nginxes for every individual thing that needs fastcgi? Or go vote for traefik/traefik#753.

Setup + deploy maui server

A running list of what needs to be done to complete the baron -> maui transition
Setup

  • Purchase Maui from Hetzner
  • Configure sysadmins
  • Setup docker + tools + volumes
  • haiku/infrastructure git repo

Deploy git/cgit
aka http://git.haiku-os.org @ vmrepo
Converting from bare ssh git + cgit to Gerrit.

  • Develop and deploy gerrit + cgit containers
  • Persistent volume structure
  • Properly route to using nginx configurations
  • Github OAuth integration
  • Convert git hooks to something that Gerrit can execute
  • Test Gerrit workflow
  • Ensure all committers can access the new git service and have proper permissions
  • Advertise the new git services to committers

Deploy trac
aka https://dev.haiku-os.org @ vmdev

  • Deploy trac container
  • Properly route to using nginx configurations

Deploy ports-mirror server
aka https://ports-mirror.haiku-os.org @ baron

  • Develop + deploy ports-mirror container
  • Properly route to using nginx configurations

Deploy pootle/userguide server
aka https://i18n.haiku-os.org @ vmdev

  • Develop + deploy pootle + userguide containers
  • Properly route to using nginx configurations

Deploy buildbot
was aka https://buildbot.haiku-os.org @ baron
new aka https://build.haiku-os.org @ maui

  • deploy buildbot container
  • Properly route to using nginx configurations

Deploy haikudepot
aka https://depot.haiku-os.org @ vmrepo

  • Develop + deploy haikudepot container
  • Properly route to using nginx configurations

Deploy buildmaster
aka https://vmpkg.haiku-os.org @ vmpkg
Risk: Not sure how haikuporter buildmaster is going to work in a container

  • Develop + deploy buildmaster container
  • Properly route to using nginx configurations

Deploy discourse
aka https://discuss.haiku-os.org @ vmsite
Do we want to move to discourse hosted version?

Clean up baron

  • Decommission vmdev
  • Decommission vmrepo
  • Decommission vmsite
  • Decommission vmpkg
  • Decommission baron (yay!)

Planning: Container deployment and grooming

We need to better define how containers are deployed and groomed.

Today we use docker-compose to manage our containers. We might want to look into other solutions.

Requirements:

  • Ensuring the proper containers are running
  • Ensuring the proper number of containers are running
  • Ensuring containers are started with the correct volumes
  • Ensuring containers are started with the correct ports + ips
  • End users should be able to deploy to docker installed on their desktops
  • Ensuring all requirements above are documented for each container.

cgit: Update to v1.2.4 Once Released

The version of cgit that we are running is now about 2-3 years old, and is internally making use of a git version that is, correspondingly, 2-3 years old. We are currently running cgit v1.2.1 while the latest is v1.2.3.

While we could upgrade now, I see that a new release may be just around the corner. It may be worth waiting for this update, as it would bring the internal git version from 2.25.1 (released more than a year ago) to 2.31.0 (the latest release).

Update to Trac 1.4

Trac 1.4 is due to come out at the end of the month, so it is a good moment to see where we are.

Check support for the modules that we use:

  • link-haiku-cgit.py
  • TracAccountManager (version 0.5.dev0) (updated to 0.5.1.dev0, will require Genshi)
  • TracMasterTickets (4.0.0.dev0) (updated to 4.0.2)
  • TracPoll (0.4.0.dev0) (does not depend on Genshi)
  • TracRobotsTxt (2.1)
  • TracSpamFilter (1.2.1.dev0) (needs an update on 1.3/1.4, see https://trac.edgewall.org/browser/plugins/trunk/spam-filter)
  • TracSubcomponents (1.2.1) (updated to 1.3.0)
  • TracVote (0.7.0.dev0) (should work, does not depend on Genshi)

After verifying the support of the modules (and preparing the upgraded versions), the steps would be the following:

  1. Create a new image for Trac 1.4 (or 1.3.6 during testing)
  2. Set up a temporary subdomain (dev-next)
  3. Clone the docker volume (see the sketch after this list)
  4. Clone the database (see the sketch after this list)
  5. Set up the trac 1.4 image for dev-next
  6. Set up the database connection string on the dev-next image
  7. Update the plugins on dev-next
  8. Run trac-admin upgrade on dev-next
  9. Start the image
  10. Test!
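For steps 3 and 4, a minimal sketch of one way to do the cloning (volume and database names are placeholders, not our actual configuration):

# Clone the trac docker volume into a new volume for dev-next
docker volume create trac-data-next
docker run --rm -v trac-data:/from -v trac-data-next:/to alpine sh -c "cp -a /from/. /to/"

# Clone the database (assuming PostgreSQL and sufficient privileges)
createdb trac_next
pg_dump trac | psql trac_next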

Finish i18n migration

  • Pootle: Have the build fix for musl reviewed (nielx)
  • Pootle: Set up repository push access for updated translations
  • Pootle: Check the synchronization script
  • Userguide: Check the functionality of the userguide tool
  • Userguide: Check 404
  • Pootle: Set up a weekly cron job to synchronize everything
  • Pootle/userguide: use pg_dump to periodically dump the db

Test longhorn fault tolerance

Our storage should be durable enough to withstand a rolling k8s node outage. This means longhorn should replicate data to other nodes in the cluster when a k8s node is cordoned off.

For longhorn to pass testing for Haiku Infrastructure, it needs to prove it can withstand a rolling outage of each node in the cluster.

Test parameters:

  • Add 20 GiB of data to longhorn
  • Recycle each node pool node in DO (with a standard cordon process; see the commands after this list)
  • Monitor that longhorn replicates data to other nodes (while stalling the node shutdown)
  • Perform this on each node to prove full redundancy is maintained
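For reference, the standard cordon / drain sequence used during a node recycle (node names are placeholders; exact drain flags may vary with the kubectl version):

kubectl cordon (NODE-NAME)
kubectl drain (NODE-NAME) --ignore-daemonsets --delete-emptydir-data
# ...recycle the node, then once it rejoins the cluster:
kubectl uncordon (NODE-NAME)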

Pootle: import catkeys from buildbot

The current Pootle image is large (1.2 GB), because of all the build tools that are included in the image. In the previous setup the catkeys were generated by the build bots, and then imported into the system. I wish to return to this situation.

Solving this would mean defining a way to import the files.

ports-mirror restart

The ports-mirror python script has been exiting (and causing the container to restart over and over) due to a bug.

Fetching origin
Fetching origin
Already up to date.
updating haikuports.git/master
2018-05-23T13:58:50
	cdrtools-3.02~a09.recipe => }.tar.bz2 (http://downloads.sf.net/cdrtools/cdrtools-${portVersion/\~/}.tar.bz2)
* unable to download }.tar.bz2 - curl gave resultcode 22
2018-05-23T13:58:51
	dragonmemory-1.recipe => DragonMemory-1-source.tgz (http://cznic.dl.sourceforge.net/project/dragonmemory/DragonMemory-source.tgz)
* unable to download DragonMemory-1-source.tgz - curl gave resultcode 6
2018-05-23T13:58:51
	dragonmemory-1.recipe => DragonMemory-1-source.tgz (http://heanet.dl.sourceforge.net/project/dragonmemory/DragonMemory-source.tgz)
* unable to download DragonMemory-1-source.tgz - curl gave resultcode 22
2018-05-23T13:58:51
	abe-1.1.recipe => abe-1.1.tar.gz (http://superb-dca3.dl.sourceforge.net/project/abe/abe/abe-1.1/abe-1.1.tar.gz)
* unable to download abe-1.1.tar.gz - curl gave resultcode 6
2018-05-23T13:58:51
	cd-5.8.recipe => cd-5.8_Sources.zip (http://heanet.dl.sourceforge.net/project/canvasdraw/5.8/Docs%20and%20Sources/cd-5.8_Sources.zip)
* unable to download cd-5.8_Sources.zip - curl gave resultcode 22
2018-05-23T13:58:51
	agg-2.5.recipe => agg-2.5.tar.gz (http://gnashdev.org/tools/ltib/agg-2.5.tar.gz)
* unable to download agg-2.5.tar.gz - curl gave resultcode 22
2018-05-23T13:58:52
	aiksaurus-1.2.1.recipe => aiksaurus-1.2.1.tar.gz (http://switch.dl.sourceforge.net/project/aiksaurus/aiksaurus/1.2.1/aiksaurus-1.2.1.tar.gz)
* unable to download aiksaurus-1.2.1.tar.gz - curl gave resultcode 6
2018-05-23T13:58:52
	libpaper-1.1.24.recipe => libpaper_1.1.24.tar.gz (http://ftp.de.debian.org/debian/pool/main/libp/libpaper/libpaper_1.1.24.tar.gz)
* unable to download libpaper_1.1.24.tar.gz - curl gave resultcode 22
2018-05-23T13:58:52
	xpdf-4.00.recipe => xpdf-4.00.tar.gz (http://www.xpdfreader.com/dl/xpdf-4.00.tar.gz)
* unable to download xpdf-4.00.tar.gz - curl gave resultcode 22
2018-05-23T13:58:52
	xpdf-3.04.recipe => xpdf-3.04.tar.gz (http://mirror.ctan.org/support/xpdf/xpdf-3.04.tar.gz)
* unable to download xpdf-3.04.tar.gz - curl gave resultcode 22
Traceback (most recent call last):
  File "/usr/local/bin/update-ports-mirror", line 137, in <module>
    updateFromCheckout(gitRepoDir)
  File "/usr/local/bin/update-ports-mirror", line 87, in updateFromCheckout
    os.mkdir(targetdir)
OSError: [Errno 2] No such file or directory: '/ports-mirror/srv-www/aobook/aobook-haiku-${portVersion/_'
Fetching origin

[Change Request] Deploy Discourse version 3.0.6

Description

Update Discourse to version 3.0.6

How has the change been tested

Dev-tested by the container developer.

Steps to implement the change

Note: Please mark changes from the default steps below in bold

  1. Verify that the image is available in the package registry.
  2. Make the installation read-only using the Enable read-only button on the Admin/Backups page.
  3. Start a backup by using the Backup button on that page.
  4. Update the version in the infrastructure repository in deployments/discourse.yml.
  5. Apply the update to the server:
    $ kubectl apply -f deployments/discourse.yml
    
  6. Post-deployment checks (is the web service responding, is the site read-write again)
  7. Commit and push the updated deployment configuration to GitHub.
  8. Announce the update on the haiku-sysadmin and haiku mailing list.

Configuration Changes

Please list any configuration changes, and note whether they need to be done pre-deploy or post-deploy

None

Rollback Plan

If the update is unsuccessful, try rolling back the image with the following commands:

$ git restore deployments/discourse.yml
$ kubectl apply -f deployments/discourse.yml

If the update applied database transformations, or the database got corrupted in any other way, use Discourse's built-in database restore features to return the data to the previously saved version.

refactor ports-mirror to use remote object storage

Ports-mirror rummages around in Haikuports and makes backups of sources for our packages. It's handy, however it uses a considerable amount of locally bound resources and storage.

https://github.com/haiku/infrastructure/tree/master/containers/ports-mirror

ports-mirror really needs to be refactored to use generic s3 buckets (aka Wasabi) for storage. This would enable us to archive large amounts of data cheaply, while keeping the storage dependencies on our own infrastructure small and agile.

Ideas:

  • While archiving ports, store the date, time, source, sha256, and recipe as metadata in s3 (see the sketch after this list)
    • sha256 in metadata is important so we can compare what the recipe expects against what is stored in s3.
  • Ideally, we should be able to point haikuporter directly at the s3 storage provider to reduce bandwidth within our infrastructure.
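A rough sketch of the idea using the AWS CLI against a Wasabi endpoint (the bucket name, paths, and metadata keys are placeholders, not our actual configuration):

# Compute the checksum, then upload the source archive with metadata attached
sha256=$(sha256sum cdrtools-3.02a09.tar.bz2 | cut -d' ' -f1)
aws s3 cp cdrtools-3.02a09.tar.bz2 \
  s3://ports-mirror/sources/cdrtools-3.02a09.tar.bz2 \
  --endpoint-url https://s3.wasabisys.com \
  --metadata recipe=cdrtools-3.02~a09.recipe,sha256=$sha256,fetched=$(date -u +%FT%TZ)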

No longer store nightlies on walter

Previously:

  • buildbot workers were modified to upload nightlies directly over s3 to walter.
  • generate-download-pages was modified to parse s3 buckets directly for nightly images

Given the two facts above, nothing is technically stopping us from "removing the nightlies from walter" and only hosting them on Wasabi. (We could keep the sha256sums on walter, though.)

This would:

  • Reduce the space requirements of our core infrastructure from "1TiB or more" to "a few hundred GiB"
  • we could have a few core developers mirror the nightly repository as a long-term backup.

We are already mirroring the nightly images from walter to wasabi, and the cost is roughly $4.99 / month

buildbot.haiku-os.org certificate expired.

Buildbot's certificate appears to have expired:

NET::ERR_CERT_DATE_INVALID
Subject: buildbot.haiku-os.org
Issuer: Let's Encrypt Authority X3
Expires on: Apr 8, 2018
Current date: Apr 16, 2018

Move haiku repos over to object storage / cdn

The Wasabi object storage has been a pretty reliable solution for what we already host there. We should consider hosting our package repositories on object storage as well.

Limitations:

  • s3 doesn't support symlinks, "current" points to a versioned repo
  • We shouldn't release haiku with a repo configuration which 100% depends on an external vendor.

To solve this, it would be nice if we had a tool / application which could provide haiku users "HTTP 302" redirects to external object storage / s3.

Git hooks async due to gerrit bug

Due to https://bugs.chromium.org/p/gerrit/issues/detail?id=5514 , the git hooks are running as ref-updated vs ref-update.

  • ref-updated == fires properly on "push" and on "review accept"
  • ref-update == fires properly on "push", but doesn't seem to fire on "review accept"

Once the bug linked above is resolved, and our gerrit container is updated to the fixed version, we should investigate moving back to the more proper ref-update (it will require some minor modifications to ref-update since the flags differ)

[Change Request] Deploy HaikuDepot version 1.0.149

Description

Update HaikuDepot to version 1.0.149

How has the change been tested

  • Dev-tested by the HaikuDepot dev.
  • Automated tests run by the Github action

Steps to implement the change

Note: Please mark changes from the default steps below in bold

  1. Verify that the image is available in the package registry.
  2. Start a job to backup the database:
    $ kubectl create job --from=cronjob/haikudepotserver-pgbackup haikudepotserver-pgbackup-manual-1.0.149
    
  3. Monitor the job to make sure it finishes correctly:
    $ kubectl logs -f jobs/haikudepotserver-pgbackup-manual-1.0.149
    Backup haikudepotserver...
    gpg: directory '/root/.gnupg' created
    gpg: keybox '/root/.gnupg/pubring.kbx' created
    Added `s3remote` successfully.
    `/tmp/haikudepotserver_2023-08-06.sql.xz.gpg` -> `s3remote/haiku-backups/pg-haikudepotserver/haikudepotserver_2023-08-06.sql.xz.gpg`
    Total: 0 B, Transferred: 245.32 MiB, Speed: 86.05 MiB/s
    Snapshot of haikudepotserver completed successfully! (haiku-backups/pg-haikudepotserver/haikudepotserver_2023-08-06.sql.xz.gpg)
    
  4. Apply any pre-deployment configuration changes (see section Configuration Changes)
  5. Update the version in the infrastructure repository in deployments/haikudepotserver.yml.
  6. Apply the update to the server:
    $ kubectl apply -f deployments/haikudepotserver.yml
    
  7. Apply any post-deployment configuration changes (see section Configuration Changes)
  8. Post-deployment checks (is the web service responding, can you refresh the data using the HaikuDepot app)
  9. Commit and push the updated deployment configuration to GitHub.
  10. Announce the update on the haiku-sysadmin and haiku mailing list.

Configuration Changes

Please list any configuration changes, and note whether they need to be done pre-deploy or post-deploy

None

Rollback Plan

If the update is unsuccessful, try rolling back the image with the following commands:

$ git restore deployments/haikudepotserver.yml
$ kubectl apply -f deployments/haikudepotserver.yml

If the update applied database transformations, or the database got corrupted in any other way, please also restore the database from the backup created as part of these update steps.

Offer rsync of haikuports

Haikuports lives in its own world outside of Haiku's core infrastructure:

So, the rsync services used by everything else don't offer up haikuports (the thing we need mirrors of most).
We need to figure out a way to mount the haikuports repos into the repo-sync container so haikuports can be mirrored like everything else.

Gerrit 2.16 EOL

According to the Release Plan of Gerrit 3.2, the 2.16 release is EOL. This means we need to start looking into updating Gerrit at some point.

The possible targets could be:

  • Gerrit 3.0
  • Gerrit 3.1
  • Gerrit 3.2

Initial to do list:

  • Review whether the target version works in WebPositive
  • Create a list of plugins and review whether they work in the target version
  • Review all repository-customizations (i.e. hook scripts) for possible incompatibilities
  • Create an update plan, including backup
  • Update!
