gdcc / dataverse-kubernetes Goto Github PK

Simple to use Dataverse container images and Kubernetes objects

License: Apache License 2.0

Dockerfile 34.20% Shell 65.80%

container containerization dataverse docker hacktoberfest k8s kubernetes kubernetes-cluster kubernetes-deployment kustomize

dataverse-kubernetes's People

Contributors

Stargazers

Watchers

dataverse-kubernetes's Issues

Test running on k3s and micro-k8s

As the title says...

Document env var handling for Java properties

Explain in README how environment variables are mapped to Java system properties in Glassfish and how this can be used for configuration.

Update to Dataverse 4.12

Upstream IQSS/dataverse has released v4.12 yesterday.

Update the docker images
Create job description to trigger inplace re-index (no Helm or Operator around yet)

Upgrade to Dataverse 4.15

Upstream released v4.15. New images needed 😄

Looks like we need a new Solr image, too, as the schema changed. Release notes should mention reindexing via job. Refer to upstream release docs about username issues.

Upgrade to Dataverse 4.13

Upstream released Dataverse v4.13 yesterday. New images needed... 😉

There are no special actions required after the upgrade according to release docs.

ConfigMap for Dataverse is not used by bootstrapping job

Change the K8s definition of the bootstrap job to dump ConfigMap dataverse into environment variables during bootstrapping.

Look into multi instance installations

Driving a Dataverse multi-instance installation has some edges and needs to be examined carefully. Obviously, scaling with Kubernetes is super-easy.

http://guides.dataverse.org/en/latest/installation/advanced.html#id1

glassfish/domains/domain1/docroot is not persistent (dataverse logos)

The dataverse logos files are stored in glassfish/domains/domain1/docroot/logos, but is not persistent.
If, for some reason, the dataverse pod is re-created, the logos are gone:

bash-4.2$ ls -la /opt/dataverse/appserver/glassfish/domains/domain1/docroot
total 24
drwxr-xr-x 1 glassfish glassfish 4096 Jun 11 07:28 .
drwxr-xr-x 1 glassfish glassfish 4096 Jun  6 13:20 ..
-rw-r--r-- 1 glassfish glassfish 4626 Aug 21  2014 index.html
drwxr-xr-x 4 glassfish glassfish 4096 Jun 11 07:28 logos

After re-creation of the dataverse pod:

bash-4.2$ ls -la /opt/dataverse/appserver/glassfish/domains/domain1/docroot
total 20
drwxr-xr-x 2 glassfish glassfish 4096 Aug 21  2014 .
drwxr-xr-x 1 glassfish glassfish 4096 Jun 11 09:12 ..
-rw-r--r-- 1 glassfish glassfish 4626 Aug 21  2014 index.html

Missing permissions on data directory

Data is persisted at mountpoint /data which is backed by a volume mount.

By default, this is owned by root in Kubernetes, thus we need an init container setting it to the dataverse user. Otherwise we cannot save files.

Create container image using Payara 5

This relates to IQSS/dataverse#5292, IQSS/dataverse#4172, IQSS/dataverse#4248 and IQSS/dataverse#2628.

Today payara/docker-payaraserver-full#89 has been merged and a payara/server-full:5.192 image has been released. The journey of upstream transition from Glassfish4 town to Payara5 promised land should be supported in adding an experimental Dockerfile running Dataverse on the upstream image.

Create S3 credentials file from K8s Secret

Until upstream issue IQSS/dataverse#5733 is resolved, a workaround is needed.

Add scheduled image builds for improved security

Images should be updated regularly when the base image (currently CentOS) is updated.

Linked builds on Docker Hub only work for non-official images (they might have a lack of resources...), so we need to add this ourselfs.

First idea: use Travis CI, they offer scheduled builds OOB.

Deployment to Minikube fails with weird errors

@donsizemore and @wilkos-dans experienced issues when deploying Dataverse 4.14 to Minikube.

No real or helpfull errors were logged (except the normal exceptions about Rserve password alias and code scanning), the DAS just went away.

I could reproduce this with default Minikube config CPU=2, MEM=2048 on KVM2 driver. This seems to be memory related, as my API server had been killed, too.

After using CPU=2, MEM=4096 I could successfully deploy, but now I experience issues with the bootstrapping script.

Dataverse is really memory hungry:

> kubectl top pod -A | sort -h -k4
NAMESPACE     NAME                               CPU(cores)   MEMORY(bytes)   
kube-system   kube-addon-manager-minikube        14m          5Mi             
kube-system   coredns-fb8b8dccf-pztgf            18m          8Mi             
kube-system   coredns-fb8b8dccf-wsx8f            16m          9Mi             
kube-system   kube-proxy-xhtbh                   5m           12Mi            
kube-system   kube-scheduler-minikube            8m           12Mi            
kube-system   storage-provisioner                1m           14Mi            
kube-system   heapster-64b4q                     1m           19Mi            
kube-system   influxdb-grafana-xcf5c             12m          40Mi            
kube-system   kube-controller-manager-minikube   95m          41Mi            
kube-system   etcd-minikube                      99m          44Mi            
default       postgresql-668dccf4cd-pzq5n        12m          51Mi            
kube-system   kube-apiserver-minikube            145m         186Mi           
default       solr-6ddd95c86d-plm8f              5m           227Mi           
default       dataverse-799cd744fd-sfmdl         16m          996Mi           

> kubectl top node                
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube   805m         26%    2817Mi          77%

Update kubeval to v0.10.0

Currently using kubeval-0.7.3, we should be using v0.10.0 instead.

Use any commit or version of Dataverse in container and Kubernetes

Currently, the upstream project does not contain up-to-date Docker images nor does IQSS provide officiall support these installation adventures (which is why this project came to live in the first place).

With the current rise of attention for salvation of IQSS/dataverse#4172, a demand to try things on K8s is rising, too. In addendum, it should be possible to develop, run and test any commit or code change on containers (enhanced by K8s), locally or within CI.

A current solution to this is usage of IQSS/dataverse-ansible in combination with ec2-create-instance.sh. This is fine, but requires an Amazon EC2 account and payment. Also, DataverseEU is going to run things on K8s and the word goes it might be a bad idea to have too much differences between development and production (coming from the principles of CD), especially when talking about technology stack.

Thus, a workflow is needed, which preferably is adaptable to different scenarios and use cases.
To all readers: any feedback is welcome on this. Please join the discussion.
Pinging @pdurbin, @4tikhonov, @donsizemore, @pameyer here.

Ease deployments on different platforms

Inspired by a video conference with @wilkos-dans @4tikhonov and @pdurbin I thought about providing reusable terraform scripts for cross-plattform deployments.

This could include instructions to setup a K8s cluster and necessary services.

Solr is not available after deployment because of wrong permissions

When starting the Solr pod, it depends on the storage provider, if the mounted /data directory is writable by the Solr user.

On Minikube, this hasn't been an issue. Being on a real Kubernetes cluster, this leads to errors:

o.a.s.s.HttpSolrCall null:org.apache.solr.core.SolrCoreInitializationException: SolrCore 'collection1' is not available due to init failure: java.nio.file.AccessDeniedException: /opt/solr/server/solr/collection1/data/snapshot_metadata

Fix failing kubeval lint due to empty ConfigMap

The dataverse ConfigMap should not contain an empty data tag, as this violates the K8s schema.

Tedious JVM option names should be cleaned up upstream

The JVM options for configuring Dataverse basic system properties are non-uniform and contain chars like dashes, which are not allow for environment vars.

Relevant place to look how the environment cars are handled, see code here.

The upstream should be changed to have proper names with aliases for the current names (so no existing installation breaks).

Currently, the upstream contains no options with _ in them, but the dev docs should definitly point out that this is a reserved character and should not be used.

Add simple Apache Reverse Proxy with mod_auth_mellon

To easily test SAML authentication for Dataverse, a simple image, based on mod_auth_mellon should be created. That is because Shibboleth is much more complex than needed for this simple demo showcase.

It might even be relevant for simple production instances reusing a institution-only SAML IdP instead of a full grown federation like eduGain and the like.

Configuration of the Apache (and thus also the SP) should be easily done from a ConfigMap.
The image should be based on an official Apache image like https://hub.docker.com/_/httpd

Sample configuration should be done against https://samltest.id so it is usefull for a demo showcase.

Create a demo showing how to setup for S3 using Minio

As #28 has been resolved, a simple demo using Kustomize would be great, showing what to do when you want to connect your K8s deployed Dataverse to a S3 storage.

Define RAM limits/guarantees in K8s

The deployment descriptors should contain a reasonable limitations and guarantees for computing resources.

See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

Bootstrapping and deployment does not stick to Postgres connection details

Currently, the bootstrap job fails if the database name is not corresponding to the username and the ConfigMap does not contain a database name, but the Postgres database needs different values for database and user. The deployment fails in this case, too.

For easier usage, simply use the Secret in an intuitive way:

key "database" with the database name to use
key "username" with the username to use
key "password" to be used with special handling from a secret volume mount

The keys database and username are mapped to POSTGRES_DATABASE and POSTGRES_USER environment variables, but marked as optional so the deployment does not fail if these keys are not present (one might prefer adding the values to the ConfigMap).

Create a demo for using DCM

Hello,
It will be a great help having a demo on dataverse with DCM (data capture module).
We can quickly test or do the demo on this functionality.

Thanks in advanced,

Thanh Thanh

Upgrade to Dataverse 4.14

Upstream released v4.14. New images needed 😄

Upstream release docs state no special treatments, so this should be straight forward.

How to run on Elastic Container Service for Kubernetes (Amazon EKS)

@whorka and I were just chatting about how I've used minikube to spin up dataverse-kubernetes but I don't know how to spin it up in the cloud. I have access to EKS. Here are some screenshots:

The problem is I don't know what I'm doing in this EKS interface. Can we add some tips to the README so I can try dataverse-kubernetes on EKS? I'm happy to start with either the EKS web interface or the command line. When I spin up Dataverse on EC2 I use our handy ec2-create-instance.sh script: http://guides.dataverse.org/en/4.11/developers/deployment.html

Fix documentation links pointing to Docker Hub and make links to docs usable on Docker Hub

The links used in the docs pointing to the Docker Hub repos are only usable when logged in as a user. Replace with https://hub.docker.com/r/iqss/dataverse-k8s and similar.
Thanks @vsoch for pointing this out in IQSS/dataverse#4665

When using the README on Docker Hub, links to parts of the docs are broken because they are relative and Docker does not copy everything. Maybe links in the README should be changed to use links to GitHub instead.

Maybe the images should receive their own READMEs anyway. Have a look at that, too.

Create jobs with generated names and add labels

When applying jobs for configuring or reindexing Dataverse, using a static name leads to conflicts, when the former job has not been deleted.

Changing the metadata to generate a unique name for us will solve that.

Cleaning up jobs is still a leftover to the K8s admin or tooling. This is made easier when adding proper metadata labels to all jobs with sth. like dataverse and/or dataverse-job.

validation errors during kubectl apply

opening by @poikilotherm's request. trying today with minikube and kubectl 1.14 on Ubuntu 18.04, i see:

error: error validating "k8s/utils/postgresql/kustomization.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

error: error validating "k8s/solr/kustomization.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

error: error validating "k8s/dataverse/kustomization.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

I did choose to ignore the errors, and am waiting on Dataverse to initialize.

Dataverse K8s `Deployment` is missing `Secret` mounts for DOI and others.

At least the DOI secret should be mounted.

Maybe make the error message more descriptive about what happens and how dangerous this is (it isn't).

Create a release schema

To make image releases independent from Dataverse releases, a new schema needs to be implemented.

It needs to carry a release info for the image and should contain the Dataverse release used inside the image.

The docker tags need to reflect this - tagged images are not updated by default in Kubernetes, this is only done for "latest" or untagged.

Update base image to CentOS 7

Currently, the base image in use is centos:7.5 instead of centos:7.

This is due to possible problems lurking beyond an update described in IQSS/dataverse#5374.
It might be totally unrelated and just caused by the preparation scripts, but better safe than sorry.

This issue tracks the progress and is used for a switch test. Let's see how this goes.

Make jHove find the XSD schema (missing installer bit)

The original Dataverse installer changes a hard coded path in jHove.conf to the XSD schema.

This was missed out when creating the minimal install scripts for the containers and leads to errors like those seen in IQSS/dataverse#2429.

Support for Prometheus and Grafana

dataverse-ansible supports Prometheus now that IQSS/dataverse-ansible#96 has been merged.

@pmauduit are following up with this by adding collectd and Grafana on top and are leaving comments here on the configs we are using and the commands we are running: IQSS/dataverse-ansible#99

As of this writing the following metrics are being tracked:

CPU load
Memory
Glassfish heap usage

Here's a screenshot of how it looks.

With dataverse-ansible we are continuing to install everything on a single CentOS 7 box but here in Kubernetes land perhaps it makes sense to run the monitoring service in a separate pod/container.

Script to apply database configuration options

Currently, only JVM settings are configured on container startup. During the bootstrap job, a few minimal settings are set, so Dataverse is at least reachable and secured (API access).

This however is not sufficient for any further configuration. There needs to be a simple script, taking values from a ConfigMap and set those in the running instance. This should be separate from the bootstrapping script and idempotent, as it might/will be used frequently.

Living in the container image makes it easier to use - it is no configuration, so it should not live in a ConfigMap itself. It needs secrets from Kubernetes like the API unblock key and using the env var containing the correct address makes it even easier to use.

Bootstrapping fails on Minikube

When trying to Bootstrap on Minikube using kubectl apply -f k8s/dataverse/jobs/bootstrap.yaml, the pqsl call fails with "wrong password".

Need to investigate if this is only happening when using Minikube, etc. This hasn't been touched in a while and is all of a sudden broken...

Create docs for upstream pointing to this project

@pdurbin invited on IRC to create a PR at IQSS/dataverse to provide an adventure (no dices included) in http://guides.dataverse.org/en/latest/installation/prep.html#choose-your-own-installation-adventure

We should point people to this project, which is still in flux, but worth a try.

This might be a solution to IQSS/dataverse#4665, too... ;-)

Add notes about monitoring resources

Some quick first points:

Heapster (notice about deprecation/abandoned by upstream)
Minikube Heapster addon + usage
metrics-server
kubectl top

Catch mails for demo or dev and expose via web UI

Using sufficient tools to catch mails from Dataverse without much configuration or installation hassles would be a benefit.

Deploy dataverseAdmin password from Secret

Just discovered IQSS/dataverse#4178 and https://github.com/IQSS/dataverse/blob/2f477c9e733bdffb9421d5e51a7ce2c49d4073d5/scripts/api/setup-all.sh#L15

The bootstrap shell script should be enhanced to provide this password from a secret or default to admin.

Add documentation on necessary K8s configuration batch job

No documentation has been added on this very important job. As it is needed to apply any database based options, this should be done in the very near future.

Add documentation on "how does it work"

Create an easy to follow diagram via https://g.gravizo.com to show people, how things are connected.

Restart Glassfish Container if deployment fails

In case the autodeploy fails (e. g. due to no connection to the database), the container should be restarted to try again.

Maybe a K8s healthcheck to kill and recreate?

UML diagram enhancements

Bootstrapping Job is indicating for what componentes it waits to be ready. This should be done for configuration job, too.
Dataverse and Solr deployment objects use an init-container to fix permissions on persistent volumes. Draw that...
End timeline of both Jobs when they are done, it might now be read as "they live on to eternity".
Enhance readability by adding vertical sections

Add link to Zalando AWS Kubernetes docs

In the AWS docs, a link to https://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html should be added as a best practice example for running production kubernetes clusters on AWS.

Move from Travis CI to Community Jenkins

The community uses https://jenkins.dataverse.org for different projects.
This project should move its CI testing from Travis to this service for sustainability.

Make BuiltinUsers.Key configurable from a K8S `Secret`

Relates to #16

Currently the only secret variable not covered by a Kubernetes Secret is the BuiltinUsers.Key database setting, which is needed to create local users from the API.

This should be enhanced.

Make K8s descriptors easier to reuse via splitting up by type

Currently, all necessary things for a deployment are bundled together (Deployment, ConfigMap, PersistentVolumeClaim).

This is not easy to reuse when deploying with different needs or types of infrastructure (e.g. think about different storageClass or storage size for PVCs).

Because of that, the descriptors should be split up. Add documentation to the README for usage with Minikube (and maybe others).

After the split, one can add this project as upstream (either via subtree, submodule or imports) and symlink to things not needing to change. Other parts can be simply replaced with custom descriptors.

Upgrade to Dataverse 4.15.1

Upstream released v4.15.1, containing important fixes. New images needed 😄

No special things recommended or mentioned in release notes, so the ones from v4.15 will still apply.

Create a K8s Job for dataverse-sample-data

For testing and demo purposes, it will be usefull to have some sample data around.
Currently, upstream created https://github.com/IQSS/dataverse-sample-data as a solution to IQSS/dataverse#5235.

This can be used in a simple Kubernetes Job to populate with sample data.

gdcc / dataverse-kubernetes Goto Github PK

dataverse-kubernetes's People

Contributors

Stargazers

Watchers

Forkers

dataverse-kubernetes's Issues

Recommend Projects

Recommend Topics

Recommend Org