Comments (9)
Hi @dns2utf8
Interesting, I've never observed the operator deleting the newest jobs first.
By default k8up keeps 6 finished jobs around. If you delete the pod it won't delete the job because the pod is a child of the job and deletion propagation is from top to bottom. So deleting the job will delete the pod, too.
Also, each job is a child of a backup object (kubectl get backups). Also worth noting: if a backup run fails it will recreate the pods, too (I think 6 times by default). So if the backup was failing and retries happened, a newer pod could actually belong to an older backup job, giving the impression that it deleted the newer jobs first.
The relationship between the objects is: schedule -> backup -> job -> pod. Deleting a parent will delete its children.
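You can check that chain directly on a live cluster via ownerReferences. A small sketch (the job name is one from this thread and won't exist in your cluster; adjust namespace and name):

```shell
# Which object owns the job? (should print Backup/<backup-name>)
kubectl -n devops get job backupjob-1579018440 \
  -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'

# And the pod, in turn, is owned by the job (the Job controller sets the
# job-name label on its pods):
kubectl -n devops get pods -l job-name=backupjob-1579018440 \
  -o jsonpath='{.items[0].metadata.ownerReferences[0].kind}'
```

Kubernetes garbage collection follows these ownerReferences downward, which is why deleting a pod leaves the job intact but deleting the job removes the pod.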
Can you please try again and observe the backup objects in your namespace (for example: watch kubectl get backups) without manually deleting finished jobs?
Regards
Simon
from k8up.
Hi Simon
Something is very strange. I don't get any backups:
$ kubectl get backups
No resources found.
I cleaned the remaining jobs by hand an hour ago:
$ kubectl get jobs --all-namespaces | rg backupjob
devops backupjob-1579018440 1/1 46s 11m
myapp-staging backupjob-1579017060 1/1 13s 34m
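As an aside, the numeric suffix in those job names appears to be the unix timestamp of the scheduled run (it matches the 16:14:00 operator log time exactly), so you can map a job back to its schedule slot:

```shell
# Decode the unix-timestamp suffix of a k8up job name (GNU coreutils date).
ts=1579018440   # from backupjob-1579018440 above
date -u -d "@${ts}" '+%Y-%m-%d %H:%M:%S'
# 2020-01-14 16:14:00
```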
The pods still don't survive long enough:
$ kubectl get pods --all-namespaces | rg backupjob
$
The operator has the following logs during the same time:
2020/01/14 16:14:00 [INFO] scheduled-backup-schedule-devops-1579018440 for repo s3:http://s3.bucket.internal:9000/k8up is queued waiting for jobs [Prune Check] to finish
2020/01/14 16:14:00 [INFO] All blocking jobs on s3:http://s3.bucket.internal:9000/k8up for scheduled-backup-schedule-devops-1579018440 are now finished
2020/01/14 16:14:00 [INFO] New backup job received scheduled-backup-schedule-devops-1579018440 in namespace devops
2020/01/14 16:14:00 [INFO] Listing all PVCs with annotation appuio.ch/backup in namespace devops
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-minio doesn't have annotation, adding to list...
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-minio-old isn't RWX
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-postgresql annotation is false. Skipping
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-postgresql-old isn't RWX
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-prometheus-old isn't RWX
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-prometheus-server doesn't have annotation, adding to list...
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-redis doesn't have annotation, adding to list...
2020/01/14 16:14:00 [INFO] PVC devops/gitlab-redis-old isn't RWX
2020/01/14 16:14:00 [INFO] PVC devops/repo-data-gitlab-gitaly-0 doesn't have annotation, adding to list...
2020/01/14 16:14:00 [INFO] PVC devops/repo-data-gitlab-gitaly-0-old isn't RWX
2020/01/14 16:14:00 [INFO] devops/backupjob-1579018440 is running
2020/01/14 16:14:00 [INFO] devops/backupjob-1579018440 is running
2020/01/14 16:14:05 [INFO] devops/backupjob-1579018440 is running
2020/01/14 16:14:35 [INFO] devops/backupjob-1579018440 is running
2020/01/14 16:14:46 [INFO] devops/backupjob-1579018440 finished successfully
2020/01/14 16:14:46 [INFO] Cleaning up 11/21 jobs
2020/01/14 16:14:46 [INFO] Removing job scheduled-backup-schedule-devops-1578586440 limit reached
2020/01/14 16:14:46 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578586440
2020/01/14 16:14:46 [INFO] Removing job scheduled-backup-schedule-devops-1578615240 limit reached
2020/01/14 16:14:46 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578615240
2020/01/14 16:14:46 [INFO] Removing job scheduled-backup-schedule-devops-1578644040 limit reached
2020/01/14 16:14:46 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578644040
2020/01/14 16:14:46 [INFO] Removing job scheduled-backup-schedule-devops-1578658440 limit reached
2020/01/14 16:14:46 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578658440
2020/01/14 16:14:46 [INFO] Removing job scheduled-backup-schedule-devops-1578672840 limit reached
2020/01/14 16:14:46 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578672840
2020/01/14 16:14:47 [INFO] Removing job scheduled-backup-schedule-devops-1578701640 limit reached
2020/01/14 16:14:47 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578701640
2020/01/14 16:14:47 [INFO] Removing job scheduled-backup-schedule-devops-1578730440 limit reached
2020/01/14 16:14:47 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578730440
2020/01/14 16:14:47 [INFO] Removing job scheduled-backup-schedule-devops-1578744840 limit reached
2020/01/14 16:14:47 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578744840
2020/01/14 16:14:47 [INFO] Removing job scheduled-backup-schedule-devops-1578759240 limit reached
2020/01/14 16:14:47 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578759240
2020/01/14 16:14:47 [INFO] Removing job scheduled-backup-schedule-devops-1578788040 limit reached
2020/01/14 16:14:47 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578788040
2020/01/14 16:14:47 [INFO] Removing job scheduled-backup-schedule-devops-1578932040 limit reached
2020/01/14 16:14:47 [INFO] Cleanup backup scheduled-backup-schedule-devops-1578932040
Hope this helps.
Best regards,
Stefan
Hi
Thanks for the details!
Something is very strange. I don't get any backups:
Backups are namespaced, so you'll need --all-namespaces if you're in the wrong one.
Is the operator currently processing any schedules? If not, it may be stuck; try restarting it. I think you're using an older release (the Helm charts haven't been updated with the newest releases yet) and are experiencing some deadlocking in the operator. These issues should be fixed in newer releases. It's still in somewhat early development ;)
Your log snippet shows the operator deleting the backups in ascending timestamp order, starting with the oldest one (lowest timestamp). So according to the logs it's doing it the right way around.
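The cleanup order is reproducible with a plain numeric sort: take the job timestamps, sort ascending, and delete from the front until only the keep limit remains. A sketch using a few of the timestamps from the log in this thread (the keep limit here is chosen for illustration, not k8up's default):

```shell
# Sketch of the cleanup ordering: oldest timestamps (smallest numbers) go first.
timestamps="1578932040 1578586440 1578744840 1578615240"
keep=2
total=$(echo $timestamps | wc -w)
# Sort ascending and select the oldest (total - keep) entries for deletion:
printf '%s\n' $timestamps | sort -n | head -n $((total - keep))
# 1578586440
# 1578615240
```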
Good morning
Backups are namespaced, so you'll need --all-namespaces if you're in the wrong one.
My bad, I ran it again just now. Sadly I don't get any k8up backups, only Velero ones:
$ kubectl get backups --all-namespaces
NAMESPACE NAME AGE
velero all-namespaces-stateless-20191218123559 27d
...
velero all-namespaces-stateless-20200115003819 7h40m
velero all-namespaces-stateless-20200115063819 100m
velero bi-hourly-stateless-20200113081718 2d
velero bi-hourly-stateless-20200113101718 46h
velero bi-hourly-stateless-20200113121718 44h
...
velero bi-hourly-stateless-20200115081719 118s
Right now k8up tries to run archive jobs that fail with: Error occurred: Bucket name contains invalid characters
Strangely enough, the PVC (snapshot-test-restore-test-mfw) it tries to archive is from a different namespace ...
I am killing the operator pod now and will see if it recovers.
Cheers,
Stefan
Hi
Can you show me the schedules you use for the backups/archives?
The archive job is per backup repository, not per namespace. The archive simply takes everything it finds in the repository and dumps the latest snapshot to the other S3 bucket, so that's expected behaviour. I recommend backing up no more than a few namespaces to the same S3 bucket; Restic has some limitations with mutually exclusive operations on the same bucket.
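Conceptually, an archive run behaves roughly like a restore of the latest snapshot followed by a tarball upload. A hand-written sketch, not k8up's actual code (repository URL taken from the logs in this thread; target paths are made up):

```shell
# Rough sketch of what an archive amounts to (not k8up's implementation):
# restore the latest snapshot from the repository, then pack it up.
export RESTIC_REPOSITORY='s3:http://s3.bucket.internal:9000/k8up'
restic restore latest --target /tmp/archive-dump
tar czf /tmp/archive.tar.gz -C /tmp/archive-dump .
# ...the tarball would then be uploaded to the (separate) archive bucket.
```

Because restic takes an exclusive lock on the repository for some operations, an archive restore and a prune against the same repository cannot run at the same time.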
Regards
Simon
Hi
This config is currently in two namespaces:
---
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
  name: schedule-tubee-staging
  namespace: myapp-staging
  #namespace: devops
spec:
  backend:
    s3:
      endpoint: http://s3.bucket.internal:9000
      bucket: k8up
      accessKeyIDSecretRef:
        name: backup-credentials
        key: username
      secretAccessKeySecretRef:
        name: backup-credentials
        key: password
    repoPasswordSecretRef:
      name: backup-repo
      key: password
  archive:
    schedule: '14 4 * * *'
    restoreMethod:
      s3:
        endpoint: http://s3.bucket.internal:9000/
        bucket: k8up
        accessKeyIDSecretRef:
          name: backup-credentials
          key: username
        secretAccessKeySecretRef:
          name: backup-credentials
          key: password
  backup:
    schedule: '5 9,10,11,12,13,14,15,16,17,18 * * *'
    keepJobs: 48
    #promURL: https://prometheus-io-instance
  check:
    schedule: '20 5 * * 1-5'
    #promURL: https://prometheus-io-instance
  prune:
    schedule: '14 5 * * 0'
    retention:
      keepLast: 25
      keepDaily: 14
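One thing worth checking in a schedule like this is how close the archive and prune slots sit, since both operate on the same restic repository. With these cron expressions the gap on Sundays is only an hour; a minimal arithmetic sketch:

```shell
# '14 4 * * *'  -> archive fires daily at 04:14
# '14 5 * * 0'  -> prune fires on Sundays at 05:14
archive_min=$((4 * 60 + 14))   # minutes after midnight
prune_min=$((5 * 60 + 14))
echo "$((prune_min - archive_min)) minutes between archive and prune on Sundays"
# 60 minutes between archive and prune on Sundays
```

If the archive run takes longer than that hour, it will still hold the repository when the Sunday prune is due.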
I am currently using the same bucket for everything and separate the jobs by time.
Do you think one archive job in e.g. the k8up-operator namespace would be the preferred setup?
Cheers,
Stefan
Hi @dns2utf8
I see that you use the same bucket for the backups and the archives. The archive function is intended for long-term archival of data. It works by doing actual restores into tar.gz files and uploading them to another bucket (for example one backed by Glacier), so it will use quite a lot of space and isn't really compatible with the Restic repository format. Do you really need it to run on a daily basis? What do you want to accomplish?
Yes, it would make sense to set up the archive in only one namespace so that only one runs. Otherwise the archive could potentially block the prune job that runs an hour later, because the archive can take a lot of time and is mutually exclusive with the prune job.
Best Regards
Simon
Hi Simon
The initial idea was to have a daily snapshot of all the important data for LTS or disaster recovery.
The hourly restic backups during the day are there in case somebody deletes something by accident.
I solved the problems by deleting all the prune and archive jobs for now.
Tomorrow, the new strategy will be discussed.
Best,
Stefan
Hi Stefan
Well, in that case I'd solve it with appropriate retention rules for the prune instead of daily archives. I'd recommend monthly archives if that kind of data protection is needed for your use case.
If all other issues have been fixed we can close the ticket.
If you need advice on setting up schedules to avoid Restic's locking problems please open a new ticket. We have customers that use k8up for very large deployments with archival schedules, so we already worked out some quirks that come with that :)
Regards
Simon