Comments (5)
How would an option to keep the last n versions be an improvement in that automated scenario?
If we want to guarantee that we always have the last n versions of datafiles, no matter how old they are, then relying on date does not work.
Example: if the job runs once a day and we want to always have the last two versions, then using --date 2_days_ago might fail. Consider this history:
day | status
----+----------------------------------------------
4   | successful, data files updated in commit a02
3   | job failed, no commit/updates
2   | job failed, no commit/updates
1   | successful, data files updated in commit a01
On day 4, running dvc gc --date 2 would keep only the latest version of the data files (day 4) and remove the version from day 1. @dberenbaum
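To make the difference concrete, here is a minimal sketch that replays the history above in plain Python. It does not call DVC; the names HISTORY, keep_by_date, and keep_last_n are made up for illustration, and the date rule is simplified to "keep commits made within the last max_age_days days".

```python
# Hypothetical replay of the example history: (day, commit), where commit is
# None on days when the job failed and nothing was committed.
HISTORY = [(1, "a01"), (2, None), (3, None), (4, "a02")]


def keep_by_date(history, today, max_age_days):
    """Date-based retention: keep commits made within max_age_days of today."""
    return [
        commit
        for day, commit in history
        if commit is not None and today - day <= max_age_days
    ]


def keep_last_n(history, n):
    """Count-based retention: keep the n most recent commits, however old."""
    commits = [commit for _, commit in history if commit is not None]
    return commits[-n:]


print(keep_by_date(HISTORY, today=4, max_age_days=2))  # only a02 survives
print(keep_last_n(HISTORY, n=2))                       # a01 and a02 survive
```

Under these assumptions, the date rule drops a01 because the two failed days pushed it past the cutoff, while the count rule keeps it.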
from dvc.
This is especially important if we want to do this in an automated script at a regular interval, say every week, where we don't know the commit history well enough to tune the command arguments.
If you want to do this on a regular interval, could you use dvc gc --date?
from dvc.
If you want to do this on a regular interval, could you use dvc gc --date?
That's a good idea; however, suppose this is a regular update job that changes the data files, and the latest run fails (for whatever reason). Then, with the --date option, the next run of this job will probably keep only the last version of the data files and remove the older ones. @dberenbaum
from dvc.
How would an option to keep the last n versions be an improvement in that automated scenario?
from dvc.
Thanks for the explanation! We don't have any mechanism now in DVC to look for which commits make DVC-tracked changes, so we would need to implement that first. I think it also raises a lot of questions. Do you want the last two commits, or the last two versions of every DVC-tracked file? Do changes to dependencies (like the code files used to generate DVC outputs) count, or should it only be DVC-tracked outputs?
Have you tried to implement something similar yourself, like using git log -- '*.dvc' to find commits where .dvc files changed and passing those revisions to dvc gc?
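A rough sketch of the first half of that suggestion, in Python: ask git for the last n commits that touched any .dvc file. The function names are hypothetical, and how the resulting hashes are handed to dvc gc is deliberately left out, since the thread does not say which options to use; check the dvc gc documentation before wiring that part up. Note that this only looks at .dvc files, as suggested above, so changes recorded elsewhere would need their own pathspec.

```python
import subprocess


def parse_hashes(log_output):
    """Split `git log --format=%H` output into a list of commit hashes."""
    return [line.strip() for line in log_output.splitlines() if line.strip()]


def commits_touching_dvc_files(n, repo_dir="."):
    """Return up to n most recent commit hashes that changed a .dvc file.

    Runs: git log -n <n> --format=%H -- '*.dvc'
    The pathspec is passed without a shell, so git expands the glob itself.
    """
    result = subprocess.run(
        ["git", "log", "-n", str(n), "--format=%H", "--", "*.dvc"],
        cwd=repo_dir,
        check=True,          # raise if git exits with an error
        capture_output=True,
        text=True,
    )
    return parse_hashes(result.stdout)
```

Calling commits_touching_dvc_files(2) inside a repository would give the two newest commits in which a .dvc file changed, regardless of how long ago they were made.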
from dvc.