Comments (6)
We are aware of this and are working to fix it. Multiple issues are open regarding this, so I will close this in favor of those.
from dvc.
This seems to be a long-standing issue. Is there an ETA for a fix or improvement to the `dvc add` performance?
Unfortunately, we don't have any ETA to share at the moment.
Duplicate of #8008, #7607, and #3177.
I have a similar issue with a dataset of ~50 GB, which takes hours whenever I add a few files. I hope this will be fixed soon.
@RadouaneK, if you are only adding a few files, you can pass the paths of the files you are modifying instead of the whole dataset.
E.g.:
data
└── file1
Instead of running `dvc add data`, you can run `dvc add data/file1`. Only those files will be updated. Similarly, you can pass a subdirectory if you are modifying a subdirectory of the dataset (e.g. `dvc add data/dir/subdir_or_file`).
See https://dvc.org/doc/user-guide/data-management/modifying-large-datasets.
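If you script this, the suggestion above can be sketched roughly as follows. This is only an illustration, not part of DVC: the helper names `build_dvc_add_command` and `add_changed` and their parameters are made up here, and the sketch assumes `dvc` is on your PATH and that the dataset directory is already tracked.

```python
import subprocess
from pathlib import PurePosixPath


def build_dvc_add_command(dataset_root, changed_paths):
    """Build a `dvc add` argv that targets only the changed paths.

    Hypothetical helper: `changed_paths` are given relative to
    `dataset_root`, e.g. "file1" or "dir/subdir_or_file".
    """
    if not changed_paths:
        raise ValueError("no changed paths given")
    root = PurePosixPath(dataset_root)
    # Join each relative path onto the dataset root: data + file1 -> data/file1
    targets = [str(root / p) for p in changed_paths]
    return ["dvc", "add", *targets]


def add_changed(dataset_root, changed_paths):
    """Run `dvc add` on the changed paths only (assumes dvc is installed)."""
    subprocess.run(build_dvc_add_command(dataset_root, changed_paths), check=True)


if __name__ == "__main__":
    # Only prints the command; call add_changed(...) to actually run it.
    print(" ".join(build_dvc_add_command("data", ["file1"])))
```

Building the command separately from running it makes it easy to check which targets will be passed before anything touches the cache.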
This seems to be a long-standing issue. Is there an ETA for a fix or improvement to the `dvc add` performance?