Comments (9)
Now we can set `log_to_db=True`; the checkpoint will then be stored in the artifact store and the `adapter_id` will be updated in the metadata store. We can load the model after training completes.
from superduperdb.
We need to implement an S3 artifact store so that large files can be saved remotely later, and we need to support saving folders directly to the artifact store.
For now we zip the folder and save the archive to the artifact store, but that is not the best way to save large model files (maybe >1 GB).
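For reference, a minimal sketch of what "zip the folder and store the bytes" can look like with only the standard library. The helper names `zip_dir_to_bytes` and `unzip_bytes_to_dir` are made up for illustration and are not the project's actual code. Note that the whole archive is built in memory, which is one reason this approach scales poorly for large model directories.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def zip_dir_to_bytes(directory: str) -> bytes:
    """Pack every file under `directory` into an in-memory zip archive.

    Hypothetical helper: empty sub-directories are not preserved.
    """
    root = Path(directory)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the root so the archive is relocatable.
                archive.write(path, path.relative_to(root).as_posix())
    return buffer.getvalue()


def unzip_bytes_to_dir(data: bytes, target: str) -> None:
    """Restore an archive produced by `zip_dir_to_bytes` into `target`."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target)


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    (Path(src) / "adapter").mkdir()
    (Path(src) / "adapter" / "weights.bin").write_bytes(b"\x00" * 16)
    payload = zip_dir_to_bytes(src)
    print(f"archive size: {len(payload)} bytes")
```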
Idea: add "saving a directory" to `Document` / `artifact_store`.
@jieguangzhou to add a proposal for saving a list of artifacts in a model.
Original Artifact workflow
Saving
Input: `x`, where `x` is defined as an artifact in `_artifacts`
- `object.dict().encode()`
  - convert the artifact to Encodable → `Encodable(x)`
  - `Encodable(x).encode()` → `{'_content': xxx}`
- Save to artifact store
  - if `r['_content']['leaf_type'] == encodable`, save `r['_content']`
    - save `bytes` to the artifact store and delete `bytes`
- create or update the entry in `metadata_store`
Loading
- load `info` from `metadata_store`
- check `_content` in `info`, decode it, and load the artifact using `_content`
  - load `bytes` and use the datatype to decode it
- rename the key `_content.bytes` to `_content.x`
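A rough sketch of the bytes-only round trip described above, using plain dictionaries as stand-ins for the two stores. Everything here is an assumption made for illustration: `pickle` stands in for the real datatype encoder, and the `file_id` key and the `save` / `load` helpers are hypothetical names, not the library's API.

```python
import hashlib
import pickle

# Hypothetical in-memory stand-ins for the real stores.
artifact_store = {}   # file_id -> bytes
metadata_store = {}   # identifier -> info dict


def save(identifier, x):
    """Encode `x`, move its bytes to the artifact store, keep a reference."""
    # pickle is only a placeholder for the datatype's encoder.
    r = {"_content": {"leaf_type": "encodable", "bytes": pickle.dumps(x)}}
    content = r["_content"]
    if content["leaf_type"] == "encodable":
        data = content.pop("bytes")            # delete bytes from the record
        file_id = hashlib.sha1(data).hexdigest()
        artifact_store[file_id] = data         # save bytes to the artifact store
        content["file_id"] = file_id           # assumed reference key
    metadata_store[identifier] = r             # create or update the entry


def load(identifier):
    """Read the entry, fetch the bytes it points to, and decode them."""
    content = metadata_store[identifier]["_content"]
    data = artifact_store[content["file_id"]]
    return pickle.loads(data)


if __name__ == "__main__":
    save("my_model", {"weights": [0.1, 0.2]})
    print(load("my_model"))
```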
New Artifact workflow with saving a directory
New DataType instance: file
`x` is a file path or a directory
`encode(x)`: check that `x` exists and return `x`
`decode(x)`: check that `x` exists and return `x`
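A minimal sketch of such a file datatype, assuming it only validates the path and passes it through. The class name `FileDataType` and its `identifier` attribute are hypothetical and not taken from the codebase.

```python
import os


class FileDataType:
    """Hypothetical datatype whose encoded form is the path itself."""

    identifier = "file"

    def encode(self, x: str) -> str:
        # x may be a file or a directory; it only has to exist.
        if not os.path.exists(x):
            raise FileNotFoundError(x)
        return x

    def decode(self, x: str) -> str:
        # Same check on the way back: the path must exist locally.
        if not os.path.exists(x):
            raise FileNotFoundError(x)
        return x


if __name__ == "__main__":
    datatype = FileDataType()
    print(datatype.encode(os.getcwd()))
```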
Saving
Input: `x`, where `x` is defined as an artifact in `_artifacts`
- `object.dict().encode()`
  - convert the artifact to Encodable → `Encodable(x)`
  - `Encodable(x).encode()` → `{'_content': xxx}`
- Save to artifact store
  - if `r['_content']['leaf_type'] == encodable`, save `r['_content']`
    - if the datatype is the file datatype: `_save_path` (new function of `artifact_store`; copies the local file from the local file system to `artifact_store`)
    - else: `_save_bytes`
- create or update the entry in `metadata_store`
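The save-side dispatch could be sketched roughly as follows. `LocalArtifactStore`, the `datatype` / `x` / `file_id` keys, and the use of `uuid` for ids are all assumptions made for this example; only the `_save_path` and `_save_bytes` names come from the proposal.

```python
import os
import shutil
import tempfile
import uuid


class LocalArtifactStore:
    """Hypothetical filesystem-backed store, not the real superduperdb class."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _save_bytes(self, data: bytes) -> str:
        file_id = uuid.uuid4().hex
        with open(os.path.join(self.root, file_id), "wb") as f:
            f.write(data)
        return file_id

    def _save_path(self, path: str) -> str:
        """Copy a local file or directory into the store."""
        file_id = uuid.uuid4().hex
        target = os.path.join(self.root, file_id)
        if os.path.isdir(path):
            shutil.copytree(path, target)
        else:
            shutil.copy2(path, target)
        return file_id

    def save(self, content: dict) -> dict:
        """Dispatch on the datatype; return the record for the metadata store."""
        if content["datatype"] == "file":
            file_id = self._save_path(content["x"])
        else:
            file_id = self._save_bytes(content["x"])
        return {"datatype": content["datatype"], "file_id": file_id}


if __name__ == "__main__":
    store = LocalArtifactStore(tempfile.mkdtemp())
    model_dir = tempfile.mkdtemp()
    with open(os.path.join(model_dir, "weights.bin"), "wb") as f:
        f.write(b"weights")
    print(store.save({"datatype": "file", "x": model_dir}))
    print(store.save({"datatype": "pickle", "x": b"raw bytes"}))
```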
Loading
- load `info` from `metadata_store`
- check `_content` in `info`, decode it, and load the artifact using `_content`
  - if the datatype is the file datatype: `_load_path` (new function of `artifact_store`; copies the file from `artifact_store` to the local file system) and return the new path
  - else: `load_bytes`, then use the datatype to decode the output
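The load side might look like the sketch below. The function signatures, the `cache_dir` argument, and the `datatype` / `file_id` keys are assumptions for illustration, and `pickle` again stands in for the real datatype decoder; only the `_load_path` and `load_bytes` names come from the proposal.

```python
import os
import pickle
import shutil
import tempfile


def _load_path(store_root: str, file_id: str, cache_dir: str) -> str:
    """Copy a stored file or directory to the local file system; return the new path."""
    source = os.path.join(store_root, file_id)
    target = os.path.join(cache_dir, file_id)
    if os.path.isdir(source):
        shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
    else:
        os.makedirs(cache_dir, exist_ok=True)
        shutil.copy2(source, target)
    return target


def load_bytes(store_root: str, file_id: str) -> bytes:
    with open(os.path.join(store_root, file_id), "rb") as f:
        return f.read()


def load(store_root: str, info: dict, cache_dir: str):
    """Dispatch on the datatype recorded in `_content`."""
    content = info["_content"]
    if content["datatype"] == "file":
        return _load_path(store_root, content["file_id"], cache_dir)
    # pickle is only a placeholder for the datatype's decoder.
    return pickle.loads(load_bytes(store_root, content["file_id"]))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "dir_artifact"))
    with open(os.path.join(root, "dir_artifact", "weights.bin"), "wb") as f:
        f.write(b"weights")
    info = {"_content": {"datatype": "file", "file_id": "dir_artifact"}}
    print(load(root, info, tempfile.mkdtemp()))
```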
Other
- rename the key `_content.bytes` to `_content.x`
Can we go one level higher? Am I correct in thinking you want to build a local cache/copy of some content in the artifact store? My question is: is this necessary, especially if, for instance, we are using `FileSystemArtifactStore`?
What problem are we solving here?
> Can we go one level higher? Am I correct in thinking you want to build a local cache/copy of some content in the artifact store? My question is: is this necessary, especially if, for instance, we are using `FileSystemArtifactStore`?

If we use `FileSystemArtifactStore`, we can either create a symlink or copy the directory into the store.
But if we only create a symlink, the directory is not actually saved in `artifact_store`.
For example, suppose I want to train a model on server1 and deploy the service on server2.
We would need to copy the whole `FileSystemArtifactStore` directory to server2, but that does not work when using a symlink.
I think all artifacts need to be saved into the `ArtifactStore`.
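To make the symlink problem concrete, here is a small self-contained sketch. It assumes a POSIX file system where `os.symlink` is permitted, and the directory names are invented for the example. It puts one symlinked and one copied artifact into a store, "moves" the store to another location while preserving symlinks, and then removes the original training directory, as would be the case on a second server.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()

# The directory produced by training on "server1".
model_dir = os.path.join(workdir, "training_output")
os.makedirs(model_dir)
with open(os.path.join(model_dir, "weights.bin"), "wb") as f:
    f.write(b"weights")

# A store holding the same artifact twice: once as a symlink, once as a copy.
store = os.path.join(workdir, "artifact_store")
os.makedirs(store)
os.symlink(model_dir, os.path.join(store, "linked"))        # only a pointer
shutil.copytree(model_dir, os.path.join(store, "copied"))   # the real data

# Simulate transferring only the store directory to "server2".
server2_store = os.path.join(workdir, "server2_store")
shutil.copytree(store, server2_store, symlinks=True)
shutil.rmtree(model_dir)  # the training directory does not exist on server2

linked_ok = os.path.exists(os.path.join(server2_store, "linked", "weights.bin"))
copied_ok = os.path.exists(os.path.join(server2_store, "copied", "weights.bin"))
print("symlinked artifact usable:", linked_ok)
print("copied artifact usable:", copied_ok)
```

Under those assumptions, only the copied artifact should still be readable after the move, because the symlink points at a path that no longer exists.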
> What problem are we solving here?

This solves the problem that `artifact_store` could previously save only `bytes` data; now it can support both `bytes` and files/directories. Not all models and data should be saved in `bytes` format.
@jieguangzhou I agree with the general proposal.
What will the schema inside `_content` look like for file/directory types?
Also, how will we synchronize directories to the artifact store? With, for instance, `aws s3` that will be easy. But with MongoDB there is no native support for directories, so you would need to create an additional field on the `gridfs` files.
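One possible reading of the "additional field" idea, sketched without touching MongoDB at all: flatten the directory into one record per file and carry the path relative to the directory root in an extra field, so the tree can be rebuilt later. The field names `file_id`, `relative_path` and `data` are assumptions for illustration; in a real GridFS-backed store the extra field would be attached to each stored file rather than kept in a Python dict.

```python
import os
import tempfile


def directory_records(directory: str, file_id: str):
    """Yield one record per file; `relative_path` is the hypothetical extra field."""
    for current, _dirs, files in os.walk(directory):
        for name in sorted(files):
            full = os.path.join(current, name)
            # Use forward slashes so records are portable across platforms.
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            with open(full, "rb") as f:
                yield {"file_id": file_id, "relative_path": rel, "data": f.read()}


def restore_directory(records, target: str) -> None:
    """Rebuild the directory tree from the per-file records."""
    for record in records:
        path = os.path.join(target, *record["relative_path"].split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(record["data"])


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    os.makedirs(os.path.join(src, "adapter"))
    with open(os.path.join(src, "adapter", "weights.bin"), "wb") as f:
        f.write(b"weights")
    records = list(directory_records(src, file_id="model-dir"))
    print([r["relative_path"] for r in records])
```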
> @jieguangzhou I agree with the general proposal.
> What will the schema inside `_content` look like for file/directory types?

The `_content` is the same as before; it just saves the path to bytes.

> Also, how will we synchronize directories to the artifact store? With, for instance, `aws s3` that will be easy. But with MongoDB there is no native support for directories, so you would need to create an additional field on the `gridfs` files.

I posted a quick implementation for this; please take a look: #1805 @blythed