jieguangzhou commented on May 23, 2024

Now we can set log_to_db=True; the checkpoint will then be saved to the artifact store and the adapter_id updated in the metadata store. After training completes, we can load the model.

from superduperdb.

jieguangzhou commented on May 23, 2024

Later, we need to implement an S3 artifact store to support saving large files remotely, and to support saving folders directly to the artifact store.
Currently we zip the folder and save the archive to the artifact store, but this is not the best way to save large model files (which may exceed 1 GB).
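To illustrate the current approach and its cost, here is a minimal Python sketch of zipping a folder into a single bytes object and unpacking it again. The helper names `zip_directory_to_bytes` and `unzip_bytes_to_directory` are hypothetical, not superduperdb functions; only the standard library is used. Because the archive for the whole folder is built in memory as one bytes value, this becomes awkward for model files above 1 GB.

```python
import io
import os
import tempfile
import zipfile


def zip_directory_to_bytes(directory: str) -> bytes:
    """Pack every file under `directory` into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to `directory` so the archive unpacks anywhere.
                # Note: empty sub-directories are not recorded.
                zf.write(full_path, arcname=os.path.relpath(full_path, directory))
    # The whole archive is returned as a single bytes object held in memory.
    return buffer.getvalue()


def unzip_bytes_to_directory(data: bytes, target: str) -> str:
    """Unpack bytes produced by zip_directory_to_bytes into `target`."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "checkpoint"))
        with open(os.path.join(src, "checkpoint", "adapter.bin"), "wb") as f:
            f.write(b"\x00" * 1024)
        blob = zip_directory_to_bytes(src)  # what a bytes-only store would receive
        unzip_bytes_to_directory(blob, dst)
        print(sorted(os.listdir(os.path.join(dst, "checkpoint"))))  # ['adapter.bin']
```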

blythed commented on May 23, 2024

Idea: add "saving a directory" to Document/artifact_store.

blythed commented on May 23, 2024

@jieguangzhou to add a proposal for saving a list of artifacts in a model.

jieguangzhou commented on May 23, 2024

Original Artifact workflow

Input: x, where x is defined as an artifact in _artifacts

  1. object.dict().encode()

    1. convert artifact to Encodable → Encodable(x)
    2. Encodable(x).encode() → {'_content': xxx}
  2. Save to the artifact store

    1. if r['_content']['leaf_type'] == encodable, save r['_content']

      save the bytes to the artifact store and delete the bytes from r['_content']

  3. create or update the object's info in metadata_store
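A minimal sketch of the saving steps above, assuming a dict-based artifact store and a pickle datatype. `InMemoryArtifactStore`, `encode_artifact`, `save_artifacts` and the `datatype`/`file_id` keys are hypothetical names for illustration; only `_content`, `leaf_type` and the save-bytes-then-delete behaviour come from the workflow described here.

```python
import hashlib
import pickle


class InMemoryArtifactStore:
    """Hypothetical bytes-only artifact store backed by a dict."""

    def __init__(self):
        self._blobs = {}

    def _save_bytes(self, data: bytes) -> str:
        # Content-addressed id: identical bytes map to the same file_id.
        file_id = hashlib.sha1(data).hexdigest()
        self._blobs[file_id] = data
        return file_id

    def _load_bytes(self, file_id: str) -> bytes:
        return self._blobs[file_id]


def encode_artifact(x) -> dict:
    # Steps 1.1 and 1.2: wrap x and encode it into a `_content` leaf (assumed layout).
    return {"_content": {"leaf_type": "encodable", "datatype": "pickle", "bytes": pickle.dumps(x)}}


def save_artifacts(r: dict, store: InMemoryArtifactStore) -> dict:
    # Step 2: for each encodable leaf, save the bytes and delete them from r.
    for value in r.values():
        if isinstance(value, dict) and value.get("_content", {}).get("leaf_type") == "encodable":
            content = value["_content"]
            content["file_id"] = store._save_bytes(content["bytes"])
            del content["bytes"]  # the metadata record keeps only the reference
    return r


store = InMemoryArtifactStore()
r = {"identifier": "my_model", "x": encode_artifact({"weights": [0.1, 0.2]})}
record = save_artifacts(r, store)  # step 3 would create/update this in metadata_store
print(sorted(record["x"]["_content"]))  # ['datatype', 'file_id', 'leaf_type']
```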

Loading

  1. load info from metadata_store

  2. check for _content in info, then load and decode each artifact using its _content

    load the bytes and use the datatype to decode them

    rename the key _content.bytes to _content.x
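The loading steps can be sketched the same way. This is again a hypothetical, dict-based illustration (the `file_id` and `datatype` keys and the `DECODERS` registry are assumptions), showing the bytes being loaded into `_content.bytes`, decoded with the datatype, and renamed to `_content.x`.

```python
import copy
import pickle

# Hypothetical state left by the saving steps: the artifact store maps
# file_id -> bytes, and metadata_store holds info whose _content points at it.
artifact_store = {"abc123": pickle.dumps({"weights": [0.1, 0.2]})}
metadata_store = {
    "my_model": {
        "identifier": "my_model",
        "x": {"_content": {"leaf_type": "encodable", "datatype": "pickle", "file_id": "abc123"}},
    }
}

# Assumed registry: datatype name -> decode function.
DECODERS = {"pickle": pickle.loads}


def load_artifacts(identifier: str) -> dict:
    # Step 1: load info from metadata_store (copied so the stored record is untouched).
    info = copy.deepcopy(metadata_store[identifier])
    # Step 2: find every _content leaf and restore its artifact.
    for value in info.values():
        if isinstance(value, dict) and "_content" in value:
            content = value["_content"]
            content["bytes"] = artifact_store[content["file_id"]]  # load bytes
            # Decode with the datatype, renaming _content.bytes to _content.x.
            content["x"] = DECODERS[content["datatype"]](content.pop("bytes"))
    return info


model_info = load_artifacts("my_model")
print(model_info["x"]["_content"]["x"])  # {'weights': [0.1, 0.2]}
```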

New artifact workflow with directory saving

New DataType instance: file

x is a path to a file or directory

encode(x): check that x exists and return x

decode(x): check that x exists and return x
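A possible shape for the proposed `file` datatype, sketched in Python. `FileDataType` and its `identifier` attribute are hypothetical; the only behaviour taken from the proposal is that both `encode` and `decode` check that the path exists and return it unchanged.

```python
import os
import tempfile


class FileDataType:
    """Hypothetical `file` datatype: the value is a path, never serialized bytes."""

    identifier = "file"

    @staticmethod
    def _check(x: str) -> str:
        # Both directions only verify that the file or directory exists.
        if not os.path.exists(x):
            raise FileNotFoundError(f"{x!r} does not exist")
        return x

    def encode(self, x: str) -> str:
        return self._check(x)

    def decode(self, x: str) -> str:
        return self._check(x)


file_datatype = FileDataType()
with tempfile.TemporaryDirectory() as checkpoint_dir:
    print(file_datatype.encode(checkpoint_dir) == checkpoint_dir)  # True
```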

Saving

Input: x, where x is defined as an artifact in _artifacts

  1. object.dict().encode()

    1. convert artifact to Encodable → Encodable(x)
    2. Encodable(x).encode() → {'_content': xxx}
  2. Save to the artifact store

    1. if r['_content']['leaf_type'] == encodable, save r['_content']

      • if the datatype is the file datatype:
        _save_path (new artifact_store method: copies the file or directory from the local file system to the artifact store)
      • else:
        _save_bytes
  3. create or update the object's info in metadata_store
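The dispatch in step 2 could look roughly like the following sketch of a local filesystem store. `LocalArtifactStore`, `save_content` and the `file_id`/`datatype` keys are hypothetical; the assumption that the path sits in `_content`'s `bytes` field for the file datatype follows the later reply in this thread that the path is saved in place of the bytes. Note that `_save_path` makes a real copy rather than a symlink.

```python
import hashlib
import os
import shutil
import tempfile
import uuid


class LocalArtifactStore:
    """Hypothetical filesystem artifact store that accepts bytes and paths."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _save_bytes(self, data: bytes) -> str:
        file_id = hashlib.sha1(data).hexdigest()
        with open(os.path.join(self.root, file_id), "wb") as f:
            f.write(data)
        return file_id

    def _save_path(self, path: str) -> str:
        # Copy the file or directory into the store so the store is self-contained.
        file_id = uuid.uuid4().hex
        target = os.path.join(self.root, file_id)
        if os.path.isdir(path):
            shutil.copytree(path, target)
        else:
            shutil.copy2(path, target)
        return file_id


def save_content(content: dict, store: LocalArtifactStore) -> dict:
    """Step 2 of the new saving workflow for one encodable `_content` leaf."""
    payload = content.pop("bytes")  # for the file datatype this holds a path (assumed)
    if content["datatype"] == "file":
        content["file_id"] = store._save_path(payload)
    else:
        content["file_id"] = store._save_bytes(payload)
    return content


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "checkpoint")
    os.makedirs(checkpoint)
    with open(os.path.join(checkpoint, "adapter.bin"), "wb") as f:
        f.write(b"weights")
    store = LocalArtifactStore(os.path.join(tmp, "store"))
    content = save_content({"leaf_type": "encodable", "datatype": "file", "bytes": checkpoint}, store)
    print(os.listdir(os.path.join(store.root, content["file_id"])))  # ['adapter.bin']
```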

Loading

  1. load info from metadata_store

  2. check for _content in info, then load and decode each artifact using its _content

    • if the datatype is the file datatype:

    _load_path (new artifact_store method: copies the file or directory from the artifact store to the local file system) and return the new path

    • else:

    load_bytes

    use the datatype to decode the output
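Correspondingly, a hedged sketch of the loading side: `_load_path` copies the stored file or directory back to a local directory and returns the new path, and the datatype's decoder is then applied to whatever was loaded. The store layout (one entry per `file_id` under the store root), the `DECODERS` registry and the `raw` datatype are assumptions made for the example; `shutil.copytree(..., dirs_exist_ok=True)` needs Python 3.8 or later.

```python
import os
import shutil
import tempfile


def _load_path(store_root: str, file_id: str, local_dir: str) -> str:
    """Hypothetical: copy a stored file/directory locally and return the new path."""
    source = os.path.join(store_root, file_id)
    target = os.path.join(local_dir, file_id)
    if os.path.isdir(source):
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        shutil.copy2(source, target)
    return target


def load_bytes(store_root: str, file_id: str) -> bytes:
    with open(os.path.join(store_root, file_id), "rb") as f:
        return f.read()


def decode_file(path: str) -> str:
    # The file datatype's decode only checks that the path exists.
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return path


DECODERS = {"file": decode_file, "raw": lambda data: data}  # assumed registry


def load_content(content: dict, store_root: str, local_dir: str) -> dict:
    """Step 2 of the new loading workflow for one `_content` leaf."""
    if content["datatype"] == "file":
        output = _load_path(store_root, content["file_id"], local_dir)
    else:
        output = load_bytes(store_root, content["file_id"])
    content["x"] = DECODERS[content["datatype"]](output)  # decode the output
    return content


with tempfile.TemporaryDirectory() as store_root, tempfile.TemporaryDirectory() as local_dir:
    os.makedirs(os.path.join(store_root, "ckpt01"))
    with open(os.path.join(store_root, "ckpt01", "adapter.bin"), "wb") as f:
        f.write(b"weights")
    content = load_content({"datatype": "file", "file_id": "ckpt01"}, store_root, local_dir)
    print(os.listdir(content["x"]))  # ['adapter.bin']
```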

Other

rename the key _content.bytes to _content.x

blythed commented on May 23, 2024

Can we go one level higher? Am I correct in thinking you want to build a local cache/copy of some content in the artifact store? My question is: is this necessary, especially if, for instance, we are using FileSystemArtifactStore?

What problem are we solving here?

jieguangzhou commented on May 23, 2024

> Can we go one level higher? Am I correct in thinking you want to build a local cache/copy of some content in the artifact store? My question is: is this necessary, especially if, for instance, we are using FileSystemArtifactStore?

If we use FileSystemArtifactStore, we can either create a symlink or copy the directory.

But if we only use a symlink, the directory is not actually saved in the artifact_store.

For example, suppose I train a model on server1 and deploy the service on server2.

We need to copy the whole FileSystemArtifactStore directory to server2, but that doesn't work with a symlink: the link still points to a path that exists only on server1.

I think all artifacts need to be saved into the ArtifactStore.
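The symlink problem can be reproduced with a small standard-library script. It simulates server1 and server2 inside one temporary directory: the store directory is copied with links preserved, then the original training checkpoint is deleted, as it would be missing on server2. This assumes a POSIX system (creating symlinks on Windows may require extra privileges), and all paths are made up for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # The training output on "server1".
    checkpoint = os.path.join(tmp, "server1_checkpoint")
    os.makedirs(checkpoint)
    with open(os.path.join(checkpoint, "adapter.bin"), "wb") as f:
        f.write(b"weights")

    # A FileSystemArtifactStore-like directory holding the artifact two ways.
    store = os.path.join(tmp, "artifact_store")
    os.makedirs(store)
    os.symlink(checkpoint, os.path.join(store, "linked"))       # option 1: symlink only
    shutil.copytree(checkpoint, os.path.join(store, "copied"))  # option 2: real copy

    # "Deploy" to server2 by copying the store; links are copied as links.
    server2_store = os.path.join(tmp, "server2_store")
    shutil.copytree(store, server2_store, symlinks=True)
    shutil.rmtree(checkpoint)  # the server1 checkpoint does not exist on server2

    linked_survives = os.path.exists(os.path.join(server2_store, "linked", "adapter.bin"))
    copied_survives = os.path.exists(os.path.join(server2_store, "copied", "adapter.bin"))

print(linked_survives, copied_survives)  # False True
```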

> What problem are we solving here?

Previously, artifact_store could only save bytes; now it can save both bytes and files/directories. Not all models and data should be saved in bytes format.

blythed commented on May 23, 2024

@jieguangzhou I agree with the general proposal.

What will the schema inside _content look like for file/directory types?

Also, how will we synchronize directories to the artifact store? With AWS S3, for instance, that will be easy. But MongoDB has no native support for directories, so you would need to create an additional field on the GridFS files.
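One way to represent a directory when the backend only stores flat files is to keep one record per file, plus extra fields that group the files and restore the layout. The sketch below uses plain dicts in place of GridFS file documents and does not use the real GridFS API; the `directory_id` and `relative_path` fields are hypothetical examples of the additional field mentioned above.

```python
import os
import tempfile
import uuid


def directory_to_records(directory: str) -> list[dict]:
    """Flatten a directory into one record per file (stand-ins for GridFS files)."""
    directory_id = uuid.uuid4().hex  # hypothetical field grouping the files
    records = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            with open(full_path, "rb") as f:
                records.append({
                    "directory_id": directory_id,
                    # hypothetical field that restores the directory layout
                    "relative_path": os.path.relpath(full_path, directory),
                    "data": f.read(),
                })
    return records


def records_to_directory(records: list[dict], target: str) -> str:
    """Rebuild the directory tree under `target` from the flat records."""
    for record in records:
        path = os.path.join(target, record["relative_path"])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(record["data"])
    return target


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "tokenizer"))
    for rel in ("config.json", os.path.join("tokenizer", "vocab.txt")):
        with open(os.path.join(src, rel), "wb") as f:
            f.write(rel.encode())
    flat = directory_to_records(src)
    records_to_directory(flat, dst)
    print(sorted(rec["relative_path"] for rec in flat))
```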

jieguangzhou commented on May 23, 2024

> @jieguangzhou I agree with the general proposal.
>
> What will the schema inside _content look like for file/directory types?

The _content is the same as before; we just save the path in place of the bytes.

> Also, how will we synchronize directories to the artifact store? With AWS S3, for instance, that will be easy. But MongoDB has no native support for directories, so you would need to create an additional field on the GridFS files.

I posted a quick implementation of this in #1805. @blythed, please take a look.
