
Comments (2)

BaohaoLiao commented on May 21, 2024

I'm also interested in reproducing the fine-tuned results in the X-MoE paper. If you can release some related scripts and pre-trained models, that would be perfect.

from torchscale.

shumingma commented on May 21, 2024

Hello there,

I'm interested in using the XMOE network and have some questions about how to evaluate its performance on a validation set, how to save checkpoints, and how to resume training from a saved checkpoint.

Evaluation on Validation Set: Could you please provide some guidance on how to evaluate the XMOE network on a validation set? Also, I'm using the Distributed Data Parallel (DDP) mode, and I'm wondering whether I need to evaluate the XMOE network on all devices or only one device?

Saving Checkpoints: How can I save the XMOE model's checkpoints during training? What's the recommended way of doing this? Given that each GPU has its own experts and shared parameters, should I save all the parameters on each device or is there an API that can centralize the parameters and save them to avoid redundancy?

Resuming Training from a Saved Checkpoint: How can I resume training the XMOE model from a saved checkpoint? What's the recommended way of doing this? Is there any specific API or command I should use?

Thank you in advance for your help. I'm looking forward to using the XMOE network in my projects.

Evaluation:
You can check the code for more details on evaluation and generation with MoE models. Evaluating on a single device should be feasible as long as it has enough GPU memory.
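As a rough way to judge the "enough GPU memory" condition, here is a back-of-the-envelope sketch that estimates the memory needed just to hold all weights (dense part plus every expert) on one device. The function names, the parameter counts in the example, and the 80% headroom factor are illustrative assumptions, not values from torchscale or the X-MoE paper, and the estimate ignores activations and any framework overhead.

```python
def weight_memory_gib(dense_params, params_per_expert, num_experts, bytes_per_param=2):
    """Estimate memory (GiB) needed to hold all weights on a single device.

    bytes_per_param=2 corresponds to fp16/bf16 weights; use 4 for fp32.
    Activations, buffers, and allocator overhead are NOT included.
    """
    total_params = dense_params + params_per_expert * num_experts
    return total_params * bytes_per_param / 1024**3


def fits_on_device(required_gib, device_gib, headroom=0.8):
    """Return True if the weights fit within a fraction of device memory.

    headroom is an assumed safety margin left for activations and overhead.
    """
    return required_gib <= device_gib * headroom


if __name__ == "__main__":
    # Hypothetical model: 100M dense parameters, 64 experts of 8M parameters each.
    needed = weight_memory_gib(100_000_000, 8_000_000, 64)
    print(f"weights need about {needed:.2f} GiB")
    print("fits on a 16 GiB GPU:", fits_on_device(needed, 16))
```

If the estimate does not fit, evaluation would have to keep the experts sharded across devices, the same way they are during training.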

Checkpoint:
Here is an example of saving and loading MoE checkpoints. The dense part and the expert parts are stored separately to avoid redundancy.
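To illustrate the idea of storing the dense part and the expert parts separately, here is a minimal sketch. It is not the torchscale or fairseq checkpoint API: the helper names, the `-shared.pt` / `-rank-N.pt` file layout, and the assumption that expert parameters can be recognized by `.experts.` in their state-dict key are all hypothetical and would need to be adapted to your model.

```python
# Assumption: expert parameters carry this substring in their state-dict key.
EXPERT_MARKER = ".experts."


def split_state_dict(state_dict, marker=EXPERT_MARKER):
    """Partition a state dict into (shared, expert) parts by key name."""
    shared, experts = {}, {}
    for key, value in state_dict.items():
        (experts if marker in key else shared)[key] = value
    return shared, experts


def checkpoint_paths(prefix, rank):
    """Hypothetical file layout: one shared file plus one expert file per rank."""
    return f"{prefix}-shared.pt", f"{prefix}-rank-{rank}.pt"


def save_moe_checkpoint(model, prefix, rank):
    """Save shared weights once (rank 0) and local experts on every rank."""
    import torch  # imported lazily so the helpers above stay dependency-free

    shared, experts = split_state_dict(model.state_dict())
    shared_path, expert_path = checkpoint_paths(prefix, rank)
    if rank == 0:
        # Shared (dense) weights are replicated, so one copy is enough.
        torch.save(shared, shared_path)
    # Each rank holds different experts, so each rank writes its own file.
    torch.save(experts, expert_path)


def load_moe_checkpoint(model, prefix, rank):
    """Rebuild this rank's full state dict from the shared and expert files."""
    import torch

    shared_path, expert_path = checkpoint_paths(prefix, rank)
    state = torch.load(shared_path, map_location="cpu")
    state.update(torch.load(expert_path, map_location="cpu"))
    model.load_state_dict(state)
```

Resuming training would also need the optimizer state and the step counter, which are not shown here. With DDP, a barrier between saving and loading helps ensure rank 0 has finished writing the shared file before other ranks read it.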

