
Comments (5)

brjathu commented on August 24, 2024

Hi, @JoyHuYY1412, it's just an average over all the tasks. No, we haven't used any normalizations.

See

ll = torch.stack(reptile_grads[i])

from itaml.
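For context, a minimal sketch of what such a Reptile-style averaging step could look like. The `reptile_grads` structure and the two-task values here are illustrative assumptions, not the exact iTAML code:

```python
import torch

# Hypothetical collection of task-specific copies of one parameter tensor,
# gathered after the inner-loop updates (one list entry per task).
reptile_grads = {0: [torch.full((3, 3), 2.0), torch.full((3, 3), 4.0)]}

averaged = {}
for i in reptile_grads:
    ll = torch.stack(reptile_grads[i])  # shape: (num_tasks, 3, 3)
    averaged[i] = ll.mean(dim=0)        # plain average over all tasks
```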

JoyHuYY1412 commented on August 24, 2024

Hi, @JoyHuYY1412, it's just an average over all the tasks. No, we haven't used any normalizations.

See

ll = torch.stack(reptile_grads[i])

Thank you for your reply.
So if different tasks have parameters of different scales, e.g., A >> B, it seems the averaged network will be biased toward A. So when we try to recover the network for task B, do we assume the memory samples of B can help it fit? I'm not sure whether I understand this correctly.

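A toy numerical illustration of the concern above (the values are purely illustrative): when task A's parameters are much larger in magnitude than task B's, a plain average lands close to A's scale.

```python
import torch

theta_A = torch.full((4,), 100.0)  # task A: large-scale parameters
theta_B = torch.full((4,), 1.0)    # task B: small-scale parameters

# A plain average over the two tasks sits at 50.5 everywhere: far closer
# to A's scale than to B's, which is why recovering task B would lean on
# fine-tuning with B's memory samples.
phi_avg = torch.stack([theta_A, theta_B]).mean(dim=0)
```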

brjathu commented on August 24, 2024

Yes, that's one reason the classifier for a task is trained only in the inner loop. Also, to reduce this bias, we take a weighted average of the weights as we progress. And yes, fine-tuning with the memory samples helps to get a better model.

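The "weighted average of the weights as we progress" could be sketched as follows. The `1/t` decay schedule here is an assumption for illustration, not the authors' exact schedule:

```python
import torch

def weighted_merge(phi_old, phi_tasks, t):
    """Hypothetical progressive weighted averaging: the current task-averaged
    weights shift phi less and less as more tasks (t) have been seen, so
    earlier knowledge is not washed out by a single large-scale task."""
    phi_new = torch.stack(phi_tasks).mean(dim=0)  # average over current tasks
    alpha = 1.0 / t                               # weight on the new average
    return (1 - alpha) * phi_old + alpha * phi_new
```

With `t = 2`, old and new weights contribute equally; as `t` grows, updates become increasingly conservative.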

JoyHuYY1412 commented on August 24, 2024

Yes, that's one reason the classifier for a task is trained only in the inner loop. Also, to reduce this bias, we take a weighted average of the weights as we progress.
Thank you so much. I have two more questions.

  1. I read the pseudocode (Algorithm 1) in your paper. After we update phi in line 14, is the theta used in line 7 for task 1 initialized from the updated phi? And does task 2 then initialize its theta from task 1's?
  2. If so, does this operation somehow relieve the imbalance between tasks, since after each update in the outer loop the backbone network is reset?


brjathu commented on August 24, 2024
  1. No, in the inner loop the thetas for all tasks are initialized with the last updated phi. Once we have learned all the thetas, we combine them to get the new phi, which is later used to initialize the thetas for the next batch.
  2. Yes, the outer-loop meta-update tries to minimize forgetting, while the imbalance is minimized mostly by the exponential averaging of the weights.

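The update order described above can be sketched schematically. This is pseudocode-style Python with assumed names (`inner_update`, the toy step size), not the authors' implementation:

```python
import torch

phi = torch.zeros(4)  # shared meta-parameters

def inner_update(theta, task_id):
    # Placeholder for the task-specific inner-loop training of theta
    # (and of the task's classifier, which stays in the inner loop).
    return theta + (task_id + 1) * 0.1

for batch in range(2):          # outer loop over incoming data batches
    thetas = []
    for task_id in range(3):    # inner loop: EVERY task starts from phi
        theta = phi.clone()
        thetas.append(inner_update(theta, task_id))
    # Outer meta-update: combine all task-specific thetas into the new phi,
    # which then initializes the thetas for the next batch.
    phi = torch.stack(thetas).mean(dim=0)
```

Note that no task initializes from another task's theta; all tasks branch off the same phi, and phi alone carries state across batches.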
