
Comments (4)

donglixp commented on June 14, 2024

The usage example is at https://github.com/microsoft/unilm/blob/master/beit3/modeling_utils.py#L21

from torchscale.

andreapdr commented on June 14, 2024

Thank you for your prompt response, @donglixp.

From my understanding, the MultiwayNetwork is the component that processes the visual modality (V-FFN), the textual modality (L-FFN), or both (VL-FFN). It routes the input to the corresponding FFN (A, B, or both) according to the split_position attribute.

What I can't understand is how or where you set split_position (or, in BEiT-3, multiway_split_position) so that multimodal information is passed to both the A and B FFNs only in the top three layers, while being routed to either A or B in the lower ones.

import copy

import torch
import torch.nn as nn


class MultiwayNetwork(nn.Module):
    def __init__(self, module, dim=1):
        super().__init__()
        self.dim = dim
        self.A = module
        self.B = copy.deepcopy(module)
        # B gets freshly initialized weights rather than a copy of A's
        self.B.reset_parameters()
        self.split_position = -1

    def forward(self, x, **kwargs):
        # -1: the whole input goes through A
        if self.split_position == -1:
            return self.A(x, **kwargs)
        # 0: the whole input goes through B
        if self.split_position == 0:
            return self.B(x, **kwargs)
        # Otherwise: the first split_position entries along dim go to A, the rest to B
        x1, x2 = torch.split(
            x,
            [self.split_position, x.size(self.dim) - self.split_position],
            dim=self.dim,
        )
        # x1, x2 = x[:self.split_position], x[self.split_position:]
        y1, y2 = self.A(x1, **kwargs), self.B(x2, **kwargs)
        return torch.cat([y1, y2], dim=self.dim)


wenhui0924 commented on June 14, 2024

Hi @andreapdr, in the Multiway Transformer implemented in torchscale, we remove the VL-FFN experts and use different attention and FFN parameters for vision and language. We perform VL fusion by concatenating the Q/K/V of vision and language. Please refer to Table 16 in our supplementary material. For the VL-expert implementation, please refer to this code.
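To make the routing concrete, here is a minimal sketch of how a concatenated vision-plus-language sequence might be sent through separate parameters. It assumes that vision tokens come first and that split_position equals the number of vision tokens. The set_split_position helper, the toy nn.Linear layer, and the tensor shapes are illustrative assumptions, not necessarily what torchscale or BEiT-3 does.

```python
import copy

import torch
import torch.nn as nn


class MultiwayNetwork(nn.Module):
    """Condensed copy of the class quoted earlier in this thread."""

    def __init__(self, module, dim=1):
        super().__init__()
        self.dim = dim
        self.A = module
        self.B = copy.deepcopy(module)
        self.B.reset_parameters()
        self.split_position = -1

    def forward(self, x, **kwargs):
        if self.split_position == -1:
            return self.A(x, **kwargs)
        if self.split_position == 0:
            return self.B(x, **kwargs)
        sizes = [self.split_position, x.size(self.dim) - self.split_position]
        x1, x2 = torch.split(x, sizes, dim=self.dim)
        return torch.cat([self.A(x1, **kwargs), self.B(x2, **kwargs)], dim=self.dim)


def set_split_position(position):
    """Hypothetical helper: builds a callback for nn.Module.apply."""

    def apply_fn(module):
        # Only modules that expose split_position are touched.
        if hasattr(module, "split_position"):
            module.split_position = position

    return apply_fn


torch.manual_seed(0)
ffn = MultiwayNetwork(nn.Linear(8, 8))
model = nn.Sequential(ffn)  # stand-in for a deeper stack of layers

vision = torch.randn(2, 5, 8)  # (batch, vision tokens, features)
text = torch.randn(2, 3, 8)    # (batch, text tokens, features)
tokens = torch.cat([vision, text], dim=1)

# Route the first 5 positions (vision) to A and the remaining 3 (text) to B.
model.apply(set_split_position(vision.size(1)))
out = model(tokens)
print(out.shape)
```

Because nn.Module.apply visits every submodule, one call updates all multiway modules in the model at once. Under this assumption the split is chosen per input rather than per layer, so every layer sees the same position.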


andreapdr commented on June 14, 2024

Thank you @wenhui0924 for the clarification: I totally skipped over the supplementary material when reading the paper! Closing the issue now 😄

