Comments (4)
The usage example is at https://github.com/microsoft/unilm/blob/master/beit3/modeling_utils.py#L21
from torchscale.
Thank you for your prompt response, @donglixp.
From my understanding, the MultiwayNetwork is the component that processes the visual modality (V-FFN), the textual modality (L-FFN), or both (VL-FFN). It routes the input to the corresponding FFN (A, B, or both) according to the split_position attribute.
What I can't understand is how/where you set split_position (or, in BEiT-3, multiway_split_position) so that multimodal information is passed to both the A and B FFNs only in the top three layers, while it is routed to either A or B in the lower ones.
class MultiwayNetwork(nn.Module):
    def __init__(self, module, dim=1):
        super().__init__()
        self.dim = dim
        self.A = module
        self.B = copy.deepcopy(module)
        self.B.reset_parameters()
        self.split_position = -1

    def forward(self, x, **kwargs):
        if self.split_position == -1:
            return self.A(x, **kwargs)
        if self.split_position == 0:
            return self.B(x, **kwargs)
        x1, x2 = torch.split(
            x,
            [self.split_position, x.size(self.dim) - self.split_position],
            dim=self.dim,
        )
        # x1, x2 = x[:self.split_position], x[self.split_position:]
        y1, y2 = self.A(x1, **kwargs), self.B(x2, **kwargs)
        return torch.cat([y1, y2], dim=self.dim)
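To make the three branches of forward above concrete, here is a minimal sketch of the same routing rule on plain Python lists, so it runs without PyTorch. The names route, expert_a, and expert_b are hypothetical stand-ins for illustration and are not part of torchscale. The real module applies A and B to whole sub-tensors rather than token by token, which gives the same result when the wrapped module acts position-wise, as an FFN does.

```python
def route(tokens, split_position, expert_a, expert_b):
    """Mirror the three branches of MultiwayNetwork.forward on a list.

    Hypothetical helper for illustration only, not torchscale code.
    """
    if split_position == -1:
        # Everything goes to A.
        return [expert_a(t) for t in tokens]
    if split_position == 0:
        # Everything goes to B.
        return [expert_b(t) for t in tokens]
    # First `split_position` tokens go to A, the remainder to B,
    # then the two halves are concatenated in the original order.
    head, tail = tokens[:split_position], tokens[split_position:]
    return [expert_a(t) for t in head] + [expert_b(t) for t in tail]


def expert_a(token):
    return f"A({token})"


def expert_b(token):
    return f"B({token})"


image_tokens = ["img0", "img1", "img2"]
text_tokens = ["txt0", "txt1"]

vision_only = route(image_tokens, -1, expert_a, expert_b)
text_only = route(text_tokens, 0, expert_a, expert_b)
# For a concatenated image+text sequence, the split position is
# the number of image tokens.
mixed = route(image_tokens + text_tokens, len(image_tokens), expert_a, expert_b)
print(mixed)
```

In this sketch a mixed sequence never reaches a third, shared expert: each token is handled by exactly one of A or B, depending on which side of the split it falls on.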
Hi @andreapdr, for the Multiway Transformer implemented in torchscale, we remove the VL-FFN experts and use different attention and FFN parameters for vision and language. We perform VL fusion by concatenating the Q/K/V of vision and language. Please refer to Table 16 in the supplementary material of our paper. For the VL-expert implementation, please refer to this code.
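As a rough illustration of fusion by concatenating Q/K/V, the sketch below projects image and text tokens with separate parameters, concatenates the results along the sequence dimension, and runs one joint attention over the combined sequence. It is a single-head simplification under assumed shapes. The names qkv_vision and qkv_text are hypothetical, and this is not the torchscale implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

dim, n_img, n_txt = 8, 3, 2
x = torch.randn(1, n_img + n_txt, dim)  # [batch, image tokens + text tokens, dim]
split_position = n_img                  # image tokens come first

# Separate projection parameters per modality (hypothetical names).
qkv_vision = nn.Linear(dim, 3 * dim)
qkv_text = nn.Linear(dim, 3 * dim)

# Split the sequence at split_position, project each part with its own weights.
x_img, x_txt = torch.split(x, [split_position, x.size(1) - split_position], dim=1)

# Concatenate the projected Q/K/V of both modalities back into one sequence.
qkv = torch.cat([qkv_vision(x_img), qkv_text(x_txt)], dim=1)
q, k, v = qkv.chunk(3, dim=-1)

# One joint attention: every token can attend to both image and text tokens.
attn = F.softmax(q @ k.transpose(-2, -1) / dim ** 0.5, dim=-1)
out = attn @ v
print(out.shape)
```

The modality-specific parameters only affect how each token is projected. The mixing between vision and language happens in the shared attention matrix, which spans all five positions.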
Thank you @wenhui0924 for the clarification: I totally skipped over the supplementary material when reading the paper! Closing the issue now 😄