jefferyyu / tombert
Dataset and codes for our IJCAI 2019 paper "Adapting BERT for Target-Oriented Multimodal Sentiment Classification"
I have read your code but could not find the part that implements TomBERT (all-text). Is it not included in the code?
Hi author,
I'm confused by the description of "intra-modality dynamics including target-text and target-image alignments and inter-modality dynamics, i.e., text-image alignments" in the paper. In my understanding, intra-modality dynamics should refer to interactions within a single modality, such as text-to-text, rather than target (text)-to-image. Similarly, inter-modality dynamics should refer to interactions between different modalities, such as text-to-image. Is there something I have overlooked?
Thank you for your trouble!
Hi authors, thank you for the well-written paper and detailed documentation of your work. I have a question regarding the multimodal attention mask.
Under TomBERT/my_bert/mm_modelling.py:
class MBertForMMSequenceClassification(PreTrainedBertModel):
    """BERT model for classification with text and image inputs, pooling-1+text (MBERT I)
    """
    def __init__(self, config, num_labels=2, pooling="cls"):
        super(MBertForMMSequenceClassification, self).__init__(config)
        self.num_labels = num_labels
        self.pooling = pooling
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.vismap2text = nn.Linear(2048, config.hidden_size)
        #self.img_attention = BertLayer(config)
        self.comb_attention = MultimodalEncoder(config)
        if pooling == "cls":
            self.text_pooler = BertText1Pooler(config)
            self.classifier = nn.Linear(config.hidden_size, num_labels)
        elif pooling == "first":
            self.img_pooler = BertPooler(config)
            self.classifier = nn.Linear(config.hidden_size, num_labels)
        else:
            self.text_pooler = BertText1Pooler(config)
            self.img_pooler = BertPooler(config)
            self.classifier = nn.Linear(config.hidden_size * 2, num_labels)
        self.apply(self.init_bert_weights)

    def forward(self, input_ids, s2_input_ids, visual_embeds_att, token_type_ids=None, s2_type_ids=None,
                attention_mask=None, s2_mask=None, added_attention_mask=None, labels=None, copy_flag=False):
        # Concatenate Bert-based Text, Text-Aware Image and Image-Aware Text
        sequence_output, pooled_output = self.bert(input_ids, token_type_ids, attention_mask,
                                                   output_all_encoded_layers=False)
        # apply entity-based attention mechanism to obtain different image representations
        vis_embed_map = visual_embeds_att.view(-1, 2048, 49).permute(0, 2, 1)  # self.batch_size, 49, 2048
        vis_pooled_output, _ = vis_embed_map.max(1)  # self.batch_size, 2048
        converted_vis_embed_map = self.vismap2text(vis_pooled_output)  # self.batch_size, hidden_dim
        transpose_img_embed = converted_vis_embed_map.unsqueeze(1)
        text_img_output = torch.cat((transpose_img_embed, sequence_output), dim=1)
        comb_attention_mask = added_attention_mask[:, 48:]  # only the first dimension is for image
        extended_attention_mask = comb_attention_mask.unsqueeze(1).unsqueeze(2)
        extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility
        extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
I am trying to understand the extended_attention_mask here.
Hope to hear from you, thanks so much!
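For reference, here is one reading of the last few lines of the quoted `forward`. It is a sketch under stated assumptions, not the authors' explanation. It assumes `added_attention_mask` holds 49 image-region slots followed by the text-token slots, so `[:, 48:]` keeps one image slot (matching the single max-pooled image vector prepended to `sequence_output`) plus the text slots. The two `unsqueeze` calls would then reshape the mask from `(batch, seq_len)` to `(batch, 1, 1, seq_len)`, presumably so it broadcasts over attention heads and query positions as in the standard BERT attention code. The final line turns the 0/1 mask into an additive one: kept positions add 0, and padded positions add -10000, so they receive essentially no weight after the softmax. The toy sizes and scores below are made up for illustration, and plain Python lists stand in for the tensors of a single example:

```python
import math

# Hypothetical toy input: 49 image-region slots followed by 5 text-token slots.
NUM_REGIONS = 49
text_mask = [1, 1, 1, 0, 0]  # 1 = real token, 0 = padding
added_attention_mask = [1] * NUM_REGIONS + text_mask

# Same slice as added_attention_mask[:, 48:], for a single example:
# drops 48 of the 49 image slots, leaving 1 image slot + the text slots.
comb_attention_mask = added_attention_mask[48:]

# Same arithmetic as (1.0 - mask) * -10000.0:
# kept positions become 0, padded positions become -10000.
additive_mask = [(1.0 - m) * -10000.0 for m in comb_attention_mask]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw attention scores for one query over the 6 key positions.
raw_scores = [0.5, 1.0, 0.2, 0.8, 0.3, 0.9]
masked_scores = [s + a for s, a in zip(raw_scores, additive_mask)]
attention_weights = softmax(masked_scores)

print(comb_attention_mask)
print([round(w, 4) for w in attention_weights])
```

In this toy case the two padded text positions end up with zero attention weight, while the image slot and the three real tokens share all of it.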