
hvpnet's People

Contributors

flow3rdown, jinfish, njcx-ai, zxlzr


hvpnet's Issues

How were the object images in the relation dataset obtained?

Hello, authors!

In the multimodal relation extraction dataset you provide, the object images come in two folders, img_detect and image_vg, and the model performs better when it uses the data in image_vg.
What is the difference in how img_vg and img_detect were obtained? And why are the images in img_vg at a much higher resolution than those in img_detect, seemingly even larger than the original images?

Thank you in advance for your help.

Question about get_visual_prompt

Great work, but I have some questions.

  1. prompt_guids is 4 * [bsz, 256, 2, 2]. Shouldn't the dimension of torch.cat(prompt_guids, dim=1) be [bsz, 1024, 2, 2]? In fact it is [bsz, 3840, 2, 2].
prompt_guids, aux_prompt_guids = self.image_model(images, aux_imgs)  # [bsz, 256, 2, 2], [bsz, 512, 2, 2]....
prompt_guids = torch.cat(prompt_guids, dim=1).view(bsz, self.args.prompt_len, -1)   # bsz, 4, 3840
  2. What does this code do? Is it just computing the k, v values by superimposing the four ResNet blocks onto that self-attention layer?
for i in range(4):
     key_val = key_val + torch.einsum('bg,blh->blh', prompt_gate[:, i].view(-1, 1), split_prompt_guids[i])
  3. Why choose 64?
    key, value = key_val[0].reshape(bsz, 12, -1, 64).contiguous(), key_val[1].reshape(bsz, 12, -1, 64).contiguous() # bsz, 12, 4, 64

  4. If I want to switch to a RoBERTa PLM, what do I need to modify?
    I ran into the following error:

IndexError: index out of range in self

File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\transformers\models\roberta\modeling_roberta.py", line 846, in forward
past_key_values_length=past_key_values_length,
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\transformers\models\roberta\modeling_roberta.py", line 132, in forward
token_type_embeddings = self.token_type_embeddings(token_type_ids)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\sparse.py", line 160, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "E:\deeplearning\Anaconda3\envs\py36\lib\site-packages\torch\nn\functional.py", line 2044, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

The error is raised in:

bert_output = self.bert(input_ids=input_ids,
                        attention_mask=prompt_attention_mask,
                        token_type_ids=token_type_ids,
                        past_key_values=prompt_guids,
                        return_dict=True)

I guess this happens because the num_embeddings of BERT is 30522, while RoBERTa's is 50265.
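For reference, a minimal diagnostic sketch (using roberta-base from Hugging Face, not the repo's exact setup) to check which embedding table the out-of-range index hits. Note that RoBERTa's type_vocab_size is 1, so any token_type_ids value greater than 0 raises exactly this IndexError, independent of the 30522 vs. 50265 vocabulary sizes:

    import torch
    from transformers import RobertaTokenizer, RobertaModel

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = RobertaModel.from_pretrained("roberta-base")

    enc = tokenizer("just a sample tweet", return_tensors="pt")
    print(model.config.vocab_size)          # 50265; input_ids must stay below this
    print(enc["input_ids"].max().item())
    print(model.config.type_vocab_size)     # 1 for RoBERTa (2 for BERT)

    # If the data pipeline still builds BERT-style token_type_ids (0/1),
    # zero them out before feeding RoBERTa:
    token_type_ids = torch.zeros_like(enc["input_ids"])
    out = model(input_ids=enc["input_ids"],
                attention_mask=enc["attention_mask"],
                token_type_ids=token_type_ids,
                return_dict=True)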

What are the details of using the visual grounding tool?

Hi, I noticed in your README that you would first extract the nouns in the text and then perform visual grounding.

My understanding is to use the nouns in the text as a query and get the corresponding image area by visual grounding.

However, I have some questions about this: (1) if there is more than one noun, is each noun a separate query, or are all the nouns combined into one query? (2) In the relation extraction data, I found some examples with the same text and the same image where only the head entity or the tail entity differs, yet the image regions differ; according to the procedure described in the README, these examples share the same text and image, so they should yield the same image regions.

Anyway, I would like to know the exact details of the visual grounding you performed, to resolve my two doubts above. Thanks a lot; your work is great.
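For context, a sketch of the noun-extraction step described in the README (the tagger here, spaCy, is an assumption; the README does not name the tool), producing the candidate queries that would be passed to the visual grounding model:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = "Kevin Durant enters Oracle Arena before the game"
    nouns = [tok.text for tok in nlp(text) if tok.pos_ in ("NOUN", "PROPN")]
    print(nouns)
    # Whether each noun is grounded as a separate query, or all nouns are
    # combined into a single query, is exactly the ambiguity raised in (1) above.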

Some entity categories in the Twitter15 dataset were modified

Hello. Comparing the Twitter15 dataset used in this paper against the datasets used in other papers (UMT, MAF, ...), some entity categories are inconsistent, and test.txt in particular contains many modified entities. Is it appropriate to compare experimental results with other models when the data differs, and what is the purpose of modifying the entity categories?

data preprocess

Hello, could I inquire about how you handle data preprocessing specifically? Is it possible to make the relevant code public?

Question about Twitter-2015

Hello,

I ran your code on the Twitter-2015 dataset and achieved an F1-score of 75.30, which closely aligns with the results reported in your paper. However, I later discovered that some labels in your dataset had been modified, and you mentioned that the paper's results are based on the original version. In an attempt to replicate the reported results, I replaced the 'train.txt,' 'valid.txt,' and 'test.txt' files with the original text files. This time, I only managed to achieve an F1-score of 73.27, which is clearly not an ideal outcome.

I've made several attempts to tune the parameters, but I'm still unable to match the results presented in the paper. Could you kindly provide me with some guidance or hints to help me reproduce your reported results?

Thank you very much.

Not putting parameters into optimizer but still trainable

In modules/train.py, lines 484 and 492 both add parameters to the optimizer, but the parameters of the crf and fc layers are never added, yet they still seem to be trained. As far as I know, parameters that are not added to the optimizer cannot be trained. Do you know why?
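For what it's worth, a minimal generic PyTorch sketch (not the repo's train.py; the module names below are made up) showing the expected behavior: a parameter left out of the optimizer still receives a gradient from backward(), but optimizer.step() never changes its value:

    import torch
    import torch.nn as nn

    model = nn.ModuleDict({"fc": nn.Linear(4, 2), "crf_like": nn.Linear(2, 2)})
    opt = torch.optim.SGD(model["fc"].parameters(), lr=0.1)   # only fc is registered

    before = model["crf_like"].weight.clone()
    loss = model["crf_like"](model["fc"](torch.randn(3, 4))).sum()
    loss.backward()
    opt.step()

    print(model["crf_like"].weight.grad is not None)      # True: a gradient exists
    print(torch.equal(before, model["crf_like"].weight))  # True: the value is unchanged

So if the crf and fc parameters really do change during training, they must be covered by some parameter group or a second optimizer elsewhere in the code.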

How to test the RE models

Hi,

I want to test the RE model that you provided. However, I cannot find the correct command to run testing. I read run.py, and it seems the script does not run testing even when --do_test is set. Could you provide instructions for testing?

Request for the object image data

Thanks for your work! I would like to re-train your model, and could you please upload your originally obtained object files (such as img_vg)? Thank you!

A question about the comparison experiments in the paper

Hello, I would like to ask about MEGA in the paper's comparison experiments. The original paper, "Multimodal Relation Extraction with Efficient Graph Alignment", only reports results for multimodal relation extraction and has no experimental results for MNER, and the original project does not provide the processed MNER data (such as imgSG, rel_1, etc.). How did you modify the code to obtain those results? Could you share the processed MNER data (such as imgSG, rel_1, etc.)?

RuntimeError: mat1 dim 1 must match mat2 dim 0

I ran into the following issue when training:

RuntimeError                              Traceback (most recent call last)
[<ipython-input-51-25f39c1f99c1>](https://localhost:8080/#) in <module>
    201 
    202 if __name__ == "__main__":
--> 203     main()

10 frames
[<ipython-input-51-25f39c1f99c1>](https://localhost:8080/#) in main()
    187     if args.do_train:
    188         # train
--> 189         trainer.train()
    190         # test best model
    191         args.load_path = os.path.join(args.save_path, 'best_model.pth')

[<ipython-input-50-1e697a6369dc>](https://localhost:8080/#) in train(self)
    278                     self.step += 1
    279                     batch = (tup.to(self.args.device)  if isinstance(tup, torch.Tensor) else tup for tup in batch)
--> 280                     attention_mask, labels, logits, loss = self._step(batch, mode="train")
    281                     avg_loss += loss.detach().cpu().item()
    282 

[<ipython-input-50-1e697a6369dc>](https://localhost:8080/#) in _step(self, batch, mode)
    459             images, aux_imgs = None, None
    460             input_ids, token_type_ids, attention_mask, labels = batch
--> 461         output = self.model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels, images=images, aux_imgs=aux_imgs)
    462         logits, loss = output.logits, output.loss
    463         return attention_mask, labels, logits, loss

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

[<ipython-input-48-116255bafcd3>](https://localhost:8080/#) in forward(self, input_ids, attention_mask, token_type_ids, labels, images, aux_imgs)
    191     def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, labels=None, images=None, aux_imgs=None):
    192         if self.args.use_prompt:
--> 193             prompt_guids = self.get_visual_prompt(images, aux_imgs)
    194             prompt_guids_length = prompt_guids[0][0].shape[2]
    195             # attention_mask: bsz, seq_len

[<ipython-input-48-116255bafcd3>](https://localhost:8080/#) in get_visual_prompt(self, images, aux_imgs)
    227         aux_prompt_guids = [torch.cat(aux_prompt_guid, dim=1).view(bsz, self.args.prompt_len, -1) for aux_prompt_guid in aux_prompt_guids]  # 3 x [bsz, 4, 3840]
    228 
--> 229         prompt_guids = self.encoder_conv(prompt_guids)  # bsz, 4, 4*2*768
    230         aux_prompt_guids = [self.encoder_conv(aux_prompt_guid) for aux_prompt_guid in aux_prompt_guids] # 3 x [bsz, 4, 4*2*768]
    231         split_prompt_guids = prompt_guids.split(768*2, dim=-1)   # 4 x [bsz, 4, 768*2]

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py](https://localhost:8080/#) in forward(self, input)
    115     def forward(self, input):
    116         for module in self:
--> 117             input = module(input)
    118         return input
    119 

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py](https://localhost:8080/#) in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

[/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py](https://localhost:8080/#) in linear(input, weight, bias)
   1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
-> 1692         output = input.matmul(weight.t())
   1693         if bias is not None:
   1694             output += bias

RuntimeError: mat1 dim 1 must match mat2 dim 0

The full error is shown above. What is going wrong here? I ran this repository without any changes.
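A generic way to localize such a mismatch (a debugging sketch, not the repo's code) is to hook every nn.Linear and print the shape it actually receives next to the in_features it expects, then compare against the commented shapes in get_visual_prompt (e.g. bsz, 4, 3840 going into encoder_conv):

    import torch.nn as nn

    def report_linear_inputs(module):
        # Print, for every Linear in the module, the incoming shape vs. the
        # expected in_features, so the failing layer is obvious before the
        # matmul raises "mat1 dim 1 must match mat2 dim 0".
        for name, layer in module.named_modules():
            if isinstance(layer, nn.Linear):
                layer.register_forward_pre_hook(
                    lambda m, inp, n=name: print(
                        n, "got", tuple(inp[0].shape),
                        "expects in_features =", m.in_features))

    # e.g. call report_linear_inputs(model) once before trainer.train()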

question

Hello, I ran training and testing exactly as the README describes, but whatever I predict, the output tensor is NaN. From debugging, the problem is in the BERT model file: in the loop "prompt = []; for name, layer in self.resnet.named_children(): if name == 'fc' or name == 'avgpool': continue; x = layer(x)  # (bsz, 256, 56, 56)", x ends up being all zeros. I don't know how to restore normal training and testing and would appreciate some guidance. Below are the predictions I printed out after my test: the two white lines at the top are the correct answers, and below them are the predictions. You can see that they are all zeros, which corresponds to the dictionary entry "No relationship between entities."
[screenshot of the printed predictions]
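A small sanity check along those lines (using a stock torchvision resnet50 as a stand-in, not the repo's exact model) to confirm whether the images are actually loading and whether the truncated backbone really produces all-zero features:

    import torch
    from torchvision.models import resnet50

    net = resnet50(pretrained=True).eval()

    def backbone_features(x):
        # mirror the quoted loop: run every child except 'fc' and 'avgpool'
        for name, layer in net.named_children():
            if name in ("fc", "avgpool"):
                continue
            x = layer(x)
        return x

    imgs = torch.randn(2, 3, 224, 224)   # replace with a real loaded batch
    with torch.no_grad():
        feats = backbone_features(imgs)
    print(imgs.abs().mean().item())      # near 0 would suggest the images failed to load
    print(feats.shape, feats.abs().mean().item())  # all-zero features point at the backbone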

The RE dataset downloads too slowly

Hello, downloading the data with the command wget 120.27.214.45/Data/re/multimodal/data.tar.gz is very slow. Could you please upload the data to Google Drive so it can be downloaded from there?
