e2e-coref-pytorch's Issues

CUDA out of memory

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:17:00.0 Off |                  N/A |
|  0%   52C    P8    20W / 260W |    148MiB / 11016MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:A6:00.0 Off |                  N/A |
|  0%   45C    P8     1W / 250W |      5MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3330      G   /usr/lib/xorg/Xorg                 96MiB |
|    0   N/A  N/A      4039      G   /usr/bin/gnome-shell               49MiB |
|    1   N/A  N/A      3330      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
Running python train.py fails with an error saying CUDA is out of memory. My GPU status is shown above.
Environment: CUDA 11 + PyTorch 1.7.1
The error message is:
Traceback (most recent call last):
  File "train.py", line 171, in <module>
    train()
  File "train.py", line 136, in train
    loss.backward()
  File "/home/dell/miniconda3/envs/nlp/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/dell/miniconda3/envs/nlp/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 652.00 MiB (GPU 0; 10.76 GiB total capacity; 8.17 GiB already allocated; 193.69 MiB free; 9.43 GiB reserved in total by PyTorch)
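
The numbers in the RuntimeError above can be read off mechanically. The following is a minimal sketch in plain Python (no PyTorch needed) that parses the message quoted in this issue and compares the requested allocation with the free device memory and with the memory PyTorch has reserved but not allocated. The helper name `parse_oom_message` is hypothetical, and the regular expressions assume the message format shown in this issue.

```python
import re

# Multipliers to convert the units used in the message into MiB.
_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# Message copied from the traceback in this issue.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 652.00 MiB "
    "(GPU 0; 10.76 GiB total capacity; 8.17 GiB already allocated; "
    "193.69 MiB free; 9.43 GiB reserved in total by PyTorch)"
)


def parse_oom_message(message):
    """Hypothetical helper: extract the sizes (in MiB) from an OOM message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        sizes[name] = float(match.group(1)) * _TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)
# Memory held by PyTorch's caching allocator but not used by live tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

print(f"requested:      {sizes['requested']:.2f} MiB")
print(f"free on device: {sizes['free']:.2f} MiB")
print(f"cached, unused: {cached_unused:.2f} MiB")
print("request fits in free memory:", sizes["requested"] <= sizes["free"])
```

In this case the 652 MiB request is larger than the 193.69 MiB still free on GPU 0, so the backward pass needs more memory than the card has left. The reserved-but-unallocated amount is larger than the request, which may indicate that the cached memory is too fragmented to serve it, but that cannot be confirmed from the message alone.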

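Does the model support continued training?

Hello!

Does this code support continued training of a model? For example, I would train a model on the OntoNotes data and then continue training that model on my own data.

Best regards!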

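Loss drops straight to 0

I would like to ask the author: is it normal that the loss often drops straight to 0 during training? I inspected the corresponding gold_scores and found that many of them are -inf.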
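
For context, here is a minimal sketch of how `-inf` entries can arise in `gold_scores`. It assumes the loss follows the marginal log-likelihood used by the original e2e-coref model; this is an assumption, since the repository's actual loss code is not shown in this issue. Under that formulation, non-gold antecedents are masked to `-inf` before the log-sum-exp, so `-inf` entries on their own are expected, and the loss approaches 0 when nearly all probability mass already sits on gold antecedents. The function names and scores below are illustrative only.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    finite = [v for v in values if v != NEG_INF]
    if not finite:
        return NEG_INF  # every entry is masked out
    top = max(finite)
    return top + math.log(sum(math.exp(v - top) for v in finite))


def span_loss(antecedent_scores, gold_labels):
    """Marginal log-likelihood loss for one span (illustrative).

    antecedent_scores[0] is the dummy antecedent; gold_labels marks
    which antecedents are correct.
    """
    # Non-gold antecedents become score + log(0) = -inf.
    gold_scores = [
        score if is_gold else NEG_INF
        for score, is_gold in zip(antecedent_scores, gold_labels)
    ]
    loss = logsumexp(antecedent_scores) - logsumexp(gold_scores)
    return loss, gold_scores


# One span with a dummy antecedent and three candidates; only index 2 is gold.
scores = [0.0, -1.0, 2.0, -3.0]
labels = [False, False, True, False]
loss, gold_scores = span_loss(scores, labels)
print(gold_scores)  # most entries are -inf by construction
print(loss)         # small positive value

# When the model is very confident in the gold antecedent, the loss approaches 0.
confident_loss, _ = span_loss([0.0, -50.0, 50.0, -50.0], labels)
print(confident_loss)
```

If the loss is exactly 0 from very early in training, one possible cause (a guess, not a diagnosis of this repository) is that no gold clusters are being loaded, so the dummy antecedent is the only gold label for every span.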

ERROR: could not find a version that satisfies the requirement torch==1.15.0

ERROR: Could not find a version that satisfies the requirement torch==1.15.0 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.3.1, 0.4.0, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.15.0

This happens when I try to install torch==1.15.0.
Could you confirm whether the required torch version is really 1.15.0, or a different one?

Data preprocessing

Before running the data preprocessing described in your README, do I need to convert the *_skel files into *_conll files?

Error during training

Hello!
Sorry to bother you.
I got an error when running the code.

train.py

coref_model(sentences_ids, sentences_masks, sentences_valid_masks, speaker_ids, sentence_map, subtoken_map, genre, transformer_model)

When I run this part, I get the following error:
TypeError: only integer tensors of a single element can be converted to an index

I would like to know how to solve this problem.

What does "top_span_ratio" mean?

A question about line 267 of model.py:
m = min(int(self.config["top_span_ratio"] * len(tokens_embed)), spans_len) # maximum number of spans in the text
What does "top_span_ratio" mean, and why is it multiplied by len(tokens_embed) rather than len(spans_embed)?
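
To make the quoted line concrete, here is a minimal sketch of the pruning budget it computes. It assumes `top_span_ratio` plays the same role as in the original e2e-coref model, where the number of spans kept is proportional to the document length in tokens rather than to the number of candidate spans. The values `0.4` and `30` and the function names are illustrative assumptions, not taken from this repository's config.

```python
def count_candidate_spans(num_tokens, max_span_width):
    """Number of contiguous spans of width 1..max_span_width in a document."""
    return sum(
        max(num_tokens - width + 1, 0)
        for width in range(1, max_span_width + 1)
    )


def num_spans_to_keep(num_tokens, num_candidate_spans, top_span_ratio):
    """Mirror of the quoted line: budget scales with tokens, capped by candidates."""
    return min(int(top_span_ratio * num_tokens), num_candidate_spans)


num_tokens = 100      # stands in for len(tokens_embed)
max_span_width = 30   # assumed value
top_span_ratio = 0.4  # assumed value

spans_len = count_candidate_spans(num_tokens, max_span_width)
m = num_spans_to_keep(num_tokens, spans_len, top_span_ratio)

print("candidate spans:", spans_len)
print("spans kept (ratio * tokens):", m)
# For comparison: applying the ratio to the candidate count keeps far more spans.
print("ratio * candidates would be:", int(top_span_ratio * spans_len))
```

Because the candidate count grows roughly with the number of tokens times the maximum span width, tying the budget to the token count keeps the later pairwise antecedent scoring manageable as documents get longer.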

help

Hello! Could you send me a copy of the Chinese data? I ran into some difficulties downloading it from the official site. Thank you! [email protected]

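Poor results on English

Hello, first of all, kudos for the code; it is much more pleasant to read than the original TF version 👍.
I ran your code and the Chinese results reproduced perfectly: F1 reached 0.64 at 20,000 steps.
For English, I only changed the file paths and the BERT model location and nothing else, but the results are poor: from 40,000 steps up to almost 60,000 steps, the best F1 is still only 0.55.
I saw that in your reply to another issue you said the data processing part could be changed. Could you say specifically which parts of the code might be worth changing? Thanks!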

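Training never stops

Hello, I am training your model on my own data and the training never stops; it has been running for more than twenty days. Is this normal? How long did training take for you?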
