
hmnet's Issues

How to solve CUDA out of memory error?

I have encountered errors like this
" RuntimeError: CUDA out of memory. Tried to allocate 2.40 GiB (GPU 3; 15.78 GiB total capacity; 12.06 GiB already allocated; 2.39 GiB free; 212.27 MiB cached) "
when trying to fine-tune the model on both datasets. The same error occurs if I try to evaluate the model with the fine-tuned weights downloaded from the link given in the repo. Can you specify the hardware required to reproduce this project?

tokenizer.convert_ids_to_tokens not generating special tokens with predefined position offset

self.tokenizer = self.tokenizer_class.from_pretrained(self.pretrained_tokenizer_path)
special_tokens_tuple_list = [("eos_token", 128), ("unk_token", 129), ("pad_token", 130), ("bos_token", 131)]
for special_token_name, special_token_id_offset in special_tokens_tuple_list:
    if getattr(self.tokenizer, special_token_name) == None:
        setattr(self.tokenizer, special_token_name, self.tokenizer.convert_ids_to_tokens(len(self.tokenizer)-special_token_id_offset))
        self.config[special_token_name] = self.tokenizer.convert_ids_to_tokens(len(self.tokenizer)-special_token_id_offset)
        self.config[special_token_name+'_id'] = len(self.tokenizer)-special_token_id_offset

In this snippet of code, a default offset is set for each special token name; the special tokens that do not exist in the pretrained tokenizer (pad_token, bos_token) then need to be added to it. I loaded the pretrained tokenizer from transfo-xl-wt103 under ExampleInitModel and generated tokens from the ids based on the predefined offsets.

tokenizer.convert_ids_to_tokens(len(self.tokenizer) - special_token_id_offset)

The returned tokens turn out to be specific words, not '<pad>' or '<bos>' tokens.

When the token_name is "pad_token" or "bos_token" with offsets 130 and 131, the returned tokens are: 'Islahul 267605, McShan 267604'.

May I ask how you set up the offset values for these special tokens? Is it expected that 'transfo-xl-wt103' doesn't need a pad_token and bos_token, or should these special tokens actually be set up somewhere else?
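
For context, here is how I would expect the missing special tokens to be registered explicitly through the Hugging Face tokenizer API, rather than mapped onto existing vocabulary ids by an offset. This is only my sketch (the token strings "<pad>" and "<bos>" are my own choice), not the repository's code:

from transformers import TransfoXLTokenizer

# Sketch only: register the missing special tokens explicitly so that
# pad_token / bos_token get dedicated entries instead of reusing existing words.
tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
num_added = tokenizer.add_special_tokens({"pad_token": "<pad>", "bos_token": "<bos>"})
print(num_added)                                    # how many tokens were actually added
print(tokenizer.pad_token, tokenizer.pad_token_id)  # the new <pad> token and its id
print(tokenizer.bos_token, tokenizer.bos_token_id)  # the new <bos> token and its id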

The order of token_attn and sent_attn in decoder is different between the code and the paper, in MeetingNet_Transformer.py

@xrc10
In the paper, src-tgt attention on sentences is after the src-tgt attention on tokens. However, in the code, the order is opposite.
At line 1000 in MeetingNet_Transformer.py,

def forward(self, y, token_enc_key, token_enc_value, sent_enc_key, sent_enc_value):
    query, key, value = self.decoder_splitter(y)
    # batch x len x n_state

    # self-attention
    a = self.attn(query, key, value, None, one_dir_visible=True)
    # batch x len x n_state

    n = self.ln_1(y + a) # residual

    if 'NO_HIERARCHY' in self.opt:
        q = y
        r = n
    else:
        # src-tgt attention on sentences
        q = self.sent_attn(n, sent_enc_key, sent_enc_value, None)
        r = self.ln_3(n + q) # residual
        # batch x len x n_state

    # src-tgt attention on tokens
    o = self.token_attn(r, token_enc_key, token_enc_value, None)
    p = self.ln_2(r + o) # residual
    # batch x len x n_state

    m = self.mlp(p)
    h = self.ln_4(p + m)
    return h

I would like to confirm whether this ordering is intended or not.
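
For reference, here is how I would expect the forward pass to look if it matched the paper's ordering (token-level attention before sentence-level attention). This is only my own sketch reusing the attributes from the code above, not a proposed patch:

def forward(self, y, token_enc_key, token_enc_value, sent_enc_key, sent_enc_value):
    query, key, value = self.decoder_splitter(y)

    # self-attention
    a = self.attn(query, key, value, None, one_dir_visible=True)
    n = self.ln_1(y + a)  # residual

    # src-tgt attention on tokens first, as described in the paper
    o = self.token_attn(n, token_enc_key, token_enc_value, None)
    p = self.ln_2(n + o)  # residual

    if 'NO_HIERARCHY' in self.opt:
        r = p
    else:
        # src-tgt attention on sentences second
        q = self.sent_attn(p, sent_enc_key, sent_enc_value, None)
        r = self.ln_3(p + q)  # residual

    m = self.mlp(r)
    h = self.ln_4(r + m)
    return h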

Docker building, Tensor Size issues, may be related to package versions.

Hi, I've tried to build your Docker container using the provided Dockerfile and it fails at python -m spacy download en: it couldn't link to libcuda.so.1. To fix this, I changed the Dockerfile to link against the stub at compile time:

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH && \
        ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
        python -m spacy download en

The build then works. The next issue comes in Models/Networks/MeetingNet_Transformer.py, where
spacy.load('en', parser = False) fails because the parser keyword has been removed. I fixed it by changing the call to
nlp = spacy.load('en_core_web_sm', exclude=['parser']), which also fixed the warning that the en shortcut is deprecated.

The last thing I had to change to get things working was that the Language object from spaCy no longer has tagger and entity fields. I had to access the pipeline to retrieve them, as below.

tagger = [x[1] for x in nlp.pipeline if x[0] == 'tagger']
assert len(tagger) == 1
tagger = tagger[0]

entity = [x[1] for x in nlp.pipeline if x[0] == 'ner']
assert len(entity) == 1
entity = entity[0]

POS = {w: i for i, w in enumerate([''] + list(tagger.labels))}
ENT = {w: i for i, w in enumerate([''] + list(entity.move_names))}
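
For what it's worth, the same components can also be fetched through spaCy's pipeline API (assuming spaCy >= 3.0):

# Equivalent lookup via spaCy's pipeline API (assumes spaCy >= 3.0)
tagger = nlp.get_pipe("tagger")
entity = nlp.get_pipe("ner")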

Finally, once the code was able to execute, it ran into a tensor size issue with the linked fine-tuned AMI model, shown below:

Error(s) in loading state_dict for MeetingNet_Transformer:
	size mismatch for encoder.pos_embed.weight: copying a param with shape torch.Size([51, 16]) from checkpoint, the shape in current model is torch.Size([50, 16])

I think this may be due to a spaCy model change, since the checkpoint was presumably created with a different spaCy version than the one I have installed.

Could you provide a requirements.txt with versions or tell me if I'm wrong and the tensor size error is unrelated to the spacy tags?

Thanks!

Problems while building docker

When I ran sudo docker build . -t hmnet, errors occurred in Step 10/35 : RUN apt-get update && apt-get install -y --allow-change-held-packages --no-install-recommends software-properties-common openssh-client openssh-server pdsh curl sudo net-tools vim iputils-ping wget perl libxml-parser-perl libcudnn7=${CUDNN_VERSION} libnccl2=${NCCL_VERSION} libnccl-dev=${NCCL_VERSION} --allow-downgrades:

E: Unable to locate package libcudnn7
E: Version '2.4.7-1+cuda10.0' for 'libnccl2' was not found
E: Version '2.4.7-1+cuda10.0' for 'libnccl-dev' was not found

It seems that apt inside the Docker build can't find these packages. Is this a mistake on my side, or is there a bug in the Dockerfile?

Module versions are not specified

Hello. I am trying to run the experiments. Unfortunately, since the pip modules' versions are not specified in the Dockerfile, I am getting one error after another. Could you please pin the versions in the Dockerfile? For example, your code is not compatible with the latest spaCy version (3.0.50). I guess you used version 2.3.5 in your code.

How to build a new data set with the same format

Hi, I have successfully run your code, and the results are quite good. Would it be convenient for you to open-source the pre-training data? Or can you tell me how to obtain the POS_ID and ENT_ID values? Thanks.
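
In case it helps, here is my guess at how the POS/ENT id maps could be built and applied with spaCy, mirroring the label maps quoted in the Docker issue above. This is a sketch under my own assumptions (model name, sample sentence), not the authors' preprocessing script:

import spacy

# Sketch only: build label-to-id maps like the ones in MeetingNet_Transformer.py
# and tag one transcript sentence. The exact key format for entity ids (e.g. BILUO
# move names) should follow whatever the repo's preprocessing actually uses.
nlp = spacy.load("en_core_web_sm")
tagger = nlp.get_pipe("tagger")
ner = nlp.get_pipe("ner")

POS = {w: i for i, w in enumerate([''] + list(tagger.labels))}
ENT = {w: i for i, w in enumerate([''] + list(ner.move_names))}

doc = nlp("Okay, let's kick off the design meeting.")
pos_ids = [POS.get(tok.tag_, 0) for tok in doc]  # fine-grained POS tag id per token
print(list(zip([tok.text for tok in doc], pos_ids)))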

CUDA out of memory

Hello. I am trying to reproduce the paper results. I am currently running the code on 2 Tesla V100 GPUs, each with 16GB of memory, but I am still getting an out-of-memory error. I also tried to decrease MAX_TRANSCRIPT_WORD to 1000, but it did not help. Could you please let me know what hardware and GPUs it requires to run?

cublas runtime error

I'm following the readme to try to fine-tune HMNet on the AMI dataset. My only modification to the instructions is that I have only 1 visible device (my full command thus becomes CUDA_VISIBLE_DEVICES="0" mpirun -np 1 --allow-run-as-root python PyLearn.py train ExampleConf/conf_hmnet_AMI).

The process exits with an error.

Here's the full output.

{'MODEL': 'MeetingNet_Transformer', 'TASK': 'HMNet', 'CRITERION': 'MLECriterion', 'SEED': 1033, 'RESUME': True, 'MAX_NUM_EPOCHS': 20, 'SAVE_PER_UPDATE_NUM': 400, 'UPDATES_PER_EPOCH': 2000, 'OPTIMIZER': 'RAdam', 'NO_AUTO_LR_SCALING': True, 'START_LEARNING_RATE': 0.001, 'LR_SCHEDULER': 'LnrWrmpInvSqRtDcyScheduler', 'WARMUP_STEPS': 16000, 'WARMUP_INIT_LR': 0.0001, 'WARMUP_END_LR': 0.001, 'GRADIENT_ACCUMULATE_STEP': 20, 'GRAD_CLIPPING': 2, 'USE_REL_DATA_PATH': True, 'TRAIN_FILE': '../ExampleRawData/meeting_summarization/AMI_proprec/train_ami.json', 'DEV_FILE': '../ExampleRawData/meeting_summarization/AMI_proprec/valid_ami.json', 'TEST_FILE': '../ExampleRawData/meeting_summarization/AMI_proprec/test_ami.json', 'ROLE_DICT_FILE': '../ExampleRawData/meeting_summarization/role_dict_ext.json', 'MINI_BATCH': 1, 'MAX_PADDING_RATIO': 1, 'BATCH_READ_AHEAD': 10, 'DOC_SHUFFLE_BUF_SIZE': 10, 'SAMPLE_SHUFFLE_BUFFER_SIZE': 10, 'BATCH_SHUFFLE_BUFFER_SIZE': 10, 'MAX_TRANSCRIPT_WORD': 8300, 'MAX_SENT_LEN': 30, 'MAX_SENT_NUM': 300, 'DROPOUT': 0.1, 'VOCAB_DIM': 512, 'ROLE_SIZE': 32, 'ROLE_DIM': 16, 'POS_DIM': 16, 'ENT_DIM': 16, 'USE_ROLE': True, 'USE_POSENT': True, 'USE_BOS_TOKEN': True, 'USE_EOS_TOKEN': True, 'TRANSFORMER_EMBED_DROPOUT': 0.1, 'TRANSFORMER_RESIDUAL_DROPOUT': 0.1, 'TRANSFORMER_ATTENTION_DROPOUT': 0.1, 'TRANSFORMER_LAYER': 6, 'TRANSFORMER_HEAD': 8, 'TRANSFORMER_POS_DISCOUNT': 80, 'PRE_TOKENIZER': 'TransfoXLTokenizer', 'PRE_TOKENIZER_PATH': '../ExampleInitModel/transfo-xl-wt103', 'PYLEARN_MODEL': '../ExampleInitModel/HMNet-pretrained', 'EXTRA_IDS': 1000, 'BEAM_WIDTH': 6, 'MAX_GEN_LENGTH': 512, 'MIN_GEN_LENGTH': 320, 'EVAL_TOKENIZED': True, 'EVAL_LOWERCASE': True, 'NO_REPEAT_NGRAM_SIZE': 3, 'cuda': True, 'confFile': 'ExampleConf/conf_hmnet_AMI', 'datadir': 'ExampleConf', 'basename': 'conf_hmnet_AMI', 'command': 'train', 'conf_file': 'ExampleConf/conf_hmnet_AMI', 'cluster': 'local', 'dist_init_path': './tmp', 'fp16': False, 'fp16_opt_level': 'O1', 'no_cuda': False}
Using Cuda

Saving logs, model, checkpoint, and evaluation in ExampleConf/conf_hmnet_AMI_conf~/run_2
 1.2.0  is high
Number of GPUs is  1 
Effective batch size is increased from  1  to  1 
Gradient accumulation steps =  20 
Effective batch size =  20 
[9d66c296629d:03515] pml_ucx.c:285  Error: UCP worker does not support MPI_THREAD_MULTIPLE
Select command: train
train on rank 0
-----------------------------------------------
Initializing model...
Loading Tokenizer from ExampleConf/../ExampleInitModel/transfo-xl-wt103...
Using pad_token, but it is not set yet.
Using bos_token, but it is not set yet.
Use POS and ENT
USE_ROLE

Total trainable parameters: 204488240
Loaded data on rank 0.
Using custom optimizer: RAdam
Optimizer parameters: {'lr': 0.001}
Using custom lr scheduler: LnrWrmpInvSqRtDcyScheduler
Lr scheduler parameters: {'warmup_steps': 16000, 'warmup_init_lr': 0.0001, 'warmup_end_lr': 0.001}
Cannot find checkpoint path from conf_hmnet_AMI_resume_checkpoint.json.
Make sure ExampleConf/conf_hmnet_AMI_resume_checkpoint.json exists.
Continue without loading checkpoint
Epoch 0
Traceback (most recent call last):
  File "PyLearn.py", line 71, in <module>
    trainer.train()
  File "/root/HMNet/Models/Trainers/HMNetTrainer.py", line 273, in train
    self.update(batch)
  File "/root/HMNet/Models/Trainers/HMNetTrainer.py", line 358, in update
    loss = self.network(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Trainers/HMNetTrainer.py", line 38, in forward
    output = self.model(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/MeetingNet_Transformer.py", line 100, in forward
    outputs = self._forward(**batch)
  File "/root/HMNet/Models/Networks/MeetingNet_Transformer.py", line 125, in _forward
    token_encoder_outputs, sent_encoder_outputs = self.encoder(encoder_input_ids, encoder_input_roles, encoder_input_pos, encoder_input_ent)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/MeetingNet_Transformer.py", line 1130, in forward
    embedded = self.embedder(vocab_x.view(batch_size, -1))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/Transformer.py", line 387, in forward
    x_pos = self.pos_emb(torch.arange(x_len).type(torch.cuda.FloatTensor)) # len x n_state
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/Transformer.py", line 86, in forward
    sinusoid_inp = torch.ger(pos_seq, self.inv_freq)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:120
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[42501,1],0]
  Exit code:    1
--------------------------------------------------------------------------
