This repo is missing important files

HMNet

This is the official code for the Microsoft's paper of HMNet model at EMNLP 2020. It is implemented under PyTorch framework. The related paper to cite is:

@Article{zhu2020a,
author = {Zhu, Chenguang and Xu, Ruochen and Zeng, Michael and Huang, Xuedong},
title = {A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining},
year = {2020},
month = {November},
url = {https://www.microsoft.com/en-us/research/publication/end-to-end-abstractive-summarization-for-meetings/},
journal = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
}

Finetune HMNet

It is recommended to run our model inside a docker:

Build docker image

cd Docker
sudo docker build . -t hmnet

Run container from image

sudo nvidia-docker run -it hmnet /bin/bash

Get the pretrained HMNet ready at ExampleInitModel/HMNet-pretrained. Please see document.

Finetune on AMI dataset

CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python PyLearn.py train ExampleConf/conf_hmnet_AMI

The training log/model/settings could be found at ExampleConf/conf_hmnet_AMI_conf~/run_1

Data paths

ExampleRawData/meeting_summarization/AMI_proprec: The preprocessed AMI dataset. The *.json files point to the path to each split. Each folder (train, dev or test) contains the compressed chunks of data in the format for infinibatch.
ExampleRawData/meeting_summarization/ICSI_proprec: Same as above for ICSI dataset.
ExampleInitModel/transfo-xl-wt103: Here we only used the vocabulary from Transformer-XL, provided by Huggingface.

Evaluation

Step 1: specify the model path

In ExampleConf/conf_eval_hmnet_AMI, for the line

PYLEARN_MODEL ###

Replace ### to the real checkpoint path. Use the relative path w.r.t the location of this configuration file.

Step 2: run the evaluate pipeline

CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python PyLearn.py evaluate ExampleConf/conf_eval_hmnet_AMI

The decoding results could be found at ExampleConf/conf_eval_hmnet_AMI_conf~/run_1

Microsoft Open Source Code of Conduct

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include Microsoft, Azure, DotNet, AspNet, Xamarin, and our GitHub organizations.

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's Microsoft's definition of a security vulnerability, please report it to us as described below.

Reporting Security Issues

Please do not report security vulnerabilities through public GitHub issues.

Instead, please report them to the Microsoft Security Response Center (MSRC) at https://msrc.microsoft.com/create-report.

If you prefer to submit without logging in, send email to [email protected]. If possible, encrypt your message with our PGP key; please download it from the the Microsoft Security Response Center PGP Key page.

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at microsoft.com/msrc.

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
Full paths of source file(s) related to the manifestation of the issue
The location of the affected source code (tag/branch/commit or direct URL)
Any special configuration required to reproduce the issue
Step-by-step instructions to reproduce the issue
Proof-of-concept or exploit code (if possible)
Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our Microsoft Bug Bounty Program page for more details about our active programs.

Preferred Languages

We prefer all communications to be in English.

Policy

Microsoft follows the principle of Coordinated Vulnerability Disclosure.

	self.tokenizer = self.tokenizer_class.from_pretrained(self.pretrained_tokenizer_path)
	special_tokens_tuple_list = [("eos_token", 128), ("unk_token", 129), ("pad_token", 130), ("bos_token", 131)]

	for special_token_name, special_token_id_offset in special_tokens_tuple_list:
	if getattr(self.tokenizer, special_token_name) == None:
	setattr(self.tokenizer, special_token_name, self.tokenizer.convert_ids_to_tokens(len(self.tokenizer)-special_token_id_offset))
	self.config[special_token_name] = self.tokenizer.convert_ids_to_tokens(len(self.tokenizer)-special_token_id_offset)
	self.config[special_token_name+'_id'] = len(self.tokenizer)-special_token_id_offset

microsoft / hmnet Goto Github PK

hmnet's Introduction

HMNet

Finetune HMNet

Data paths

Evaluation

Step 1: specify the model path

Step 2: run the evaluate pipeline

Trademarks

Security

Reporting Security Issues

Preferred Languages

Policy

hmnet's People

Contributors

Stargazers

Watchers

Forkers

hmnet's Issues

Recommend Projects

Recommend Topics

Recommend Org