
yangzhiye / papers-for-text-summarization

papers

License: Apache License 2.0

papers-for-text-summarization's Introduction

Papers-For-Graduation-Project

Literature Survey

Papers on Text Summarization

1. LCSTS: A Large-Scale Chinese Short Text Summarization Dataset (paper link)

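The authors crawled and filtered 2.4M+ [summary, short text] pairs posted by verified (blue-V) Weibo accounts; this is the corpus used by this system.
The paper proposes two data-processing approaches, word-based and character-based (the latter to mitigate the UNK problem), and gives two models, RNN and RNN+context, as baselines.
The RNN+Context+Char combination performs best: ROUGE-1 0.299, ROUGE-2 0.174, ROUGE-L 0.272.

The main contribution of this paper is a training set for short-text summarization together with a baseline. I previously extracted key sentences with TF-IDF and reached a ROUGE-1 of 0.28, which did not beat it. Long-text and short-text summarization seem to differ somewhat: short-text summarization probably leans more on sentence compression, while long-text summarization leans more on information extraction. I am setting short text aside for now and will come back to it if the graduation project needs this dataset.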
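The UNK argument for character-based input can be illustrated with a small sketch. This is not the LCSTS preprocessing code: the helper names (`build_vocab`, `char_tokens`, `unk_rate`) are hypothetical, and the word-based input is assumed to be already segmented with spaces by some external segmenter.

```python
from collections import Counter

UNK = "<unk>"

def build_vocab(token_lists, max_size):
    """Keep the max_size most frequent tokens; everything else becomes UNK."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {tok for tok, _ in counts.most_common(max_size)}

def char_tokens(text):
    """Character-based units: every non-space character is one token."""
    return [ch for ch in text if not ch.isspace()]

def encode(tokens, vocab):
    """Replace out-of-vocabulary tokens with the UNK symbol."""
    return [tok if tok in vocab else UNK for tok in tokens]

def unk_rate(token_lists, vocab):
    """Fraction of tokens that fall outside the vocabulary."""
    total = sum(len(toks) for toks in token_lists)
    unknown = sum(tok not in vocab for toks in token_lists for tok in toks)
    return unknown / total if total else 0.0

# Toy data, assumed pre-segmented into words with spaces.
train = ["今天 天气 很好", "明天 天气 不好"]
test = "今天 不 好"

word_vocab = build_vocab([s.split() for s in train], max_size=100)
char_vocab = build_vocab([char_tokens(s) for s in train], max_size=100)

print(unk_rate([test.split()], word_vocab))       # unseen words "不" and "好"
print(unk_rate([char_tokens(test)], char_vocab))  # every character was seen
```

On this toy data the word vocabulary misses two of the three test words, while every test character was seen in training, which is the effect the character-based variant relies on.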

2. The Automatic Creation of Literature Abstracts (paper link)

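Compute keywords with TF-IDF -> score key sentences by keyword density -> build the summary from the key sentences

An antique paper from 1958; printing it even broke the lab printer. Simple and blunt. I implemented it on the short-text dataset and on the NLPCC2017-task3 news dataset, the results were decent, and it can serve as the graduation-project baseline.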
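The three-step pipeline above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, input is assumed to be already tokenized, the IDF smoothing is one common choice, and the density score uses a single span from the first to the last keyword in a sentence rather than the gap-limited clusters of the original paper.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k):
    """Rank the words of one document by TF-IDF against a reference corpus."""
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    tf = Counter(doc_tokens)
    scores = {
        tok: (count / len(doc_tokens)) * math.log((1 + n_docs) / (1 + df[tok]))
        for tok, count in tf.items()
    }
    # Sort by score (descending), break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:top_k])

def density_score(sentence_tokens, keywords):
    """Luhn-style significance: (keywords in span)^2 / span length."""
    positions = [i for i, tok in enumerate(sentence_tokens) if tok in keywords]
    if not positions:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return len(positions) ** 2 / span

def summarize(sentences, corpus, top_k=5, n_sentences=1):
    """sentences: token lists of one document, in original order."""
    doc_tokens = [tok for sent in sentences for tok in sent]
    keywords = tfidf_keywords(doc_tokens, corpus, top_k)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -density_score(sentences[i], keywords))
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Returning the chosen sentences in document order rather than score order keeps the extract readable, which matters more for news articles than for single-sentence summaries.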

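3. TextRank: Bringing Order into Texts (paper link)

Uses the TextRank method to extract keywords and key sentences from text.

Using TextRank to extract summaries is a very common approach. I implemented it on the NLPCC2017-task3 news dataset, but it performed worse than TF-IDF; my guess is that IDF deserves the credit.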
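For sentence extraction, TextRank builds a graph whose nodes are sentences and whose edge weights are a word-overlap similarity, then runs a weighted PageRank over it. A minimal sketch under stated assumptions: input is already tokenized, the overlap is computed on word sets, and the iteration count is fixed rather than checked for convergence. The function names are illustrative only.

```python
import math

def similarity(s1, s2):
    """Shared words divided by the sum of the log sentence lengths."""
    if not s1 or not s2:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    if denom <= 0:  # both sentences have exactly one token
        return 0.0
    return len(set(s1) & set(s2)) / denom

def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence from weighted PageRank iterations."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects score from its neighbours, in proportion
        # to the share of the neighbour's total edge weight it receives.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores

sentences = [["cat", "sat", "mat"], ["cat", "sat", "rug"], ["dog", "ran", "far"]]
print(textrank(sentences))  # the isolated third sentence gets the lowest score
```

Note that the similarity uses no corpus-level statistics at all, which is consistent with the guess above that IDF is what gives the TF-IDF baseline its edge.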

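Evaluation Metrics

1. ROUGE: A Package for Automatic Evaluation of Summaries (paper link)

A family of automatic summary evaluation metrics, including ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU.

Currently the most authoritative automatic evaluation method for summaries; most implementations available online target English. I implemented part of ROUGE for Chinese, but it is not authoritative.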
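As a rough illustration of what a Chinese ROUGE computes, here is a minimal character-level sketch of ROUGE-N (recall) and ROUGE-L (F-measure). It is not the repository's implementation and not the official ROUGE package: it assumes a single reference, character tokens, and `beta=1.0` as the default F-measure weight, and it omits ROUGE-W, ROUGE-S, and ROUGE-SU.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram matches / n-grams in the reference."""
    ref = ngrams(reference, n)
    cand = ngrams(candidate, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total

def lcs_length(a, b):
    """Length of the longest common subsequence, row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure built from LCS-based recall and precision."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)

candidate = list("今天天气好")
reference = list("今天天气很好")
print(rouge_n(candidate, reference, 1))  # 5 of 6 reference characters matched
print(rouge_n(candidate, reference, 2))  # 3 of 5 reference bigrams matched
print(rouge_l(candidate, reference))
```

Scoring on characters sidesteps word segmentation entirely, at the cost of numbers that are not directly comparable with word-level English ROUGE scores.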

papers-for-text-summarization's Issues

About the dataset

Do you have a text summarization dataset? Could you share it? Thanks!
