BIT Introduction to Natural Language Understanding: Course Project 2


Automatic Text Summarization

Course Project 2 for BIT's Introduction to Natural Language Understanding. Deployed at: www.0x404.tech

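Usage

Accessing the application online

This is a web project that has been deployed to a cloud server; the application can be accessed and used directly online at http://www.0x404.tech.
Google Chrome or Microsoft Edge is recommended.
The rest of this document describes local deployment.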

Running the project locally

Downloading the project

 # Clone the GitHub repository to your machine
 git clone git@github.com:0x404/BIT-NLP-P2.git
 
 # Enter the Django project directory
 cd BIT-NLP-P2/django_online/

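Downloading required resources

The word vectors and training dataset are too large to be included in the submitted files, so a local deployment requires downloading the following resources:

300-dimensional word vectors trained on the Weibo corpus
NLPCC2017 summarization data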

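Automatic download from the command line (recommended)

A script downloads the required data files automatically. After running the commands below, no manual download is needed and the project can be run directly.

# Install the wget library first if your Python environment does not have it
pip install wget

# Download the corpora with the script
python make.py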

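Manual download

300-dimensional word vectors trained on the Weibo corpus, from https://github.com/Embedding/Chinese-Word-Vectors
- For a faster download, download from here.
NLPCC2017 summarization data, from https://github.com/liucongg/GPT2-NewsTitle
- For a faster download, download from here.

Once the downloads finish, place the resources above in the directory django_online/keyword_extraction/textrank/data.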
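
Before starting the server, it can help to confirm that the manually downloaded files ended up in the expected directory. The following is a minimal sketch of such a check, not part of the project itself. The directory comes from the instructions above and is written relative to the repository root; the two file names are hypothetical placeholders and should be replaced with the actual names of the downloaded files.

```python
from pathlib import Path

# Data directory named in the instructions, relative to the repository root.
DATA_DIR = Path("django_online/keyword_extraction/textrank/data")

# Hypothetical file names: replace them with the names of the files you downloaded.
REQUIRED_FILES = ["weibo_word_vectors.txt", "nlpcc2017_summary_data.json"]


def missing_resources(data_dir, required):
    """Return the names in `required` that are not regular files in `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_resources(DATA_DIR, REQUIRED_FILES)
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All resources found in", DATA_DIR)
```

Run the script from the repository root, or adjust `DATA_DIR` if it is run from inside django_online/.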

Running the web application

# Run the application
python manage.py runserver

# Starting development server at http://127.0.0.1:8000/

Enter http://127.0.0.1:8000/ in the browser's address bar to access the project locally.

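Tasks

To do

Use a trie to speed up checking whether a word is in the stop-word list
Use a trie to speed up key-sentence computation
Finer-grained sentence splitting
TextRank can be improved further, for example by adding a sentence-length penalty or by using sentence vectors to measure similarity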
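
The two trie items in the to-do list could be approached along the following lines. This is only a sketch under stated assumptions, not the project's code: the `Trie` class, its `find_all` method, and the sample stop words are all hypothetical names chosen for illustration. Membership testing covers the stop-word check, and `find_all` scans a sentence for every stored keyword, which is one possible building block for key-sentence scoring.

```python
class Trie:
    """A character-level prefix tree storing a set of words (illustrative)."""

    _END = ""  # sentinel key; cannot collide with a single-character key

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node

    def find_all(self, text):
        """Return (start, word) for every stored word that occurs in `text`."""
        hits = []
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                node = node.get(text[end])
                if node is None:
                    break
                if self._END in node:
                    hits.append((start, text[start:end + 1]))
        return hits


# Stop-word filtering on an already tokenized sentence (sample data).
stopwords = Trie(["的", "了", "因为", "因此"])
tokens = ["因为", "天气", "的", "原因"]
content_words = [t for t in tokens if t not in stopwords]

# Keyword occurrences in a sentence, usable as a signal when scoring sentences.
keywords = Trie(["文本", "摘要"])
occurrences = keywords.find_all("文本摘要和摘要")
```

Python's built-in `set` already offers constant-time membership tests on average, so for the plain stop-word check a trie is mainly worth trying when measurements show a benefit; the multi-keyword scan in `find_all` is where the structure is more likely to pay off.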

References and sources

Stop-word list: https://github.com/goto456/stopwords
Tsinghua dataset: http://thuctc.thunlp.org/
Tsinghua news dataset: https://thunlp.oss-cn-qingdao.aliyuncs.com/THUCNews.zip
jieba (may later be replaced by the library from Assignment 1)
re
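
Since `re` is among the libraries used and the to-do list asks for finer-grained sentence splitting, here is one way such a splitter might look. It is a sketch, not the project's implementation: the function name `split_sentences` and the choice of delimiters are assumptions. In particular, treating semicolons as boundaries is a judgment call, and the ASCII period is deliberately left out so that decimal numbers are not split.

```python
import re

# Sentence-ending punctuation (full-width and ASCII), optionally followed by
# closing quotes or brackets that belong to the same sentence.
_SENTENCE_END = re.compile(r"([。！？!?；;…]+[”’」』）)]*)")


def split_sentences(text):
    """Split `text` into sentences, keeping the closing punctuation attached."""
    # With one capturing group, re.split alternates: text, delimiter, text, ...
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts), 2):
        body = parts[i]
        tail = parts[i + 1] if i + 1 < len(parts) else ""
        sentence = (body + tail).strip()
        if sentence:
            sentences.append(sentence)
    return sentences


sample = split_sentences("他说：“走吧。”然后离开了。")
```

Keeping the delimiter attached to each sentence means the pieces can be joined back into the original text, which is convenient when the selected sentences are assembled into a summary.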
