devinz1993 / chinese-poetry-generation
An RNN-based Chinese Poem Generator
License: MIT License
The current implementation cannot handle pinyin ending in 'N' or 'G'.
This is because the _get_vowel function returns "" when the pinyin ends with 'IANG', so _get_rhyme can never return 10.
It might be changed to this:
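```python
def _get_vowel(pinyin):
    # Start from the last letter of the (upper-case) pinyin syllable.
    i = len(pinyin) - 1
    # Step over a trailing nasal so finals such as 'IAN' and 'IANG' are kept.
    if pinyin.endswith('NG'):
        i -= 2
    elif pinyin.endswith('N'):
        i -= 1
    # Walk backwards over the vowel letters of the final.
    while i >= 0 and pinyin[i] in ['A', 'O', 'E', 'I', 'U', 'V']:
        i -= 1
    # Return the final, including any trailing nasal.
    return pinyin[i + 1:]
```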
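To make the effect of the patch concrete, here is a small side-by-side comparison. The "before" function is a hypothetical reconstruction of the behaviour described above (the same backward scan over vowels, without the nasal skip), not code copied from the repository; both function names are made up, and the inputs are assumed to be upper-case pinyin without tone marks.

```python
VOWELS = ('A', 'O', 'E', 'I', 'U', 'V')

def get_vowel_before(pinyin):
    # Hypothetical reconstruction: scan backwards over vowels only.
    # A trailing 'N' or 'G' stops the loop immediately, so '' is returned.
    i = len(pinyin) - 1
    while i >= 0 and pinyin[i] in VOWELS:
        i -= 1
    return pinyin[i + 1:]

def get_vowel_after(pinyin):
    # Same scan, but step over a trailing nasal ('NG' or 'N') first.
    i = len(pinyin) - 1
    if pinyin.endswith('NG'):
        i -= 2
    elif pinyin.endswith('N'):
        i -= 1
    while i >= 0 and pinyin[i] in VOWELS:
        i -= 1
    return pinyin[i + 1:]

if __name__ == '__main__':
    for syllable in ['MA', 'XIAN', 'XIANG', 'ZHONG']:
        print(syllable, repr(get_vowel_before(syllable)), repr(get_vowel_after(syllable)))
```

For 'XIANG' the first version returns '' while the patched one returns 'IANG'; open syllables such as 'MA' give 'A' either way.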
I found that there are two word2vec models:
one in plan.py, which trains a model called kw_model.bin,
and another in word2vec.py, which trains a model called word2vec.npy.
I think both of them take the quatrains as input, so what's the difference between them?
Thanks!
Hi Devin,
Thank you for your amazing work; I've learned a lot from it.
I'm just not sure where the raw data comes from. I think all the poem data might come from Zhang and Lapata (EMNLP 2014). What about the pinyin dictionary?
Hi, I was trying out the code after reading the paper. It seems that one of the data processing steps failed: after running data_utils.py, there was only one output (sxhy_dict.txt) in the data folder, and it gave the following error:
```
(tensorflow2) Vera-MacBook-Pro:Chinese-Poetry-Generation Vera$ python data_utils.py
Building prefix dict from the default dictionary ...
Dumping model to file cache /var/folders/rj/yxfk4xl915l_0drx9d_stdgh0000gn/T/jieba.cache
Loading model cost 1.845 seconds.
Prefix dict has been built succesfully.
Generating the vocabulary ...
Parsing raw/qts_tab.txt ...
Traceback (most recent call last):
  File "data_utils.py", line 127, in <module>
    train_data = get_train_data()
  File "data_utils.py", line 72, in get_train_data
    _gen_train_data()
  File "data_utils.py", line 34, in _gen_train_data
    poems = get_pop_quatrains()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/cnt_words.py", line 40, in get_pop_quatrains
    cnts = get_word_cnts()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/cnt_words.py", line 27, in get_word_cnts
    _gen_word_cnts()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/cnt_words.py", line 14, in _gen_word_cnts
    quatrains = get_quatrains()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/quatrains.py", line 21, in get_quatrains
    _, ch2int = get_vocab()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/vocab.py", line 31, in get_vocab
    _gen_vocab()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/vocab.py", line 15, in _gen_vocab
    corpus = get_all_corpus()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/corpus.py", line 63, in get_all_corpus
    corpus.extend(data)
UnboundLocalError: local variable 'data' referenced before assignment
```
Below is the list of installed packages:
```
# packages in environment at /anaconda/envs/tensorflow2:
#
boto        2.46.1    <pip>
bz2file     0.98      <pip>
funcsigs    1.0.2     <pip>
gensim      2.1.0     <pip>
jieba       0.38      <pip>
mock        2.0.0     <pip>
numpy       1.12.1    <pip>
openssl     1.0.2k    2
pbr         3.0.1     <pip>
pip         9.0.1     py27_1
pip         9.0.1     <pip>
protobuf    3.3.0     <pip>
python      2.7.13    0
readline    6.2       2
requests    2.14.2    <pip>
scipy       0.19.0    <pip>
setuptools  27.2.0    py27_0
six         1.10.0    <pip>
smart-open  1.5.3     <pip>
sqlite      3.13.0    0
tensorflow  1.1.0     <pip>
tk          8.5.18    0
Werkzeug    0.12.2    <pip>
wheel       0.29.0    py27_0
zlib        1.2.8     3
```
Any suggestions?
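corpus.py itself is not quoted in the issue, so the following is only a hypothetical reduction of the pattern the traceback points at. In Python, a name assigned anywhere inside a function is local to the whole function, so if the branch that assigns `data` is skipped (for instance because an expected raw file is missing, which is an assumption here), the later `corpus.extend(data)` raises UnboundLocalError. Both function names below are made up for illustration.

```python
import io
import os

def load_corpus_fragile(path):
    # Hypothetical reduction of the failing pattern: 'data' is bound only
    # inside the branch, but it is used unconditionally afterwards.
    corpus = []
    if os.path.exists(path):
        with io.open(path, encoding='utf-8') as f:
            data = f.read().splitlines()
    corpus.extend(data)  # UnboundLocalError when the branch was skipped
    return corpus

def load_corpus_checked(path):
    # Defensive variant: fail early with a message that names the missing file.
    if not os.path.exists(path):
        raise IOError('raw corpus file not found: %s' % path)
    with io.open(path, encoding='utf-8') as f:
        return f.read().splitlines()
```

Calling load_corpus_fragile on a path that does not exist reproduces the same UnboundLocalError as in the traceback; the checked variant reports which file is missing instead, which is usually easier to act on.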
Hello,
When I run `python data_utils.py`, it gives the following error message:
```
  File "corpus.py", line 63, in get_all_corpus
    corpus.extend(data)
UnboundLocalError: local variable 'data' referenced before assignment
```
My Python version is 2.7.10.
I don't know where the problem is.
Hi Nan,
I have a question that confuses me. In generate.py, line 204, why do you use `random.random() < prob_list[j]/prob_sums[j]` to pick the character? I've tried the argmax method, but it always generates frequently occurring characters like "不" and "一".
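Only the condition itself is quoted, so this is a sketch under an assumption: if prob_sums[j] holds the total probability of candidates j, j+1, and so on (the mass not yet rejected), then accepting candidate j with probability p_j / S_j draws exactly from the model's distribution. The chance of reaching j telescopes to S_j / S_0, so the chance of picking j is p_j / S_0. Argmax, by contrast, always returns the single most likely character, which fits the repeated "不" and "一". The helper names below are hypothetical, not taken from generate.py.

```python
import random

def argmax_choice(probs):
    # Greedy decoding: always the single most probable index.
    return max(range(len(probs)), key=lambda j: probs[j])

def sequential_sample(probs, rng=random):
    # Assumption: suffix_sums[j] plays the role of prob_sums[j], i.e. the
    # probability mass of candidates j, j+1, ... (assumes a non-empty list).
    suffix_sums = [0.0] * len(probs)
    running = 0.0
    for j in range(len(probs) - 1, -1, -1):
        running += probs[j]
        suffix_sums[j] = running
    for j, p in enumerate(probs):
        # Accept candidate j with probability p_j / (mass remaining from j on).
        if p > 0 and rng.random() < p / suffix_sums[j]:
            return j
    return len(probs) - 1  # reached only when every probability is zero
```

With probs = [0.5, 0.3, 0.2], argmax_choice always returns 0, while sequential_sample returns 0, 1 and 2 roughly 50%, 30% and 20% of the time, so less frequent characters still get a chance to appear.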
How can I train it to generate English poems?
I get errors because I don't have the .txt files that the code needs.
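I think the corpus names mean the following:
qts: 全唐诗 (Complete Tang Poems)
qsc: 全宋诗 (Complete Song Poems)
qsc: 全宋词 (Complete Song Ci)
yuan: 元朝诗歌 (Yuan dynasty poetry)
ming: 明代诗歌 (Ming dynasty poetry)
qing: 清代诗歌 (Qing dynasty poetry)
Am I correct?
And what does qtais.txt contain?
Thanks!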
Hi,
This is a very interesting project. However, I can't follow the whole model from the source code alone. Are there any reference documents or papers about the model used in this project?
Hi, author:
This is good work; thank you for sharing! I have two requests:
Can you explain the process of keyword extraction, keyword expansion, and keyword-based generation?
In addition, do you have the code for Zhe Wang et al., "Chinese Poetry Generation with Planning based Neural Network" (2016)?
Looking forward to your reply, thanks!
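The repository's keyword code is not quoted here, so as a generic illustration of keyword extraction only (not a description of plan.py): TextRank ranks the words of a segmented text by running a PageRank-style iteration over a word co-occurrence graph. The function name and parameters are illustrative, and the input is assumed to be already segmented into words, for example with jieba.

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iterations=30):
    # Link every pair of distinct words inside a window of `window` consecutive words.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # PageRank-style update on the undirected graph; each round uses the
    # previous round's scores, since the new dict is built before rebinding.
    scores = dict.fromkeys(neighbors, 1.0)
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)
```

For a poem, `words` would be its segmented words with stop words removed; the first few entries of the result are the keyword candidates. Words that never co-occur with another word get no score in this sketch.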