
chinese-poetry-generation's Issues

_get_rhyme function cannot handle 'N' or 'G'

The current implementation cannot handle syllables ending in 'N' or 'NG'.
Because the _get_vowel function returns "" when the pinyin ends with 'IANG', _get_rhyme can never return 10.

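It could be changed to this:

```python
def _get_vowel(pinyin):
    """Return the final of an upper-case pinyin syllable, nasal coda included."""
    i = len(pinyin) - 1
    # Step over a trailing nasal first: 'NG' (e.g. 'IANG') or 'N' (e.g. 'IAN').
    if pinyin.endswith('NG'):
        i -= 2
    elif pinyin.endswith('N'):
        i -= 1
    # Walk left across the vowel cluster (V stands for ü).
    while i >= 0 and pinyin[i] in 'AOEIUV':
        i -= 1
    return pinyin[i + 1:]
```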
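
To see the problem described above, the sketch below contrasts a vowel-only walk with the proposed nasal-aware version. The vowel-only walk is an assumption about what the current _get_vowel does, inferred from the report that it returns "" for 'IANG'; the names vowel_walk_only and nasal_aware are made up for this comparison.

```python
VOWELS = 'AOEIUV'  # V stands for ü


def vowel_walk_only(pinyin):
    """Assumed current behaviour: walk left over vowels from the last letter."""
    i = len(pinyin) - 1
    while i >= 0 and pinyin[i] in VOWELS:
        i -= 1
    return pinyin[i + 1:]


def nasal_aware(pinyin):
    """Proposed behaviour: skip a trailing 'NG' or 'N' before the vowel walk."""
    i = len(pinyin) - 1
    if pinyin.endswith('NG'):
        i -= 2
    elif pinyin.endswith('N'):
        i -= 1
    while i >= 0 and pinyin[i] in VOWELS:
        i -= 1
    return pinyin[i + 1:]


for syllable in ['HUA', 'XIAN', 'JIANG', 'ZHUANG']:
    # Works the same under Python 2.7 and 3.
    print('%s %r %r' % (syllable, vowel_walk_only(syllable), nasal_aware(syllable)))
```

The vowel-only walk returns '' for every syllable with a nasal coda (XIAN, JIANG, ZHUANG) while both versions agree on HUA ('UA'), which is consistent with the report that _get_rhyme never returns 10 for such syllables.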

What's the difference between the two word2vec models?

I found that there are two word2vec models:
one in plan.py, which trains a model called kw_model.bin;
another in word2vec.py, which trains a model called word2vec.npy.
I think both of them take the quatrains as input,
so what's the difference between them?
Thanks!

Where does the source data come from?

Hi Devin,
Thank you for your amazing work; I've learned a lot from it.
I'm just not sure where the raw data comes from. I think all of the poem data might come from Zhang and Lapata (EMNLP 2014). What about the pinyin dictionary?

data_utils.py failed with error

Hi, I was trying out the code after reading the paper. It seems that some of the data processing steps failed. After running data_utils.py, only one output (sxhy_dict.txt) appeared in the data folder, and the script gave the following error:

(tensorflow2) Vera-MacBook-Pro:Chinese-Poetry-Generation Vera$ python data_utils.py

Building prefix dict from the default dictionary ...
Dumping model to file cache /var/folders/rj/yxfk4xl915l_0drx9d_stdgh0000gn/T/jieba.cache
Loading model cost 1.845 seconds.
Prefix dict has been built succesfully.
Generating the vocabulary ...
Parsing raw/qts_tab.txt ...
Traceback (most recent call last):
  File "data_utils.py", line 127, in <module>
    train_data = get_train_data()
  File "data_utils.py", line 72, in get_train_data
    _gen_train_data()
  File "data_utils.py", line 34, in _gen_train_data
    poems = get_pop_quatrains()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/cnt_words.py", line 40, in get_pop_quatrains
    cnts = get_word_cnts()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/cnt_words.py", line 27, in get_word_cnts
    _gen_word_cnts()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/cnt_words.py", line 14, in _gen_word_cnts
    quatrains = get_quatrains()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/quatrains.py", line 21, in get_quatrains
    _, ch2int = get_vocab()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/vocab.py", line 31, in get_vocab
    _gen_vocab()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/vocab.py", line 15, in _gen_vocab
    corpus = get_all_corpus()
  File "/Users/Vera/Documents/projects/ura/Chinese-Poetry-Generation/corpus.py", line 63, in get_all_corpus
    corpus.extend(data)
UnboundLocalError: local variable 'data' referenced before assignment

Below is the list of installed packages:

# packages in environment at /anaconda/envs/tensorflow2:
#
boto                      2.46.1                    <pip>
bz2file                   0.98                      <pip>
funcsigs                  1.0.2                     <pip>
gensim                    2.1.0                     <pip>
jieba                     0.38                      <pip>
mock                      2.0.0                     <pip>
numpy                     1.12.1                    <pip>
openssl                   1.0.2k                        2  
pbr                       3.0.1                     <pip>
pip                       9.0.1                    py27_1  
pip                       9.0.1                     <pip>
protobuf                  3.3.0                     <pip>
python                    2.7.13                        0  
readline                  6.2                           2  
requests                  2.14.2                    <pip>
scipy                     0.19.0                    <pip>
setuptools                27.2.0                   py27_0  
six                       1.10.0                    <pip>
smart-open                1.5.3                     <pip>
sqlite                    3.13.0                        0  
tensorflow                1.1.0                     <pip>
tk                        8.5.18                        0  
Werkzeug                  0.12.2                    <pip>
wheel                     0.29.0                   py27_0  
zlib                      1.2.8                         3

Any suggestion?

local variable 'data' referenced before assignment

Hello,
When I run "python data_utils.py", it gives the following error message:

File "corpus.py", line 63, in get_all_corpus
corpus.extend(data)
UnboundLocalError: local variable 'data' referenced before assignment

My Python version is 2.7.10.
I don't know where the problem is.
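
Python raises UnboundLocalError when a function reads a local variable on a code path where nothing has assigned it yet. Since both reports fail at corpus.extend(data) in get_all_corpus, one plausible but unconfirmed cause is that data is only assigned inside a conditional (for example, one that depends on an expected raw file being present) and that branch did not run. The sketch below is a hypothetical reduction of that pattern, not the contents of corpus.py; the function names and the file-existence check are assumptions made for illustration.

```python
import os


def get_all_corpus_buggy(paths):
    """Hypothetical reduction of the failing pattern (not the repository's code)."""
    corpus = []
    for path in paths:
        if os.path.exists(path):
            data = [path]  # placeholder for the poems parsed from `path`
        # If the branch above is skipped on the first pass, `data` was never
        # assigned and this line raises UnboundLocalError. On later passes it
        # would silently reuse the previous file's data instead.
        corpus.extend(data)
    return corpus


def get_all_corpus_checked(paths):
    """Same loop, but a missing input fails with a message naming the file."""
    corpus = []
    for path in paths:
        if not os.path.exists(path):
            raise IOError('raw corpus file not found: %s' % path)
        data = [path]  # placeholder for the poems parsed from `path`
        corpus.extend(data)
    return corpus
```

If this is the cause, checking that every raw corpus file the script expects is actually in place would be a reasonable first step; the checked variant replaces the confusing traceback with an error that names the missing file.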

why do you sample using a random function?

Hi Nan,
I have a question that confuses me. On generate.py line 204, why do you use random.random() < prob_list[j]/prob_sums[j] to choose the character? I've tried the argmax method, but it always generates frequently occurring characters like "不" (not) and "一" (one).
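
The expression in the question looks like a left-to-right way of drawing one character at random in proportion to its probability. Assuming prob_sums[j] holds the sum of prob_list[j:] (the current entry plus everything after it), accepting index j with probability p_j / S_j, given that no earlier index was accepted, yields index j with overall probability p_j / S_0: since 1 - p_k / S_k = S_{k+1} / S_k, the product of the earlier rejection probabilities telescopes to S_j / S_0. The sketch below shows that technique next to argmax decoding; sample_sequential and greedy are illustrative names, not functions from generate.py.

```python
import random


def sample_sequential(probs, rng=random):
    """Draw index j with probability probs[j] / sum(probs).

    Scans left to right and accepts j with probability
    probs[j] / (probs[j] + probs[j+1] + ...), the chance of j given
    that no earlier index was accepted.
    """
    # suffix[j] = probs[j] + probs[j+1] + ... + probs[-1]
    suffix = [0.0] * len(probs)
    running = 0.0
    for j in range(len(probs) - 1, -1, -1):
        running += probs[j]
        suffix[j] = running
    for j in range(len(probs)):
        # At the last nonzero entry the ratio is exactly 1, so the draw succeeds.
        if suffix[j] > 0 and rng.random() < probs[j] / suffix[j]:
            return j
    raise ValueError('at least one probability must be positive')


def greedy(probs):
    """Argmax decoding: always return the most probable index."""
    return max(range(len(probs)), key=lambda j: probs[j])
```

Sampling trades some fluency for diversity. Argmax always returns the single most probable character, and in a character-level model that is often a high-frequency function character, which matches the "不" and "一" behaviour described above.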

Why doesn't the generated poetry contain the keywords?

Why doesn't the generated poem contain the keywords?
Input Text: 春天桃花开了 ("In spring, the peach blossoms have bloomed")
Keywords: 桃花开 相宜 春天 江边 (peach blossoms bloom, fitting, spring, riverside)
Poem Generated:
梅花香满绿荫时,石木无春午六滨。
江水东归灯下望,江南车径与谁槟。
In the paper "Chinese Poetry Generation with Planning based Neural Network", the generated poems appear to contain the keywords.

what's the source of qtais.txt?

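I understand the corpus names as follows:
qts: 全唐诗 (Complete Tang Poems)
qss: 全宋诗 (Complete Song Poems)
qsc: 全宋词 (Complete Song Ci lyrics)
yuan: 元朝诗歌 (Yuan dynasty poetry)
ming: 明代诗歌 (Ming dynasty poetry)
qing: 清代诗歌 (Qing dynasty poetry)
Am I correct?
And what does qtais.txt refer to?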

thanks!

Thank you for your contribution; I have two requests

Hi author,
This is great work, thank you for sharing! I have two requests:
Can you explain the process of keyword extraction, keyword expansion, and generation based on keywords?
In addition, do you have the code for Zhe Wang et al., Chinese Poetry Generation with Planning based Neural Network, 2016?
Looking forward to your reply, thanks!
