eKo(nomic)NLPy

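eKoNLPy is a Python library for Korean natural language processing (NLP), aimed at economic analysis.

It builds on KoNLPy's Mecab tagger and adds a post-processing step that classifies economic terminology, financial institution names, company names, and similar terms as single nouns.

It includes a sentiment analysis feature that judges the tone (hawkish/dovish) of monetary policy.

It includes an economic uncertainty analysis feature that judges the degree of economic uncertainty (uncertain/stable).

It includes a topic analysis feature that classifies the topics of economic documents.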

Usage

Part of speech tagging

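As in KoNLPy, call Mecab.pos(phrase). The input is first processed with KoNLPy's Mecab morphological analyzer. Then, if a run of consecutive tokens matches a registered template and the combination is found in the user dictionary, it is tagged as a single compound noun.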

from ekonlpy.tag import Mecab
mecab = Mecab()
mecab.pos('금통위는 따라서 물가안정과 병행, 경기상황에 유의하는 금리정책을 펼쳐나가기로 했다고 밝혔다.')

> [('금통위', 'NNG'), ('는', 'JX'), ('따라서', 'MAJ'), ('물가', 'NNG'), ('안정', 'NNG'), ('과', 'JC'), ('병행', 'NNG'), (',', 'SC'), ('경기', 'NNG'), ('상황', 'NNG'), ('에', 'JKB'), ('유의', 'NNG'), ('하', 'XSV'), ('는', 'ETM'), ('금리정책', 'NNG'), ('을', 'JKO'), ('펼쳐', 'VV+EC'), ('나가', 'VX'), ('기', 'ETN'), ('로', 'JKB'), ('했', 'VV+EP'), ('다고', 'EC'), ('밝혔', 'VV+EP'), ('다', 'EF'), ('.', 'SF')]
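The post-processing step described above can be sketched in plain Python. This is a minimal illustration, not eKoNLPy's actual implementation: the dictionary entries, the tag template, and the function name merge_compound_nouns are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)

# Hypothetical user dictionary of compound nouns (illustrative entries only).
USER_DICTIONARY: Set[str] = {"금리정책", "통화정책"}

# Hypothetical templates: POS tag sequences that may form a compound noun.
TEMPLATES: Set[Tuple[str, ...]] = {("NNG", "NNG")}


def merge_compound_nouns(tokens: List[Token]) -> List[Token]:
    """Merge consecutive tokens whose tags match a template and whose
    joined surface form is registered in the user dictionary."""
    max_len = max(len(template) for template in TEMPLATES)
    merged: List[Token] = []
    i = 0
    while i < len(tokens):
        for size in range(max_len, 1, -1):  # try the longest template first
            window = tokens[i:i + size]
            if len(window) < size:
                continue
            tags = tuple(tag for _, tag in window)
            word = "".join(form for form, _ in window)
            if tags in TEMPLATES and word in USER_DICTIONARY:
                merged.append((word, "NNG"))  # emit one compound noun
                i += size
                break
        else:
            merged.append(tokens[i])  # no match: keep the token unchanged
            i += 1
    return merged


raw = [("금리", "NNG"), ("정책", "NNG"), ("을", "JKO")]
print(merge_compound_nouns(raw))
```

With these assumed entries, '금리' and '정책' are merged into '금리정책', while a pair such as '물가' and '안정' stays split because the joined form is not in the dictionary.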

Lemmatisation and synonyms

To improve the accuracy of sentiment analysis, eKoNLPy provides synonym handling and lemmatization.
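As a rough illustration of why synonym handling helps, the sketch below maps variant forms to one canonical form before any lexicon lookup, so that both spellings hit the same entry. The synonym table and the normalize function are hypothetical and do not reflect eKoNLPy's internal data or API; lemmatization is not covered here.

```python
from typing import Dict, List

# Hypothetical synonym table: variant form -> canonical form.
SYNONYMS: Dict[str, str] = {
    "금통위": "금융통화위원회",
    "한은": "한국은행",
}


def normalize(tokens: List[str], synonyms: Dict[str, str] = SYNONYMS) -> List[str]:
    """Replace each token with its canonical form when one is registered."""
    return [synonyms.get(token, token) for token in tokens]


print(normalize(["한은", "금통위", "금리"]))
```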

Add words to dictionary

The Mecab class in ekonlpy.tag can add words to the dictionary through add_dictionary, which accepts a str or a list of str.

from ekonlpy.tag import Mecab
mecab = Mecab()
mecab.add_dictionary('금통위', 'NNG')

Sentiment analysis

To use the Korean Monetary Policy dictionary, create an instance of the MPKO class in ekonlpy.sentiment:

from ekonlpy.sentiment import MPKO
mpko = MPKO(kind=1)
tokens = mpko.tokenize(text)
score = mpko.get_score(tokens)

kind parameter for the MPKO class: selects a lexicon file

0: a lexicon file generated using a naive Bayes classifier with 5-gram tokens as features and
    changes in call rates as positive/negative labels.

1: a lexicon file generated by the polarity induction and seed propagation method with 5-gram tokens.
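For readers unfamiliar with lexicon-based scoring, here is a minimal sketch of the general idea: count the tokens the lexicon marks as positive (hawkish) or negative (dovish) and combine the counts into one polarity value. The lexicon entries, the polarity_score function, and the formula (pos - neg) / (pos + neg) are assumptions for illustration; the source does not specify what MPKO.get_score returns.

```python
from typing import Dict, List

# Hypothetical lexicon: token -> polarity (+1 hawkish, -1 dovish).
LEXICON: Dict[str, int] = {
    "인상": 1,   # rate hike
    "긴축": 1,   # tightening
    "인하": -1,  # rate cut
    "완화": -1,  # easing
}


def polarity_score(tokens: List[str], lexicon: Dict[str, int] = LEXICON) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 if no token is in the lexicon."""
    pos = sum(1 for token in tokens if lexicon.get(token, 0) > 0)
    neg = sum(1 for token in tokens if lexicon.get(token, 0) < 0)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Two hawkish tokens and one dovish token give a mildly hawkish score.
print(polarity_score(["금리", "인상", "긴축", "완화"]))
```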

To analyze monetary policy sentiment with a classifier, use the MPCK class in ekonlpy.sentiment.

from ekonlpy.sentiment import MPCK
mpck = MPCK()
tokens = mpck.tokenize(text)
ngrams = mpck.ngramize(tokens)
score = mpck.classify(tokens + ngrams, intensity_cutoff=1.5)

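The intensity_cutoff parameter sets the intensity threshold below which sentences with low classification confidence are classified as neutral. (default: 1.3)

KSA is a Korean sentiment analyzer for general Korean texts. It uses Kkma, the morphological analyzer developed by the IDS lab at Seoul National University, together with the sentiment dictionary from the same lab. (Reference: http://kkma.snu.ac.kr/)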
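One way to picture such a cutoff is sketched below. It assumes, purely for illustration, that "intensity" is the ratio between the larger and the smaller class probability; the source does not define how MPCK computes intensity, and classify_with_cutoff is a hypothetical helper, not part of eKoNLPy.

```python
def classify_with_cutoff(p_hawkish: float, p_dovish: float,
                         intensity_cutoff: float = 1.3) -> str:
    """Return 'neutral' unless one class is at least `intensity_cutoff`
    times as likely as the other (assumed definition of intensity)."""
    if p_hawkish <= 0 or p_dovish <= 0:
        raise ValueError("probabilities must be positive")
    intensity = max(p_hawkish, p_dovish) / min(p_hawkish, p_dovish)
    if intensity < intensity_cutoff:
        return "neutral"  # the classifier is not confident enough
    return "hawkish" if p_hawkish > p_dovish else "dovish"


print(classify_with_cutoff(0.55, 0.45))  # ratio of about 1.22, below 1.3
print(classify_with_cutoff(0.70, 0.30))  # ratio of about 2.33, above 1.3
```

Under this assumed definition, raising the cutoff sends more borderline sentences to neutral, and lowering it keeps more of them as hawkish or dovish.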

from ekonlpy.sentiment import KSA
ksa = KSA()
tokens = ksa.tokenize(text)
score = ksa.get_score(tokens)

Similarly, to use the Harvard IV-4 dictionary for general English sentiment analysis:

from ekonlpy.sentiment import HIV4
hiv = HIV4()
tokens = hiv.tokenize(text)
score = hiv.get_score(tokens)

Similarly, to use the Loughran and McDonald dictionary for financial domain sentiment analysis:

from ekonlpy.sentiment import LM
lm = LM()
tokens = lm.tokenize(text)
score = lm.get_score(tokens)

Economic uncertainty analysis

To use the Korean Economic Uncertainty dictionary, create an instance of the EUKO class in ekonlpy.sentiment:

from ekonlpy.sentiment import EUKO
euko = EUKO(kind=1)
tokens = euko.tokenize(text)
score = euko.get_score(tokens)

kind parameter for the EUKO class: selects a lexicon file

0: a lexicon file generated using a naive Bayes classifier with 5-gram tokens as features and levels of VKOSPI as positive/negative labels.

1: a lexicon file generated by the seed propagation method with 5-gram tokens.

Topic analysis

To analyze monetary policy topics, create an instance of the MPTK class in ekonlpy.topic:

from ekonlpy.topic import MPTK
mptk = MPTK()
tokens = mptk.nouns(text)
bow = mptk.doc2bow(tokens)
dtm = mptk.get_document_topic(bow)

Parameters for the get_document_topic function

include_names: If True, return a list of tuples that include topic names,
                    e.g. (topic_id, topic_name, topic_weight).
               If False (default), return a list of tuples without topic names,
                    e.g. (topic_id, topic_weight).

min_weight: If set, return only topics whose weight is greater than min_weight.
            Otherwise, return all available topics.
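The effect of these two parameters can be mimicked with a small stand-alone function. This is only a sketch of the documented behaviour: filter_topics and the topic names are hypothetical, and the real method operates on the MPTK model rather than on a plain list.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topic names, for illustration only.
TOPIC_NAMES: Dict[int, str] = {0: "Inflation", 1: "Growth", 2: "Financial markets"}


def filter_topics(doc_topics: List[Tuple[int, float]],
                  include_names: bool = False,
                  min_weight: Optional[float] = None) -> List[tuple]:
    """Drop topics at or below min_weight and optionally attach topic names."""
    result: List[tuple] = []
    for topic_id, weight in doc_topics:
        if min_weight is not None and weight <= min_weight:
            continue  # keep only weights strictly greater than min_weight
        if include_names:
            result.append((topic_id, TOPIC_NAMES.get(topic_id, ""), weight))
        else:
            result.append((topic_id, weight))
    return result


topics = [(0, 0.6), (1, 0.05), (2, 0.35)]
print(filter_topics(topics))
print(filter_topics(topics, include_names=True, min_weight=0.1))
```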

Install

$ git clone https://github.com/entelecheia/eKoNLPy.git

$ cd eKoNLPy

$ pip install .

$ pip install . --upgrade (for upgrade)

Requires

  • KoNLPy >= 0.4.4
  • nltk >= 2.0
  • gensim >= 3.1.0
  • scipy >= 0.19.1
  • numpy >= 1.13

License

eKoNLPy is open source software, released under the GPL v3 license.

  • Lee, Young Joon, eKoNLPy: A Korean NLP Python Library for Economic Analysis, 2018. https://github.com/entelecheia/eKoNLPy.

  • Lee, Young Joon, Soohyon Kim, and Ki Young Park. "Deciphering Monetary Policy Board Minutes with Text Mining: The Case of South Korea." Korean Economic Review 35 (2019): 471-511.

BibTeX entry:

@misc{lee2018ekonlpy,
    author= {Lee, Young Joon},
    year  = {2018},
    title = {{eKoNLPy: A Korean NLP Python Library for Economic Analysis}},
    note  = {\url{https://github.com/entelecheia/eKoNLPy}}
}
