
kint's Introduction

KINT(Korean Internet New Terms) 🇰🇷

Korean Internet New Terms API

🏫 Korea University Big Data Campus Project

📅 Project schedule: June 22, 2020 to September 18, 2020

😃 Project team members: 김윤기, 서민영, 유주연, 조정현

KINT provides, as one integrated Open API, a model that automatically detects and classifies new terms in documents or sentences, a model that runs sentiment analysis on the detected terms, and a model that outputs example sentences. It also provides the resulting new-term DB and a web service built on it. Through this, KINT aims to contribute to natural language processing, linguistics, and the social and public sectors, which have difficulty interpreting new words, by providing new-word information through an API.

This project detects newly coined words and automatically updates the dictionary with them.

To this end, we build two models: the first detects newly coined words, and the second runs sentiment analysis on those words and extracts example sentences.

Progress

The following is the configuration diagram of the project's automation system. system

Model 1. ์‹ ์–ด ๊ฐ์ง€ ๋ฐ ์ž๋™ ๋ถ„๋ฅ˜

model1

  1. ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ๋‹จ๊ณ„

    • ๋ถ„์•ผ๋ณ„ ์ปค๋ฎค๋‹ˆํ‹ฐ์—์„œ ์›น ํฌ๋กค๋Ÿฌ๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•ฉ๋‹ˆ๋‹ค.
    • ์ด ํฌ๋กค๋Ÿฌ๋Š” ์ œ๋ชฉ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ ๋ฐ ๋‚ ์งœ / ์‹œ๊ฐ„ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•ฉ๋‹ˆ๋‹ค.
    • ์œ ๋จธ ํฌ๋กค๋Ÿฌ
      • natepan : 10๋Œ€ ์ด์•ผ๊ธฐ, 20๋Œ€ ์ด์•ผ๊ธฐ, ํ†ก์ปค๋“ค์˜ ์„ ํƒ ๋ช…์˜ˆ์˜ ์ „๋‹น (์ผ๋ณ„)
      • ์˜ค๋Š˜์˜ ์œ ๋จธ : ๋ฒ ์˜ค๋ฒ , ๋ฒ ์ŠคํŠธ ๊ฒŒ์‹œ๋ฌผ
      • ilbe : ์ผ๋ฒ -์ผ๊ฐ„ ๋ฒ ์ŠคํŠธ
      • dcinside : ์•ผ๊ตฌ ๊ฐค๋Ÿฌ๋ฆฌ
      • ppomppu : ์ž์œ  ๊ฒŒ์‹œํŒ
    • ์ •์น˜ ํฌ๋กค๋Ÿฌ
      • ์ผ๋ฒ  : ์ •์น˜ / ์‹œ์‚ฌ ๊ฒŒ์‹œํŒ
      • ๋ณด๋ฐฐ๋“œ๋ฆผ : ์ •์น˜ ์ปค๋ฎค๋‹ˆํ‹ฐ
    • ์—”ํ„ฐํ…Œ์ธ๋จผํŠธ ํฌ๋กค๋Ÿฌ
      • dcinside : ์ธํ„ฐ๋„ท๋ฐฉ์†ก ๊ฐค๋Ÿฌ๋ฆฌ, ๋‚จ์ž/์—ฌ์ž ์—ฐ์˜ˆ์ธ ๊ฐค๋Ÿฌ๋ฆฌ
      • instize : ์ด์Šˆ
    • ๋‰ด์Šค ํฌ๋กค๋Ÿฌ
      • ํ•œ๊ฒจ๋ ˆ
      • ๊ฒฝํ–ฅ์‹ ๋ฌธ
      • ๋งค์ผ๊ฒฝ์ œ
      • ์กฐ์„ ์ผ๋ณด
      • ๋””์ง€ํ„ธํƒ€์ž„์Šค
      • ๋™์•„์ผ๋ณด
      • SBS๋‰ด์Šค
      • ํ•œ๊ตญ๊ฒฝ์ œ
  2. ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„

    • soynlp ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์šฉ์–ด๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
    • ๋‚˜์˜จ ์šฉ์–ด๋ฅผ ๊ตญ๋ฆฝ๊ตญ์–ด์› ์ „์ž์‚ฌ์ „์— ๊ฒ€์ƒ‰ํ•ฉ๋‹ˆ๋‹ค.
    • ๊ตญ๋ฆฝ๊ตญ์–ด์› ์ „์ž์‚ฌ์ „์— ์—†๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ ๋‹จ๊ณ„์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
    • ๊ตญ๋ฆฝ๊ตญ์–ด์› ์ „์ž์‚ฌ์ „์— ์žˆ๋Š” ๊ฒฝ์šฐ ์‹ ์–ด๊ฐ€ ์•„๋‹Œ ๊ฒƒ์œผ๋กœ ํŒ๋‹จํ•ฉ๋‹ˆ๋‹ค.
  3. ์‹ ์–ด ๊ฐ์ง€ ๋‹จ๊ณ„

    • ์ข…์†๋ณ€์ˆ˜: ์‹ ์–ด ์—ฌ๋ถ€

    • ๋…๋ฆฝ๋ณ€์ˆ˜

      Independent Variable

    • ์œ„ ์ข…์†๋ณ€์ˆ˜, ๋…๋ฆฝ๋ณ€์ˆ˜๋ฅผ ํ•™์Šต๋œ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์— ๋„ฃ์–ด ์‹ ์–ด ์—ฌ๋ถ€ ํŒŒ์•…ํ•ฉ๋‹ˆ๋‹ค.

  4. 1,2,3 ๋‹จ๊ณ„๋ฅผ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.

Model 2. New term analysis model

model2

  1. Text data loading step

    • The text data containing the new terms detected by Model 1 is loaded.
  2. Text data preprocessing step

    • The text data is converted into vector representations.
  3. Sentiment analysis and example sentence output step

    • The new term sentiment analysis API (new term sentiment analysis model) analyzes the sentiment of new terms based on Pointwise Mutual Information (PMI) and extracts related keywords.
    • The new term example sentence extraction API (new term example sentence extraction model) extracts example sentences with a bidirectional GRU and uses N-grams to apply spacing to the extracted sentences.

KINT(Korean Internet New Terms) 🇺🇸

Korean Internet New Terms API

Korea University Big Data Campus Project

This service provides the top five newly detected internet terms, together with sentiment analysis results and example sentences for them, on the website.

Through this, KINT aims to contribute to natural language processing, linguistics, and the social and public sectors, which have difficulty interpreting new internet terms, by providing information on those terms through an API.

This project automatically detects new internet terms and adds them to our dictionary.

For this, the first model detects new internet terms, and the second model produces sentiment analysis results and example sentences for them.

Progress

system

Model 1. New internet terms detection and automatic classification model

model1

  1. Data collecting Step

    • Web crawlers collect data at each sector community
    • These crawlers collect head data, DateTime data
    • Humor Crawler
      • natepan : 10 ๋Œ€ ์ด์•ผ๊ธฐ, 20 ๋Œ€ ์ด์•ผ๊ธฐ, ํ†ก์ปค๋“ค์˜ ์„ ํƒ ๋ช…์˜ˆ์˜ ์ „๋‹น (์ผ๋ณ„)
      • ์˜ค๋Š˜์˜ ์œ ๋จธ : ๋ฒ ์˜ค๋ฒ , ๋ฒ ์ŠคํŠธ ๊ฒŒ์‹œ๋ฌผ
      • ilbe : ์ผ๋ฒ -์ผ๊ฐ„ ๋ฒ ์ŠคํŠธ
      • dcinside : ์•ผ๊ตฌ ๊ฐค๋Ÿฌ๋ฆฌ
      • ppomppu : ์ž์œ  ๊ฒŒ์‹œํŒ
    • Politic Crawler
      • ์ผ๋ฒ  : ์ •์น˜ / ์‹œ์‚ฌ๊ฒŒ์‹œํŒ
      • ๋ณด๋ฐฐ๋“œ๋ฆผ : ์ •์น˜์ปค๋ฎค๋‹ˆํ‹ฐ
    • Entertainment Crawler
      • dcinside : ์ธํ„ฐ๋„ท๋ฐฉ์†ก ๊ฐค๋Ÿฌ๋ฆฌ, ๋‚จ์ž/์—ฌ์ž์—ฐ์˜ˆ์ธ ๊ฐค๋Ÿฌ๋ฆฌ
      • instize : ์ด์Šˆ
    • News Crawler
      • ํ•œ๊ฒจ๋ ˆ
      • ๊ฒฝํ–ฅ์‹ ๋ฌธ
      • ๋งค์ผ๊ฒฝ์ œ
      • ์กฐ์„ ์ผ๋ณด
      • ๋””์ง€ํ„ธํƒ€์ž„์Šค
      • ๋™์•„์ผ๋ณด
      • SBS๋‰ด์Šค
      • ํ•œ๊ตญ๊ฒฝ์ œ
  2. Text data preprocessing step

    • We use the soynlp library to extract terms.
    • The extracted terms are looked up in the electronic dictionary of the National Institute of Korean Language.
    • A term that is not in the dictionary is passed on to the next step.
    • A term that is in the dictionary is judged not to be a new internet term.
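
The preprocessing step above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the scoring function is a simplified forward-cohesion score in the spirit of what soynlp's word extraction offers, not soynlp's own implementation; `STANDARD_DICTIONARY` is a hypothetical in-memory set standing in for a lookup against the National Institute of Korean Language dictionary; and the sample titles and thresholds are made up.

```python
from collections import Counter


def count_prefixes(tokens, max_len=6):
    """Count every leading substring (up to max_len chars) of each token."""
    counts = Counter()
    for token in tokens:
        for end in range(1, min(len(token), max_len) + 1):
            counts[token[:end]] += 1
    return counts


def cohesion(word, prefix_counts):
    """Forward cohesion: (count(word) / count(first char)) ** (1 / (len - 1))."""
    if len(word) < 2 or prefix_counts[word[0]] == 0:
        return 0.0
    ratio = prefix_counts[word] / prefix_counts[word[0]]
    return ratio ** (1 / (len(word) - 1))


def extract_candidates(titles, min_count=2, min_cohesion=0.5):
    """Return substrings that occur often enough and hold together well."""
    tokens = [tok for title in titles for tok in title.split()]
    prefix_counts = count_prefixes(tokens)
    return {
        prefix
        for prefix, count in prefix_counts.items()
        if len(prefix) >= 2
        and count >= min_count
        and cohesion(prefix, prefix_counts) >= min_cohesion
    }


def filter_unknown(candidates, dictionary):
    """Keep only candidates that are absent from the reference dictionary."""
    return {term for term in candidates if term not in dictionary}


# Hypothetical stand-in for the National Institute of Korean Language dictionary.
STANDARD_DICTIONARY = {"오늘", "날씨"}

# Made-up sample titles, as a crawler might have collected them.
titles = ["오늘 날씨 킹받네", "킹받네 진짜", "오늘 킹받네 ㅋㅋ"]
candidates = extract_candidates(titles)
unknown_terms = filter_unknown(candidates, STANDARD_DICTIONARY)
```

Only `unknown_terms` would move on to the detection step. Note that this naive scorer also keeps partial strings such as a prefix of a longer word, which is one reason a dedicated library is used in practice.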
  3. New internet term detection step

    • Dependent variable: whether the term is a new internet term

    • Independent variables

      Independent Variable

    • The dependent and independent variables above are fed to the trained classification model to determine whether a term is a new internet term.
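
The README does not name the classifier, and the independent variables are only shown in an image, so the following is a minimal sketch under assumptions rather than the project's model. It uses a plain logistic regression, two hypothetical features per term (a relative frequency score and a cohesion score, both scaled to 0..1), and made-up training data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias


def predict_is_new_term(x, weights, bias, threshold=0.5):
    """Return True when the predicted probability reaches the threshold."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return score >= threshold


# Hypothetical features: [relative frequency score, cohesion score].
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
# Dependent variable: 1 = new internet term, 0 = not a new term.
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_x, train_y)
likely_new = predict_is_new_term([0.85, 0.85], weights, bias)
likely_old = predict_is_new_term([0.15, 0.15], weights, bias)
```

The labels are needed only for training; at prediction time the model receives just the feature values of a candidate term.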

  4. Steps 1, 2, and 3 are applied repeatedly.

Model 2. New Internet terms analysis Model

model2

  1. Text data loading step

    • We load the text data containing the new internet terms detected in Model 1.
  2. Text data preprocessing step

    • The text data is converted into vector representations.
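
The README does not say which vectorization is used. As one possible reading, the sketch below builds simple bag-of-words count vectors over whitespace tokens; the function names and sample sentences are illustrative assumptions.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map each distinct whitespace token to a column index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(sentence, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(sentence.split())
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


# Made-up sample sentences containing a detected term.
sentences = ["킹받네 진짜 진짜", "오늘 날씨 좋다"]
vocab = build_vocabulary(sentences)
vectors = [vectorize(s, vocab) for s in sentences]
```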
  3. Sentiment analysis and example sentence extraction step

    • The New Internet Terms Sentiment Analysis API (New Internet Terms Sentiment Analysis Model) analyzes the sentiment of new internet terms based on Pointwise Mutual Information (PMI) and extracts related keywords.

    • The New Internet Terms Example Sentence Extraction API (New Internet Terms Example Sentence Extraction Model) extracts example sentences with a bidirectional GRU and uses N-grams to apply spacing to the extracted sentences.
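
To illustrate the PMI-based sentiment step, here is a minimal sketch. PMI is defined as log2(p(x, y) / (p(x) p(y))), estimated here from sentence-level co-occurrence. Scoring polarity as PMI with positive seed words minus PMI with negative seed words is an assumption borrowed from the common seed-word approach; the README does not state how KINT turns PMI into a sentiment label. The seed lists and sentences are made up.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count sentences containing each word and each unordered word pair."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts


def pmi(x, y, word_counts, pair_counts, n_sentences):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))); -inf if never seen together."""
    joint = pair_counts[frozenset((x, y))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_sentences / (word_counts[x] * word_counts[y]))


def polarity(term, positive_seeds, negative_seeds, sentences):
    """Sum of finite PMI with positive seeds minus the sum with negative seeds."""
    word_counts, pair_counts = cooccurrence_counts(sentences)
    n = len(sentences)

    def total(seeds):
        scores = (pmi(term, s, word_counts, pair_counts, n) for s in seeds)
        # Simplification: seeds that never co-occur with the term contribute 0.
        return sum(score for score in scores if score != float("-inf"))

    return total(positive_seeds) - total(negative_seeds)


# Made-up corpus; "newterm" stands in for a detected new internet term.
corpus = ["newterm great fun", "newterm great", "awful boring day", "newterm fun day"]
score = polarity("newterm", ["great", "fun"], ["awful", "boring"], corpus)
```

A positive `score` suggests the term appears mostly alongside positive seed words. Ranking all words by their PMI with the term would give a simple list of related keywords.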

-----------------------------------------------------

API

  1. soynlp
  2. Electronic dictionary of the National Institute of Korean Language (국립국어원 전자사전)

Contribute

Contributors

kimyungi, smy999, yujuyeon0511, wtt5857
