Sayoeti Core

Sayoeti is an AI-powered public relations (PR) assistant for the Komisi Pemberantasan Korupsi (KPK), Indonesia's Corruption Eradication Commission. He helps the KPK by monitoring Indonesian mass media and providing sentiment analysis of their coverage.

sayoeti-core is one of the components Sayoeti is built on. It exists to answer a single question:

Human: Given a document, is this document a news story about corruption in Indonesia? (Yes/No)
Sayoeti: Yes, it is.

Learn more at https://sayoeti.xyz (in Indonesian).

Requirements

sayoeti-core uses supervised learning, so Sayoeti first needs to learn from a corpus of corruption news. An example corpus built from Kompas.com and Liputan6.com articles can be found here.
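The Yes/No answer ultimately comes from a trained classifier. The run log in the Example section prints rho, nSV and nBSV, which suggests a support vector machine, but the model type, kernel and feature layout are not documented in this README. The sketch below assumes a linear model: a document vector x (for instance tf-idf weights over the index vocabulary) is labelled corruption news when w . x - rho > 0. The names decision_value() and is_corruption_news() are hypothetical and not part of sayoeti-core.

```c
#include <stddef.h>

/* Hypothetical sketch, assuming a linear SVM-style model: the decision
 * value is the dot product of the weight vector w and the document
 * vector x, minus the bias rho. Sayoeti's real model, kernel and label
 * convention are not documented in this README. */
double decision_value(const double *w, const double *x, size_t dim, double rho)
{
    double sum = 0.0;
    for (size_t i = 0; i < dim; i++)
        sum += w[i] * x[i];
    return sum - rho;
}

/* Returns 1 for "Yes, it is corruption news" and 0 for "No",
 * based only on the sign of the decision value. */
int is_corruption_news(const double *w, const double *x, size_t dim, double rho)
{
    return decision_value(w, x, dim, rho) > 0.0;
}
```

If the model really is a linear-kernel SVM, w can be precomputed once from the support vectors, so classifying a new document costs a single dot product over the vocabulary.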

To reduce bias, commonly used Indonesian words such as "dan" (and) and "di" (in, at), which carry no meaning in this context, have to be removed. To do this, provide a list of Indonesian stop words.
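A minimal sketch of stop-word removal, assuming the list has already been loaded from the file given with -s. Here it is hard-coded to three words and kept sorted so bsearch() can be used, and tokens are assumed to be lower-cased and split on non-alphanumeric characters. How sayoeti-core actually tokenizes text is not described in this README; is_stopword() and remove_stopwords() are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stop-word list. In Sayoeti the list comes from the file
 * passed with -s; here it is hard-coded and must stay sorted for bsearch(). */
static const char *STOPWORDS[] = {"dan", "di", "yang"};
static const size_t N_STOPWORDS = sizeof STOPWORDS / sizeof STOPWORDS[0];

/* bsearch() comparator: key is a word, elem points into STOPWORDS. */
static int cmp_str(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 1 if the (already lower-case) word is a stop word. */
int is_stopword(const char *word)
{
    return bsearch(word, STOPWORDS, N_STOPWORDS, sizeof STOPWORDS[0],
                   cmp_str) != NULL;
}

/* Lower-cases text in place, splits it on non-alphanumeric characters and
 * stores up to max_out pointers to the non-stop words in out.
 * Returns the number of words kept. */
size_t remove_stopwords(char *text, const char **out, size_t max_out)
{
    size_t n = 0;
    char *p = text;
    while (*p != '\0') {
        while (*p != '\0' && !isalnum((unsigned char)*p))
            p++;                               /* skip separators */
        if (*p == '\0')
            break;
        char *start = p;
        while (*p != '\0' && isalnum((unsigned char)*p)) {
            *p = (char)tolower((unsigned char)*p);
            p++;
        }
        if (*p != '\0')
            *p++ = '\0';                       /* terminate this token */
        if (!is_stopword(start) && n < max_out)
            out[n++] = start;
    }
    return n;
}
```

A sorted array with binary search keeps each lookup at O(log n) with no extra dependencies; for a much longer list a hash table would be the more usual choice.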

Sayoeti has only been tested on x86_64 and compiled with gcc.

Setup

Clone the repository

git clone https://github.com/pyk/sayoeti-ai
cd sayoeti-ai

Install dependencies

make libmill

Build Sayoeti

make

Run Sayoeti, pointing -c at the corpus directory and -s at the stop-words file

LD_LIBRARY_PATH=/usr/local/lib ./sayoeti -c /path/to/corpusdir -s /path/to/stopwords/file

By default, Sayoeti listens on port 9090.

Example

Running Sayoeti

$ LD_LIBRARY_PATH=/usr/local/lib ./sayoeti -c corpus -s stopwords_id.txt 
sayoeti: Create stop words dictionary from stopwords_id.txt
sayoeti: stop words dictionary from stopwords_id.txt is created.
sayoeti: Create index vocabulary from corpus corpus
sayoeti: Index vocabulary from corpus corpus created.
sayoeti: compute global IDF for each term in index vocabulary
sayoeti: create a problem
*
optimization finished, #iter = 9
obj = 278.605220, rho = 23.603209
nSV = 24, nBSV = 23
sayoeti: listening on port :9090
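The log shows Sayoeti building an index vocabulary from the corpus and then computing a global IDF for each term. IDF is usually derived from the number of documents N and the document frequency df(t), the number of documents that contain term t, for example idf(t) = log(N / df(t)); which variant Sayoeti uses is not stated. The sketch below covers only the document-frequency count that any such variant needs, assuming each document is already tokenized and stop-word-filtered. document_frequency() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the document-frequency step behind a global IDF.
 * docs[d] is document d as an array of n_terms[d] tokens (assumed already
 * lower-cased and stop-word-filtered). Returns how many documents contain
 * term at least once. */
size_t document_frequency(const char *const *const *docs,
                          const size_t *n_terms, size_t n_docs,
                          const char *term)
{
    size_t df = 0;
    for (size_t d = 0; d < n_docs; d++) {
        for (size_t t = 0; t < n_terms[d]; t++) {
            if (strcmp(docs[d][t], term) == 0) {
                df++;   /* count each document at most once */
                break;
            }
        }
    }
    return df;
}
```

Each document is counted at most once, so a term repeated inside one article does not inflate df; repeated occurrences are what the term-frequency half of tf-idf captures.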

Connect to Sayoeti to send it a document to read

$ telnet localhost 9090
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
202 OK sayoeti ready
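The telnet session above can also be reproduced programmatically. This is a sketch of a POSIX C client that assumes only what the transcript shows: the server accepts TCP connections on port 9090 and greets with a line that starts with a status code ("202 OK sayoeti ready"). The function names are hypothetical, and because this README does not describe how a document is submitted after the greeting, the sketch stops there.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Parses the status code at the start of a greeting such as
 * "202 OK sayoeti ready". Returns -1 if there is no 3-digit code. */
int greeting_status(const char *line)
{
    char *end;
    long code = strtol(line, &end, 10);
    if (end == line || code < 100 || code > 999)
        return -1;
    return (int)code;
}

/* Opens a TCP connection to host:port (e.g. "localhost", "9090").
 * Returns a connected socket descriptor, or -1 on failure. */
int sayoeti_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Reads the first line sent by the server into buf, dropping '\r' and
 * '\n'. Returns 0 on success, -1 on error or if the peer closes first. */
int read_greeting(int fd, char *buf, size_t size)
{
    size_t n = 0;
    while (n + 1 < size) {
        char c;
        ssize_t r = read(fd, &c, 1);  /* one byte at a time */
        if (r <= 0)
            return -1;
        if (c == '\n')
            break;
        if (c != '\r')
            buf[n++] = c;
    }
    buf[n] = '\0';
    return 0;
}
```

A caller would open the connection with sayoeti_connect("localhost", "9090"), read the first line with read_greeting(), check that greeting_status() returns 202, and close() the descriptor when done. Reading one byte at a time is slow, but it avoids consuming any bytes the server sends after the greeting.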

License

Copyright 2015 Bayu Aldi Yansyah <[email protected]>

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
