Git Product home page Git Product logo

chinese_sf's Introduction

RPI BLENDER Chinese Slot Filling System


This is RPI BLENDER Chinese slot filling system. Definition of slot filling: Slot filling aims at collecting from a large-scale multi-source corpus the values (“slot fillers”) for certain attributes (“slot types”) of a query entity, which is a person or some type of organization.[1]

[1] Dian Yu, Hongzhao Huang, Taylor Cassidy, Heng Ji, Chi Wang, Shi Zhi, Jiawei Han, Clare Voss and Malik Magdon-Ismail. The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding. (COLING 2014)

Example

  • System's input is a XML file.
<?xml version="1.0" encoding="utf-8"?>
<kbpslotfill>
  <query id="SF14_CMN_TRAINING_001">
    <name>郭全宝</name>
    <enttype>PER</enttype>
    <docid>cmn-NG-4-76762-9532908</docid>
    <beg>2536</beg>
    <end>2538</end>
  </query>
</kbpslotfill>
  • Outputs are KBP slot filling '.tab' file and a HTML file.
SF14_CMN_TRAINING_001	per:age	SYS	xin_cmn_20040714.0019:58-174	82	xin_cmn_20040714.0019:171-172	1
SF14_CMN_TRAINING_001	per:country_of_death	SYS	xin_cmn_20071015.0207:216-219,xin_cmn_20040714.0019:58-174	**	xin_cmn_20071015.0207:216-217	1
...

Install

  1. Clone the project.

  2. Download external softwares and KBP Chinese slot filling source

  3. Required Dependencies:

    • JDK 7 (1.7) or higher.

    • Apache Ant

    • Python 2.7

    • GCC

    • Python packages:

      • pexpect >= 2.4
      • xmltodict >= 0.4.6
      • jianfan >= 0.0.2
      • jinja2 >= 2.7.3
      • python-Levenshtein >= 0.12
  4. Environment and dependencies setting:

    • Pylucene requires ant. If ant is not at '/usr/bin/ant', please go to 'externals/pylucene-4.10.1-1/Makefile' and change 'ANT' to where you installed ant.

    • Default python path is '/usr/bin/python'. If you are using Python virtual environment, please go to 'externals/pylucene-4.10.1-1/Makefile' and change 'PREFIX_PYTHON' to the directory of your python virtual environment. If you installed python in other directory, change 'PREFIX_PYTHON' in 'externals/pylucene-4.10.1-1/Makefile' to directory where your python installed.

    • ANT_HOME=(ANT directory)

    • JAVA_HOME=(Java directory)

    • JCC_JDK=$JAVA_HOME

  5. Install pylucene

    Use command:

    python install_pylucene.py
    
  6. Add Chinese_SF path to PYTHONPATH in bash profile. Use command

    export PYTHONPATH=$PYTHONPATH:<Chinese_SF path>
    

Usage

Examples are provided in 'example/'.

Use command

python RPISlotFilling/slot_filling/ChineseSlotFilling.py <input_query_file_path> <output_directory>

Please notice that the first argument is the file path and the second argument is a directory path. Two results will be created under output directory: 'cn_sf_result.tab' in KBP SF format and 'cn_sf_result.html'.

Contact

chinese_sf's People

Contributors

boliangz avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.