Omniscient

Knowledge Extraction, Graph Construction and exciting Applications

SPARQL query from the command line

python -m omniscient.kg.tdb_query --index /path/to/index --query "select * where { ?s ?p ?o .} limit 10"
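
If you want to launch the same query from a Python script, passing the arguments as a list avoids shell quoting problems with the braces and question marks in SPARQL. The sketch below only assembles that list. The module name and the --index and --query flags are taken from the command above; the helper name build_tdb_query_command is hypothetical and not part of omniscient.

```python
import shlex
import sys


def build_tdb_query_command(index_path, sparql, python=sys.executable):
    """Return the argv list for the command-line SPARQL query shown above.

    Hypothetical helper for illustration; it is not part of omniscient.
    """
    return [
        python, "-m", "omniscient.kg.tdb_query",
        "--index", index_path,
        "--query", sparql,
    ]


cmd = build_tdb_query_command(
    "/path/to/index", "select * where { ?s ?p ?o .} limit 10"
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it (requires omniscient and a built index):
# import subprocess; subprocess.run(cmd, check=True)
```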

Note:

  • pipeline: Wraps Stanford NLP in Python. The extractor is still under development.
  • structure: Basic graph structure, still under development.

TODO

Use a download CLI tool to resolve the dependency issue.

https://github.com/explosion/spaCy/blob/master/spacy/cli/download.py

Requirements

  • Package Install

    git clone https://github.com/Impavidity/Omniscient.git
    cd Omniscient
    python setup.py install
    
  • spaCy

    conda config --add channels conda-forge
    conda install spacy
    python -m spacy download en
    
  • pyjnius

    Installing the package may fail because of pyjnius. You may need to adjust the conda library setup.

    If you see this error:

    anaconda3/compiler_compat/ld: cannot find -lpthread
    anaconda3/compiler_compat/ld: cannot find -lc
    

    cd into anaconda3/lib and create the missing symlinks:

    ln -s /lib/x86_64-linux-gnu/libpthread.so.0 libpthread.so
    ln -s /lib/x86_64-linux-gnu/libc.so.6 libc.so 
    
  • Dependencies

    mkdir resource
    cd resource
    mkdir jars
    cd jars
    wget https://git.uwaterloo.ca/p8shi/jar/raw/master/tdbquery.jar
    

Build Inverted Index for Freebase

  • Download the Freebase dump from here
  • Extract names from the dump
    nohup python -u -m kg.freebase.name_extraction --input /path/to/freebase/freebase-rdf-latest.gz --output_path /path/to/index/ --output_file freebase_name.json --language_filter "\"['en', 'zh']\""> freebase_name_extraction.log &
    
  • Build the index from the JSON file
    nohup python -u -m kg.freebase.inverted_index --input /path/to/freebase_name.json --index /path/to/index/path > inverted_index_freebase_db.log &
    
  • Search
    python -m kg.freebase.candidate_retrieval --index /path/to/index/path  --query obama
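
To illustrate the idea behind the name index and candidate retrieval steps above, here is a minimal in-memory sketch. It is not the project's implementation: the record layout (entity id mapped to a list of names), the whitespace tokenizer, and the function names are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(names_by_entity):
    """Map each lowercased name token to the set of entity ids using it."""
    index = defaultdict(set)
    for entity_id, names in names_by_entity.items():
        for name in names:
            for token in name.lower().split():
                index[token].add(entity_id)
    return index


def retrieve_candidates(index, query):
    """Return ids matching every token of the query, sorted for stable output."""
    tokens = query.lower().split()
    if not tokens:
        return []
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return sorted(matches)


# Toy data; these ids are placeholders, not real Freebase identifiers.
names = {
    "entity_1": ["Barack Obama", "Obama"],
    "entity_2": ["Michelle Obama"],
    "entity_3": ["Barack"],
}
index = build_inverted_index(names)
print(retrieve_candidates(index, "obama"))         # ['entity_1', 'entity_2']
print(retrieve_candidates(index, "barack obama"))  # ['entity_1']
```

Requiring every query token to match keeps the candidate list short; a real index over the full dump would also need to be stored on disk rather than in memory.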
    

Query with TDB dataset

With the Freebase dump, you can use Apache Jena to build an index that supports SPARQL queries. Here, we use tdbloader2:

apache-jena-3.6.0/bin/tdbloader2 --loc path_to_index/d-freebase path_to_freebase_dump

--loc specifies the path of the index. Then you can run

python -m omniscient.kg.tdb_query --index path_to_index/d-freebase

to query it. There are two types of query: query (a single query) and parallel_query (a batch query with a specified number of threads). For more examples, refer to kg/tdb_query.py.
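
The difference between a single query and a batch query with a fixed number of threads can be sketched with the standard library. This is only an illustration of the pattern, not how parallel_query is implemented: run_query is a stand-in that does not touch a TDB index, and run_batch and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(sparql):
    """Stand-in for a single query; a real version would query the TDB index."""
    return "result for: " + sparql


def run_batch(queries, num_threads=4, query_fn=run_query):
    """Run query_fn over all queries with a fixed-size thread pool.

    pool.map returns results in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(query_fn, queries))


batch = [
    "select * where { ?s ?p ?o .} limit 1",
    "select * where { ?s ?p ?o .} limit 2",
]
results = run_batch(batch, num_threads=2)
print(len(results))
```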

Known bugs

  • Encoding: pass query.encode("utf-8") instead of query as the query argument.
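
As a small illustration of the encoding workaround, assuming Python 3, encoding turns the query string into UTF-8 bytes, which matters once the query contains non-ASCII characters. The sample query text here is made up for the example.

```python
# Sample query with a non-ASCII literal (made up for this example).
query = 'select * where { ?s ?p "café"@en .} limit 10'

encoded = query.encode("utf-8")

print(type(query).__name__)    # str
print(type(encoded).__name__)  # bytes

# The encoded form round-trips back to the original text.
assert encoded.decode("utf-8") == query
```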
