entity2vec

entity2vec computes vector representations of Knowledge Graph entities that preserve semantic similarities and are suitable for classification tasks. It generates a set of property-specific entity embeddings by running node2vec on property-specific subgraphs K(p), i.e. the subgraph made of all triples (s, p, o) whose predicate is a given property p. The repository includes:

  • A reimplementation of node2vec that makes it possible to skip preprocessing the transition probabilities, which reduces memory usage at the cost of slower computation.

  • entity2vec, which generates from a Knowledge Graph a set of entity embeddings, one for each property. entity2vec can either work with a set of pre-downloaded dumps or download them from a SPARQL endpoint.
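To make the memory/speed trade-off above concrete, here is a minimal Python 3 sketch of node2vec's second-order (biased) random walk, assuming an unweighted, undirected graph stored as a dict mapping each node to a set of neighbours. It is not the repository's implementation; the names node2vec_weights, precompute_transitions and biased_walk are hypothetical. Precomputing stores one distribution per directed edge, so memory grows with the number of edges times node degree, while passing no table recomputes the weights at every step. (The original node2vec stores alias tables for constant-time sampling; plain weight lists are used here for brevity.)

```python
import random


def node2vec_weights(adj, prev, cur, p, q):
    """Unnormalised node2vec weights for leaving `cur` after arriving from `prev`.

    Assumes an unweighted, undirected graph: {node: set(neighbours)}.
    """
    neighbours = sorted(adj[cur])
    weights = []
    for x in neighbours:
        if x == prev:
            weights.append(1.0 / p)  # step back to the previous node
        elif x in adj[prev]:
            weights.append(1.0)      # stay at distance 1 from prev
        else:
            weights.append(1.0 / q)  # move outward, distance 2 from prev
    return neighbours, weights


def precompute_transitions(adj, p, q):
    """One distribution per directed edge (prev, cur): fast lookups, more memory."""
    table = {}
    for prev in adj:
        for cur in adj[prev]:
            table[(prev, cur)] = node2vec_weights(adj, prev, cur, p, q)
    return table


def biased_walk(adj, start, length, p, q, table=None, rng=random):
    """Second-order random walk; uses `table` if given, else computes weights on the fly."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not adj[cur]:
            break  # dead end: no neighbours to move to
        if len(walk) == 1:
            # First step has no previous node, so it is uniform (unweighted graph).
            walk.append(rng.choice(sorted(adj[cur])))
            continue
        prev = walk[-2]
        if table is not None:
            neighbours, weights = table[(prev, cur)]
        else:
            neighbours, weights = node2vec_weights(adj, prev, cur, p, q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

With the same seed, both modes produce the same walk; only where the weights come from differs.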

Requirements

  • Python 2.7 or above
  • numpy
  • gensim
  • networkx
  • pandas
  • SPARQLWrapper

If you are using pip:

    pip install -r requirements.txt

Property-specific entity embeddings

The set of properties can be defined in the configuration file config/properties.json. Otherwise, the software runs on every file located in datasets/your_dataset/graphs; if a SPARQL endpoint is provided, it downloads the graphs for all properties into datasets/your_dataset/graphs.
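As an illustration of how the property-specific subgraphs K(p) relate to the raw triples, the sketch below groups (s, p, o) triples by predicate and serialises each group as a node2vec-style edge list with one "subject object" pair per line. The function names split_by_property and to_edgelist, and the exact file format, are assumptions for illustration, not entity2vec's own code.

```python
from collections import defaultdict


def split_by_property(triples, properties=None):
    """Group (s, p, o) triples into property-specific edge lists K(p).

    If `properties` is given, only those predicates are kept
    (hypothetical stand-in for the selection in config/properties.json).
    """
    graphs = defaultdict(list)
    for s, p, o in triples:
        if properties is None or p in properties:
            graphs[p].append((s, o))
    return dict(graphs)


def to_edgelist(edges):
    """Serialise edges as whitespace-separated 'subject object' lines (assumed format)."""
    return "\n".join(f"{s} {o}" for s, o in edges)
```

Each value of the returned dict could then be written to its own file under datasets/your_dataset/graphs and embedded separately.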

    python src/entity2vec.py --dataset dataset --config_file config_file --entities entities --sparql sparql --default_graph default_graph

Alternatively, entity2vec can be imported as a module and used as follows:

    from entity2vec.entity2vec import Entity2Vec

    e2v = Entity2Vec(False, False, False, 1, 1, 10, 5,
                 128, 10, 8, 5, 'path/to/properties.json', False,
                 'dataset_name', 'all', False, False,
                 'path/to/feedback.edgelist')
Options:

  • dataset (default: null): required. Name of the dataset, used to create folders and to retrieve properties from the configuration file.
  • config_file (default: config/properties.json): path of the configuration file.
  • entities (default: all): list of entities for which the embeddings are computed. By default, all entities are used.
  • sparql (default: null): SPARQL endpoint from which the property-specific graphs are obtained. If not provided, the graphs are assumed to be already stored in datasets/your_dataset/graphs.
  • default_graph (default: null): whether to use a default graph on the SPARQL endpoint.
  • num_walks (default: 500): number of random walks per entity.
  • feedback_file (default: null): path to a DAT file containing all user-item pairs. If not set, it defaults to datasets/<my_dataset>/graphs/feedback.edgelist.
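The num_walks and entities options control how many walks start from each entity and which entities serve as starting points. The following hedged sketch builds such a walk corpus; build_corpus, uniform_walk and the walk_length parameter are hypothetical, and a first-order uniform walk (equivalent to node2vec with p = q = 1 on an unweighted graph) stands in for the biased walk to keep it short.

```python
import random


def uniform_walk(adj, start, length, rng):
    """First-order random walk; stops early at nodes without neighbours."""
    walk = [start]
    while len(walk) < length and adj.get(walk[-1]):
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk


def build_corpus(adj, entities="all", num_walks=500, walk_length=10, seed=0):
    """Return num_walks walks per start entity, as lists of string tokens."""
    rng = random.Random(seed)
    if entities == "all":
        starts = sorted(adj)
    else:
        # Keep only requested entities that actually occur in the graph.
        starts = [e for e in entities if e in adj]
    corpus = []
    for _ in range(num_walks):
        rng.shuffle(starts)  # visit start nodes in a different order each round
        for node in starts:
            corpus.append([str(n) for n in uniform_walk(adj, node, walk_length, rng)])
    return corpus
```

In node2vec, a corpus like this is typically fed to a skip-gram model (for example gensim's Word2Vec, listed in the requirements) to learn the embeddings.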

Entity classification

Generate a single vector representation for each entity, without considering the role of semantic properties, for use in classification tasks.

  1. Create an empty directory called emb

  2. Run node2vec on the whole graph to create a single global embedding for each entity

     python src/node2vec.py --input datasets/aifb/aifb.edgelist --output emb/aifb_p1_q4.emd  --p 1 --q 4
    
  3. Obtain scores, e.g.:

     cd ml
    
     python rdf_predict.py --dataset aifb --emb ../emb/aifb_p1_q4.emd --dimension 500
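Step 3 relies on ml/rdf_predict.py. As a rough, self-contained stand-in (not what rdf_predict.py actually does), the sketch below parses an .emd file, assuming it uses the word2vec text format written by the original node2vec (a "count dimension" header followed by one "entity v1 ... vd" line per entity), and labels an entity with a nearest-centroid classifier over cosine similarity. The functions load_emd, cosine and nearest_centroid are hypothetical.

```python
import math


def load_emd(lines):
    """Parse word2vec text format: 'count dim' header, then 'entity v1 ... vdim' rows."""
    it = iter(lines)
    count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        vec = [float(x) for x in parts[1:]]
        if len(vec) != dim:
            raise ValueError(f"expected {dim} values for {parts[0]}, got {len(vec)}")
        vectors[parts[0]] = vec
    if len(vectors) != count:
        raise ValueError(f"header says {count} entities, found {len(vectors)}")
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def nearest_centroid(vectors, labels, entity):
    """Predict a label for `entity` from the most cosine-similar class mean vector."""
    sums, counts = {}, {}
    for e, y in labels.items():
        v = vectors[e]
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [x / counts[y] for x in s] for y, s in sums.items()}
    return max(centroids, key=lambda y: cosine(vectors[entity], centroids[y]))
```

For a real file, pass an open handle, e.g. `with open("emb/aifb_p1_q4.emd") as f: vectors = load_emd(f)`.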
    

Contributors

enricopal, giusepperizzo, pasqlisena
