
unigraph-schema

This version is deprecated

Unigraph now employs a combination of the strict Freebase schema and the loose ontology of Wikidata. The new schema is available for download as .json here and is explained in great detail here. This repository will no longer be updated, but we've seen serious interest in it because it offers insight into how data was organized in Freebase before its closure.


The Unigraph Schema

The Unigraph Schema is the backbone of the UniGraph Knowledge Graph. It is inspired by Freebase, with major modifications in the areas described in more detail below.

You can use the schema to map real-world objects and the relations between them. For example, the London DataStore data on birth and death rates by ward can be represented like this:

Ward > dataworld.sameas.uk_ons_gss_code
Births 2002 > measured_dimension.dated_float.value
Ward Name > type.object.name
Ward Name > location.statistical_region.births > measured_dimension.dated_float
Ward Name > location.location
Borough > type.object.name
Borough > location.location
Borough > location.location.contains > Ward Name
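The mapping above can be sketched in Python as a function that turns one row of the ward dataset into subject-predicate-object triples. This is a minimal illustration, not UniGraph's own loader: the node IDs (ward:..., borough:..., births:...) and the row values are hypothetical, and `type.object.type` is an assumed predicate for assigning a type, borrowed from Freebase conventions.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: object


def ward_row_to_triples(row: Dict[str, str]) -> List[Triple]:
    """Apply the ward mapping to one CSV row (a sketch, not the official loader)."""
    # Hypothetical node IDs; a real loader would mint or look up UniGraph IDs.
    ward = f"ward:{row['Ward']}"
    borough = f"borough:{row['Borough']}"
    births = f"births:{row['Ward']}:2002"  # node for the dated_float mediator
    # "type.object.type" is an ASSUMED predicate for assigning a type.
    return [
        Triple(ward, "dataworld.sameas.uk_ons_gss_code", row["Ward"]),
        Triple(ward, "type.object.name", row["Ward Name"]),
        Triple(ward, "type.object.type", "location.location"),
        Triple(ward, "location.statistical_region.births", births),
        Triple(births, "measured_dimension.dated_float.value", float(row["Births 2002"])),
        Triple(borough, "type.object.name", row["Borough"]),
        Triple(borough, "type.object.type", "location.location"),
        Triple(borough, "location.location.contains", ward),
    ]


# Illustrative row; the values are placeholders, not real London DataStore data.
row = {"Ward": "E05000000", "Ward Name": "Example Ward",
       "Borough": "Example Borough", "Births 2002": "123"}
for triple in ward_row_to_triples(row):
    print(triple)
```

The births figure becomes its own mediator node so that further dated values for the same ward can be attached without overwriting each other.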

Key Concepts


Domains, Types and Properties

The UniGraph schema is expressed through Types and Properties. Types have properties and are grouped into Domains. For example, in the architecture domain, the architecture.building type has 4 properties, one of which is architecture.building.building_complex.
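Every ID shown in this README follows the same dotted pattern: domain.type for types and domain.type.property for properties. The sketch below assumes that pattern holds for all IDs and splits an ID into its parts; `SchemaId` and `parse_schema_id` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SchemaId(NamedTuple):
    domain: str
    type_name: str
    property_name: Optional[str]  # None when the ID names a type

    @property
    def type_id(self) -> str:
        return f"{self.domain}.{self.type_name}"


def parse_schema_id(schema_id: str) -> SchemaId:
    """Split a dotted ID into domain, type and (optional) property."""
    parts = schema_id.split(".")
    if len(parts) == 2:
        return SchemaId(parts[0], parts[1], None)
    if len(parts) == 3:
        return SchemaId(*parts)
    raise ValueError(f"unexpected ID shape: {schema_id!r}")


pid = parse_schema_id("architecture.building.building_complex")
print(pid.domain, pid.type_id, pid.property_name)
# architecture architecture.building building_complex
```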

Mediators

Mediator Types are used to express complex data, usually data with a time dimension. For example, the employment tenure mediator holds the start and end dates of an employment, the employee, the employer and the title of the position (if any).

Differences between Types and Mediator Types

  • Mediator types have the "Mediator" attribute set to true.
  • Mediator types don't have Expected types, their "ExpectedTypes" attribute is null.
  • Mediator types have at least two required properties.
    In the employment tenure example the required properties are business.employment_tenure.company and business.employment_tenure.person. The required properties define the minimum set of data a Mediator type needs to hold meaning. A statement describing a relationship between a company and an employee is complete and valuable in itself, while a statement about the time period of an employment that lacks the company or the employee is not. Usually the required properties are also unique (their "Unique" attribute is set to true), which forces a separate mediator to be created for the same person's next employment. An exception to this rule are single-property mediators like people.sibling_relationship, which models the sibling relationship through its only property, people.sibling_relationship.sibling, which is Required but not Unique.
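The three rules above can be checked mechanically. This is a sketch that assumes a type definition is loaded as a Python dict shaped like the JSON examples in this README; `mediator_problems` is a hypothetical helper, and its single-property branch encodes the sibling-relationship exception described above.

```python
from typing import List


def mediator_problems(type_def: dict) -> List[str]:
    """List the mediator rules that a type definition breaks (empty list = OK)."""
    problems = []
    if type_def.get("Mediator") is not True:
        problems.append("Mediator attribute is not true")
    if type_def.get("ExpectedTypes") is not None:
        problems.append("ExpectedTypes is not null")
    props = type_def.get("Properties") or []
    required = [p for p in props if p.get("Required")]
    if len(props) == 1:
        # Single-property mediators such as people.sibling_relationship are the
        # exception: their only property must be Required, but need not be Unique.
        if not required:
            problems.append("the only property is not Required")
    elif len(required) < 2:
        problems.append("fewer than two Required properties")
    return problems


sibling = {
    "Id": "people.sibling_relationship",
    "Mediator": True,
    "ExpectedTypes": None,
    "Properties": [
        {"Id": "people.sibling_relationship.sibling", "Required": True, "Unique": False},
    ],
}
print(mediator_problems(sibling))  # []
```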

Object Types

All properties have an assigned ObjectType. This approach enforces the schema rules, controls the data entering the system and lets types be inferred seamlessly. For example, the ObjectType of the business.employment_tenure.person property is business.employee.

{
	"Id": "business.employment_tenure.person",
	"Name": "Person",
	"Description": "",
	"ObjectType": "business.employee",
	"Unique": true,
	"Required": true
}

As a result, every object of the business.employment_tenure.person relationship receives the business.employee type.
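A minimal sketch of this inference, assuming facts are stored as (subject, predicate, object) triples and property definitions are indexed by Id. The helper name and node IDs are hypothetical, and skipping literal types such as type.float (they describe values, not nodes) is our assumption about how primitives are treated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def infer_object_types(
    triples: Iterable[Tuple[str, str, str]],
    properties_by_id: Dict[str, dict],
) -> Dict[str, Set[str]]:
    """Collect, for every object node, the ObjectTypes of the properties pointing at it."""
    inferred: Dict[str, Set[str]] = defaultdict(set)
    for _subject, predicate, obj in triples:
        prop = properties_by_id.get(predicate)
        if prop is None:
            continue  # not a schema property we know about
        object_type = prop["ObjectType"]
        if object_type.startswith("type."):
            continue  # type.float, type.datetime, ... describe literals, not nodes
        inferred[obj].add(object_type)
    return dict(inferred)


properties = {
    "business.employment_tenure.person": {
        "Id": "business.employment_tenure.person",
        "ObjectType": "business.employee",
        "Unique": True,
        "Required": True,
    },
}
# Hypothetical node IDs.
triples = [("tenure:1", "business.employment_tenure.person", "person:alice")]
print(infer_object_types(triples, properties))  # {'person:alice': {'business.employee'}}
```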

Expected Types

Expected types signal to UniGraph bots and users that a node will most probably also have other types. For example, the architecture.architect type indicates that a node will most probably also be of type people.person and common.topic.

{
	"Id": "architecture.architect",
	"Name": "Architect",
	"Description": "\"Architect\" is used for individual contributors to the Built Environment. Also see the type \"Architecture Firm\" for collections of architects. A topic that is of the type \"Structure\" can have one or more \"Architects\" or \"Architecture Firms\" listed as properties, due to the sometimes ambiguous way designs are credited.",
	"Mediator": false,
	"ExpectedTypes": [
		"people.person",
		"common.topic"
	]
...
}
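Because expected types are hints rather than enforced rules, a bot might only suggest them for review. The sketch below (a hypothetical helper that looks one level deep, without transitive expansion) lists the expected types a node is still missing.

```python
from typing import Dict, Iterable, Set


def suggest_expected_types(node_types: Iterable[str], types_by_id: Dict[str, dict]) -> Set[str]:
    """Expected types a node probably also has but does not carry yet (hints, not rules)."""
    current = set(node_types)
    suggestions: Set[str] = set()
    for type_id in current:
        suggestions.update(types_by_id.get(type_id, {}).get("ExpectedTypes") or [])
    return suggestions - current


types_by_id = {
    "architecture.architect": {"ExpectedTypes": ["people.person", "common.topic"]},
}
print(suggest_expected_types(["architecture.architect", "common.topic"], types_by_id))
# {'people.person'}
```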

Measured dimensions

Measured dimensions are everywhere: a person's height, a mountain's elevation, an engine's power, etc. We've created all measured dimensions from scratch and combined them in a single domain together with their respective measurement units.

{
    "Id": "measured_dimension.distance",
    "Name": "Distance",
    "Description": "",
    "Mediator": true,
    "ExpectedTypes": null,
    "Enumerated": false,
    "Properties": [
        {
            "Id": "measured_dimension.distance.measurement_unit",
            "Name": "Unit of Distance",
            "Description": "",
            "ObjectType": "measured_dimension.distance_unit",
            "Unique": true,
            "Required": true
        },
        {
            "Id": "measured_dimension.distance.value",
            "Name": "Value",
            "Description": "",
            "ObjectType": "type.float",
            "Unique": true,
            "Required": true
        }
    ]
}
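Since a mediator like measured_dimension.distance is only meaningful when its required properties are filled in, an instance can be validated against its type definition. A sketch, assuming an instance is represented as a dict from property Id to a list of values; the helper name and the values are hypothetical.

```python
from typing import Dict, List


def instance_problems(instance: Dict[str, list], type_def: dict) -> List[str]:
    """Check one node's values (property ID -> list of values) against Required and Unique."""
    problems = []
    for prop in type_def.get("Properties") or []:
        values = instance.get(prop["Id"], [])
        if prop.get("Required") and not values:
            problems.append(f"missing required property {prop['Id']}")
        if prop.get("Unique") and len(values) > 1:
            problems.append(f"{prop['Id']} is Unique but has {len(values)} values")
    return problems


distance = {
    "Id": "measured_dimension.distance",
    "Properties": [
        {"Id": "measured_dimension.distance.measurement_unit", "Unique": True, "Required": True},
        {"Id": "measured_dimension.distance.value", "Unique": True, "Required": True},
    ],
}
# A value without its unit is not a meaningful distance.
print(instance_problems({"measured_dimension.distance.value": [8.5]}, distance))
# ['missing required property measured_dimension.distance.measurement_unit']
```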

Measurement Units

Measurement units include all the information needed to convert values to the International System of Units (SI).

{
    "Id": "measured_dimension.distance_unit",
    "Name": "Unit of Length",
    "Description": "A Unit of length is any measure used for linear distance (height, width, etc.). If the unit is fixed, such as the American inch, its length in the SI base unit, meters, should be given. If it is variable or unknown, such as the Biblical cubit, then the meter equivalence should not be specified.",
    "Mediator": false,
    "ExpectedTypes": [
        "common.topic"
    ],
    "Enumerated": true,
    "Properties": [
        {
            "Id": "measured_dimension.distance_unit.abbreviations",
            "Name": "Abbreviations",
            "Description": "Abbreviated text representations for this unit",
            "ObjectType": "type.rawstring",
            "Unique": false,
            "Required": false
        },
        {
            "Id": "measured_dimension.distance_unit.canonical_abbreviation",
            "Name": "Canonical abbreviation",
            "Description": "Globally recognised abbreviated text representation for this unit.",
            "ObjectType": "type.rawstring",
            "Unique": true,
            "Required": false
        },
        {
            "Id": "measured_dimension.distance_unit.measurement_system",
            "Name": "Measurement System",
            "Description": "The measurement system this unit belongs to",
            "ObjectType": "measured_dimension.measurement_system",
            "Unique": false,
            "Required": false
        },
        {
            "Id": "measured_dimension.distance_unit.dimension",
            "Name": "Measured dimension",
            "Description": "The dimension measured by this unit",
            "ObjectType": "measured_dimension.dimension",
            "Unique": true,
            "Required": false
        },
        {
            "Id": "measured_dimension.distance_unit.si_base_conversion_formula",
            "Name": "Convertion formula",
            "Description": "Convertion formula to the System International base unit",
            "ObjectType": "type.rawstring",
            "Unique": true,
            "Required": false
        }
    ]
}
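This README does not show the syntax of si_base_conversion_formula values. The sketch below assumes, hypothetically, that a formula is an arithmetic expression in a variable `x` (for example `x * 0.0254` for inches) and evaluates it with a restricted AST walker instead of `eval()`, so a stored formula cannot run arbitrary code.

```python
import ast
import operator

# Arithmetic allowed in the (assumed) formula syntax.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def to_si_base(value: float, formula: str) -> float:
    """Evaluate a conversion formula written in terms of `x` without using eval()."""

    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return value
        raise ValueError(f"unsupported element in formula {formula!r}")

    return ev(ast.parse(formula, mode="eval").body)


# Hypothetical formula for the international inch (exactly 0.0254 m).
print(to_si_base(12, "x * 0.0254"))  # about 0.3048 (metres)
```

For a variable unit such as the Biblical cubit, the description above says no meter equivalence is given, so a caller would simply have no formula to evaluate.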

Self contained domains

Properties cannot use types defined in other domains as their ObjectType. The only exceptions are mediator types and the basic types of the type domain (such as type.float or type.datetime), because they hold information shared across all domains. Cross-domain references are handled by type inheritance through the "ExpectedTypes" attribute of a type. In the example below, the "book.book_edition_location" type inherits all properties of its expected type "location.location"; in turn, the location where a book edition was published receives the "book.book_edition_location" type in addition to its "location.location" type:

{
    "Id": "book.book_edition_location",
    "Name": "Book Edition Location",
    "Description": "The place, usually a city, where a book edition has been published.",
    "Mediator": false,
    "ExpectedTypes": [
        "location.location",
        "common.topic"
    ],
    "Enumerated": true,
    "Properties": null
}

This keeps domains independent and the schema clean.
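The self-containment rule can be enforced with a schema lint. In this sketch a property's ObjectType is accepted when it lies in the property's own domain, in the basic type domain (type.float, type.datetime, ...), or is a mediator type; anything else is reported. The helper names and the place_published property are hypothetical, and the treatment of the type domain is our reading of the rule.

```python
from typing import Dict, List, Tuple


def domain_of(schema_id: str) -> str:
    return schema_id.split(".", 1)[0]


def cross_domain_violations(types_by_id: Dict[str, dict]) -> List[Tuple[str, str]]:
    """(property ID, ObjectType) pairs that point outside the property's own domain."""
    violations = []
    for type_id, type_def in types_by_id.items():
        for prop in type_def.get("Properties") or []:
            target = prop["ObjectType"]
            if domain_of(target) in (domain_of(type_id), "type"):
                continue  # same domain, or a basic value type such as type.float
            if types_by_id.get(target, {}).get("Mediator"):
                continue  # mediator types are shared across domains
            violations.append((prop["Id"], target))
    return violations


# Trimmed schema; properties not relevant to the check are omitted.
schema = {
    "book.written_work": {"Properties": [
        {"Id": "book.written_work.author", "ObjectType": "book.author"},
        {"Id": "book.written_work.place_published", "ObjectType": "location.location"},
    ]},
    "book.author": {"Mediator": False, "Properties": None},
    "location.location": {"Mediator": False, "Properties": None},
    "location.statistical_region": {"Properties": [
        {"Id": "location.statistical_region.births", "ObjectType": "measured_dimension.dated_float"},
    ]},
    "measured_dimension.dated_float": {"Mediator": True, "Properties": None},
}
print(cross_domain_violations(schema))
# [('book.written_work.place_published', 'location.location')]
```

The reported property would be fixed the way the book example above does it: point it at an in-domain type such as book.book_edition_location, whose expected type is location.location.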

Identifiers (SameAs)

In the dataworld domain, we have created the dataworld.sameas type, which contains more than 1600 unique identifiers linking data from external repositories to UniGraph. Many of them include examples in their descriptions.

        {
            "Id": "dataworld.sameas.uk_companies_house_id",
            "Name": "UK Companies House Company ID",
            "Description": "The assigned number of the company by the UK Companies House. Examples: 8209948, SC421617, FC031362, IP10067R",
            "ObjectType": "type.rawstring",
            "Unique": true,
            "Required": false
        }
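Because each identifier is a dataworld.sameas.* property on a UniGraph node, resolving an external ID back to a node only needs a reverse index. A sketch assuming triples as input; the node IDs are hypothetical, and returning a set makes it visible when two nodes claim the same external ID (possible duplicates to merge).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

SAMEAS_PREFIX = "dataworld.sameas."


def build_sameas_index(
    triples: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map (identifier property, external ID) -> UniGraph nodes carrying that ID."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate.startswith(SAMEAS_PREFIX):
            index[(predicate, obj)].add(subject)
    return dict(index)


# Hypothetical node IDs; SC421617 is one of the example IDs from the description above.
triples = [
    ("company:1", "dataworld.sameas.uk_companies_house_id", "SC421617"),
    ("company:1", "type.object.name", "Example Ltd"),
]
index = build_sameas_index(triples)
print(index[("dataworld.sameas.uk_companies_house_id", "SC421617")])  # {'company:1'}
```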

Strict one to one connections

We've done our best to keep all connections one to one, so that each relationship is defined on one side only. An example of this is the "book.author" type, which has no properties of its own:

{
    "Id": "book.author",
    "Name": "Author",
    "Description": "An author is a creator of a written or published work. The Author type is used for anyone who has written prose (whether fiction, essay, journalism, or scholarship), poetry, drama, or written or edited a book of any sort. This therefore includes editors of anthologies, whether or not the editor has written any material included in the anthology, and also includes artists in other media such as the fine arts and music, who may have had monographs or songbooks (for example) published. It also can include corporate authors, such as organizations, companies and government agencies, when a written work is credited to one, rather than to a person. It does not include scriptwriters for television and film (use TV Writer and Film Writer for these).",
    "Mediator": false,
    "ExpectedTypes": [
        "common.topic"
    ],
    "Enumerated": false,
    "Properties": null
}

It has many incoming properties though, for example from the "book.written_work" type:

        {
            "Id": "book.written_work.author",
            "Name": "Author",
            "Description": "",
            "ObjectType": "book.author",
            "Unique": false,
            "Required": false
        }
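A practical consequence is that an author's works are found by following incoming book.written_work.author edges rather than a property on book.author. A minimal sketch with hypothetical node IDs; a real store would keep an index on (predicate, object) instead of scanning every triple.

```python
from typing import Iterable, List, Tuple


def incoming(triples: Iterable[Tuple[str, str, str]], node: str, predicate: str) -> List[str]:
    """Subjects that point at `node` through `predicate` (a reverse lookup)."""
    return [s for s, p, o in triples if p == predicate and o == node]


# Hypothetical node IDs.
triples = [
    ("work:1", "book.written_work.author", "author:1"),
    ("work:2", "book.written_work.author", "author:2"),
    ("work:3", "book.written_work.author", "author:1"),
]
print(incoming(triples, "author:1", "book.written_work.author"))  # ['work:1', 'work:3']
```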

Unified periods representation

With few exceptions, all properties recording the date on which something started or ended follow the same pattern:

.start_date - denoting the beginning of the period
.end_date - denoting the end of the period

{
    "Id": "business.employment_tenure",
    "Name": "Employment tenure",
    "Description": "'Employment tenure' represents the relationship between a company and a person who has worked there. The company type typically tracks key employees, such as the management team, not all employees who have worked for a company.",
    "Mediator": true,
    "ExpectedTypes": null,
    "Enumerated": false,
    "Properties": [
...
        {
            "Id": "business.employment_tenure.start_date",
            "Name": "From",
            "Description": "",
            "ObjectType": "type.datetime",
            "Unique": true,
            "Required": false
        },
...
        {
            "Id": "business.employment_tenure.end_date",
            "Name": "To",
            "Description": "",
            "ObjectType": "type.datetime",
            "Unique": true,
            "Required": false
        }
    ]
}
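Because the naming is uniform, period handling can be written once for every type. The sketch below finds the two properties by suffix and computes a period's length in days; it assumes, hypothetically, that type.datetime values are ISO 8601 date strings (YYYY-MM-DD) and treats a missing end date as an ongoing period.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def period_properties(type_def: dict) -> Tuple[Optional[str], Optional[str]]:
    """Find the property IDs that follow the .start_date / .end_date convention."""
    ids = [p["Id"] for p in type_def.get("Properties") or []]
    start = next((i for i in ids if i.endswith(".start_date")), None)
    end = next((i for i in ids if i.endswith(".end_date")), None)
    return start, end


def period_days(instance: Dict[str, str], type_def: dict, today: date) -> Optional[int]:
    """Length of a period in days; a missing end date is treated as ongoing until `today`."""
    start_id, end_id = period_properties(type_def)
    if start_id is None or start_id not in instance:
        return None
    start = date.fromisoformat(instance[start_id])
    end = date.fromisoformat(instance[end_id]) if end_id in instance else today
    return (end - start).days


tenure = {"Properties": [
    {"Id": "business.employment_tenure.start_date"},
    {"Id": "business.employment_tenure.end_date"},
]}
print(period_days({"business.employment_tenure.start_date": "2010-01-01",
                   "business.employment_tenure.end_date": "2010-01-31"}, tenure, date.today()))  # 30
```

Exceptions such as organization.organization.date_founded are not picked up by the suffix search and would need an explicit mapping.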

The few exceptions are easy to predict:

{
    "Id": "organization.organization",
    "Name": "Organization",
    "Description": "An organization is an organized body of members or people with a particular purpose; in the UniGraph context, a organization doesn't have a business association, like a company.  Companies have their own type in the business domain, located here. In addition, many types of organizations have their own, more specialized co-types that give them additional properties.",
    "Mediator": false,
    "ExpectedTypes": [
        "common.topic"
    ],
    "Enumerated": false,
    "Properties": [
...
        {
            "Id": "organization.organization.date_founded",
            "Name": "Date founded",
            "Description": "The date this organization first came into being.",
            "ObjectType": "type.datetime",
            "Unique": true,
            "Required": false
        }
...
    ]
}

Contributions are welcome

Passionate about a subject? Feel free to fork, edit and improve the schema. All pull requests are highly appreciated. You can always drop us a line through the contacts listed at unigraph.rocks and follow us on Twitter.

Built with love in Slovakia, with the support of ODINE.

Contributors

gaspiman, marfi
