UniGraph now employs a combination of the strict Freebase schema and the loose ontology of Wikidata. The new schema is available for download as .json here and is explained in great detail here. This repository will no longer be updated, but we've seen serious interest in it as it provides insights into how data was organized in Freebase prior to its closure.
The UniGraph Schema is the backbone of the UniGraph Knowledge Graph. It is inspired by Freebase, with major modifications in the areas described in more detail below.
You can use the schema to map the objects and their relations from the world we live in. For example, the data from the London DataStore about the Birth and Death rates by Ward can be represented like this:
Ward > dataworld.sameas.uk_ons_gss_code
Births 2002 > measured_dimension.dated_float.value
Ward Name > type.object.name
Ward Name > location.statistical_region.births > measured_dimension.dated_float
Ward Name > location.location
Borough > type.object.name
Borough > location.location
Borough > location.location.contains > Ward Name
- Object Types
- Expected Types
- Measured dimensions
- Measurement units
- Self Contained Domains
- Identifiers
- Strictly One to One connections
- Unified Time Periods Representation
The UniGraph schema is expressed via Types and Properties. Types have properties and are grouped in Domains:
In the architecture domain, the architecture.building type has 4 properties, one of which is: architecture.building.building_complex.
The Mediator Types are used to express complex data, usually with a time dimension. For example, the employment tenure mediator holds the information about the start and end date of an employment, the employee, the employer and the title of the position (if any).
- Mediator types have the "Mediator" attribute set to true.
- Mediator types don't have Expected types; their "ExpectedTypes" attribute is null.
- Mediator types have at least two required properties. In the employment tenure example the required properties are business.employment_tenure.company and business.employment_tenure.person.
The required properties define the minimum set of data required for a Mediator type to hold meaning. A statement describing a relationship between a company and an employee is complete and valuable in itself, while a statement holding information about the time period of the employment but missing the company or the employee is not. Usually the required properties are also unique and have their "Unique" attribute set to true, in order to force the creation of another Mediator instance for the next employment of the same person. An exception to this rule are single-property mediators, like people.sibling_relationship, which models the sibling relationship through its only property, people.sibling_relationship.sibling, which is Required but not Unique.
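The mediator rules listed above can be checked mechanically. The following is a minimal sketch, not part of the UniGraph tooling: the function name `mediator_problems` is hypothetical, and allowing a single required property for single-property mediators is an assumption based on the sibling relationship example. The `tenure` dictionary is a trimmed copy of the employment tenure definition, keeping only the attributes the check reads.

```python
def mediator_problems(type_def):
    """Return the mediator rules that a type definition breaks (empty if none)."""
    problems = []
    if type_def.get("Mediator") is not True:
        problems.append('"Mediator" must be true')
    if type_def.get("ExpectedTypes") is not None:
        problems.append('"ExpectedTypes" must be null')
    properties = type_def.get("Properties") or []
    required = [p["Id"] for p in properties if p.get("Required")]
    # Assumption: a single-property mediator only needs that one property.
    minimum = 1 if len(properties) == 1 else 2
    if len(required) < minimum:
        problems.append(f"needs at least {minimum} required properties")
    return problems


# Trimmed employment tenure definition (only the attributes used above).
tenure = {
    "Id": "business.employment_tenure",
    "Mediator": True,
    "ExpectedTypes": None,
    "Properties": [
        {"Id": "business.employment_tenure.company", "Required": True},
        {"Id": "business.employment_tenure.person", "Required": True},
        {"Id": "business.employment_tenure.start_date", "Required": False},
    ],
}
print(mediator_problems(tenure))  # prints []
```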
All properties have assigned ObjectTypes. This approach enforces the schema rules, controls the data entering the system and infers types seamlessly. For example, the ObjectType of the business.employment_tenure.person property is business.employee.
{
"Id": "business.employment_tenure.person",
"Name": "Person",
"Description": "",
"ObjectType": "business.employee",
"Unique": true,
"Required": true
}
As a result, every object in the business.employment_tenure.person relationship will receive the business.employee type.
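As a rough illustration of this inference, the sketch below adds a property's ObjectType to the types of the node it points to. The `PROPERTIES` table and the `types_after_link` function are hypothetical names introduced for this example; how UniGraph itself performs the inference is not described here.

```python
# Trimmed property table; only the ObjectType matters for inference.
PROPERTIES = {
    "business.employment_tenure.person": {"ObjectType": "business.employee"},
}


def types_after_link(node_types, property_id, properties=PROPERTIES):
    """Return the node's types once it becomes the object of property_id."""
    return set(node_types) | {properties[property_id]["ObjectType"]}


person_types = types_after_link({"people.person"}, "business.employment_tenure.person")
print(sorted(person_types))  # prints ['business.employee', 'people.person']
```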
Expected types are a flag to the UniGraph bots and users that a certain node will most probably have other types associated with it. For example, the architecture.architect type indicates that a node will most probably also be of type people.person and common.topic.
{
"Id": "architecture.architect",
"Name": "Architect",
"Description": "\"Architect\" is used for individual contributors to the Built Environment. Also see the type \"Architecture Firm\" for collections of architects. A topic that is of the type \"Structure\" can have one or more \"Architects\" or \"Architecture Firms\" listed as properties, due to the sometimes ambiguous way designs are credited.",
"Mediator": false,
"ExpectedTypes": [
"people.person",
"common.topic"
]
...
}
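A bot or user could read the flag along these lines. This is only a sketch: the `likely_types` helper is hypothetical, and following "ExpectedTypes" transitively (the expected types of expected types) is an assumption, not documented behaviour.

```python
def likely_types(type_id, type_defs):
    """Collect type_id plus every type reachable through "ExpectedTypes"."""
    seen = []
    stack = [type_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Types missing from type_defs, or with null ExpectedTypes, add nothing.
        expected = (type_defs.get(current) or {}).get("ExpectedTypes") or []
        stack.extend(expected)
    return seen


type_defs = {
    "architecture.architect": {"ExpectedTypes": ["people.person", "common.topic"]},
}
print(sorted(likely_types("architecture.architect", type_defs)))
# prints ['architecture.architect', 'common.topic', 'people.person']
```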
Measured dimensions are everywhere: a person's height, a mountain's elevation, an engine's power, etc. We've created all measured dimensions from scratch and combined them in a single domain together with their respective measurement_unit.
{
"Id": "measured_dimension.distance",
"Name": "Distance",
"Description": "",
"Mediator": true,
"ExpectedTypes": null,
"Enumerated": false,
"Properties": [
{
"Id": "measured_dimension.distance.measurement_unit",
"Name": "Unit of Distance",
"Description": "",
"ObjectType": "measured_dimension.distance_unit",
"Unique": true,
"Required": true
},
{
"Id": "measured_dimension.distance.value",
"Name": "Value",
"Description": "",
"ObjectType": "type.float",
"Unique": true,
"Required": true
}
]
}
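Because both properties of the distance mediator are required, a value without a unit is incomplete. The sketch below shows one way to detect that. The `missing_required` helper and the dictionary used to represent a mediator instance are assumptions made for illustration, and the value 1.5 is an arbitrary placeholder.

```python
# Trimmed copy of the measured_dimension.distance definition above.
DISTANCE = {
    "Id": "measured_dimension.distance",
    "Properties": [
        {"Id": "measured_dimension.distance.measurement_unit", "Required": True},
        {"Id": "measured_dimension.distance.value", "Required": True},
    ],
}


def missing_required(type_def, instance):
    """List the required property ids that the instance does not provide."""
    return [
        prop["Id"]
        for prop in type_def["Properties"]
        if prop["Required"] and prop["Id"] not in instance
    ]


# Hypothetical instance: a value with no unit attached.
measurement = {"measured_dimension.distance.value": 1.5}
print(missing_required(DISTANCE, measurement))
# prints ['measured_dimension.distance.measurement_unit']
```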
Measurement units include all necessary conversion information to the International System of Units (SI).
{
"Id": "measured_dimension.distance_unit",
"Name": "Unit of Length",
"Description": "A Unit of length is any measure used for linear distance (height, width, etc.). If the unit is fixed, such as the American inch, its length in the SI base unit, meters, should be given. If it is variable or unknown, such as the Biblical cubit, then the meter equivalence should not be specified.",
"Mediator": false,
"ExpectedTypes": [
"common.topic"
],
"Enumerated": true,
"Properties": [
{
"Id": "measured_dimension.distance_unit.abbreviations",
"Name": "Abbreviations",
"Description": "Abbreviated text representations for this unit",
"ObjectType": "type.rawstring",
"Unique": false,
"Required": false
},
{
"Id": "measured_dimension.distance_unit.canonical_abbreviation",
"Name": "Canonical abbreviation",
"Description": "Globally recognised abbreviated text representation for this unit.",
"ObjectType": "type.rawstring",
"Unique": true,
"Required": false
},
{
"Id": "measured_dimension.distance_unit.measurement_system",
"Name": "Measurement System",
"Description": "The measurement system this unit belongs to",
"ObjectType": "measured_dimension.measurement_system",
"Unique": false,
"Required": false
},
{
"Id": "measured_dimension.distance_unit.dimension",
"Name": "Measured dimension",
"Description": "The dimension measured by this unit",
"ObjectType": "measured_dimension.dimension",
"Unique": true,
"Required": false
},
{
"Id": "measured_dimension.distance_unit.si_base_conversion_formula",
"Name": "Convertion formula",
"Description": "Convertion formula to the System International base unit",
"ObjectType": "type.rawstring",
"Unique": true,
"Required": false
}
]
}
A property cannot use a type defined in another domain as its "ObjectType". The only exceptions are mediator types and the basic types of the type domain (such as type.float and type.rawstring), as they hold basic information shared across all domains. Cross-domain references are handled by type inheritance via the "ExpectedTypes" attribute of the types. In the example below, the "book.book_edition_location" type inherits all properties from its expected type, "location.location". The location in which the book was published will in turn receive the "book.book_edition_location" type in addition to its "location.location" type:
{
"Id": "book.book_edition_location",
"Name": "Book Edition Location",
"Description": "The place, usually a city, where a book edition has been published.",
"Mediator": false,
"ExpectedTypes": [
"location.location",
"common.topic"
],
"Enumerated": true,
"Properties": null
}
This makes domains independent and keeps the schema clean.
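The self-containment rule can be expressed as a small check over a type's properties. This is a sketch under stated assumptions: the helper names are hypothetical, the set of mediator type ids must be supplied by the caller, and treating the `type` domain (type.float, type.rawstring and so on) as shared is inferred from the examples in this document rather than confirmed. The `book.book_edition.place` property is invented purely to show a violation.

```python
def domain_of(identifier):
    """Return the domain, i.e. the part of an id before the first dot."""
    return identifier.split(".", 1)[0]


def cross_domain_properties(type_def, mediator_ids=frozenset(), shared_domains=("type",)):
    """List property ids whose ObjectType lives in a foreign, non-exempt domain."""
    own_domain = domain_of(type_def["Id"])
    offenders = []
    for prop in type_def.get("Properties") or []:
        target = prop["ObjectType"]
        if domain_of(target) == own_domain:
            continue
        # Assumed exemptions: mediator types and the shared "type" domain.
        if target in mediator_ids or domain_of(target) in shared_domains:
            continue
        offenders.append(prop["Id"])
    return offenders


# Hypothetical type that points straight at another domain.
edition = {
    "Id": "book.book_edition",
    "Properties": [
        {"Id": "book.book_edition.place", "ObjectType": "location.location"},
    ],
}
print(cross_domain_properties(edition))  # prints ['book.book_edition.place']
```

Pointing the same property at "book.book_edition_location" instead would satisfy the check, which is the pattern the schema uses.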
In the domain dataworld, we have created a type dataworld.sameas which contains more than 1600 unique identifiers linking data from external repositories of information to UniGraph. Many include examples.
{
"Id": "dataworld.sameas.uk_companies_house_id",
"Name": "UK Companies House Company ID",
"Description": "The assigned number of the company by the UK Companies House. Examples: 8209948, SC421617, FC031362, IP10067R",
"ObjectType": "type.rawstring",
"Unique": true,
"Required": false
}
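Since these identifier properties are unique, they can serve as lookup keys when matching external records to existing nodes. The sketch below assumes nodes are held in a plain dictionary keyed by a node id; the node id `node-1`, the name "Example company" and the `build_identifier_index` helper are hypothetical. The identifier value is one of the examples from the description above.

```python
SAMEAS_PREFIX = "dataworld.sameas."


def build_identifier_index(nodes):
    """Map (identifier property, value) pairs to the node that carries them."""
    index = {}
    for node_id, properties in nodes.items():
        for property_id, value in properties.items():
            if property_id.startswith(SAMEAS_PREFIX):
                index[(property_id, value)] = node_id
    return index


# Hypothetical node store with a single company.
nodes = {
    "node-1": {
        "type.object.name": "Example company",
        "dataworld.sameas.uk_companies_house_id": "SC421617",
    },
}
index = build_identifier_index(nodes)
print(index[("dataworld.sameas.uk_companies_house_id", "SC421617")])  # prints node-1
```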
We've done our best to keep all connections one to one. An example of this is the "book.author" type, which has no properties of its own:
{
"Id": "book.author",
"Name": "Author",
"Description": "An author is a creator of a written or published work. The Author type is used for anyone who has written prose (whether fiction, essay, journalism, or scholarship), poetry, drama, or written or edited a book of any sort. This therefore includes editors of anthologies, whether or not the editor has written any material included in the anthology, and also includes artists in other media such as the fine arts and music, who may have had monographs or songbooks (for example) published. It also can include corporate authors, such as organizations, companies and government agencies, when a written work is credited to one, rather than to a person. It does not include scriptwriters for television and film (use TV Writer and Film Writer for these).",
"Mediator": false,
"ExpectedTypes": [
"common.topic"
],
"Enumerated": false,
"Properties": null
}
It has many incoming properties though, for example from the "book.written_work" type:
{
"Id": "book.written_work.author",
"Name": "Author",
"Description": "",
"ObjectType": "book.author",
"Unique": false,
"Required": false
}
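Incoming properties are not stored on the target type, so finding them means scanning the other types for a matching "ObjectType". A minimal sketch, assuming the schema is loaded as a list of type definitions shaped like the excerpts in this document (the `incoming_properties` helper is hypothetical):

```python
def incoming_properties(target_type, type_defs):
    """Return ids of all properties whose ObjectType is target_type."""
    return [
        prop["Id"]
        for type_def in type_defs
        for prop in (type_def.get("Properties") or [])
        if prop["ObjectType"] == target_type
    ]


# Trimmed definitions: book.author has no properties, book.written_work points at it.
schema = [
    {"Id": "book.author", "Properties": None},
    {
        "Id": "book.written_work",
        "Properties": [
            {"Id": "book.written_work.author", "ObjectType": "book.author"},
        ],
    },
]
print(incoming_properties("book.author", schema))  # prints ['book.written_work.author']
```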
With small exceptions all references to a date that something started or ended follow the same pattern:
.start_date - denoting the beginning of the period
.end_date - denoting the end of the period
{
"Id": "business.employment_tenure",
"Name": "Employment tenure",
"Description": "'Employment tenure' represents the relationship between a company and a person who has worked there. The company type typically tracks key employees, such as the management team, not all employees who have worked for a company.",
"Mediator": true,
"ExpectedTypes": null,
"Enumerated": false,
"Properties": [
...
{
"Id": "business.employment_tenure.start_date",
"Name": "From",
"Description": "",
"ObjectType": "type.datetime",
"Unique": true,
"Required": false
},
...
{
"Id": "business.employment_tenure.end_date",
"Name": "To",
"Description": "",
"ObjectType": "type.datetime",
"Unique": true,
"Required": false
}
]
}
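The shared naming pattern makes period properties easy to locate without knowing the type in advance. The sketch below relies only on the `.start_date` and `.end_date` suffixes; the `period_properties` helper is a hypothetical name, and the exceptions to the pattern are simply reported as missing.

```python
def period_properties(type_def):
    """Return the (start, end) property ids of a type, or None where absent."""
    start = end = None
    for prop in type_def.get("Properties") or []:
        if prop["Id"].endswith(".start_date"):
            start = prop["Id"]
        elif prop["Id"].endswith(".end_date"):
            end = prop["Id"]
    return start, end


# Trimmed employment tenure definition with only the date properties.
tenure = {
    "Id": "business.employment_tenure",
    "Properties": [
        {"Id": "business.employment_tenure.start_date", "ObjectType": "type.datetime"},
        {"Id": "business.employment_tenure.end_date", "ObjectType": "type.datetime"},
    ],
}
print(period_properties(tenure))
# prints ('business.employment_tenure.start_date', 'business.employment_tenure.end_date')
```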
The few exceptions are easy to predict:
{
"Id": "organization.organization",
"Name": "Organization",
"Description": "An organization is an organized body of members or people with a particular purpose; in the UniGraph context, a organization doesn't have a business association, like a company. Companies have their own type in the business domain, located here. In addition, many types of organizations have their own, more specialized co-types that give them additional properties.",
"Mediator": false,
"ExpectedTypes": [
"common.topic"
],
"Enumerated": false,
"Properties": [
...
{
"Id": "organization.organization.date_founded",
"Name": "Date founded",
"Description": "The date this organization first came into being.",
"ObjectType": "type.datetime",
"Unique": true,
"Required": false
}
]
}
Passionate about a subject? Feel free to fork, edit and improve the schema. All pull requests are highly appreciated. You can always drop us a line on the contacts listed at unigraph.rocks and follow us on Twitter.
Built with love in Slovakia. With the support of ODINE