eubg-data's Introduction

Semantic Data Model (Ontology) for Company Data

The euBusinessGraph project aims at simplifying cross-border and cross-lingual collection, reconciliation, aggregation and analysis of company-related information from several authoritative and non-authoritative sources.

The euBusinessGraph project has drawn on the experience of its data providers and technology providers to tackle the complex task of combining company data from multiple sources. We have defined a common semantic model (ontology) to represent companies and their attributes in a consistent way.

  • Based on project needs and provider datasets
  • Rooted in and reuses existing ontologies and datasets
  • Expressed in comprehensive EBG Semantic Model doc
  • Formalized as ebg: ontology using schema:(domain|range)Includes
  • Also defines URL patterns and authorities/lookup lists to use
  • Will be validated with RDF Shapes

Common Semantic Model (ontology)

We created an initial company data model considering related works, data available from the partners, and the needs of their business cases. The model covers the following requirements:

  • Capture the concept of a company and represent different types of companies.
  • Represent company jurisdictions and registration information.
  • Capture company contact information, such as the address and other locations.
  • Capture social data of companies, such as their websites (together with Web languages), RSS/Atom feeds and Wikipedia URLs.
  • Answer if a company is publicly traded or not, if it is state owned or not, and if it is registered in a startup register.
  • Support languages: EN, IT, NO.

In developing the company data model we have reused terms from appropriate ontologies such as:

  • EU Core Vocabs: W3C Org, RegOrg, Location, Person (not W3C)
  • schema.org: widely used, some relevant properties (e.g., dates)
  • ADMS: datasets and identifiers

Figure 1: Towards a common semantic model for company data

Since none of the existing ontologies covers the complete scope we need, we reuse where possible, and extend and compose by:

  • Adding some classes and properties of our own (ebg: ontology)
  • Using schema:(domain|range)Includes instead of rdfs:(domain|range) for easier composition (polymorphic vs monomorphic)
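
The polymorphic vs monomorphic distinction can be illustrated with a small Python sketch. It is an illustration only: the `DOMAINS` table and both helper names are hypothetical, and the single entry copies the domain classes quoted for adms:identifier in the issues further down. Under rdfs:domain every listed class is inferred for the subject, whereas schema:domainIncludes is read here as "the subject is expected to be at least one of the listed classes".

```python
# Hypothetical lookup table; the classes are those listed for adms:identifier
# in the ontology snippet quoted later in this page.
DOMAINS = {
    "adms:identifier": {"rov:RegisteredOrganization", "ebg:IdentifierSystem"},
}


def rdfs_domain_inferred_types(prop):
    """rdfs:domain reading: the subject is inferred to be an instance of
    every listed class (multiple domains intersect)."""
    return set(DOMAINS[prop])


def domain_includes_ok(prop, subject_types):
    """schema:domainIncludes reading: the subject should carry at least one
    of the listed classes (multiple domains act as alternatives)."""
    return bool(DOMAINS[prop] & set(subject_types))
```

With rdfs:domain a company using adms:identifier would also be inferred to be an ebg:IdentifierSystem, which is why the looser domainIncludes reading composes more easily across reused ontologies.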

In addition we define RDF Shapes (SHACL and ShEx) to validate incoming data.

Company data

In its first release, the model focuses on capturing key company information present in official registers such as legal name, registered address and economic classification, and also information coming from online resources related to the company such as company websites, blogs and social media accounts. These aspects are explicitly incorporated into the model and describe company information that is shared across data providers and directly accessible through the graph. Additionally, the model supports advertising other company related information available from data providers directly.

Figure 2: Company data attributes that are covered by the model

Identifier System

We have performed a thorough analysis of identifiers in the context of euBusinessGraph. From the analysis of the different identifier systems and the requirements of the business cases of the project, we singled out key aspects about identifiers and addressed them in the common semantic model.

Achieving matching and reconciliation across jurisdictions and registers requires careful modelling of identifier use. This release models the different cases through properties that describe the lifecycle of each identifier issued and by encoding a series of characteristics of the identifier system to which the identifier belongs. We follow a pragmatic approach when describing identifier systems in terms of these characteristics.

We model expectations of a particular system that should help determine to what extent an identifier can be used for matching and reconciliation. Additionally, we model web resources that are frequently found for identifier systems, such as search endpoints, templates for building identifier URLs through which company information can be reached, and other resources that describe the system’s rules. Finally, the model supports the representation of the different agents that are in charge of setting and maintaining rules, issuing identifiers and publishing identifier databases.
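
As a rough sketch of what a template for building identifier URLs could look like in practice, the Python helper below fills an identifier into a URL pattern. The `{id}` placeholder syntax, the function name and the example.org template are assumptions made for illustration; the model's actual template notation is not shown in this text.

```python
from urllib.parse import quote


def build_identifier_url(template, identifier):
    """Fill a (hypothetical) '{id}' placeholder with a percent-encoded identifier.

    quote() keeps '/' unescaped by default, so identifiers that contain a
    jurisdiction prefix such as 'gb/123456' keep their path structure.
    """
    return template.replace("{id}", quote(identifier))
```

For example, `build_identifier_url("https://example.org/company/{id}", "gb/123456")` would yield a URL ending in `/company/gb/123456`.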

Figure 3: Identifier System attributes that are covered by the model

Further information

GitHub Repository

This repository contains the sources for the euBusinessGraph Semantic Model for representing company-related data. Here we will keep:

  • Prefixes file
  • Instance model file in Turtle format
  • Instance model files for diagrams
  • Generated ontology file in RDF format
  • Generated online documentation using LODE
  • RDF shapes for validation
  • RDF data (e.g. NACE csv sheet), conversion scripts and resulting RDF
  • Diagrams for the master document (links to full-size diagrams and source files here)

References

For further details about the euBusinessGraph ontology:

eubg-data's People

Contributors

vladimiralexiev, elvesater, fsesintef, bmzernichow, paniagua-sdati, bgrova, mihajenko, tarasova-spaziodati, skenaja, patzomir

eubg-data's Issues

identifier-BRC.ttl: define https://www.brreg.no

I see great additions in https://github.com/euBusinessGraph/eubg-data/blob/master/data/identifier/identifier-BRC.ttl and I think @bgrova made some changes so I see you now have access.

  • @bgrova please provide some info about <https://www.brreg.no> (see <https://www.registryagency.bg> in identifier-BG.ttl)
  • @bgrova go through the file and edit some of the comments (eg "conneg is expected to be functional on ..." and remove those at end of file)
  • @bmzernichow uses a slightly different URL in company data (dct:creator): <https://www.brreg.no> (no trailing slash). Please use the same URL: remove the trailing slash from identifier-BRC.ttl because it's just 1 file
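
A minimal sketch of how the trailing-slash mismatch could be detected or normalized before comparing agent URLs. The function name is hypothetical, and the rule (drop the slash only when the path is exactly "/") is an assumption chosen to match the brreg.no case above, not a project convention.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_agent_iri(iri):
    """Drop a bare trailing slash from a site-root IRI.

    Only a path of exactly '/' is removed; deeper paths are left untouched,
    because a trailing slash there may identify a different resource.
    """
    parts = urlsplit(iri)
    path = "" if parts.path == "/" else parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Both spellings of the Brønnøysund register URL then compare equal, which is the property the dct:creator links need.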

add license URLs and images

<dataset/ONTO> a void:Dataset;
  dct:license <https://opendatacommons.org/licenses/by/> .
<dataset/ONTO/BG> a void:Dataset;
  dct:license <https://opendatacommons.org/licenses/by/> .

UK LAU

OCORP data includes all address admin units, eg

        locn:adminUnitL1   nuts:UK ;
        locn:adminUnitL2   nuts:UKI ;
        ebg:adminUnitL3    nuts:UKI2 ;
        ebg:adminUnitL4    nuts:UKI22 ;
        ebg:adminUnitL5    lau:E09000024 ;
        ebg:adminUnitL6    lau:E05000455 ;

Unfortunately we don't have UK LAU in eubg-data\data\LAU\rdf.
We have some tabular data in https://github.com/euBusinessGraph/eubg-data/tree/master/data/LAU, could you try to make RDF out of it?

  • data\LAU2\uk.csv
  • LAU216_LAU116_NUTS315_NUTS215_NUTS115_UK_LU.csv
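
One possible shape for the conversion, sketched in Python with the standard csv module. The column names `LAU_CODE` and `NUTS3_CODE` and the use of dct:isPartOf are assumptions for illustration only: the real headers of the CSV files and the predicate used by the existing LAU RDF are not shown here and would need to be checked first.

```python
import csv
import io


def lau_rows_to_turtle(csv_text, lau_col="LAU_CODE", nuts_col="NUTS3_CODE"):
    """Turn LAU-to-NUTS rows into Turtle statements (prefix declarations omitted).

    Column names and the dct:isPartOf predicate are placeholders.
    Rows with a missing code on either side are skipped.
    """
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lau = (row.get(lau_col) or "").strip()
        nuts = (row.get(nuts_col) or "").strip()
        if lau and nuts:
            statements.append(f"lau:{lau} dct:isPartOf nuts:{nuts} .")
    return statements
```

The same job could equally be done with tarql, which the repository already uses for the company-type lookup.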

problems with props added Oct 2019

Problems with elements introduced Oct 2019 (cc @elvesater @bmzernichow):

### What is the "value" of an event? The example makes this even more puzzling
ebg:eventValue a owl:DatatypeProperty ;
  schema:domainIncludes sem:Event ;
  schema:rangeIncludes xsd:string ;
  rdfs:label "event type value" ;
  skos:definition "Value linked to an eventType that occurs to a company or a site" ;
  skos:example "C" .

#### range should be date, dateTime
sem:hasTime a owl:DatatypeProperty ;
  schema:domainIncludes sem:Event ;
  schema:rangeIncludes xsd:string ;
  rdfs:label "time of event" ;
  skos:definition "Has time is used to indicate at which time an Event took place" ;
  skos:example "2010-11-18"^^xsd:date .

### This is just the same as org:unitOf, see https://www.w3.org/TR/vocab-org/. Remove
ebg:isUnitOf a owl:DatatypeProperty ;
  schema:domainIncludes rov:OrganizationalUnit ;
  schema:rangeIncludes rov:RegisteredOrganization ;
  rdfs:label "is unit of organization" ;
  skos:definition "Indicates that an entity is a sub-unit of a larger organization" .

### should be object not datatype property. 
### Make it subprop of org:hasUnit
### definition is wrong: "indicates" applies to a boolean prop not a relation
ebg:hasHQUnit a owl:DatatypeProperty ;
  schema:domainIncludes rov:RegisteredOrganization ;
  schema:rangeIncludes rov:OrganizationalUnit ;
  rdfs:label "headquarter unit" ;
  skos:definition "Indicates that an entity is a headquarter of an organization" .
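
The datatype-vs-object mix-up reported above is the kind of problem a small lint pass could catch. The sketch below is a hypothetical check, not part of the repository: it takes a hand-built table of property declarations and flags any owl:DatatypeProperty whose declared range is not in the xsd: namespace. That test is deliberately crude (literal ranges such as rdfs:Literal would be flagged too).

```python
def misdeclared_datatype_props(props):
    """Return names of properties typed owl:DatatypeProperty whose range
    includes a non-xsd class, which suggests an object property was meant.

    `props` maps a property name to (owl type, list of range classes).
    """
    return sorted(
        name
        for name, (kind, ranges) in props.items()
        if kind == "owl:DatatypeProperty"
        and any(not r.startswith("xsd:") for r in ranges)
    )


# Declarations copied from the snippets quoted in this issue.
DECLARED = {
    "ebg:eventValue": ("owl:DatatypeProperty", ["xsd:string"]),
    "ebg:isUnitOf": ("owl:DatatypeProperty", ["rov:RegisteredOrganization"]),
    "ebg:hasHQUnit": ("owl:DatatypeProperty", ["rov:OrganizationalUnit"]),
}
```

Calling `misdeclared_datatype_props(DECLARED)` flags ebg:hasHQUnit and ebg:isUnitOf and leaves ebg:eventValue alone.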

add rdfs:isDefinedBy

Add this to all ebg: terms: classes, properties, concept schemes (but maybe not Concepts since they got inScheme)

rdfs:isDefinedBy ebg:  ;

I think do not add it to terms defined by other ontologies, since this would be like stealing credit
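
A minimal sketch of the rule just described, assuming the term names are already available as prefixed strings: it emits an rdfs:isDefinedBy statement only for terms in the ebg: namespace and skips terms borrowed from other ontologies. The function name is hypothetical, and excluding skos:Concept instances is left to the caller.

```python
def is_defined_by_statements(terms, prefix="ebg:"):
    """Emit 'term rdfs:isDefinedBy ebg: .' for terms in our own namespace only."""
    return [f"{term} rdfs:isDefinedBy {prefix} ." for term in terms if term.startswith(prefix)]
```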

harmonize UK/GB jurisdictions

@bmzernichow @elvesater
Is it UK or GB? To a common joe like me these two mean the same, and unfortunately we got some confusion in the data.

  • OCORP and SDATI data uses both:
    • GB (dbo:jurisdiction "GB")
    • UK: nuts:UK nuts:UKI nuts:UKI2 nuts:UKI22
  • OCORP ID uses GB: "gb/123456"
  • identifier-OCORP.ttl and identifier-SDATI.ttl use GB: dct:spatial nuts:GB
  • dataset-OCORP.ttl and dataset-SDATI.ttl use eg GB: <dataset/SDATI/GB>
  • neogeo-nuts-0.91.ttl uses UK: nuts:UK and ramon:code "UK"
  • LAU tables use UK: E09000002 part of UKI52(#11)
  • named graphs use UK: <provider/ocorp/uk> <provider/sdati/uk> (https://github.com/euBusinessGraph/eubg-data/blob/master/data/dataset/inDataset.ru)

This causes a problem in metadata (dataset) display
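
Whichever code is chosen, the harmonization itself could be as simple as the Python sketch below. It is an assumption-laden illustration: the function name is hypothetical, the default of "GB" is arbitrary (the issue leaves the choice open), and only bare two-letter codes are handled. NUTS codes such as UKI2 or named-graph URLs would need their own treatment.

```python
def harmonize_country_code(code, preferred="GB"):
    """Map the aliases 'UK' and 'GB' to one preferred spelling.

    Other codes are returned upper-cased and otherwise unchanged.
    """
    upper = code.strip().upper()
    return preferred if upper in {"UK", "GB"} else upper
```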

problems with props added Nov 2019

@elvesater

### Why do you need this given we have adms:identifier (quoted below)?
### A string code without indication which system it belongs to is pointless. "related to an IdentifierSystem or an Identifier" is imprecise and untrue.
### This can include URL?? Why not use schema:sameAs for that case
### Remove
schema:identifier a owl:DatatypeProperty;
  schema:domainIncludes person:Person , schema:Organization;
  schema:rangeIncludes xsd:string ;
  rdfs:label "identifier" ;
  skos:definition "An identifier for a resource." ;
  skos:scopeNote "Used to represent the identifier (e.g. URL) of a Person or Organization related to an IdentifierSystem or an Identifier." .

adms:identifier a owl:ObjectProperty ;
  schema:domainIncludes rov:RegisteredOrganization, ebg:IdentifierSystem ;
  schema:rangeIncludes adms:Identifier ;
  rdfs:label "identifier" ;
  skos:definition "An identifier of a Company (the official identifier is rov:registration) or Identifier System / Register";

### Typo in the prop URL and range
schema:gender a owl:DatatypeProperty;
  schema:domainIncludes person:Person ;
  schema:rangeIncludes xsd:string ;
  rdfs:label "person birth date" ;
  skos:definition "The birth date of the person." .

Missing info from ontology doc

If you compare the old documentation (patched LODE) and new documentation (PyLODE), you'll find a lot of missing info. I will post them as individual issues in https://github.com/RDFLib/pyLODE (root issue RDFLib/pyLODE#108).

But until they are fixed, @elvesater please :

Class WebResource

Class RegisteredOrganization:

Property adminUnitL3

Property identifier

Other:

harmonize company types

cc @bmzernichow @bgrova
https://github.com/euBusinessGraph/eubg-data/tree/master/data/lookups

  • Currently we have these files:
    EBG-company-type.xlsx: saved from gsheet
    EBG-company-type.csv: saved from xls
    EBG-company-type.tarql: converts to ttl but only "BG" codes
    EBG-company-type.ttl: BG, converted by tarql
    EBG-company-type-brc.ttl: made by hand?
    EBG-company-type-ocorp-uk.ttl: made by hand?
    EBG-company-type-sdati-it.ttl: made by hand?
  • my idea was to edit the xls (or gsheet) and then generate for all jurisdictions from it. I know that @bgrova made an xls on dropbox, so it was only a matter of unifying those xls files. Do you agree to work in a common excel?
  • also, we have differing concept schemes:
<type/BG/ET>  rdf:type  skos:Concept ;
        skos:inScheme   <type> ;
# but then
<type/NO> a skos:ConceptScheme; 
        rdfs:label "NO company types"@en, "NO organisasjonsform"@no .

<type/NO/NUF> a skos:Concept ;
        skos:inScheme   <type/NO> ;
  • I think the model specifies a single scheme, not per-jurisdiction. If you agree, then I must make a <type> scheme.
  • BG has consecutive ebg:order whereas NO has all set to 1. This makes the order useless. If there is some meaningful order to those codes (eg the most popular first), use ebg:order else omit it. But don't use a constant 1
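
The "constant 1" problem is easy to test for mechanically. The helper below is a hypothetical sketch that treats a list of ebg:order values as informative only when it contains more than one distinct value.

```python
def order_is_informative(orders):
    """Return True when the order values distinguish at least two concepts.

    A list where every value is the same (e.g. all 1) carries no ordering
    information and the property could simply be omitted.
    """
    return len(set(orders)) > 1
```

Consecutive values such as those used for BG pass the check, while the NO values (all 1) do not.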
