extra-examples

Examples for the IPTC EXTRA classification engine. This repository contains the objects/resources that can be used to validate that the IPTC EXTRA project works properly.

See the other repositories of the IPTC EXTRA project:

There are five types of resources: rules, schemas, taxonomies, topics and corpora of documents. After successfully deploying the EXTRA Toolkit by following the guide in the extra-ext repository, insert these objects into the platform.

Note! You can import all the resources by running this Python script.

$ python insert_resources.py params.json

The only change you need to make is to add the directory of the documents (documents_file) to the params.json file. To get the documents, please fill in the form you can find here.
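The edit to params.json can also be scripted. The following is a minimal sketch, assuming params.json is a JSON object with documents_file as a top-level key; only the key name comes from this README, and the helper name set_documents_dir is hypothetical.

```python
import json
from pathlib import Path


def set_documents_dir(params_path, documents_dir):
    """Store the documents directory in the params file and return the updated dict."""
    path = Path(params_path)
    params = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: documents_file is a top-level key of params.json.
    params["documents_file"] = str(documents_dir)
    path.write_text(json.dumps(params, indent=2) + "\n", encoding="utf-8")
    return params


# Example call (the directory is a placeholder):
# set_documents_dir("params.json", "/path/to/documents")
```

All other keys in the file are left unchanged, so the script can be run again if the documents are moved.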

Taxonomies and Topics

In the current version of EXTRA, there are two taxonomies with their corresponding topics. More specifically, the two taxonomies are IPTC's Media Topics in English and German respectively.

To create these taxonomies in EXTRA, use the corresponding EXTRA API method:

POST /taxonomies

{
  "language": "english",
  "name": "IPTC Media Topics"
}

POST /taxonomies

{
  "language": "german",
  "name": "IPTC Media Topics"
}

Upon successful creation, each of these methods will return the newly created taxonomy with a unique taxonomy id. For example:

Response

201 - Created

{
  "id": "5901b9e5c41479000146ced2",
  "language": "english",
  "name": "IPTC Media Topics"
}
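The two requests above could be prepared from Python roughly as follows. This is a sketch using only the standard library. BASE_URL is a placeholder for the address of your own deployment, and the helper names build_post and send are hypothetical; neither is part of EXTRA.

```python
import json
import urllib.request

# Placeholder: replace with the address of your own EXTRA deployment.
BASE_URL = "http://localhost:8080"


def build_post(path, payload, base_url=BASE_URL):
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# One request per language, mirroring the two payloads shown above.
taxonomy_requests = [
    build_post("/taxonomies", {"language": language, "name": "IPTC Media Topics"})
    for language in ("english", "german")
]

# With a running deployment, the ids could then be collected like this:
# taxonomy_ids = [send(request)["id"] for request in taxonomy_requests]
```

Keeping the returned ids is useful because the topic, corpus and rule requests all refer to a taxonomy by its id.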

To create topics in a taxonomy, e.g. in the taxonomy with id=5901b9e5c41479000146ced2 created above:

POST /taxonomies/5901b9e5c41479000146ced2/topics

{
  "taxonomyId": "5901b9e5c41479000146ced2",
  "name": "arts, culture and entertainment",
  "definition": "Matters pertaining to the advancement and refinement of the human mind, of interests, skills, tastes and emotions ",
  "topicId": "medtop:01000000"
}
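When many topics have to be inserted, the path and body of each request can be derived from the taxonomy id. The sketch below only builds the (path, payload) pairs and sends nothing; the TOPICS list and the function name topic_requests are illustrative assumptions, with the single entry copied from the example above.

```python
# Illustrative input: one record per topic, without the taxonomy id.
TOPICS = [
    {
        "topicId": "medtop:01000000",
        "name": "arts, culture and entertainment",
        "definition": "Matters pertaining to the advancement and refinement "
                      "of the human mind, of interests, skills, tastes and emotions",
    },
]


def topic_requests(taxonomy_id, topics):
    """Yield a (path, payload) pair for each topic of the given taxonomy."""
    path = f"/taxonomies/{taxonomy_id}/topics"
    for topic in topics:
        # The taxonomy id appears both in the path and in the body.
        yield path, {"taxonomyId": taxonomy_id, **topic}


for path, payload in topic_requests("5901b9e5c41479000146ced2", TOPICS):
    print("POST", path, payload["topicId"])
```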

To insert these two taxonomies and their corresponding topics, use the script files and the guide that can be found here.

Schemas

To create a new schema in EXTRA:

POST /schemas

{
  "name":"Apa Schema",
  "fields":[
    {"name":"title", "hasParagraphs":false, "hasSentences":true, "textual":true},
    {"name":"subtitle", "hasParagraphs":false, "hasSentences":true, "textual":true},
    {"name":"body", "hasParagraphs":true, "hasSentences":true, "textual":true},
    {"name":"id", "hasParagraphs":false, "hasSentences":false, "textual":false},
    {"name":"slugline", "hasParagraphs":false, "hasSentences":false, "textual":false},
    {"name":"versionCreated", "hasParagraphs":false, "hasSentences":false, "textual":false}
  ],
  "language":"german"
}

In EXTRA there are two schemas: one for articles from Thomson Reuters and one for articles from the Austrian Press Agency. You can see these schemas here.
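Most fields in the schema above differ only in a few boolean flags, so the payload can be assembled with a small helper. This is a sketch; the helper name field and its keyword arguments are hypothetical, while the resulting keys (name, hasParagraphs, hasSentences, textual) are the ones shown in the example request.

```python
import json


def field(name, *, paragraphs=False, sentences=False, textual=False):
    """Describe one schema field; all flags default to false."""
    return {
        "name": name,
        "hasParagraphs": paragraphs,
        "hasSentences": sentences,
        "textual": textual,
    }


apa_schema = {
    "name": "Apa Schema",
    "fields": [
        field("title", sentences=True, textual=True),
        field("subtitle", sentences=True, textual=True),
        field("body", paragraphs=True, sentences=True, textual=True),
        # Metadata fields: not textual, no sentence or paragraph structure.
        field("id"),
        field("slugline"),
        field("versionCreated"),
    ],
    "language": "german",
}

# Body for POST /schemas
print(json.dumps(apa_schema, indent=2))
```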

Corpora

To create a new corpus that follows a specific schema and whose documents are annotated with the topics of a specific taxonomy:

POST /corpora

{
  "name": "Apa Corpus",
  "language": "german",
  "schemaId": "591f070c30c49e00011de8eb",
  "taxonomyId": "5901b9ebc41479000146ced3"
}
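A corpus ties together a schema and a taxonomy that were created earlier, so its payload can be built from their stored ids. The following sketch assumes that the response to schema creation carries an id field in the same way as the taxonomy response shown earlier, which this README does not confirm; the function name corpus_payload is hypothetical.

```python
def corpus_payload(name, language, schema, taxonomy):
    """Build the body of POST /corpora from previously created resources."""
    # Assumption: both records carry the "id" assigned by the platform.
    return {
        "name": name,
        "language": language,
        "schemaId": schema["id"],
        "taxonomyId": taxonomy["id"],
    }


payload = corpus_payload(
    "Apa Corpus",
    "german",
    schema={"id": "591f070c30c49e00011de8eb"},
    taxonomy={"id": "5901b9ebc41479000146ced3"},
)
print(payload)
```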

To index documents into the newly created corpus, for the development and testing of rules, follow the guide here.

Rules

To create a new rule for a given taxonomy and a topic within that taxonomy:

POST /rules

{
	"name": "Test Rule",
	"query": "(or (and (title adj/regexp \"\\d+\\-?\\s+?year\\-?\\s?old\") (body any/stemming \"boy child children girl infant juvenile kid newborn schoolboy schoolgirl toddler\") ) (and (title adj/regexp \"\\d+\\-?\\s+?month\\-?\\s?old\") (body any/stemming \"boy child children girl infant juvenile kid newborn schoolboy schoolgirl toddler\") ) )",
	"taxonomy": "5901b9e5c41479000146ced2",
	"topicId": "medtop:20000790"
}

The taxonomy id and the topic id must correspond to resources that have already been created in the platform. The rules developed during the EXTRA project can be found here. To insert them into EXTRA, use the script and follow the corresponding guide here.
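The query in the example rule is hard to read because the regular expressions and quotes are escaped for JSON. A sketch of one way to avoid escaping by hand is to write the query as a plain Python string and let json.dumps add the escapes. The helper names age_clause and build_rule are hypothetical; the query text and field names are copied from the example above.

```python
import json

CHILD_TERMS = ("boy child children girl infant juvenile kid newborn "
               "schoolboy schoolgirl toddler")


def age_clause(unit):
    """Build one (and ...) clause for '<number> <unit> old' in the title."""
    # Raw strings keep the backslashes literal; json.dumps doubles them later.
    pattern = r"\d+\-?\s+?" + unit + r"\-?\s?old"
    return f'(and (title adj/regexp "{pattern}") (body any/stemming "{CHILD_TERMS}") )'


def build_rule(name, taxonomy_id, topic_id):
    """Assemble the body of POST /rules."""
    query = f'(or {age_clause("year")} {age_clause("month")} )'
    return {"name": name, "query": query, "taxonomy": taxonomy_id, "topicId": topic_id}


rule = build_rule("Test Rule", "5901b9e5c41479000146ced2", "medtop:20000790")
body = json.dumps(rule, indent=2)  # quotes and backslashes are escaped here
print(body)
```

Decoding the printed JSON gives back the unescaped query string, so the rule can be edited in its readable form and serialized only when it is sent.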

Contact for further details about the project

Technical details: Manos Schinas ([email protected]), Symeon Papadopoulos ([email protected])

Project as a whole: IPTC ([email protected])
