

License: Other


WIBARAB feature database

About WIBARAB

WIBARAB is a project in the field of Arabic dialectology. It consists of various regional sub-projects (four PhD projects) and a large database about Bedouin-type dialects of Arabic.

The Feature Database will be the main point of integrating the results of the sub-projects. In this repository we collect the primary data of the database in TEI/XML.

Principal Investigator: Stephan Procházka (University of Vienna)
National Cooperation Partner: Charly Mörth (Austrian Academy of Sciences)

See https://wibarab.acdh.oeaw.ac.at/ for more information

Contact us at [email protected] or follow us on Twitter.

Status of the data

THIS IS PRELIMINARY DATA AND COPYRIGHTED MATERIAL!

If you want to use any material in this repository please contact us at [email protected]

This will change at the end of the project.

Directory Structure

| Directory | Content | Remarks |
|---|---|---|
| 001_src | Original sources | Any external source data coming to the project. |
| 082_scripts_xsl | XSLT scripts | Various XSLT scripts to convert the data. |
| 102_derived_TEI | TEI-XML documents | TEI documents derived from an automated conversion process (from 001_src or elsewhere). |
| 010_manannot | Manually annotated TEI-XML documents | TEI documents which are manually annotated / curated / edited. Automated processes are not expected to write into this directory. We want to make sure that a human curator has validated the data in this directory and that nothing manually curated is overwritten by some script. |
| 802_tei_odd | TEI customization (ODD) | This is the source of truth for the WIBARAB FeatureDB Schema and the HTML documentation generated from it. |
| 804_xsd | XML Schemas | These are derived from the ODD in 802_tei_odd. Each version of the schema should bear its number in the file name. |
| 850_docs | Documentation | Further data documentation, encoding guidelines etc. |
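
The rule that automated processes never write into 010_manannot could be enforced in a conversion script with a small guard. The following is a minimal Python sketch, not code from this repository: the names `PROTECTED_DIRS` and `check_output_path` are hypothetical, and paths are assumed to be relative to the repository root.

```python
from pathlib import PurePosixPath

# Assumption: top-level directories that only human curators may write to.
PROTECTED_DIRS = {"010_manannot"}


def check_output_path(path: str) -> str:
    """Return `path` unchanged, or raise if it points into a protected directory.

    `path` is expected to be relative to the repository root.
    """
    parts = PurePosixPath(path).parts
    if parts and parts[0] in PROTECTED_DIRS:
        raise PermissionError(f"scripts must not write to {parts[0]}: {path}")
    return path
```

A script would call `check_output_path` on every output file name before writing; a target under 102_derived_TEI passes, while a target under 010_manannot raises `PermissionError`.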

Schema Development

At this point, the model of the WIBARAB Feature Database schema is still evolving to a certain extent while new data is being added and existing data is being curated. To make sure that the transition from one version of the schema to the next happens in a structured manner, we have set up the following rules:

  • Any development of the schema is done in 802_tei_odd/featuredb.odd. This file might also contain unpublished, unfinished, backwards-incompatible changes not reflected in any derived schema or documentation.
  • Naming conventions: We follow Semantic Versioning 2.0.0, which, applied to our case, boils down to the following principles:
    • If a change potentially makes previously valid documents invalid, it is a new MAJOR version (i.e. increment the first number).
    • If a change does not break the validity of existing documents (e.g. because it only adds optional elements or attributes, or adds a significant portion of prose to the documentation), it is a new MINOR version (i.e. increment the second number).
    • If a change in the schema is merely a bug fix (typo etc.) or a minor addition to the documentation (change in wording, added examples etc.), it is a new PATCH version (i.e. increment the third number).
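
As an illustration of these rules, the next version number can be derived from the kind of change. This is a minimal Python sketch under stated assumptions, not project tooling: the function name `bump_version` and the labels `"breaking"`, `"compatible"` and `"fix"` are hypothetical. It also assumes that a trailing letter suffix such as the `b` in 2.1.3b is dropped on a bump, and it resets the lower numbers to zero as Semantic Versioning 2.0.0 requires.

```python
import re

# MAJOR.MINOR.PATCH with an optional lowercase suffix, e.g. "2.1.3" or "2.1.3b".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*)$")


def bump_version(version: str, change: str) -> str:
    """Return the next version for a change labeled breaking/compatible/fix."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    if change == "breaking":      # previously valid documents may become invalid
        return f"{major + 1}.0.0"
    if change == "compatible":    # e.g. new optional elements or attributes
        return f"{major}.{minor + 1}.0"
    if change == "fix":           # typo fixes, small documentation changes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")
```

For example, `bump_version("2.1.3", "compatible")` yields `"2.2.0"`, and `bump_version("2.1.3", "breaking")` yields `"3.0.0"`.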

Schema release workflow

When a new version of the schema is to be released:

  • In the ODD document:
    • Update @n on <edition> to contain only the exact version number (e.g. 2.1.3b).
    • Change <edition> to include the version number. These elements are treated only as labels and can thus include human-readable additions (e.g. Version 2.1.3 Beta).
    • Add a <change> element with your editor ID and the current date, setting @status="published". Ideally add a <list> with all the changes you made to the ODD.
    • Do not change the filename of the ODD document.
  • In oXygen:
    • Generate the XSD schema from the ODD by right-clicking on 802_tei_odd/featuredb.odd and selecting Transform > Transform with > TEI ODD to XML Schema. The resulting files are placed into a new directory 802_tei_odd/out.
    • Create a new subfolder named {versionnumber} in 804_xsd/, e.g. 804_xsd/2.1.3b/, and move the files from 802_tei_odd/out to that folder.
  • Generate the HTML documentation and place it under 850_docs/featuredb_{versionnumber}.html.
  • Afterwards, delete 802_tei_odd/out.
  • Write a conversion script to transform documents from the previous schema version to the current one.
    • Important: make sure that the conversion script updates the @xsi:schemaLocation in the migrated document instance.
    • Place the XSLT script under 082_scripts_xsl/migrations and name it migrate_to_{versionnumber}.xsl (e.g. migrations/migrate_to_1.0.0b.xsl).
  • Run the conversion script on the oddtest.xml document in 802_tei_odd and check that it produces the expected results.
  • Apply the conversion script to the files in 010_manannot. The results should be written to 102_derived_TEI.
  • Commit all changes to git and add a tag named after the schema version number.
  • Curators have to check the converted TEI documents and move them from 102_derived_TEI to 010_manannot to approve the change.
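
The naming conventions in this workflow, and the @xsi:schemaLocation update that each migration has to perform, can be sketched as follows. This is a Python illustration only: the actual migrations are XSLT scripts, and the function names `release_paths` and `set_schema_location` are hypothetical. The sketch assumes that @xsi:schemaLocation holds a single pair of namespace URI and schema location.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
TEI_NS = "http://www.tei-c.org/ns/1.0"


def release_paths(version: str) -> dict:
    """Return the locations that a release of `version` writes to."""
    return {
        "xsd_dir": f"804_xsd/{version}/",
        "html_doc": f"850_docs/featuredb_{version}.html",
        "migration": f"082_scripts_xsl/migrations/migrate_to_{version}.xsl",
    }


def set_schema_location(root: ET.Element, namespace: str, schema_url: str) -> None:
    """Point xsi:schemaLocation on `root` at the schema of the new version."""
    # The attribute value is a whitespace-separated pair: namespace, location.
    root.set(f"{{{XSI_NS}}}schemaLocation", f"{namespace} {schema_url}")
```

For version 2.1.3b, `release_paths("2.1.3b")["migration"]` gives 082_scripts_xsl/migrations/migrate_to_2.1.3b.xsl. Calling `set_schema_location(root, TEI_NS, url)` on a parsed document then replaces any older schema reference. The source does not state the schema file name, so any value passed as `url` is a placeholder.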

About this file

This README file has a long-wound and dark history of editing. If you dare, you can check it out here.
