adiwg / mdtranslator
Metadata translation tool built using Ruby
Home Page: https://www.adiwg.org/mdTranslator/
License: The Unlicense
Mapping of sbJson identifier to mdTranslator internal data structure:
"identifiers":
[
  {
    "type":"Type",
    "scheme":"Scheme",
    "key":"Key"
  },
  {
    "type":"Type2",
    "scheme":"Scheme2",
    "key":"Key2"
  }
]
Definition...
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        citation: {
          identifiers: [
            {
              identifier: 'Key',
              namespace: 'Scheme',
              version: nil,
              description: 'Type',
              citation: {}
            },
            {
              identifier: 'Key2',
              namespace: 'Scheme2',
              version: nil,
              description: 'Type2',
              citation: {}
            }
          ]
        }
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "citation": {
        "identifier": [
          {
            "identifier": "Key",
            "namespace": "Scheme",
            "description": "Type"
          },
          {
            "identifier": "Key2",
            "namespace": "Scheme2",
            "description": "Type2"
          }
        ]
      }
    }
  }
}
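The field mapping above (key to identifier, scheme to namespace, type to description) can be sketched as a small Ruby function. This is not the sbJson reader's actual code; the method name unpack_identifiers is hypothetical and the sketch only reproduces the mapping shown.

```ruby
require 'json'

# Hypothetical sketch of the sbJson identifier mapping shown above:
# key -> identifier, scheme -> namespace, type -> description.
def unpack_identifiers(sb_identifiers)
  (sb_identifiers || []).map do |sb|
    {
      identifier: sb['key'],
      namespace: sb['scheme'],
      version: nil,
      description: sb['type'],
      citation: {}
    }
  end
end

sb_json = JSON.parse('{"identifiers":[{"type":"Type","scheme":"Scheme","key":"Key"},' \
                     '{"type":"Type2","scheme":"Scheme2","key":"Key2"}]}')
identifiers = unpack_identifiers(sb_json['identifiers'])
p identifiers.map { |i| i[:namespace] } # => ["Scheme", "Scheme2"]
```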
In the dev branch, a default is no longer being supplied. This causes a problem when calling the CLI - you get a non-specific error message.
We should use the NGDC codelists for ISO output until we get our own lists up.
http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml
The ISO lists are older and don't include the NGDC additions, e.g. CI_RoleCode > collaborator.
Are the 19115-1 lists available online yet?
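In ISO 19139 output, a code element points at its codelist with a codeList attribute (the codelist document URL plus a fragment naming the code type) and carries the code in codeListValue. A minimal string-building sketch, assuming the NGDC URL above; the helper names code_element and xml_escape are hypothetical.

```ruby
# NGDC codelist document suggested above.
CODELIST_URL = 'http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml'

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape characters that are unsafe in XML text and double-quoted attributes.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: build a gmd code element such as CI_RoleCode,
# using "<codelist URL>#<code type>" as the codeList attribute.
def code_element(code_name, value)
  list = xml_escape("#{CODELIST_URL}##{code_name}")
  val = xml_escape(value)
  %(<gmd:#{code_name} codeList="#{list}" codeListValue="#{val}">#{val}</gmd:#{code_name}>)
end

# 'collaborator' is one of the NGDC additions to CI_RoleCode.
puts code_element('CI_RoleCode', 'collaborator')
```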
The way the mdJSON reader converts GeoJSON to the internal format is too ISO-specific. This makes creating other writers, including mdJSON, more difficult. As it stands now, it's not possible to accurately re-create the mdJSON passed in to the reader.
Some issues I'm seeing:
bbox
properties. This should probably be the responsibility of the writer. There's currently no way to re-attach the 'bbox' to the GeoJSON element. I suggest we only generate bbox elements in ISO for Features with bboxes where geometry = null. I'll probably just patch in enough functionality to support the mdJSON writer for now. We'll need to re-factor the other writers if we make breaking changes.
Mapping of sbJson provenance to mdTranslator internal data structure:
"provenance":
{
  "annotation":"Provenance1",
  "dataSource":"Input directly",
  "dateCreated":"2015-11-09T19:02:45Z",
  "lastUpdated":"2015-11-09T19:02:45Z",
  "lastUpdatedBy":"[email protected]",
  "createdBy":"[email protected]",
  "fileProcess": "???",
  "linkProcess": "???"
}
Definition...
Set to 'generated using ADIwg mdTranslator 2.0.0'
Map to metadata metadataInfo metadataDate:
[schema][metadata][metadataInfo][metadataDate][n].date
view mdTools
[schema][metadata][metadataInfo][metadataDate][n].dateType
view mdTools
lastUpdated: The date and time the item was last updated.
Map to metadata metadataInfo metadataDate:
[schema][metadata][metadataInfo][metadataDate][n].date
view mdTools
[schema][metadata][metadataInfo][metadataDate][n].dateType
view mdTools
createdBy: The person or organization who created the item.
Reading sbJson: ignore
Writing sbJson:
[schema][metadata][metadataInfo][metadataContact][n].role when in list
[ originator author resourceProvider coAuthor ]. All [party][contactId] > [contact][n][name] will be concatenated.
view mdTools
lastUpdatedBy: The last person or organization to update the item.
Reading sbJson: ignore
Writing sbJson:
[schema][metadata][metadataInfo][metadataContact][n].role when in list
[ custodian ]. All [party][contactId] > [contact][n][name] will be concatenated.
view mdTools
Not Mapped:
Mapping to mdTranslator from sbJson
def newBase
  {
    metadata: {
      metadataInfo: {
        metadataIdentifier: {},
        parentMetadata: {},
        defaultMetadataLocale: {},
        otherMetadataLocales: [],
        metadataContacts: [],
        metadataDates: [
          {
            date: '2015-11-09T19:02:45Z',
            dateResolution: 'YMDhmsZ',
            dateType: 'creation',
            description: nil
          },
          {
            date: '2015-11-09T19:02:45Z',
            dateResolution: 'YMDhmsZ',
            dateType: 'lastUpdate',
            description: nil
          }
        ],
        metadataLinkages: [],
        metadataMaintenance: {},
        alternateMetadataReferences: [],
        metadataStatus: nil,
        extensions: []
      }
    }
  }
end
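The two metadataDates entries above come from dateCreated and lastUpdated. A minimal sketch of that step, assuming the function name unpack_provenance_dates (hypothetical) and a fixed 'YMDhmsZ' resolution, which only holds for full timestamps like the sample values.

```ruby
# Hypothetical sketch: sbJson provenance timestamps -> internal metadataDates.
# dateResolution is fixed at 'YMDhmsZ' because the sample timestamps carry a
# full date, time and 'Z' zone; other inputs would need the resolution detected.
PROVENANCE_DATE_TYPES = { 'dateCreated' => 'creation', 'lastUpdated' => 'lastUpdate' }.freeze

def unpack_provenance_dates(provenance)
  PROVENANCE_DATE_TYPES.filter_map do |field, date_type|
    value = provenance[field]
    next if value.nil? # skip fields missing from this provenance block

    { date: value, dateResolution: 'YMDhmsZ', dateType: date_type, description: nil }
  end
end

prov = { 'dateCreated' => '2015-11-09T19:02:45Z', 'lastUpdated' => '2015-11-09T19:02:45Z' }
dates = unpack_provenance_dates(prov)
p dates.map { |d| d[:dateType] } # => ["creation", "lastUpdate"]
```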
The tests are failing with some seed patterns.
Try running: rake TESTOPTS="--seed=10387"
TestTranslation_v1_0#test_minimum_success is failing. I suspect some sort of caching issue.
I noticed, looking through the Schema Viewer, that under domain>member we have three elements: name, value and definition. I'm not sure what is meant by name and value, but the elements needed are code and name: code being the coded value in the data resource, and name the English value associated with the code.
Add the ContentInformation class to mdTranslator to document the connection between the resource metadata record and the accompanying feature catalog metadata records (FC_FeatureCatalogue, ISO 19110) describing data dictionaries. There is a one-to-many relationship between the resource and its feature catalogs.
Mapping of sbJson citation to mdTranslator internal data structure:
{
  "citation": "Jay Diffendorfer, Roger Compton, Louisa Kramer, Zach Ancona, and Donna Norton, 2015-05, Onshore Industrial Wind Turbine Locations for the United States to March 2014: United States Geological Survey (USGS): Denver, CO, http://pubs.usgs.gov/ds/817/downloads/USGSWind_Turbine_03_2014.zip, http://dx.doi.org/10.3133/ds817, http://eerscmap.usgs.gov/windfarm/."
}
Definition: The citation that can be used to reference the item.
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        citation: {
          title: nil,
          alternateTitles: [],
          dates: [],
          edition: nil,
          responsibleParties: [],
          presentationForms: [],
          identifiers: [],
          series: {},
          otherDetails: [
            'Jay Diffendorfer, Roger Compton, Louisa Kramer, Zach Ancona, and Donna Norton, 2015-05, Onshore Industrial Wind Turbine Locations for the United States to March 2014: United States Geological Survey (USGS): Denver, CO, http://pubs.usgs.gov/ds/817/downloads/USGSWind_Turbine_03_2014.zip, http://dx.doi.org/10.3133/ds817, http://eerscmap.usgs.gov/windfarm/.'
          ],
          onlineResources: [],
          browseGraphics: []
        }
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "citation": {
        "otherCitationDetails": [
          "Jay Diffendorfer, Roger Compton, Louisa Kramer, Zach Ancona, and Donna Norton, 2015-05, Onshore Industrial Wind Turbine Locations for the United States to March 2014: United States Geological Survey (USGS): Denver, CO, http://pubs.usgs.gov/ds/817/downloads/USGSWind_Turbine_03_2014.zip, http://dx.doi.org/10.3133/ds817, http://eerscmap.usgs.gov/windfarm/."
        ]
      }
    }
  }
}
Mapping of sbJson link to mdTranslator internal data structure:
"link":
{
  "rel":"self",
  "url":"<full url of item or search>",
  "nextlink":
  {
    "rel":"next",
    "url":"<url of next result>"
  },
  "prevlink":
  {
    "rel":"prev",
    "url":"<url of previous result>"
  },
  "relatedItems":
  {
    "rel":"related",
    "url":"<url of related items list>"
  }
}
Definition...
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        citation: {
          onlineResources: [
            {
              olResURI: '<full url of item or search>',
              olResProtocol: nil,
              olResName: nil,
              olResDesc: 'self',
              olResFunction: 'navigation'
            }
          ]
        }
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "citation": {
        "onlineResource": [
          {
            "uri": "<full url of item or search>",
            "protocol": "",
            "name": "",
            "description": "self",
            "function": "navigation"
          }
        ]
      }
    }
  }
}
Add the alternateTitle element to Citation ...
"citation": {
  "title": "",
  "alternateTitle": "",
  "date": [
    {
      "date": "0000-00-00",
      "dateType": ""
    }
  ],
  ...
}
The SRS section says Well-Know-Test. Should be Well-Known-Text.
Writing the HTML writer, I have run into a problem of sorts that got me thinking about the structure of the internal data store. We normalized contacts in mdJson and that works well. I kept that normalization in the internal data store and now find it inconvenient when writing Liquid templates, since Liquid does not allow any code embedded within a template. I can likely find a way to work through this, but it raises a bigger question.
If we normalize contacts in the internal data store, then every reader will either need to have a normalized contact section or figure out how to collect the scattered contacts and build the normalized contact block in the data store. Is this a restriction we want to place on all future readers? If not, work needs to be done on the mdTranslator.
De-normalizing the contacts in the internal data store will not be too much work. It should also be relatively easy to change in the two ISO writers. The bulk of the work will be in the mdJson reader, which will need to distribute the contacts to each responsibleParty. Also, almost all the unit test cases will need to be rewritten; the examples should be fine, since this is the content we wrote test cases for first.
Thoughts?
How about wrapping elements that typically have pre-formatted text blocks, i.e. paragraphs, in <pre> tags? Otherwise my new lines are ignored (along with any other white space) when rendered as HTML. This would be especially useful for the abstract.
The Ruby docs are incomplete: http://www.rubydoc.info/gems/adiwg-mdtranslator/0.12.1
We should choose between RDoc and YARD.
See http://docs.seattlerb.org/rdoc/ or http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md
Running rake TESTOPTS="--seed=26538" will cause a failure (tc_reader_mdjson_metadataInfo.rb) due to the order in which tests are being run, combined with the use of clone on a class variable (@@hIn). Clone produces a shallow copy, so nested hashes are shared between tests.
Using Marshal.load(Marshal.dump(@@hIn)) will fix the problem. A cleaner solution would be to rewrite the tests and remove the dependency on the class variable(s).
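The shallow-copy problem can be reproduced outside the test suite. In this sketch build_fixture is a hypothetical stand-in for @@hIn; it also shows the shape of the cleaner fix, since each test could call such a method to get fresh data instead of copying a shared class variable.

```ruby
# Hypothetical stand-in for the shared @@hIn fixture; it only needs one
# nested hash to show the difference between the two copies.
def build_fixture
  { 'metadata' => { 'metadataInfo' => { 'metadataIdentifier' => { 'identifier' => 'abc' } } } }
end

# Shallow copy: clone duplicates the outer hash only; nested hashes are shared.
original = build_fixture
shallow = original.clone
shallow['metadata']['metadataInfo'].delete('metadataIdentifier')
clone_leaked = !original['metadata']['metadataInfo'].key?('metadataIdentifier')

# Deep copy: a Marshal round trip rebuilds every nested object.
original2 = build_fixture
deep = Marshal.load(Marshal.dump(original2))
deep['metadata']['metadataInfo'].delete('metadataIdentifier')
marshal_isolated = original2['metadata']['metadataInfo'].key?('metadataIdentifier')

puts clone_leaked     # prints true
puts marshal_isolated # prints true
```

The Marshal round trip works for plain hashes, arrays and strings such as parsed JSON, but it raises for objects Marshal cannot serialize (procs, IO objects, hashes with a default proc).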
In the codelist URIs http://mdtranslator.herokuapp.com should be http://mdtranslator.adiwg.org.
See: https://github.com/adiwg/mdTranslator/search?q=mdtranslator.herokuapp.com&type=Code
Josh - I started looking into where and when messages are formatted by mdTranslator and see the following...
Error formatting done by the CLI depends on the chosen options.
Were you experiencing something different than what is stated above that I need to fix?
Or did you need something to work differently from the above?
HTML writer fails when responsibleParty contactId does not have a matching contactId in the contact array. The ISO writer does not fail but also does not report the error. This condition should be trapped in the mdJson reader, set readerPass to false, and prevent execution of the writer.
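A reader-side check only needs to compare the contactIds referenced anywhere in the document with those defined in the contact array. A minimal sketch, assuming an mdJson layout with a top-level 'contact' array; the helper names and the readerMessages key are hypothetical.

```ruby
require 'set'

# Recursively collect every value stored under `key` in nested hashes/arrays.
def collect_values(obj, key, found = [])
  case obj
  when Hash
    obj.each do |k, v|
      found << v if k == key
      collect_values(v, key, found)
    end
  when Array
    obj.each { |item| collect_values(item, key, found) }
  end
  found
end

# Return contactIds referenced outside the top-level contact array that
# have no matching contact definition.
def dangling_contact_ids(md_json)
  defined_ids = Set.new((md_json['contact'] || []).map { |c| c['contactId'] })
  references = collect_values(md_json.reject { |k, _| k == 'contact' }, 'contactId')
  references.uniq.reject { |id| defined_ids.include?(id) }
end

md_json = {
  'contact' => [{ 'contactId' => 'c1', 'name' => 'Jane' }],
  'metadata' => {
    'resourceInfo' => {
      'citation' => {
        'responsibleParty' => [
          { 'contactId' => 'c1', 'role' => 'author' },
          { 'contactId' => 'c2', 'role' => 'publisher' }
        ]
      }
    }
  }
}

missing = dangling_contact_ids(md_json)
# Fail the reader so no writer runs (readerMessages is a hypothetical key).
response = {
  readerPass: missing.empty?,
  readerMessages: missing.map { |id| "responsibleParty contactId '#{id}' has no matching contact" }
}
p missing # => ["c2"]
```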
SB Core Model
Need to support the following "facets":
Notes from meetings with SB team (login required): https://my.usgs.gov/confluence/display/fortproj/LCC+Metadata+Support+Meeting+Minutes
Link to Crosswalk
See also v1.0 Implementation
Add a readme file in kramdown format for each reader and writer. Provide access to these files from mdtranslator-rails to provide the content for the API web pages describing the particular reader or writer.
It would be useful to add support for -v or --version
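A sketch of the flag using Ruby's standard OptionParser. The existing CLI may be built on a different option library, in which case the flag would be declared there instead; MDTRANSLATOR_VERSION and parse_cli are hypothetical names.

```ruby
require 'optparse'

# Hypothetical version string; the gem would supply its own constant.
MDTRANSLATOR_VERSION = '0.0.0'

# Parse command-line arguments and record whether -v/--version was given.
# Remaining (non-option) arguments are kept under :args.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: mdtranslator [options] FILE'
    opts.on('-v', '--version', 'Print the mdTranslator version and exit') do
      options[:show_version] = true
    end
  end
  options[:args] = parser.parse(argv)
  options
end

opts = parse_cli(['--version'])
puts "mdtranslator #{MDTRANSLATOR_VERSION}" if opts[:show_version]
```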
Waiting on adiwg/mdJson-schemas#34
The current parameter order for translate is (file, reader, writer, validation level, show empty).
The issue is that validation applies to the reader, but the writer parameter sits in between. That makes it awkward to specify a validation level when no writer is needed. Ruby does not allow the syntax translate(file, reader,, validation level), even when a default value for the missing parameter is defined.
If I move the parameters around to translate(file, reader, validation level, writer, show empty), I think translate(file, reader, validation level) will work, with the trailing parameters taking their defaults.
This should not affect the CLI or API, as both are driven by named parameters.
What do you think?
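Keyword arguments are another way around the ordering problem, since callers name only the options they need. This is a sketch of that alternative, not the current translate signature; the parameter names and the 'normal'/'strict' values are assumptions.

```ruby
# Sketch of a keyword-argument signature. Parameter names and the
# 'normal'/'strict' validation values are assumptions.
def translate(file:, reader:, validate: 'normal', writer: nil, show_empty: false)
  # A real implementation would read, validate and write here; this sketch
  # just returns the resolved options so the call styles can be compared.
  { file: file, reader: reader, validate: validate, writer: writer, show_empty: show_empty }
end

# Validation level without a writer: no placeholder needed for writer.
reader_only = translate(file: 'input.json', reader: 'mdJson', validate: 'strict')

# Writer with the default validation level.
full = translate(file: 'input.json', reader: 'mdJson', writer: 'iso19115_2')
```

Keyword order does not matter at the call site, so adding a new option later does not disturb existing callers.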
Add a computed bounding box to each geometryCollection, feature, featureCollection, and geographicExtent placed into the internal object. The native GeoJSON will not be modified. The computed bounding box is in addition to bbox(s) provided in the GeoJSON and will not replace or modify them.
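The computed box only needs the minimum and maximum of every position in the object. A minimal sketch, assuming [west, south, east, north] order as in a GeoJSON bbox, ignoring elevation and antimeridian crossing; the function names are hypothetical. It only reads the GeoJSON, so the native objects stay unmodified.

```ruby
# Flatten any GeoJSON coordinates array (Point through MultiPolygon)
# into a list of positions.
def positions(coords)
  return [coords] if coords.first.is_a?(Numeric)

  coords.flat_map { |c| positions(c) }
end

# Gather positions from any GeoJSON object, descending into collections.
def object_positions(obj)
  case obj['type']
  when 'FeatureCollection'
    obj['features'].flat_map { |f| object_positions(f) }
  when 'Feature'
    obj['geometry'] ? object_positions(obj['geometry']) : []
  when 'GeometryCollection'
    obj['geometries'].flat_map { |g| object_positions(g) }
  else
    positions(obj['coordinates'])
  end
end

# [west, south, east, north], or nil when there are no positions.
def computed_bbox(obj)
  pts = object_positions(obj)
  return nil if pts.empty?

  xs = pts.map { |pt| pt[0] }
  ys = pts.map { |pt| pt[1] }
  [xs.min, ys.min, xs.max, ys.max]
end

fc = {
  'type' => 'FeatureCollection',
  'features' => [
    { 'type' => 'Feature', 'geometry' => { 'type' => 'Point', 'coordinates' => [-150.0, 61.0] } },
    { 'type' => 'Feature',
      'geometry' => { 'type' => 'LineString', 'coordinates' => [[-149.0, 60.0], [-148.5, 62.5]] } }
  ]
}
p computed_bbox(fc) # => [-150.0, 60.0, -148.5, 62.5]
```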
A non-semantic version, e.g. "1", causes a failure:
TypeError: no implicit conversion of nil into String
.../Projects/ADIwg/adiwgTranslator/mdTranslator/lib/adiwg/mdtranslator/readers/mdJson/mdJson_validator.rb:72:in `+'
.../Projects/ADIwg/adiwgTranslator/mdTranslator/lib/adiwg/mdtranslator/readers/mdJson/mdJson_validator.rb:72:in `validate'
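Checking the version string before its parts are used would turn this TypeError into a readable message. This is a minimal sketch, not the validator's actual code: the function name and message wording are hypothetical, and it assumes schema versions are plain X.Y.Z strings.

```ruby
# Accept only plain X.Y.Z versions (an assumption about schema versions).
SEMVER = /\A(\d+)\.(\d+)\.(\d+)\z/

# Return [[major, minor, patch], nil] on success or [nil, message] on failure,
# so the caller can report the problem instead of concatenating nil.
def parse_schema_version(version)
  match = SEMVER.match(version.to_s)
  return [nil, "mdJson version '#{version}' is not a semantic version (expected X.Y.Z)"] unless match

  [match.captures.map(&:to_i), nil]
end

parts, error = parse_schema_version('1')
puts error unless parts
```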
Mapping of sbJson body to mdTranslator internal data structure:
{
  "body": "The main goal of this item is to show you how to create an item"
}
Definition: The ScienceBase Body is an open descriptive field for an item that may contain rich formatting (accessible by clicking the arrow button at the top right of the form field). The Body should provide users with a clear understanding of an item with as much formatting as necessary to give the user a clear reading experience.
{
  "summary": "The main goal of this item is to show ..."
}
Definition: A concatenated version of the body, auto-generated by ScienceBase.
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        abstract: 'The main goal of this item is to show you how to create an item',
        shortAbstract: 'The main goal of this item is to show ...'
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "abstract": "The main goal of this item is to show you how to create an item",
      "shortAbstract": "The main goal of this item is to show ..."
    }
  }
}
All our testing to date has been with a 'full_test_example.json' input file. We need to find the effects of a minimal input file. How will the schema validator and mdTranslator react to the lack of data?
Assuming semantic versioning as ...
X. -> major release (not backwards compatible)
X.Y. -> minor release (backwards compatible, might also add "new" functionality)
X.Y.Z -> patch (backwards compatible)
Here are the things bugging me about versioning mdTranslator:
The iso19115_2 writer is not outputting the onlineResource for metadata{ } > resourceInfo{ } > citation{ }.
It's time to code the RESTful API for our hosted mdTranslator. Here's a first pass at defining our API. Please contribute your thoughts.
GET:
http://...someplace.../api/v1 # returns documentation on version 1 API
POST:
format ...
http://...someplace.../api/v1/reader/writer?options
supported endpoints ...
http://...someplace.../api/v1/adiwgJson/iso19115_2
options ...
I placed the reader and writer as path segments, which may not be strictly RESTful, because the options list may vary with the type of reader and writer; e.g. jsonValidation only applies if the input is JSON, and showEmpty only applies if the writer returns XML.
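Splitting such a URL into reader, writer and options is straightforward with Ruby's standard URI library. A minimal sketch of the proposed form; parse_translate_request is hypothetical, and the host and option values are placeholders.

```ruby
require 'uri'

# Hypothetical parser for the proposed POST URL form
#   /api/<version>/<reader>/<writer>?options
def parse_translate_request(url)
  uri = URI.parse(url)
  segments = uri.path.split('/').reject(&:empty?)
  unless segments.length == 4 && segments[0] == 'api'
    raise ArgumentError, "expected /api/<version>/<reader>/<writer>, got #{uri.path}"
  end

  _api, version, reader, writer = segments
  # Options stay as strings; which ones apply depends on the reader/writer.
  options = uri.query ? URI.decode_www_form(uri.query).to_h : {}
  { version: version, reader: reader, writer: writer, options: options }
end

request = parse_translate_request(
  'http://example.org/api/v1/adiwgJson/iso19115_2?jsonValidation=true&showEmpty=false'
)
p request[:reader] # => "adiwgJson"
```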
Mapping of sbJson titles to mdTranslator internal data structure:
{
  "title":"Item Title"
}
Definition: (none)
{
  "subTitle":"Item Subtitle"
}
Definition: A subtitle may be an explanatory or alternate title that provides additional information about the item. The subtitle may clarify the theme of the item, enhancing the title.
{
  "alternateTitles": [
    "Alternate Titles",
    "Alternate Titles 2",
    "Alternate Titles 3"
  ]
}
Definition: (none)
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        citation: {
          title: 'Item Title',
          alternateTitles: [
            'Item Subtitle',
            'Alternate Titles',
            'Alternate Titles 2',
            'Alternate Titles 3'
          ]
        }
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "citation": {
        "title": "Item Title",
        "alternateTitle": [
          "Item Subtitle",
          "Alternate Titles",
          "Alternate Titles 2",
          "Alternate Titles 3"
        ]
      }
    }
  }
}
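The title mapping above places subTitle first among the alternate titles, followed by the sbJson alternateTitles. A minimal sketch of that ordering; unpack_titles is a hypothetical name, not the reader's actual method.

```ruby
# Hypothetical sketch: subTitle is listed first among the alternate titles,
# followed by the sbJson alternateTitles in their original order.
def unpack_titles(sb_json)
  alternates = []
  alternates << sb_json['subTitle'] if sb_json['subTitle']
  alternates.concat(sb_json['alternateTitles'] || [])
  { title: sb_json['title'], alternateTitles: alternates }
end

sb_json = {
  'title' => 'Item Title',
  'subTitle' => 'Item Subtitle',
  'alternateTitles' => ['Alternate Titles', 'Alternate Titles 2', 'Alternate Titles 3']
}
citation = unpack_titles(sb_json)
p citation[:alternateTitles].first # => "Item Subtitle"
```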
Mapping of sbJson purpose to mdTranslator internal data structure:
{
  "purpose":"Purpose"
}
Definition: The Purpose field in ScienceBase is an open text field that may contain a statement about the intended purpose of a given item.
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        purpose: 'Purpose'
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "purpose": "Purpose"
    }
  }
}
The open button doesn't trigger the click event on the summary header, so invalidateSize() is never run on the Leaflet maps (to force a redraw). Possible solution is to add an additional listener for the open button click event:
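L.DomEvent.addListener(L.DomUtil.get('openAllDetails'), 'click', function() {
  // `this` is the map, passed as the context argument below.
  var me = this;
  var i = 0;
  // Give the details time to expand before redrawing; check() and bnds are
  // not defined in this snippet and must come from the surrounding script.
  setTimeout(function() {
    check(i, me, bnds);
  }, 100);
}, map);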
Mapping of sbJson id to mdTranslator internal data structure:
{
  "id":"<unique id>"
}
Definition: The unique ScienceBase id of the item. ScienceBase assigns its own universally unique identifier to every item and uses it consistently throughout the architecture for all references. The UUID may be expressed as an HTTP URI (universal resource identifier) in some circumstances, but the basic ID is listed as a UUID string in the core model.
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      metadataInfo: {
        metadataIdentifier: {
          identifier: '<unique id>',
          namespace: 'gov.sciencebase.catalog',
          description: 'US Geological Survey ScienceBase identifier',
          version: nil,
          citation: {}
        }
      },
      resourceInfo: {
        citation: {
          identifiers: [
            {
              identifier: '<unique id>',
              namespace: 'gov.sciencebase.catalog',
              description: 'US Geological Survey ScienceBase identifier',
              version: nil,
              citation: {}
            }
          ]
        }
      }
    }
  }
end
Translation to mdJson:
{
  "metadata": {
    "metadataInfo": {
      "metadataIdentifier": {
        "identifier": "<unique id>",
        "namespace": "gov.sciencebase.catalog",
        "description": "US Geological Survey ScienceBase identifier"
      }
    },
    "resourceInfo": {
      "citation": {
        "identifier": [
          {
            "identifier": "<unique id>",
            "namespace": "gov.sciencebase.catalog",
            "description": "US Geological Survey ScienceBase identifier"
          }
        ]
      }
    }
  }
}
Need to catch this error and handle it gracefully.
Rails app is failing here: adiwg-mdtranslator (1.0.0rc1) lib/adiwg/mdtranslator/writers/iso19110/class_FCfeatureCatalogue.rb:31:in `writeXML'
Convert the mdTranslator return hash $response to a class variable. This will likely involve inheriting all writer classes from a class containing the @@response object.
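A minimal sketch of the proposal, assuming writers share a common base class. In Ruby a class variable declared in a base class is shared by every subclass, so all writers read and append to the same hash. WriterBase, the writer class names and the writerPass/writerMessages keys are hypothetical.

```ruby
# Hypothetical base class: @@response is shared by WriterBase and every
# subclass, so all writers read and update the same hash.
class WriterBase
  def self.reset_response
    @@response = { writerPass: true, writerMessages: [] }
  end

  def self.response
    @@response
  end

  # Record a problem and mark the write as failed.
  def add_message(message)
    @@response[:writerMessages] << message
    @@response[:writerPass] = false
  end

  reset_response
end

class HtmlWriter < WriterBase
  def write
    add_message('HTML writer: responsibleParty contactId has no matching contact')
  end
end

class IsoWriter < WriterBase; end

WriterBase.reset_response # start of a translation
HtmlWriter.new.write
p IsoWriter.response[:writerMessages]
# => ["HTML writer: responsibleParty contactId has no matching contact"]
```

Because the hash is shared for the life of the process, it has to be reset at the start of every translation; otherwise messages leak between runs, much like the shared @@hIn test fixture.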
Currently the HTML writer outputs empty sections, i.e. sections for which no data was supplied to the reader. I think it would be better (less confusing) to only output sections for which data is provided. The TOC would also need to reflect this.
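One way to decide whether a section has data is a recursive emptiness check run before rendering, so the template only receives sections with content and the TOC can be built from the same list. A minimal sketch; empty_value? is hypothetical, and treating whitespace-only strings as empty is an assumption.

```ruby
# Hypothetical helper: nil, blank strings, and arrays/hashes whose contents
# are all empty count as "no data"; any other value (numbers, false) counts
# as data.
def empty_value?(value)
  case value
  when nil then true
  when String then value.strip.empty?
  when Array then value.all? { |v| empty_value?(v) }
  when Hash then value.values.all? { |v| empty_value?(v) }
  else false
  end
end

resource_info = {
  citation: { title: 'Item Title', alternateTitles: [] },
  purpose: nil,
  constraints: [{ legalConstraint: {}, securityConstraint: {} }]
}

# Sections to render (and list in the TOC).
sections = resource_info.reject { |_name, value| empty_value?(value) }.keys
p sections # => [:citation]
```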
Mapping of sbJson rights to mdTranslator internal data structure:
{
  "rights":"Rights"
}
Definition: Rights for cataloged information items refer to intellectual property restrictions. Permissions from data provider may be required for using information. Manuscript access may require fees.
Mapping to mdTranslator
def newBase
  intObj = {
    metadata: {
      resourceInfo: {
        constraints: [
          {
            type: 'legal',
            useLimitation: [],
            scope: {},
            graphic: [],
            reference: [],
            releasability: {},
            responsibleParty: [],
            legalConstraint: {
              accessCodes: ['MD_RestrictionCode'],
              useCodes: ['MD_RestrictionCode'],
              otherCons: [
                'Rights'
              ]
            },
            securityConstraint: {}
          }
        ]
      }
    }
  }
end
Translation to mdJson
{
  "metadata": {
    "resourceInfo": {
      "constraint": [
        {
          "type": "legal",
          "legal": {
            "otherConstraint": [
              "Rights"
            ]
          }
        }
      ]
    }
  }
}
"I was just thinking that it would be handy for the mdTranslator command line tool to be able to print out what the valid options are for the various flags, or even just point them at a URL." - Suggested by Will Fisher.
I just discovered it is possible (and desirable) to have multiple taxonomic definitions in the taxonomic section of ISO metadata. The taxonomy hierarchy can branch repeatedly at any level, with each branch able to continue toward species or branch again within itself. Applying this patch will require changes to mdJson, the internal object, mdTranslator readers and writers, and json-schema-viewer. What a mess!
Fill hierarchyLevel in metadataInfo section with MD_ScopeCode provided by resourceType in resourceInfo section.
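A minimal sketch of the defaulting step, assuming internal keys named :resourceType and :hierarchyLevel (hypothetical) and that an explicitly supplied hierarchyLevel should be left alone.

```ruby
# Hypothetical sketch: default metadataInfo hierarchyLevel to the resource's
# resourceType (an MD_ScopeCode value) when no hierarchyLevel is given.
def fill_hierarchy_level!(int_obj)
  metadata = int_obj[:metadata]
  resource_type = metadata.dig(:resourceInfo, :resourceType)
  info = (metadata[:metadataInfo] ||= {})
  levels = info[:hierarchyLevel]
  if resource_type && (levels.nil? || levels.empty?)
    # hierarchyLevel is repeatable in ISO, so store it as a list.
    info[:hierarchyLevel] = [resource_type]
  end
  int_obj
end

int_obj = { metadata: { metadataInfo: {}, resourceInfo: { resourceType: 'dataset' } } }
fill_hierarchy_level!(int_obj)
p int_obj[:metadata][:metadataInfo][:hierarchyLevel] # => ["dataset"]
```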
Use the mdCodes codelists in mdTranslator for generation of ISO code syntax.
Is this expected behavior, or should we provide a more useful error message?
The error output is:
TypeError: no implicit conversion of nil into String
.../Projects/ADIwg/adiwgTranslator/mdTranslator/lib/adiwg/mdtranslator.rb:93:in `+'
.../Projects/ADIwg/adiwgTranslator/mdTranslator/lib/adiwg/mdtranslator.rb:93:in `reader_module'
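This has the same shape as the validator error: a nil value reaches a string concatenation. Assuming the nil here is a missing reader name (the trace does not say so explicitly), checking the name first would give a clearer message. KNOWN_READERS, reader_path and the path layout are hypothetical.

```ruby
# Hypothetical list of supported readers; the real list lives in mdTranslator.
KNOWN_READERS = %w[mdJson sbJson].freeze

# Validate the reader name before building the module path, so a missing or
# unknown reader produces a readable message instead of a TypeError.
def reader_path(reader)
  choices = KNOWN_READERS.join(', ')
  if reader.nil? || reader.to_s.strip.empty?
    raise ArgumentError, "no reader specified; choose one of: #{choices}"
  end
  unless KNOWN_READERS.include?(reader)
    raise ArgumentError, "unknown reader '#{reader}'; choose one of: #{choices}"
  end

  File.join('adiwg', 'mdtranslator', 'readers', reader, "#{reader}_reader")
end

begin
  reader_path(nil)
rescue ArgumentError => e
  puts e.message
end
```

The same list could back the suggestion above to print the valid options for each flag.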