nceas / eml
Ecological Metadata Language (EML)
Home Page: https://eml.ecoinformatics.org/
License: GNU General Public License v2.0
Author Name: David Blankman (David Blankman)
Original Redmine Issue: 365, https://projects.ecoinformatics.org/ecoinfo/issues/365
Original Date: 2001-12-03
Original Assignee: David Blankman
This will be an overview of EML and its relationship to Morpho. It will also be a practical guide for
users to understand how to take conventionally reported metadata, such as text documents or
other legacy systems, and manually enter it into Morpho. (This is not to be confused with
automated conversions of metadata into EML).
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 471, https://projects.ecoinformatics.org/ecoinfo/issues/471
Original Date: 2002-04-15
Original Assignee: Chris Jones
Need an overview document that gives the background and rationale for eml. This
would likely have both normative and non-normative sections. Would include an
overview of the structure of EML and the rationale for that structure, and its
intended use. Describes packaging and triples in detail. Probably would have
a normative appendix that defines the semantics of every field, which could be
auto-generated from the XSD source documentation.
Chris -- you started an outline for this. Can you recreate it here or in a
document in CVS?
We should have this for the 2.0.0 release but will not likely have it for the
beta7 release.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 146, https://projects.ecoinformatics.org/ecoinfo/issues/146
Original Date: 2000-09-22
Original Assignee: Chad Berkley
Need to revise the research descriptors found in the current eml-context
metadata standard.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 429, https://projects.ecoinformatics.org/ecoinfo/issues/429
Original Date: 2002-02-14
Original Assignee: Peter McCartney
The current eml-entity module describes two types of entities: table-entities
and other-entities. Ultimately I think we need to be able to describe several
other specific types of entities, particularly spatial images and various GIS
objects.
General image support may also be useful (e.g., for jpg, gif, etc) so that photo
quadrats and other types of images used as data and metadata can easily be
included. We may be able to easily accommodate many of these generic entity
types by utilizing a MIME-type label (e.g., image/gif) in the entityType field,
although there may also be need for additional metadata for these entity types.
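As a rough illustration of the MIME-type idea, the following Python sketch derives a label such as image/gif from a file name using the standard mimetypes module. The helper name entity_type_for and the file names are hypothetical, and whether such a label belongs in entityType is still open in this issue.

```python
import mimetypes


def entity_type_for(filename, default="application/octet-stream"):
    """Return a MIME-type label guessed from the file extension.

    Hypothetical helper: EML itself does not define this function.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default


# Made-up file names standing in for photo quadrats and similar images.
for name in ("photo_quadrat_01.gif", "plot_03.jpg", "notes.unknownext"):
    print(name, "->", entity_type_for(name))
```

Files with an unrecognized extension fall back to a generic label here, which is one reason additional metadata may still be needed for these entity types.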
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 472, https://projects.ecoinformatics.org/ecoinfo/issues/472
Original Date: 2002-04-16
Original Assignee: Matt Jones
EML currently uses namespaces of the form "eml:modulename" for each of the eml
modules (e.g., eml:dataset). In contrast, we use version specific public
identifiers for the EML dtds (e.g.,
"-//ecoinformatics.org//eml-dataset-2.0.0beta6//EN"). The formal public
identifiers will need to be updated with each revision of the standard, but
have the benefit that they allow one to specifically state which version of the
module a document uses. This is important in systems where we need to be able
to reliably validate documents.
So, I think we need to change the public namespaces for eml to be versioned like
the public identifiers are. A format like this would do:
"eml:eml-dataset-2.0.0beta7"
Note that I specifically did not choose to use an http URI for this namespace
because of the intense controversy over resolvability of namespace URIs, and the
later development of specs like RDDL. The namespace spec explicitly states that
processors should NOT expect that a schema will reside at the namespace URI, nor
even that the namespace URI is resolvable as an address. Thus, the "eml" scheme
in the URI makes it clear that it is not a resolvable URL. We should rely on
schemaLocation, or handle it in each schema processor.
This will need to be changed throughout the schema docs.
Also, need to add documentation in the DTDs describing the proper public
identifier that should be used with the DTDs so that it is clear.
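A minimal Python sketch of how an application might read the module and version back out of a versioned namespace of the proposed form. The sample document, the ds prefix, and the helper split_versioned_namespace are assumptions made for illustration; only the "eml:eml-dataset-2.0.0beta7" format comes from the proposal above.

```python
import xml.etree.ElementTree as ET

# Made-up document using the proposed versioned namespace format.
DOC = (
    '<ds:dataset xmlns:ds="eml:eml-dataset-2.0.0beta7">'
    "<title>Example</title>"
    "</ds:dataset>"
)


def split_versioned_namespace(tag):
    """Split '{eml:eml-dataset-2.0.0beta7}dataset' into (module, version, local name)."""
    if not tag.startswith("{"):
        raise ValueError("element is not namespace-qualified: %r" % tag)
    uri, local_name = tag[1:].split("}", 1)
    scheme, rest = uri.split(":", 1)
    if scheme != "eml":
        raise ValueError("not an eml namespace: %r" % uri)
    # Module names contain hyphens, so the version is taken after the last one.
    module, version = rest.rsplit("-", 1)
    return module, version, local_name


root = ET.fromstring(DOC)
module, version, local_name = split_versioned_namespace(root.tag)
print(module, version, local_name)
```

Because the namespace is treated as an opaque name rather than an address, nothing in this sketch tries to resolve it; locating the schema is left to schemaLocation or to the processor.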
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 483, https://projects.ecoinformatics.org/ecoinfo/issues/483
Original Date: 2002-05-01
Original Assignee: Chad Berkley
Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Chad, Peter
Peter's notes include a mention of "change logs" which I don't understand. Peter?
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 1, https://projects.ecoinformatics.org/ecoinfo/issues/1
Original Date: 2000-01-07
Original Assignee: Matt Jones
Need more extensive documentation for elements of all DTDs. Current single
sentence documentation is too terse. Best if mde can automatically read
documentation and incorporate it into online help.
Author Name: Chris Jones (Chris Jones)
Original Redmine Issue: 373, https://projects.ecoinformatics.org/ecoinfo/issues/373
Original Date: 2001-12-11
Original Assignee: Chris Jones
The geolcit, classcit, and idref elements in eml-coverage.xsd use a complex type
consisting of one element reference in a sequence. The reference is to the
citeinfo element, which is a single element defined in eml-coverage.xsd (with no
documentation). This element ref needs to be changed to point to an eml
literature citation field. Most likely, we would have to import citation into
eml-coverage for this to work appropriately.
Author Name: James Brunt (James Brunt)
Original Redmine Issue: 319, https://projects.ecoinformatics.org/ecoinfo/issues/319
Original Date: 2001-11-06
Original Assignee: Matt Jones
Matt,
Shouldn't the enumerated domain be repeatable? Perhaps I'm not using it right
but say I've got 5 life zones coded into integers 1-5 for a given attribute -
how do I define this domain?
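To make the question concrete, here is a small Python sketch of what a repeatable enumerated domain would have to express for this case: five integer codes, each with its own definition. The labels are placeholders and the helper name is hypothetical; this models the data, not the EML schema.

```python
# Five coded life zones; the definitions are placeholders, not real zone names.
life_zone_domain = {
    1: "life zone 1",
    2: "life zone 2",
    3: "life zone 3",
    4: "life zone 4",
    5: "life zone 5",
}


def codes_outside_domain(values, domain):
    """Return the values that are not valid codes in the enumerated domain."""
    return [value for value in values if value not in domain]


observed = [1, 3, 5, 7]
print(codes_outside_domain(observed, life_zone_domain))
```

Each code/definition pair corresponds to one repetition of the enumerated domain entry, which is why a non-repeatable element cannot describe this attribute.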
James
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 144, https://projects.ecoinformatics.org/ecoinfo/issues/144
Original Date: 2000-09-22
Original Assignee: Matt Jones
We need a lineage and version control metadata standard that allows us to
specify precisely the versioning information among metadata files, data files,
and other objects in the system. This will likely be related to changes in the
current specification of eml-package, which involves showing links among
objects.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 143, https://projects.ecoinformatics.org/ecoinfo/issues/143
Original Date: 2000-09-22
Original Assignee: Chad Berkley
Need a metadata standard, or possibly extensions to existing standards, for
specifying quality assurance and quality control metadata about data entities.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 336, https://projects.ecoinformatics.org/ecoinfo/issues/336
Original Date: 2001-11-29
Original Assignee: Chris Jones
Need formal definitions for the responsible party roles that are usable by
applications like Morpho to help guide the user in their choice of roles.
There's a lot of confusion about several roles, like
originator/owner/principalInvestigator.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 480, https://projects.ecoinformatics.org/ecoinfo/issues/480
Original Date: 2002-05-01
Original Assignee: Matt Jones
Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Matt
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 264, https://projects.ecoinformatics.org/ecoinfo/issues/264
Original Date: 2001-08-14
Original Assignee: Matt Jones
Need to add distribution info (url, medium) to eml-resource to accommodate the
type of stand-alone data registry envisioned by systems like NRS and OBFS.
Probably will still have a need for distribution info at the entity level as well.
Author Name: Chad Berkley (Chad Berkley)
Original Redmine Issue: 174, https://projects.ecoinformatics.org/ecoinfo/issues/174
Original Date: 2000-11-15
Original Assignee: Matt Jones
Work with Peter on attribute, relation and entity modules
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 3, https://projects.ecoinformatics.org/ecoinfo/issues/3
Original Date: 2000-01-07
Original Assignee: Matt Jones
Some required elements in the DTD should be optional. An example: the "country"
element from "meta_address" should be optional but is not. Others should be
inspected and fixed too.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 338, https://projects.ecoinformatics.org/ecoinfo/issues/338
Original Date: 2001-11-29
Original Assignee: Chris Jones
Many of the fields in eml-coverage are inadequately documented. Need to
thoroughly fill out the documentation in the annotation tags of the schema files.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 339, https://projects.ecoinformatics.org/ecoinfo/issues/339
Original Date: 2001-11-29
Original Assignee: Matt Jones
The current eml-coverage requires a bounding box described by two points. Many
ecological data sets are collected at a site with a point location but no known
bounding box. How can we accommodate point coverage? Two possibilities: 1)
change the content model to make one of the points in the bounding box optional,
or 2) change the documentation to tell the user to fill in identical points in
both bounding box coordinates if it is a point.
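Option 2 can be sketched in a few lines of Python: a point is stored as a bounding box whose two corners are identical, and applications detect that case by comparing the corners. The class and field names are hypothetical, and the coordinates are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    latitude: float
    longitude: float


@dataclass(frozen=True)
class BoundingBox:
    corner1: Point
    corner2: Point

    @classmethod
    def from_point(cls, point):
        """Option 2: represent a point by repeating it in both corners."""
        return cls(point, point)

    def is_point(self):
        """A box with identical corners is a point location."""
        return self.corner1 == self.corner2


site = BoundingBox.from_point(Point(34.0, -106.0))  # arbitrary coordinates
region = BoundingBox(Point(35.0, -107.0), Point(34.0, -106.0))
print(site.is_point(), region.is_point())
```

Option 1 would instead make the second corner optional, which keeps documents shorter but means every consumer has to handle a missing corner.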
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 139, https://projects.ecoinformatics.org/ecoinfo/issues/139
Original Date: 2000-09-22
Original Assignee: Chad Berkley
need to make the various revisions to eml-resource that would make it compatible
with the older eml-dataset dtd. See notes that nottrott has on the topic.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 427, https://projects.ecoinformatics.org/ecoinfo/issues/427
Original Date: 2002-02-14
Original Assignee: Matt Jones
The current eml-constraint module is designed to reference table and attribute
identifiers so that the relationships between two particular entities can be
established. However, we do not currently indicate how the values for these
identifiers should be obtained or constrained. Are they the eml-identifiers
(which doesn't work for attributes), or are they names (entityName,
attributeName) which might run into many problems with uniqueness issues? We
need an easy, consistent, approach that we recommend or require as part of the
semantics of this module.
In addition, constraints will always apply to one or more entities, so it is
reasonable to consider merging the entire eml-constraint module onto eml-entity.
However, doing this means that constraints that affect a table may be only
described in the description of a different table, which could definitely cause
some problems in locating the information. By maintaining the independence of
the eml-constraint module, we create a single, identifiable location where both
participants in a constraint can be enumerated. This will be far easier for
applications to use to identify both sides of a constraint, at the cost of
having to specify both sides in the constraint description. Of course, this
does not apply to constraints that apply to only a single entity such as UNIQUE
constraints.
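The trade-off can be illustrated with a short Python sketch in which each constraint is a standalone record that names both participants, so an application can find every constraint touching an entity from one place. The record type, entity names, and attribute names are all hypothetical, and the sketch deliberately sidesteps the open question of whether these values should be identifiers or names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKeyConstraint:
    """A standalone constraint record that names both sides explicitly."""
    name: str
    child_entity: str
    child_attribute: str
    parent_entity: str
    parent_attribute: str

    def participants(self):
        return {self.child_entity, self.parent_entity}


# Hypothetical tables: observations.site_id refers to sites.site_id.
constraints = [
    ForeignKeyConstraint("obs_site_fk", "observations", "site_id", "sites", "site_id"),
]


def constraints_for(entity, all_constraints):
    """Return every constraint in which the entity takes part, on either side."""
    return [c for c in all_constraints if entity in c.participants()]


print([c.name for c in constraints_for("sites", constraints)])
```

If the constraint were stored only inside the description of observations, the lookup for sites would have to scan every other entity description instead.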
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 269, https://projects.ecoinformatics.org/ecoinfo/issues/269
Original Date: 2001-08-31
Original Assignee: Matt Jones
There are some contentious issues surrounding the use of packaging (ie, the
triple element) in EML. Some would prefer inclusion via namespaces directly to
make the schema more explicit. But using triples to associate data and metadata
files is more flexible and allows new types of metadata to be added over time
without changes to the original structure.
One complaint is that the current structure requires multiple files to deliver
all of the metadata. One possible solution is to include an element 'metadata'
with content model 'ANY' as the root element, which can contain all of the other
modules, and they in turn can use namespaces to indicate how validation can be
performed.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 140, https://projects.ecoinformatics.org/ecoinfo/issues/140
Original Date: 2000-09-22
Original Assignee: Chris Jones
Need revisions to eml-access to allow XML-based communication between the
dmanclient and metacat servlet. Probably need to move all of the "distribution"
related fields in the old eml-access to a separate "distribution" module.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 335, https://projects.ecoinformatics.org/ecoinfo/issues/335
Original Date: 2001-11-29
Original Assignee: Matt Jones
Current eml identifiers are a string that symbolizes a unique revision of an
object (e.g., jones.14.1). The same identifier should always be associated with
the same stream of bytes (ie, checksums would match).
Suggestion that eml identifiers should be decomposed into two parts. The first
part is a "family" id (string) that represents a group of related objects. The
second is a revision # (integer) that indicates the revision number of one of
the objects in the family. The combination of the familyid and revisionnum
would always be unique, and would be usable as an accession number. In XML,
this could look something like:
Questions remain.
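The decomposition itself can be sketched in Python, under the assumption that the revision number is whatever follows the last dot of an identifier such as jones.14.1. The type and function names are hypothetical, and this is not the XML form referred to above.

```python
from typing import NamedTuple


class EmlIdentifier(NamedTuple):
    family_id: str  # names a group of related objects
    revision: int   # revision number of one object in the family

    def accession_number(self):
        """The combination of family id and revision is the unique identifier."""
        return "%s.%d" % (self.family_id, self.revision)


def parse_identifier(identifier):
    """Split an identifier at its last dot (an assumption of this sketch)."""
    family_id, separator, revision = identifier.rpartition(".")
    if not separator or not family_id or not revision.isdigit():
        raise ValueError("cannot split %r into family id and revision" % identifier)
    return EmlIdentifier(family_id, int(revision))


parsed = parse_identifier("jones.14.1")
print(parsed.family_id, parsed.revision, parsed.accession_number())
```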
Author Name: Chad Berkley (Chad Berkley)
Original Redmine Issue: 175, https://projects.ecoinformatics.org/ecoinfo/issues/175
Original Date: 2000-11-15
Original Assignee: Chad Berkley
work with Mark on package and lineage modules
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 428, https://projects.ecoinformatics.org/ecoinfo/issues/428
Original Date: 2002-02-14
Original Assignee: Matt Jones
The current incarnation of eml-constraint allows the enumeration and definition
of integrity constraints that apply to entities. These are currently drawn from
the relational model, including UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK
constraints. It may also be extended to include other types of relationships
between entities that are not part of the relational model.
The "triple" element allows us to create arbitrary relationships between
identifiable objects in EML, and is used for associating data with metadata, and
groups of metadata and data objects together as a "package". This usage is very
similar to the relational model, in that it allows us to define 3-valued tuples
in a graph structure. Constraints between entities could conceivably be modeled
using this infrastructure, probably with some modifications to the concept of a
"relationship".
So, the question arises. Should we try to develop a unified approach to the
specification of constraints and the specification of packages? It might be
more elegant, but possibly at the cost of simplicity and ease-of-use. My gut
feeling is that this is not something we should pursue, but would like to hear
other people's reasons for or against it.
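For readers weighing the two approaches, this Python sketch shows the triple idea in its simplest form: each relationship is a 3-valued tuple, and a package is the set of objects the triples connect. The identifiers and relationship names are made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject_id: str
    relationship: str
    object_id: str


# Hypothetical package: two metadata documents describing one data file.
triples = [
    Triple("jones.15.1", "describes", "jones.14.1"),
    Triple("jones.16.1", "describes", "jones.14.1"),
]


def package_members(all_triples):
    """Every identifier that appears on either side of a triple."""
    members = set()
    for triple in all_triples:
        members.add(triple.subject_id)
        members.add(triple.object_id)
    return members


def subjects_related_to(object_id, relationship, all_triples):
    """Follow one kind of edge in the graph back to its subjects."""
    return [
        t.subject_id
        for t in all_triples
        if t.object_id == object_id and t.relationship == relationship
    ]


print(sorted(package_members(triples)))
print(subjects_related_to("jones.14.1", "describes", triples))
```

A foreign key could in principle be written as one more relationship name in this structure, but it would also need to carry attribute-level detail that a plain triple has no slot for, which is part of the simplicity cost mentioned above.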
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 450, https://projects.ecoinformatics.org/ecoinfo/issues/450
Original Date: 2002-03-27
Original Assignee: Chad Berkley
The eml-entity module has a field called "entityType" that is supposed to
contain the type of the entity for "other" entities. The eml-physical file has
a field called "format" that is supposed to contain the name of the data format
for the physical file. We need to clarify the difference between these fields.
If one is using a mime-type to indicate the format (e.g., image/gif), where
should that go? My guess is eml-physical/format.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 337, https://projects.ecoinformatics.org/ecoinfo/issues/337
Original Date: 2001-11-29
Original Assignee: Chad Berkley
KNB scientists wanted to classify storage type for attributes as "nominal",
"ordinal", "interval", rather than using the physical storage types we had
considered (e.g., text, integer, floating point). Need to clarify what the
contents of this field should be and possibly define a domain for the value-space.
The org.ecoinformatics.eml.EMLParser does not perform well when processing large EML documents (for instance, a document with 250 to 1000 fully fleshed out attribute elements defined). It can take 10, 30, 45 or more minutes to validate a document -- the duration scales with document size.
To try to alleviate this, change the parser to use a SAX-based model rather than a DOM.
org.ecoinformatics.eml.EMLParser uses two methods to validate a document: parseKeys() and parseKeyrefs(), both of which call getPathContent() and pass in an XPath selector. getPathContent() creates a DOM and passes back an org.w3c.dom.NodeList.
See the attached file as an example.
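The streaming idea can be sketched with Python's standard xml.sax module; this is not the Java EMLParser, only an illustration of collecting keys and key references in a single pass without building a DOM. It assumes that keys are id attributes and that key references are the text of references elements; the XPath selectors actually used by parseKeys() and parseKeyrefs() are not shown in this issue. The sample document is made up.

```python
import xml.sax


class KeyCollector(xml.sax.ContentHandler):
    """Collect keys and key references while the document streams past."""

    def __init__(self):
        super().__init__()
        self.keys = set()
        self.keyrefs = []
        self._in_reference = False
        self._buffer = []

    def startElement(self, name, attrs):
        key = attrs.get("id")  # assumption: keys are 'id' attributes
        if key is not None:
            self.keys.add(key)
        if name == "references":  # assumption: keyrefs are 'references' elements
            self._in_reference = True
            self._buffer = []

    def characters(self, content):
        if self._in_reference:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "references":
            self.keyrefs.append("".join(self._buffer).strip())
            self._in_reference = False


def unresolved_references(xml_bytes):
    """Return key references that do not match any key in the document."""
    handler = KeyCollector()
    xml.sax.parseString(xml_bytes, handler)
    return [ref for ref in handler.keyrefs if ref not in handler.keys]


SAMPLE = b"""<eml>
  <creator id="person.1"><surName>Example</surName></creator>
  <contact><references>person.1</references></contact>
  <contact><references>person.2</references></contact>
</eml>"""

print(unresolved_references(SAMPLE))
```

Memory use in this approach grows with the number of keys rather than with the size of the whole document, which is the property that matters for documents with hundreds of attribute elements.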
Author Name: Dan Higgins (Dan Higgins)
Original Redmine Issue: 443, https://projects.ecoinformatics.org/ecoinfo/issues/443
Original Date: 2002-03-14
Original Assignee: Chad Berkley
eml-access has a duration element that is defined in terms of temporalCoverage.
This introduces a number of temporal concepts that are nonsensical when applied
to the duration of 'tickets' (e.g. geological time scales). It is suggested that
simple start and stop times/dates be used here rather than temporal coverage
concepts.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 145, https://projects.ecoinformatics.org/ecoinfo/issues/145
Original Date: 2000-09-22
Original Assignee: Mark Schildhauer
Need to revise the software metadata standard, especially considering the
changes made in eml-resource for bibliographic citations, and the version
control/lineage metadata information to be developed.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 142, https://projects.ecoinformatics.org/ecoinfo/issues/142
Original Date: 2000-09-22
Original Assignee: Chad Berkley
Need to revise eml-file and eml-variable to more fully support the description
of the structure and content of ASCII data files. Need to be able to specify
relational constraints among various data entities, possibly using a new
module. Need to be able to specify data formats for binary formats on a
per-data-entity basis.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 43, https://projects.ecoinformatics.org/ecoinfo/issues/43
Original Date: 2000-07-26
Original Assignee: Chad Berkley
In the resource.xsd XML Schema document, the ResourceVariation type is not
needed. Instead, it would be better to just have a set of top-level elements
defined (like dataset and literature) that can be used as the docroot for
particular resource documents. This would eliminate the need for the whole file
"resourceExample.xsd".
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 485, https://projects.ecoinformatics.org/ecoinfo/issues/485
Original Date: 2002-05-01
Original Assignee: Dan Higgins
Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Dan
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 277, https://projects.ecoinformatics.org/ecoinfo/issues/277
Original Date: 2001-08-31
Original Assignee: Matt Jones
Need to extend EML, either by adding a new module or extending the current
entity/attribute system, so that semantic metadata can be accommodated.
Basically, this means being able to enter terms from an ontology (see bug 274)
so that a particular data table attribute can be tied into the ontology. See
the KDI proposal on canonical variables for more information.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 45, https://projects.ecoinformatics.org/ecoinfo/issues/45
Original Date: 2000-07-26
Original Assignee: Chris Jones
Using the Xerces parser (1.1.2), there are problems validating against the
resource.xsd XML Schema document, mainly pertaining to the content model of
originator. This may be a Xerces bug, but I doubt it because it works using
other examples.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 249, https://projects.ecoinformatics.org/ecoinfo/issues/249
Original Date: 2001-07-11
Original Assignee: Jing Tao
Need an updated set of XSLT stylesheets to transform EML 2.0 documents into HTML
for display on metacat and in other areas.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 477, https://projects.ecoinformatics.org/ecoinfo/issues/477
Original Date: 2002-04-20
Original Assignee: Matt Jones
In all modules we import other schemas and use components. Sometimes we define
elements that use a ComplexType, other times we import and use an element that
uses the complex type. We need to go through the modules systematically and
make sure that all inter-namespace references are done in the same way. This
most often involves the ways we use ResponsibleParty, the various coverage
types, and citations, but there are several others too. Check them all and fix
systematically.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 141, https://projects.ecoinformatics.org/ecoinfo/issues/141
Original Date: 2000-09-22
Original Assignee: Chad Berkley
Need revisions to the current eml-package and a possible new "eml-distribution"
metadata module that allows specification of the metadata about distributing
datasets. The distribution info can include both online information (a la
eml-package), offline information (e.g., addresses), contact information,
licensing, use constraints, copyright, and other information about distribution.
Author Name: Chad Berkley (Chad Berkley)
Original Redmine Issue: 176, https://projects.ecoinformatics.org/ecoinfo/issues/176
Original Date: 2000-11-15
Original Assignee: Chad Berkley
work with Bill and Corrina on the research protocol and qaqc modules
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 229, https://projects.ecoinformatics.org/ecoinfo/issues/229
Original Date: 2001-04-12
Original Assignee: Matt Jones
Incorporate revisions to eml-resource.xsd that were suggested in the EML 2
workshop. Includes separating dataset, literature, and software out as separate
modules that extend ResourceBase, and adding triplets to ResourceBase.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 470, https://projects.ecoinformatics.org/ecoinfo/issues/470
Original Date: 2002-04-15
Original Assignee: Chad Berkley
Request to be able to inline data in the same file as EML. For binary data,
this could be Base64 encoded. For text it could be in stream. Probably should
work from a current standard way to do it like XSIL.
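A minimal Python sketch of the Base64 option for binary data, using only the standard library. The element name inlineData and its encoding attribute are hypothetical; the request above leaves the actual element design open and suggests following an existing standard such as XSIL.

```python
import base64
import xml.etree.ElementTree as ET


def inline_binary(data):
    """Wrap binary data in a hypothetical <inlineData> element as Base64 text."""
    element = ET.Element("inlineData", {"encoding": "base64"})
    element.text = base64.b64encode(data).decode("ascii")
    return element


def extract_binary(element):
    """Recover the original bytes from the element's Base64 text."""
    return base64.b64decode(element.text)


payload = bytes([0, 1, 2, 254, 255])  # bytes that are not valid XML text
xml_text = ET.tostring(inline_binary(payload), encoding="unicode")
round_trip = extract_binary(ET.fromstring(xml_text))
print(xml_text)
print(round_trip == payload)
```

Base64 text is roughly a third larger than the bytes it encodes, so inlining is a better fit for small data objects than for large ones.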
Author Name: Chad Berkley (Chad Berkley)
Original Redmine Issue: 440, https://projects.ecoinformatics.org/ecoinfo/issues/440
Original Date: 2002-03-05
Original Assignee: Matt Jones
eml-entity needs a field to indicate whether the first line in the entity is a header row. Currently, we have no way to know which row to start a process on. Morpho collects this metadata but has no place to store it.
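To show why a consumer needs this field, here is a short Python sketch in which a hypothetical metadata value, num_header_lines, tells the reader where the data rows begin. The parameter name and the sample table are assumptions; the issue only asks for some way to record whether the first line is a header.

```python
import csv
import io


def read_data_rows(text, num_header_lines=0):
    """Split a delimited text entity into header rows and data rows.

    num_header_lines is a hypothetical metadata value, not an existing EML field.
    """
    rows = list(csv.reader(io.StringIO(text)))
    return rows[:num_header_lines], rows[num_header_lines:]


SAMPLE_TABLE = "site,count\nA,12\nB,7\n"

header, data = read_data_rows(SAMPLE_TABLE, num_header_lines=1)
print(header)
print(data)
```

Without the value, the same call with num_header_lines=0 would hand the column names to the application as if they were the first observation.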
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 481, https://projects.ecoinformatics.org/ecoinfo/issues/481
Original Date: 2002-05-01
Original Assignee: Chad Berkley
Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Chad
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 42, https://projects.ecoinformatics.org/ecoinfo/issues/42
Original Date: 2000-07-26
Original Assignee: Chad Berkley
A content model of "mixed" was used for several of the complex types in the
resource.xsd XML Schema documents. In general, because mixed content models
cannot be validated, I think they should not be used. In all of the cases here
the model should be changed to "elementOnly".
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 430, https://projects.ecoinformatics.org/ecoinfo/issues/430
Original Date: 2002-02-14
Original Assignee: Chad Berkley
The current set of DTD files checked into the eml module do not correspond in a
1:1 way with the XSD files. In particular, 1) parameter entities were resolved
(e.g., eml-dataset includes eml-resource) and should not be; and 2) multiple
global elements in the schema should be represented as possible root elements in
the DTD but in fact were eliminated. For example, in eml-entity, both
"table-entity" and "other-entity" should be root elements in the eml-entity.dtd,
but in fact only "table-entity" is present because it caused some problems
with the software we were using to parse DTDs. This needs to be fixed so that
all appropriate elements are available.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 484, https://projects.ecoinformatics.org/ecoinfo/issues/484
Original Date: 2002-05-01
Original Assignee: Chad Berkley
Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Chad, Dan, David
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 482, https://projects.ecoinformatics.org/ecoinfo/issues/482
Original Date: 2002-05-01
Original Assignee: Peter McCartney
Changes as decided upon at the Sevilleta EML meeting, April 24-25, 2002:
Responsible: Matt
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 265, https://projects.ecoinformatics.org/ecoinfo/issues/265
Original Date: 2001-08-15
Original Assignee: Matt Jones
Tim Bergsma reported this issue:
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 266, https://projects.ecoinformatics.org/ecoinfo/issues/266
Original Date: 2001-08-15
Original Assignee: Matt Jones
attribute metadata describes the domain for the attributes using enumerated and
range domains, but does not currently allow for free-text domains. This could
be fixed using FGDC's unrepresentable domain.
Also, there has been a request to add 'paragraph' and 'citation' elements to the
'source' element to be more specific about the source for a domain.
Author Name: Matt Jones (Matt Jones)
Original Redmine Issue: 44, https://projects.ecoinformatics.org/ecoinfo/issues/44
Original Date: 2000-07-26
Original Assignee: Chad Berkley
givenName should be optional in iso-party.xsd. In general, these content
standards should be as minimally constricting as possible.