Comments (29)
@mbjones When we push a dataset to a repository such as figshare and receive a DOI, where should we file this information? In <eml><dataset><publisher>...<something>, or under <eml><additionalMetadata> ...? Or in <dataset><additionalIdentifier>? Or as an id attribute to some node?
Can you point to a good example of this?
from eml.
If it's the main identifier for the EML data package, it should be put in the <eml @packageId> attribute, which is the designated location for the package identifier. If it is a secondary identifier for the package (i.e., not the main one you want cited, but one that should also be synonymous with the package), then it could go into <dataset>/<additionalIdentifier>.
from eml.
Perfect, thanks. Is there a way to denote that the identifier is a DOI? (E.g. a
namespace for the identifier?)
Carl Boettiger
http://carlboettiger.info
sent from mobile device; my apologies for any terseness or typos
On Jul 2, 2013 3:00 PM, "Matt Jones" [email protected] wrote:
> If it's the main identifier for the EML data package, it should be put in
> the <eml @packageId> attribute, which is the designated location for the
> package identifier. If it is a secondary identifier for the package (i.e.,
> not the main one you want cited, but one that should also be synonymous
> with the package), then it could go into <dataset>/<additionalIdentifier>.
from eml.
I am dealing with the literature module at the moment and stumbled upon something where I need input. The edited book citation type is the same as the book citation type, with the difference that some editors edited the book whereas the chapters can have different authors. The documentation says that the editors need to go into the "creators" field. But the book citation type does not have a "creators" field. Am I missing something?
from eml.
@cpfaff Every citation type inherits the ResourceGroup, which provides creator (and title, and lots of other stuff).
For an editedBook, you would probably put the creator responsibleParty elements with different positionName, either "editor" or "author" accordingly.
I wouldn't worry about that at this stage. We first want to create the S4 class equivalent of the schema. For that purpose, editedBook should just inherit the book class:
setClass("editedBook", contains="book")
And make sure book contains resourceGroup, as well as its unique slots. Point me to your code soon so I can give some feedback before you get too deep.
Once we have the S4 classes in place, we will write methods that map other citation objects into these types. For instance, R already has a native citation class (based on BibTeX). It won't map perfectly into EML, but we'll do our best to make a reasonable mapping. (That stage is where we will actually worry about where we put editors vs. authors, etc.)
from eml.
@cboettig Ok thanks that sounds good. Here is what I did today. Not much but a beginning.
from eml.
@cpfaff Nice work, thanks for the link.
You may know this already, but you'll want all of the citation classes to inherit resourceGroup. This is a bit annoying, as the resourceGroup slots should be listed before the type-specific slots, since order matters in the schema. E.g. you might think you can just add contains = "resourceGroup" to the class definition, but this gets the wrong ordering, with those slots coming last (e.g. in the order given by slotNames(new("article"))). So you have to do something like this:
setClass("A", slots = c(slot1 = "character", slot2 = "character"))
setClass("B_slots", slots = c(Bslot1 = "character", Bslot2 = "character"))
setClass("B", contains = c("A", "B_slots"))
e.g. for article:
setClass("article_slots",
         slots = c(journal = "journal",
                   volume = "volume",
                   issue = "issue",
                   pageRange = "pageRange",
                   publisher = "publisher",
                   publicationPlace = "publicationPlace",
                   ISSN = "ISSN"))
setClass("article", contains = c("resourceGroup", "article_slots"))
not super elegant, I know. I'm looking for a better way to handle ordering but this is pretty simple.
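The ordering effect described above can be checked directly with slotNames(). What follows is only a minimal sketch: the class names (base_sketch, typed_direct, typed_slots, typed_ordered) are made-up placeholders rather than reml classes, and all slots are plain "character" to keep it self-contained.

```r
library(methods)

# Stand-in for resourceGroup: slots that should come first in the output.
setClass("base_sketch", slots = c(title = "character", creator = "character"))

# Naive version: the class's own slots are listed before the inherited ones.
setClass("typed_direct", contains = "base_sketch",
         slots = c(journal = "character", volume = "character"))

# Workaround: move the type-specific slots into a helper class and
# inherit both classes in the desired order.
setClass("typed_slots", slots = c(journal = "character", volume = "character"))
setClass("typed_ordered", contains = c("base_sketch", "typed_slots"))

direct_order  <- slotNames("typed_direct")
ordered_order <- slotNames("typed_ordered")
print(direct_order)   # own slots first, inherited slots last
print(ordered_order)  # inherited base slots first, then the helper's slots
```

The helper class costs one extra definition per type, but it keeps the slot order under explicit control through the order of the contains vector.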
from eml.
Ok I see. Thanks!
from eml.
What is the best approach in the following case? The publisher in article comes from responsibleParty. Should I include responsibleParty through inheritance, like this:
setClass("article_slots",
         slots = c(journal = "journal",
                   volume = "volume",
                   issue = "issue",
                   pageRange = "pageRange",
                   publicationPlace = "publicationPlace",
                   ISSN = "ISSN")
)
setClass("article",
         contains = c("resourceGroup",
                      "responsibleParty",
                      "article_slots")
)
Or should I rather use the responsibleParty class for the publisher slot? Publisher is not mentioned anywhere else, and the slot needs to be named like this, as otherwise we miss it.
setClass("article_slots",
         slots = c(journal = "journal",
                   volume = "volume",
                   issue = "issue",
                   pageRange = "pageRange",
                   publisher = "responsibleParty",
                   publicationPlace = "publicationPlace",
                   ISSN = "ISSN")
)
setClass("article",
         contains = c("resourceGroup",
                      "article_slots")
)
I would prefer the second one, but I am not exactly sure.
from eml.
@cpfaff I think you're overthinking this actually.
You want the slot names to match exactly with the elements, so article must contain a slot publisher of class publisher. That class should then simply contain responsibleParty (and nothing else). You'll see that in fact I've already defined the publisher class in this way, because it is used elsewhere: https://github.com/ropensci/reml/blob/master/R/dataset.R#L5 (yeah, dataset.R is not a good home for this, but I just put it there because I first needed this class while writing the dataset slots.)
Does that make sense?
Your second case would work if we took the node name from the parent slot, but we take it from the object itself because that is more modular (we don't need to know the parent). So, your second case would result in EML that had a <responsibleParty> node where it should have a <publisher> node.
It may look silly or verbose to have classes defined that are equivalent to responsibleParty, but it's actually very tidy this way. There are lots of kinds of responsibleParty, but we want them to have their own name.
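To illustrate why an otherwise empty subclass is useful here, a rough sketch follows. The responsibleParty definition is a simplified stand-in with a single slot, and node_name() is a hypothetical helper written for this example; it is not the function reml uses to write nodes.

```r
library(methods)

# Simplified stand-in for the real responsibleParty class.
setClass("responsibleParty", slots = c(organizationName = "character"))

# publisher adds nothing of its own; it exists so that the object
# carries the name the XML element should get.
setClass("publisher", contains = "responsibleParty")

# Hypothetical helper: derive the element name from the object's class,
# without needing to know which parent slot holds the object.
node_name <- function(object) as.character(class(object))

p <- new("publisher", organizationName = "Example Press")
print(node_name(p))              # "publisher"
print(is(p, "responsibleParty")) # TRUE
```

Because the name travels with the object, a generic writer can handle any responsibleParty-like object without a lookup table of parent slots.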
from eml.
p.s. Officially your classes should all have referencesGroup at the end of their inheritance, and have id_scope for the attribute data, even though we might not be using these slots:
setClass("article",
         contains = c("id_scope",
                      "resourceGroup",
                      "article_slots",
                      "referencesGroup"))
from eml.
Thanks @cboettig yes that makes sense. If this is a good place I do not know at the moment. Maybe there will be a better home later on. But makes sense as you describe it to be where it is at the moment and I can use it from there. Thanks for the explanations.
from eml.
I thought it would be a good idea to validate the initialization of the citation types, just to make sure the required fields are in place when the object is initialized. How can I best do this? What I have at the moment is the following (but it does not work like this):
setClass("article",
         contains = c("id_scope",
                      "resourceGroup",
                      "article_slots",
                      "referencesGroup"
         ),
         validity = function(object){
           if(all.equal(object@journal, character(0))){ # check that non empty
             return("Journal is required")
           }else{TRUE}
         })
After that I will do the mapping to BibTeX. I thought to do this with one method per citation type as signature that operates on R's toBibtex, and after that create a method that operates on citation or bibentry for signature eml. Depending on the citation type provided in the eml, the right function with the right BibTeX mapping will be called to generate the BibTeX representation. But this would require many manual decisions on which citation type to call in the citation(eml) method. Is there a more elegant way to do this? Or am I on the right track so far?
from eml.
Well, OK. The bibentry that you mention in your first post feels more native.
setMethod("bibentry",
          "article",
          function(object, bibtype){
            entry = bibentry(
              author = object@creator,
              title = object@title,
              journal = object@journal,
              year = object@pubDate,
              ....)
            entry
          }
)
from eml.
@cpfaff Nice. Yes, if you map to bibentry, we get the mapping to BibTeX for free, as well as other formats. (For instance, we can then add a citation by DOI using the knitcitations::cite function.)
You've written setMethod above (which creates a function named "bibentry" that takes an "article" as its signature), but this looks like a "coercion" method (changes types), so I think it is best written as a setAs instead. Obviously we'd want coercions both ways, from article to bibentry and the reverse. Make sense?
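As a rough illustration of the setAs suggestion, here is a sketch of a one-way coercion from a simplified article class to bibentry. The class article_sketch and its character slots are assumptions made for this example (the real article class inherits resourceGroup and uses richer slot types), and registering bibentry with setOldClass is one possible way to make the S3 class usable as a coercion target.

```r
library(methods)
library(utils)

# Register the S3 class so it can be used as a coercion target.
setOldClass("bibentry")

# Simplified, hypothetical article class with plain character slots.
setClass("article_sketch",
         slots = c(title = "character",
                   creator = "character",
                   journal = "character",
                   pubDate = "character"))

# Coercion from the S4 sketch class to a bibentry of type "Article".
setAs("article_sketch", "bibentry", function(from) {
  bibentry(bibtype = "Article",
           author  = as.person(from@creator),
           title   = from@title,
           journal = from@journal,
           year    = from@pubDate)
})

a <- new("article_sketch",
         title = "A title", creator = "Jane Doe",
         journal = "Some Journal", pubDate = "2013")
b <- as(a, "bibentry")
print(toBibtex(b))
```

The reverse coercion would be a second setAs call with the two class names swapped; where bibentry fields have no counterpart in the EML type, that mapping would have to decide what to drop.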
from eml.
@cboettig OK, good, then I am set up to go on and hopefully finish this within the next few days. What do you think about checking that the required fields in bibtypes are in place on initialization? I added a question in the post before. What is displayed there does not really work, and I am not sure why it does not work like this. Maybe it is completely senseless to do this checking, but if it is good to have I would like to know how best to do it.
from eml.
@cpfaff yeah I meant to comment on the validation step. I haven't taken a
close look at your validation code, but in general I do not think it is
necessary for us to write validation methods, since:
a) We can already validate against the schema itself using eml_validate(),
so there is no need to also validate the S4
b) While validation is important and a great strength of using EML, it
doesn't really make sense from the user's perspective to have the function
fail because they are trying to read in some EML that isn't technically
valid (e.g. missing some required field, etc). In such cases, it seems the
software should try and do its best with what it has (maybe with a
warning) rather than fail anyhow.
For more discussion of this, see past issues:
https://github.com/ropensci/reml/issues/7 and
https://github.com/ropensci/reml/issues/46 and feel free to weigh in on
those threads.
On Mon, Dec 9, 2013 at 8:18 AM, cpfaff [email protected] wrote:
> @cboettig Ok good then I am set up to go on and hopefully finish this
> within the next few days. What do you think about checking that required
> fields in bibtypes are in place on initialization? I added a question in
> the post before. As what is displayed there does not really work and I am
> not sure why it now works like this. Maybe it is completely senseless to
> do this checking but if it is good to have there I would like to know how
> to best do this.
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/
from eml.
@cpfaff Okay, integrated. Go ahead and pull from master to update your branch. Had to make a few minor changes:
- I updated the organization of file types, so I had to update your @include directives (which were correct based on the old version, so nice work).
- You had a few of your classes defined with Type as part of the class name, e.g. citation = citationType. Technically that makes sense, but because (as I mentioned earlier) we base the EML node name on the class name instead of the slot name (because that way we can write a node without knowing who its parent is), we would get the wrong name for the node (e.g. we want <citation>, not <citationType>). I just dropped the Type parts of all these names.
Anyway, nice work. Before we get any deeper on the literature module, we'll want some test functions for this. If you aren't familiar with writing unit tests with testthat, read up a bit and check out the tests in inst/tests. (You can run them with, e.g. test_file("inst/tests/test_data.set.R"), or run (almost*) all of them with test_dir("inst/tests").)
Some of the tests use optional libraries you may not have installed. (The functions may try to install these, but no guarantees.)
Once we have a test suite that covers most of the class definitions (read an EML file with these types, write an EML file, and validate the file you write against eml_validate), you'll be in good shape to extend the module further (BibTeX etc.).
* Tests whose name doesn't start with test_ are not run by test_dir. This is handy to exclude tests like knb_test.R, which needs user intervention to get the dataone certificate...
from eml.
Thanks for the integration work and fixing the Type in the names. I already have a bit of experience in writing tests and have all the libraries in place. I tried this with our "rbefdata" package and most of the time it worked well. So I will try to set them up before I go deeper into the citation module. Further questions are highly likely to arise in the process. ;-)
> a) We can already validate against the schema itself using eml_validate(), so there is no need to also validate the S4
Ok, I already thought about this, so I will just remove the one consistency check I have (see commit above). But just in case I would like to do this: how would I check consistency in this situation, as the example above does not work to check whether the slot has an empty string?
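For what it is worth, one likely reason the earlier check fails is that all.equal() returns a character description of the differences rather than FALSE when its arguments differ, and if() cannot interpret that string as a logical. A sketch of a check that avoids this is shown below; the class article_check is a made-up single-slot example, not the real article class.

```r
library(methods)

# Hypothetical single-slot class used only to demonstrate the validity check.
setClass("article_check",
         slots = c(journal = "character"),
         validity = function(object) {
           # Treat both a zero-length vector and an empty string as missing.
           if (length(object@journal) == 0 || !nzchar(object@journal[1])) {
             "Journal is required"
           } else {
             TRUE
           }
         })

# Supplying the slot triggers validation inside new().
ok <- new("article_check", journal = "Ecology")

# new() without arguments skips validation, so validate explicitly.
msg <- tryCatch(validObject(new("article_check")),
                error = function(e) conditionMessage(e))
print(msg)
```

If the all.equal() form is preferred, wrapping it as isTRUE(all.equal(x, y)) gives a proper logical for if().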
from eml.
Here comes the first question. There is no citation in the EML file that is used to test reading EML. Can I add one manually, or should we rather use another EML file with a citation? Many of the tests fail for me at the moment, as something is wrong with my setup, I think. For example, the system.file() call gives me back an empty string and no file path, which will make the following steps fail as well if they require the EML file to be read in. I will have to check this out first before I can start writing tests.
from eml.
@cpfaff Right, you'll need to create an example EML file with some citation entries. You can either find a real example (e.g. on the KNB), or just create one using your module.
Re: system.file, make sure you have already installed the current version of reml first? (e.g. install() from devtools or R CMD INSTALL?)
from eml.
@cboettig Thanks. Meanwhile I am already a step further. The tests work now and I am playing around with an example file with a citation. I added the citation to the eml class below dataset, and it nicely reads and writes my citation elements. That is so cool. But I still need to write the tests.
from eml.
I think I am a bit stuck and have a few questions on how exactly to go on with the literature module, besides the testing that still needs to be done. Actually I like the idea of test-driven development, but I find it hard to first write the tests and then the code. So I just went a bit further in the code so that I have something to test. The first question is how best, or actually where exactly, to integrate the literature module into the rest of the classes. Citations can appear in various different places and this confuses me a bit.
The first thing I did was to add the citation at the topmost level in the eml class, which worked fine, as I wrote yesterday:
setClass("eml",
         slots = c(packageId = "character",
                   system = "character",
                   scope = "character",
                   dataset = "dataset",
                   citation = "citation",
                   # software = "software",
                   # protocol = "protocol",
                   additionalMetadata = "ListOfadditionalMetadata",
                   namespaces = "character",
                   dirname = "character"),
         # slots 'namespaces' and 'dirnames' are for internal use
         # only and not written as XML child elements.
         prototype = prototype(namespaces = eml_namespaces))
Now I tried to figure out in which classes to add the citation = "citation" slot as well, so it can be recognized when we turn the EML into an S4 representation and vice versa, but I am not sure where best to look this up. I did not find this in the eml HTML text documentation of the literature module, or maybe I just overlooked it. Maybe I have to look this up in another way, for example by deriving it from the graphics that are provided in the documentation. If so, I am not exactly sure how.
from eml.
@cpfaff Good question. citation is used throughout the schema. For classes we haven't yet implemented (e.g. protocol), we'll just be able to include it when we write the rest of the class definition (you'll see it in protocol/proceduralStep/citation). For classes that we have implemented, ideally I would have made some note in the class definition where I was unable to complete the definition without the module, but I haven't always done so (particularly in the beginning).
Yeah, I do a mix of browsing the documentation and browsing the images, but neither is a good way to find all the places the "CitationType" is used. For that, you might just want to grep against the .xsd definitions directly:
grep CitationType *.xsd
eml-attribute.xsd: <xs:element name="citation" type="cit:CitationType">
eml-coverage.xsd: <xs:element name="timeScaleCitation" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-coverage.xsd: <xs:element name="classificationSystemCitation" type="cit:CitationType">
eml-coverage.xsd: <xs:element name="identificationReference" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-literature.xsd: <xs:element name="citation" type="CitationType">
eml-literature.xsd: <xs:complexType name="CitationType">
eml-methods.xsd: <xs:element name="citation" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-methods.xsd: <xs:element name="citation" type="cit:CitationType">
eml-physical.xsd: <xs:element name="citation" type="cit:CitationType" minOccurs="0">
eml-project.xsd: <xs:element name="citation" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-project.xsd: <xs:element name="citation" type="cit:CitationType" minOccurs="0">
eml-project.xsd: <xs:element name="citation" type="cit:CitationType" minOccurs="0">
eml.xsd: <xs:element name="citation" type="cit:CitationType">
And we see everywhere it is used. Notice that sometimes the element name is citation, while elsewhere we have to define a new class because the element name is actually something like identificationReference. So that's probably the best answer to your question.
You can search for a list of places where citation has already been used, e.g. at the command line in the R directory:
grep '"citation"' *.R
shows:
cboettig@strata:~/Documents/code/ropensci/reml/R$ grep '"citation"' *
coverage.R: slots = c(classificationSystemCitation = "citation",
coverage.R: identificationReference = "citation",
eml.R:# citation = "citation",
literature.R:setClass("citation",
literature.R: slots = c(citation = "citation")
literature.R:# setMethod("citation", "eml",
literature.R: # slots = c(classificationSystemCitation = "citation",
Here we see that I've used it in coverage, and also that I've made a mistake: I should have defined corresponding classes that inherit citation (recall class names determine element names). This will create a node named citation when we want a node named identificationReference, e.g.:
t <- new("taxonomicSystem")
t@identificationReference@article@title = "the title" # So that the entry isn't empty
> reml:::S4Toeml(t)
<taxonomicSystem>
<citation>
<article scope="document">
<title>the title</title>
</article>
</citation>
</taxonomicSystem>
So that needs to be fixed. Note in testing this I saw a few errors you'll want to address:
- You haven't defined the XMLElementNode coercion methods for the citation class itself. (Pull from master because I just added this case.)
- You've copied a mistake I made earlier in a few places: you've defined both coercions to use emlToS4. When going from the S4 class to EML, we actually want the method S4Toeml. (This error wasn't actually causing any mistakes because I wasn't calling the coercion method in the recursion, I was calling S4Toeml directly. I've now fixed this to rely on the defined coercion method though.)
So, your coercions should look like:
setAs("article",
"XMLInternalElementNode",
function(from) S4Toeml(from)
)
setAs("XMLInternalElementNode",
"article",
function(from) emlToS4(from)
)
Again, I've fixed article but not the rest, so pull from master and then send me a pull request with these fixed up. Sorry for the confusion.
from eml.
Thanks very much for the detailed answer, which was quite helpful. Grepping against the .xsd is a good idea. I had already noticed the mistake myself and fixed it for all coercions in the module yesterday evening. I also already had the coercions for citation in place, but I have not pushed the commits yet. I will pull from master and resolve the conflicts, if any, before I send you a small pull request.
from eml.
Ok seems that code boxes do not work in commit messages in reference thread, bummer. That would be so cool.
from eml.
A question. There is something that confuses me a bit and where I need some input. We have the creator from resource group, which takes a ListOfcreator. In edited books the editors should be listed in the creator field. But as said already, the slot expects a ListOfcreator. So the separate person type of editors, with a ListOfeditor, cannot be placed in there but should be. How best to realize this? Defining two types for a slot like this does not work, as it generates creator1 and creator2 for the different types:
"creator" = c("ListOfcreator", "ListOfeditor"),
from eml.
Well, I just found the solution with class unions. That is cool:
setClassUnion("ListOfcreatorOreditor", c("ListOfcreator", "ListOfeditor"))
...
"creator" = "ListOfcreatorOreditor",
from eml.
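The class-union approach above can be sketched end to end as follows. The two list classes are simplified stand-ins that merely extend list, and editedBook_sketch is a made-up class name; only the setClassUnion line is taken from the thread.

```r
library(methods)

# Simplified stand-ins for the real list classes.
setClass("ListOfcreator", contains = "list")
setClass("ListOfeditor", contains = "list")

# A slot typed with the union accepts an object of either member class.
setClassUnion("ListOfcreatorOreditor", c("ListOfcreator", "ListOfeditor"))

# Hypothetical class with a single creator slot typed by the union.
setClass("editedBook_sketch", slots = c(creator = "ListOfcreatorOreditor"))

by_author <- new("editedBook_sketch",
                 creator = new("ListOfcreator", list("A. Author")))
by_editor <- new("editedBook_sketch",
                 creator = new("ListOfeditor", list("E. Editor")))

# Anything outside the union is rejected when the object is created.
rejected <- tryCatch(new("editedBook_sketch", creator = "plain string"),
                     error = function(e) e)

print(as.character(class(by_editor@creator)))
print(inherits(rejected, "error"))
```

The slot keeps a single name, and the stored object still remembers whether it is a list of creators or of editors.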
Looks like we have the literature module mostly in place, great work @cpfaff. This issue also raises the idea of adding and extracting a "canonical" citation (e.g. what people should cite when using the data). We have the function eml_get(eml, "citation_info") (or the method citation_info), which extracts a citation from an EML form based on creators, title, year, and publisher data from the dataset, but that's not quite ideal. I'm looking for something more standard that could house a journal article the dataset creators wanted cited, etc.; see https://projects.ecoinformatics.org/ecoinfo/issues/6283. As this issue has already covered a lot of stuff and has almost 30 comments, I will open this "canonical citation" as a new issue. I think we can close this one.
from eml.