
Comments (29)

cboettig avatar cboettig commented on July 18, 2024

@mbjones When we push a dataset to a repository such as figshare and receive a DOI, where should we record it? In <eml><dataset><publisher>...<something>, under <eml><additionalMetadata>..., in <dataset><additionalIdentifier>, or as an id attribute on some node?

Can you point to a good example of this?

from eml.

mbjones avatar mbjones commented on July 18, 2024

If it's the main identifier for the EML data package, it should be put in the <eml @packageId> attribute, which is the designated location for the package identifier. If it is a secondary identifier for the package (i.e., not the main one you want cited, but one that should also be synonymous with the package), then it could go into <dataset>/<additionalIdentifier>.


cboettig avatar cboettig commented on July 18, 2024

Perfect, thanks. Is there a way to denote that the identifier is a DOI? (E.g. a
namespace for the identifier?)




cpfaff avatar cpfaff commented on July 18, 2024

I am dealing with the literature module at the moment and stumbled upon something where I need input. The edited book citation type is the same as the book citation type, with the difference that there are editors who edited the book, whereas the chapters can have different authors. The documentation says that the editors need to go into the "creators" field. But the book citation type does not have a "creators" field. Am I missing something?


cboettig avatar cboettig commented on July 18, 2024

@cpfaff Every citation type inherits the ResourceGroup, which provides creator (and title, and lots of other stuff)

For an editedBook, you would probably use creator responsibleParty elements with different positionName values, either "editor" or "author" as appropriate.

I wouldn't worry about that at this stage. We first want to create the S4 class equivalent of the schema. For that purpose, editedBook should just inherit the book class:

setClass("editedBook", contains="book")

And make sure book contains resourceGroup, as well as its unique slots. Point me to your code soon so I can give some feedback before you get too deep.
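As a rough illustration of the inheritance chain described above, here is a minimal sketch. The slot lists are heavily reduced, hypothetical stand-ins (the real resourceGroup and book classes have many more slots with their own types), and slot ordering is ignored here.

```r
library(methods)

# Hypothetical, reduced stand-ins for the real reml classes.
setClass("resourceGroup",
         slots = c(title = "character", creator = "character"))

# book contains resourceGroup plus its own (here: invented subset of) slots.
setClass("book",
         contains = "resourceGroup",
         slots = c(edition = "character", volume = "character"))

# editedBook adds nothing of its own at this stage; it just inherits book.
setClass("editedBook", contains = "book")

eb <- new("editedBook",
          title = "An edited volume",
          creator = c("A. Editor", "B. Editor"))

is(eb, "book")           # editedBook is a book ...
is(eb, "resourceGroup")  # ... and therefore also a resourceGroup
slotNames(eb)            # inherited slots are available on the subclass
```

The point is only that editedBook picks up title, creator and the book slots without redeclaring them.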

Once we have the S4 classes in place, we will write methods that map other citation objects into these types. For instance, R already has a native citation class (based on Bibtex). It won't map perfectly into EML, but we'll do our best to make a reasonable mapping. (That stage is where we actually will worry about where we put editors vs authors, etc)


cpfaff avatar cpfaff commented on July 18, 2024

@cboettig Ok thanks that sounds good. Here is what I did today. Not much but a beginning.


cboettig avatar cboettig commented on July 18, 2024

@cpfaff Nice work, thanks for the link.

You may know this already, but you'll want all of the citation classes to inherit resourceGroup. This is a bit annoying, as the resourceGroup slots should be listed before the type-specific slots, since order matters in the schema. E.g. you might think you can just add contains = "resourceGroup" to the class definition, but this gets the wrong ordering, with those slots coming last (e.g. in the order given by slotNames(new("article"))). So you have to do something like this:

    setClass("A", slots = c(slot1 = "character", slot2 = "character"))
    setClass("B_slots", slots = c(Bslot1 = "character", Bslot2 = "character"))
    setClass("B", contains = c("A", "B_slots"))

e.g. for article:

setClass("article_slots",
         slots = c(journal = "journal",
                   volume = "volume",
                   issue = "issue",
                   pageRange = "pageRange",
                   publisher = "publisher",
                   publicationPlace = "publicationPlace",
                   ISSN = "ISSN"))
setClass("article", contains = c("resourceGroup", "article_slots"))

not super elegant, I know. I'm looking for a better way to handle ordering but this is pretty simple.
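To see the ordering issue for yourself, a small sketch along these lines should do. All class names are hypothetical demo names, not classes from the package; it simply compares slotNames() for the naive definition and for the "_slots" workaround.

```r
library(methods)

# Stand-in for resourceGroup.
setClass("A_demo", slots = c(slot1 = "character", slot2 = "character"))

# Naive version: own slots plus contains. The inherited slots end up last.
setClass("naive_demo",
         contains = "A_demo",
         slots = c(Bslot1 = "character", Bslot2 = "character"))

# Workaround: move the type-specific slots into a helper class and
# inherit from both, listing the classes in the order the schema wants.
setClass("B_demo_slots",
         slots = c(Bslot1 = "character", Bslot2 = "character"))
setClass("B_demo", contains = c("A_demo", "B_demo_slots"))

naive_order <- slotNames("naive_demo")
fixed_order <- slotNames("B_demo")

print(naive_order)  # type-specific slots first, inherited slots after
print(fixed_order)  # slots follow the order of the classes in contains
```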


cpfaff avatar cpfaff commented on July 18, 2024

Ok I see. Thanks!


cpfaff avatar cpfaff commented on July 18, 2024

What is best in the following case? Publisher in article comes from responsibleParty. Should I do it this way,
where I include responsibleParty through inheritance:

setClass("article_slots",
         slots = c(journal = "journal",
                   volume = "volume",
                   issue = "issue",
                   pageRange = "pageRange",
                   publicationPlace = "publicationPlace",
                   ISSN = "ISSN")
         )

setClass("article",
         contains = c("resourceGroup",
                            "responsibleParty",
                            "article_slots")
         )

Or should I rather use the responsibleParty class as the type of the publisher slot? Publisher is mentioned nowhere else, and the slot needs to be named like this, as otherwise we lose it.

setClass("article_slots",
         slots = c(journal = "journal",
                   volume = "volume",
                   issue = "issue",
                   pageRange = "pageRange",
                   publisher = "responsibleParty", 
                   publicationPlace = "publicationPlace",
                   ISSN = "ISSN")
         )

setClass("article",
         contains = c("resourceGroup",
                            "article_slots")
         )

I would prefer the second one, but I am not entirely sure.


cboettig avatar cboettig commented on July 18, 2024

@cpfaff I think you're overthinking this actually.

You want the slot names to match exactly with the elements, so article must contain a slot publisher of class publisher. That class should then simply contain responsibleParty (and nothing else). You'll see that in fact I've already defined the publisher class in this way, because it is used elsewhere: https://github.com/ropensci/reml/blob/master/R/dataset.R#L5 (yeah, dataset.R is not a good home for this, but I just put it there because I first needed this class while writing the dataset slots.)

Does that make sense?

Your second case would work if we took the node name from the parent slot, but we take it from the object itself because that is more modular (we don't need to know the parent). So, your second case would result in EML that had a <responsibleParty> node where it should have a <publisher> node.

It may look silly or verbose to have classes defined that are equivalent to responsibleParty, but it's actually very tidy this way. There are lots of kinds of responsibleParty, but we want each of them to have its own name.
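A toy version of this idea, in case it helps. The helper to_xml_string() is purely hypothetical (it is not the package's S4Toeml), and responsibleParty is reduced to a single slot; the sketch only shows how taking the node name from the object's class gives <publisher> without knowing the parent.

```r
library(methods)

# Reduced stand-in for the real responsibleParty class.
setClass("responsibleParty", slots = c(organizationName = "character"))

# publisher is "just" a responsibleParty under a different class name.
setClass("publisher", contains = "responsibleParty")

# Hypothetical helper: the element name comes from the object's own class,
# so it can be written without any knowledge of the parent slot.
to_xml_string <- function(obj) {
  node <- as.character(class(obj))[1]
  sprintf("<%s>%s</%s>", node, obj@organizationName, node)
}

pub <- new("publisher", organizationName = "figshare")
rp  <- new("responsibleParty", organizationName = "figshare")

to_xml_string(pub)  # node named after the publisher class
to_xml_string(rp)   # same data, but the node would be named responsibleParty
```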


cboettig avatar cboettig commented on July 18, 2024

p.s. Officially your classes should all have referencesGroup at the end of their inheritance, and have id_scope for the attribute data, even though we might not be using these slots:

setClass("article",
         contains = c("id_scope",
                            "resourceGroup",
                            "article_slots",
                            "referencesGroup"))


cpfaff avatar cpfaff commented on July 18, 2024

Thanks @cboettig, yes, that makes sense. Whether dataset.R is a good place I do not know at the moment; maybe there will be a better home later on. But it makes sense, as you describe it, for it to be where it is for now, and I can use it from there. Thanks for the explanations.


cpfaff avatar cpfaff commented on July 18, 2024

I thought it would be a good idea to validate the citation types on initialization, just to make sure the required fields are in place when the object is created. How can I best do this? What I have at the moment is (but it
does not work like this):

setClass("article",
         contains = c("id_scope",
                      "resourceGroup",
                      "article_slots",
                      "referencesGroup"
                      ),
         validity = function(object){
               if(all.equal(object@journal, character(0))){ # check that non empty
                     return("Journal is required")
               }else{TRUE}
         })

After that I will do the mapping to BibTeX. I thought to do this with one method per citation type as signature that operates on R's toBibtex, and after that create a method that operates on citation or bibentry with eml as signature. Depending on the citation type provided in the EML, the right function with the right BibTeX mapping will be called to generate the BibTeX representation. But this would require many manual decisions on which citation type to call in the citation(eml) method. Is there a more elegant way to do this? Or am I on the right track so far?


cpfaff avatar cpfaff commented on July 18, 2024

Well, ok. The bibentry approach that you mention in your first post feels more native.

setMethod("bibentry",
          "article",
          function(object, bibtype){
              entry =  bibentry(
                author = object@creator,
                title = object@title,
                journal = object@journal,
                year = object@pubDate,
                ....)
              entry

          }

          )


cboettig avatar cboettig commented on July 18, 2024

@cpfaff Nice. Yes, if you map to bibentry, we get the mapping to BibTeX for free, as well as other formats. (For instance, we can then add a citation by DOI using the knitcitations::cite function.)

You've written setMethod above (this creates a function named "bibentry" that takes an "article" as its signature), but this looks like a coercion method (it changes types), so I think it is best written as a setAs instead. Obviously we'd want coercions both ways, from article to bibentry and the reverse. Make sense?
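Roughly, the setAs version could look like the sketch below. It assumes a flattened, hypothetical articleDemo class with plain character slots (the real article class nests its fields and uses creator objects), and it registers the S3 class bibentry with setOldClass so that setAs can refer to it.

```r
library(methods)
library(utils)

# Register the S3 class "bibentry" so S4 coercion methods can name it.
setOldClass("bibentry")

# Hypothetical, flattened stand-in for the real article class.
setClass("articleDemo",
         slots = c(title = "character", creator = "character",
                   journal = "character", pubDate = "character"))

# articleDemo -> bibentry
setAs("articleDemo", "bibentry", function(from) {
  bibentry(bibtype = "Article",
           author  = as.person(from@creator),
           title   = from@title,
           journal = from@journal,
           year    = from@pubDate)
})

# bibentry -> articleDemo
setAs("bibentry", "articleDemo", function(from) {
  new("articleDemo",
      title   = from$title,
      creator = format(from$author),
      journal = from$journal,
      pubDate = from$year)
})

art <- new("articleDemo", title = "the title", creator = "Carl Boettiger",
           journal = "Ecology", pubDate = "2013")
bib  <- as(art, "bibentry")     # S4 object to bibentry
back <- as(bib, "articleDemo")  # and back again
toBibtex(bib)                   # BibTeX output comes from the bibentry for free
```

With both directions defined as coercions, callers only need as(x, "bibentry") or as(x, "articleDemo") and never have to pick a type-specific function by hand.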


cpfaff avatar cpfaff commented on July 18, 2024

@cboettig Ok, good, then I am set up to go on and hopefully finish this within the next few days. What do you think about checking that required fields in bibtypes are in place on initialization? I added a question in the post before, as what is displayed there does not really work and I am not sure why it does not work like this. Maybe it is completely pointless to do this checking, but if it is good to have, I would like to know how best to do it.


cboettig avatar cboettig commented on July 18, 2024

@cpfaff yeah I meant to comment on the validation step. I haven't taken a
close look at your validation code, but in general I do not think it is
necessary for us to write validation methods, since:

a) We can already validate against the schema itself using eml_validate(),
so there is no need to also validate the S4
b) While validation is important and a great strength of using EML, it
doesn't really make sense from the user's perspective to have the function
fail because they are trying to read in some EML that isn't technically
valid (e.g. missing some required field, etc). In such cases, it seems the
software should try to do its best with what it has (maybe with a
warning) rather than fail anyhow.

For more discussion of this, see past issues:
https://github.com/ropensci/reml/issues/7 and
https://github.com/ropensci/reml/issues/46 and feel free to weigh in on
those threads.



cboettig avatar cboettig commented on July 18, 2024

@cpfaff Okay, integrated. go ahead and pull from master to update your branch. Had to make a few minor changes:

  • I updated the organization of file types, so I had to update your @include directives (which were correct based on the old version, so nice work).
  • You had a few of your classes defined with Type as part of the class name, e.g. citation = citationType. Technically that makes sense, but because (as I mentioned earlier) we base the EML node name on the class name instead of the slot name (because that way we can write a node without knowing who its parent is), we would get the wrong name for the node (e.g. we want <citation> not <citationType>). I just dropped the Type parts of all these names.

Anyway, nice work. Before we get any deeper on the literature module, we'll want some test functions for this. If you aren't familiar with writing unit tests with testthat, read up a bit and check out the tests in inst/tests. (You can run them with, e.g., test_file("inst/tests/test_data.set.R"), or run (almost*) all of them with test_dir("inst/tests").)

Some of the tests use optional libraries you may not have installed. (The functions may try to install these, but no guarantees).

Once we have a test suite that covers most of the class definitions (read an EML file with these types, write an EML file, and validate the file you write against eml_validate), you'll be in good shape to extend the module further (bibtex etc).

* Tests whose names don't start with test_ are not run by test_dir. This is handy to exclude tests like knb_test.R, which needs user intervention to get the dataone certificate...


cpfaff avatar cpfaff commented on July 18, 2024

@cboettig

Thanks for the integration work and for fixing the Type in the names. I already have a bit of experience writing tests and have all the libraries in place. I tried this with our "rbefdata" package and most of the time it worked well. So I will try to set them up before I go deeper into the citation module. Further questions are highly likely to arise in the process. ;-)

> a) We can already validate against the schema itself using eml_validate(), so there is no need to also validate the S4

Ok, I already thought about this, so I will just remove the one consistency check I have (see commit above). But just in case I would like to do this: how would I check consistency in this situation, as the example above does not work to check whether the slot has an empty string?
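One likely reason the earlier check fails: all.equal() returns TRUE on a match but a character description on a mismatch, and if() cannot interpret that string as a logical. A minimal sketch of a check that avoids this is below; it assumes a hypothetical articleDemo class whose journal slot is a plain character vector, which is simpler than the real class.

```r
library(methods)

# Hypothetical stand-in: journal is a plain character slot here.
setClass("articleDemo",
         slots = c(journal = "character"),
         validity = function(object) {
           # length() and nzchar() yield a plain TRUE/FALSE,
           # unlike all.equal(), which returns text on a mismatch.
           if (length(object@journal) == 0 || !nzchar(object@journal[1])) {
             "Journal is required"
           } else {
             TRUE
           }
         })

# A filled-in journal passes validation.
ok <- new("articleDemo", journal = "Ecology")

# An empty string is rejected; capture the error message instead of stopping.
msg <- tryCatch({
  new("articleDemo", journal = "")
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

Note that new("articleDemo") with no slot arguments does not run the validity function at all; calling validObject() on the result would.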


cpfaff avatar cpfaff commented on July 18, 2024

Here come the first questions. There is no citation in the EML file that is used to test reading EML. Can I add one manually, or is it better to use another EML file with a citation? Many of the tests fail for me at the moment, as something is wrong with my setup, I think. For example, the system.file() call gives me back an empty string instead of a file path, which makes the following steps fail as well if they require the EML file to be read in. I will have to check this out first before I can start writing tests.


cboettig avatar cboettig commented on July 18, 2024

@cpfaff Right, you'll need to create an example EML file with some citation entries. You can either find a real example (e.g. on the KNB), or just create one using your module.

Re: system.file, make sure you have already installed the current version of reml first? (e.g. install() from devtools or R CMD INSTALL?)


cpfaff avatar cpfaff commented on July 18, 2024

@cboettig Thanks. Meanwhile already a step further. The tests work now and I am playing around with an example file with citation. Added the citation to eml class below dataset and it nicely reads and writes my citation elements. That is so cool. But I still need to write the tests.


cpfaff avatar cpfaff commented on July 18, 2024

I think I am a bit stuck and have a few questions on how exactly to go on with the literature module, besides the testing that still needs to be done. Actually, I like the idea of test-driven development but find it hard to write the tests first and then the code. So I just went a bit further in the code so that I have something to test. The first question is how best, or actually where exactly, to integrate the literature module into the rest of the classes. Citations can appear in various different places and this confuses me a bit.

The first thing I did was to add the citation on the top most level in the eml class which worked fine as I have written yesterday:

setClass("eml",
         slots = c(packageId   = "character",
                        system      = "character",
                        scope       = "character",
                        dataset     = "dataset", 
                        citation    = "citation",
#                        software    = "software",
#                        protocol    = "protocol",
                        additionalMetadata = "ListOfadditionalMetadata",
                        namespaces = "character",
                        dirname = "character"),
         # slots 'namespaces' and 'dirnames' are for internal use
         # only and not written as XML child elements.
         prototype = prototype(namespaces = eml_namespaces))

Now I tried to figure out in which classes to add the citation = "citation" as well, so it can be recognized when we turn the eml into an s4 representation and vice versa, but I am not sure where to look this up best. I did not find this in the eml html text documentation of the literature module or maybe I just overlooked it. Maybe I have to look this up in another way like for example derive it from the graphics that are provided in the documentation. If so I am not exactly sure how.


cboettig avatar cboettig commented on July 18, 2024

@cpfaff Good question. citation is used throughout the schema. For classes we haven't yet implemented (e.g. protocol), we'll just be able to include it when we write the rest of the class definition (you'll see it in protocol/proceduralStep/citation). For classes that we have implemented, ideally I would have made some note in the class definition where I was unable to complete the definition without the module, but I haven't always done so (particularly in the beginning).

Yeah, I do a mix of browsing the documentation and browsing the images, but neither is a good way to find all the places the "CitationType" is used. For that, you might just want to grep against the .xsd definitions directly:

 grep CitationType *.xsd
eml-attribute.xsd:                      <xs:element name="citation" type="cit:CitationType">
eml-coverage.xsd:            <xs:element name="timeScaleCitation" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-coverage.xsd:                    <xs:element name="classificationSystemCitation" type="cit:CitationType">
eml-coverage.xsd:              <xs:element name="identificationReference" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-literature.xsd:  <xs:element name="citation" type="CitationType">
eml-literature.xsd:  <xs:complexType name="CitationType">
eml-methods.xsd:            <xs:element name="citation" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-methods.xsd:          <xs:element name="citation" type="cit:CitationType">
eml-physical.xsd:                    <xs:element name="citation" type="cit:CitationType" minOccurs="0">
eml-project.xsd:                    <xs:element name="citation" type="cit:CitationType" minOccurs="0" maxOccurs="unbounded">
eml-project.xsd:              <xs:element name="citation" type="cit:CitationType" minOccurs="0">
eml-project.xsd:              <xs:element name="citation" type="cit:CitationType" minOccurs="0">
eml.xsd:          <xs:element name="citation" type="cit:CitationType">

And we see everywhere it is used. Notice that sometimes the element name is citation while elsewhere we have to define a new class because the element name is actually something like identificationReference. So that's probably the best answer to your question.

You can search for a list of places where citation has already been used, e.g. at the command line in the R directory:

grep '"citation"' *.R

shows:

cboettig@strata:~/Documents/code/ropensci/reml/R$ grep '"citation"' *
coverage.R:         slots = c(classificationSystemCitation = "citation",
coverage.R:                        identificationReference = "citation",
eml.R:#                        citation    = "citation",
literature.R:setClass("citation",
literature.R:         slots = c(citation = "citation")
literature.R:# setMethod("citation", "eml",
literature.R:         # slots = c(classificationSystemCitation = "citation",

Here we see that I've used it in coverage, and also that I've made a mistake -- I should have defined corresponding classes that inherit citation (recall class names determine element names). This will create a node named citation when we want a node named identificationReference, e.g.:

t <- new("taxonomicSystem")
t@identificationReference@article@title = "the title"  # So that the entry isn't empty
> reml:::S4Toeml(t)
  <taxonomicSystem>
    <citation>
      <article scope="document">
        <title>the title</title>
      </article>
    </citation>
  </taxonomicSystem>

So that needs to be fixed. Note in testing this I saw a few errors you'll want to address:

  • You haven't defined the XMLElementNode coercion methods for the citation class itself. (Pull from master because I just added this case.)
  • You've copied a mistake I made earlier in a few places: you've defined both coercions to use emlToS4. When going from the S4 class to EML, we actually want the method S4Toeml. (This error wasn't actually causing any mistakes because I wasn't calling the coercion method in the recursion, I was calling S4Toeml directly. I've now fixed this to rely on the defined coercion method though.)

So, your coercions should look like:

setAs("article",
      "XMLInternalElementNode",
      function(from) S4Toeml(from)
      )

setAs("XMLInternalElementNode",
      "article",
      function(from) emlToS4(from)
      )

Again, I've fixed article but not the rest, so pull from master and then send me a pull request with these fixed up. Sorry for the confusion.


cpfaff avatar cpfaff commented on July 18, 2024

Thanks very much for the detailed answer, which was quite helpful. Grepping against the .xsd is a good idea. I noticed the mistake myself already and fixed it for all coercions in the module yesterday evening. I also already had the coercions for citation in place, but I had not pushed the commits yet. I will pull master and resolve the conflicts, if any, before I send you a small pull request.


cpfaff avatar cpfaff commented on July 18, 2024

Ok seems that code boxes do not work in commit messages in reference thread, bummer. That would be so cool.


cpfaff avatar cpfaff commented on July 18, 2024

A question. There is something that confuses me a bit and where I need some input. We have the creator slot from resourceGroup, which takes a ListOfcreator. In edited books the editors should be listed in the creator field. But as said already, the slot expects a ListOfcreator, so the separate editor person type with a ListOfeditor cannot be placed in there, although it should be. How best to realize this? Defining two types for the slot like this does not work,
as it generates creator1 and creator2 for the different types:

 "creator" = c("ListOfcreator", "ListOfeditor"),


cpfaff avatar cpfaff commented on July 18, 2024

Well just found the solution with class unions. That is cool:

setClassUnion("ListOfcreatorOreditor", c("ListOfcreator", "ListOfeditor"))
...
"creator" = "ListOfcreatorOreditor",

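For anyone reading along, a self-contained sketch of the class-union approach mentioned above might look like this. The list classes are simplified stand-ins (plain lists of strings rather than lists of creator or editor objects), and editedBookDemo is a hypothetical class name.

```r
library(methods)

# Simplified stand-ins for the real list classes.
setClass("ListOfcreator", contains = "list")
setClass("ListOfeditor",  contains = "list")

# A slot typed with the union accepts an object of either member class.
setClassUnion("ListOfcreatorOreditor", c("ListOfcreator", "ListOfeditor"))

setClass("editedBookDemo",
         slots = c(creator = "ListOfcreatorOreditor"))

with_editors  <- new("editedBookDemo",
                     creator = new("ListOfeditor", list("E. Ditor")))
with_creators <- new("editedBookDemo",
                     creator = new("ListOfcreator", list("A. Uthor")))

# Anything outside the union is still rejected when the object is created.
rejected <- tryCatch({
  new("editedBookDemo", creator = "just a string")
  FALSE
}, error = function(e) TRUE)
```

The slot keeps its single name, creator, which is what the schema expects, while the class of the value records whether it holds editors or creators.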

cboettig avatar cboettig commented on July 18, 2024

Looks like we have the literature module mostly in place, great work @cpfaff. This issue also raises the idea of adding and extracting a "canonical" citation (e.g. what people should cite when using the data). We have the function eml_get(eml, "citation_info") (or the method citation_info), which extracts a citation from an EML file based on creators, title, year, and publisher data from the dataset, but that's not quite ideal. We are looking for something more standard that could house a journal article the dataset creators want cited, etc.; see https://projects.ecoinformatics.org/ecoinfo/issues/6283. As this issue has already covered a lot of ground and has almost 30 comments, I will open this "canonical citation" as a new issue. I think we can close this one.

