
neotoma's Introduction

neotoma

[Badges: build status, codecov.io coverage, RStudio mirror downloads, CRAN version, NSF-1550707, NSF-1948926]

NOTE The neotoma package has now been deprecated. Unfortunately our wnapi.neotomadb.org server reached end of life and is no longer operable. The package has been replaced by neotoma2. As much as possible we have kept the same patterns, but there are new data structures. The wnapi server only served data submitted before June 2020; the new package accesses all data up to the present. Please contact us directly, either by email ([email protected]) or on our Slack workspace (there is an it_r channel), and we can help you migrate to neotoma2.

The neotoma package is a programmatic R interface to the Neotoma Paleoecological Database. The package is intended to allow users both to search for sites and to download data for use in the analytical workflows of paleoecological research.

neotoma is part of the rOpenSci project and is also hosted on Figshare. The neotoma package has been available on CRAN since May 3, 2015.

For more information on the package please refer to:

Goring, S., Dawson, A., Simpson, G. L., Ram, K., Graham, R. W., Grimm, E. C., & Williams, J. W. (2015). neotoma: A Programmatic Interface to the Neotoma Paleoecological Database. Open Quaternary, 1(1), Art. 2. DOI: 10.5334/oq.ab

For ongoing news, issues, or information, please join the Neotoma Slack server (and add the #r channel). Slack is a real-time chat application and collaboration hub with mobile and desktop clients.

Development

We expect no further development of this package; however, we welcome contributions from any individual, whether code, documentation, or issue tracking. All participants are expected to follow the code of conduct for this project.

  • Simon Goring - University of Wisconsin-Madison, Department of Geography

Contributors

  • Gavin Simpson - University of Regina, Department of Biology
  • Jeremiah Marsicek - University of Wyoming, Department of Geology and Geophysics
  • Karthik Ram - University of California, Berkeley, Berkeley Institute for Data Science
  • Luke Sosalla - University of Wisconsin, Department of Geography

Package functions call the various Neotoma APIs and re-form the data returned by the Neotoma database into R data objects. The format of the Neotoma data, and the API functions themselves, can be found on the Neotoma API website.

If you have used the package please consider providing us feedback through a short survey.

Install neotoma

  • CRAN:
install.packages('neotoma')
  • Development version from GitHub:
install.packages("devtools")
library(devtools)
install_github("ropensci/neotoma")
library(neotoma)

Currently implemented in neotoma

More functions are available through the package help. These represent the core functions:

  • get_site - obtain information on sites in the Neotoma dataset (which may contain multiple datasets). API
  • get_dataset - obtain dataset metadata from Neotoma. API
  • get_download - obtain full datasets (pollen or mammal) from Neotoma. API
  • compile_list - using established pollen-related taxonomies from the literature, take the published taxon list and standardize it to allow cross site analysis.
  • get_contact - find contact information for data contributors to Neotoma. API
  • get_publication - obtain publication information from Neotoma. API
  • get_table - return matrices corresponding to one of the Neotoma database tables. tables
  • get_taxa - obtain taxon information from Neotoma. API
  • get_chroncontrol - obtain the chronological information used to build the age-depth model for the record. API
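Taken together, these functions support a typical workflow from site search to standardized counts. A minimal sketch, assuming the 'Marion Lake%' site name used in the package examples and using compile_taxa(), the later name for the compile_list() step above:

library(neotoma)
marion <- get_site(sitename = 'Marion Lake%')    # find sites by name ('%' is a wildcard)
m.data <- get_dataset(marion)                    # dataset metadata at those sites
m.dl   <- get_download(m.data)                   # full downloads for each dataset
m.comp <- compile_taxa(m.dl, list.name = 'P25')  # standardize taxonomy for cross-site analysis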

Recent Changes

  • 1.7.6: Updated the API endpoints to point correctly to the new Windows API endpoint, in preparation for migration; fixed the help for get_closest(); changed the character encoding for two data tables, pollen.equiv and taxon.list, and made them available in the main package using data().
  • 1.7.4: Bug fix: get_dataset(gpid=123) was returning an error, fix corrects the error to allow unassigned x variables. Updated the allowable dataset types for searching to reflect the larger set of dataset types within Neotoma.
  • 1.7.3: Added numeric/integer methods to the get_site() and get_dataset() functions so that a vector of dataset or site ids can be passed, to improve more general workflow methods.
  • 1.7.2: Bugfixes, added the taxa() function to easily extract taxa from one or multiple download objects.
  • 1.7.1: Bugfix for compile_download(), single sample downloads were failing to compile properly, added the taxa() function to extract taxa lists from large download objects.
  • 1.7.0: Added plot_leaflet() to allow interactive exploration of downloaded Neotoma data. Integrates with the Neotoma Explorer. Minor bugfix for get_download() to allow records to be sent to Neotoma and to be filtered.
  • 1.6.2: Improved the basic plot() method based on tests against Tilia files in the Neotoma Holding Tank & built more robust interpolation in read_bacon() so that age models without interpolated dates can still be imported. browse() now opens multiple datasets in the Neotoma Explorer at once.
  • 1.6.1: New Stratiplot() method, using the analogue package to plot dataset diagrams from download and download_list objects, bug fixes for write_agefile() and a new function, read_bacon(), to read in and integrate Bacon chronologies into download objects.
  • 1.6.0: Support for vector inputs in the gpid selection. Added a get_closest() function to find the closest sample site. Mostly clean-up of reported bugs by users. Revised examples for faster check speed.
  • 1.5.1: Minor fix to the get_dataset() for site level data to account for some datasets with empty submission data. Some style changes to code (non-functional changes)
  • 1.5.0: More extensive testing to support multiple dataset types. Water chemistry datasets still unsupported. Function read.tilia() added to read Tilia (http://tiliait.com) style XML files. Moved to using xml2, httr and jsonlite to support parsing.
  • 1.4.1: Small changes to get_geochron() to address bug reports and improve object print methods.
  • 1.4.0: Added plot() method for datasets, sites & downloads. Fixed a bug with records missing chronologies.

A few examples:

Find the distribution of sites with Mammoth fossils in Neotoma

#  Example requires the mapdata package:
library('mapdata')

#  You may use either '%' or '*' as wildcards for search terms:
test <- get_dataset(taxonname='Mammuthus*')

The API call was successful, you have returned 3273 records.

site.locs <- get_site(test)

# A crude way of making the oceans blue.
plot(1, type = 'n',
     xlim=range(site.locs$long)+c(-10, 10),
     ylim=range(site.locs$lat)+c(-10, 10),
     xlab='Longitude', ylab = 'Latitude')
rect(par("usr")[1],par("usr")[3],par("usr")[2],par("usr")[4],col = "lightblue")
map('world',
    interior=TRUE,
    fill=TRUE,
    col='gray',
    xlim=range(site.locs$long)+c(-10, 10),
    ylim=range(site.locs$lat)+c(-10, 10),
    add=TRUE)

points(site.locs$long, site.locs$lat, pch=19, cex=0.5, col='red')

[Figure: map of Mammuthus fossil site locations in Neotoma]

Plot the proportion of publications per year for datasets in Neotoma

# Requires ggplot2 and plyr
library('ggplot2')
library('plyr')
pubs <- get_publication()

pub.years <- ldply(pubs, "[[", "meta")

ggplot(data=pub.years, aes(x = year)) +
     stat_bin(aes(y = ..density.. * 100), position = 'dodge', binwidth = 1) +
     theme_bw() +
     ylab('Percent of Publications') +
     xlab('Year of Publication') +
     scale_y_continuous(expand = c(0, 0.1)) +
     scale_x_continuous(breaks = seq(min(pub.years$year, na.rm=TRUE), 2014, by=20))

[Figure: histogram of the percentage of Neotoma publications by year]

Cumulative plot of record uploads to Neotoma since 1998.

Found at this gist

[Figure: cumulative record uploads to Neotoma since 1998]

Obtain records & Rebuild Chronologies with Bacon

Found at this gist. Prepared in part for a Bacon (Blaauw & Christen, 2011) workshop at the 2015 International Limnogeology Conference in Reno-Tahoe, Nevada led by Amy Myrbo (University of Minnesota).

Simple paleo-data visualization

Simple paleo-data visualization in R, linking the rioja, neotoma and dplyr packages. Found at this gist.

[Animation: stratigraphic plot linking rioja, neotoma, and dplyr]

Find all site elevations in California:

Found at Simon Goring's gist.

Match all Neotoma taxa to external databases using taxize:

Found at Simon Goring's gist.

Other Resources Using neotoma

neotoma Workshops

We have provided a set of educational tools through the NeotomaDB GitHub organization, in the Workshops repository. These are free to share and can be modified as needed.


neotoma's Issues

Get gpids and geographic names

Searching by gpid is annoying because it assumes you know the number for each of the geopolitical units, which most people don't. It would be great if the gpid argument detected whether a value was a number or a string: if it's a string, search for it in the gpid table; otherwise just send the number on.
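A hedged sketch of what that detection could look like, reusing the GeoPoliticalUnits table queried elsewhere in this README (the helper name is hypothetical):

resolve_gpid <- function(gpid) {
  if (is.numeric(gpid)) {
    return(gpid)  # already an id; pass it through
  }
  # otherwise look the name up; column 1 holds the id:
  gpids <- get_table(table.name = 'GeoPoliticalUnits')
  gpids[which(gpids$GeoPoliticalName == gpid), 1]
}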

S3 methods not using arguments consistent with their generic

When you do R CMD check on the neotoma tarball, lots of warnings are generated along the lines of:

* checking S3 generic/method consistency ... WARNING
get_dataset:
  function(x, ...)
get_dataset.default:
  function(siteid, datasettype, piid, altmin, altmax, loc, gpid,
           taxonids, taxonname, ageold, ageyoung, ageof, subdate)

This is because the S3 methods have been incorrectly set up. An S3 method must have the same arguments as the generic. A method can have additional arguments, but it must include the arguments of the generic. In the case above, the get_dataset.default method has first argument siteid whereas the generic has argument x. The method is also lacking the ... argument.

The obvious fix is to rewrite the definition of the generic as:

`get_dataset` <- function(siteid, ...)

and the method as

`get_dataset.default` <- function(siteid, datasettype, piid, altmin, altmax, loc, gpid,
           taxonids, taxonname, ageold, ageyoung, ageof, subdate, ...)

This needs doing for all (or most) of the new S3 methods added recently.

Second example in `?get_dataset` fails to run

The second example in ?get_dataset fails on:

gpids <- get_table(table.name='GeoPoliticalUnits')
canID <- gpids[which(gpids$GeoPoliticalName == 'Canada'),1]
v2kyr.datasets <- get_dataset(datasettype='vertebrate fauna', gpid=canID, 
                              ageold = 2000)

with error

Error in param_check(cl) : object 'gpid' not found

Re-base neotoma to use reshape2 instead of reshape

When you load neotoma you get the following startup messages

> require("neotoma")
Loading required package: neotoma
Loading required package: reshape
Loading required package: plyr

Attaching package: ‘reshape’

The following object is masked from ‘package:plyr’:

    rename, round_any

As I understand it, reshape2 should avoid these infelicities, and as the newer code base it will be maintained more closely than reshape.

If you want, I can take a stab at this?
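For reference, a sketch of how the common patterns translate (assuming a typical data frame df with a site column):

library(reshape2)
long <- melt(df, id.vars = 'site')    # reshape::melt and reshape2::melt agree here
wide <- dcast(long, site ~ variable)  # reshape::cast becomes reshape2::dcast/acast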

Neotoma uses its own keywords, not those on the approved list

The \keyword{} macro is supposed to take only keywords specified in the KEYWORDS file in the R sources. neotoma is using its own keywords. Hence these need editing to be from the approved list, appended below:

GROUPED Keywords
----------------

Graphics
    aplot       &   Add to Existing Plot / internal plot
    dplot       &   Computations Related to Plotting
    hplot       &   High-Level Plots
    iplot       &   Interacting with Plots
    color       &   Color, Palettes etc
    dynamic     &   Dynamic Graphics
    device      &   Graphical Devices

Basics
    sysdata     &   Basic System Variables      [!= S]
    datasets    &   Datasets available by data(.)   [!= S]
    data        &   Environments, Scoping, Packages [~= S]
    manip       &   Data Manipulation
    attribute   &   Data Attributes
    classes     &   Data Types (not OO)
        & character &   Character Data ("String") Operations
        & complex   &   Complex Numbers
        & category  &   Categorical Data
        & NA    &   Missing Values          [!= S]
    list        &   Lists
    chron       &   Dates and Times
    package     &   Package Summaries

Mathematics
    array       &   Matrices and Arrays
          & algebra &   Linear Algebra
    arith       &   Basic Arithmetic and Sorting    [!= S]
    math        &   Mathematical Calculus etc.  [!= S]
    logic       &   Logical Operators
    optimize    &   Optimization
    symbolmath  &   "Symbolic Math", as polynomials, fractions
    graphs      &   Graphs, (not graphics), e.g. dendrograms

Programming, Input/Output, and Miscellaneous

    programming &   Programming
         & interface&   Interfaces to Other Languages
    IO      &   Input/output
         & file &   Files
         & connection&  Connections
         & database &   Interfaces to databases
    iteration   &   Looping and Iteration
    methods     &   Methods and Generic Functions
    print       &   Printing
    error       &   Error Handling

    environment &   Session Environment
    internal    &   Internal Objects (not part of API)
    utilities   &   Utilities
    misc        &   Miscellaneous
    documentation   &   Documentation
    debugging   &   Debugging Tools

Statistics

    datagen     &   Functions for generating data sets
    distribution    &   Probability Distributions and Random Numbers
    univar      &   simple univariate statistics  [!= S]
    htest       &   Statistical Inference
    models      &   Statistical Models
        & regression&   Regression
        & nonlinear &   Non-linear Regression (only?)
    robust      &   Robust/Resistant Techniques
    design      &   Designed Experiments
    multivariate    &   Multivariate Techniques
    ts      &   Time Series
    survival    &   Survival Analysis
    nonparametric   &   Nonparametric Statistics [w/o 'smooth']
    smooth      &   Curve (and Surface) Smoothing
         & loess    &   Loess Objects
    cluster     &   Clustering
    tree        &   Regression and Classification Trees
    survey      &   Complex survey samples


MASS (2, 1997)
--------------

add the following keywords :

    classif     &   Classification  ['class' package]
    spatial     &   Spatial Statistics ['spatial' package]
    neural      &   Neural Networks ['nnet'  package]

Cleanup various

  • change fxn names from x.y to x_y
  • move each fxn in to its own file

get_download throws warning message

For some sites (Bondi Section, Billy's Lake and others) the call get_download returns a message:
Aggregation function missing: defaulting to length

I'm not entirely sure why; it doesn't actually seem to affect the output, but it would be nice to resolve.

unit tests for neotoma

As the examples in neotoma are encased in \dontrun{}, the package needs some unit tests so that changes in one function don't silently break other functions.

Whatever solution is chosen, for submission to CRAN some checks may need to be skipped if the time taken to query Neotoma results in too long a package check time.
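A hedged sketch of such a test with testthat, where skip_on_cran() keeps the networked call out of CRAN's timed checks (the site name follows the package examples):

library(testthat)
test_that('get_site returns a data frame', {
  skip_on_cran()  # skip the slow API query during CRAN checks
  sites <- get_site(sitename = 'Marion Lake%')
  expect_true(is.data.frame(sites))
})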

Some mammal sites throw errors for get_download

Some sites throw an error:
Error in rep(NA, ncol(counts)) : invalid 'times' argument

This happens in the get.samples function in get_download; I assume it's because there's only one sample, but I need to debug later. It happens, for example, on datasetid 4574.

Compress data files

R CMD check raises the following issue:

* checking data for ASCII and uncompressed saves ... WARNING

  Note: significantly better compression could be obtained
        by using R CMD build --resave-data
                   old_size new_size compress
  taxon.list.RData    241Kb    143Kb       xz

This file needs to be compressed using xz compression if the package is to go to CRAN at some point.
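The same result can be had from within R, without the build flag (run at the package root):

tools::resaveRdaFiles('data', compress = 'xz')  # re-save ./data/*.RData with xz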

Have get_download return information for multiple sites.

get_download should be able to return multiple records using a serial call to the API so that code reading:
get_download(c(1, 2, 3, 4))

returns a list of length 4 where each item in the list is a separate assemblage object.
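A hedged sketch of that serial behaviour (the wrapper name is hypothetical; the real fix would live inside get_download() itself):

get_download_many <- function(ids) {
  out <- lapply(ids, get_download)  # one API call per dataset id
  names(out) <- ids
  out
}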

Warnings during load

> library(neotoma)
Warning messages:
1: replacing previous import ‘rename’ when loading ‘reshape’ 
2: replacing previous import ‘round_any’ when loading ‘reshape’ 

I'd suggest adding reshape to the depends list and taking it out of import.

NeotomaDB can return non-count data and non-unique TaxonName data that we crosstab and return as component `counts`

In fixing get_download() after the move to reshape2 I noted a couple of issues, one of which I have tried to effect a fix for in #29. In the examples for get_download() we have

#' #  Search for sites with "Thuja" pollen that are older than 8kyr BP and
#' #  that are on the west coast of North America:
#' t8kyr.datasets <- get_datasets(taxonname='Thuja*', loc=c(-150, 20, -100, 60), ageyoung = 8000)
#'
#' #  Returns 3 records (as of 04/04/2013), get dataset for the first record, Gold Lake Bog.
#' GOLDKBG <- get_download(t8kyr.datasets[[1]]$DatasetID)

When forming the counts component, the TaxonName may not be unique. For example, in this data set Lycopodium tablets occurs twice in TaxonName, differentiated by the units field. However, we wish to crosstab on the TaxonName variable. When we do that, dcast() (and cast() before it) would return the data using fun.aggregate = length - i.e. count how many times each element of TaxonName was present in each sample. This probably went unnoticed because this call was wrapped in suppressMessages() and also perhaps not all data sets have non-unique TaxonName values.

From the example it seems Simon was aware that more than just pollen counts would be in the counts component, but if NeotomaDB doesn't enforce unique values in TaxonName then neotoma needs to handle this. What I did here was pull out only the rows where TaxaGroup != "Laboratory analyses" and use those for the counts component. Then I added a new component, lab.data, which contains the rows that matched `TaxaGroup == "Laboratory analyses"`.

This is clearly inelegant - what other values might there be in TaxaGroup? Should we expect to retrieve them all?

How should such situations be handled in get_download()?
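A tiny reproduction of the dcast() behaviour described above (values invented for illustration):

library(reshape2)
df <- data.frame(sample = c(1, 1),
                 TaxonName = 'Lycopodium tablets',
                 units = c('tablets', 'spores'),
                 value = c(2, 25000))
dcast(df, sample ~ TaxonName, value.var = 'value')
# Aggregation function missing: defaulting to length
# The duplicated TaxonName rows are counted (result: 2), not kept as values.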

Using get_download() comment

I find it a bit confusing that when there is only a single site returned, you still need to index into the first element. For example, if I do

foo <- get_download(datasetid=1665)

Then when I do names(foo) I get NULL. Also when I just try to see the object by typing foo, I don't see anything. Not a huge deal, but I use Neotoma and R all the time and was a bit confused by this so thought I'd comment. Maybe if there is something in the list but we are suppressing printing we could print a message instead? Something like

Printing suppressed due to size of object

Just a thought.
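That suggestion could be implemented with a print method; a sketch, assuming the returned object carries the download class used elsewhere in the package:

print.download <- function(x, ...) {
  message('Printing suppressed due to size of object')
  invisible(x)
}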

Data sets in `./data` need to documented

R CMD check reports

* checking for missing documentation entries ... WARNING
Undocumented data sets:
  ‘taxon.list’ ‘translate.table’
All user-level objects in a package should have documentation entries.

neotoma needs documentation for these data sets.
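A minimal roxygen2 sketch for one of them (the description text is invented):

#' Taxon list for Neotoma records.
#'
#' A lookup table of taxa used when standardizing downloads.
#' @docType data
#' @keywords datasets
#' @name taxon.list
NULL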

get.contacts doesn't work

These are just the function examples. You should check that an object is of class data.frame before melting it.

> #  To find all data contributors who are active:
> active.cont <- get.contacts(contactstatus = 'active')
Error in get.contacts(contactstatus = "active") : 
  The contactstatus must be a character string.
> 
> # To find all data contributors who have the last name "Smith"
> smith.cont <- get.contacts(familyname = 'Smith')
The API call was successful, you have returned  28 records.
Error in UseMethod("melt", data) : 
  no applicable method for 'melt' applied to an object of class "data.frame"
Calls: get.contacts ... withCallingHandlers -> cast -> eval -> melt -> melt.list -> lapply -> FUN
> 
> 

get.datasets returns a list

It should return a data.frame. From Simon:

"The problem is that when a dataset gets returned it can contain a number of possible submission dates and chronologies, so it's a list with lists, and that makes it hard to turn it into a table with a uniform number of columns and stuff (for me anyway)."

The `compile_taxa` does not support external tables

compile_taxa does not accept external tables, but it should. Writing that in is easy:
compile_taxa(downloads, ext.table = ext.table, list = 'Paleon')

and revise the help file to reflect this change.

Datasets not returning multiple age models

It used to be that get_download would return multiple age models, but now it isn't. Datasetids 546 and 3131 should have three age models (COHMAP, NAPD1, NAPD2), but right now they're only returning one.

Check to see if this was a change in the API or in the package.

`get_publications()` returns many empty columns

As referenced in #29 get_publications() can return a large number of columns, many of which are empty for many rows because a single publication with many authors sets the upper limit on the number of columns in the Author data returned, and all other publications are expanded to match.
#29 proposed to alter get_publications() so that it returned a list with two components instead of the single data frame that is currently returned.

The two components would be:

  1. publications; a data frame with components PublicationID, PubType, Year, and Citation.
  2. authors; a data frame in long (melted) format that contained the Authors.ContactName, Authors.ContactID, and Authors.Order information, plus a column with the PublicationID to match with the other publications component.

Dynamically update tables.

There will always be some tables sitting in the package as Rdata files. It would be great if we could check to see if the tables were out of date by talking to neotoma each time the package is started up.

i.e., on package load, check the table on Neotoma against the table in the package directory to tell whether it's been modified since the package was last updated. If the package copy is older than the last instance of the table on the website, then send a warning.
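A heavily hedged sketch of that startup check, using row counts as a crude staleness proxy (the bundled taxon.list table and get_table() are real; the comparison logic is invented):

.onAttach <- function(libname, pkgname) {
  remote <- try(get_table(table.name = 'Taxa'), silent = TRUE)  # live table
  if (!inherits(remote, 'try-error') &&
      nrow(remote) > nrow(neotoma::taxon.list)) {
    packageStartupMessage('Bundled taxon.list may be out of date.')
  }
}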

`get_download()` reports too many data sets downloaded.

This may just be an infelicity. In the example (from ?get_download())

#' t8kyr.datasets <- get_datasets(taxonname='Thuja*', loc=c(-150, 20, -100, 60), ageyoung = 8000)
#'
#' #  Returns 3 records (as of 04/04/2013), get dataset for the first record, Gold Lake Bog.
#' GOLDKBG <- get_download(t8kyr.datasets[[1]]$DatasetID)

you'll note that get_download() prints

API call was successful. Returned 2 records.

This is down to the way get_download() counts the number of returned records, to wit

Browse[2]> str(aa, max = 3)
List of 2
 $ success: num 1
 $ data   :List of 2
  ..$ :List of 10
  .. ..$ CollUnitType   : chr "Core"
  .. ..$ DatasetName    : logi NA
  .. ..$ Samples        :List of 24
  .. ..$ DatasetPIs     :List of 2
  .. ..$ DatasetType    : chr "pollen"
  .. ..$ NeotomaLastSub : chr "6/30/2007"
  .. ..$ DefChronologyID: num 442
  .. ..$ Site           :List of 9
  .. ..$ DatasetID      : num 966
  .. ..$ CollUnitHandle : chr "GOLDLKBG"
  ..$ :List of 5
  .. ..$ DatasetType   : chr "pollen"
  .. ..$ DatasetName   : logi NA
  .. ..$ DatasetID     : num 966
  .. ..$ CollUnitType  : chr "Core"
  .. ..$ CollUnitHandle: chr "GOLDLKBG"
get_download throws away aa[[1]] and works only with aa[[2]], whose length it counts as shown above. Of that, it then works only with aa[[2]][[1]]; in other words, the list of 5 at the end is dropped.

What is the intended behaviour here? It seems the second list (aa[[2]][[2]]) duplicates some of the data in the first list (aa[[2]][[1]]). As the code ignores aa[[2]][[2]] and is coded to download only a single data set, should we just do away with counting the length of aa[[2]] and report that a single data set was downloaded?

Installing from github sends an error

When I try to install from GitHub I get the following error:

"C:/PROGRA~1/R/R-30~1.0/bin/x64/R" --vanilla CMD INSTALL "C:\Users\Simon  \
  Goring\AppData\Local\Temp\Rtmps7M5WD\neotoma-master" --library="C:/Users/Simon  \
  Goring/Documents/R/win-library/3.0" --with-keep.source --install-tests 

* installing *source* package 'neotoma' ...
** R
Error in parse(outFile) : 
  C:/Users/Simon Goring/AppData/Local/Temp/Rtmps7M5WD/neotoma-master/R/get_download.R:97:1: unexpected input
96:                                 unit.name = sapply(samples, function(x) x$AnalysisUnitName))
97: <<
   ^
ERROR: unable to collate and parse R files for package 'neotoma'
* removing 'C:/Users/Simon Goring/Documents/R/win-library/3.0/neotoma'
* restoring previous 'C:/Users/Simon Goring/Documents/R/win-library/3.0/neotoma'

but running the code for get_download seems to work fine. Any ideas?

`get_download()` should return a matrix or data.frame for element `count.table`

get_download() crosstabs the downloaded data with xtabs() and returns this as component count.table via the following code

count.table <- xtabs(value ~ L1 + TaxonName, sample.data)

An object of class "xtabs" isn't that useful though; invariably one would want a data frame or a matrix, but it is somewhat tedious to convert an "xtabs" object to either of these more useful classes.

Propose that get_download() returns component count.table as a matrix or a data frame.

This would involve a cast() using acast() or dcast() instead of the current use of xtabs().
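Sketched with reshape2, using the column names from the code above; fun.aggregate = sum mirrors the summing behaviour of xtabs():

library(reshape2)
count.table <- dcast(sample.data, L1 ~ TaxonName,
                     value.var = 'value', fun.aggregate = sum,
                     fill = 0)  # returns a plain data.frame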

Getting an error looking for `llply`

I'm suddenly getting an error using get_dataset. It looks like plyr isn't being exported or something. I think the same thing is happening with RJSONIO as well. Is this because the S3 method doesn't call it?

Error in get_dataset.site(western.sites) : could not find function "llply"
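The usual cause of this symptom is a missing namespace import; a sketch of the fix:

# In NAMESPACE (or via roxygen2's @importFrom plyr llply tag):
importFrom(plyr, llply)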

Mapping throws error

In one of the examples from the manuscript (https://github.com/SimonGoring/neotoma_paper/blob/master/Neotoma_paper.Rmd#L81-L88), I get

bc.map <- get_map(location = c(-120, 60), zoom = 4)

ggmap(bc.map) +
   geom_point(data = all.sites, aes(x = long, y = lat)) +
   geom_point(data = get_site(dataset = all.datasets),
              aes(x = long, y = lat),
              color = 2) +
   xlab('Longitude West') + ylab('Latitude North')
Error: Request-URI Too Long

Is it just me?

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mapproj_1.2-2          maps_2.3-6             gridExtra_0.9.1        Bchron_4.1.1           inline_0.3.13         
 [6] plyr_1.8.1             reshape2_1.4           ggmap_2.3              ggplot2_0.9.3.1        neotoma_1.0-1         
[11] Rdocumentation_0.1     slidifyLibraries_0.3.1 slidify_0.4.5          sacbox_0.0.0           devtools_1.5.0.99     

loaded via a namespace (and not attached):
 [1] animation_2.2           ape_3.1-1               clusterGeneration_1.3.1 coda_0.16-1             colorspace_1.2-4       
 [6] digest_0.6.4            ellipse_0.3-8           evaluate_0.5.5          expm_0.99-1.1           fastmatch_1.0-4        
[11] formatR_0.10            gtable_0.1.2            hdrcde_3.1              httr_0.4                igraph_0.7.0           
[16] knitr_1.6               labeling_0.2            lattice_0.20-29         markdown_0.6.5          MASS_7.3-33            
[21] Matrix_1.1-4            mclust_4.3              memoise_0.2.1           mnormt_1.4-7            msm_1.3                
[26] munsell_0.4.2           mvtnorm_0.9-99992       nlme_3.1-117            numDeriv_2012.9-1       parallel_3.1.1         
[31] phangorn_1.99-7         phytools_0.4-05         plotrix_3.5-5           png_0.1-7               proto_0.3-10           
[36] Rcpp_0.11.1             RCurl_1.95-4.1          reshape_0.8.5           rgl_0.93.996            RgoogleMaps_1.2.0.6    
[41] rjson_0.2.13            RJSONIO_1.2-0.2         scales_0.2.4            scatterplot3d_0.3-35    splines_3.1.1          
[46] stringr_0.6.2           survival_2.37-7         tools_3.1.1             whisker_0.3-2           XML_3.98-1.1           
[51] yaml_2.1.13  
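If the failure comes from a long list of dataset ids being expanded into a single request URI, chunking the ids would keep each GET short. A hedged sketch (the helper name is hypothetical, and this is only a guess at the cause):

chunked_get_site <- function(datasets, n = 100) {
  groups <- split(datasets, ceiling(seq_along(datasets) / n))  # n ids per request
  do.call(rbind, lapply(groups, function(g) get_site(dataset = g)))
}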

@karthik and @gavinsimpson you're awesome!

Thanks so much for all your help @karthik and @gavinsimpson. This package probably could have limped on for a while, but it's really starting to look very polished, and I'm starting to feel much more confident in my own coding, even if some of these discussions make me feel like I'm barely treading water!

Add extractor functions to neotoma

The ongoing writing of the neotoma paper has demonstrated to me that the package would benefit from some extractor functions to get parts of the various objects returned by the main functions.

Examples

Extracting counts from objects

Users need to use constructs like

western.comp[[1]]$counts

to access the counts for the first data set in western.comp.

It would be a nicer user experience if they could just do

counts(western.comp[[1]])
counts(western.comp)

The former would return an object of class c("neo_counts", "data.frame"), essentially a data frame containing the counts data for that site. The latter would return an object of class c("neo_counts_list", "list"), which would be a list of neo_counts objects, one per site/data set in western.comp.

Extracting sample Ages

This requires

western.comp[[1]]$sample.meta$Age

How about

ages(western.comp[[1]]) ## or Ages()
ages(western.comp)

instead?

This also raises the issue of extracting the sample meta data. That should have an extractor too.

Thoughts?

List of extractor functions to implement

  1. counts() to extract count data from objects. Methods for single objects and lists of objects. Track this in Issue #146.
  2. ages() (or Ages()) to extract sample age data from objects. Methods for single objects and lists of objects.
  3. metadata() to extract forms of metadata from objects. Methods for single objects and lists of objects. Perhaps has argument type allowing one to choose which of the sets of metadata is returned.

If you agree that these would be useful and have suggestions for other extractors we should add, list them and I'll add them to the list so we can track progress here for the general issue and have separate issues to track details of each extractor function in more detail.
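A hedged sketch of the first of these, with methods for the download and download_list classes already used in the package:

counts <- function(x, ...) UseMethod('counts')

counts.download <- function(x, ...) {
  structure(x$counts, class = c('neo_counts', 'data.frame'))
}

counts.download_list <- function(x, ...) {
  structure(lapply(x, counts), class = c('neo_counts_list', 'list'))
}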

Get neotoma up onto cran

Not sure what needs to be done for unit testing, but:

  • Unit tests need to be completed
  • Bug fixes need to be completed
  • Other (non-essential?) cleanup needs to be undertaken
  • Documentation needs to be proofread

Combine multiple records into a meta-sample object

Some synthesis work uses multiple datasets, and while the current objects are great it would be useful to make a meta-sample object that would contain count data for multiple sites (a sketch follows the list below). Basically a list that mirrors the object returned by get_download except:

  • metadata: instead of metadata for a single record there would be multiple rows
  • sample.meta: this is generally age data for a single site, maybe return this as a list, or return with columns site, age, uncertainty. Not sure yet.
  • taxon.list: the full set of taxa in the record.
  • counts: The full set of count data, except with row names identifying the core & core depth, to be associated with information in metadata.
  • lab.data: same as above, except that instead of assemblage data it would be the associated lab data.
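A hedged sketch of the counts piece of such an object; it assumes the taxon columns have already been aligned across records (e.g. by compile_taxa()):

combine_counts <- function(downloads) {
  tabs <- lapply(seq_along(downloads), function(i) {
    ct <- as.data.frame(downloads[[i]]$counts)
    # tag each row with its source record so samples stay identifiable:
    rownames(ct) <- paste(names(downloads)[i], rownames(ct), sep = '_')
    ct
  })
  do.call(rbind, tabs)
}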

Argument for `compile_list()` is `data` but `sample` is used in the code

R CMD check warns

* checking Rd \usage sections ... WARNING 
Undocumented arguments in documentation object 'compile_list'
  ‘sample’
Documented arguments not in \usage in documentation object 'compile_list':
  ‘data’

Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See the chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.

A simple fix is to change line 7 in ./R/compile.list.R from

#' @param data A pollen object returned by \code{get_download}.

to

#' @param sample A pollen object returned by \code{get_download}.

But data seems a better name for the object returned by get_download().

Alternatively, the argument could be object which is used regularly in R.

Windows builds failing because of the `.Rproj` file

The Appveyor Windows builds are failing because the *.*proj extension is used to identify a Visual Studio project which the MSBuild script wants to do something with. As the repo has a *.Rproj this seems to be being recognised by the build script as a *.*proj file in the Visual Studio sense. You can see the error message here: https://ci.appveyor.com/project/sckott/neotoma/build/1.0.21#L5

Does anyone know if we can hide this file from Appveyor, or should we just add this to the .gitignore list and remove it from the repo?

Thoughts @karthik @sckott ? How have you handled this for other ropensci repos?

compile_taxa.R missed.samples object

There is an object missed.samples in compile_taxa.R that never gets used... not sure what the intention was, but it needs to be fixed or removed.

Remove dependence on plyr

Having looked at the code, there is little reason to use plyr in neotoma. Removing a dependency should make maintaining the package into the future that bit simpler.

I have a branch in my repo that has completely removed the dependency on plyr. It is ready to be pulled once #29 is merged, should you wish to merge it. At the moment if I do a pull request on that branch, it will try to merge #29 and the new changes in my noplyr branch.
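For reference, the ldply() pattern used in the README's publication example has a direct base-R equivalent:

# base-R replacement for plyr::ldply(pubs, "[[", "meta"):
pub.years <- do.call(rbind, lapply(pubs, '[[', 'meta'))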

Getting all datasets fails.

Running the command:
aa <- get_datasets()

Returns the error:
The API call was successful, you have returned 10786 records.
Error in names(x$SubDates) <- c("SubmissionDate", "SubmissionType") :
  'names' attribute [2] must be the same length as the vector [0]
In addition: Warning message:
In getForm(base.uri, .params = cl, binary = FALSE, .encoding = "utf-8", :
  No inputs passed to form

Not sure if this is one record that is causing the problem, or a general problem with datasets.
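A hedged sketch of a defensive fix for the failing assignment, guarding against records with empty SubDates (as the length-0 vector in the error suggests):

if (length(x$SubDates) == 2) {
  names(x$SubDates) <- c('SubmissionDate', 'SubmissionType')
}  # length-0 SubDates (no submission data) are left unnamed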
