amadeus's Introduction


Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

amadeus is a mechanism for data, environments, and user setup for common environmental and climate health datasets in R. It has been developed to improve access to, and the utility of, large-scale, publicly available environmental data in R.

Installation

amadeus is not yet available from CRAN, but it can be installed with the devtools, remotes, or pak packages.

devtools::install_github("NIEHS/amadeus")
remotes::install_github("NIEHS/amadeus")
pak::pak("NIEHS/amadeus")

Contribution

To add or edit functionality for new data sources or datasets, open a pull request against the main branch with a detailed description of the proposed changes. Pull requests must pass all status checks and will then be approved or rejected by amadeus's authors.

Utilize Issues to notify the authors of bugs, questions, or recommendations. Identify each issue with the appropriate label to help ensure a timely response.

Download

download_data accesses and downloads raw geospatial data from a variety of open source data repositories. The function is a wrapper that calls source-specific download functions, each of which accounts for the source's unique combination of URL, file naming conventions, and data types. Download functions cover the following sources:

| Data Source | File Type | Data Genre |
|---|---|---|
| Climatology Lab TerraClimate | netCDF | Meteorology |
| Climatology Lab GridMet | netCDF | Climate, Water |
| Köppen-Geiger Climate Classification | GeoTIFF | Climate Classification |
| MRLC¹ Consortium National Land Cover Database (NLCD) | GeoTIFF | Land Use |
| NASA² Moderate Resolution Imaging Spectroradiometer (MODIS) | HDF | Atmosphere, Meteorology, Land Use, Satellite |
| NASA Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) | netCDF | Atmosphere, Meteorology |
| NASA SEDAC³ UN WPP-Adjusted Population Density | GeoTIFF, netCDF | Population |
| NASA SEDAC Global Roads Open Access Data Set | Shapefile, Geodatabase | Roadways |
| NASA Goddard Earth Observing System Composition Forecasting (GEOS-CF) | netCDF | Atmosphere, Meteorology |
| NOAA Hazard Mapping System Fire and Smoke Product | Shapefile, KML | Wildfire Smoke |
| NOAA NCEP⁴ North American Regional Reanalysis (NARR) | netCDF | Atmosphere, Meteorology |
| OpenGeoHub Foundation OpenLandMap | GeoTIFF | Climate, Elevation, Soil, Land Use, Satellite |
| Parameter Elevation Regression on Independent Slopes Model (PRISM) | BIL, ASCII | Climate |
| US EPA⁵ Air Data Pre-Generated Data Files | CSV | Air Pollution |
| US EPA Ecoregions | Shapefile | Climate Regions |
| US EPA National Emissions Inventory (NEI) | CSV | Emissions |
| US EPA Toxic Release Inventory (TRI) Program | CSV | Chemicals, Pollution |
| USGS⁶ Global Multi-resolution Terrain Elevation Data (GMTED2010) | ESRI ASCII Grid | Elevation |
| USGS National Hydrography Dataset (NHD) | Geopackage, Geodatabase | Hydrography |

See the "download_data" vignette for a detailed description of source-specific download functions.

Example use of download_data using NOAA NCEP North American Regional Reanalysis's (NARR) "weasd" (Daily Accumulated Snow at Surface) variable.

directory <- "/  EXAMPLE  /  FILE  /  PATH  /"
download_data(
  dataset_name = "narr",
  year = c(2022, 2022),
  variable = "weasd",
  directory_to_save = directory,
  acknowledgement = TRUE,
  download = TRUE
)
Downloading requested files...
Requested files have been downloaded.
list.files(paste0(directory, "weasd"))
[1] "weasd.2022.nc"

Process

process_covariates imports and cleans raw geospatial data (downloaded with download_data) and returns a single SpatRaster or SpatVector in the user's R environment. process_covariates "cleans" the data by defining interpretable layer names, ensuring a coordinate reference system is present, and managing time data (if applicable).

To avoid errors when using process_covariates, do not edit the raw downloaded data files. Passing user-generated or edited data into process_covariates may result in errors, as the underlying functions are adapted to each source's raw data file type.

Example use of process_covariates using the downloaded "weasd" data.

weasd <- process_covariates(
  covariate = "narr",
  date = c("2022-01-01", "2022-01-05"),
  variable = "weasd",
  path = paste0(directory, "weasd"),
  extent = NULL
)
Cleaning weasd data for January, 2022...
Detected monolevel data...
Returning daily weasd data from 2022-01-01 to 2022-01-05.
weasd
class       : SpatRaster
dimensions  : 277, 349, 5  (nrow, ncol, nlyr)
resolution  : 32462.99, 32463  (x, y)
extent      : -16231.49, 11313351, -16231.5, 8976020  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs
source      : weasd.2022.nc:weasd
varname     : weasd (Daily Accumulated Snow at Surface)
names       : weasd_20220101, weasd_20220102, weasd_20220103, weasd_20220104, weasd_20220105
unit        :         kg/m^2,         kg/m^2,         kg/m^2,         kg/m^2,         kg/m^2
time        : 2022-01-01 to 2022-01-05 UTC

Calculate Covariates

calc_covariates stems from the beethoven project's need for various types of data extracted at precise locations. calc_covariates therefore extracts data from the "cleaned" SpatRaster or SpatVector object at user-defined locations. Users can choose to buffer the locations. The function returns a data.frame or SpatVector with data extracted at all locations for each layer or row in the SpatRaster or SpatVector object, respectively.

Example of calc_covariates using processed "weasd" data.

locs <- data.frame(id = "0001", lon = -78.8277, lat = 35.95013)
weasd_covar <- calc_covariates(
  covariate = "narr",
  from = weasd,
  locs = locs,
  locs_id = "id",
  radius = 0,
  geom = FALSE
)
Detected `data.frame` extraction locations...
Calculating weasd covariates for 2022-01-01...
Calculating weasd covariates for 2022-01-02...
Calculating weasd covariates for 2022-01-03...
Calculating weasd covariates for 2022-01-04...
Calculating weasd covariates for 2022-01-05...
Returning extracted covariates.
weasd_covar
    id       time     weasd_0
1 0001 2022-01-01 0.000000000
2 0001 2022-01-02 0.000000000
3 0001 2022-01-03 0.000000000
4 0001 2022-01-04 0.000000000
5 0001 2022-01-05 0.001953125

Additional Resources

The following R packages can also be used to access climate and weather data in R, but each differs from amadeus in the data sources covered or type of functionality provided.

| Package | Source |
|---|---|
| dataRetrieval | USGS Hydrological Data and EPA Water Quality Data |
| daymetr | Daymet |
| ecmwfr | ECMWF Reanalysis v5 (ERA5) |
| RClimChange⁷ | NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6) |
| rNOMADS | NOAA Operational Model Archive and Distribution System |
| sen2r⁸ | Sentinel-2 |

Footnotes

  1. Multi-Resolution Land Characteristics

  2. National Aeronautics and Space Administration

  3. Socioeconomic Data and Applications Center

  4. National Centers for Environmental Prediction

  5. United States Environmental Protection Agency

  6. United States Geological Survey

  7. Last updated more than two years ago.

  8. Archived; no longer maintained.


amadeus's Issues

System-specific commands

Command writing needs to adapt to the user's machine type (see the sketch after this list).

  • Implement code to detect the operating system with .Platform$OS.type
  • Support functions for source- and system-specific command generation
  • Implement this support in the download functions
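
A minimal sketch of the first bullet, assuming the download functions build shell commands as strings (build_unix_command() and build_windows_command() are hypothetical helpers):

generate_download_command <- function(url, destination) {
  # Pick a source-specific command template based on the operating system.
  if (.Platform$OS.type == "unix") {
    build_unix_command(url, destination)    # e.g., a curl- or wget-based command
  } else {
    build_windows_command(url, destination) # e.g., a PowerShell-based command
  }
}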

Wrapper functions: one-shot calc_covariates or proc_covariates & calc_covariates?

The old calculate_covariates used path, sites, and id_col as common arguments. This approach made sense when the previous calc_* functions combined processing (or importing) with calculation. However, the wrapper will not work now that we have split these into two parts, which raises some refactoring ideas for the calc(ulate)_covariates wrapper:

  • From the start: Do we need a wrapper function?
  • If we agree on the need for a wrapper function, I think there are two approaches to refactoring calc_covariates:
    • Add the process_ parts inside the current calculate_covariates for convenience
    • Add process_covariates alongside calculate_covariates for consistency

To deal with data-specific arguments in a wrapper function, I found that combining the ellipsis argument (...) with rlang::inject(foo(!!!args)) is helpful for development, as reflected in my 0.1.0 PR #13.
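
A minimal sketch of the ellipsis-plus-inject pattern described above (the argument names and the switch() dispatch are assumptions, not the current amadeus interface):

library(rlang)

calc_covariates_wrapper <- function(covariate, from, locs, ...) {
  # Collect common and data-specific arguments into one list.
  args <- list(from = from, locs = locs, ...)
  # Dispatch to a source-specific calculator.
  fun <- switch(covariate, narr = calc_narr, modis = calc_modis)
  # Splice the argument list into the call with !!! inside rlang::inject().
  rlang::inject(fun(!!!args))
}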

URL unit tests HTTP response performance

Previously tried to update check_url_status with the httr2 package, but issues with the MODIS unit tests meant the results did not accurately reflect the improved performance. This is a second attempt to substitute httr::GET() and httr::HEAD() with httr2::request() %>% httr2::req_perform() for faster test URL checks.
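
A minimal sketch of the proposed substitution (the pipeline uses only documented httr2 verbs; treating status 200 as success is an assumption):

library(httr2)

check_url_status <- function(url) {
  # req_perform() errors on 4xx/5xx responses by default, so disable that
  # behavior with req_error() and inspect the status code directly.
  resp <- httr2::request(url) |>
    httr2::req_method("HEAD") |>
    httr2::req_error(is_error = function(resp) FALSE) |>
    httr2::req_perform()
  httr2::resp_status(resp) == 200L
}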

Inconsistent behavior of `process_narr`

process_narr runs over all years regardless of the value of the date argument.

amadeus::process_narr(path = "../../../../group/set/Projects/NRT-AP-Model/input/narr/omega", variable = "omega", date = c("2020-03-01", "2020-04-30"))
# Cleaning omega data for year 2018...
# Detected pressure levels data...
# (the two messages above repeat twelve times for each year from 2018 through 2022)
# Error: [subset] no (valid) layer selected

The example below also returns a different result from what was set in the date argument:

amadeus::process_narr(path = "../../../../group/set/Projects/NRT-AP-Model/input/narr/omega", variable = "omega", date = c("2020-03-01", "2022-01-01"))
# Cleaning omega data for year 2018...
# Detected pressure levels data...
# (the two messages above repeat twelve times for each year from 2018 through 2022)
# Returning daily omega data from 2020-03-01 to 2022-01-01.
# class       : SpatRaster 
# dimensions  : 277, 349, 10933  (nrow, ncol, nlyr)
# resolution  : 32462.99, 32463  (x, y)
# extent      : -16231.49, 11313351, -16231.5, 8976020  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=lcc +lat_0=50 +lon_0=-107 +lat_1=50 +lat_2=50 +x_0=5632642.22547 +y_0=4612545.65137 +datum=WGS84 +units=m +no_defs 
# sources     : omega.202101.nc:omega  (899 layers) 
#               omega.202102.nc:omega  (812 layers) 
#               omega.202103.nc:omega  (899 layers) 
#               ... and 21 more source(s)
# varnames    : omega (Daily Omega on Pressure Levels) 
#               omega (Daily Omega on Pressure Levels) 
#               omega (Daily Omega on Pressure Levels) 
#               ...
# names       : omega~10101, omega~10101, omega~10101, omega~10101, omega~10101, omega~10101, ... 
# unit        :    Pascal/s,    Pascal/s,    Pascal/s,    Pascal/s,    Pascal/s,    Pascal/s, ... 
# time (days) : 2021-01-01 to 2022-01-01 

@mitchellmanware Is this as designed, or did I miss something in the workflow? More documentation for the date argument would be helpful.
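
Until this is resolved, a possible workaround (assuming the processed result is stored as omega and standard terra semantics) is to subset the returned SpatRaster by its time stamps:

library(terra)

# Keep only the layers whose time stamp falls within the requested window.
dates <- as.Date(terra::time(omega))
omega_sub <- omega[[which(dates >= as.Date("2020-03-01") & dates <= as.Date("2020-04-30"))]]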

`process_aqs` data retention

@sigmafelix

When using the process_aqs function, why does the returned object (whether sf, SpatVector, or data.table) include the data_field only when mode = "sparse"?

> aqs <- process_aqs(
+   path = "./beethoven/rtorch/data/aqs/data_files/",
+   date = c("2018-01-01", "2018-01-10"),
+   mode = "full",
+   data_field = "Arithmetic.Mean",
+   return_format = "data.table"
+ )
> head(aqs)
          site_id       lon      lat       time
           <char>     <num>    <num>     <char>
1: 01073002388101 -86.81500 33.55306 2018-01-01
2: 01073100588101 -87.00361 33.33111 2018-01-01
3: 01073101088101 -86.54917 33.54528 2018-01-01
4: 01073200388101 -86.92417 33.49972 2018-01-01
5: 01073205988101 -86.84411 33.52143 2018-01-01
6: 01089001488101 -86.58816 34.68547 2018-01-01

update README.md

  • How to install via GitHub
  • Examples to reflect new function names and output
  • New data sources in table
  • Citations

Detecting corrupted or incomplete downloads

@mitchellmanware

When running the beethoven pipeline for 2022, I found that one (or more) of the GEOS-CF chemical files was downloaded incompletely (i.e., the file causing the error was 2 MB, roughly one-fortieth the size of typical GEOS-CF chemical files). Post-checking or detection of incomplete files would be helpful for users who want to download a large set of files from the internet.

For the file in question, I will replace it with a newly downloaded copy. Could you change the write permission of the input/geos directory in the team project folder, @kyle-messier?

Considerations

  • I suggest two approaches (both sketched after this list).
    • One is to use file hashes (e.g., SHA-256 or MD5 checksums) that are provided by the data source in some cases. If such information is retrievable from JSON or an HTTP response header, we could quickly verify the downloaded files against it.
    • The other is leveraging summary statistics of the downloaded files, which assumes the network is reliable enough that most files downloaded properly. The fs package includes many handy functions for summarizing files in tibbles. In this case, we could compare each file's size against the typical size, or a statistic of all downloaded files, to indicate which files are probably corrupted or incomplete.
      • A challenge remains where file sizes are so heterogeneous that file-size statistics are uninformative (e.g., MODIS tiles differ drastically in size depending on the effective data cells and the number of NA/NaN values, unlike the full space-time grids of modeling products such as GEOS-CF and NARR).
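
Minimal sketches of both approaches (tools::md5sum() ships with base R; the 0.25 * median cutoff is an arbitrary assumption):

# Approach 1: verify a checksum published by the data source.
verify_md5 <- function(path, expected) {
  unname(tools::md5sum(path)) == tolower(expected)
}

# Approach 2: flag files whose size deviates strongly from the typical size.
flag_suspect_files <- function(dir, ratio = 0.25) {
  info <- fs::dir_info(dir, type = "file")
  info[info$size < ratio * stats::median(info$size), "path"]
}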

New tests

Create additional tests for robust and integration testing (a sketch of one such test follows the list).

  • download.R
    • Live download run to cover download_run() and zip file commands
  • download_support.R
    • download_run(download = TRUE)
    • download_sink()
    • download_unzip(unzip = TRUE)
    • download_remove_zips(remove = TRUE)
  • download_test_support.R
    • Missing position in extract_urls()
    • Missing sample size in check_urls()
  • calculate_covariates.R
  • manipulate_spacetime_data.R
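
As a starting point, a sketch of one such test (the download_unzip() argument names are assumed from the checklist above, not confirmed against the current internals):

library(testthat)

test_that("download_unzip extracts archives when unzip = TRUE", {
  zip_path <- testthat::test_path("testdata", "sample.zip")
  out_dir <- tempfile()
  dir.create(out_dir)
  # Hypothetical signature; adjust to the actual helper when writing the test.
  download_unzip(zfile = zip_path, directory_to_unzip = out_dir, unzip = TRUE)
  expect_gt(length(list.files(out_dir)), 0)
})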

Improving MODIS tile handling

  • When MODIS tiles are downloaded by tile indices, there are cases where only a few of the tiles in the index range are available.
  • An example is the 2021.09.22 MOD09GA product, where the mainland US is completely missing even though the upper-left corner tile (h08v03) in the Northern Pacific Ocean remained within the range setting (h: 8-13, v: 3-6). It returned an error because of the collision between the NULL return and the postprocessing functions for data.frame, causing the calculation to fail at other dates as well.
  • Better logic should be implemented; a quick idea is
    • to use the standard deviation or median of the number of tiles, then exclude dates when the count falls below a preset threshold or the aforementioned statistic (see the sketch after this list).
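
A sketch of the thresholding idea (tile_dates is a hypothetical vector with one acquisition date per downloaded HDF file; the 0.5 * median cutoff is an assumption):

# Count downloaded tiles per date and drop dates with clearly too few tiles.
tile_counts <- table(tile_dates)
cutoff <- 0.5 * stats::median(as.integer(tile_counts))
valid_dates <- names(tile_counts)[as.integer(tile_counts) >= cutoff]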

`calc_narr` replace "level="

When the name string (containing pressure levels) is split, the "level=" prefix is retained and returned in the $level column.

 shum_covariates_wrapper
      id       time      level     shum_100
1  loc_1 2022-01-01 level=1000 1.435557e-02
2  loc_1 2022-01-01  level=975 1.323741e-02
3  loc_1 2022-01-01  level=950 1.266798e-02
4  loc_1 2022-01-01  level=925 1.256814e-02
5  loc_1 2022-01-01  level=900 1.178267e-02
6  loc_1 2022-01-01  level=875 1.083267e-02
7  loc_1 2022-01-01  level=850 9.736534e-03
8  loc_1 2022-01-01  level=825 8.581291e-03

To do: gsub("level=", "", x$level), and check MERRA-2 + GEOS-CF for similar behavior.
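
The fix itself is a one-liner; a sketch, assuming the covariate output is a data.frame like shum_covariates_wrapper above:

# Strip the "level=" prefix and store the pressure level as an integer.
shum_covariates_wrapper$level <-
  as.integer(gsub("level=", "", shum_covariates_wrapper$level, fixed = TRUE))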

Adopting linter

My PR's workflows are adapted from chopin's. Do we want to use a linter for amadeus?

Add water variables

Air quality/hydrography data support in the download/process/calculate suites. As download functions are already available in USGS's nhdplusTools and dataRetrieval packages, I think priority should be given to the other two suites. From the PrestoGP covariate list, the water variables are:

  • NAWQA Pesticide (county level) estimates

@kyle-messier One question on the direction: do we want to calculate station data by nearest-neighbor spatial join or SEDC, or summarize station data at HUCs and then do the spatial join afterwards?

Other variables that are not water-related are lower priority:

  • Soil chemistry (point-based), USGS
  • OpenLandMap
  • Geology
  • TerraClimate
  • Cropscape (aka CDL)
  • PRISM

coverage report

Since we had to move off codecov.io, we lost the ability to view the line-by-line coverage report, which helps us know where to focus additional tests. Let's add the covr::report() HTML from a local run to the repo. Perhaps it could live in test/report if that doesn't interfere with R CMD check or other workflows.
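
Generating the report locally is straightforward with covr (the test/report path is the proposal above, not an established convention):

# Compute package coverage and write the line-by-line HTML report.
cov <- covr::package_coverage()
covr::report(cov, file = "test/report/coverage.html", browse = FALSE)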

Disabling URL status check tests

We all found that the URL status check tests for MODIS datasets take too long to return results, and they eventually failed in most cases. I think URL tests, at least for the MODIS datasets, need to be disabled. In my opinion, the purpose of URL checking is to make sure that a file is available for download; these tests therefore exercise the server status rather than our functions. In this regard, I suggest disabling such tests in our test suites (a sketch follows the list below). We might reactivate the tests in the future or add a short vignette informing users how to check data availability with URL status checks.

  • Add # nocov flags in relevant lines for URL status checks
  • Comment out the check_urls parts from test-download_functions.R
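
One way to disable a test without deleting it (testthat::skip() is standard; the message is illustrative):

test_that("MODIS URLs respond", {
  testthat::skip("URL status checks test server availability, not amadeus code.")
  # check_urls(...) would run here if re-enabled.
})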

`process_hms` variable parameter

Update process_hms to accept both snake ("light") and camel ("Light") case definitions of the variable parameter; the function currently accepts only camel case. Update the manuscript case study accordingly.
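
A minimal normalization sketch (its placement inside process_hms is assumed):

# Normalize user input so "light", "Light", and "LIGHT" all resolve to the
# internal title-case form.
variable <- tolower(variable)
substr(variable, 1, 1) <- toupper(substr(variable, 1, 1))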

pkgdown updates

The pkgdown yml needs to be updated with a more sophisticated reference page so that it is organized by process_, download_, etc.

remove and tidy deprecated stdt functions

We need to remove the deprecated stdt functions from the main branch. They will always be accessible from the old committed versions, but we need to start getting things clean and tidy.

Accept polygons in `calc_*` functions

Currently the calc_ functions only accept points (with an optional buffer). Extracting/calculating within any polygon boundary would be useful for census-boundary summary statistics.
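
Conceptually the extension is small; a sketch with terra (rast_object and census_polygons are placeholders, and summarizing with fun = mean is an assumption about the desired behavior):

library(terra)

# Extract a summary statistic of each raster layer within each polygon.
poly_values <- terra::extract(rast_object, census_polygons, fun = mean, na.rm = TRUE)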

Refactoring

This is a work list for v1 (CRAN or rOpenSci submission, whichever comes first). Here is a quick to-do list for refactoring the process_* and calc_* functions in amadeus.

  • Reduce the number of function calls that read data from storage (e.g., terra::rast(), terra::vect(), terra::time()) by assigning the output of repetitively used functions once, early on, and then reusing the object (see the sketch after this list)
  • Unify the output class of calc_*: plain data.frame, data.table, or tibble
    • Data layout: long or wide data frame?
  • (Discussion point) Human-readable field names versus field codes based on a naming convention, as used in beethoven
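
A before/after sketch of the first bullet (the file name is illustrative):

# Before: the file is read from storage on every call.
n_layers <- terra::nlyr(terra::rast("weasd.2022.nc"))
times <- terra::time(terra::rast("weasd.2022.nc"))

# After: read once, reuse the in-memory handle.
r <- terra::rast("weasd.2022.nc")
n_layers <- terra::nlyr(r)
times <- terra::time(r)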

Coverage improvement

The current PR's coverage is 97.86%. I couldn't figure out which tests exactly made the PR checks fail, but I will soon start work on coverage while looking into the problematic tests.

Unifying date-related arguments into `date`

We have two sets of temporal arguments in amadeus functions (i.e., process_* functions): date or date_start/date_end. How about taking date as character(2) (or Date(2) / POSIXct(2)) in the functions where date_start and date_end are currently used? I will convert these once we reach a consensus.
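
A sketch of the proposed coercion (the helper name is hypothetical):

# Accept character(2), Date(2), or POSIXct(2); return an ordered Date(2).
coerce_date_range <- function(date) {
  stopifnot(length(date) == 2)
  sort(as.Date(date))
}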

Coverage rate dropped from 95% to 91%

I found that the coverage rate dropped to 91% after two consecutive merges that were made six minutes apart. Our ubuntu (devel) runner usually takes 30+ minutes to finish R CMD check, so I would recommend waiting 40 minutes after a merge is initiated at main. Merging main into your branch while another merge is running will result in this type of silent conflict/rollback. We might try some of the configurations described in a StackOverflow discussion to avoid similar instances in the future.

Retain columns in `calc_covariates`

Currently, calc_worker limits output to the locs_id, time, level, and data value columns. It does not return the other columns from locs.

Add retain_columns (or a similar name) to allow users to retain all data from locs.
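
A minimal sketch of the merge-back approach (retain_columns is the proposed argument, not an existing one, and the result/locs names are assumptions about calc_worker's internals):

# Re-attach the user's original columns by joining on the location identifier.
if (retain_columns) {
  result <- merge(result, as.data.frame(locs), by = locs_id, all.x = TRUE)
}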

coverage report via GH-actions

@sigmafelix @mitchellmanware Given that the NIEHS org won't allow third-party codecov.io integration for our coverage analysis and badge, I've tried a straight-up GitHub Actions-only solution. Unfortunately, I (and ChatGPT, Copilot) have not been successful.

Here is the relevant repo https://github.com/marketplace/actions/create-coverage-badges

Lower priority than other amadeus tasks, but if y'all want to give it a try, that would be great.

Again, the idea is that we want to call covr as normal in the workflow, but instead of sending the report to codecov.io, we retain the report, create a badge based on that report, and update the README.md.
