epwshiftr's People

Contributors

adchong, hongyuanjia

epwshiftr's Issues

init_cmip6_index Assertion on activity failed source NULL

Hi

The init_cmip6_index(..., source=NULL) usage seems broken:

packageDescription("epwshiftr")[c("Version", "Built")]
# 0.1.4
# R 4.1.2; ; 2024-04-12 10:18:46 UTC; unix
dt <- epwshiftr::init_cmip6_index(activity="CMIP", experiment="historical", variable="tas", frequency="mon") # ok
dt <- epwshiftr::init_cmip6_index(activity="CMIP", experiment="historical", variable="tas", frequency="mon", source=NULL) # error
Error in esgf_query(activity = unique(q$activity_drs), variable = unique(q$variable_id),  : 
  Assertion on 'activity' failed: Must be a subset of {'AerChemMIP','C4MIP','CDRMIP','CFMIP','CMIP','CORDEX','DAMIP','DCPP','DynVarMIP','FAFMIP','GMMIP','GeoMIP','HighResMIP','ISMIP6','LS3MIP','LUMIP','OMIP','PAMIP','PMIP','RFMIP','SIMIP','ScenarioMIP','VIACSAB','VolMIP'}, but has additional elements {'E3SM-Project','CAS'}.

Cheers,
Chris

Error using summary_database()

Hi, I am trying to use the tool and receive the warning below when calling summary_database(). I have copied the file into the same directory as cmip6_index.csv, but the warning still appears.

In addition: Warning message:
Case(s) shown below does not matche any NetCDF file in the database. Please make sure all needed NetCDF files listed in the file index have been downloaded and placed in the database.
#1 | For case 'CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp245.r1i1p1f1.day.tas.gn.v20190529.tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20340101-20341231.nc':

#2 | For case 'CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp245.r1i1p1f1.day.tas.gn.v20190529.tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20350101-20351231.nc':

#3 | For case 'CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp245.r1i1p1f1.day.tas.gn.v20190529.tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20360101-20361231.nc':

This leads to the following error in the subsequent steps:

coord$coord[, .(file_path, coord)]
Empty data.table (0 rows and 2 cols): file_path,coord
> str(coord$coord$coord[[1]])
Error in coord$coord$coord[[1]] : subscript out of bounds

The name of a sample .nc file :

tas_day_AWI-CM-1-1-MR_ssp126_r1i1p1f1_gn_20350101-20351231
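
Comparing the facets encoded in the file name with those in the warning may help narrow this down: the sample file above is an ssp126 run, while the cases listed in the warning are ssp245. The following is a minimal base-R sketch; `parse_cmip6_name()` is a hypothetical helper (not part of epwshiftr) and assumes the usual seven-part, underscore-separated CMIP6 file name layout.

```r
# Split a CMIP6 NetCDF file name into its underscore-separated facets.
# Assumed layout (hypothetical helper, not an epwshiftr function):
#   <variable>_<table>_<source>_<experiment>_<variant>_<grid>_<period>.nc
parse_cmip6_name <- function(file) {
    base <- sub("\\.nc$", "", basename(file))
    parts <- strsplit(base, "_", fixed = TRUE)[[1]]
    stopifnot(length(parts) == 7L)
    names(parts) <- c("variable", "table", "source", "experiment",
                      "variant", "grid", "period")
    parts
}

# File present on disk (from the report above)
local_file <- parse_cmip6_name(
    "tas_day_AWI-CM-1-1-MR_ssp126_r1i1p1f1_gn_20350101-20351231"
)
# File expected by the index (case #2 in the warning)
index_case <- parse_cmip6_name(
    "tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20350101-20351231.nc"
)

# Facets that differ between the local file and the index entry
mismatch <- names(local_file)[local_file != index_case]
print(mismatch)
```

If the only differing facet is the experiment, the downloaded files probably belong to a different scenario than the one recorded in the index.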

esgf_query() did not accept empty resolution

library(epwshiftr)

esgf_query(variable = "tas", resolution = NULL)
#> No matched data. Please check network connection and the availability of LLNL ESGF node.
#> Null data.table (0 rows and 0 cols)

Created on 2021-02-01 by the reprex package (v0.3.0)

Failed to build PDF manual

Should use aligned environment instead of align

! LaTeX Error: Environment align undefined.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...

l.3884 }{}

Consider Southern Hemisphere latitudes

Hi! Thanks for this tool, it is quite cool.

I have been testing this tool on some Southern Hemisphere climates, and it seems that, by default, the generated weather starts in winter (i.e., as in the Northern Hemisphere, where the 1st of January falls in winter). I say this because, when plotting the dry-bulb temperatures, I get the following:

Screenshot 2022-09-05 at 5 18 32 PM

Whereas, when I reorganize as follows:

tic = 142*24 # June 21
tac = 266*24 # Sept 23
late_summer = list(range(0,tic))
winter = list(range(tic,tac))
early_summer = list(range(tac,8760))
nhemisphere_temp = epw.data['dry_bulb_temperature']
late_summer_data = list(nhemisphere_temp[late_summer])
winter_data = list(nhemisphere_temp[winter])
early_summer_data = list(nhemisphere_temp[early_summer])
transformed = early_summer_data + late_summer_data + winter_data
epw.data['dry_bulb_temperature'] = transformed

Then I get the following:

Screenshot 2022-09-05 at 5 19 51 PM

Is this a bug? Is there an option for this? Can it be triggered based on the latitude of the EPW?

Best!

joss-review pkg readme

Downloading CMIP6 models gets stuck with some variables

Hello! I've been having issues downloading models. For some reason, the download gets stuck on most of the files that I've tried to pull. See the example below, which uses this code for the variable o2.

Thanks for any help!

idx <- init_cmip6_index(
  # only consider ScenarioMIP activity
    activity = "ScenarioMIP",
  # specify output variable
    variable = "o2",
  # specify report frequency
    frequency = "mon",
  # specify experiment name
    experiment = c("ssp126", "ssp245", "ssp585"),
  # specify GCM name
    source = NULL,
  # specify variant
    variant = "r1i1p1f1",
  # More options
    replica = FALSE,
    latest = TRUE,
    resolution = NULL,
    data_node = NULL,
  # specify years of interest
    years = c(seq(2022, 2100, 1)),
  # save to data dictionary
    save = TRUE
)

esm <- idx$file_url[1]

download.file(url = esm[1],
              destfile = paste0("inputs/o2/", basename(esm[1])),
              cacheOK = TRUE,
              extra = "--random-wait --retry-on-http-error=503",
              mode = "wb")

`future_epw()` should return information about how the weather data is split

Currently, future_epw() directly returns the created Epw objects for future climate. I always find I have to do manual steps to process each generated EPW file name using regex to get an idea of the scenario of each output. It would be useful to return a data.frame containing the information about how the data is split and aggregated based on the by argument.

HDF5 format was not supported

Currently, summary_database() only lists files with .nc extensions. There are some GCMs that output files in HDF5 format.

Problem getting index file for monthly data

Cannot build an index file for monthly data. There is a conflict, as "mon" becomes "Amon" after the first ESGF query.

library(epwshiftr)

options(epwshiftr.dir = "tmp")
options(epwshiftr.verbose = TRUE)

# get CMIP6 data nodes
nodes <- get_data_node()

idx <- init_cmip6_index(
  activity = "ScenarioMIP",
  variable = "tas",
  frequency = "mon",
  source = c("EC-Earth3"),
  experiment = c("ssp126"),
  data_node = nodes[status == "UP", data_node],
  years = c(2050, 2080)
)

Here is the error I receive:

Error in esgf_query(activity = unique(q$activity_drs), variable = unique(q$variable_id),  : 
  Assertion on 'frequency' failed: Must be a subset of {'1hr','1hrCM','1hrPt','3hr','3hrPt','6hr','6hrPt','day','dec','fx','mon','monC','monPt','subhrPt','yr','yrPt'}, but is {'Amon'}.

Different `dataset_id` could link to the same dataset

dataset_id cannot be used as the unique identifier of a dataset, because it is specific to each data node. This did not cause any problems for esgf_query(), but did result in duplicated entries in the results of init_cmip6_index() when replica is set to TRUE. dataset_pid should be used as the unique dataset identifier when building the index.

q <- epwshiftr::esgf_query(
    activity = "ScenarioMIP",
    variable = "tas",
    frequency = "day",
    experiment = "ssp585",
    source = "AWI-CM-1-1-MR",
    variant = "r1i1p1f1",
    replica = TRUE,
    latest = TRUE,
    resolution = "100 km",
    limit = 10000L,
    data_node = NULL
)

q[, .(dataset_id, dataset_pid)]
#>                                                                                        dataset_id
#> 1:   CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data1.llnl.gov
#> 2: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data3.diasjp.net
#> 3:       CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.ceda.ac.uk
#> 4:       CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.nci.org.au
#>                                          dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 2: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 3: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 4: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8

unique(q[, -c("dataset_id", "data_node")])
#>    mip_era activity_drs institution_id     source_id experiment_id member_id
#> 1:   CMIP6  ScenarioMIP            AWI AWI-CM-1-1-MR        ssp585  r1i1p1f1
#>    table_id frequency grid_label  version nominal_resolution variable_id
#> 1:      day       day         gn 20190529             100 km         tas
#>              variable_long_name variable_units
#> 1: Near-Surface Air Temperature              K
#>                                          dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8

Created on 2022-09-19 with reprex v2.0.2
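
As a rough illustration of the proposed fix, the base-R sketch below keeps one row per dataset_pid. The data frame is a hand-built stand-in for the query result shown above (only the two relevant columns), not actual esgf_query() output, and the real index builder may need to choose which replica to keep rather than simply taking the first.

```r
# Stand-in for the query result above: four data nodes, one dataset_pid
pid <- "hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8"
drs <- "CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529"
q <- data.frame(
    dataset_id = paste(
        drs,
        c("esgf-data1.llnl.gov", "esgf-data3.diasjp.net",
          "esgf.ceda.ac.uk", "esgf.nci.org.au"),
        sep = "|"
    ),
    dataset_pid = rep(pid, 4L),
    stringsAsFactors = FALSE
)

# dataset_id differs per node, so it does not collapse the replicas
n_by_id <- length(unique(q$dataset_id))

# dataset_pid is shared, so keeping the first row per pid removes them
unique_datasets <- q[!duplicated(q$dataset_pid), ]
n_by_pid <- nrow(unique_datasets)

print(c(by_id = n_by_id, by_pid = n_by_pid))
```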

Can't find MIROC6 model results

I'm trying to download daily data from the MIROC6 model. I've checked that the result in question is available in the ESGF system. Looking at the response attribute of idx, the shards look odd.

Is there a limit as to which model we can search?


library(epwshiftr)

idx <- init_cmip6_index(
    # only consider ScenarioMIP activity
    activity = "ScenarioMIP",
    # specify variables
    variable = c("pr"),
    # specify report frequency
    frequency = "day",
    # specify experiment name
    experiment = c("ssp245"),
    # specify GCM name
    source = c("MIROC6"),
    # specify variant,
    variant = "r1i1p1f1"
    )
#> No matched data. Please examine the actual response using 'attr(x, "response")'.

# This query should return one result
# looking at the response attribute
attr(idx, 'response')
#> $responseHeader
#> $responseHeader$status
#> [1] 0
#> 
#> $responseHeader$QTime
#> [1] 28
#> 
#> $responseHeader$params
#> $responseHeader$params$df
#> [1] "text"
#> 
#> $responseHeader$params$q.alt
#> [1] "*:*"
#> 
#> $responseHeader$params$indent
#> [1] "true"
#> 
#> $responseHeader$params$echoParams
#> [1] "all"
#> 
#> $responseHeader$params$fl
#> [1] "*,score"
#> 
#> $responseHeader$params$start
#> [1] "0"
#> 
#> $responseHeader$params$fq
#> $responseHeader$params$fq[[1]]
#> [1] "type:Dataset"
#> 
#> $responseHeader$params$fq[[2]]
#> [1] "project:\"CMIP6\""
#> 
#> $responseHeader$params$fq[[3]]
#> [1] "activity_id:\"ScenarioMIP\""
#> 
#> $responseHeader$params$fq[[4]]
#> [1] "experiment_id:\"ssp245\""
#> 
#> $responseHeader$params$fq[[5]]
#> [1] "source_id:\"MIROC6\""
#> 
#> $responseHeader$params$fq[[6]]
#> [1] "variable_id:\"pr\""
#> 
#> $responseHeader$params$fq[[7]]
#> [1] "nominal_resolution:\"100km\" || nominal_resolution:\"50km\" || nominal_resolution:\"100 km\" || nominal_resolution:\"50 km\""
#> 
#> $responseHeader$params$fq[[8]]
#> [1] "variant_label:\"r1i1p1f1\""
#> 
#> $responseHeader$params$fq[[9]]
#> [1] "frequency:\"day\""
#> 
#> $responseHeader$params$fq[[10]]
#> [1] "replica:false"
#> 
#> $responseHeader$params$fq[[11]]
#> [1] "latest:true"
#> 
#> 
#> $responseHeader$params$rows
#> [1] "10000"
#> 
#> $responseHeader$params$q
#> [1] "*:*"
#> 
#> $responseHeader$params$shards
#> [1] "localhost:8983/solr/datasets,localhost:8985/solr/datasets,localhost:8987/solr/datasets,localhost:8988/solr/datasets,localhost:8990/solr/datasets,localhost:8993/solr/datasets,localhost:8994/solr/datasets,localhost:8995/solr/datasets,localhost:8996/solr/datasets,localhost:8997/solr/datasets"
#> 
#> $responseHeader$params$tie
#> [1] "0.01"
#> 
#> $responseHeader$params$facet.limit
#> [1] "2048"
#> 
#> $responseHeader$params$qf
#> [1] "text"
#> 
#> $responseHeader$params$facet.method
#> [1] "fc"
#> 
#> $responseHeader$params$facet.mincount
#> [1] "1"
#> 
#> $responseHeader$params$wt
#> [1] "json"
#> 
#> $responseHeader$params$facet.sort
#> [1] "lex"
#> 
#> 
#> 
#> $response
#> $response$numFound
#> [1] 0
#> 
#> $response$start
#> [1] 0
#> 
#> $response$maxScore
#> [1] 0
#> 
#> $response$docs
#> list()

Created on 2021-08-12 by the reprex package (v2.0.0)

This is the expected result

image

Use `utils::URLencode()`

esgf_query() uses a self-implemented URL encode approach, which is kind of a hack. It is better to take advantage of the utils::URLencode() function instead.
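
A minimal sketch of what this could look like, using only base R. The facet names and values are taken from the queries shown elsewhere on this page; how esgf_query() actually assembles its URL is not shown here, so the surrounding structure is an assumption.

```r
# Facet values that need escaping (note the space in "100 km")
facets <- c(nominal_resolution = "100 km", source_id = "AWI-CM-1-1-MR")

# reserved = TRUE also escapes reserved characters such as "&" and "="
encoded <- vapply(facets, utils::URLencode, character(1), reserved = TRUE)

# Assemble "key=value" pairs into a query string
query <- paste0(names(encoded), "=", encoded, collapse = "&")
print(query)
```

Encoding each value separately, rather than the whole URL at once, keeps the "&" and "=" separators intact.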

Failed to load local cmip6 index

options(epwshiftr.dir = here::here("data/cmip6"))
epwshiftr::load_cmip6_index()
#> Loading CMIP6 experiment output file index created at 2020-09-03 22:38:54.
#> Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
#>   Incompatible join types: x.datetime_start (double) and i.V1 (character)

joss-review Description of morphing

The term morphing is used in building simulations, but rarely in other fields. Can you update the documentation in morphing_epw() and explain what you mean by stretching and shifting? Maybe even with some formulas? You can still keep the reference to the paper, but some basic info would be very useful for users.

Otherwise, you could also mention the terms bias adjustment and downscaling, since these two are more common in other disciplines.
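
For orientation, the formulation of morphing usually cited in the building-simulation literature can be written as below. This is a sketch under the assumption that morphing_epw() follows that formulation; the package documentation and the referenced paper remain the authority. Here x_0 is the present-day hourly value, \Delta x_m the absolute change in the monthly mean predicted by the GCM, \alpha_m the monthly scaling factor, and \langle x_0 \rangle_m the present-day monthly mean.

```latex
\begin{aligned}
\text{shift:} \quad & x = x_0 + \Delta x_m \\
\text{stretch:} \quad & x = \alpha_m \, x_0 \\
\text{shift and stretch:} \quad & x = x_0 + \Delta x_m + \alpha_m \left( x_0 - \langle x_0 \rangle_m \right)
\end{aligned}
```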

Failed to retrieve data node status via `get_data_node()`

The LLNL ESGF Node has switched to the new Metagrid UI, which makes get_data_node() fail to parse the data node status. Workarounds include:

  • Use the legacy interface from other federated ESGF Nodes
  • Update the parser to handle the Metagrid UI

CRAN check fails when LLNL ESGF node is not available

Comments from Brian Ripley:

There is a variety of failures here, it seems both in contacting a
website and in the content of that site.  We need to remind you of the
CRAN policy

'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available or has changed (and
not give a check warning nor error).'

so this needs correction whether or not the resource recovers.
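
One common way to meet this policy is to wrap network access in tryCatch() and return a sentinel value with a message instead of raising an error. The sketch below is only illustrative: `safe_fetch()` is a hypothetical helper, and the failing function stands in for a real request to an ESGF node.

```r
# Run `fetch` and degrade to NULL with an informative message on failure.
# Hypothetical helper, not part of epwshiftr.
safe_fetch <- function(fetch) {
    tryCatch(
        fetch(),
        error = function(e) {
            message("Resource not available: ", conditionMessage(e))
            NULL
        }
    )
}

# Stand-in for a request that fails because the node is down
res_down <- safe_fetch(function() stop("cannot open URL"))

# Stand-in for a request that succeeds
res_up <- safe_fetch(function() "ok")

print(is.null(res_down))
print(res_up)
```

Callers, examples and tests can then check for NULL and skip the remaining steps instead of failing the check.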

joss-review Usage examples

The usage examples are often not run, and require auxiliary data that is not in the package. Do you think it’s possible to add some data to the package, so the examples can be run? I know that putting raw GCM data in there is not possible, otherwise the package size will explode. But maybe you could crop a NetCDF file to a small extent and include an EPW file, so that there are real examples that can be run? What do you think?

Use `offset` for query result pagination

Currently, init_cmip6_index() only returns the first 10,000 records.

epwshiftr/R/esgf.R

Lines 446 to 452 in 09827d4

if (nrow(qd) == 10000L) {
    warning("The dataset query returns 10,000 results which ",
        "hits the maximum record limitation of a single query using ESGF search RESTful API. ",
        "It is possible that the returned Dataset query responses are not complete. ",
        "It is suggested to examine and refine your query."
    )
}

It is enough for most use cases. But it will still be good to implement pagination.
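
A minimal sketch of offset-based pagination follows. `fetch_page()` is a hypothetical stand-in for a single search request (it simply returns record numbers from a pretend result set of 25 records), so only the looping logic is meant to carry over; the small limit is chosen to keep the example readable.

```r
# Hypothetical stand-in for one API call: returns up to `limit` records
# starting after `offset`, from a pretend result set of `total` records.
fetch_page <- function(offset, limit, total = 25L) {
    if (offset >= total) return(integer(0))
    seq(offset + 1L, min(offset + limit, total))
}

# Request pages until a page comes back shorter than `limit`.
fetch_all <- function(limit = 10L) {
    offset <- 0L
    pages <- list()
    repeat {
        page <- fetch_page(offset, limit)
        pages[[length(pages) + 1L]] <- page
        # A short (or empty) page means the last record has been reached
        if (length(page) < limit) break
        offset <- offset + limit
    }
    unlist(pages)
}

records <- fetch_all(limit = 10L)
print(length(records))
```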

Allow empty `replica` for query

According to the ESGF Search RESTful API, the default behavior is to return all records (masters and replicas). The current implementation always specifies the replica parameter, which means each query returns either only master records or only replicas.
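
One way to support this is to drop NULL arguments before the query string is built, so that the server default applies. The sketch below is an assumption about how that could be done; `build_query()` is a hypothetical helper and does not reflect the actual internals of esgf_query().

```r
# Build a query string from named scalar parameters, omitting any that
# are NULL. Hypothetical helper, not an epwshiftr function.
build_query <- function(...) {
    params <- Filter(Negate(is.null), list(...))
    if (length(params) == 0L) return("")
    vals <- vapply(params, function(x) {
        # Logical flags are sent as lower-case "true"/"false"
        x <- if (is.logical(x)) tolower(as.character(x)) else as.character(x)
        utils::URLencode(x, reserved = TRUE)
    }, character(1))
    paste0(names(vals), "=", vals, collapse = "&")
}

# replica = NULL: the parameter is left out entirely
q_all <- build_query(variable_id = "tas", replica = NULL, latest = TRUE)

# replica = FALSE: only master records are requested
q_master <- build_query(variable_id = "tas", replica = FALSE, latest = TRUE)

print(q_all)
print(q_master)
```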

Example code

Hello Jia!
I am here!!
I need an example of the code, as we discussed by email earlier.

Regards
ZZaman

Release epwshiftr v0.1.1

Prepare for release:

  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_dev_version()

Hi. 1. When I change the source GCM, knitr always shows 0 percent; only AWI-CM-1-1-MR works. 2. How can I add another source besides these 11 sources? (I am new to R and RStudio.)

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.


Brief description of the problem

# insert reprex here

Specify a minimal required version of checkmate

assert_multi_class() was added in {checkmate} version 1.9.0 (2019-01-09). Even though this version was published 3 years ago, it is still possible that a user has an older version installed. In fact, one user has already reported failing to load {epwshiftr} because of an older version of {checkmate}.
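
In a package, such a constraint is normally declared in the Imports field of DESCRIPTION, for example `checkmate (>= 1.9.0)`. The comparison that this implies can be sketched in base R as follows; `is_new_enough()` is a hypothetical helper, and the version strings passed to it are made-up examples rather than versions reported by users.

```r
# Minimum version that provides assert_multi_class()
required <- package_version("1.9.0")

# Hypothetical helper: compare an installed version string with `required`
is_new_enough <- function(installed) {
    package_version(installed) >= required
}

old_ok <- is_new_enough("1.8.5")
new_ok <- is_new_enough("1.9.0")
print(c(old = old_ok, new = new_ok))
```

For an installed package, the version string can be obtained with `utils::packageVersion("checkmate")`.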

Alpha values could be quite big and thus introduce results far from reality

For instance, for tas_day_AWI-CM-1-1-MR_ssp585_r1i1p1f1_gn_20500101-20501231.nc, the alpha value for November can be -19.95. Logic should be added to issue a warning in such cases and fall back to the Shift method.

data_mean[, .(lon, lat, dist, epw_mean, gcm_mean = value, delta, alpha)]
#>          lon      lat     dist        epw_mean       gcm_mean            delta           alpha
#>        <num>    <num>    <num>         <units>        <units>          <units>         <units>
#>  1: 106.4062 35.99986 89.25595 -7.5504032 [°C] -4.804854 [°C]  2.74554916 [°C]   0.6363705 [1]
#>  2: 106.4062 35.99986 89.25595 -3.9571429 [°C] -1.769299 [°C]  2.18784401 [°C]   0.4471152 [1]
#>  3: 106.4062 35.99986 89.25595  1.3489247 [°C]  3.581754 [°C]  2.23282969 [°C]   2.6552663 [1]
#>  4: 106.4062 35.99986 89.25595  8.2494444 [°C]  8.227525 [°C] -0.02191962 [°C]   0.9973429 [1]
#>  5: 106.4062 35.99986 89.25595 13.4138441 [°C] 19.642524 [°C]  6.22867986 [°C]   1.4643471 [1]
#>  6: 106.4062 35.99986 89.25595 16.8897222 [°C] 22.220674 [°C]  5.33095225 [°C]   1.3156329 [1]
#>  7: 106.4062 35.99986 89.25595 19.6094086 [°C] 26.158875 [°C]  6.54946686 [°C]   1.3339961 [1]
#>  8: 106.4062 35.99986 89.25595 18.2104839 [°C] 24.773162 [°C]  6.56267795 [°C]   1.3603791 [1]
#>  9: 106.4062 35.99986 89.25595 13.3270833 [°C] 21.648293 [°C]  8.32121006 [°C]   1.6243834 [1]
#> 10: 106.4062 35.99986 89.25595  7.0000000 [°C] 13.851030 [°C]  6.85103005 [°C]   1.9787186 [1]
#> 11: 106.4062 35.99986 89.25595 -0.2101389 [°C]  4.192878 [°C]  4.40301659 [°C] -19.9528880 [1]
#> 12: 106.4062 35.99986 89.25595 -6.1481183 [°C] -2.254458 [°C]  3.89366038 [°C]   0.3666907 [1]
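
In the table above, alpha equals gcm_mean divided by epw_mean, so it blows up whenever the EPW monthly mean is close to zero (row 11). A possible guard is sketched below using three rows from that table. The threshold `max_abs_alpha` is a hypothetical value chosen for illustration, not one proposed by the package, and a real implementation would probably use warning() rather than message().

```r
# Rows 1, 4 and 11 from the table above (January, April, November)
epw_mean <- c(-7.5504032, 8.2494444, -0.2101389)
gcm_mean <- c(-4.804854, 8.227525, 4.192878)

# Stretch factor as it appears in the table
alpha <- gcm_mean / epw_mean

# Hypothetical cut-off beyond which stretching is considered unrealistic
max_abs_alpha <- 5

# Fall back to the shift method where alpha is too extreme
method <- ifelse(abs(alpha) > max_abs_alpha, "shift", "stretch")
if (any(method == "shift")) {
    message("Extreme alpha in ", sum(method == "shift"),
            " month(s); falling back to the shift method there.")
}

print(data.frame(alpha = round(alpha, 2), method = method))
```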
