ideas-lab-nus / epwshiftr
Create future EnergyPlus Weather files using CMIP6 data
Home Page: https://ideas-lab-nus.github.io/epwshiftr/
License: Other
The code coverage dropped to ~63%. This is because the morphing-related tests were not run, as the .fst file produced by the extract_data() test results was not ready.
The ESGF RESTful API can generate wget scripts to download files. See: Download data from ESGF using wget.
When executing a search that targets specific data nodes, the shards parameter should be used instead of data_node. Ref: Shard Queries
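A shard-targeted query differs from a data_node filter only in how the request URL is composed. Below is a minimal sketch of building such a URL; the endpoint, shard names, and the build_shard_query() helper are illustrative assumptions, not part of epwshiftr:

```r
# Sketch: compose an ESGF search URL that targets specific index shards
# instead of filtering results by `data_node`. All names below are
# hypothetical; only the `shards=` query-string mechanism is the point.
build_shard_query <- function(base, shards, params) {
    query <- c(
        paste0("shards=", paste(shards, collapse = ",")),
        paste0(names(params), "=",
               vapply(params, utils::URLencode, "", reserved = TRUE))
    )
    paste0(base, "?", paste(query, collapse = "&"))
}

url <- build_shard_query(
    "https://esgf-node.llnl.gov/esg-search/search",
    shards = c("esgf-data1.llnl.gov:80/solr", "esgf.ceda.ac.uk:80/solr"),
    params = list(project = "CMIP6", variable_id = "tas")
)
```

The resulting URL carries a single shards= entry listing every Solr shard to fan the query out to.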
I found that the description of the data.table output in the help differs from what I get when I run the two functions below. Please update.
extract_data()
morphing_epw()
Hi
The init_cmip6_index(..., source=NULL)
usage seems broken:
packageDescription("epwshiftr")[c("Version", "Built")]
# 0.1.4
# R 4.1.2; ; 2024-04-12 10:18:46 UTC; unix
dt <- epwshiftr::init_cmip6_index(activity="CMIP", experiment="historical", variable="tas", frequency="mon") # ok
dt <- epwshiftr::init_cmip6_index(activity="CMIP", experiment="historical", variable="tas", frequency="mon", source=NULL) # error
Error in esgf_query(activity = unique(q$activity_drs), variable = unique(q$variable_id), :
Assertion on 'activity' failed: Must be a subset of {'AerChemMIP','C4MIP','CDRMIP','CFMIP','CMIP','CORDEX','DAMIP','DCPP','DynVarMIP','FAFMIP','GMMIP','GeoMIP','HighResMIP','ISMIP6','LS3MIP','LUMIP','OMIP','PAMIP','PMIP','RFMIP','SIMIP','ScenarioMIP','VIACSAB','VolMIP'}, but has additional elements {'E3SM-Project','CAS'}.
Cheers,
Chris
Currently, most of the functions are tested locally, as they all need NetCDF files, which can be quite large. We could download a small number of NetCDF files and use the GitHub Actions cache to reuse them between workflow runs.
Error using summary_database()
Hi, I am trying to use the tool and receiving a warning message when using summary_database(). I have copied the files into the same directory as cmip6_index.csv. However, I am getting the following warning message.
In addition: Warning message:
Case(s) shown below does not matche any NetCDF file in the database. Please make sure all needed NetCDF files listed in the file index have been downloaded and placed in the database.
#1 | For case 'CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp245.r1i1p1f1.day.tas.gn.v20190529.tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20340101-20341231.nc':
#2 | For case 'CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp245.r1i1p1f1.day.tas.gn.v20190529.tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20350101-20351231.nc':
#3 | For case 'CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp245.r1i1p1f1.day.tas.gn.v20190529.tas_day_AWI-CM-1-1-MR_ssp245_r1i1p1f1_gn_20360101-20361231.nc':
This leads to the following error in the subsequent steps:
coord$coord[, .(file_path, coord)]
Empty data.table (0 rows and 2 cols): file_path,coord
> str(coord$coord$coord[[1]])
Error in coord$coord$coord[[1]] : subscript out of bounds
The name of a sample .nc file :
tas_day_AWI-CM-1-1-MR_ssp126_r1i1p1f1_gn_20350101-20351231
library(epwshiftr)
esgf_query(variable = "tas", resolution = NULL)
#> No matched data. Please check network connection and the availability of LLNL ESGF node.
#> Null data.table (0 rows and 0 cols)
Created on 2021-02-01 by the reprex package (v0.3.0)
esgf_query() can only send a query with very strict constraints. Right now, it is impossible to build temporal-coverage queries. It would be better to build a step-by-step query builder like httr2.
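A step-by-step builder in the spirit of httr2 could look roughly like the sketch below. esgf_request() and esgf_filter() are hypothetical names invented for illustration, not epwshiftr's API:

```r
# Sketch of a pipeable query builder: each step appends one Solr filter
# query (fq) clause, so constraints can be composed incrementally instead
# of passing every argument to one monolithic function.
esgf_request <- function() list(fq = character())

esgf_filter <- function(req, field, values) {
    # OR together multiple values for the same field, as the ESGF API expects
    clause <- paste0(field, ':"', values, '"', collapse = " || ")
    req$fq <- c(req$fq, clause)
    req
}

req <- esgf_request() |>
    esgf_filter("project", "CMIP6") |>
    esgf_filter("experiment_id", c("ssp126", "ssp245")) |>
    esgf_filter("frequency", "day")
```

A final step (not shown) would serialize req$fq into the request URL and perform the HTTP call.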
Should use the aligned environment instead of align.
! LaTeX Error: Environment align undefined.
See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...
l.3884 }{}
Hi! Thanks for this tool, it is quite cool.
I have been testing this tool on some southern-hemisphere climates, and it seems that, by default, the generated weather starts in winter (i.e., as in the Northern Hemisphere, where the 1st of January is winter). I say this because, when plotting the dry-bulb temperatures, I get the following:
Whereas, when I reorganize as follows:
tic = 142 * 24  # June 21
tac = 266 * 24  # Sept 23

# Hour-index ranges for the three seasonal chunks of the 8760-hour year
late_summer = list(range(0, tic))
winter = list(range(tic, tac))
early_summer = list(range(tac, 8760))

nhemisphere_temp = epw.data['dry_bulb_temperature']
late_summer_data = list(nhemisphere_temp[late_summer])
winter_data = list(nhemisphere_temp[winter])
early_summer_data = list(nhemisphere_temp[early_summer])

# Rotate the series so the year starts in (southern-hemisphere) summer
transformed = early_summer_data + late_summer_data + winter_data
epw.data['dry_bulb_temperature'] = transformed
Then I get the following:
Is this a bug? Is there an option for this? Can it be triggered based on the latitude of the EPW?
Best!
match_coord()
Add example code showing how to download an EPW file; maybe just put the same in the Get Started section.
OPeNDAP makes it possible to subset the GCM data directly on the data nodes. This approach can avoid downloading GBs of NetCDF files locally. See DAP2 and DAP4 Protocol Services.
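The server-side subsetting works through a DAP constraint expression appended to the dataset URL. The sketch below only builds that URL; the THREDDS address and dap_subset() helper are assumptions for illustration (actually reading the slice would need something like ncdf4::nc_open() on the constrained URL):

```r
# Sketch: a DAP2 constraint expression like "?tas[t0:t1][y0:y1][x0:x1]"
# asks the server to return only that hyperslab of the variable, so only
# the slice crosses the network instead of the whole NetCDF file.
dap_subset <- function(url, var, time, lat, lon) {
    idx <- function(r) paste0("[", r[1], ":", r[2], "]")
    paste0(url, "?", var, idx(time), idx(lat), idx(lon))
}

dap_subset("https://esgf.example.org/thredds/dodsC/tas_day.nc",
           "tas", time = c(0, 364), lat = c(10, 12), lon = c(40, 42))
```

The index ranges are zero-based array bounds in the DAP convention, not coordinate values.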
Hello! I've been having issues downloading models. For some reason, it gets stuck on most of the files that I've tried to pull. See some examples using this code for the variable o2.
Thanks for any help!
idx <- init_cmip6_index(
# only consider ScenarioMIP activity
activity = "ScenarioMIP",
# specify the variable of interest
variable = "o2",
# specify report frequency
frequency = "mon",
# specify experiment names
experiment = c("ssp126", "ssp245", "ssp585"),
# specify GCM name
source = NULL,
# specify variant
variant = "r1i1p1f1",
# More options
replica = FALSE,
latest = TRUE,
resolution = NULL,
data_node = NULL,
# specify years of interest
years = seq(2022, 2100, 1),
# save to data dictionary
save = TRUE
)
esm <- idx$file_url[1]
download.file(url = esm[1],
destfile = paste0("inputs/o2/", basename(esm[1])),
cacheOK = TRUE,
# `extra` flags are only passed through when method = "wget"
method = "wget",
extra = "--random-wait --retry-on-http-error=503",
mode = "wb")
Currently, future_epw()
directly returns the created Epw
objects for future climate. I always find I have to do manual steps to process each generated EPW file name using regex to get an idea of the scenario of each output. It would be useful to return a data.frame containing the information about how the data is split and aggregated based on the by
argument.
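Recovering that metadata currently means parsing the generated file names by hand. A minimal sketch of what the returned data.frame could contain is below; the "<city>.<source>.<experiment>.<interval>.epw" naming pattern and the parse_epw_names() helper are illustrative assumptions, not epwshiftr's actual output convention:

```r
# Sketch: split generated EPW file names into their metadata fields so
# future_epw() could hand back a data.frame instead of bare Epw objects.
parse_epw_names <- function(paths) {
    parts <- strsplit(sub("\\.epw$", "", basename(paths)), ".", fixed = TRUE)
    data.frame(
        path       = paths,
        source     = vapply(parts, `[`, "", 2),  # GCM name
        experiment = vapply(parts, `[`, "", 3),  # e.g. ssp585
        interval   = vapply(parts, `[`, "", 4)   # e.g. target year
    )
}

parse_epw_names("SGP_Singapore.EC-Earth3.ssp585.2050.epw")
```

Returning such a table alongside (or instead of) the Epw objects would make it trivial to join outputs back to the `by` splits.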
Currently, summary_database()
only lists files with .nc
extensions. There are some GCMs that output files in HDF5 format.
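Widening the listing would mostly be a matter of the file-name pattern. A sketch, where the exact set of HDF5 extensions is an assumption about what GCM output may use:

```r
# Sketch: list database files by a broader extension set than just ".nc".
# The extension list is illustrative, not a definitive inventory.
list_gcm_files <- function(dir) {
    list.files(dir, pattern = "\\.(nc|nc4|h5|hdf5)$",
               ignore.case = TRUE, full.names = TRUE)
}

# demo on a throwaway directory
d <- tempfile(); dir.create(d)
file.create(file.path(d, c("tas_day.nc", "pr_day.h5", "notes.txt")))
basename(list_gcm_files(d))
```

Note that files matched this way would still need to be readable by whatever NetCDF/HDF5 reader the package uses downstream.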
Cannot build an index file for monthly data. There is a conflict, as "mon" becomes "Amon" after the first esgf_query().
library(epwshiftr)
options(epwshiftr.dir = "tmp")
options(epwshiftr.verbose = TRUE)
# get CMIP6 data nodes
nodes <- get_data_node()
idx <- init_cmip6_index(
activity = "ScenarioMIP",
variable = "tas",
frequency = "mon",
source = c("EC-Earth3"),
experiment = c("ssp126"),
data_node = nodes[status == "UP", data_node],
years = c(2050, 2080)
)
Here is the error I receive:
Error in esgf_query(activity = unique(q$activity_drs), variable = unique(q$variable_id), :
Assertion on 'frequency' failed: Must be a subset of {'1hr','1hrCM','1hrPt','3hr','3hrPt','6hr','6hrPt','day','dec','fx','mon','monC','monPt','subhrPt','yr','yrPt'}, but is {'Amon'}.
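One possible fix is to map CMIP6 table_id values such as "Amon" back onto the frequency vocabulary before revalidating. The mapping below is a partial, illustrative sketch, not the complete CMIP6 table list:

```r
# Sketch: normalize a CMIP6 table_id (realm prefix + frequency, e.g. "Amon")
# back to a plain frequency string before it is checked against the allowed
# frequency set. The lookup table here is deliberately incomplete.
table_to_frequency <- function(table_id) {
    map <- c(Amon = "mon", Omon = "mon", SImon = "mon", Lmon = "mon",
             day = "day", mon = "mon")
    out <- unname(map[table_id])
    # pass through anything not in the map unchanged (e.g. "3hr")
    ifelse(is.na(out), table_id, out)
}

table_to_frequency(c("Amon", "day", "3hr"))
```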
init_cmip6_index()
first sends a query for Dataset
and then uses the same input for File
, and finally merges these two results together. It should directly use the dataset_id
from the Dataset
query when fetching output file information.
dataset_id cannot be used as the unique identifier of a dataset, as it is specific to the data node. This did not cause any problems for esgf_query(), but did result in duplicated entries in the results of init_cmip6_index() when replica is set to TRUE. dataset_pid should be used as the unique dataset identifier when building the index.
q <- epwshiftr::esgf_query(
activity = "ScenarioMIP",
variable = "tas",
frequency = "day",
experiment = "ssp585",
source = "AWI-CM-1-1-MR",
variant = "r1i1p1f1",
replica = TRUE,
latest = TRUE,
resolution = "100 km",
limit = 10000L,
data_node = NULL
)
q[, .(dataset_id, dataset_pid)]
#> dataset_id
#> 1: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data1.llnl.gov
#> 2: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data3.diasjp.net
#> 3: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.ceda.ac.uk
#> 4: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.nci.org.au
#> dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 2: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 3: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 4: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
unique(q[, -c("dataset_id", "data_node")])
#> mip_era activity_drs institution_id source_id experiment_id member_id
#> 1: CMIP6 ScenarioMIP AWI AWI-CM-1-1-MR ssp585 r1i1p1f1
#> table_id frequency grid_label version nominal_resolution variable_id
#> 1: day day gn 20190529 100 km tas
#> variable_long_name variable_units
#> 1: Near-Surface Air Temperature K
#> dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
Created on 2022-09-19 with reprex v2.0.2
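The deduplication itself reduces to keeping one row per dataset_pid, since every replica shares the same handle. A minimal sketch with base R (epwshiftr itself uses data.table, so the real fix would likely use unique(dt, by = "dataset_pid")):

```r
# Sketch: two replica rows of the same dataset differ only in the data-node
# suffix of dataset_id; the dataset_pid handle is identical, so it serves
# as the unique key. The abbreviated IDs below mirror the reprex output.
datasets <- data.frame(
    dataset_id  = c("CMIP6....v20190529|esgf-data1.llnl.gov",
                    "CMIP6....v20190529|esgf.ceda.ac.uk"),
    dataset_pid = rep("hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8", 2)
)

unique_datasets <- datasets[!duplicated(datasets$dataset_pid), ]
nrow(unique_datasets)  # 1
```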
I'm trying to download daily data from the MIROC6 model. I've checked that the result in question is available in the ESGF system. Looking at the idx response attributes, the shards entry looks odd.
Is there a limit as to which models we can search?
library(epwshiftr)
idx <- init_cmip6_index(
# only consider ScenarioMIP activity
activity = "ScenarioMIP",
# specify variables
variable = c("pr"),
# specify report frequent
frequency = "day",
# specify experiment name
experiment = c("ssp245"),
# specify GCM name
source = c("MIROC6"),
# specify variant
variant = "r1i1p1f1"
)
#> No matched data. Please examine the actual response using 'attr(x, "response")'.
# This query should return one result
# looking at the response attribute
attr(idx, 'response')
#> $responseHeader
#> $responseHeader$status
#> [1] 0
#>
#> $responseHeader$QTime
#> [1] 28
#>
#> $responseHeader$params
#> $responseHeader$params$df
#> [1] "text"
#>
#> $responseHeader$params$q.alt
#> [1] "*:*"
#>
#> $responseHeader$params$indent
#> [1] "true"
#>
#> $responseHeader$params$echoParams
#> [1] "all"
#>
#> $responseHeader$params$fl
#> [1] "*,score"
#>
#> $responseHeader$params$start
#> [1] "0"
#>
#> $responseHeader$params$fq
#> $responseHeader$params$fq[[1]]
#> [1] "type:Dataset"
#>
#> $responseHeader$params$fq[[2]]
#> [1] "project:\"CMIP6\""
#>
#> $responseHeader$params$fq[[3]]
#> [1] "activity_id:\"ScenarioMIP\""
#>
#> $responseHeader$params$fq[[4]]
#> [1] "experiment_id:\"ssp245\""
#>
#> $responseHeader$params$fq[[5]]
#> [1] "source_id:\"MIROC6\""
#>
#> $responseHeader$params$fq[[6]]
#> [1] "variable_id:\"pr\""
#>
#> $responseHeader$params$fq[[7]]
#> [1] "nominal_resolution:\"100km\" || nominal_resolution:\"50km\" || nominal_resolution:\"100 km\" || nominal_resolution:\"50 km\""
#>
#> $responseHeader$params$fq[[8]]
#> [1] "variant_label:\"r1i1p1f1\""
#>
#> $responseHeader$params$fq[[9]]
#> [1] "frequency:\"day\""
#>
#> $responseHeader$params$fq[[10]]
#> [1] "replica:false"
#>
#> $responseHeader$params$fq[[11]]
#> [1] "latest:true"
#>
#>
#> $responseHeader$params$rows
#> [1] "10000"
#>
#> $responseHeader$params$q
#> [1] "*:*"
#>
#> $responseHeader$params$shards
#> [1] "localhost:8983/solr/datasets,localhost:8985/solr/datasets,localhost:8987/solr/datasets,localhost:8988/solr/datasets,localhost:8990/solr/datasets,localhost:8993/solr/datasets,localhost:8994/solr/datasets,localhost:8995/solr/datasets,localhost:8996/solr/datasets,localhost:8997/solr/datasets"
#>
#> $responseHeader$params$tie
#> [1] "0.01"
#>
#> $responseHeader$params$facet.limit
#> [1] "2048"
#>
#> $responseHeader$params$qf
#> [1] "text"
#>
#> $responseHeader$params$facet.method
#> [1] "fc"
#>
#> $responseHeader$params$facet.mincount
#> [1] "1"
#>
#> $responseHeader$params$wt
#> [1] "json"
#>
#> $responseHeader$params$facet.sort
#> [1] "lex"
#>
#>
#>
#> $response
#> $response$numFound
#> [1] 0
#>
#> $response$start
#> [1] 0
#>
#> $response$maxScore
#> [1] 0
#>
#> $response$docs
#> list()
Created on 2021-08-12 by the reprex package (v2.0.0)
This is the expected result
esgf_query() uses a self-implemented URL-encoding approach, which is a bit of a hack. It would be better to take advantage of the utils::URLencode() function instead.
options(epwshiftr.dir = here::here("data/cmip6"))
epwshiftr::load_cmip6_index()
#> Loading CMIP6 experiment output file index created at 2020-09-03 22:38:54.
#> Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, :
#> Incompatible join types: x.datetime_start (double) and i.V1 (character)
The term morphing is used in building simulation, but rarely in other fields. Can you update the documentation in morphing_epw() and explain what you mean by stretching and shifting? Maybe even with some formulas? You can still keep the reference to the paper, but some basic info would be very useful for users.
Otherwise, you could also mention the terms bias adjustment and downscaling, since these are more common in other disciplines.
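For reference, the three operations as usually formulated in the morphing literature (Belcher, Hacker and Powell, 2005) can be sketched on a toy series; here delta is the GCM-predicted absolute change and alpha the relative change for a given calendar month, and the combined form shifts the monthly mean while scaling the variability:

```r
# Morphing operations sketched for one calendar month, following the
# shift/stretch terminology of Belcher et al. (2005):
#   shift:    x = x0 + delta
#   stretch:  x = alpha * x0
#   combined: x = x0 + delta + alpha * (x0 - mean(x0))
shift   <- function(x0, delta) x0 + delta
stretch <- function(x0, alpha) x0 * alpha
morph   <- function(x0, delta, alpha) x0 + delta + alpha * (x0 - mean(x0))

x0 <- c(18, 20, 22)          # toy hourly dry-bulb temperatures
morph(x0, delta = 2, alpha = 1.5)
```

This is only a summary of the standard formulation, not necessarily the exact implementation in morphing_epw().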
Currently, there is no way for the user to flexibly specify where they want to save the CMIP6 output index file. This sometimes becomes cumbersome when making {epwshiftr} work together with {targets}.
The LLNL ESGF node has moved to the new Metagrid UI, which makes get_data_node() fail to parse the data node status. Workarounds include:
Comments from Brian Ripley:
There is a variety of failures here, it seems both in contacting a
website and in the content of that site. We need to remind you of the
CRAN policy
'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available or has changed (and
not give a check warning nor error).'
so this needs correction whether or not the resource recovers.
The usage examples are often not run and require auxiliary data that is not in the package. Do you think it's possible to add some data to the package so the examples can be run? I know that putting raw GCM data in there is not possible; otherwise the package size would explode. But maybe crop some NetCDF file to a small extent, include an EPW file, and then you could have real examples that can be run? What do you think?
Currently, init_cmip6_index()
only returns the first 10,000 records.
Lines 446 to 452 in 09827d4
It is enough for most use cases, but it would still be good to implement pagination.
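Pagination over a Solr-style endpoint is a loop that raises the offset by the page size until numFound records have been collected. In the sketch below, fetch_page is a stand-in that simulates the server response; it is not an epwshiftr or ESGF function:

```r
# Sketch of offset-based pagination: request pages of `limit` records,
# advancing `offset` until the reported total (`numFound`) is exhausted.
fetch_all <- function(fetch_page, limit = 3L) {
    offset <- 0L
    docs <- list()
    repeat {
        page <- fetch_page(offset, limit)
        docs <- c(docs, page$docs)
        offset <- offset + limit
        if (offset >= page$numFound) break
    }
    docs
}

# fake in-memory "server" holding 8 records, returning Solr-like pages
records <- as.list(1:8)
fake_server <- function(offset, limit)
    list(numFound = length(records),
         docs = records[seq_len(min(limit, length(records) - offset)) + offset])

length(fetch_all(fake_server))  # 8
```

The real implementation would build the `offset=`/`limit=` query parameters into the search URL instead of calling a local function.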
According to the ESGF Search RESTful API, the default behavior is to return all records (masters and replicas). The current implementation always specifies the replica parameter, which means a query returns either only master records or only replicas.
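Restoring the API's default would mean appending the replica filter only when the user asks for one. A minimal sketch, with build_fq() as a hypothetical helper:

```r
# Sketch: make `replica` tri-state. NULL (the default) adds no replica
# clause, so the query returns masters and replicas, matching the
# ESGF API's own default; TRUE/FALSE narrow the results as before.
build_fq <- function(replica = NULL) {
    fq <- c("type:Dataset", 'project:"CMIP6"')
    if (!is.null(replica)) fq <- c(fq, paste0("replica:", tolower(replica)))
    fq
}

build_fq()       # no replica clause -> masters and replicas
build_fq(FALSE)  # adds 'replica:false' -> masters only
```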
Hello Jia!
I am here!
I need an example of the code, as discussed in the email earlier.
Regards
ZZaman
Prepare for release:
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
cran-comments.md
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_dev_version()
Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.
Brief description of the problem
# insert reprex here
assert_multi_class() was added in {checkmate} version 1.9.0 (2019-01-09). Even though this version was published three years ago, it is still possible that a user has an older version installed. In fact, a user has already reported that {epwshiftr} failed to load due to a lower version of {checkmate}.
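One way to surface this at install time rather than load time would be a minimum-version constraint in DESCRIPTION; a sketch of the relevant stanza:

```
Imports:
    checkmate (>= 1.9.0)
```

With this in place, install.packages() upgrades an outdated {checkmate} automatically instead of letting library(epwshiftr) fail later.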
For instance, for tas_day_AWI-CM-1-1-MR_ssp585_r1i1p1f1_gn_20500101-20501231.nc, the alpha value for November could be -19.95. Logic should be introduced to issue a warning in this case and fall back to the Shift method.
data_mean[, .(lon, lat, dist, epw_mean, gcm_mean = value, delta, alpha)]
#> lon lat dist epw_mean gcm_mean delta alpha
#> <num> <num> <num> <units> <units> <units> <units>
#> 1: 106.4062 35.99986 89.25595 -7.5504032 [°C] -4.804854 [°C] 2.74554916 [°C] 0.6363705 [1]
#> 2: 106.4062 35.99986 89.25595 -3.9571429 [°C] -1.769299 [°C] 2.18784401 [°C] 0.4471152 [1]
#> 3: 106.4062 35.99986 89.25595 1.3489247 [°C] 3.581754 [°C] 2.23282969 [°C] 2.6552663 [1]
#> 4: 106.4062 35.99986 89.25595 8.2494444 [°C] 8.227525 [°C] -0.02191962 [°C] 0.9973429 [1]
#> 5: 106.4062 35.99986 89.25595 13.4138441 [°C] 19.642524 [°C] 6.22867986 [°C] 1.4643471 [1]
#> 6: 106.4062 35.99986 89.25595 16.8897222 [°C] 22.220674 [°C] 5.33095225 [°C] 1.3156329 [1]
#> 7: 106.4062 35.99986 89.25595 19.6094086 [°C] 26.158875 [°C] 6.54946686 [°C] 1.3339961 [1]
#> 8: 106.4062 35.99986 89.25595 18.2104839 [°C] 24.773162 [°C] 6.56267795 [°C] 1.3603791 [1]
#> 9: 106.4062 35.99986 89.25595 13.3270833 [°C] 21.648293 [°C] 8.32121006 [°C] 1.6243834 [1]
#> 10: 106.4062 35.99986 89.25595 7.0000000 [°C] 13.851030 [°C] 6.85103005 [°C] 1.9787186 [1]
#> 11: 106.4062 35.99986 89.25595 -0.2101389 [°C] 4.192878 [°C] 4.40301659 [°C] -19.9528880 [1]
#> 12: 106.4062 35.99986 89.25595 -6.1481183 [°C] -2.254458 [°C] 3.89366038 [°C] 0.3666907 [1]
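The proposed guard could look roughly like the sketch below; the safe_morph() name, the alpha <= 0 threshold, and the warning text are all illustrative assumptions, not epwshiftr code:

```r
# Sketch of the proposed fallback: when the stretch factor alpha for a
# month is non-positive (as with the -19.95 above), warn and apply the
# shift method alone for that month instead of the combined method.
safe_morph <- function(x0, delta, alpha) {
    if (alpha <= 0) {
        warning("Non-positive alpha (", alpha, "); falling back to the Shift method")
        return(x0 + delta)                    # shift only
    }
    x0 + delta + alpha * (x0 - mean(x0))      # combined shift + stretch
}

suppressWarnings(safe_morph(c(-1, 0, 1), delta = 4, alpha = -19.95))
```

A negative alpha arises when the EPW monthly mean and the GCM monthly mean straddle zero, so scaling by it would invert the diurnal profile; shifting avoids that.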