The knnstdisagg from rabutler-usbr

Add in other time scale disaggregation

annual to daily
monthly to daily

error in default weights for n = 40

knn_params_default(40) causes a "weights should sum to 1" error.
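
One possible guard, sketched in base R: compare the sum of the weights to 1 with a tolerance rather than with exact equality. The weight formula (inverse rank, with k = floor(sqrt(n))) and the helper names below are assumptions for illustration only; they are not taken from the package source, and the real cause of the n = 40 failure may be something else.

```r
# Hypothetical helper: inverse-rank weights for the k nearest neighbors.
# Assumes k = floor(sqrt(n)); the package's real defaults may differ.
default_weights_sketch <- function(n) {
  k <- max(1L, as.integer(floor(sqrt(n))))
  w <- 1 / seq_len(k)
  w / sum(w)
}

# Tolerance-based check instead of sum(w) == 1
weights_sum_to_one <- function(w, tol = 1e-8) {
  abs(sum(w) - 1) < tol
}

w40 <- default_weights_sketch(40)
weights_sum_to_one(w40)
```

If the existing check uses `==`, normalized weights can fail it by a rounding error of around 1e-16, which a tolerance absorbs.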

The "cdfs" are actually "pdfs".

Need to rename documentation, labels, etc.

add plot method

this will create diagnostic plots for the disaggregation

plot all/select sites

Currently, plot.knnst() takes an argument for which site to plot, and it can only be one site.

Should this be updated to be able to plot multiple sites?

improve knn_space_time_disagg() tests and errors

scale_sites = TRUE does not work

library(knnstdisagg)

# annual flows to disaggregate: column 1 = year, column 2 = flow
flow_mat <- cbind(c(2000, 2001, 2002), c(1400, 1567, 1325))
# made up historical data to use as index years
ind_flow <- cbind(1901:1980, rnorm(80, mean = 1500, sd = 300))
# make up monthly flow for two sites
mon_flow <- cbind(
  rnorm(80 * 12, mean = 20, sd = 5),
  rnorm(80 * 12, mean = 120, sd = 45)
)
# scale_sites = TRUE should scale both sites
d1 <- knn_space_time_disagg(
  flow_mat,
  ind_flow,
  mon_flow,
  start_month = 1,
  scale_sites = TRUE,
  k_weights = knn_params(1, 1)
)

results in:

Sites 2
will be selected directly from the index years, i.e., not scaled.

It should scale all sites.

Need to add this as a test.
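
The message above would be consistent with TRUE being coerced to the index 1, so that only site 1 is scaled, though that is a guess. A minimal sketch of resolving the argument explicitly before it is used as an index; `resolve_scale_sites_sketch()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: turn scale_sites into a vector of column indices.
resolve_scale_sites_sketch <- function(scale_sites, n_sites) {
  if (is.logical(scale_sites)) {
    stopifnot(length(scale_sites) == 1L, !is.na(scale_sites))
    # TRUE means every site, FALSE means none
    if (scale_sites) seq_len(n_sites) else integer(0)
  } else {
    stopifnot(is.numeric(scale_sites), all(scale_sites %in% seq_len(n_sites)))
    sort(unique(as.integer(scale_sites)))
  }
}

resolve_scale_sites_sketch(TRUE, 2)
```

A test for the issue could then assert that every site index comes back when `scale_sites = TRUE`.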

export write_knn_disagg()

Once the knn class (#1) is complete, export the write function, so that if you call the space-time disagg with writing turned off (write = FALSE), you can still call the write portion independently.

add print method for `knn_params`

Diagnostic plots/stats should be computed over specified window

Existing annual and monthly statistics should be computed based on a user specified window size. See plot.knnst_spcor().

Remove ofolder from knn_space_time_disagg()

Instead, just rely on user to use write_knnst(); see #7

create knn class

try and ensure it doesn't conflict with other knn package classes
this is the output of the disagg function

Update save_knnstplot() to save multiple plots to same file

Can we replace the first argument with ... and then put all the knnstplot objects into a single file?

Ensure consistent input data

Still need to make the function much more robust to the format of incoming data, i.e., which inputs need years associated with them and which don't, matrices vs. vectors, etc.

should we require xts objects?

consider using rownames for mon_flow so that we can check that it contains the same years of data as the ann_index_flow. Also consider putting the years in the rownames of that variable, and (x), for consistency
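
A sketch of what such a check might look like. The helper name and the assumed rowname format ("YYYY-MM", calendar-year data) are illustrative only and are not part of the package.

```r
# Hypothetical check: monthly data must cover exactly the index years.
check_mon_flow_sketch <- function(ann_index_flow, mon_flow) {
  stopifnot(is.matrix(ann_index_flow), ncol(ann_index_flow) == 2L)
  stopifnot(is.matrix(mon_flow))
  n_years <- nrow(ann_index_flow)
  if (nrow(mon_flow) != 12L * n_years) {
    stop("mon_flow must have 12 rows per index year (", 12L * n_years,
         " expected, ", nrow(mon_flow), " found).", call. = FALSE)
  }
  # If rownames exist, assume they look like "1901-01" and compare the years.
  if (!is.null(rownames(mon_flow))) {
    mon_years <- unique(as.integer(substr(rownames(mon_flow), 1, 4)))
    if (!setequal(mon_years, ann_index_flow[, 1])) {
      stop("Years in rownames(mon_flow) do not match ann_index_flow.",
           call. = FALSE)
    }
  }
  invisible(TRUE)
}

ind <- cbind(1901:1903, c(1500, 1400, 1600))
mon <- matrix(1, nrow = 36, ncol = 2)
check_mon_flow_sketch(ind, mon)
```

With a start month other than January, the year comparison would need to account for rows that fall in the following calendar year.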

temporal correlation plots

See #29. Do something similar but with temporal correlation.

knnst_temporal_cor() - returns knnst_tmpcor object
plot it

create knn_params class

in knn_get_index_year(), the k_weights argument needs to be a list with a lot of special characteristics

So, make it require a k_params class. This should simplify a lot of the checks we have built in.

create a default k_params class that can be used as the default to knn_get_index_year()
ensure all code is covered by tests
enhance examples in knn_params() and knn_params_default()
verify class is properly documented
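
A rough S3 sketch of the idea: validate once in a constructor, so downstream functions only need to check the class. The class name `knn_params_sketch` and the field names are made up for illustration and are not the package's actual implementation.

```r
# Hypothetical constructor; the class name is invented so it cannot clash
# with the package's own class.
new_knn_params_sketch <- function(k, weights) {
  stopifnot(is.numeric(k), length(k) == 1L, k >= 1, k == round(k))
  stopifnot(is.numeric(weights), length(weights) == k, all(weights >= 0))
  if (abs(sum(weights) - 1) > 1e-8) {
    stop("weights should sum to 1", call. = FALSE)
  }
  structure(list(k = as.integer(k), weights = weights),
            class = "knn_params_sketch")
}

# Simple print method for the sketch class
print.knn_params_sketch <- function(x, ...) {
  cat("<knn_params_sketch> k =", x$k, "\n")
  cat("weights:", format(x$weights, digits = 3), "\n")
  invisible(x)
}

p <- new_knn_params_sketch(3, c(0.5, 0.3, 0.2))
print(p)
```

A function such as knn_get_index_year() could then begin with a single `inherits()` check instead of repeating the list validation.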

Verify lag-1 statistics

Something appears to be weird at many locations with:

the January historical values
the October simulated values

See woodhouse Cameo, Greendale, Hoover
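
For reference while checking, a base-R sketch of a monthly lag-1 correlation. The first month of each year is the only one whose partner sits in the previous year's row, so it has one fewer pair and is an easy place for an off-by-one error; whether that explains the January/October oddities here is only a guess. The function name and input layout are assumptions.

```r
# x: numeric vector of monthly flows in time order, whole years only.
# Returns 12 correlations; element j pairs position j with the month before it.
# Position 1 is paired with position 12 of the previous year.
lag1_monthly_cor_sketch <- function(x) {
  stopifnot(length(x) %% 12 == 0, length(x) >= 36)
  m <- matrix(x, ncol = 12, byrow = TRUE)   # rows = years, cols = months
  n <- nrow(m)
  out <- numeric(12)
  out[1] <- cor(m[-1, 1], m[-n, 12])        # crosses the year boundary
  for (j in 2:12) out[j] <- cor(m[, j], m[, j - 1])
  out
}

set.seed(1)
x <- rnorm(120)
r <- lag1_monthly_cor_sketch(x)
```

If the series starts in October, position 1 is October, which is where a boundary problem would show up in water-year data.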

as.data.frame.knnst()

set start month in knn_space_time_disagg()

for the rownames of the output, we currently assume that the data are always starting in January.

need a way to specify this.

could:
1 - require that the mon_flow have rownames or a column with some monthly info
2 - add another argument that just states the start month
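
A sketch of option 2, generating row labels from a start month. The helper name and the "YYYY-MM" label format are assumptions, as is treating `start_year` as the calendar year of the first month.

```r
# Hypothetical helper: "YYYY-MM" labels for n_months starting at a given month.
month_labels_sketch <- function(start_year, start_month, n_months) {
  stopifnot(start_month %in% 1:12, n_months >= 1)
  # months elapsed since January of start_year
  idx <- (start_month - 1) + seq_len(n_months) - 1
  sprintf("%d-%02d",
          as.integer(start_year + idx %/% 12),
          as.integer(idx %% 12 + 1))
}

month_labels_sketch(2000, 10, 4)
# "2000-10" "2000-11" "2000-12" "2001-01"
```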

plotting

add start_month to knnst objects
nothing changes on monthly stats, just need to update get_pattern_flow_data_df() so that it uses start_month and does not assume it starts in January
new function to compute "agg_year" based on start_month. Idea being that we assume that if data start in a certain month, that is the beginning of the "agg_year", e.g., water year or fiscal year.
update get_ann_plot_stats() to sum by agg_year and not by year.
update plotting order of monthly boxplots to also be based on start_month
new test - plot data with 2 different start months (use n = 1), get the annual data out of the ggplot object, it should be the same
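
A minimal sketch of the "agg_year" computation. It assumes the aggregation year is labeled by the calendar year in which it ends (the usual water-year convention); the function name is hypothetical.

```r
# Assumption: an aggregation year is labeled by the calendar year in which
# it ends, e.g., Oct 2019 through Sep 2020 is agg_year 2020.
agg_year_sketch <- function(year, month, start_month = 1) {
  stopifnot(start_month %in% 1:12, all(month %in% 1:12))
  if (start_month == 1) return(year)
  ifelse(month >= start_month, year + 1, year)
}

# Sep 2019, Oct 2019, Jan 2020 with an October start
agg_year_sketch(c(2019, 2019, 2020), c(9, 10, 1), start_month = 10)
# 2019 2020 2020
```

Annual statistics would then be summed by this value instead of by calendar year.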

Round flows to nearest int?

Consider rounding to the nearest AF (or whatever the input units are).

But what are the effects of that on matching the input value? It might be better to do this in the code that creates the CRSS input files.
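
A small illustration of the concern: rounding each month independently can shift the annual total. The largest-remainder adjustment shown is one common way to keep the total, offered as an option and not as what the package does; it assumes non-negative flows.

```r
# Hypothetical helper: round to integers while keeping the rounded total.
# Assumes non-negative values.
round_preserve_sum_sketch <- function(x) {
  target <- round(sum(x))
  low <- floor(x)
  short <- target - sum(low)                 # units still to hand out
  # give the leftover units to the values with the largest fractional parts
  idx <- order(x - low, decreasing = TRUE)[seq_len(short)]
  low[idx] <- low[idx] + 1
  low
}

monthly <- rep(100.4, 12)
sum(monthly)                                 # annual value to match
sum(round(monthly))                          # independent rounding drifts
sum(round_preserve_sum_sketch(monthly))      # matches the rounded annual value
```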

spatial correlation plot

add spatial correlation plot

knnst_sp_cor() - returns knnst_spcor object
plot.knnst_spcor()
finish tests. particularly using this with unnamed original input.

Update cor code to work for multiple simulations

Currently knnst_spatial_cor() and knnst_temporal_cor() fail if there is more than one simulation.

(They are intended to fail.)

Need to fix this.
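
One way this might be handled, sketched under the assumption that each simulation is available as a months-by-sites matrix in a list. That storage layout and the function name are hypothetical; the knnst object may be organized differently.

```r
# sims: list of numeric matrices (rows = months, cols = sites),
# one matrix per simulation.
spatial_cor_by_sim_sketch <- function(sims) {
  stopifnot(is.list(sims), length(sims) >= 1)
  cors <- lapply(sims, cor)                  # one site-by-site matrix per simulation
  # stack into a sites x sites x nsim array
  array(unlist(cors), dim = c(dim(cors[[1]]), length(cors)))
}

set.seed(1)
sims <- list(matrix(rnorm(72), ncol = 3), matrix(rnorm(72), ncol = 3))
cor_arr <- spatial_cor_by_sim_sketch(sims)
dim(cor_arr)
```

Plots could then summarize each site pair across the third dimension, e.g., as a boxplot over simulations.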

check test with tolerance specified

In test_space_time_disagg, check line 46 that checks "disagg matches previous code's results". Make sure they are actually equal, since tolerance is specified.
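
To illustrate why the tolerance matters, a base-R comparison of exact and tolerance-based equality; the numbers are made up.

```r
old <- c(100, 200, 300)
new <- old * (1 + 1e-6)       # tiny relative difference

exact_equal  <- identical(old, new)                            # exact comparison
default_tol  <- isTRUE(all.equal(old, new))                    # default tolerance, about 1.5e-8
loose_tol    <- isTRUE(all.equal(old, new, tolerance = 1e-4))  # loose tolerance hides the difference

c(exact_equal, default_tol, loose_tol)
# FALSE FALSE TRUE
```

A test that passes only with a loose tolerance may be masking a real difference from the previous code's results.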

change sf_sites parameter in knn_space_time_disagg()

change it to scale_sites

Vignette or extensive README

Use this to demo/recreate some of the examples in the paper.

need to add in random_seed parameter

need to add in random_seed parameter to knn_space_time_disagg() and knn_get_index_year()
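
A sketch of the usual pattern: an optional seed that is only applied when supplied. The sampler below is a hypothetical stand-in for the neighbor selection, not the package's code.

```r
# Hypothetical sampler: pick one of the k nearest neighbors using the weights.
sample_neighbor_sketch <- function(k, weights, n_draws, random_seed = NULL) {
  # only touch the RNG state when the caller asks for reproducibility
  if (!is.null(random_seed)) set.seed(random_seed)
  sample.int(k, size = n_draws, replace = TRUE, prob = weights)
}

a <- sample_neighbor_sketch(3, c(0.5, 0.3, 0.2), n_draws = 10, random_seed = 42)
b <- sample_neighbor_sketch(3, c(0.5, 0.3, 0.2), n_draws = 10, random_seed = 42)
identical(a, b)
```

Defaulting to NULL keeps the current behavior for callers who do not pass a seed.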

Unify plotting for all diagnostics

plot.knnst() returns a knnstplot object that has custom functions to print/save.
plot.knnst_spcor() returns a ggplot object.

Additionally, plot.knnst() takes in the knnst results, computes a bunch of stats, and then produces plots. However, the spatial/temporal correlations follow a different flow, where you first call a function to compute the stats, e.g., knnst_sp_cor(), and then plot the output of that function.

Should we unify how we handle plotting throughout the package? Both in terms of the output object, and the workflow.

Thoughts:

plot.knnst_spcor() could return the same class as plot.knnst()? That could be good; then we could just edit the print() and save() functions to handle more plots.
- Would c() work on the two outputs to create one big output? How would we handle multiple sites?
unify the way the data are converted from a single time series to multiple windows/bins. Some functions filter on year, some on rows, some create a unique matrix...
after the above, maybe add a convenience wrapper that gets all data for a single site; that could save some overhead, since each site wouldn't have to be converted to multiple windows multiple times

see #29 and #30

How are we treating WY vs CY

Is this clear in inputs and in plotting?

should plotting have an option to compute annual stats on CY vs WY?

function to access the monthly knnst data

knnst_get_disagg_data() or knnst_get_monthly()

Set number of bins as package option

In plot.knnst(), n is the number of bins to use for the boxplots.

set n as a package option
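
A sketch using base R's options mechanism. The option name `knnstdisagg.n_bins` and the fallback of 50 are placeholders, not values from the package.

```r
# "knnstdisagg.n_bins" is a hypothetical option name; 50 is a placeholder default.
get_n_bins_sketch <- function(n = getOption("knnstdisagg.n_bins", default = 50)) {
  stopifnot(is.numeric(n), length(n) == 1L, n >= 1)
  as.integer(n)
}

old <- options(knnstdisagg.n_bins = 100)   # user sets the option
with_option <- get_n_bins_sketch()
options(old)                               # restore the previous state
without_option <- get_n_bins_sketch()
```

plot.knnst() could keep its `n` argument and use the option only as the default, so explicit calls still override it.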

README is pasting strings together weirdly

Ex:

Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling: knn_space_time_disagg(ann_flow = leesb, ann_index_flow = annual_index, 
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling:     mon_flow = monthly_data, start_month = 10, nsim = 1, scale_sites = 1:20, 
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling:     k_weights = knn_params_default(nrow(annual_index)), random_seed = seed)
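
One plausible cause, offered as a guess and not confirmed against the package source: deparse() splits a long call into several strings, and paste() is vectorized, so every piece gets its own copy of the header. The sketch below reproduces that pattern with an unevaluated call and shows a collapse-first variant.

```r
# quote() keeps the call unevaluated, so none of these objects need to exist.
call_sketch <- quote(knn_space_time_disagg(ann_flow = leesb,
  ann_index_flow = annual_index, mon_flow = monthly_data, start_month = 10,
  nsim = 1, scale_sites = 1:20))

pieces <- deparse(call_sketch)                  # long calls come back as several strings
broken <- paste("created by calling:", pieces)  # one prefixed string per piece
fixed  <- paste("created by calling:",
                paste(trimws(pieces), collapse = " "))

length(broken)
length(fixed)
```

Collapsing the deparsed call to a single string before building the README text would keep the header from repeating.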

So - if the input monthly data are not named, post message that they are going to be named S1, S2, ...
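
A minimal sketch of that behavior; the helper name is hypothetical.

```r
# Hypothetical helper: give unnamed site columns the names S1, S2, ...
name_sites_sketch <- function(mon_flow) {
  stopifnot(is.matrix(mon_flow))
  if (is.null(colnames(mon_flow))) {
    colnames(mon_flow) <- paste0("S", seq_len(ncol(mon_flow)))
    message("mon_flow has no column names; using ",
            paste(colnames(mon_flow), collapse = ", "), ".")
  }
  mon_flow
}

named <- name_sites_sketch(matrix(1, nrow = 2, ncol = 3))
colnames(named)
```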
