rabutler-usbr / knnstdisagg
Nonparametric space-time streamflow disaggregation using knn
knn_params_default(40) causes a "weights should sum to 1" error.
Need to rename documentation, labels, etc.
this will create diagnostic plots for the disaggregation
...
which; see plot.lm
knnstplot object
save_knnst_plot()
Currently, plot.knnst takes an argument for which site to plot, and it can only be one site.
Should this be updated to be able to plot multiple sites?
save_knnstplot() can save pdf or png
Check that ofolder exists and return a better error if it doesn't.
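A minimal sketch of what the ofolder check could look like, in base R. The helper name check_ofolder() and the wording of the messages are assumptions made for illustration; they are not taken from the package.

```r
# Hypothetical helper: validate the output folder before trying to save.
check_ofolder <- function(ofolder) {
  if (!is.character(ofolder) || length(ofolder) != 1) {
    stop("`ofolder` should be a single character string.", call. = FALSE)
  }
  if (!dir.exists(ofolder)) {
    stop(
      "`ofolder` does not exist: ", ofolder,
      "\nCreate it first, e.g., with dir.create().",
      call. = FALSE
    )
  }
  invisible(ofolder)
}

# An existing folder passes silently.
check_ofolder(tempdir())

# A missing folder gives a message that names the path.
res <- tryCatch(
  check_ofolder(file.path(tempdir(), "no_such_folder")),
  error = function(e) conditionMessage(e)
)
```

Failing early like this means the user sees the missing path rather than a lower-level error from the graphics device.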
), for consistency
library(knnstdisagg)

# made up annual flow (year, flow) to disaggregate
flow_mat <- cbind(c(2000, 2001, 2002), c(1400, 1567, 1325))
# made up historical data to use as index years
ind_flow <- cbind(1901:1980, rnorm(80, mean = 1500, sd = 300))
# make up monthly flow for two sites
mon_flow <- cbind(
  rnorm(80 * 12, mean = 20, sd = 5),
  rnorm(80 * 12, mean = 120, sd = 45)
)
d1 <- knn_space_time_disagg(
  flow_mat,
  ind_flow,
  mon_flow,
  start_month = 1,
  scale_sites = TRUE,
  k_weights = knn_params(1, 1)
)
results in:
Sites 2 will be selected directly from the index years, i.e., not scaled.
It should scale all sites.
Need to add this as a test.
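One plausible cause (an assumption, not confirmed in this issue) is that TRUE gets coerced to the site index 1, so only the first site is scaled. A sketch of how the argument could be normalized before use follows; resolve_scale_sites() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: turn scale_sites into a vector of site indices.
resolve_scale_sites <- function(scale_sites, n_sites) {
  all_sites <- seq_len(n_sites)
  if (is.logical(scale_sites)) {
    stopifnot(length(scale_sites) == 1, !is.na(scale_sites))
    # TRUE means every site, FALSE means no sites.
    return(if (scale_sites) all_sites else integer(0))
  }
  stopifnot(is.numeric(scale_sites), all(scale_sites %in% all_sites))
  as.integer(scale_sites)
}

resolve_scale_sites(TRUE, 2)   # 1 2
resolve_scale_sites(FALSE, 2)  # integer(0)
resolve_scale_sites(2, 2)      # 2
```

The three calls at the end are the sort of cases the new test could cover.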
Once the knn class (#1) is complete, then go ahead and export the write function so that if you call the s-t-disagg with write == False, then you can call the write portion independently.
Existing annual and monthly statistics should be computed based on a user-specified window size. See plot.knnst_spcor().
Instead, just rely on the user to call write_knnst(); see #7.
Can we replace the first argument with ... and then put all the knnstplot objects into a single file?
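A rough sketch of the `...` idea, assuming each object draws itself when printed (as ggplot objects do). The function save_plots_to_pdf() and the demo_plot class are made up for this example and only stand in for the real knnstplot objects.

```r
# Hypothetical: take any number of plot objects and write them to one PDF.
save_plots_to_pdf <- function(..., filename) {
  plots <- list(...)
  stopifnot(length(plots) > 0, is.character(filename), length(filename) == 1)
  grDevices::pdf(filename)
  # Close the device even if printing one of the plots fails.
  on.exit(grDevices::dev.off(), add = TRUE)
  for (p in plots) print(p)  # each object is drawn on its own page
  invisible(filename)
}

# Stand-in plot class so the example runs without the package.
demo_plot <- function(y, title) {
  structure(list(y = y, title = title), class = "demo_plot")
}
print.demo_plot <- function(x, ...) {
  graphics::plot(x$y, type = "l", main = x$title)
  invisible(x)
}

out <- file.path(tempdir(), "all_plots.pdf")
save_plots_to_pdf(
  demo_plot(1:10, "Site 1"),
  demo_plot((1:10)^2, "Site 2"),
  filename = out
)
```

Making filename a named argument after `...` forces callers to spell it out, which avoids a plot object being mistaken for the file name.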
Still need to make the function much safer with respect to the format of incoming data, i.e., which inputs need years associated with them and which don't, matrices vs. vectors, etc.
Should we require xts objects?
consider using rownames for mon_flow so that we can check that it contains the same years of data as the ann_index_flow. Also consider putting the years in the rownames of that variable, and (x), for consistency
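A sketch of the rownames check, assuming mon_flow carries rownames of the form "YYYY-MM", the data start in January, and the first column of ann_index_flow holds the years. Both the rowname format and the helper check_years_match() are assumptions for illustration.

```r
# Hypothetical helper: confirm the monthly data cover the index years.
check_years_match <- function(ann_index_flow, mon_flow) {
  ann_years <- ann_index_flow[, 1]
  if (is.null(rownames(mon_flow))) {
    stop("mon_flow needs rownames of the form 'YYYY-MM'.", call. = FALSE)
  }
  mon_years <- as.integer(substr(rownames(mon_flow), 1, 4))
  if (!setequal(unique(mon_years), ann_years)) {
    stop("mon_flow and ann_index_flow do not cover the same years.", call. = FALSE)
  }
  if (nrow(mon_flow) != 12 * length(ann_years)) {
    stop("mon_flow should have 12 rows per index year.", call. = FALSE)
  }
  invisible(TRUE)
}

# Two index years and the matching 24 months for two sites.
ann <- cbind(2000:2001, c(1400, 1567))
mon <- matrix(
  1:48,
  ncol = 2,
  dimnames = list(
    sprintf("%d-%02d", rep(2000:2001, each = 12), rep(1:12, 2)),
    NULL
  )
)
check_years_match(ann, mon)
```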
See #29. Do something similar but with temporal correlation.
knnst_temporal_cor() - returns knnst_tmpcor object
In knn_get_index_year(), the k_weights argument needs to be a list with a lot of special characteristics.
So, make it require a k_params class. This should simplify a lot of the checks we have built in.
k_params class that can be used as the default to knn_get_index_year()
knn_params() and knn_params_default()
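A sketch of how such a class could centralize the checks. Everything here is hypothetical: the names new_k_params() and k_params_default(), the list fields, and the default rule (k = floor(sqrt(n)) with weights proportional to 1/rank, a common heuristic in the knn resampling literature) are assumptions rather than the package's actual implementation.

```r
# Hypothetical constructor: validate once, so downstream code can trust the object.
new_k_params <- function(k, weights) {
  stopifnot(
    is.numeric(k), length(k) == 1, k >= 1, k == round(k),
    is.numeric(weights), length(weights) == k, all(weights >= 0)
  )
  # Compare with a tolerance rather than `==` to avoid floating point surprises.
  if (!isTRUE(all.equal(sum(weights), 1))) {
    stop("weights should sum to 1", call. = FALSE)
  }
  structure(list(k = as.integer(k), weights = weights), class = "k_params")
}

# Assumed default rule: floor(sqrt(n)) neighbors, weights proportional to 1/rank.
k_params_default <- function(n) {
  k <- max(1, floor(sqrt(n)))
  w <- 1 / seq_len(k)
  new_k_params(k, w / sum(w))
}

is_k_params <- function(x) inherits(x, "k_params")

p <- k_params_default(40)
p$k             # 6
sum(p$weights)  # 1, up to floating point error
```

With this in place, a function like knn_get_index_year() would only need an is_k_params() check on its argument.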
Something appears to be weird at many locations with:
See woodhouse Cameo, Greendale, Hoover
For the rownames of the output, we currently assume that the data always start in January.
Need a way to specify this.
Could:
1 - require that mon_flow have rownames or a column with some monthly info
2 - add another argument that just states the start month
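A sketch of option 2, building "YYYY-MM" row labels from a start month. The helper month_labels() is hypothetical, and it assumes first_year is the calendar year of the first month in the data, which may not match a water-year convention.

```r
# Hypothetical helper: row labels for n_years of monthly data.
month_labels <- function(first_year, n_years, start_month = 1) {
  stopifnot(start_month %in% 1:12, n_years >= 1)
  # Months elapsed since January of first_year (0-based).
  idx <- (start_month - 1) + seq_len(12 * n_years) - 1
  year <- first_year + idx %/% 12
  month <- idx %% 12 + 1
  sprintf("%d-%02d", year, month)
}

jan <- month_labels(2000, 1)                    # "2000-01" ... "2000-12"
oct <- month_labels(2000, 1, start_month = 10)  # "2000-10" ... "2001-09"
```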
plotting knnst objects
get_pattern_flow_data_df() so that it uses start_month and does not assume it starts in January
get_ann_plot_stats() to sum by agg_year and not by year.
Consider rounding to the nearest AF (or whatever the input units are).
But what are the effects of that on matching the input value? It might be better to do this in the code that creates the CRSS input files.
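A small made-up example of the concern: once the monthly values are rounded, they no longer sum back to the annual input. The numbers are illustrative only.

```r
# An annual flow of 1000 split evenly over 12 months.
annual <- 1000
monthly <- rep(annual / 12, 12)

# Round each month to the nearest whole unit.
rounded <- round(monthly)

sum(rounded)           # 996
annual - sum(rounded)  # 4 units lost to rounding
```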
add spatial correlation plot
knnst_sp_cor() - returns knnst_spcor object
plot.knnst_spcor()
Currently, knnst_spatial_cor() and knnst_temporal_cor() fail if there is more than one simulation.
(They are intended to fail.)
Need to fix this.
In test_space_time_disagg, check line 46 that checks "disagg matches previous code's results". Make sure they are actually equal, since tolerance is specified.
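To illustrate why the tolerance matters, here is a base R comparison using made-up values. A loose tolerance can hide small differences that a stricter check would catch.

```r
# Made-up "previous" and "current" results that differ by 0.01.
old <- c(1400, 1567, 1325)
new <- old + 0.01

# all.equal() compares the mean relative difference against `tolerance`.
loose  <- isTRUE(all.equal(old, new, tolerance = 1e-3))  # TRUE
strict <- isTRUE(all.equal(old, new, tolerance = 1e-8))  # FALSE
exact  <- identical(old, new)                            # FALSE
```

If the results are expected to match the previous code, tightening the tolerance in the test shows whether they really do.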
change it to scale_sites
Use this to demo/recreate some of the examples in the paper.
Need to add a random_seed parameter to knn_space_time_disagg() and knn_get_index_year().
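A sketch of one way a random_seed argument could behave: leave the RNG alone when it is NULL and set the seed otherwise. sample_index_years() is a hypothetical stand-in for the real selection step, and the uniform sampling is a placeholder for the weighted knn draw.

```r
# Hypothetical stand-in for the index year selection.
sample_index_years <- function(years, n, random_seed = NULL) {
  # Only touch the RNG state when the user asks for reproducibility.
  if (!is.null(random_seed)) set.seed(random_seed)
  sample(years, n, replace = TRUE)
}

a <- sample_index_years(1901:1980, 5, random_seed = 42)
b <- sample_index_years(1901:1980, 5, random_seed = 42)
identical(a, b)  # TRUE: same seed, same index years
```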
plot.knnst() returns a knnstplot object that has custom functions to print/save.
plot.knnst_spcor() returns a ggplot object.
Additionally, plot.knnst() takes in the knnst results, computes a bunch of stats, and then produces plots. However, the spatial/temporal correlation plots follow a different flow, where you first call a function to compute the stats, e.g., knnst_sp_cor(), and then plot the output of that function.
Should we unify how we handle plotting throughout the package? Both in terms of the output object and the workflow.
Thoughts:
plot.knnst_spcor() could return the same class as plot.knnst(). That could be good; then we could just edit the print() and save() functions to handle more plots.
Is this clear in inputs and in plotting?
knnst_get_disagg_data() or knnst_get_monthly()
In plot.knnst(), n is the number of bins to use for the boxplots.
Set n as a package option.
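A sketch of reading n from a package option with a fallback. The option name knnstdisagg.n_bins and the default of 100 are both made up for this example.

```r
# Hypothetical accessor: read n from options(), with a fallback default.
get_n_bins <- function() {
  n <- getOption("knnstdisagg.n_bins", default = 100)
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  n
}

default_n <- get_n_bins()  # falls back to 100 when the option is unset

# A user overrides the option, then restores the previous state.
old <- options(knnstdisagg.n_bins = 50)
custom_n <- get_n_bins()   # 50
options(old)
```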
Ex:
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling: knn_space_time_disagg(ann_flow = leesb, ann_index_flow = annual_index,
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling: mon_flow = monthly_data, start_month = 10, nsim = 1, scale_sites = 1:20,
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling: k_weights = knn_params_default(nrow(annual_index)), random_seed = seed)
should be easy. uses the function that is in plot.knnst()
Annual stats shows "Site - Annual Statistics"
monthly stats
monthly cdfs
annual cdf
same idea as CRSSIO
Currently, as.data.frame.knnst() names the sites S1, S2, ... if they are not named.
However, the original data and the disaggregated data do not have names on their matrices. This is inconsistent and introduces difficulties in accessing/plotting some of the data.
So, if the input monthly data are not named, post a message that they are going to be named S1, S2, ...
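A sketch of that naming step in base R. name_sites() is a hypothetical helper, and the message text is only a suggestion.

```r
# Hypothetical helper: give unnamed sites default names and tell the user.
name_sites <- function(mon_flow) {
  if (is.null(colnames(mon_flow))) {
    colnames(mon_flow) <- paste0("S", seq_len(ncol(mon_flow)))
    message(
      "Sites are not named; naming them ",
      paste(colnames(mon_flow), collapse = ", ")
    )
  }
  mon_flow
}

unnamed <- matrix(1:24, ncol = 2)
named <- name_sites(unnamed)
colnames(named)  # "S1" "S2"

# Existing names are left untouched.
kept <- name_sites(matrix(1:24, ncol = 2, dimnames = list(NULL, c("Cameo", "Hoover"))))
```

Applying the same step to the input and the disaggregated matrices would keep the names consistent with what as.data.frame.knnst() produces.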