
knnstdisagg's People

Contributors

rabutler-usbr

knnstdisagg's Issues

add plot method

this will create diagnostic plots for the disaggregation

  • plot historical data as points
  • plot options:
    • color of historical point
    • units
    • y-units
  • months as month.abb
  • title is sitename
  • ...
  • monthly pdf
  • annual pdf
  • add in other annual stats
  • create different plots, selected with a which argument (see plot.lm)
  • Add a show argument so that, if you don't want to press Enter through every site, you do not have to.
    • interactive mode
    • how to handle if not interactive
  • return knnstplot object
  • tests
  • save_knnst_plot() see #14
  • plot all/select sites see #15
  • annual stats can be computed on CY or WY see #16

plot all/select sites

Currently, plot.knnst() takes an argument specifying which site to plot, and it can only be one site.

Should this be updated to be able to plot multiple sites?

improve knn_space_time_disagg() tests and errors

  • need a test with no sf_sites that runs successfully, to ensure that warnings are no longer posted
    • handled in #24
  • should error if both index_years and k_weights are specified by the user
    • fixed so that a message posts saying k_weights is ignored if index_years is specified.
  • should check that multiple simulations work; also need to check that multiple
    simulations when specifying index_years works
    • handled already
  • add test that checks that the monthly data for all gages sums to the annual data for the flow to disaggregate, but must check the appropriate gage.
    • test-space_time_disagg_simple.R covered this for the most part. Added one more test to make sure the data equal the input data.
  • add a test where flows sum to the index gage. When using k = 1, if the input volumes exist in the index data, the results should be exactly the input data.
  • check that the ofolder exists and return better error if it doesn't
  • still need to make the function much safer with respect to the format of incoming data, i.e., which inputs need years associated with them and which don't, matrices vs. vectors, etc. #21
  • future enhancement: use rownames for mon_flow so that we can check that it contains the same years of data as the ann_index_flow. Also consider putting the years in the rownames of that variable, and (x), for consistency #11
  • should check whether mon_flow is specified on a water-year or calendar-year (CY) basis #11
    • not sure if I can; might be up to the user
  • Should also consider rounding to the nearest AF, but what effect would that have on
    matching the input Lees Ferry value?
    #20
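A minimal sketch of the ofolder check from the list above, in base R. check_ofolder() is a hypothetical helper name, not an existing knnstdisagg function, and the error wording is only a suggestion.

```r
# Hypothetical helper (not part of knnstdisagg): fail early with an
# informative error if the output folder is missing or malformed.
check_ofolder <- function(ofolder) {
  if (!is.character(ofolder) || length(ofolder) != 1L) {
    stop("`ofolder` must be a single character string.", call. = FALSE)
  }
  if (!dir.exists(ofolder)) {
    stop(
      "`ofolder` does not exist: ", ofolder,
      "\nCreate it first, e.g., with dir.create().",
      call. = FALSE
    )
  }
  # return the path invisibly so the check can be used inline
  invisible(ofolder)
}
```

Calling this at the top of the write step would surface the problem before any disaggregation work is done.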

scale_sites = TRUE does not work

library(knnstdisagg)
set.seed(1) # make the made-up data reproducible
# made-up annual flows (year, flow) to disaggregate
flow_mat <- cbind(c(2000, 2001, 2002), c(1400, 1567, 1325))
# made-up historical data to use as index years
ind_flow <- cbind(1901:1980, rnorm(80, mean = 1500, sd = 300))
# made-up monthly flow for two sites
mon_flow <- cbind(
  rnorm(80 * 12, mean = 20, sd = 5),
  rnorm(80 * 12, mean = 120, sd = 45)
)
d1 <- knn_space_time_disagg(
  flow_mat, 
  ind_flow, 
  mon_flow, 
  start_month = 1, 
  scale_sites = TRUE,
  k_weights = knn_params(1, 1)
)

results in:

Sites 2 will be selected directly from the index years, i.e., not scaled.

It should scale all sites.

Need to add this as a test.

export write_knn_disagg()

Once the knn class (#1) is complete, export the write function so that, if you call knn_space_time_disagg() with write = FALSE, you can call the write step independently.

create knn class

  • try and ensure it doesn't conflict with other knn package classes
  • this is the output of the disagg function

Ensure consistent input data

Still need to make the function much safer with respect to the format of incoming data, i.e., which inputs need years associated with them and which don't, matrices vs. vectors, etc.

should we require xts objects?

consider using rownames for mon_flow so that we can check that it contains the same years of data as the ann_index_flow. Also consider putting the years in the rownames of that variable, and (x), for consistency
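A sketch of the rownames idea, under explicit assumptions that are not the package's current behavior. It assumes that mon_flow rownames are "YYYY-MM" strings for calendar-year data (start_month = 1). It also assumes that the first column of ann_index_flow holds the years, as in the made-up example data in the scale_sites issue. check_mon_years() is a hypothetical name.

```r
# Hypothetical check (not part of knnstdisagg). Assumes calendar-year data,
# mon_flow rownames of the form "YYYY-MM", and years in column 1 of
# ann_index_flow.
check_mon_years <- function(mon_flow, ann_index_flow) {
  rn <- rownames(mon_flow)
  if (is.null(rn)) {
    stop("`mon_flow` must have rownames of the form 'YYYY-MM'.", call. = FALSE)
  }
  mon_years <- as.integer(substr(rn, 1, 4))
  ann_years <- as.integer(ann_index_flow[, 1])
  # every year must contribute exactly 12 months
  if (any(table(mon_years) != 12L)) {
    stop("Each year in `mon_flow` must contain exactly 12 months.", call. = FALSE)
  }
  # the monthly and annual data must cover the same set of years
  if (!identical(sort(unique(mon_years)), sort(ann_years))) {
    stop("`mon_flow` and `ann_index_flow` do not cover the same years.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Water-year data would need the year taken from an aggregation-year label rather than the calendar year in the rowname.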

temporal correlation plots

See #29. Do something similar but with temporal correlation.

  • knnst_temporal_cor() - returns knnst_tmpcor object
  • plot it

create knn_params class

In knn_get_index_year(), the k_weights argument needs to be a list that meets many specific requirements.

So, make it require a k_params class. This should simplify a lot of the checks we have built in.

  • create a default k_params class that can be used as the default to knn_get_index_year()
  • ensure all code is covered by tests
  • enhance examples in knn_params() and knn_params_default()
  • verify class is properly documented

Verify lag-1 statistics

Something appears to be wrong at many locations with:

  1. the January historical values
  2. the October simulated values

See woodhouse Cameo, Greendale, Hoover
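One way to recompute the statistic for checking, in base R. For the first month of each year, the lag-1 pair spans a year boundary: the previous value is the last month of the previous year. That month is January for calendar-year data and October for water-year data, so it is computed differently from the other months and is a natural place to look. This is a sketch for verification, not the package's implementation. lag1_cor_by_month() is a hypothetical name.

```r
# Lag-1 correlation for each month position within the year.
# x: monthly values in time order, starting in the first month of the
# (aggregation) year; length must be a multiple of 12.
lag1_cor_by_month <- function(x) {
  stopifnot(length(x) %% 12 == 0)
  n <- length(x)
  vapply(seq_len(12), function(m) {
    idx <- seq(m, n, by = 12)  # positions of month m in every year
    idx <- idx[idx > 1]        # month 1 of year 1 has no previous value
    cor(x[idx], x[idx - 1])    # correlate with the preceding month
  }, numeric(1))
}
```

Comparing this for the historical series and for a long simulated series would show whether only the year-boundary month differs.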

set start month in knn_space_time_disagg()

For the rownames of the output, we currently assume that the data always start in January.

Need a way to specify this.

Options:
1 - require that mon_flow have rownames or a column with some monthly info
2 - add another argument that states the start month
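For option 2, a sketch of how output rownames could be built from a start month instead of assuming January. monthly_rownames() and the "YYYY-MM" format are assumptions for illustration. first_year is the calendar year of the first month of data.

```r
# Hypothetical helper: "YYYY-MM" rownames for n_years of monthly data that
# begin in start_month of calendar year first_year.
monthly_rownames <- function(first_year, n_years, start_month = 1) {
  stopifnot(start_month %in% 1:12)
  # months elapsed since January of first_year, one per output row
  offset <- (start_month - 1) + seq_len(12 * n_years) - 1
  yr  <- as.integer(first_year + offset %/% 12)
  mon <- as.integer(offset %% 12 + 1)
  sprintf("%d-%02d", yr, mon)
}

monthly_rownames(2000, 1, start_month = 10)
# "2000-10" "2000-11" "2000-12" "2001-01" ... "2001-09"
```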


plotting

  • add start_month to knnst objects
  • nothing changes for monthly stats; just need to update get_pattern_flow_data_df() so that it uses start_month and does not assume the data start in January
  • new function to compute "agg_year" based on start_month. The idea is that if the data start in a certain month, that month is the beginning of the "agg_year", e.g., water year or fiscal year.
  • update get_ann_plot_stats() to sum by agg_year and not by year.
  • update plotting order of monthly boxplots to also be based on start_month
  • new test - plot data with 2 different start months (use n = 1), get the annual data out of the ggplot object, it should be the same
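A minimal sketch of the proposed agg_year computation and of summing by agg_year instead of by year. Both helper names are hypothetical. The naming convention is an assumption: each agg_year is labeled with the calendar year in which it ends, as is usual for water years (October 2000 to September 2001 is WY 2001).

```r
# Hypothetical helper: agg_year for each month, labeled by the calendar
# year in which the aggregation year ends (assumed convention).
agg_year <- function(year, month, start_month = 1) {
  if (start_month == 1) return(year)
  ifelse(month >= start_month, year + 1, year)
}

# Annual totals summed by agg_year rather than by calendar year.
annual_totals <- function(flow, year, month, start_month = 1) {
  tapply(flow, agg_year(year, month, start_month), sum)
}

# Two water years of data starting in October 2000
year  <- c(rep(2000, 3), rep(2001, 12), rep(2002, 9))
month <- c(10:12, 1:12, 1:9)
annual_totals(rep(1, 24), year, month, start_month = 10)
# 2001 2002
#   12   12
```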

Round flows to nearest int?

Consider rounding to the nearest AF (or whatever the input units are).

But what effect would that have on matching the input value? It might be better to do this in the code that creates the CRSS input files.
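A small base R illustration of the concern: rounding each month independently can break the monthly-to-annual balance. The second part is one possible remedy, largest-remainder rounding. It is offered as an option to consider, not something the package does. round_preserve_sum() is a hypothetical name.

```r
# Rounding each month independently can change the annual total.
monthly <- rep(100 / 3, 3)
sum(monthly)         # 100
sum(round(monthly))  # 99

# Largest-remainder rounding: floor every value, then add 1 to the values
# with the largest fractional parts until the total equals round(sum(x)).
round_preserve_sum <- function(x) {
  target <- round(sum(x))
  out <- floor(x)
  short <- target - sum(out)  # how many values must be rounded up
  if (short > 0) {
    idx <- order(x - out, decreasing = TRUE)[seq_len(short)]
    out[idx] <- out[idx] + 1
  }
  out
}
round_preserve_sum(monthly)  # 34 33 33
```

Note that this preserves round(sum(x)), not sum(x) itself. If the input annual value is not a whole number, it still cannot be matched exactly.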

spatial correlation plot

add spatial correlation plot

  • knnst_sp_cor() - returns knnst_spcor object
  • plot.knnst_spcor()
  • finish tests, particularly using this with unnamed original input.
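A sketch of one possible spatial statistic: the cross-site correlation matrix computed separately for each month. It assumes mon_flow is a (12 * n_years) x n_sites matrix in time order. sp_cor_by_month() is a hypothetical name, and knnst_sp_cor() in the package may compute something different.

```r
# Cross-site correlation matrix for each month position (1 to 12).
# mon_flow: (12 * n_years) x n_sites matrix in time order, one column per site.
sp_cor_by_month <- function(mon_flow) {
  stopifnot(nrow(mon_flow) %% 12 == 0)
  lapply(seq_len(12), function(m) {
    rows <- seq(m, nrow(mon_flow), by = 12)  # every year's value for month m
    cor(mon_flow[rows, , drop = FALSE])      # n_sites x n_sites matrix
  })
}
```

With unnamed input, the resulting matrices have no dimnames, which is the situation the tests above need to cover.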

check test with tolerance specified

In test_space_time_disagg, check the test on line 46 that checks "disagg matches previous code's results". Make sure the results are actually equal, since a tolerance is specified.
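A small base R illustration of why a tolerance can hide differences. With its default relative tolerance of about 1.5e-8, all.equal() accepts a difference that identical() rejects. testthat's expect_equal() likewise compares within a tolerance, while expect_identical() requires exact equality. Reporting the maximum absolute difference shows how close the results really are.

```r
a <- c(1, 2, 3)
b <- a + 1e-10

isTRUE(all.equal(a, b))  # TRUE: within the default tolerance
identical(a, b)          # FALSE: not exactly equal
max(abs(a - b))          # size of the actual difference
```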

Unify plotting for all diagnostics

plot.knnst() returns a knnstplot object that has custom functions to print/save.
plot.knnst_spcor() returns a ggplot object.

Additionally, plot.knnst() takes the knnst results, computes a number of stats, and then produces plots. However, the spatial/temporal correlation functions follow a different flow: first you call a function to compute the stats, e.g., knnst_sp_cor(), and then you plot the output of that function.

Should we unify how we handle plotting throughout the package? Both in terms of the output object, and the workflow.

Thoughts:

  • plot.knnst_spcor() could return the same class as plot.knnst(). Then we could just edit the print() and save() functions to handle more plots.
    • Would c() work on the two outputs to create one big output? How would we handle multiple sites?
  • unify the way the data are converted from a single time series to multiple windows/bins. Some functions filter on year, some on rows, some create a unique matrix...
  • after the above, maybe add a convenience wrapper that gets all data for a single site; that could save some overhead, since each site wouldn't have to be converted to multiple windows multiple times

see #29 and #30
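One possible shared representation for the "single time series to windows/bins" step is a years x 12 matrix, where rows are aggregation years and columns are months in start_month order. Annual sums, monthly stats, and multi-year windows then all become simple row/column operations. This is a sketch; to_year_matrix() is a hypothetical name, and it assumes the series starts in the first month of an aggregation year.

```r
# Reshape a monthly series (time order, starting in start_month) into a
# years x 12 matrix with agg_year rownames and month-abbreviation colnames.
to_year_matrix <- function(x, first_agg_year, start_month = 1) {
  stopifnot(length(x) %% 12 == 0, start_month %in% 1:12)
  m <- matrix(x, ncol = 12, byrow = TRUE)  # one row per year
  rownames(m) <- first_agg_year + seq_len(nrow(m)) - 1
  # month.abb reordered so the first column is start_month
  colnames(m) <- month.abb[((seq_len(12) + start_month - 2) %% 12) + 1]
  m
}

m <- to_year_matrix(1:24, first_agg_year = 2001, start_month = 10)
rowSums(m)  # annual totals by agg_year
m[, "Oct"]  # one month across all years
```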

README is pasting strings together weirdly

Ex:

Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling: knn_space_time_disagg(ann_flow = leesb, ann_index_flow = annual_index, 
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling:     mon_flow = monthly_data, start_month = 10, nsim = 1, scale_sites = 1:20, 
Associated index_years.csv and disagg_[n].csv files in this folder were
generated/disaggregated using the knnstdisagg package.
========================================================================
user: RAButler
date: 2020-06-15 10:31:29
version: knnstdisagg v0.2.0
created by calling:     k_weights = knn_params_default(nrow(annual_index)), random_seed = seed)
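The repeated header is consistent with deparse() splitting a long call into several strings. A vectorized paste() then recycles the header text once per piece. This is a likely cause rather than a confirmed diagnosis of the package code. Collapsing the deparsed call into a single string first avoids it. On R >= 4.0.0, deparse1() does the collapsing in one call.

```r
cl <- quote(knn_space_time_disagg(
  ann_flow = leesb, ann_index_flow = annual_index,
  mon_flow = monthly_data, start_month = 10, nsim = 1, scale_sites = 1:20,
  k_weights = knn_params_default(nrow(annual_index)), random_seed = seed
))

pieces <- deparse(cl)  # long calls are split into several strings

# Vectorized paste() produces one string per piece -> repeated header text.
bad <- paste("created by calling:", pieces)

# Collapse first so the call becomes a single string.
good <- paste("created by calling:", paste(pieces, collapse = "\n"))
```

quote() only captures the call, so leesb, annual_index, and the other objects do not need to exist for this illustration.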

Name sites if unnamed

Currently, as.data.frame.knnst() names sites S1, S2, ... if they are not named.

However the original data and the disaggregated data do not have names on their matrices. This is inconsistent and introduces difficulties in accessing/plotting some of the data.

So, if the input monthly data are not named, post a message that they are going to be named S1, S2, ..., and name them.
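A minimal sketch of naming unnamed site columns up front so that the original, disaggregated, and data frame outputs share the same names. name_sites() is a hypothetical helper, not an existing knnstdisagg function.

```r
# Hypothetical helper (not part of knnstdisagg): name unnamed site columns
# S1, S2, ... and tell the user; already-named input is returned unchanged.
name_sites <- function(mon_flow) {
  if (is.null(colnames(mon_flow))) {
    message("`mon_flow` has no column names; naming sites S1, S2, ...")
    colnames(mon_flow) <- paste0("S", seq_len(ncol(mon_flow)))
  }
  mon_flow
}
```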
