nrel / sup3r


The Super-Resolution for Renewable Resource Data (sup3r) software uses generative adversarial networks to create synthetic high-resolution wind and solar spatiotemporal data from coarse low-resolution inputs.

Home Page: https://nrel.github.io/sup3r/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
generative-adversarial-network machine-learning renewable-energy deep-learning climate-change climate-data solar-energy tensorflow wind-energy

sup3r's Introduction

Welcome to SUP3R!

[Badges: Docs | Tests | Linter | PyPI | Python version | Codecov | Zenodo]

The Super Resolution for Renewable Resource Data (sup3r) software uses generative adversarial networks to create synthetic high-resolution wind and solar spatiotemporal data from coarse low-resolution inputs. To get started, check out the sup3r command line interface (CLI) here.

Installing sup3r

NOTE: The installation instructions below assume that you have Python installed on your machine and are using conda as your package/environment manager.

Option 1: Install from PIP (recommended for analysts):

  1. Create a new environment: conda create --name sup3r python=3.9

  2. Activate environment: conda activate sup3r

  3. Install sup3r: pip install NREL-sup3r

  4. Run this if you want to train models on GPUs: conda install -c anaconda tensorflow-gpu

    4.1 For OSX use instead: python -m pip install tensorflow-metal

Option 2: Clone repo (recommended for developers)

  1. From your home directory, run: git clone git@github.com:NREL/sup3r.git
  2. Create the sup3r environment and install the package:
    1. Create a conda env: conda create -n sup3r
    2. Run the command: conda activate sup3r
    3. cd into the repo cloned in step 1.
    4. Prior to running pip below, make sure the branch is correct (install from main!)
    5. Install sup3r and its dependencies by running: pip install . (or pip install -e . if running a dev branch or working on the source code)
    6. Run this if you want to train models on GPUs: conda install -c anaconda tensorflow-gpu. On the Eagle HPC system you will also need to run pip install protobuf==3.20.* and pip install chardet
    7. Optional: Set up the pre-commit hooks with pip install pre-commit and pre-commit install

Recommended Citation

Update with current version and DOI:

Brandon Benton, Grant Buster, Andrew Glaws, Ryan King. Super Resolution for Renewable Resource Data (sup3r). https://github.com/NREL/sup3r (version v0.0.3), 2022. DOI: 10.5281/zenodo.6808547

Acknowledgments

This work was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the DOE Grid Deployment Office (GDO), the DOE Advanced Scientific Computing Research (ASCR) program, the DOE Solar Energy Technologies Office (SETO), the DOE Wind Energy Technologies Office (WETO), the United States Agency for International Development (USAID), and the Laboratory Directed Research and Development (LDRD) program at the National Renewable Energy Laboratory. The research was performed using computational resources sponsored by the Department of Energy's Office of Energy Efficiency and Renewable Energy and located at the National Renewable Energy Laboratory. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

sup3r's People

Contributors

bnb32, castelao, grantbuster, malihass, ppinchuk, spodgorny9


sup3r's Issues

Distribute spatial chunks for forward passes across nodes?

Why this feature is necessary:
Memory limits. In theory the spatial grid of the forward pass output could be so large that not even one timestep can be held in memory on a single node. (And reducing the forward pass size to a single timestep conflicts with the motivation for overlapping chunks.)

A possible solution is:
I think the easiest approach would be to include the write_output calls in the loops over the forward pass chunks and just write each chunk's output to its own file, instead of combining chunks after the forward pass loops and then writing the combined array.
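A minimal sketch of that idea, not the actual sup3r API (fwp_chunks, model.generate, and write_output are illustrative names passed in as parameters):

def write_chunks(fwp_chunks, model, write_output):
    """Write each forward-pass chunk's output to its own file so only one
    chunk's output is ever held in memory at a time."""
    for i, chunk in enumerate(fwp_chunks):
        hi_res = model.generate(chunk.low_res_data)
        write_output(hi_res, f'sup3r_chunk_{i:05d}.h5')  # one file per chunk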

Urgency / Timeframe
This probably won't be needed for quite a while. I am currently able to reduce fwp_chunk_size[2] to run the pipeline on the new hourly 4 km WTK data, but if we were doing much more than a 2x spatial enhancement on the full grid this would be a problem.

Validation Split

The SpatialBatchHandler needs to accommodate a validation split. Usually this is a kwarg that says val_split=0.2 (20% of all available data). The SpatialBatchHandler will segregate that percent of the data into a property like SpatialBatchHandler.val_data and will not deliver any of that data in the training batches.
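Illustrative usage of the requested kwarg (val_split and val_data follow the naming in this issue; the exact constructor signature is an assumption):

batch_handler = SpatialBatchHandler([data_handler], val_split=0.2)

val_data = batch_handler.val_data   # the held-out 20%, never used for training
for batch in batch_handler:         # training batches drawn from the other 80%
    train_step(batch)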

Unrealistic clearsky ratio scenes at night

Seeing weird clearsky ratio scenes at night in timesteps 12-14, where left is clearsky ratio and right is GHI. Timesteps with all(ghi < 10) are being caught at dawn/dusk and producing really weird clearsky ratio scenes at night. It would be a good experiment to set timesteps where all(cs_ghi < 10) to np.nan to try to get rid of these weird clearsky ratio scenes.

[Image: clearsky ratio (left) and GHI (right) showing the nighttime artifacts]

Code to reproduce based on tests/batch_handlers/test_bh_h5_cc.py

"""pytests for H5 climate change data batch handlers"""

import os
import matplotlib.pyplot as plt
import numpy as np
import pytest

from sup3r.preprocessing import (
    BatchHandlerCC,
    DataHandlerH5SolarCC,
    DataHandlerH5WindCC,
)
from sup3r.preprocessing.utilities import numpy_if_tensor
from sup3r.utilities.pytest.helpers import BatchHandlerTesterCC

SHAPE = (20, 20)
FEATURES_S = ['clearsky_ratio', 'ghi', 'clearsky_ghi']
TARGET_S = (39.01, -105.13)
FEATURES_W = ['u_100m', 'v_100m', 'temperature_100m']
TARGET_W = (39.01, -105.15)
TARGET_SURF = (39.1, -105.4)

dh_kwargs = {
    'target': TARGET_S,
    'shape': SHAPE,
    'time_slice': slice(None, None, 2),
    # 'time_roll': -7,
}
from sup3r import CONFIG_DIR, TEST_DATA_DIR
FP_NSRDB = os.path.join(TEST_DATA_DIR, 'test_nsrdb_co_2018.h5')

test = np.asarray(handler.data.hourly[:, :, 816:888])

for idt in range(36):
    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    for idf in range(test.shape[-1]):
        a = ax[idf].imshow(test[:, :, idt, idf])
        plt.colorbar(a, ax=ax[idf])
    plt.title(idt)
    plt.show()
    plt.close()

sup3r status file updates use command name not unique key name

Bug Description
Current setup doesn't accommodate pipelines with duplicate commands. For example:

pipeline = [
   {'step1': 'step1.json',
    'command': 'forward-pass'},
   {'step2': 'step2.json',
    'command': 'forward-pass'}]

This pipeline won't continue to run after step1 because sup3r currently updates status files using the command name ('forward-pass'), whereas using the key name ('step1'/'step2') would keep the pipeline running.

After step1 is done the status file will be similar to the following

{'forward-pass': {'successful': true, ...},
 'step1': {},
 'step2': {}
}

and rerunning will just report that forward-pass is done and stop.

Solution

There are a few places in the sup3r code where status files are written/updated with pipeline_step=module_name, where module_name is the command name. Instead, the key identifier ('step1'/'step2' in the example above) should be passed through.
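Illustrative only (not the actual sup3r status API): the status update should be keyed on the unique pipeline step name rather than the command name.

status = {'step1': {}, 'step2': {}}

def update_status(status, step_key, **kwargs):
    """Record job status under the unique pipeline key, e.g. 'step1'."""
    status.setdefault(step_key, {}).update(kwargs)

update_status(status, 'step1', successful=True)   # not status['forward-pass']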

Get low res and high res pairs from different datasets (like ERA and WTK)

Why this feature is necessary:
Training on ERA + WTK instead of coarsened WTK + WTK might improve results.

A possible solution is:
We will need to work out how to efficiently match low res and high res grids. This could be a pain in the case of ERA (a regular grid) and WTK (Lambert conformal). Once the grid matching is worked out, matching high res and low res samples could be done with something like slice(high_res_start, high_res_end, t_enhance/s_enhance). Low res and high res datasets could be integrated into the batch handler by using separate data handlers for each and then refactoring the Batch class with the Batch.get_coarse_batch method. A regridding between the two data handlers could be precomputed.
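One reading of the suggested index matching, assuming the low res grid has already been regridded/aligned to the high res grid (all names and numbers below are illustrative): a high res sample slice maps to the matching low res slice by dividing by the enhancement factors.

s_enhance, t_enhance = 3, 4
hr_rows = slice(300, 318)                      # 18 high-res rows
lr_rows = slice(hr_rows.start // s_enhance,    # 6 matching low-res rows
                hr_rows.stop // s_enhance)
hr_times = slice(1000, 1024)                   # 24 high-res timesteps
lr_times = slice(hr_times.start // t_enhance,  # 6 matching low-res steps
                 hr_times.stop // t_enhance)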

Random spatial sampling

Need the ability to take each batch from a different spatial location. One option is to use the current framework with a much larger spatial raster like 1e4 x 1e4, load all of that data into memory, and then sample batches from a random starting target (see the sketch after the questions below).

Questions:

  1. How large of a spatial "source" raster can we fit into memory with ~6 years of data? We should probably enable interval sampling on the timeseries data to allow for a larger dataset in memory.
  2. How do we decide how many observations are in an epoch then? Kind of arbitrary, since there will be a bajillion possible starting targets.
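A minimal sketch of random spatial sampling from one large in-memory raster (shapes and names are illustrative, not the current API): pick a random starting target for every observation instead of using a fixed raster corner.

import numpy as np

def random_observation(data, sample_shape=(18, 18, 24)):
    """data: (lat, lon, time, features) array; return one random crop."""
    i = np.random.randint(0, data.shape[0] - sample_shape[0] + 1)
    j = np.random.randint(0, data.shape[1] - sample_shape[1] + 1)
    t = np.random.randint(0, data.shape[2] - sample_shape[2] + 1)
    return data[i:i + sample_shape[0],
                j:j + sample_shape[1],
                t:t + sample_shape[2], :]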

"Correct" ordering of wrf latitude

Why this feature is necessary:
WRF data uses an increasing ordering of latitude, while WTK data uses a decreasing ordering. This makes (0, 0) the index of the lower left corner for WRF data, while (-1, 0) is the lower left corner for WTK data. It might not be necessary, but it would be nice to have WRF data arranged with the same ordering.

A possible solution is:
We could set up DataHandlerNC.extract_feature to return data after doing data = data[::-1]. This would need to be accounted for when calculating the raster index, though, and would make direct extraction from xarray.open_dataset disagree with extraction from DataHandlerNC.

Urgency / Timeframe
Definitely not urgent, and might not even be necessary. We can always just do [::-1] after extraction.

Initial sup3r data pipeline

Rough steps:

  1. Check out the WTK data on Eagle:
    a. /datasets/WIND/conus/v1.0.0/wtk_conus_2007.h5 (hourly data)
    b. /datasets/WIND/conus/v1.0.0/2007/wtk_conus_2007_2m.h5 (5min data)
  2. Start with rex to get the indices associated with a spatial raster:
    a. https://github.com/NREL/rex/blob/4edb1cd42ef13f3d93029e8a6c280bd50ae801e0/rex/resource_extraction/resource_extraction.py#L1254
  3. Check out the super training API:
    a.
    def train(self, x, y, n_batch=None, batch_size=128, n_epoch=100,
  4. Make a data pipeline/handler that delivers the high res data along with coarsened low res data to super (average spatial and sample temporal)
  5. Start with ~40k hourly timesteps with a 1000 km x 1000 km raster. WTK is about 2 km resolution, so for the fine-res dataset this would be a 500x500 raster over 40k hours (multiple years or multiple spatial locations). With ~50x spatial enhancement, the coarse data will be averaged down to a 10x10 raster (see the data pipeline sketch after the considerations list below).

Considerations:

  1. Windspeed and direction must be converted to u and v (see the sketch after this list)
  2. Direction is cardinal (north) but WRF u and v are grid-orthogonal
    a. https://github.com/NREL/wtk/blob/13991a4dc57a2d06eac932be2615627893ad5769/wtk/hrrr.py#L246
  3. We should start with just u and v as “channels” but definitely code with the anticipation of new channels (more hub heights, topography, temp, pressure)
  4. Big memory constraints: consider building training batches with a Python generator and not duplicating all of the data
    a. https://github.com/NREL/phygnn/blob/2028e6cae5c5bf1610e858cf656d90d68496cba1/phygnn/base.py#L367
  5. Consider that we’re going to make “base” models using the WTK h5 source datasets, but will be transfer learning on native 2D WRF NetCDF files
  6. Use float32
  7. Do spatial only for now; consider the temporal dimension later
  8. Multi year data handler:
    a. https://github.com/NREL/rex/blob/4edb1cd42ef13f3d93029e8a6c280bd50ae801e0/rex/multi_year_resource.py#L286
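A minimal sketch of two pieces of the pipeline described above, assuming arrays shaped (lat, lon, time, features); all names are illustrative, not the sup3r API. The first helper converts windspeed/cardinal direction to u and v (meteorological convention, direction in degrees clockwise from north that the wind blows from; it does not handle the WRF grid-relative rotation noted in item 2). The second spatially averages and temporally samples the high-res data to build the paired low-res input.

import numpy as np

def ws_wd_to_uv(ws, wd_deg):
    """Windspeed + cardinal direction -> u (eastward), v (northward)."""
    theta = np.deg2rad(wd_deg)
    return -ws * np.sin(theta), -ws * np.cos(theta)

def coarsen(high_res, s_enhance=50, t_enhance=1):
    """Average s_enhance x s_enhance spatial blocks and keep every
    t_enhance-th timestep."""
    n_lat, n_lon, n_time, n_feat = high_res.shape
    low = high_res.reshape(n_lat // s_enhance, s_enhance,
                           n_lon // s_enhance, s_enhance,
                           n_time, n_feat).mean(axis=(1, 3))
    return low[:, :, ::t_enhance, :]

hi_res = np.random.rand(500, 500, 48, 2).astype(np.float32)
low_res = coarsen(hi_res, s_enhance=50)   # -> (10, 10, 48, 2)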

Charge code:
WFED 11556 03.01.03

Timeline goal:
Have a GAN trained using the new sup3r infrastructure and data pipeline by Feb 7th

Caching Raster Index

The algorithm to retrieve the raster index from the flattened h5 WTK data can be a bit cumbersome for large extents due to the weird "bend" in the projection. It would be nice if we could provide a filepath as an optional kwarg to the DataHandler object that would load a cached raster extent instead of parsing the raster extent every time. The np.savetxt() method would be most flexible; binary and pickle file caches tend to have issues across different Python and numpy versions.
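A sketch of the caching pattern with np.savetxt/np.loadtxt (the raster_file kwarg and compute_fn callback are hypothetical, not the current DataHandler signature):

import os
import numpy as np

def get_raster_index(raster_file, compute_fn):
    """Return a cached raster index if available, else compute and cache it."""
    if raster_file is not None and os.path.exists(raster_file):
        return np.loadtxt(raster_file, dtype=int)
    raster = compute_fn()            # expensive parse of the flattened h5 grid
    if raster_file is not None:
        np.savetxt(raster_file, raster, fmt='%d')
    return raster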

Timeseries sampling interval

For spatial GAN training we should be able to sample the source data timeseries at a given interval to reduce the amount of required memory. There's more value in training data across years than from hour to hour.

Multi extent data handling

Currently the GAN data handler accepts multiple source files but one starting coordinate. It would actually make a lot more sense if DataHandler accepted just a single data file and a single target, and then the SpatialBatchHandler accepted one or more DataHandler objects. That way, you could load multiple DataHandler objects across multiple years/spatial extents and then the SpatialBatchHandler would iterate through them and encapsulate the heterogeneity.

A caveat: we would want the SpatialBatchHandler to sample batches at random from the multiple sources, not to sample all batches from DataHandlerA and then from DataHandlerB. If we don't randomly sample across the multiple DataHandler objects, the gradient descent will be skewed towards whichever DataHandler we're currently working on. It would be nice if one batch sampled observations from multiple DataHandler objects too. I'm not sure how you do this in the sampling logic because you don't necessarily want to stack all the data from the DataHandlers, but I'm sure you'll figure it out.
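A hedged sketch of cross-handler sampling (get_next is assumed to return one observation array): draw each observation in a batch from a randomly chosen DataHandler so no single extent dominates the gradient updates, without stacking all handlers' data.

import numpy as np

def sample_batch(data_handlers, batch_size):
    """Build one batch with observations drawn from random handlers."""
    obs = []
    for _ in range(batch_size):
        handler = data_handlers[np.random.randint(len(data_handlers))]
        obs.append(handler.get_next())
    return np.stack(obs)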

Feature Data Normalization

The SpatialBatchHandler should (by default, norm=True) return normalized data and have an option (norm=False) to return data in physical units. Typically we scale to a mean of 0 and stdev of 1 for each feature. We will need to calculate the mean and stdev of each feature across all DataHandler objects and then save these values in a property so that the GAN has access to them.

We also need to be able to handle pre-defined mean/stdev values when we're making predictions with a trained model. I'm not sure what the best way to go about this is. Maybe have DataHandler accept mean/stdev values to normalize with, but when being used by the SpatialBatchHandler with multiple DataHandler objects, only do the normalization at the SpatialBatchHandler level? I'm not sure how we'll handle data during production data generation; happy to brainstorm.
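A minimal sketch of feature-wise normalization pooled across DataHandlers (attribute names are illustrative; a production version would accumulate the statistics incrementally instead of concatenating everything in memory):

import numpy as np

def feature_stats(data_handlers):
    """Per-feature mean/stdev over all handlers, each holding a
    (lat, lon, time, features) array in dh.data."""
    stacked = np.concatenate(
        [dh.data.reshape(-1, dh.data.shape[-1]) for dh in data_handlers],
        axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0)

def normalize(data, means, stds):
    return (data - means) / stds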

Artifacts from lat/lon interpolation across -180/180

Bug Description
When the spatial domain extends across the -180/180 longitude boundary, plots show an artifact at this boundary. The lat/lon grid could be incorrectly mapped to the fields, or it could be that the RBFInterpolate call doesn't work as well across the -180/180 boundary (RBFInterpolate uses n_neighbors and probably doesn't know to use neighbors across the -180/180 boundary, even though they are "close"). It's also possible that the output array isn't getting filled at this boundary for some reason.

[Images: 180lon_artifact and 180lon_artifact_imshow, showing the artifact at the -180/180 boundary]

Code

import matplotlib.pyplot as plt

# lons, lats, windspeed are 2D arrays taken from the sup3r output domain
plt.scatter(lons.flatten(), lats.flatten(), c=windspeed.flatten())
plt.contourf(lons, lats, windspeed)  # contourf needs the 2D arrays
plt.imshow(windspeed)

All of the above show the artifact.

Length of SpatialBatchHandler

SpatialBatchHandler should have a __len__ method that says how many batches it contains

While you're at it, maybe add a DataHandler.shape property for the full data shape.
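A minimal sketch of both requests; the n_batches attribute and the (lat, lon, time, features) layout of DataHandler.data are assumptions, not the current API.

class SpatialBatchHandler:
    def __len__(self):
        return self.n_batches          # number of batches delivered per epoch

class DataHandler:
    @property
    def shape(self):
        return self.data.shape         # full (lat, lon, time, features) shape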

Temporal slicing in Validation data

Bug Description
When using temporal slicing (None, None, x) where x > 1, some tests fail. It seems to be due to the tuple_index handling in ValidationData.

Full Traceback

Traceback (most recent call last):
  File "test_train_gan_tslice.py", line 69, in <module>
    test_train_st_weight_update(log=True, n_epoch=1, temporal_slice=(None, None, 3))
  File "test_train_gan_tslice.py", line 63, in test_train_st_weight_update
    adaptive_update_fraction=0.05)
  File "/Users/mhassana/Desktop/GitHub/sup3r_nov24_issue/sup3r/models/base.py", line 1140, in train
    loss_details)
  File "/Users/mhassana/Desktop/GitHub/sup3r_nov24_issue/sup3r/models/base.py", line 901, in calc_val_loss
    for val_batch in batch_handler.val_data:
  File "/Users/mhassana/Desktop/GitHub/sup3r_nov24_issue/sup3r/preprocessing/batch_handling.py", line 316, in __next__
    val_index['tuple_index']]
ValueError: could not broadcast input array from shape (18,18,14,3) into shape (18,18,24,3)

Code Sample

import os
import tempfile

from rex import init_logger

from sup3r import CONFIG_DIR, TEST_DATA_DIR
from sup3r.models import Sup3rGan
from sup3r.preprocessing.data_handling import DataHandlerH5
from sup3r.preprocessing.batch_handling import BatchHandler


FP_WTK = os.path.join(TEST_DATA_DIR, 'test_wtk_co_2012.h5')
TARGET_COORD = (39.01, -105.15)
FEATURES = ['U_100m', 'V_100m', 'BVF2_200m']


def test_train_st_weight_update(n_epoch=5, log=False, temporal_slice=slice(None, None, 1)):
    """Test basic spatiotemporal model training with discriminators and
    adversarial loss updating."""
    if log:
        init_logger('sup3r', log_level='DEBUG')

    fp_gen = os.path.join(CONFIG_DIR, 'spatiotemporal/gen_3x_4x_2f.json')
    fp_disc = os.path.join(CONFIG_DIR, 'spatiotemporal/disc.json')

    Sup3rGan.seed()
    model = Sup3rGan(fp_gen, fp_disc, learning_rate=1e-4,
                     learning_rate_disc=3e-4)

    handler = DataHandlerH5(FP_WTK, FEATURES, target=TARGET_COORD,
                            shape=(20, 20),
                            sample_shape=(18, 18, 24),
                            temporal_slice=temporal_slice,
                            val_split=0.005,
                            max_workers=1)

    batch_handler = BatchHandler([handler], batch_size=4,
                                 s_enhance=3, t_enhance=4,
                                 n_batches=4)

    adaptive_update_bounds = (0.9, 0.99)
    with tempfile.TemporaryDirectory() as td:
        model.train(batch_handler, n_epoch=n_epoch,
                    weight_gen_advers=1e-6,
                    train_gen=True, train_disc=True,
                    checkpoint_int=10,
                    out_dir=os.path.join(td, 'test_{epoch}'),
                    adaptive_update_bounds=adaptive_update_bounds,
                    adaptive_update_fraction=0.05)

if __name__ == "__main__":
    print("\n\n DOING temporal_slice=(None, None, 1) \n\n")
    test_train_st_weight_update(log=True, n_epoch=1, temporal_slice=(None, None, 1))
    print("\n\n DOING temporal_slice=(None, None, 3) \n\n")
    test_train_st_weight_update(log=True, n_epoch=1, temporal_slice=(None, None, 3))

To Reproduce
Steps to reproduce the problem behavior

  1. Copy code sample to tests/
  2. Execute the python script

Expected behavior
Any temporal slicing should work (within the limits of the dataset size).

Multi gpu for conditional moment

Multi-GPU training does not work with the conditional moment batch handling. The batch handler still expects the full batch size instead of the batch size divided by the number of GPUs.

Currently I get dimension errors when calling calc_loss_cond_mom. I suspect this has to do with the mask dimension.
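Illustrative only: when a batch is distributed over n_gpus replicas, each replica (and any conditional-moment mask) should see batch_size // n_gpus observations so the mask and output dimensions stay consistent.

import numpy as np

def split_for_gpus(batch, mask, n_gpus):
    """Split a batch and its mask along the observation axis, one slice per GPU."""
    sub = batch.shape[0] // n_gpus
    return [(batch[i * sub:(i + 1) * sub], mask[i * sub:(i + 1) * sub])
            for i in range(n_gpus)]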

Refactor parallelization of bias calculations

Why this feature is necessary:

From discussions on #215, @bnb32 suggested registering this as a to-do so we don't forget about it.

A possible solution is:

Different possibilities:

  • Combine all the redundant code on parallelization with futures;
  • Move to xarray and reduce a lot of code while parallelizing with Dask;

I have considered the following alternatives:

For now the code is redundant but explicit, so it is prepared for an easy and clean refactoring.

Additional context

None

Charge code

Urgency / Timeframe

Not urgent. It works as it is.

Batch/Data handler refactors to precompute batch observations

Why this feature is necessary:
This might be needed to increase training speed.

A possible solution is:
The relevant methods are BatchHandler.__next__, DataHandler.get_observation_index, and DataHandler.get_next. We should determine which of these (if not all) are taking too much time. It's possible that get_observation_index is the bottleneck, which would make the solution much simpler (just precompute indices). If __next__ is the time hog, we will likely need to precompute n_batches * batch_size * n_epochs samples and then cache these to load during training. The random sampling might be better replaced with a deterministic division of the domain, though, so we would only need to precompute n_batches * batch_size observations and reuse these for each epoch (this would also allow us to use the same cached observations across training runs with the same epoch size and sample_shape).
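A rough sketch of the "precompute indices" option (method names follow the issue text; exact signatures and return types are assumptions): sample every observation index up front so batch iteration only has to slice into already-loaded data.

def precompute_indices(handler, n_batches, batch_size):
    """Sample all observation indices once, before training starts."""
    return [handler.get_observation_index()
            for _ in range(n_batches * batch_size)]

def get_observation(handler, indices, i):
    """Fetch the i-th precomputed observation from the loaded data."""
    return handler.data[indices[i]]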

Interpolator assumes ordering of height levels

The interpolator for vertical wind assumes index=0 is the minimum elevation, but this isn't always true (e.g. ERA has index=-1 as the minimum elevation). This raises lots of warnings. I don't think this is actually a bug because the actual interp function should work with either sorting, but warnings are scary.

bad_max = max(levels) > lev_array[:, -1, :, :]

[interp1d(h[mask], var[mask], fill_value='extrapolate')(levels)
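A hedged, order-agnostic variant of the bad_max check above (purely illustrative; lev_array is assumed to have the vertical level axis in position 1, as in the excerpt): sort along the level axis before comparing, so ERA-style descending height ordering does not trigger spurious warnings.

import numpy as np

def check_max_level(lev_array, levels):
    """Order-agnostic version of the bad_max check."""
    lev_sorted = np.sort(lev_array, axis=1)        # ascending along the level axis
    return max(levels) > lev_sorted[:, -1, :, :]   # True where extrapolating upward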
