README.md for Project 3 Group H
Group members: Dongwei Fu, Alex Adams, Xinchang Li
========================Project-3-Group-H.py README===============================
We are tasked with extracting weather data from the Global Precipitation Climatology Project (GPCP) and National Centers for Environmental Prediction (NCEP) reanalysis data from the NOAA Physical Sciences Division THREDDS server, and with reducing and analyzing those data for extreme precipitation days in Melbourne, Australia during the summer months (DJF) for the variables specified in the instructions.
1. GPCP data downloading and statistics
2. NCEP Reanalysis data downloading and computation
3. Plotting NCEP Reanalysis data
In Part 1, the list of GPCP filenames is generated by parsing the HTML metadata with the BeautifulSoup package. Each file then goes through the same sequence to save memory and disk space: download file -> load into xarray -> aggregate -> delete file.
Once all files have been processed, we end up with an xarray dataset that holds the daily global precipitation values from October 1996 to November 2019. This dataset is then written to a netCDF file on a local drive.
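The per-file sequence described above can be sketched roughly as follows. This is only an illustration, not the project's code: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages, and the names list_nc_files, process_all, fetch and reduce_file are hypothetical.

```python
import os
import tempfile
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href targets that end with a given suffix."""

    def __init__(self, suffix):
        super().__init__()
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith(self.suffix):
            self.links.append(href)


def list_nc_files(html, suffix=".nc"):
    """Return the file names linked from an HTML index page, in page order."""
    collector = _LinkCollector(suffix)
    collector.feed(html)
    # Keep only the final path component so each entry is a bare file name.
    return [os.path.basename(link) for link in collector.links]


def process_all(filenames, fetch, reduce_file):
    """Download, reduce and delete each file in turn.

    fetch(name, path) writes the remote file `name` to the local `path`;
    reduce_file(path) returns the reduced result for that file.
    Both are hypothetical callables supplied by the caller.
    """
    results = []
    for name in filenames:
        path = os.path.join(tempfile.gettempdir(), name)
        fetch(name, path)
        try:
            results.append(reduce_file(path))
        finally:
            # Remove the local copy so only one file is on disk at a time.
            if os.path.exists(path):
                os.remove(path)
    return results
```

In a setup like the one described, fetch would perform the download and reduce_file would open the file with xarray and return the aggregated slice; the returned pieces could then be concatenated along time.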
We select the 'DJF' months using xr.DataArray.sel(time=(mdata['time.season'] == 'DJF')) and use xr.DataArray.sel(longitude=,latitude=, method='nearest') to select the grid point closest to Melbourne, Australia.
We then drop all values that are bad data or fill values and compute the 95th-percentile precipitation value using np.percentile(,95).
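A minimal sketch of these selection steps on synthetic data is shown below. Several things here are assumptions rather than part of the project: the function name djf_point_threshold, the approximate Melbourne coordinates (37.81 S, 144.96 E), the -99999.0 fill value, and the rule that negative values count as bad data. The sketch uses the .dt.season accessor to build the seasonal mask.

```python
import numpy as np
import pandas as pd
import xarray as xr


def djf_point_threshold(precip, lat, lon, q=95.0):
    """Return (threshold, valid DJF series) for the grid point nearest (lat, lon)."""
    # Keep only December, January and February time steps.
    djf = precip.sel(time=(precip["time"].dt.season == "DJF"))
    # Pick the grid cell whose centre is closest to the requested location.
    point = djf.sel(latitude=lat, longitude=lon, method="nearest")
    # Treat NaN and negative values as missing or fill values (assumption).
    valid = point.where(point >= 0).dropna("time")
    threshold = float(np.percentile(valid.values, q))
    return threshold, valid


# Synthetic daily field on a small 1-degree grid around Melbourne.
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
latitude = np.arange(-39.5, -35.0, 1.0)
longitude = np.arange(142.5, 148.0, 1.0)
data = rng.gamma(shape=0.5, scale=4.0,
                 size=(time.size, latitude.size, longitude.size))
data[::50] = -99999.0  # hypothetical fill value on some days
precip = xr.DataArray(
    data,
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    dims=("time", "latitude", "longitude"),
    name="precip",
)

threshold, series = djf_point_threshold(precip, lat=-37.81, lon=144.96)
# Days on which precipitation exceeds the 95th-percentile threshold.
extreme = series.where(series > threshold, drop=True)
```

The extreme series keeps its time coordinate, so its dates can be reused later to pick the matching days from other datasets.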
The cumulative distribution function (CDF) is plotted using matplotlib. Finally, the days on which precipitation exceeds the 95th-percentile threshold are written to a new .nc file.
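One way to build such a plot is from the empirical CDF of the valid values, sketched below. The function names, the output path and the axis labels are illustrative assumptions; matplotlib is imported inside plot_cdf so that the empirical_cdf helper can be used on its own.

```python
import numpy as np


def empirical_cdf(values):
    """Return the sorted values and the fraction of samples <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p


def plot_cdf(values, threshold, path="cdf.png"):
    """Plot the empirical CDF and mark the threshold (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    x, p = empirical_cdf(values)
    fig, ax = plt.subplots()
    ax.step(x, p, where="post")
    ax.axvline(threshold, color="red", linestyle="--", label="95th percentile")
    ax.set_xlabel("Daily precipitation")
    ax.set_ylabel("Cumulative probability")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

For example, empirical_cdf([3.0, 1.0, 2.0, 4.0]) returns the values sorted as 1, 2, 3, 4 with probabilities 0.25, 0.5, 0.75 and 1.0.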
In Part 2, long-term mean (LTM) fields for the base period (1981 - 2010, monthly averaged) and the daily files for the extreme precipitation days (XPRECIP) determined in Part 1 are retrieved from the THREDDS server and reduced in two different ways:
- Compact: a dictionary is created with variable names, URLs and other information essential for data retrieval and reduction, and a for loop (or two nested for loops when looping through years) is used to retrieve and concatenate data for the selected variables. The data are stored in xarray.DataArrays, which are consolidated into dictionaries;
- Direct: data for each variable are explicitly specified, retrieved and reduced with one line of code.
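The Compact approach might look something like the sketch below. Everything specific in it is a placeholder: the VARIABLES table (keys, variable names, levels), the function retrieve_extreme_days, and the open_year callable, which stands in for a remote read from the THREDDS server. A synthetic fake_open_year is used so the sketch runs offline.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical variable table: keys, names and levels are placeholders,
# not the project's actual configuration.
VARIABLES = {
    "hgt500": {"name": "hgt", "level": 500.0},
    "air850": {"name": "air", "level": 850.0},
}


def retrieve_extreme_days(variables, dates, open_year):
    """Collect data for the given dates, one DataArray per variable.

    open_year(name, year) must return a DataArray with time and level
    dimensions for that variable and year; it is supplied by the caller.
    """
    dates = pd.DatetimeIndex(dates)
    result = {}
    for key, spec in variables.items():        # outer loop: variables
        pieces = []
        for year in sorted(set(dates.year)):   # inner loop: years
            wanted = dates[dates.year == year]
            yearly = open_year(spec["name"], year)
            pieces.append(yearly.sel(time=wanted, level=spec["level"]))
        result[key] = xr.concat(pieces, dim="time")
    return result


def fake_open_year(name, year):
    """Stand-in for a remote read: one year of synthetic daily data."""
    time = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    level = np.array([850.0, 500.0])
    data = np.full((time.size, level.size), float(year))
    return xr.DataArray(data, coords={"time": time, "level": level},
                        dims=("time", "level"), name=name)


days = ["2000-01-15", "2000-12-03", "2002-02-20"]
fields = retrieve_extreme_days(VARIABLES, days, fake_open_year)
```

Because the inner loop only visits years that actually contain requested dates, this particular sketch never asks for an empty selection.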
Anomalies for the extreme precipitation days are then calculated as the difference between the XPRECIP mean and the LTM for each variable at each grid point. Note that some units are inconsistent between the two mean fields and are made consistent before the anomalies are calculated.
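The anomaly step can be illustrated with the short sketch below. It rests on assumptions that the text does not confirm: the LTM is indexed by a month dimension (1 to 12), the DJF reference is the unweighted mean of the December, January and February LTM fields, and the unit mismatch is a simple scale factor (Pa to hPa is used as a made-up example).

```python
import numpy as np
import xarray as xr


def djf_anomaly(xprecip_mean, ltm_monthly, ltm_scale=1.0):
    """Return the XPRECIP mean minus the DJF long-term mean at each grid point.

    ltm_monthly has a 'month' dimension (1..12); ltm_scale converts the LTM
    into the units of xprecip_mean (hypothetical factor, e.g. 0.01 for Pa -> hPa).
    """
    ltm_djf = ltm_monthly.sel(month=[12, 1, 2]).mean("month") * ltm_scale
    return xprecip_mean - ltm_djf


# Synthetic fields with made-up values: LTM in Pa, extreme-day mean in hPa.
lat = np.array([-40.0, -37.5])
lon = np.array([142.5, 145.0, 147.5])
month = np.arange(1, 13)
ltm = xr.DataArray(
    np.full((12, 2, 3), 101300.0),
    coords={"month": month, "lat": lat, "lon": lon},
    dims=("month", "lat", "lon"),
)
xmean = xr.DataArray(
    np.full((2, 3), 1008.0),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

anom = djf_anomaly(xmean, ltm, ltm_scale=0.01)
```

Applying the conversion before the subtraction keeps both fields in the same units, which is the point of the note above.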
In Part 3, XPRECIP means, LTMs and anomalies are plotted for the whole globe using Cartopy and matplotlib. Contours, filled contours or vector plots are used depending on the variable. Melbourne is marked by a red or black star in each plot for reference. To focus on the region of interest while still showing conditions around the globe, the central longitude of the maps was shifted 180 degrees so that they are centered on the International Date Line, which is much closer to Melbourne. For the vector plots, the density of the wind vectors can be changed by editing the "skip" variable at the beginning of each vector plotting section, which signifies how many data points are skipped before plotting the next one in the row/column. This may be desired for more aesthetically pleasing vector maps.
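The effect of the "skip" variable can be sketched with plain array slicing. This assumes skip is used as a slice step, so that every skip-th point is kept in each direction; the function name thin_vectors and the synthetic 2.5-degree grid are illustrative.

```python
import numpy as np


def thin_vectors(lon, lat, u, v, skip):
    """Keep every `skip`-th grid point in both directions for vector plots."""
    if skip < 1:
        raise ValueError("skip must be a positive integer")
    return lon[::skip], lat[::skip], u[::skip, ::skip], v[::skip, ::skip]


# Synthetic 2.5-degree global grid (73 latitudes x 144 longitudes).
lat = np.linspace(90.0, -90.0, 73)
lon = np.arange(0.0, 360.0, 2.5)
u = np.ones((lat.size, lon.size))
v = np.zeros((lat.size, lon.size))

lon_s, lat_s, u_s, v_s = thin_vectors(lon, lat, u, v, skip=4)
# The thinned arrays can then be passed to a call such as
# ax.quiver(lon_s, lat_s, u_s, v_s, transform=ccrs.PlateCarree()).
```

With skip=4 the 73 x 144 grid is reduced to 19 x 36 vectors; larger values give sparser maps.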
- When retrieving data using the Compact method, the temporal loop might terminate if there are no XPRECIP days in one of the years. An if statement could be added to check each time step and skip that year if no dates are found.
- There appears to be a problem with either the data retrieval/storage method or the plotting method for the wind vector data. The method used to shift the central longitude by 180 degrees for the other fields did not shift the vector data as well. This might be specific to the matplotlib vector plotting methods such as .quiver and .barbs, which would explain why the error appeared only with the wind vector data. A solution could not be found, so the wind vector data are plotted centered on the Prime Meridian instead, so that the data are still spatially accurate.
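The if check suggested in the first item could look roughly like this. The function names and the retrieve callable are hypothetical, and grouping by calendar year is an assumption about how the temporal loop is organized.

```python
from collections import defaultdict
from datetime import date


def dates_by_year(dates):
    """Group extreme-precipitation dates by calendar year."""
    grouped = defaultdict(list)
    for d in dates:
        grouped[d.year].append(d)
    return grouped


def loop_over_years(dates, first_year, last_year, retrieve):
    """Call retrieve(year, days) only for years that contain extreme days."""
    grouped = dates_by_year(dates)
    results = []
    for year in range(first_year, last_year + 1):
        days = grouped.get(year, [])
        if not days:
            # No XPRECIP days this year: skip it instead of
            # requesting an empty selection.
            continue
        results.append(retrieve(year, days))
    return results


xdays = [date(1997, 1, 5), date(1997, 12, 30), date(1999, 2, 11)]
summary = loop_over_years(xdays, 1996, 1999, lambda y, d: (y, len(d)))
```

Here 1996 and 1998 contain no extreme days, so retrieve is only called for 1997 and 1999.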