pcgotan / rainintensity
Plotted the Intensity-Duration-Frequency (IDF) curves by first estimating annual maximum intensity for certain time periods.
Hi, I found your repo very helpful. I just have some minor suggestions.
First, the rainfall data you start with is in an unconventional layout, so I propose a simple pandas-based function that takes (date, value) records and converts them from long format to matrix format.
This assumes that the rainfall data has hourly or finer granularity.
import pandas as pd
import polars as pl


def long_to_matrix_format(csv_path):
    """
    Convert a long format CSV with 'date' and 'value' columns
    to a matrix format with 'Year', 'Day', and hourly columns.

    Parameters:
    - csv_path: str, path to the long format CSV

    Returns:
    - float NumPy array with columns Year, Day, Hour 1 ... Hour 24
    """
    # Read the CSV with polars, then hand it over to pandas
    df = pl.read_csv(csv_path, try_parse_dates=True).to_pandas()
    # Make sure the 'date' column is a pandas datetime
    df["date"] = pd.to_datetime(df["date"])
    # Set the date column as the index and aggregate to hourly totals
    df.set_index("date", inplace=True)
    df = df.resample("60min").sum()
    df["date"] = df.index.to_series()
    # Extract year, day of year, and hour
    df["Year"] = df["date"].dt.year
    df["Day"] = df["date"].dt.dayofyear
    # +1 to number hours from 1 to 24 instead of 0 to 23
    df["Hour"] = df["date"].dt.hour + 1
    # Pivot the DataFrame to the desired matrix format
    matrix_df = df.pivot_table(
        index=["Year", "Day"], columns="Hour", values="value", aggfunc="sum"
    ).reset_index()
    # Rearrange columns to the desired order
    matrix_df = matrix_df[["Year", "Day"] + [i for i in range(1, 25)]]
    matrix_df.columns = ["Year", "Day"] + [f"Hour {i}" for i in range(1, 25)]
    # Fill NaN values with 0
    matrix_df.fillna(0, inplace=True)
    # Renumber years so the first year in the data is 1
    matrix_df["Year"] = matrix_df["Year"] - matrix_df["Year"].min() + 1
    # The copy gives a C-contiguous float64 array, which the numba signature below requires
    return matrix_df.to_numpy().astype(float).copy()
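To make the target layout concrete, here is a minimal standard-library sketch of the same reshaping on a few in-memory records. The name `long_to_matrix_rows` and the sample records are hypothetical and only for illustration. Unlike the pandas version, this sketch only emits days that actually have readings, so it is not a drop-in replacement.

```python
from collections import defaultdict
from datetime import datetime


def long_to_matrix_rows(records):
    """Turn (timestamp string, value) pairs into rows of
    [year_index, day_of_year, hour_1, ..., hour_24]."""
    totals = defaultdict(lambda: [0.0] * 24)
    for stamp, value in records:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        key = (t.year, t.timetuple().tm_yday)
        # Sub-hourly readings accumulate into the hour they fall in
        totals[key][t.hour] += value
    # Renumber years so the first year in the data becomes 1
    first_year = min(year for year, _ in totals)
    return [
        [year - first_year + 1, day] + hours
        for (year, day), hours in sorted(totals.items())
    ]


# Hypothetical sample: two readings in the same hour, then two other days
records = [
    ("1981-01-01 01:00:00", 2.5),
    ("1981-01-01 01:30:00", 1.0),
    ("1981-01-02 23:00:00", 4.0),
    ("1982-01-01 00:00:00", 3.0),
]
rows = long_to_matrix_rows(records)
for row in rows:
    print(row[:4], "...")
```

Each row has 26 entries: the year index, the day of year, and 24 hourly totals, which matches the column order the rolling-sum step expects.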
#-----------------------------------------
Then, the for loop you use to build the data1 dataframe is very slow, because it calls pandas operations separately for every (i, j) pair. My minor suggestion is to write it as plain Python loops and then compile it with numba.
import numba
import numpy as np


# For ahead-of-time compilation, the function can instead be exported with
# (cc is assumed to be a numba.pycc.CC instance):
# @cc.export('nb_get_idf_rolling_sum', 'float64[:,::1](float64[:,::1])')
@numba.njit('float64[:,::1](float64[:,::1])')
def get_idf_rolling_sum(df_array):
    rows = np.unique(df_array[:, 0]).size
    data = np.zeros((rows, 24))
    for i in range(rows):
        # Filter by year and drop the 'Year' and 'Day' columns;
        # after the transpose, rows are hours and columns are days
        df1_array = df_array[df_array[:, 0] == i + 1][:, 2:].T
        # Loop through 0 to 23 to calculate the max rolling sum for each window size
        for j in range(24):
            max_rolling_sum = -np.inf
            # Loop through each column (one day)
            for col in range(df1_array.shape[1]):
                # Loop through each possible window in the current column
                for row in range(df1_array.shape[0] - j):
                    rolling_sum = np.sum(df1_array[row : row + j + 1, col])
                    # Update max rolling sum if this sum is greater
                    if rolling_sum > max_rolling_sum:
                        max_rolling_sum = rolling_sum
            # Store the max rolling sum in the data array
            data[i, j] = max_rolling_sum
    return data
The numba function replaces this part.
for i in range(99):
    df1 = (((df.where(df['Year']==i+1)).dropna()).drop(['Year','Day'],axis=1)).T
    for j in range(24):
        data[i][j] = max((df1.rolling(j+1).sum()).max())
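One possible refinement, not part of the suggestion above: the compiled loop recomputes each window sum from scratch, whereas a prefix sum gives every window sum with a single subtraction. The sketch below shows the idea in pure Python for one day of hourly values and checks it against a brute-force version. The names `max_window_sums` and `brute_force` and the sample day are hypothetical, and whether this helps inside the numba function has not been measured here.

```python
def max_window_sums(hourly, max_window=24):
    """For one day of hourly values, return the largest sum over any run
    of k consecutive hours, for k = 1..max_window."""
    # prefix[i] holds the sum of the first i values
    prefix = [0.0]
    for v in hourly:
        prefix.append(prefix[-1] + v)
    n = len(hourly)
    result = []
    for k in range(1, max_window + 1):
        best = float("-inf")
        for start in range(n - k + 1):
            # Sum of hourly[start:start + k] as a difference of prefix sums
            window_sum = prefix[start + k] - prefix[start]
            if window_sum > best:
                best = window_sum
        result.append(best)
    return result


def brute_force(hourly, max_window=24):
    """Reference version that sums every window explicitly."""
    n = len(hourly)
    return [
        max(sum(hourly[s : s + k]) for s in range(n - k + 1))
        for k in range(1, max_window + 1)
    ]


# Hypothetical day: a three-hour burst in the morning and one wet hour at night
day = [0.0] * 24
day[5], day[6], day[7], day[20] = 2.0, 3.5, 1.0, 4.0

fast = max_window_sums(day)
slow = brute_force(day)
print(fast[:3], fast[-1])
```

The per-year result is then the maximum of these values over all days of that year, which is what the nested loops over `col` and `row` compute.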
#-------------------
This is a timing test on a CSV file with 10 million rows in the long format, meaning:
date,value
1981-01-01 01:00:00, 2.5
1981-01-01 02:00:00, 2.5
1981-01-01 03:00:00, 2.5
1981-01-01 04:00:00, 2.5
...
(10 million rows in total)
# convert long format to matrix format
t0 = 0.5562264919281006
# process rolling sum
t1 = 0.024176836013793945
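For anyone who wants to reproduce timings like these, here is a minimal sketch of a timing helper. How the numbers above were actually measured is not stated, so `timed` and the `stand_in` workload are hypothetical placeholders. Because the njit decorator above is given an explicit signature, numba compiles the function when it is defined, so a timed call should not include compilation.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def stand_in(n):
    """Placeholder workload; swap in the real conversion or rolling-sum call."""
    return sum(range(n))


result, elapsed = timed(stand_in, 100_000)
print(f"stand_in took {elapsed:.6f} s")
```

In practice the two calls to time would be `long_to_matrix_format(csv_path)` and `get_idf_rolling_sum(matrix)`.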
These two functions use polars, numpy, numba and pandas.
It would be nice if you turned this into a Python package, maybe adding a couple of features; there are not many Python packages for IDF curve construction.
Have a nice day,
Marcelo.