rainintensity's People

Contributors

pcgotan

rainintensity's Issues

Hi, don't use pandas for looping, try this.

Hi, I found your repo very helpful. I just have some minor suggestions.

First, the rainfall data you start with is in an unconventional layout, so I propose a simple pandas-based function that takes (date, value) records and converts them from long format to matrix format.

This assumes that the rainfall data has hourly or finer granularity.

import pandas as pd
import polars as pl


def long_to_matrix_format(csv_path):
    """
    Convert a long format CSV with 'date' and 'value' columns
    to a matrix format with 'Year', 'Day', and hourly columns.

    Parameters:
    - csv_path: str, path to the long format CSV

    Returns:
    - C-contiguous float64 NumPy array with columns
      Year (1-based index), Day (day of year), Hour 1 ... Hour 24
    """

    # Read the CSV with polars, then hand it over to pandas
    df = pl.read_csv(csv_path, try_parse_dates=True).to_pandas()

    # Convert the 'date' column to a pandas datetime format
    df["date"] = pd.to_datetime(df["date"])

    # Set the date column as the index and aggregate to hourly totals
    df.set_index("date", inplace=True)
    df = df.resample("60min").sum()
    df["date"] = df.index.to_series()

    # Extract year, day of year, and hour
    df["Year"] = df["date"].dt.year
    df["Day"] = df["date"].dt.dayofyear
    df["Hour"] = (
        df["date"].dt.hour + 1
    )  # +1 to start hours from 1 to 24 instead of 0 to 23

    # Pivot the DataFrame to the desired matrix format
    matrix_df = df.pivot_table(
        index=["Year", "Day"], columns="Hour", values="value", aggfunc="sum"
    ).reset_index()

    # Rearrange columns to the desired order
    matrix_df = matrix_df[["Year", "Day"] + list(range(1, 25))]
    matrix_df.columns = ["Year", "Day"] + [f"Hour {i}" for i in range(1, 25)]

    # Fill NaN values with 0
    matrix_df.fillna(0, inplace=True)

    # Re-index years so the first year in the data becomes 1
    matrix_df["Year"] = matrix_df["Year"] - matrix_df["Year"].min() + 1

    # .copy() yields a C-contiguous float64 array, which the numba
    # signature float64[:,::1] used below requires.
    return matrix_df.to_numpy().astype(float).copy()
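
To make the target layout concrete, here is a minimal stdlib-only sketch of the same long-to-matrix idea on a handful of records. The helper name long_to_matrix_rows and the sample records are made up for illustration; unlike the pandas version, this sketch does not insert all-zero rows for days that have no readings.

```python
from datetime import datetime


def long_to_matrix_rows(records):
    """Hypothetical helper: group (timestamp, value) pairs into rows of
    [year_index, day_of_year, hour_1, ..., hour_24]."""
    totals = {}  # (year, day_of_year) -> list of 24 hourly sums
    for stamp, value in records:
        key = (stamp.year, stamp.timetuple().tm_yday)
        hours = totals.setdefault(key, [0.0] * 24)
        hours[stamp.hour] += value  # sub-hourly readings add up per hour
    first_year = min(year for year, _ in totals)
    return [
        [float(year - first_year + 1), float(day)] + hours
        for (year, day), hours in sorted(totals.items())
    ]


# Illustrative records, not taken from the repo's data
records = [
    (datetime(1981, 1, 1, 1, 0), 2.5),
    (datetime(1981, 1, 1, 1, 30), 1.0),
    (datetime(1981, 1, 1, 2, 0), 2.5),
    (datetime(1982, 1, 2, 0, 0), 4.0),
]
rows = long_to_matrix_rows(records)
print(rows[0][:5])
```

Each row has 26 entries: the 1-based year index, the day of the year, and 24 hourly totals, which is the shape the rolling-sum step expects.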

#-----------------------------------------

Second, the for loop you use to build the data1 dataframe is very slow, because it calls pandas vectorized functions once per (i, j) pair. I have a minor suggestion for this: write it in pure Python and then compile it with numba.

import numba
import numpy as np


# For ahead-of-time compilation, the original snippet also stacked
# @cc.export('nb_get_idf_rolling_sum', 'float64[:,::1](float64[:,::1])')
# below njit; that needs a numba.pycc CC instance named cc, not shown here.
@numba.njit('float64[:,::1](float64[:,::1])')
def get_idf_rolling_sum(df_array):
    # Number of years; assumes the Year column holds indices 1..rows
    rows = np.unique(df_array[:, 0]).size
    data = np.zeros((rows, 24))

    for i in range(rows):
        # Filter by year and drop the 'Year' and 'Day' columns;
        # after the transpose, rows are hours and columns are days
        df1_array = df_array[df_array[:, 0] == i + 1][:, 2:].T

        # Loop through 0 to 23 to calculate the max rolling sum for each window size
        for j in range(24):
            max_rolling_sum = -np.inf

            # Loop through each column
            for col in range(df1_array.shape[1]):
                # Loop through each possible window in the current column
                for row in range(df1_array.shape[0] - j):
                    rolling_sum = np.sum(df1_array[row : row + j + 1, col])

                    # Update max rolling sum if this sum is greater
                    if rolling_sum > max_rolling_sum:
                        max_rolling_sum = rolling_sum

            # Store the max rolling sum in the data array
            data[i, j] = max_rolling_sum

    return data
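
As a small worked example of what the inner loops compute for a single day, here is a pure-Python sketch. The function name max_rolling_sums and the sample values are hypothetical. It also swaps in a sliding-window update (add the entering value, drop the leaving one) instead of re-summing every window; that is a possible refinement, not what the numba code above does, and on real float data its results can differ from a direct sum in the last digits.

```python
def max_rolling_sums(hourly, max_window=24):
    """For each window length 1..max_window, return the largest sum of
    that many consecutive values in `hourly` (one day of hourly totals)."""
    result = []
    for width in range(1, max_window + 1):
        if width > len(hourly):
            result.append(float("-inf"))  # no complete window of this length
            continue
        window_sum = sum(hourly[:width])
        best = window_sum
        for end in range(width, len(hourly)):
            # Slide one step: add the entering value, drop the leaving one
            window_sum += hourly[end] - hourly[end - width]
            best = max(best, window_sum)
        result.append(best)
    return result


# Illustrative values only (six "hours" instead of 24 to keep it short)
day = [0.0, 2.0, 5.0, 1.0, 0.0, 3.0]
sums = max_rolling_sums(day, max_window=3)
print(sums)  # [5.0, 7.0, 8.0]
```

To get one row of the yearly result, take the element-wise maximum of these lists over all days of that year.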

The numba function replaces this part:

for i in range(99):
    df1 = (((df.where(df['Year']==i+1)).dropna()).drop(['Year','Day'],axis=1)).T
    for j in range(24):
        data[i][j] = max((df1.rolling(j+1).sum()).max())

#-------------------
This is a timing test on a CSV file with 10 million rows in the long format, meaning:

date,value
1981-01-01 01:00:00, 2.5
1981-01-01 02:00:00, 2.5
1981-01-01 03:00:00, 2.5
1981-01-01 04:00:00, 2.5
...
(and so on, up to the 10 millionth row)

#convert long format to matrix format
t0 = 0.5562264919281006

#process rolling sum
t1 = 0.024176836013793945
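
The post does not show how t0 and t1 were measured. One common way to take such timings is sketched below; the helper name timed is hypothetical, and the built-in sum stands in for long_to_matrix_format or get_idf_rolling_sum so the snippet runs on its own.

```python
import time


def timed(func, *args):
    """Hypothetical helper: run func(*args) and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Stand-in workload; swap in the real functions to reproduce the test
total, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.6f} s")
```

Because the numba function is declared with an explicit signature, it is compiled when it is defined, so a timing taken this way should not include compilation time.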

These two functions use polars, numpy, numba and pandas.

It would be nice if you turned this into a Python package, maybe adding a couple of features; there are not many Python packages for IDF curve construction.

Have a nice day,
Marcelo.
