chasmani / piecewise-regression

piecewise-regression (aka segmented regression) in python. For fitting straight line models to data with one or more breakpoints where the gradient changes.

License: MIT License

Python 96.75% TeX 3.25%

segmented-regression piecewise-regression regression data-analysis statistics model-fitting linear-regression python3

piecewise-regression's Issues

Define an easy method to get breakpoint coordinates and segment parameters as tuples

Hi, I like this package and want to share some ideas for developing it:

# after that I do the fit
pw_fit = piecewise_regression.Fit(xx, yy, n_breakpoints=3)

Why is there no easy method to get a list of tuples representing the breakpoint coordinates, or a list of tuples of segment parameters?

Yes, I know I can extract this myself from the current summary, but wouldn't a dedicated method be more helpful?
Something like:

breakpoints = pw_fit.breakpoints_list()
# the return would be like breakpoints = [ (x1,y1),  (x2,y2), (x3,y3) ]

# and for segments parameters:
line1, line2, line3 = pw_fit.lines_params()
# the return would be like   [ (slope1, intercept1), (slope2, intercept2), (slope3, intercept3) ]

and then maybe I can perform tuple unpacking:

slope1, intercept1 = line1
slope2, intercept2 = line2
slope3, intercept3 = line3

thank you for your consideration.
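Until such methods exist, both lists can be derived from the fitted parameters. The following is a minimal, unofficial sketch: `breakpoints_and_lines` is a hypothetical helper, not part of the package. It assumes the dictionary under `pw_fit.get_results()["estimates"]` has keys `const`, `alpha1`, `betaK` and `breakpointK`, each holding an `estimate` entry, and that the fitted model is y = const + alpha1*x + sum over K of betaK*max(x - breakpointK, 0).

```python
def breakpoints_and_lines(estimates):
    """Return ([(x, y), ...], [(slope, intercept), ...]) from an estimates dict.

    Hypothetical helper, not part of piecewise_regression. Assumes keys
    "const", "alpha1", "beta1".."betaN" and "breakpoint1".."breakpointN",
    each mapping to a dict with an "estimate" entry.
    """
    n = sum(1 for key in estimates if key.startswith("breakpoint"))
    # Pair each breakpoint with its own beta, then order the pairs by x position.
    pairs = sorted(
        (estimates[f"breakpoint{k}"]["estimate"], estimates[f"beta{k}"]["estimate"])
        for k in range(1, n + 1)
    )
    slope = estimates["alpha1"]["estimate"]
    intercept = estimates["const"]["estimate"]
    lines = [(slope, intercept)]
    points = []
    for bp, beta in pairs:
        # y value where the current segment meets the breakpoint.
        points.append((bp, slope * bp + intercept))
        # Past the breakpoint the gradient changes by beta; the intercept
        # shifts by -beta * bp so the fitted line stays continuous.
        slope, intercept = slope + beta, intercept - beta * bp
        lines.append((slope, intercept))
    return points, lines


# Illustrative values only; with a real fit you would pass
# pw_fit.get_results()["estimates"] (assumed structure, see above).
example_estimates = {
    "const": {"estimate": 0.0},
    "alpha1": {"estimate": 1.0},
    "beta1": {"estimate": -2.0},
    "beta2": {"estimate": 3.0},
    "breakpoint1": {"estimate": 5.0},
    "breakpoint2": {"estimate": 10.0},
}
points, lines = breakpoints_and_lines(example_estimates)
line1, line2, line3 = lines
slope1, intercept1 = line1
```

Sorting by breakpoint position before accumulating the betas keeps the segments in left-to-right order even if the fit reports the breakpoints in a different order.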

Alpha calculations incorrect when breakpoints are out of order

Hello,
Firstly, thanks for a fairly simple-to-use and useful library.
I have been trying to use this library for generating linear functions to use for calibration with a microcontroller and noticed that the estimated alpha values in the summary and get_results are incorrect when the breakpoints are out of order. I don't completely understand the maths and might be missing something, however here is the test I made:

Test code

import matplotlib.pyplot as plt
import numpy as np
import piecewise_regression

x = list(range(20))
y = list(range(0, 5, 1)) + list(range(5, 0, -1)) + list(range(0, 10, 2)) + list(range(10 , 0, -2))
pw_fit = piecewise_regression.Fit(x, y, n_breakpoints=3)

pw_fit.plot_fit(color="red")
pw_fit.plot_data()
pw_fit.summary()
plt.xlabel("x")
plt.ylabel("y")
plt.show()

Output

                    Breakpoint Regression Results                     
====================================================================================================
No. Observations                       20
No. Model Parameters                    8
Degrees of Freedom                     12
Res. Sum of Squares           6.29704e-29
Total Sum of Squares               143.75
R Squared                        1.000000
Adjusted R Squared               1.000000
Converged:                           True
====================================================================================================
====================================================================================================
                    Estimate      Std Err            t        P>|t|       [0.025       0.975]
----------------------------------------------------------------------------------------------------
const            -2.7639e-15     1.67e-15      -1.6567         0.123   -6.399e-15   8.7116e-16
alpha1                   1.0     5.51e-16   1.8147e+15     5.28e-178          1.0          1.0
beta1                   -2.0     9.14e-16  -2.1887e+15             -         -2.0         -2.0
beta2                   -4.0     1.26e-15  -3.1681e+15             -         -4.0         -4.0
beta3                    3.0     1.26e-15   2.3761e+15             -          3.0          3.0
breakpoint1              5.0     1.47e-15            -             -          5.0          5.0
breakpoint2             15.0     8.35e-16            -             -         15.0         15.0
breakpoint3             10.0     1.11e-15            -             -         10.0         10.0
----------------------------------------------------------------------------------------------------
These alphas(gradients of segments) are estimatedfrom betas(change in gradient)
----------------------------------------------------------------------------------------------------
alpha2                  -1.0     7.29e-16  -1.3718e+15     1.52e-176         -1.0         -1.0
alpha3                  -5.0     1.46e-15  -3.4295e+15     2.54e-181         -5.0         -5.0
alpha4                  -2.0     7.29e-16  -2.7436e+15      3.7e-180         -2.0         -2.0
====================================================================================================
Davies test for existence of at least 1 breakpoint: p=1.24811e-06 (e.g. p<0.05 means reject null hypothesis of no breakpoints  at 5% significance)

Note that the breakpoints are not in ascending order and the alphas don't make much sense (to me). I would expect alpha1 = 1, and the rest as some arrangement of -1, 2 and -2 (EDIT: I had previously specified an order of those numbers that would have been wrong). Occasionally when running it the breakpoints are in ascending order and the alphas match this.

My current workaround is to sort the betas and breakpoints by the breakpoints and then calculate the alphas similarly to how they are calculated when plotting the fit data:

piecewise-regression/piecewise_regression/main.py

Lines 876 to 881 in 2641812

# Build the fit plot segment by segment. Betas are defined as
# difference in gradient from previous section
yy_plot = intercept_hat + alpha_hat * xx_plot
for bp_count in range(len(breakpoints)):
    yy_plot += beta_hats[bp_count] * \
        np.maximum(xx_plot - breakpoints[bp_count], 0)
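The workaround described above can be sketched in a few lines. This is an illustration rather than the library's code: `ordered_alphas` is a hypothetical helper, and the numbers are the rounded estimates from the summary in this issue. It also shows that accumulating the betas in the reported (unsorted) order yields exactly the alphas printed in the summary.

```python
from itertools import accumulate


def ordered_alphas(alpha1, betas, breakpoints):
    """Hypothetical helper: sort (breakpoint, beta) pairs, then accumulate gradients."""
    pairs = sorted(zip(breakpoints, betas))
    sorted_breakpoints = [bp for bp, _ in pairs]
    # Each alpha is the previous gradient plus the beta of the breakpoint just crossed.
    alphas = list(accumulate((beta for _, beta in pairs), initial=alpha1))
    return sorted_breakpoints, alphas


# Estimates from the summary above (breakpoints reported as 5, 15, 10).
betas = [-2.0, -4.0, 3.0]
breakpoints = [5.0, 15.0, 10.0]

# Accumulating in the reported order gives 1, -1, -5, -2, as in the summary.
unsorted_alphas = list(accumulate(betas, initial=1.0))

# Sorting by breakpoint first gives the expected gradients 1, -1, 2, -2.
sorted_breakpoints, alphas = ordered_alphas(1.0, betas, breakpoints)
```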

Discontinuous Linear Segments?

Hey there!
First of all, amazing tool, thanks so much for taking the time to publish it out to the public. Great work!

I was thinking of possibly attempting to expand the repo to also handle discontinuous linear segments: maybe a breakpoint could, in addition to catching a change in the gradient, also catch a change in the constant / intercept.
What are your thoughts? Could this technique easily be adjusted to handle this use-case?

Thanks!
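One way to picture the proposed extension is to add a step term next to each hinge term, so that the line can jump as well as bend at a breakpoint. The sketch below only evaluates such a model. It is not part of the package, `jumps` is a hypothetical parameter, and it says nothing about how the breakpoints or jumps would be estimated.

```python
def discontinuous_piecewise(x, const, alpha1, breakpoints, betas, jumps):
    """Evaluate a piecewise-linear model with an optional jump at each breakpoint.

    Hypothetical model for illustration:
    y = const + alpha1*x + sum_k [beta_k*(x - bp_k) + jump_k] for every bp_k < x.
    """
    y = const + alpha1 * x
    for bp, beta, jump in zip(breakpoints, betas, jumps):
        if x > bp:
            y += beta * (x - bp)  # change in gradient (as in the current model)
            y += jump             # change in level (the proposed addition)
    return y


# Gradient 1 up to x = 5, then gradient 2 with an upward jump of 3.
before = discontinuous_piecewise(4.0, 0.0, 1.0, [5.0], [1.0], [3.0])
after = discontinuous_piecewise(6.0, 0.0, 1.0, [5.0], [1.0], [3.0])
```

Setting every entry of `jumps` to zero recovers the continuous model that the package currently fits.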

Abbreviations

Hello,

I'd like to express my gratitude for your efforts in creating this valuable tool.

I have a few suggestions that I believe could enhance the user experience and improve the documentation:

Improved Variable Naming in get_results():
The get_results() function returns a wealth of useful information. However, the use of abbreviations such as se, t_stat, p_t, bic, or rss might be challenging for users from diverse fields. See: this Software Engineering Stack Exchange post.
Example Usage for get_results() and get_params() in Documentation:
While get_results() and get_params() are excellent methods, the documentation lacks examples illustrating their usage. Adding clear examples, possibly with simple mathematical models like the ones below, would significantly benefit users:

$$ y = const + α1⋅x \quad (x ≤ breakpoint1) $$

$$ y = const + α1⋅breakpoint1 + α2⋅(x − breakpoint1) \quad (x > breakpoint1) $$
Introduce a Function for Returning a List of Fits:
It would be valuable to introduce a new function that returns a list of fits, such as [fit_1, breakpoint1, fit_2, breakpoint2]. Each fit (e.g., fit_1) could be a callable function that accepts xs and returns corresponding ys based on the fit.
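The third suggestion could look roughly like the sketch below. It is only an illustration of the proposed return shape: `segment_functions` and `make_line` are hypothetical names, and the parameters are assumed to follow the model y = const + alpha1*x + sum of beta_k*max(x - breakpoint_k, 0).

```python
def make_line(slope, intercept):
    """Return a callable that maps a list of xs to ys on one straight segment."""
    def line(xs):
        return [slope * x + intercept for x in xs]
    return line


def segment_functions(const, alpha1, breakpoints, betas):
    """Hypothetical helper returning [fit_1, breakpoint1, fit_2, breakpoint2, ...]."""
    slope, intercept = alpha1, const
    result = []
    # Walk the breakpoints from left to right, emitting a segment before each one.
    for bp, beta in sorted(zip(breakpoints, betas)):
        result.append(make_line(slope, intercept))
        result.append(bp)
        # Gradient changes by beta; intercept shifts to keep the fit continuous.
        slope, intercept = slope + beta, intercept - beta * bp
    result.append(make_line(slope, intercept))
    return result


fit_1, breakpoint1, fit_2 = segment_functions(0.0, 1.0, [5.0], [-2.0])
ys_left = fit_1([0.0, 5.0])
ys_right = fit_2([5.0, 7.0])
```

Each callable describes a full straight line, so it is up to the caller to evaluate it only on its own side of the neighbouring breakpoints.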

Thank you for considering these suggestions. I believe these enhancements will contribute to the overall usability and understanding of your tool.

I have already forked the repository and fixed the variable naming in get_results(). I can open a pull request, but I wanted to share my thoughts here first.

Databricks - takes a long time to fit a model

Hello,

I'm working on segmented regression and I'm trying to use the piecewise-regression package for my time series. But it takes an hour to fit the model on Databricks, even on dummy data, whereas it takes seconds in any other environment.

kwargs not passed to the axvspan plot for the confidence intervals customization

piecewise-regression/piecewise_regression/main.py

Line 914 in e3c5739

plt.axvspan(bp_ci[0], bp_ci[1], alpha=0.1)

Easy function to get full line equations of fitted segments

Would be more convenient

NaN for Davies Test

Hi,

I am getting a NaN after using the Davies Test, while the breakpoint looks fine. What is the best way to resolve this? Can I share my data and code snippets? Thank you.

Including uncertainties on measurements

Hi:)
Thank you for this very helpful package and the explanatory accompanying paper.
I am currently using it to fit data where I have uncertainties on both x- and y-values, so I wanted to ask if it would be possible to include them in the fitting process to make it more accurate?
Best regards,
Jana

Using ModelSelection results to choose fit with best num of breakpoints

ModelSelection is flagged as experimental. Will we be able to do this soon? In the meantime, do you recommend bootstrapping, using Fit for different numbers of breakpoints, and selecting the model with the highest R²? ModelSelection already seems to be doing precisely this (bootstrapping Fit and comparing across models internally), so why do you still suggest running it multiple times for robustness? And why is BIC the metric chosen in ModelSelection? (There is little justification in the documentation.)
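Some context on the BIC question: R² can only stay the same or improve as breakpoints are added, so picking the highest R² tends to favour the largest model, whereas BIC adds a penalty per parameter. The sketch below uses a common Gaussian-error form of BIC; whether ModelSelection uses exactly this expression is an assumption, and the RSS values are made up for illustration. The parameter count of 2 + 2 * n_breakpoints matches the "No. Model Parameters" line in the summary shown earlier on this page (8 for 3 breakpoints).

```python
import math


def bic(rss, n_obs, n_params):
    """BIC for least-squares fits with Gaussian errors (common textbook form)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def n_params_for(n_breakpoints):
    # const + alpha1, plus one beta and one breakpoint position per breakpoint.
    return 2 + 2 * n_breakpoints


n_obs = 100
# Hypothetical residual sums of squares: the second breakpoint barely helps.
bic_one = bic(rss=50.0, n_obs=n_obs, n_params=n_params_for(1))
bic_two = bic(rss=49.0, n_obs=n_obs, n_params=n_params_for(2))

# Lower BIC is better: the small drop in RSS does not pay for two extra parameters.
preferred = 1 if bic_one < bic_two else 2
```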

Allow for multivariate regression

Thank you for your library!
For my particular use case, however, I need:

to allow for multivariate regression (i.e. multiple input variables)
to specify, per input variable, whether and how many breakpoints need to be found.

Thanks in advance!

No function to get the y predictions

Hello,
It's good to have such a useful package for segmented regression, but I don't see any function to get the predictions directly. Could you add a function that returns the predictions? The code is already present in the plot_fit function inside the Fit class.
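In the meantime, predictions can be computed from the fitted parameters with the same formula that the plot_fit excerpt quoted earlier on this page uses. The function below is a standalone sketch, not the package's own method: `predict` is a hypothetical name, and the parameter values are the rounded estimates from the "Alpha calculations" issue above.

```python
def predict(xs, const, alpha1, breakpoints, betas):
    """Evaluate y = const + alpha1*x + sum_k beta_k * max(x - breakpoint_k, 0)."""
    ys = []
    for x in xs:
        y = const + alpha1 * x
        for bp, beta in zip(breakpoints, betas):
            # Each beta only contributes to the right of its own breakpoint.
            y += beta * max(x - bp, 0)
        ys.append(y)
    return ys


xs = list(range(20))
# Breakpoints and betas in the order the summary reported them; the sum does
# not depend on that order as long as each beta stays paired with its breakpoint.
ys = predict(xs, 0.0, 1.0, [5.0, 15.0, 10.0], [-2.0, -4.0, 3.0])
```

With these parameters the predictions reproduce the test data from that issue exactly, which is consistent with its reported R² of 1.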

Making this code base scikit-learn-compatible

Hello @chasmani
I came across this repo while working on some time series problems. I was looking for a piecewise linear time series forecaster, but there seems to be nothing like that which can be used directly. I'm using sktime, and it has a nice wrapper for all regressors that are sklearn-compatible. If that were the case here, one could use your code for time series forecasting in the following way.

from sktime.forecasting.trend import TrendForecaster
from sktime.datasets import load_airline
import piecewise_regression

y = load_airline()
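# Desired usage (hypothetical): assumes piecewise_regression exposed an sklearn-compatible regressor
forecaster = TrendForecaster(regressor=piecewise_regression())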
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1,2,3])

I'm not sure how much effort it would be but would that be something interesting? It would make the code directly usable.

Question about calculation of VV in breakpoint_fit in main.py

Dear @chasmani,

Thank you for your work on piecewise-regression and thank you for referring to Muggeo's paper; it has been very helpful for me in understanding segmented regression.

I was wondering whether you could explain why VV is calculated like this in line 104 of breakpoint_fit in main.py?
VV = [np.heaviside(self.xx - bp, 1) for bp in self.current_breakpoints]

In Muggeo's paper, V seems to be the negative of the indicator function (end of page 3060) because it is the derivative. Why is it not necessary to take the negative when calculating VV?
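One possible reading, offered as a sketch rather than a confirmed answer from the maintainer: the sign of V is only a convention, provided the breakpoint update uses the matching sign. Writing $\tilde\psi$ for the current breakpoint estimate and $U = (x - \tilde\psi)_+$, the first-order expansion works out as follows. Whether main.py applies the update with the opposite sign is an assumption here and would need checking against the source.

```latex
% First-order expansion of the hinge term around the current estimate \tilde\psi
\beta (x - \psi)_+ \approx \beta (x - \tilde\psi)_+ - \beta (\psi - \tilde\psi)\, I(x > \tilde\psi)

% Muggeo's convention: V = -I(x > \tilde\psi), \quad \gamma = \beta (\psi - \tilde\psi)
\beta (x - \psi)_+ \approx \beta U + \gamma V,
\qquad \psi_{\text{new}} = \tilde\psi + \hat\gamma / \hat\beta

% Opposite convention: V' = +I(x > \tilde\psi), \quad \gamma' = -\beta (\psi - \tilde\psi)
\beta (x - \psi)_+ \approx \beta U + \gamma' V',
\qquad \psi_{\text{new}} = \tilde\psi - \hat\gamma' / \hat\beta
```

Both conventions describe the same linear approximation; flipping the sign of V simply flips the sign of the estimated coefficient, so the updated breakpoint is identical as long as the update step subtracts instead of adds.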

I hope you can help me to further understand. Thank you,

Kind regards,

Rianne Schouten

Please add min_breakpoints parameter for ModelSelection

IMHO, users should, in most cases, have a rough idea of the minimum number of breakpoints in their data. Allowing users to specify min_breakpoints in ModelSelection would save the time spent fitting the models with 1 up to (min_breakpoints - 1) breakpoints.

Thanks

sklearn-compatibility of function name and instance

Thank you for providing a useful library.

I think that high compatibility with scikit-learn is one way to improve the usability of machine learning models.

In particular, I think the following two design choices have low compatibility.

Fit is designed as a class, not a function.
X and y are required when creating an instance of "class Muggeo".

It's just my opinion.

break point confidence intervals plotting

In piecewise_regression/main.py, line 893 accepts keyword arguments for matplotlib.pyplot.axvspan, but they are not passed on to the call at line 908.

piecewise-regression/piecewise_regression/main.py

Line 893 in 1c0f74c

def plot_breakpoint_confidence_intervals(self, **kwargs):

piecewise-regression/piecewise_regression/main.py

Line 908 in 1c0f74c

plt.axvspan(bp_ci[0], bp_ci[1], alpha=0.1)

Suggestion

plt.axvspan(bp_ci[0], bp_ci[1], alpha=0.1, **kwargs)
