stelmo / linearsegmentation.jl
Linear segmented regression
License: MIT License
Currently the score for the quality of the fit is the RMSE.
It would be great to have an option to use the R² score as well.
Being normalized, it should make tuning the hyperparameter much easier.
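To illustrate why a normalized score is easier to tune, here is a minimal sketch comparing the two measures. The function names `rmse` and `r2` are made up for this example and are not part of the package's API.

```julia
using Statistics: mean

# Root-mean-square error: its magnitude depends on the scale of y.
rmse(y, yhat) = sqrt(mean(abs2, y .- yhat))

# Coefficient of determination: 1 - SS_res / SS_tot.
function r2(y, yhat)
    ss_res = sum(abs2, y .- yhat)        # residual sum of squares
    ss_tot = sum(abs2, y .- mean(y))     # total sum of squares around the mean
    return 1 - ss_res / ss_tot
end

y    = [1.0, 2.0, 3.0, 4.0]
yhat = [1.1, 1.9, 3.2, 3.8]

# Rescaling the data by 100 rescales the RMSE by 100 ...
rmse_small, rmse_large = rmse(y, yhat), rmse(100 .* y, 100 .* yhat)
# ... but leaves R² unchanged.
r2_small, r2_large = r2(y, yhat), r2(100 .* y, 100 .* yhat)
```

One caveat: `ss_tot` is zero on a perfectly constant segment, so an R²-based threshold would probably need special handling for flat data.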
Currently segments are tuples of indices and fitted linear models from GLM.jl. This adds a rather large dependency, which is not strictly speaking necessary. Consider removing it, or making it a weak dependency that is only loaded if the user manually loads GLM. This would also justify the 1.9+ compat requirement of this package...
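As a rough illustration of what a GLM-free fit could look like, a straight line can be fitted with a plain least-squares solve. This is only a sketch: `fit_line` is a hypothetical helper, not how the package currently stores or fits its models.

```julia
using LinearAlgebra  # standard library; provides the least-squares backslash solve

# Ordinary least-squares line fit without GLM.jl.
# Returns the intercept and slope of y ≈ intercept + slope * x.
function fit_line(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    A = hcat(ones(length(x)), x)   # design matrix with columns [1 x]
    intercept, slope = A \ y       # least-squares solution for a tall matrix
    return intercept, slope
end

x = collect(0.0:4.0)
y = 2.0 .+ 0.5 .* x                # exact line, so the fit should recover it
b0, b1 = fit_line(x, y)
```

A segment could then carry just the indices and the two coefficients, with the GLM model built only when the user loads GLM.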
Looking at the generated indices of the segments, I can see that neighbouring segments share indices (the last index of a segment is the first index of the following one).
I'd assume the segments should be exclusive.
If not, could such an option be added?
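Until such an option exists, the shared boundary can be removed after the fact. The sketch below assumes each segment's indices are available as a vector of integers; `make_exclusive` is a hypothetical helper, not a function of the package.

```julia
# Hypothetical helper: drop the first index of every segment that starts
# where the previous segment ended, so that segments no longer overlap.
function make_exclusive(idxs::Vector{<:AbstractVector{Int}})
    out = [collect(v) for v in idxs]   # copy, so the input is left untouched
    for k in 2:length(out)
        if !isempty(out[k]) && !isempty(idxs[k - 1]) &&
           first(out[k]) == last(idxs[k - 1])
            popfirst!(out[k])
        end
    end
    return out
end

shared    = [collect(1:4), collect(4:7), collect(7:10)]
exclusive = make_exclusive(shared)
```

Here the boundary index is kept by the earlier segment; assigning it to the later one instead would be an equally valid convention.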
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
Run the following script:
# External
using LinearSegmentation;
using StableRNGs;
using UnicodePlots;
## Constants & Configuration
oRng = StableRNG(123);
## Functions
function Conv1D( vA :: Vector{T}, vB :: Vector{T}; convMode :: String = "full" ) :: Vector{T} where {T <: Real}
    # Direct 1D convolution; `convMode` selects which part of the full result is returned
    lenA = length(vA);
    lenB = length(vB);
    if (convMode == "full")
        startIdx = 1;
        endIdx = lenA + lenB - 1;
    elseif (convMode == "same")
        startIdx = 1 + floor(Int, lenB / 2);
        endIdx = startIdx + lenA - 1;
    elseif (convMode == "valid")
        startIdx = lenB;
        endIdx = lenA;
    else
        error("Unknown convMode: $(convMode)");
    end
    vO = zeros(T, lenA + lenB - 1);
    for idxB in 1:lenB
        @simd for idxA in 1:lenA
            @inbounds vO[idxA + idxB - 1] += vA[idxA] * vB[idxB];
        end
    end
    return vO[startIdx:endIdx];
end
## Parameters
# Data
numSamples = 500;
ampFiltSize = 20;
phaseFiltSize = 50;
vSeg = [1, 151, 301, 401, 501];
# Model
minSegLen = 30.0;
maxRmse = 0.12;
## Load / Generate Data
vAmp = rand(oRng, numSamples);
vAmp = 0.2 * Conv1D(vAmp, ones(ampFiltSize) / ampFiltSize; convMode = "same");
vPhase = 0.2 * rand(oRng, numSamples);
vPhase = Conv1D(vPhase, ones(phaseFiltSize) / phaseFiltSize; convMode = "same");
vPhase = cumsum(vPhase);
vX = LinRange(0, numSamples - 1, numSamples);
vC = vAmp .* cos.(2 * pi * vPhase);
vL = zeros(numSamples);
vL[vSeg[1]:(vSeg[2] - 1)] .= 0;
vL[vSeg[2]:(vSeg[3] - 1)] .= 1;
vL[vSeg[3]:(vSeg[4] - 1)] .= collect(LinRange(0.5, 1.0, 100));
vL[vSeg[4]:(vSeg[5] - 1)] .= collect(LinRange(1.0, 0.4, 100));
vY = vC .+ vL;
## Display Data
ii = 1;
vIdx = vSeg[ii]:(vSeg[ii + 1] - 1)
hP = scatterplot(vX[vIdx], vY[vIdx], width = 90, height = 8, xlim = (vX[1], vX[end]), ylim = (minimum(vY), maximum(vY)));
for ii in 2:(length(vSeg) - 1)
    local vIdx = vSeg[ii]:(vSeg[ii + 1] - 1)
    scatterplot!(hP, vX[vIdx], vY[vIdx]);
end
title!(hP, "Input Data");
xlabel!(hP, "Index");
ylabel!(hP, "Value");
display(hP);
## Analysis
segs = shortest_path(vX, vY; min_segment_length = minSegLen, fit_function = :rmse, fit_threshold = maxRmse);
# Remove the 1st item which is shared
for ii in 2:length(segs)
    deleteat!(segs[ii][1], 1);
end
## Display Results
hP = scatterplot(segs[1][1], vY[segs[1][1]], width = 90, height = 8, xlim = (vX[1], vX[end]), ylim = (minimum(vY), maximum(vY)));
for ii in 2:length(segs)
    scatterplot!(hP, segs[ii][1], vY[segs[ii][1]]);
end
title!(hP, "Linear Segmentation");
xlabel!(hP, "Index");
ylabel!(hP, "Value");
display(hP);
This is the data:
Basically, it is a harmonic signal riding on constant (DC) and linear functions.
It should be an easy case.
The output is:
Look at the bottom left of the first linear (rising) segment: do you see the magenta points?
The segments are:
Segment 1 is 1:150
Segment 2 is 151:301
Segment 3 is 302:394
Segment 4 is 395:500
Some overlap between segments 3 and 4 makes some sense, but between 2 and 3?
By no RMSE measure should that be an improvement.
I think it has to do with the shortest-path formulation versus the original idea of interval partitioning.
Yet I'm not sure.
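To make the mismatch concrete, the reported ranges can be compared with the ground truth encoded in `vSeg` from the script. This is only a small check on the numbers quoted above; the variable names other than `vSeg` are made up for the example.

```julia
vSeg     = [1, 151, 301, 401, 501]                  # ground-truth segment starts (from the script)
reported = [1:150, 151:301, 302:394, 395:500]       # segments listed above

true_ends     = vSeg[2:end] .- 1                    # last index of each true segment
reported_ends = last.(reported)                     # last index of each reported segment
offsets       = reported_ends .- true_ends          # positive: segment ends too late
```

The second segment ends one sample late and the third ends six samples early, while the first and last boundaries match.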
I created a solver for the segmentation problem based on dynamic programming.
It is my own formulation of the problem as defined in RcppDynProg.
I would be happy to contribute it.
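For readers unfamiliar with the approach, the sketch below shows the textbook penalized segmented least-squares dynamic program. It is not the formulation proposed in this issue and not RcppDynProg's algorithm; the function names, the `penalty` and `min_len` parameters, and the cost (sum of squared residuals plus a fixed penalty per segment) are all assumptions chosen for illustration.

```julia
# Prefix sums of (1, x, y, x², xy, y²); P[k + 1] covers points 1..k.
function prefix_sums(x, y)
    P = [ntuple(_ -> 0.0, 6)]
    for k in eachindex(x, y)
        push!(P, P[end] .+ (1.0, x[k], y[k], x[k]^2, x[k] * y[k], y[k]^2))
    end
    return P
end

# Sum of squared residuals of the least-squares line through points i..j,
# evaluated in O(1) from the prefix sums.
function segment_sse(P, i, j)
    n, sx, sy, sxx, sxy, syy = P[j + 1] .- P[i]
    cxx = sxx - sx^2 / n
    cxy = sxy - sx * sy / n
    cyy = syy - sy^2 / n
    sse = cxx > 0 ? cyy - cxy^2 / cxx : cyy
    return max(sse, 0.0)               # guard against round-off below zero
end

# Minimize the total of (segment SSE + penalty) over all partitions of 1..n
# into contiguous, non-overlapping segments of at least `min_len` points.
function segment_dp(x, y; penalty = 1.0, min_len = 2)
    n = length(x)
    @assert n == length(y) && n >= min_len
    P = prefix_sums(x, y)
    best = fill(Inf, n + 1)            # best[j + 1]: minimal cost for points 1..j
    best[1] = 0.0
    prev = zeros(Int, n)               # prev[j]: start of the last segment ending at j
    for j in 1:n, i in 1:(j - min_len + 1)
        cost = best[i] + segment_sse(P, i, j) + penalty
        if cost < best[j + 1]
            best[j + 1] = cost
            prev[j] = i
        end
    end
    segs = UnitRange{Int}[]            # backtrack from the last point
    j = n
    while j > 0
        i = prev[j]
        pushfirst!(segs, i:j)
        j = i - 1
    end
    return segs
end

x = collect(1.0:20.0)
y = [k <= 10 ? 0.0 : 5.0 for k in 1:20]   # noise-free step at index 11
segs = segment_dp(x, y; penalty = 1.0)
```

This variant runs in O(n²) time and returns exclusive segments by construction, because every partition considered assigns each index to exactly one segment.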