stelmo / linearsegmentation.jl

Linear segmented regression

License: MIT License

Julia 100.00%

linearsegmentation.jl's Issues

Add Option for Scoring based on R2

Currently the score for the quality of the fit is the RMSE.
It would be great to have an option to use the R2 score as well.

Because R2 is normalized, it should make tuning the threshold hyperparameter much easier.
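
To illustrate the point about normalization, here is a minimal sketch in plain Julia that compares the two scores for a candidate line. The helper names `rmse` and `r2` are hypothetical and are not part of LinearSegmentation.jl; the data is made up for the example.

```julia
using Statistics: mean

# Hypothetical helpers, not part of LinearSegmentation.jl.

# Root-mean-square error: carries the units and scale of y.
rmse(y, yhat) = sqrt(mean((y .- yhat) .^ 2))

# Coefficient of determination: 1 is a perfect fit, 0 is no better than the mean of y.
r2(y, yhat) = 1 - sum((y .- yhat) .^ 2) / sum((y .- mean(y)) .^ 2)

x = collect(0.0:9.0)
y = 2.0 .+ 0.5 .* x .+ 0.1 .* (-1.0) .^ (0:9)   # a line plus a small alternating deviation
yhat = 2.0 .+ 0.5 .* x                          # candidate line to be scored

# Rescaling the data rescales the RMSE but leaves R2 unchanged.
println("RMSE: ", rmse(y, yhat), "  vs  ", rmse(10 .* y, 10 .* yhat))
println("R2:   ", r2(y, yhat), "  vs  ", r2(10 .* y, 10 .* yhat))
```

One caveat worth keeping in mind: on a nearly flat segment the total sum of squares approaches zero, so R2 can be close to 0 (or undefined) even when the fit is good.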

Remove GLM.jl dependency

Currently segments are tuples of indices and fitted linear models from GLM.jl. This adds a rather large dependency, which is not strictly speaking necessary. Consider removing it, or making it a weak dependency that is only loaded if the user manually loads GLM. This would also justify the 1.9+ compat requirement of this package...
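
As a rough sketch of what a GLM-free segment fit could look like, the closed-form solution for simple linear regression needs only the standard library. The names `LineFit`, `fit_line`, and `predict_line` are hypothetical and do not reflect the package's actual types.

```julia
using Statistics: mean

# Hypothetical lightweight stand-in for a fitted GLM.jl model.
struct LineFit
    intercept::Float64
    slope::Float64
end

# Closed-form least-squares fit of y ≈ intercept + slope * x.
function fit_line(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    mx, my = mean(x), mean(y)
    sxx = sum((x .- mx) .^ 2)
    sxy = sum((x .- mx) .* (y .- my))
    slope = sxy / sxx
    return LineFit(my - slope * mx, slope)
end

# Evaluate the fitted line at the given x values.
predict_line(f::LineFit, x) = f.intercept .+ f.slope .* x

x = collect(1.0:5.0)
y = 3.0 .+ 2.0 .* x
f = fit_line(x, y)
println(f)
```

A segment could then be a tuple of indices and a `LineFit`, with GLM models offered only through an optional extension.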

Segments Share Sample Indices

Looking at the generated segment indices, I can see that adjacent segments share an index: the last index of one segment is the first index of the following segment.

I'd assume the segments should be exclusive.
If not, could such an option be added?
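
Until such an option exists, the shared index can be dropped after the fact. The following is a minimal sketch that works on plain index ranges; `make_exclusive` is a hypothetical helper and the ranges are illustrative values, not output from the package.

```julia
# Hypothetical helper: given index ranges where each segment starts on the
# last index of the previous one, return ranges that do not overlap.
function make_exclusive(ranges::Vector{UnitRange{Int}})
    out = copy(ranges)
    for i in 2:length(out)
        # Only shift the start when it really coincides with the previous end.
        if first(out[i]) == last(out[i - 1])
            out[i] = (first(out[i]) + 1):last(out[i])
        end
    end
    return out
end

shared = [1:10, 10:25, 25:40]       # illustrative segments with shared boundaries
exclusive = make_exclusive(shared)
println(exclusive)
```

This sketch assigns each shared sample to the earlier segment; assigning it to the later one would be an equally valid convention.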

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Failing Case with Simple Data

Run the following script:

# External
using LinearSegmentation;
using StableRNGs;
using UnicodePlots;

## Constants & Configuration

oRng = StableRNG(123);

## Functions

function Conv1D( vA :: Vector{T}, vB :: Vector{T}; convMode :: String = "full" ) :: Vector{T} where {T <: Real}

    lenA = length(vA);
    lenB = length(vB);

    if (convMode == "full")
        startIdx    = 1;
        endIdx      = lenA + lenB - 1;
    elseif (convMode == "same")
        startIdx    = 1 + floor(Int, lenB / 2);
        endIdx      = startIdx + lenA - 1;
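    elseif (convMode == "valid")
        startIdx    = lenB;
        endIdx      = lenA;
    else
        # Fail early instead of hitting an undefined startIdx below
        throw(ArgumentError("Unknown convMode: $convMode"));
    end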

    vO = zeros(T, lenA + lenB - 1);

    for idxB in 1:lenB
        @simd for idxA in 1:lenA
            @inbounds vO[idxA + idxB - 1] += vA[idxA] * vB[idxB];
        end
    end

    return vO[startIdx:endIdx];
end

## Parameters
# Data
numSamples    = 500;
ampFiltSize   = 20;
phaseFiltSize = 50;

vSeg = [1, 151, 301, 401, 501];

# Model
minSegLen = 30.0;
maxRmse   = 0.12;

## Load / Generate Data

vAmp    = rand(oRng, numSamples);
vAmp    = 0.2 * Conv1D(vAmp, ones(ampFiltSize) / ampFiltSize; convMode = "same");
vPhase  = 0.2 * rand(oRng, numSamples);
vPhase  = Conv1D(vPhase, ones(phaseFiltSize) / phaseFiltSize; convMode = "same");
vPhase  = cumsum(vPhase);

vX = LinRange(0, numSamples - 1, numSamples);

vC = vAmp .* cos.(2 * pi * vPhase);
vL = zeros(numSamples);
vL[vSeg[1]:(vSeg[2] - 1)] .= 0;
vL[vSeg[2]:(vSeg[3] - 1)] .= 1;
vL[vSeg[3]:(vSeg[4] - 1)] .= collect(LinRange(0.5, 1.0, 100));
vL[vSeg[4]:(vSeg[5] - 1)] .= collect(LinRange(1.0, 0.4, 100));

vY = vC .+ vL;

## Display Data

ii = 1;
vIdx = vSeg[ii]:(vSeg[ii + 1] - 1)

hP = scatterplot(vX[vIdx], vY[vIdx], width = 90, height = 8, xlim = (vX[1], vX[end]), ylim = (minimum(vY), maximum(vY)));

for ii in 2:(length(vSeg) - 1)
    local vIdx = vSeg[ii]:(vSeg[ii + 1] - 1)
    scatterplot!(hP, vX[vIdx], vY[vIdx]);
end

title!(hP, "Input Data");
xlabel!(hP, "Index");
ylabel!(hP, "Value");
display(hP);

## Analysis
segs = shortest_path(vX, vY; min_segment_length = minSegLen, fit_function = :rmse, fit_threshold = maxRmse);

# Remove the 1st item which is shared
for ii in 2:length(segs)
    deleteat!(segs[ii][1], 1);
end

## Display Results
hP = scatterplot(segs[1][1], vY[segs[1][1]], width = 90, height = 8, xlim = (vX[1], vX[end]), ylim = (minimum(vY), maximum(vY)));

for ii in 2:length(segs)
    scatterplot!(hP, segs[ii][1], vY[segs[ii][1]]);
end

title!(hP, "Linear Segmentation");
xlabel!(hP, "Index");
ylabel!(hP, "Value");
display(hP);

This is the data:

Basically a harmonic signal riding on top of constant (DC) and linear segments.
It should be an easy case.

The output is:

Look at the bottom left of the first linear (rising) segment: do you see the magenta points?
The segments are:

Segment 1 is 1:150
Segment 2 is 151:301
Segment 3 is 302:394
Segment 4 is 395:500

Some overlap between segments 3 and 4 makes some sense, but between 2 and 3?
Under no RMSE measure should that be an improvement.

I think it has to do with the shortest path vs. the original idea of interval partition.
Yet I'm not sure.
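
One possible contributing factor, offered only as an assumption about how a threshold on RMSE behaves and not as a statement about the package's internals: a single stray sample barely moves the RMSE of a long segment, so a segment that absorbs one sample from its neighbor can still pass the `maxRmse = 0.12` threshold. The sketch below uses idealized, noise-free residuals and does not refit the line.

```julia
using Statistics: mean

# RMSE of a vector of residuals.
rmse(r) = sqrt(mean(r .^ 2))

# Idealized residuals of 150 samples lying exactly on the flat level 1.0 ...
clean = zeros(150)

# ... plus one sample from the start of the rising ramp (value 0.5),
# measured against the same flat line for simplicity (no refit).
with_stray = [clean; -0.5]

max_rmse = 0.12   # threshold taken from the script above

println("RMSE without stray sample: ", rmse(clean))
println("RMSE with stray sample:    ", rmse(with_stray))
println("Still under threshold:     ", rmse(with_stray) < max_rmse)
```

If feasibility is judged only against the threshold, both boundary placements are acceptable, which would be consistent with the observed result.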

Adding Dynamic Programming Based Approach

I created a solver for the segmentation problem based on dynamic programming.

It is my own formulation of the problem as defined in RcppDynProg.

I will be happy to contribute it.
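
For readers unfamiliar with the idea, here is a generic sketch of dynamic-programming segmentation. It is not the formulation mentioned in this issue and not RcppDynProg's API; the function `dp_segment`, the per-segment `penalty`, and `min_len` are assumptions made for illustration. It minimizes the total squared error of per-segment line fits plus a fixed penalty for each segment.

```julia
# Hypothetical sketch of penalized dynamic-programming segmentation.
function dp_segment(x::AbstractVector{<:Real}, y::AbstractVector{<:Real};
                    penalty::Real = 1.0, min_len::Int = 2)
    n = length(x)
    n == length(y) || throw(ArgumentError("x and y must have the same length"))
    n >= min_len || throw(ArgumentError("need at least min_len samples"))

    # Prefix sums so the error of any segment can be evaluated in O(1).
    cx  = [0.0; cumsum(x)]
    cy  = [0.0; cumsum(y)]
    cxx = [0.0; cumsum(x .^ 2)]
    cxy = [0.0; cumsum(x .* y)]
    cyy = [0.0; cumsum(y .^ 2)]

    # Sum of squared residuals of the least-squares line on samples i:j.
    function sse(i, j)
        m   = j - i + 1
        sx  = cx[j + 1] - cx[i]
        sy  = cy[j + 1] - cy[i]
        sxx = cxx[j + 1] - cxx[i] - sx^2 / m
        sxy = cxy[j + 1] - cxy[i] - sx * sy / m
        syy = cyy[j + 1] - cyy[i] - sy^2 / m
        return sxx > 0 ? max(syy - sxy^2 / sxx, 0.0) : max(syy, 0.0)
    end

    best = fill(Inf, n + 1)   # best[j + 1]: minimal cost of segmenting samples 1:j
    prev = zeros(Int, n + 1)  # prev[j + 1]: start index of the last segment in that optimum
    best[1] = 0.0
    for j in min_len:n
        for i in 1:(j - min_len + 1)
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j + 1]
                best[j + 1] = cost
                prev[j + 1] = i
            end
        end
    end

    # Walk back through the stored start indices to recover the segments.
    segs = UnitRange{Int}[]
    j = n
    while j > 0
        i = prev[j + 1]
        pushfirst!(segs, i:j)
        j = i - 1
    end
    return segs
end

# A noise-free step: 20 samples at 0 followed by 20 samples at 1.
x = collect(1.0:40.0)
y = [fill(0.0, 20); fill(1.0, 20)]
segs = dp_segment(x, y; penalty = 0.1, min_len = 5)
println(segs)
```

The segments returned this way are exclusive by construction, and the double loop costs O(n^2) evaluations, which is manageable for a few thousand samples.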
