robustmodels.jl's People

Contributors

getzze, svilupp

Forkers

svilupp

robustmodels.jl's Issues

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Hung Process

I have an application in which I fit many (thousands of) MM-estimators. In one case I hit the following data, for which rlm runs forever: there is no error message, but it never stops running. The behaviour is specific to certain inputs, but I cannot work out what about these inputs causes the problem, or how to handle it so that my program can continue to run. Here is the example:

X = [0.0; 0.0; 593.3040161132812; 680.9676513671875; 533.0647583007812; 742.5764770507812; 835.7925415039062; 1277.613525390625; 465.0248718261719; 977.941162109375; 453.80657958984375; 524.1534423828125; 400.8550109863281; 1025.7659912109375; 3729.7734375; 7977.93408203125; 33058.66796875; 58342.8359375; 96970.9765625; 125303.8515625; 105264.4453125; 68260.9375; 40450.44921875; 27465.583984375; 12540.0400390625; 10328.353515625;;]

y = [170845.453125, 373183.40625, 489773.0625, 640513.0, 896556.25, 894648.0625, 1.0691845e6, 1.056674e6, 1.2729035e6, 1.0171798125e6, 937198.375, 593592.5625, 694190.0625, 0.0, 19976.91796875, 0.0, 0.0, 32533.732421875, 0.0, 42338.94140625, 47968.13671875, 54009.5546875, 40316.9609375, 40895.29296875, 0.0, 33167.5078125]

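using RobustModels, Statistics  # provides rlm and mean

# Keep only the observations where both X and y are non-zero
I = (X[:, 1] .!= 0.0) .& (y .!= 0.0)

# Scale both sides by mean(y), then fit an MM-estimator with Tukey loss
rlm(X[I, :] ./ mean(y), y[I] ./ mean(y), MMEstimator{TukeyLoss}(); initial_scale=:mad)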

I can easily fit a simple linear model to the data:

julia> X[I,1]\y[I]
0.6911979570792081

plot(X[I,:], y[I], seriestype = :scatter)

The mask "I" restricts the fit to the cases where both X and y are non-zero. I am wondering why the process hangs, and what I could do to prevent this or at least skip the fit if it runs for too long.

A couple observations:

  1. This issue is specific to certain inputs. The error is reproducible on my machine with these inputs.
  2. If I use an M-estimator "MEstimator{TukeyLoss}()" or a TauEstimator{TukeyLoss}(), then it works. But if I use an SEstimator{TukeyLoss}() it also fails.
  3. This is a case where the data will fit the model terribly. That is OK but it may have something to do with why the process runs forever.

Also, great package. It's easy to use and is helping my own project along.
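
On the "skip it if it runs for too long" part, one generic option is a watchdog around the call. The following is only a sketch using Base Julia; try_with_timeout is a hypothetical helper, not something RobustModels provides, and it does not address why the fit fails to terminate.

```julia
# Hypothetical helper (not part of RobustModels): run `f` on another task and
# stop waiting after `timeout` seconds. Returns `nothing` on timeout.
# Caveats: the abandoned task is NOT killed and keeps occupying its thread,
# and the watchdog can only fire if Julia was started with more than one
# thread (e.g. `julia -t 2`) or if `f` yields to the scheduler.
function try_with_timeout(f, timeout::Real)
    task = Threads.@spawn f()
    status = timedwait(() -> istaskdone(task), timeout)
    return status === :ok ? fetch(task) : nothing
end

# Stand-ins for a fit that finishes and a fit that takes too long
fast = try_with_timeout(() -> sum(1:10), 5.0)
slow = try_with_timeout(() -> (sleep(2.0); :done), 0.5)
```

A fit could then be wrapped as try_with_timeout(() -> rlm(...), 60) and skipped whenever the result is nothing. Because the stuck task keeps running in the background, this only suits cases where a few leaked tasks are acceptable.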

StackOverflowError if inputs allow Missing type

First of all, thank you for the great package!

Together with @cbhower, I think we have found a bug in the handling of inputs whose element type allows missing (no actual missing values were present).

Expected behaviour: If I provide a matrix with a Missing eltype, I should get a MethodError saying that it's not supported.

Actual behaviour: The user gets a StackOverflowError, which is hard to debug. See below.

How to reproduce: See the MWE below. We can reproduce it on Julia 1.8.5 both ARM-based and x86-based.

mwe.jl

using RobustModels
using Random
using Missings

N = 10
# if you remove allowmissing, it will work
X = randn(N, 4) |> allowmissing
y = ones(N)

quantreg(X, y; quantile=0.5)

Returns:

LoadError: StackOverflowError:
Stacktrace:
[1] fit(::Type{QuantileRegression}, X::Matrix{Union{Missing, Float64}}, y::Vector{Float64}; kwargs::Base.Pairs{Symbol, Float64, Tuple{Symbol}, NamedTuple{(:quantile,), Tuple{Float64}}}) (repeats 37022 times)
@ RobustModels ~/.julia/packages/RobustModels/FVHvF/src/quantileregression.jl:140
[2] quantreg(::Matrix{Union{Missing, Float64}}, ::Vector{Float64}; kwargs::Base.Pairs{Symbol, Float64, Tuple{Symbol}, NamedTuple{(:quantile,), Tuple{Float64}}})
@ RobustModels ~/.julia/packages/RobustModels/FVHvF/src/quantileregression.jl:76

For comparison, with GLM.jl:

using GLM
using Random
using Missings

N = 10
X = randn(N, 4) |> allowmissing
y = ones(N)

lm(X, y)

Returns:

ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Matrix{Union{Missing, Float64}}, ::Vector{Float64}, ::Nothing)
Closest candidates are:
fit(::Type{LinearModel}, ::AbstractMatrix{<:Real}, ::AbstractVector{<:Real}, ::Union{Nothing, Bool}; wts, dropcollinear) at ~/.julia/packages/GLM/4A2DM/src/lm.jl:134

Versioninfo():

RobustModels v0.4.5 (fresh install)

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.5.0)
CPU: 8 × Apple M1 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 6 on 6 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
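
As a possible workaround until such inputs are rejected with a clearer error, the element type can be narrowed before calling quantreg. This is a minimal sketch using only Base Julia; strip_missing_eltype is a hypothetical helper name, not part of RobustModels.

```julia
# Hypothetical helper (not part of RobustModels): convert an array whose
# eltype allows `missing` into a plain Float64 array, failing loudly if an
# actual missing value is present.
function strip_missing_eltype(A::AbstractArray)
    any(ismissing, A) && throw(ArgumentError("input contains missing values"))
    return Array{Float64}(A)
end

# Same eltype as `randn(10, 4) |> allowmissing` in the MWE above
X = Matrix{Union{Missing, Float64}}(randn(10, 4))
Xf = strip_missing_eltype(X)

eltype(Xf)  # Float64
```

Passing Xf instead of X should then be equivalent to removing allowmissing in the MWE, which is reported above to work.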

appropriate R^2 for fitted models?

Is there a function (I've searched the API and can't find it but maybe have missed it) to compute variance explained? I'm using robust models from this package to compute p-values for correlations in the case of a single independent variable:

y = b0 + b1 * x

Ideally I'd also want to compute the correlation coefficient, which for the model above in the OLS case is just sign(b1) * sqrt(R^2). In the robust case, however, I can't simply predict the responses and compute R^2 as usual, because of the potential for negative values.

I see in the API that one possibility is (I think)

R^2 = 1 - StatsBase.deviance(model) / StatsBase.nulldeviance(model)

but I'm wondering if there's potentially the same issue here?
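
For reference, the OLS identity mentioned above can be checked numerically. This is a minimal sketch on made-up data using only the Statistics standard library; it does not call RobustModels, and whether the deviance-based ratio behaves the same way for robust fits is exactly the open question.

```julia
using Statistics

# Made-up illustrative data (not from any real analysis)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

# OLS fit of y = b0 + b1 * x
b1 = cov(x, y) / var(x)
b0 = mean(y) - b1 * mean(x)

# R^2 = 1 - RSS / TSS, the OLS analogue of 1 - deviance / nulldeviance
rss = sum(abs2, y .- (b0 .+ b1 .* x))
tss = sum(abs2, y .- mean(y))
r2 = 1 - rss / tss

# For OLS with an intercept this recovers the Pearson correlation
r = sign(b1) * sqrt(r2)
matches = isapprox(r, cor(x, y); atol=1e-10)
```

Here RSS can never exceed TSS, so r2 stays in [0, 1]. With a robust loss that guarantee may not carry over, which is where the negative values could come from.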
