getzze / RobustModels.jl
A Julia package for robust regressions using M-estimators and quantile regressions
License: MIT License
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
I have an application where I am fitting many (thousands of) MM-estimators. In one case I come across the following data and rlm runs forever: there is no error message, but it never stops running. This behaviour is specific to certain inputs, but I cannot figure out what about these inputs causes the problem, or how to handle it so that my program can continue to run. Here is an example:
X = [0.0; 0.0; 593.3040161132812; 680.9676513671875; 533.0647583007812; 742.5764770507812; 835.7925415039062; 1277.613525390625; 465.0248718261719; 977.941162109375; 453.80657958984375; 524.1534423828125; 400.8550109863281; 1025.7659912109375; 3729.7734375; 7977.93408203125; 33058.66796875; 58342.8359375; 96970.9765625; 125303.8515625; 105264.4453125; 68260.9375; 40450.44921875; 27465.583984375; 12540.0400390625; 10328.353515625;;]
y = [170845.453125, 373183.40625, 489773.0625, 640513.0, 896556.25, 894648.0625, 1.0691845e6, 1.056674e6, 1.2729035e6, 1.0171798125e6, 937198.375, 593592.5625, 694190.0625, 0.0, 19976.91796875, 0.0, 0.0, 32533.732421875, 0.0, 42338.94140625, 47968.13671875, 54009.5546875, 40316.9609375, 40895.29296875, 0.0, 33167.5078125]
I = (X[:,1].!=0.0) .& (y.!=0.0)
rlm(X[I,:]./mean(y),y[I]./mean(y), MMEstimator{TukeyLoss}(), initial_scale=:mad)
I can fit a simple linear model to the data easily
julia> X[I,1]\y[I]
0.6911979570792081
plot(X[I,:], y[I], seriestype = :scatter)
The "I" mask is so that we only consider cases where X and y are non-zero. I am wondering why the process hangs, and what I could do to prevent this, or at least skip the fit if it runs for too long.
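Since iterative M-estimation can fail to converge on pathological inputs, one pragmatic guard is a time limit. Here is a minimal sketch (the helper name is hypothetical, not part of RobustModels) that runs the fit on a separate task and gives up after a deadline:

```julia
# Sketch of a timeout guard for long-running fits. Julia tasks cannot be
# killed, so a fit that truly never returns keeps running in the background;
# this only lets the caller move on. Start Julia with more than one thread
# (e.g. JULIA_NUM_THREADS=2), otherwise a compute-bound fit never yields
# to the polling loop below.
function fit_with_timeout(f, limit::Real)
    result = Channel{Any}(1)
    Threads.@spawn put!(result, f())   # run the fit on another task
    deadline = time() + limit
    while !isready(result) && time() < deadline
        sleep(0.05)                    # poll until done or out of time
    end
    return isready(result) ? take!(result) : nothing
end
```

Usage would look like `fit_with_timeout(() -> rlm(Xs, ys, MMEstimator{TukeyLoss}(); initial_scale=:mad), 30.0)`, skipping the data set when `nothing` comes back.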
A couple of observations:
Also, great package. Its ease of use is helping my own project along.
First of all, thank you for the great package!
I think we have found a bug with @cbhower in handling inputs that allow missing (no actual missing values were present).
Expected behaviour: if I provide a matrix with a Missing eltype, I should get a MethodError saying it's not supported.
Actual behaviour: the user gets a StackOverflowError, which is hard to debug. See below.
How to reproduce: See the MWE below. We can reproduce it on Julia 1.8.5 both ARM-based and x86-based.
mwe.jl
using RobustModels
using Random
using Missings
N = 10
# if you remove allowmissing, it will work
X = randn(N, 4) |> allowmissing
y = ones(N)
quantreg(X, y; quantile=0.5)
Returns:
LoadError: StackOverflowError:
Stacktrace:
[1] fit(::Type{QuantileRegression}, X::Matrix{Union{Missing, Float64}}, y::Vector{Float64}; kwargs::Base.Pairs{Symbol, Float64, Tuple{Symbol}, NamedTuple{(:quantile,), Tuple{Float64}}}) (repeats 37022 times)
@ RobustModels ~/.julia/packages/RobustModels/FVHvF/src/quantileregression.jl:140
[2] quantreg(::Matrix{Union{Missing, Float64}}, ::Vector{Float64}; kwargs::Base.Pairs{Symbol, Float64, Tuple{Symbol}, NamedTuple{(:quantile,), Tuple{Float64}}})
@ RobustModels ~/.julia/packages/RobustModels/FVHvF/src/quantileregression.jl:76
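Until the dispatch is fixed, a workaround sketch (assuming, as in the MWE, that no actual missing values are present) is to strip Missing from the eltype before fitting, so the call dispatches to the method for real-valued matrices:

```julia
# Convert away the Missing eltype before fitting. `convert` (Base) errors
# if any entry is actually missing; `disallowmissing` from Missings.jl
# does the same with a clearer name.
X = Matrix{Union{Missing,Float64}}(randn(10, 4))  # eltype Union{Missing, Float64}
Xclean = convert(Matrix{Float64}, X)              # concrete Matrix{Float64}
# quantreg(Xclean, y; quantile=0.5)               # no longer recurses
```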
For comparison, with GLM.jl:
using GLM
using Random
using Missings
N = 10
X = randn(N, 4) |> allowmissing
y = ones(N)
lm(X, y)
Returns:
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Matrix{Union{Missing, Float64}}, ::Vector{Float64}, ::Nothing)
Closest candidates are:
fit(::Type{LinearModel}, ::AbstractMatrix{<:Real}, ::AbstractVector{<:Real}, ::Union{Nothing, Bool}; wts, dropcollinear) at ~/.julia/packages/GLM/4A2DM/src/lm.jl:134
Versioninfo():
RobustModels v0.4.5 (fresh install)
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.5.0)
CPU: 8 × Apple M1 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 6 on 6 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
Is there a function (I've searched the API and can't find it but maybe have missed it) to compute variance explained? I'm using robust models from this package to compute p-values for correlations in the case of a single independent variable:
y = b0 + b1 * x
Ideally I'd also want to compute the correlation coefficient, which for the model above in the OLS case is just sign(b1) * sqrt(R^2), but here I can't simply predict the responses and compute R^2 as usual because of the potential for negative values.
I see in the API one possibility is (i think)
R^2 = 1 - StatsBase.deviance(model) / StatsBase.nulldeviance(model)
but I'm wondering if there's potentially the same issue here?
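The deviance-ratio formula can be wrapped up as a small helper; this is a sketch of a deviance-based pseudo-R² (not the OLS R², and only an analogue for robust fits), with the square root clamped at zero so a worse-than-null model doesn't raise a DomainError:

```julia
# Pseudo-R² sketch: one minus the ratio of model deviance to null deviance.
# For a fitted model `m` you would call it as
#   pseudo_r2(StatsBase.deviance(m), StatsBase.nulldeviance(m))
# It can be negative if the model fits worse than the intercept-only model.
pseudo_r2(dev, nulldev) = 1 - dev / nulldev

# Signed correlation analogue for a single-predictor model y = b0 + b1*x:
# take the sign from b1 and clamp the pseudo-R² at zero before the sqrt.
signed_r(b1, r2) = sign(b1) * sqrt(max(r2, 0.0))
```

Whether this pseudo-R² behaves well for M-estimators is exactly the open question above, so treat it as a heuristic rather than a drop-in replacement for the OLS R².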
@JuliaRegistrator register branch=main