sylvaticus / BetaML.jl
Beta Machine Learning Toolkit
License: MIT License
If only one class is seen in the training data, the model fits okay, but prediction fails. I wonder if this is something that could be supported. Encountered this issue when doing cv for a very small binary classification problem (crabs).
using MLJ
Model = @load KernelPerceptronClassifier
model = Model()
X = (x=rand(10), );
y = coerce(collect("aaaaaaaaaab"), Multiclass)[1:10];
julia> unique(y)
1-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> levels(y)
2-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
# works fine:
mach = machine(model, X, y) |> fit!;
# problem:
julia> predict_mode(mach, X)
ERROR: BoundsError: attempt to access 0-element Vector{Matrix{Float64}} at index [1]
Stacktrace:
[1] getindex
@ ./array.jl:861 [inlined]
[2] predict(x::Matrix{Float64}, xtrain::Vector{Matrix{Float64}}, ytrain::Vector{Vector{Int64}}, α::Vector{Vector{Int64}}, classes::Vector{Char}; K::typeof(BetaML.Utils.radialKernel))
@ BetaML.Perceptron ~/.julia/packages/BetaML/AeLyL/src/Perceptron/Perceptron.jl:622
[3] predict(model::BetaML.Perceptron.KernelPerceptronClassifier, fitresult::Tuple{NamedTuple{(:x, :y, :α, :classes, :K), Tuple{Vector{Matrix{Float64}}, Vector{Vector{Int64}}, Vector{Vector{Int64}}, Vector{Char}, typeof(BetaML.Utils.radialKernel)}}, Vector{Char}}, Xnew::NamedTuple{(:x,), Tuple{Vector{Float64}}})
@ BetaML.Perceptron ~/.julia/packages/BetaML/AeLyL/src/Perceptron/Perceptron_MLJ.jl:137
[4] predict_mode(m::BetaML.Perceptron.KernelPerceptronClassifier, fitresult::Tuple{NamedTuple{(:x, :y, :α, :classes, :K), Tuple{Vector{Matrix{Float64}}, Vector{Vector{Int64}}, Vector{Vector{Int64}}, Vector{Char}, typeof(BetaML.Utils.radialKernel)}}, Vector{Char}}, Xnew::NamedTuple{(:x,), Tuple{Vector{Float64}}})
@ MLJBase ~/MLJ/MLJBase/src/interface/model_api.jl:11
[5] predict_mode(mach::Machine{BetaML.Perceptron.KernelPerceptronClassifier, true}, Xraw::NamedTuple{(:x,), Tuple{Vector{Float64}}})
@ MLJBase ~/MLJ/MLJBase/src/operations.jl:85
[6] top-level scope
@ REPL[39]:1
julia> oneHotEncoder([-1,1,1])
ERROR: BoundsError: attempt to access 1-element Vector{Int64} at index [-1]
Stacktrace:
[1] setindex!
@ ./array.jl:903 [inlined]
[2] oneHotEncoderRow(x::Int64; d::Int64, factors::UnitRange{Int64}, count::Bool)
@ BetaML.Utils ~/.julia/packages/BetaML/cpTAz/src/Utils/Processing.jl:64
[3] oneHotEncoder(Y::Vector{Int64}; d::Int64, factors::UnitRange{Int64}, count::Bool)
@ BetaML.Utils ~/.julia/packages/BetaML/cpTAz/src/Utils/Processing.jl:127
[4] oneHotEncoder(Y::Vector{Int64})
@ BetaML.Utils ~/.julia/packages/BetaML/cpTAz/src/Utils/Processing.jl:121
[5] top-level scope
@ REPL[5]:1
julia> oneHotEncoder([-1,1,1],factors=[-1,1])
ERROR: BoundsError: attempt to access 1-element Vector{Int64} at index [-1]
Stacktrace:
[1] setindex!
@ ./array.jl:903 [inlined]
[2] oneHotEncoderRow(x::Int64; d::Int64, factors::UnitRange{Int64}, count::Bool)
@ BetaML.Utils ~/.julia/packages/BetaML/cpTAz/src/Utils/Processing.jl:64
[3] oneHotEncoder(Y::Vector{Int64}; d::Int64, factors::Vector{Int64}, count::Bool)
@ BetaML.Utils ~/.julia/packages/BetaML/cpTAz/src/Utils/Processing.jl:127
[4] top-level scope
@ REPL[6]:1
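Both traces above end in `setindex!` being called with the label value itself (`-1`) as an index. A minimal sketch of an encoder that first maps each label to a column position is given below. The name `onehot` and its `factors` keyword are hypothetical and are not BetaML's `oneHotEncoder`; the sketch only illustrates the lookup step.

```julia
# Hypothetical helper (not BetaML's implementation): one-hot encode arbitrary
# labels by looking up each label's column instead of indexing with the label.
function onehot(y::AbstractVector; factors = sort(unique(y)))
    col = Dict(f => j for (j, f) in enumerate(factors))   # label => column index
    out = zeros(Int, length(y), length(factors))
    for (i, label) in enumerate(y)
        out[i, col[label]] = 1    # a KeyError here means `label` is not in `factors`
    end
    return out
end

encoded = onehot([-1, 1, 1])
encoded_explicit = onehot([-1, 1, 1]; factors = [-1, 1])
```

With this approach negative or non-integer labels are handled the same way as positive integers, because the label is never used directly as an array index.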
Since you were renaming the package only two days ago, should NN also be used (more appropriate as an acronym, like RBF)?
I'm not sure it's advisable to have both (a new module in addition to the old one). Whether you change it or not, could something like:
module Nn
__init__() = error("do use: using NN")
end
work, or vice versa?
I was surprised, when re-running some ecosystem-wide integration tests, to get this message when training MultitargetNeuralNetworkRegressor and NeuralNetworkRegressor using the MLJ interface:
Wrong verbosity level. Verbosity must be either 0, 10, 20, 30 or 40
I was probably using verbosity=-1 to suppress warnings.
I understand MLJ spec is mostly silent on this, but in practice the rule has been : "With the exception of warnings, training should be silent if verbosity == 0. Lower values should suppress warnings" and I would add "any integer should be allowed".
Perhaps in the MLJ interface for the BetaML models one could map
<= 0 -> 0
1 -> 10
2 -> 20
3 -> 30
>= 4 -> 40
or similar ??
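The mapping proposed above could be sketched as a small helper. This is only an illustration: `betaml_verbosity` is a hypothetical name, not part of BetaML or MLJ, and the sketch assumes that every value above 3 maps to 40.

```julia
# Hypothetical helper: translate an MLJ-style integer verbosity into one of the
# levels named in the error message (0, 10, 20, 30 or 40).
function betaml_verbosity(v::Integer)
    v <= 0 && return 0     # silent, including negative "suppress warnings" values
    v == 1 && return 10
    v == 2 && return 20
    v == 3 && return 30
    return 40              # assumption: anything above 3 is treated as full verbosity
end

levels_seen = [betaml_verbosity(v) for v in -1:5]
```

Any integer is accepted, so a caller passing `verbosity = -1` gets silent training rather than an error.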
Implement the comments of @ablaom for AutoEncoderMLJ
The compat upgrade has been merged to master but a new release was never tagged.
@sylvaticus Could we please have a new tagged release? This is causing issues downstream for MLJModels.
julia> using MLJ
julia> Model = @load KernelPerceptronClassifier
[ Info: For silent loading, specify `verbosity=0`.
import BetaML ✔
BetaML.Perceptron.KernelPerceptronClassifier
julia> model = Model()
KernelPerceptronClassifier(
K = BetaML.Utils.radialKernel,
maxEpochs = 100,
initialα = Int64[],
shuffle = false,
rng = Random._GLOBAL_RNG())
julia> X = (x=rand(10), );
julia> y = coerce(collect("abababababcc"), Multiclass)[1:10];
julia> unique(y)
2-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
julia> levels(y)
3-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> mach = machine(model, X, y) |> fit!;
[ Info: Training machine(KernelPerceptronClassifier(K = radialKernel, …), …).
julia> predict_mode(mach, X) |> levels
2-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
That last result indicates a bug, as all levels in the pool of the training vector should be present in the pool of the predictions.
Curiously, in other classifiers I looked at, the levels are indeed being tracked correctly. So perhaps have a look at, e.g., the BetaML DecisionTreeClassifier to see how this can be corrected.
This bug is causing a failure when the model is bagged in an ensemble using EnsembleModel, because some classes are not present in some of the bagged observations, but are present in others.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
The first thing I want to know when looking at this package is: why would I use it? What is the big advantage over vanilla Flux? It would be great to add this right at the beginning of the README.
Running MLJModels.@update to update MLJ's model registry is running into this new error:
ERROR: LoadError: Bad `load_path` trait for BetaML.Imputation.BetaMLGMMImputer: BetaMLGMMImputer not a registered package.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] top-level scope
@ ~/MLJ/MLJModels/src/registry/src/update.jl:122
[3] eval
@ ./boot.jl:373 [inlined]
[4] eval(x::Expr)
@ Base.MainInclude ./client.jl:453
[5] _update(mod::Module, test_env_only::Bool)
@ MLJModels.Registry ~/MLJ/MLJModels/src/registry/src/update.jl:153
[6] var"@update"(__source__::LineNumberNode, __module__::Module)
@ MLJModels.Registry ~/MLJ/MLJModels/src/registry/src/update.jl:24
in expression starting at REPL[4]:1
I wanted to use BetaML.jl in a project; however, when I try doing so I get the following error:
julia> using Foo
[ Info: Precompiling Foo [4817f03b-69bd-4595-9d0a-a711fd8a192f]
ERROR: LoadError: InitError: Evaluation into the closed module `Perceptron` breaks incremental compilation because the side effects will not be permanent. This is likely due to some other module mutating `Perceptron` with `eval` during precompilation - don't do this.
Stacktrace:
[1] eval
@ ./boot.jl:368 [inlined]
[2] eval(x::Expr)
@ BetaML.Perceptron ~/.julia/packages/BetaML/mqBvh/src/Perceptron/Perceptron.jl:19
[3] metadata_pkg(T::Type; name::String, uuid::String, url::String, julia::Bool, license::String, is_wrapper::Bool, package_name::String, package_uuid::String, package_url::String, is_pure_julia::Bool, package_license::String)
@ MLJModelInterface ~/.julia/packages/MLJModelInterface/wwFA9/src/metadata_utils.jl:54
[4] #41
@ ./broadcast.jl:1284 [inlined]
[5] _broadcast_getindex_evalf
@ ./broadcast.jl:670 [inlined]
[6] _broadcast_getindex
@ ./broadcast.jl:643 [inlined]
[7] #29
@ ./broadcast.jl:1075 [inlined]
[8] macro expansion
@ ./ntuple.jl:74 [inlined]
[9] ntuple
@ ./ntuple.jl:69 [inlined]
[10] copy
@ ./broadcast.jl:1075 [inlined]
[11] materialize
@ ./broadcast.jl:860 [inlined]
[12] __init__()
@ BetaML ~/.julia/packages/BetaML/mqBvh/src/BetaML.jl:63
[13] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
@ Base ./loading.jl:831
[14] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
@ Base ./loading.jl:1039
[15] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1315
[16] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[17] macro expansion
@ ./loading.jl:1180 [inlined]
[18] macro expansion
@ ./lock.jl:223 [inlined]
[19] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
[20] include
@ ./Base.jl:419 [inlined]
[21] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
@ Base ./loading.jl:1554
[22] top-level scope
@ stdin:1
during initialization of module BetaML
in expression starting at /data_temp/picaud/Temp/Beta/Foo.jl/src/Foo.jl:1
in expression starting at stdin:1
ERROR: Failed to precompile Foo [4817f03b-69bd-4595-9d0a-a711fd8a192f] to /home/picaud/.julia/compiled/v1.8/Foo/jl_a1tr7Z.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
@ Base ./loading.jl:1707
[3] compilecache
@ ./loading.jl:1651 [inlined]
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1337
[5] _require_prelocked(uuidkey::Base.PkgId)
@ Base ./loading.jl:1200
[6] macro expansion
@ ./loading.jl:1180 [inlined]
[7] macro expansion
@ ./lock.jl:223 [inlined]
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1144
The error is not present when I remove precompilation. The BetaML.jl "patch" is:
# function __init__()
# MMI.metadata_pkg.(MLJ_INTERFACED_MODELS,
# name = "BetaML",
# uuid = "024491cd-cc6b-443e-8034-08ea7eb7db2b", # see your Project.toml
# url = "https://github.com/sylvaticus/BetaML.jl", # URL to your package repo
# julia = true, # is it written entirely in Julia?
# license = "MIT", # your package license
# is_wrapper = false, # does it wrap around some other package?
# )
# end
Steps to reproduce:
Create a local package Foo (in /tmp/, for example)
cd /tmp
(@v1.8) pkg> generate Foo.jl
Generating project Foo:
Foo.jl/Project.toml
Foo.jl/src/Foo.jl
(@v1.8) pkg> activate ./Foo.jl/
Activating project at `/tmp/Foo.jl`
(Foo) pkg> add BetaML
(Foo) pkg> activate
Activating project at `~/.julia/environments/v1.8`
(@v1.8) pkg> dev ./Foo.jl/
Resolving package versions...
Then modify Foo.jl as follows :
module Foo
using BetaML # <---- here
greet() = print("Hello World!")
end # module Foo
Then from Julia type
julia> using Foo
and I (and maybe you) will get the error I mentioned at the beginning.
Thanks!
The GMMClusterer is an unsupervised probabilistic model. However we can't check that programmatically because of JuliaAI/MLJModelInterface.jl#120
Is there any fix to make sure that both KMeans and GMMClusterer return a set of categorical values? Right now predict(Kmeans(), ...) will return a vector of categorical values whereas predict(GMMClusterer(), ...) will return a vector of distributions.
Need fitesult -> fitresult in BetaML.jl/src/Clustering/Clustering_MLJ.jl (line 178 in df11d62).
See also #64
In MLJ, learned parameters are distinct from hyper-parameters. A "model" in MLJ is a container for hyper-parameters and that is all.
For this reason, there should be no reason for MMI.fit to mutate model fields, and the original API forbade this (unfortunately, this rule seems to have disappeared from the docs, JuliaAI/MLJ.jl#755). Only clean! can mutate the fields, and only if they don't make sense. One exception is that fit may mutate an RNG.
So this is currently non-compliant:
using Pkg
Pkg.activate(temp=true)
Pkg.add("MLJBase")
Pkg.add(name="BetaML", rev="master")
using MLJBase
import BetaML
model = BetaML.Clustering.MissingImputator()
mixtures = deepcopy(model.mixtures)
X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38;
missing -2.3; 5.2 -2.4] |> MLJBase.table
mach = machine(model, X) |> fit!
julia> @assert model.mixtures == mixtures
ERROR: AssertionError: model.mixtures == mixtures
Stacktrace:
[1] top-level scope at REPL[40]:1
Maybe MMI.fit can begin by creating a deepcopy of mixtures and p₀, in this and the related models.
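The deepcopy suggestion can be illustrated with a toy example. Everything here is hypothetical (`ToyImputer` and `toy_fit` are stand-ins, not BetaML or MLJModelInterface code); the point is only that in-place updates touch the copies, so the hyper-parameter container is unchanged after fitting.

```julia
# Hypothetical stand-in for a model: a container of hyper-parameters only.
struct ToyImputer
    mixtures::Vector{Vector{Float64}}   # initial mixture parameters
    p₀::Vector{Float64}                 # initial mixture probabilities
end

# Hypothetical stand-in for `MMI.fit`: copy first, then update the copies in place.
function toy_fit(model::ToyImputer, X::AbstractMatrix{<:Real})
    mixtures = deepcopy(model.mixtures)
    p₀ = deepcopy(model.p₀)
    colmeans = vec(sum(X, dims=1)) ./ size(X, 1)
    for m in mixtures
        m .= colmeans                   # pretend "training": mutates only the copy
    end
    p₀ .= 1 / length(p₀)
    return (mixtures = mixtures, p₀ = p₀)   # learned parameters live in the fitresult
end

model = ToyImputer([zeros(2), ones(2)], [0.3, 0.7])
mixtures_before = deepcopy(model.mixtures)
fitresult = toy_fit(model, [1.0 2.0; 3.0 4.0])
```

After the call, `model.mixtures == mixtures_before` still holds, which is the property the assertion in the report above checks.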
Add PAM (or FastPAM) to fit the KMedoidsClusterer
Specifically, I think separating the modules in this into subpackages (i.e. reexported as part of a larger overall BetaML package) would help a lot with discoverability; for instance, the problem I mentioned earlier of people in the stats community having lots of trouble finding the imputation methods here.
From here:
input_scitype = MMI.Table(MMI.Missing, MMI.Known), # also ok: MMI.Table(Union{MMI.Missing, MMI.Known}),
What is written in the comment is correct. What is actually used is not:
julia> X = (; x=[missing, 1, 2])
(x = Union{Missing, Int64}[missing, 1, 2],)
julia> scitype(X) <: Table(Missing, Known)
false
julia> scitype(X) <: Table(Union{Missing, Known})
true
For more on the Table scitype constructor, see here.
All the tree scitypes need changing.
Sorry that I did not pick this up in my review.
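The difference between the two declarations can be mimicked with ordinary Julia types. This is only an analogy, under the assumption that `Table(T1, T2)` means "every column is a vector of `T1` or a vector of `T2`", while `Table(Union{T1, T2})` allows both kinds of element to be mixed within one column. `Real` stands in for `Known` here; no ScientificTypes code is used.

```julia
# A column that mixes missing and non-missing values, as in the example above.
ColumnType = Vector{Union{Missing, Int}}

# Analogue of Table(Missing, Known): each column is all-missing OR all-known.
per_column_union = Union{AbstractVector{<:Missing}, AbstractVector{<:Real}}

# Analogue of Table(Union{Missing, Known}): elements of one column may mix both.
per_element_union = AbstractVector{<:Union{Missing, Real}}

fits_per_column = ColumnType <: per_column_union
fits_per_element = ColumnType <: per_element_union
```

The mixed column matches only the per-element form, which mirrors the `false` and `true` results in the REPL session above.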
The overhead for constructing UnivariateFinite objects one at a time is very high. For this reason a UnivariateFiniteArray implementation of AbstractArray{<:UnivariateFinite} was developed. This includes optimised implementations of broadcasting pdf, and so forth.
I recommend that in the BetaML classifiers one construct probabilistic predictions by applying the UnivariateFinite(...) constructor (which can construct arrays as well as singletons) to the full matrix of probabilities (with all observations in it). You can see examples of this in all the MLJ probabilistic classifier interfaces. I am copying the doc-string for this constructor below:
cc @OkonSamuel
UnivariateFinite(support,
probs;
pool=nothing,
augmented=false,
ordered=false)
Construct a discrete univariate distribution whose finite support is the elements of the vector support, and whose corresponding probabilities are elements of the vector probs. Alternatively, construct an abstract array of UnivariateFinite distributions by choosing probs to be an array of one higher dimension than the array generated.
Unless pool is specified, support should have type AbstractVector{<:CategoricalValue} and all elements are assumed to share the same categorical pool, which may be larger than support.
Important. All levels of the common pool have associated probabilities, not just those in the specified support. However, these probabilities are always zero (see example below).
If probs is a matrix, it should have a column for each class in support (or one less, if augment=true). More generally, probs will be an array whose size is of the form (n1, n2, ..., nk, c), where c = length(support) (or one less, if augment=true) and the constructor then returns an array of size (n1, n2, ..., nk).
using CategoricalArrays
v = categorical([:x, :x, :y, :x, :z])
julia> UnivariateFinite(classes(v), [0.2, 0.3, 0.5])
UnivariateFinite{Multiclass{3}}(x=>0.2, y=>0.3, z=>0.5)
julia> d = UnivariateFinite([v[1], v[end]], [0.1, 0.9])
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)
julia> rand(d, 3)
3-element Array{Any,1}:
CategoricalArrays.CategoricalValue{Symbol,UInt32} :z
CategoricalArrays.CategoricalValue{Symbol,UInt32} :z
CategoricalArrays.CategoricalValue{Symbol,UInt32} :z
julia> levels(d)
3-element Array{Symbol,1}:
:x
:y
:z
julia> pdf(d, :y)
0.0
Alternatively, support may be a list of raw (non-categorical) elements if pool is:
some CategoricalArray, CategoricalValue or CategoricalPool, such that support is a subset of levels(pool)
missing, in which case a new categorical pool is created which has support as its only levels.
In the last case, specify ordered=true if the pool is to be considered ordered.
julia> UnivariateFinite([:x, :z], [0.1, 0.9], pool=missing, ordered=true)
UnivariateFinite{OrderedFactor{2}}(x=>0.1, z=>0.9)
julia> d = UnivariateFinite([:x, :z], [0.1, 0.9], pool=v) # v defined above
UnivariateFinite(x=>0.1, z=>0.9) (Multiclass{3} samples)
julia> pdf(d, :y) # allowed as `:y in levels(v)`
0.0
v = categorical([:x, :x, :y, :x, :z, :w])
probs = rand(100, 3)
probs = probs ./ sum(probs, dims=2)
julia> UnivariateFinite([:x, :y, :z], probs, pool=v)
100-element UnivariateFiniteVector{Multiclass{4},Symbol,UInt32,Float64}:
UnivariateFinite{Multiclass{4}}(x=>0.194, y=>0.3, z=>0.505)
UnivariateFinite{Multiclass{4}}(x=>0.727, y=>0.234, z=>0.0391)
UnivariateFinite{Multiclass{4}}(x=>0.674, y=>0.00535, z=>0.321)
⋮
UnivariateFinite{Multiclass{4}}(x=>0.292, y=>0.339, z=>0.369)
Unless augment=true, sums of elements along the last axis (row-sums in the case of a matrix) must be equal to one, and otherwise such an array is created by inserting appropriate elements ahead of those provided. This means the provided probabilities are associated with the classes c2, c3, ..., cn.
UnivariateFinite(prob_given_class; pool=nothing, ordered=false)
Construct a discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_class, and whose values specify the corresponding probabilities.
The type requirements on the keys of the dictionary are the same as the elements of support given above with this exception: if non-categorical elements (raw labels) are used as keys, then pool=... must be specified and cannot be missing.
If the values (probabilities) are arrays instead of scalars, then an abstract array of UnivariateFinite elements is created, with the same size as the array.
The algorithm listed as GeneralImputer here is more widely known as MICE (multiple imputation by chained equations) in statistics. I'm not sure if the name used here is standard in ML, but the lack of a solid MICE implementation is a common complaint in the Julia statistics ecosystem, so I was very surprised to stumble across this pure-Julia implementation of MICE under a completely different name. Would it make sense to either rename or alias GeneralImputer to make this easier to discover?
import BetaML
using MLJTestInterface
@testset "generic mlj interface test" begin
failures, summary = MLJTestInterface.test(
[BetaML.Bmlj.KMeansClusterer,],
MLJTestInterface.make_regression()[1];
mod=@__MODULE__,
verbosity=0, # bump to debug
throw=true, # set to true to debug (`false` in CI)
)
@test isempty(failures)
end
# generic mlj interface test: Error During Test at REPL[11]:1
# Got exception outside of a @test
# UndefVarError: `fitresults` not defined
# Stacktrace:
# [1] attempt(f::MLJTestInterface.var"#9#10"{BetaML.Bmlj.KMeansClusterer, Tuple{@NamedTuple{Rm::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, LStat::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}}, message::String; throw::Bool)
< parts omitted for clarity >
# caused by: UndefVarError: `fitresults` not defined
# Stacktrace:
# [1] fitted_params(model::BetaML.Bmlj.KMeansClusterer, fitresult::@NamedTuple{classes::Vector{Int64}, centers::Matrix{Float64}, distanceFunction::BetaML.Bmlj.var"#13#15"})
# @ BetaML.Bmlj ~/.julia/packages/BetaML/SPPMQ/src/Bmlj/Clustering_mlj.jl:175
# [2] fitted_params(mach::MLJBase.Machine{BetaML.Bmlj.KMeansClusterer, true})
# @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/machines.jl:820
# [3] (::MLJTestInterface.var"#9#10"{BetaML.Bmlj.KMeansClusterer, Tuple{@NamedTuple{Rm::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, LStat::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}})()
# @ MLJTestInterface ~/.julia/packages/MLJTestInterface/6i2JH/src/attemptors.jl:85
# [4] attempt(f::MLJTestInterface.var"#9#10"{BetaML.Bmlj.KMeansClusterer, Tuple{@NamedTuple{Rm::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, LStat::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}}, message::String; throw::Bool)
# @ MLJTestInterface ~/.julia/packages/MLJTestInterface/6i2JH/src/attemptors.jl:15
# [5] #fitted_machine#8
# @ ~/.julia/packages/MLJTestInterface/6i2JH/src/attemptors.jl:77 [inlined]
# [6] fitted_machine
# @ ~/.julia/packages/MLJTestInterface/6i2JH/src/attemptors.jl:75 [inlined]
# [7] test(model_types::Vector{DataType}, data::@NamedTuple{Rm::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, LStat::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}; mod::Module, level::Int64, throw::Bool, verbosity::Int64)
# @ MLJTestInterface ~/.julia/packages/MLJTestInterface/6i2JH/src/test.jl:202
# [8] macro expansion
# @ REPL[11]:2 [inlined]
# [9] macro expansion
# @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
# [10] top-level scope
# @ REPL[11]:2
# [11] eval
# @ Core ./boot.jl:385 [inlined]
# [12] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
# @ REPL /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
# [13] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
# @ REPL /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
# [14] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
# @ REPL /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
# [15] run_repl(repl::AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
# @ REPL /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
# [16] run_repl(repl::AbstractREPL, consumer::Any)
# @ REPL /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
# [17] (::Base.var"#1013#1015"{Bool, Bool, Bool})(REPL::Module)
# @ Base ./client.jl:432
# [18] #invokelatest#2
# @ Base ./essentials.jl:887 [inlined]
# [19] invokelatest
# @ Base ./essentials.jl:884 [inlined]
# [20] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
# @ Base ./client.jl:416
# [21] exec_options(opts::Base.JLOptions)
# @ Base ./client.jl:333
# [22] _start()
# @ Base ./client.jl:552
# Test Summary: | Error Total Time
# generic mlj interface test | 1 1 6.6s
# ERROR: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.
Can you please provide a full example with GaussianMixtureClusterer? I tried to instantiate the type but it is giving me an error saying m is not defined.
This code used to work:
using MLJ: @load
gmm = @load GMMClusterer pkg=BetaML verbosity=0
gmm(K=4)
Now I understand that the new model name is GaussianMixtureClusterer, but the construction is failing.
Hi,
I just happened to (so far) only contribute activation functions to your project. Not that I use it or any of the others. I would like to help the one project where it makes the biggest impact, or one central place and this may be it:
I recently updated my packages and noticed that I couldn't create an MLJ machine with the Gaussian Mixture Model with BetaML v0.11.0. The older version v0.10.4 is working fine. I have not checked whether this is true for other models in BetaML.
Reproducible example:
julia> using MLJ
julia> GMM = MLJ.@load GaussianMixtureClusterer pkg=BetaML verbosity=0
BetaML.GMM.GaussianMixtureClusterer
julia> machine(GMM(), rand(100, 10))
ERROR: MethodError: no method matching machine(::BetaML.GMM.GaussianMixtureClusterer, ::Matrix{Float64})
Closest candidates are:
machine(::Type{<:Model}, ::Any...; kwargs...)
@ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/machines.jl:336
machine(::Static, ::Any...; cache, kwargs...)
@ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/machines.jl:340
machine(::Union{Symbol, Model}, ::Any, ::AbstractNode, ::AbstractNode...; kwargs...)
@ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/machines.jl:359
...
Stacktrace:
[1] top-level scope
@ REPL[4]:1
julia> using Pkg
julia> Pkg.status()
Project MLJ_debug v0.1.0
Status `~/tmp/MLJ_debug/Project.toml`
[024491cd] BetaML v0.11.0
[add582a8] MLJ v0.20.2
@JuliaRegistrator register
Release notes:
First registered release of Bmlt, the Beta Machine Learning Toolkit
We are currently implementing detailed docstrings for all MLJ models, following a standard we have developed. See this issue: JuliaAI/MLJ.jl#913
@sylvaticus If it is helpful to you, @josephsdavid, who is helping us this summer as GSoD technical writer can prepare PRs for you to review. David is a working data scientist with some Julia knowledge. You will need to let me know soon if you would like this.
The code
modelType = @load RandomForestClassifier pkg = "BetaML" verbosity=1
mod = modelType(
n_trees = 2,
max_depth = 10
)
is not working in the latest version of BetaML.
As it is a parametric function, it is defined over a single eltype T of the mixtures vector.
It needs to be refactored to work with mixed cases (if one really needs different mixture types for the different classes).
Could be doing something I'm not supposed to, but I can't seem to get this to work.
Platform details:
Julia: v1.5.1
BetaML: v0.3.0
Minimum example:
import BetaML
BetaML.Trees.buildForest(rand(100), rand(100))
ERROR: BoundsError: attempt to access (100,)
at index [2]
Stacktrace:
[1] indexed_iterate at .\tuple.jl:81 [inlined]
[2] buildForest(::Array{Float64,1}, ::Array{Float64,1}, ::Int64; maxDepth::Int64, minGain::Float64, minRecords::Int64, maxFeatures::Int64, splittingCriterion::String, forceClassification::Bool) at C:\[...]\.julia\packages\BetaML\w0Pyx\src\Trees.jl:430
[3] buildForest(::Array{Float64,1}, ::Array{Float64,1}, ::Int64) at C:\[...]\.julia\packages\BetaML\w0Pyx\src\Trees.jl:429 (repeats 2 times)
[4] top-level scope at REPL[157]:1
Change the loop order in cache/predict of the AutoEncoder to allow convolutional layers with non-array output in layers
julia> fit!(Scaler(),[1,10,100])
ERROR: BoundsError: attempt to access Tuple{Int64} at index [2]
Stacktrace:
[1] indexed_iterate
@ ./tuple.jl:88 [inlined]
[2] _fit(m::StandardScaler, skip::Vector{Int64}, X::Vector{Int64}, cache::Bool)
@ BetaML.Utils ~/.julia/dev/BetaML/src/Utils/Processing.jl:645
[3] fit!(m::Scaler, x::Vector{Int64})
@ BetaML.Utils ~/.julia/dev/BetaML/src/Utils/Processing.jl:860
[4] top-level scope
@ REPL[17]:1
┌ BetaML [024491cd-cc6b-443e-8034-08ea7eb7db2b]
│ ┌ Warning: Progress(n::Integer, dt::Real, desc::AbstractString = "Progress: ", barlen = nothing, color::Symbol = :green, output::IO = stderr; offset::Integer = 0) is deprecated, use Progress(n; dt = dt, desc = desc, barlen = barlen, color = color, output = output, offset = offset) instead.
│ │ caller = ip:0x0
│ └ @ Core :-1
Current scitype:
target_scitype =
AbstractVecOrMat{<:Union{ScientificTypesBase.Continuous, ScientificTypesBase.Count}},
which allows a vector as target. But using a vector throws an error:
model = BetaML.Nn.MultitargetNeuralNetworkRegressor();
X, y = make_regression(); # y is vector here
mach = machine(model, X, y)
fit!(mach)
[ Info: Training machine(MultitargetNeuralNetworkRegressor(layers = nothing, …), …).
┌ Error: Problem fitting the machine machine(MultitargetNeuralNetworkRegressor(layers = nothing, …), …).
└ @ MLJBase ~/.julia/packages/MLJBase/97P9U/src/machines.jl:682
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: The label should have multiple dimensions. Use `NeuralNetworkRegressor` for single-dimensional outputs.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] fit(m::BetaML.Nn.MultitargetNeuralNetworkRegressor, verbosity::Int64, X::Tables.MatrixTable{Matrix{Float64}}, y::Vector{Float64})
@ BetaML.Nn ~/.julia/packages/BetaML/mWUwE/src/Nn/Nn_MLJ.jl:206
[3] fit_only!(mach::Machine{BetaML.Nn.MultitargetNeuralNetworkRegressor, true}; rows::Nothing, verbosity::Int64, force::Bool, composite::Nothing)
@ MLJBase ~/.julia/packages/MLJBase/97P9U/src/machines.jl:680
[4] fit_only!
@ ~/.julia/packages/MLJBase/97P9U/src/machines.jl:606 [inlined]
[5] #fit!#63
@ ~/.julia/packages/MLJBase/97P9U/src/machines.jl:778 [inlined]
[6] fit!(mach::Machine{BetaML.Nn.MultitargetNeuralNetworkRegressor, true})
@ MLJBase ~/.julia/packages/MLJBase/97P9U/src/machines.jl:775
[7] top-level scope
@ REPL[31]:1
One might also want to support tabular y here, which is what other MLJ multitarget models support.
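One way a multitarget model could accept a vector target, as the declared scitype allows, is to promote it to a one-column matrix before training. The sketch below uses a hypothetical helper name, `as_target_matrix`, and does not cover tabular targets.

```julia
# Hypothetical helper: normalise a target to matrix form (observations in rows).
as_target_matrix(y::AbstractMatrix) = y
as_target_matrix(y::AbstractVector) = reshape(y, :, 1)   # n-vector => n×1 matrix

y_vec = [0.1, 0.2, 0.3]
y_mat = as_target_matrix(y_vec)
```

The alternative, if vector targets are not meant to be supported, would be to narrow the declared `target_scitype` so the mismatch is caught by the type checks instead of inside `fit`.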
What am I missing here?
using MLJ
import BetaML.Trees
import DataFrames as DF
table = OpenML.load(42638)
df = DF.select(DF.DataFrame(table), DF.Not(:cabin))
cleaner = FillImputer()
machc = machine(cleaner, df) |> fit!
dfc = transform(machc, df)
y, X = unpack(dfc, ==(:survived))
Tree = @load DecisionTreeClassifier pkg=BetaML
tree = Tree(max_depth=3)
mach = machine(tree, X, y) |> fit!
raw_tree = fitted_params(mach).fitresult[1]
wrapped_tree = Trees.wrap(raw_tree, (feature_names=DF.names(X),))
# 2 == female?
# ├─ 1 == 3?
# │ ├─ "1" => 0.5
# │ │ "0" => 0.5
# │ │
# │ └─ "1" => 0.9470588235294117
# │ "0" => 0.052941176470588235
# │
# └─ 3 >= 7.0?
# ├─ "1" => 0.16817359855334538
# │ "0" => 0.8318264014466547
# │
# └─ "1" => 0.6666666666666666
# "0" => 0.3333333333333333
cc @roland-KA
julia> using BetaML
julia> fit!(Scaler(),[ 1 10 100; 2 20 200; 3 30 300])
ERROR: InexactError: Int64(-1.224744871391589)
Stacktrace:
[1] Int64
@ ./float.jl:900 [inlined]
[2] convert
@ ./number.jl:7 [inlined]
[3] setindex!
@ ./array.jl:971 [inlined]
[4] macro expansion
@ ./multidimensional.jl:932 [inlined]
[5] macro expansion
@ ./cartesian.jl:64 [inlined]
[6] _unsafe_setindex!(::IndexLinear, ::Matrix{Int64}, ::Vector{Float64}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64)
@ Base ./multidimensional.jl:927
[7] _setindex!
@ ./multidimensional.jl:916 [inlined]
[8] setindex!
@ ./abstractarray.jl:1397 [inlined]
[9] _fit(m::StandardScaler, skip::Vector{Int64}, X::Matrix{Int64}, cache::Bool)
@ BetaML.Utils ~/.julia/dev/BetaML/src/Utils/Processing.jl:656
[10] fit!(m::Scaler, x::Matrix{Int64})
@ BetaML.Utils ~/.julia/dev/BetaML/src/Utils/Processing.jl:860
[11] top-level scope
@ REPL[15]:1
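The `InexactError` arises because scaled (fractional) values are written back into an integer matrix. A minimal sketch of standardisation that converts to floating point first is shown below. `standardize` is a hypothetical function, not BetaML's `Scaler`, and the use of the uncorrected standard deviation is an assumption chosen because it reproduces the value `-1.224744871391589` seen in the error message.

```julia
using Statistics

# Hypothetical sketch: column-wise standardisation that is safe for integer input.
function standardize(X::AbstractMatrix{<:Real})
    Xf = float.(X)                            # Int => Float64 before any scaling
    μ = mean(Xf, dims=1)
    σ = std(Xf, dims=1, corrected=false)      # assumption: population standard deviation
    return (Xf .- μ) ./ σ
end

Z = standardize([1 10 100; 2 20 200; 3 30 300])
```

Because the result is allocated as a new floating-point array, the input matrix is left untouched and its element type does not constrain the output.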
During precompilation I encountered some warnings:
[ Info: Precompiling BetaML [024491cd-cc6b-443e-8034-08ea7eb7db2b]
WARNING: could not import Perceptron.KernelPerceptron into BetaML
WARNING: could not import Perceptron.KernelPerceptronHyperParametersSet into BetaML
WARNING: could not import Perceptron.Pegasos into BetaML
WARNING: could not import Perceptron.PegasosHyperParametersSet into BetaML
Maybe BetaML is importing names from Perceptron module that no longer exist?
I notice that examples in docstrings use the predict and fit from MLJModelInterface (which are not exported by MLJ, and not intended for use by the general MLJ user) rather than the machine fit!, predict, etc. methods exported by MLJ. In this respect, these model docstrings differ from all the other MLJ model docstrings, so I'd consider them "uncompliant".
I understand this is some work to correct. Still, it would be great, for uniformity, to have these changed.
While working with BetaML, DataFrames and Chain, I found that importing BetaML leads to ambiguity in findall when working with the @chain macro.
using Chain, DataFrames
import BetaML as BML
df = DataFrame(randn(100, 3), :auto)
# This works
transform(df, All() => ByRow((x...) -> sum(x)) => :y)
# This fails
@chain df begin
transform(_, All() => ByRow((x...) -> sum(x)) => :y)
end
I am not sure what the correct solution would be. The error log suggests defining findall(::F, ::Array{T}) where {T, F<:Function}, but I am not experienced in managing packages and therefore not sure if one would have to keep other things in mind.
Here is the full error log:
LoadError: MethodError: findall(::Chain.var"#4#5", ::Vector{Any}) is ambiguous.
Candidates:
findall(testf::Function, A)
@ Base array.jl:2439
findall(testf::F, A::AbstractArray) where F<:Function
@ Base array.jl:2447
findall(el::T, cont::Array{T}; returnTuple) where T
@ BetaML.Utils ~/.julia/packages/BetaML/QcevM/src/Utils/Processing.jl:73
Possible fix, define
findall(::F, ::Array{T}) where {T, F<:Function}
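The ambiguity and the suggested fix can be reproduced in isolation. In the sketch below `locate` is a hypothetical stand-in for `findall`, defined in two throwaway modules so that nothing in Base is extended; the method signatures mirror the candidates listed in the error log.

```julia
module Ambiguous
# Predicate form, like `findall(testf::Function, A)`.
locate(testf::Function, A) = [i for (i, a) in pairs(A) if testf(a)]
# Element form, like `findall(el::T, cont::Array{T}) where T`.
locate(el::T, cont::Array{T}) where {T} = [i for (i, a) in pairs(cont) if a == el]
end

module Resolved
locate(testf::Function, A) = [i for (i, a) in pairs(A) if testf(a)]
locate(el::T, cont::Array{T}) where {T} = [i for (i, a) in pairs(cont) if a == el]
# Tie-breaking method with the signature suggested by the error message.
locate(testf::F, cont::Array{T}) where {T, F<:Function} =
    [i for (i, a) in pairs(cont) if testf(a)]
end

xs = Any[1, 2, 3, 4]   # with eltype Any, a function also matches `el::T`

# Without the tie-breaker, the call matches both methods and neither is more specific.
was_ambiguous = try
    Ambiguous.locate(iseven, xs)
    false
catch err
    err isa MethodError
end

evens = Resolved.locate(iseven, xs)     # predicate form is selected
twos = Resolved.locate(2, [1, 2, 2])    # element form still works
```

The clash only appears for `Vector{Any}` (or another element type that a function is an instance of), which is why the plain `transform` call works while the `@chain` version, which passes a `Vector{Any}`, fails. Whether the package should add the tie-breaking method or rename its own function is a design decision for the maintainer.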
Is there not a typing error here?
BetaML.jl/src/Utils/Measures.jl
Lines 15 to 17 in 4bf2d55
"""Cosine distance"""
cosine_distance(x,y) = dot(x,y)/(norm(x)*norm(y))
"""
I guess it should be:
"""Cosine distance"""
cosine_distance(x,y) = 1 - dot(x,y)/(norm(x)*norm(y))
"""
(if I understood correctly what you meant by "cosine distance")
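The distinction can be checked numerically. The function names below are illustrative only and are not BetaML's API: the first computes cosine similarity (what the current definition returns), the second the distance proposed in this report.

```julia
using LinearAlgebra

# Illustrative names: similarity is the normalised dot product, distance is 1 minus it.
cosine_similarity_sketch(x, y) = dot(x, y) / (norm(x) * norm(y))
cosine_distance_sketch(x, y) = 1 - cosine_similarity_sketch(x, y)

d_same = cosine_distance_sketch([1.0, 0.0], [2.0, 0.0])        # same direction
d_orthogonal = cosine_distance_sketch([1.0, 0.0], [0.0, 3.0])  # orthogonal
d_opposite = cosine_distance_sketch([1.0, 0.0], [-1.0, 0.0])   # opposite direction
```

With the "1 -" form, identical directions give a distance of 0 and opposite directions give 2, whereas the similarity alone would assign the largest value to the closest pair.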
using MLJBase
using MLJModels
model = (@iload BetaMLGMMRegressor)()
X, y = make_regression();
mach = machine(model, X, y) |> fit!
yhat = predict(mach, X);
julia> l2(yhat, y)
ERROR: DimensionMismatch: Encountered two objects with sizes (100, 1) and (100,) which needed to match but don't.
Stacktrace:
[1] check_dimensions
@ ~/.julia/packages/MLJBase/CtxrQ/src/utilities.jl:145 [inlined]
[2] _check(measure::LPLoss{Int64}, yhat::Matrix{Float64}, y::Vector{Float64})
@ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/measures/measures.jl:60
[3] (::LPLoss{Int64})(::Matrix{Float64}, ::Vararg{Any})
@ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/measures/measures.jl:126
[4] top-level scope
@ REPL[36]:1
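The mismatch above is between an n×1 matrix of predictions and an n-vector target. A minimal sketch of flattening single-column predictions before they are returned or passed to a measure is shown below; `as_vector` is a hypothetical helper, not part of BetaML or MLJBase.

```julia
# Hypothetical helper: turn single-column predictions into a plain vector.
as_vector(yhat::AbstractVector) = yhat
function as_vector(yhat::AbstractMatrix)
    size(yhat, 2) == 1 || throw(DimensionMismatch("expected exactly one column"))
    return vec(yhat)
end

yhat_matrix = reshape(collect(1.0:5.0), :, 1)   # 5×1, like the (100, 1) predictions above
yhat_vector = as_vector(yhat_matrix)
```

For a single-target regressor, returning a vector from `predict` in the first place would avoid the need for callers to do this conversion.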