This package has been discontinued. Most functionalities have been moved to MLUtils.jl.
juliaml / learnbase.jl
Abstractions for Julia Machine Learning Packages
License: Other
For MLDataUtils we need some kind of function that returns how many data points are in a dataset. Right now I use StatsBase.nobs there. It would be useful to introduce the function here, though, since I don't want packages to depend on MLDataUtils just for two function definitions.
As I see it we have three choices. […] nobs, which seems like a recipe for trouble. Thoughts?
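For concreteness, here is a minimal sketch (module name and array method are hypothetical, not the actual LearnBase source) of how a dependency-free nobs could be defined here and then extended by downstream packages:

```julia
# Hypothetical sketch: a dependency-free `nobs` owned by a base package.
# `LearnBaseSketch` and the array method below are illustrative only.
module LearnBaseSketch

function nobs end  # empty generic function; packages add their own methods

# Illustrative method: for arrays, treat the last dimension as observations.
nobs(A::AbstractArray) = size(A, ndims(A))

end # module

LearnBaseSketch.nobs(rand(3, 10))  # 10
```

Downstream packages would then extend `LearnBaseSketch.nobs` for their own dataset types without pulling in StatsBase.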
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
When trying to use MLDataPattern, I keep getting an error from MLLabelUtils that "ObsDim is not defined." This is because after the refactor, LearnBase no longer exports ObsDim. It also no longer exports nobs from StatsBase.
Can we get LearnBase in sync with the other JuliaML packages? And what do we want exported and what do we leave out?
My PR to move params from Distributions to StatsBase now has 8 commits and 22 comments...
I think this is as good a time as any to visit the idea of going back to 0 dependencies, which was our original thought when we created LearnBase. We essentially only have StatsBase and Distributions in our require file for nobs and params/params!. Does anyone have a strong opinion on adding these ourselves and just not exporting them? Or other solutions?
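As a concrete illustration of the zero-dependency option (module and type names below are hypothetical, not actual LearnBase code), owning the generic functions without exporting them could look like:

```julia
# Hypothetical sketch of the zero-dependency option: define the generic
# functions here and export nothing. All names are illustrative.
module LearnBaseZeroDeps

function nobs end
function params end
function params! end
# Deliberately no `export` statements.

end # module

# A downstream package extends the shared function for its own type:
struct ToyModel
    w::Vector{Float64}
end
LearnBaseZeroDeps.params(m::ToyModel) = m.w

LearnBaseZeroDeps.params(ToyModel([1.0, 2.0]))  # [1.0, 2.0]
```

Since nothing is exported, there is no name clash if a user also loads StatsBase; both can coexist, and a glue method can forward one to the other if desired.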
To get this merged into METADATA, I would like to present a list of packages that we know will depend on LearnBase and its derivatives soon after registration.
Please adapt the list accordingly. If you are not sure about a specific package, then let us omit it from this list for now.
Ref #48 (comment)
For (2), do we want an umbrella package or consolidation of code? Right now, I prefer the former to maintain small dependencies for people who need them. But maybe after those packages get cleaned up, they will be trivially small.
I know JuliaML is in a "get your hands dirty" state, but it would be really nice to have an explanation of how to make a model in a "JuliaML" way. If it's not clear, maybe open a discussion on how we would like that to be.
The JuliaML ecosystem is right now focused on providing tools to be used afterwards to create models. Nevertheless, there is not a single example showing how to use the tools to build a model and how to use the model with the provided tools (MLDataUtils, for example).
I would like to do a couple of things:
port an implementation of a Perceptron in a way that is coherent with the ecosystem.
help to build a simple tutorial showing how to use the tools (and the model) in a real (yet possibly tiny) example.
I am writing tutorials for myself, but I would like to generate something more readable, such as the MLDataPattern documentation. However, I have no idea how to build this (is it Markdown? I see the extension .rst and I have no idea how to start building pretty documentation like that).
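As a starting point for the porting effort, here is a minimal, ecosystem-agnostic Perceptron sketch in plain Julia. The function name and keyword arguments are placeholders, not the final JuliaML interface:

```julia
using LinearAlgebra: dot

# Minimal Perceptron sketch in plain Julia; the API is a placeholder, not the
# JuliaML interface. Assumes labels y ∈ {-1, +1} and a feature matrix X with
# one observation per column.
function perceptron(X::AbstractMatrix, y::AbstractVector; epochs::Int = 10, η = 1.0)
    w = zeros(size(X, 1))
    b = 0.0
    for _ in 1:epochs, i in 1:size(X, 2)
        x = view(X, :, i)
        if y[i] * (dot(w, x) + b) <= 0   # misclassified (or on boundary) → update
            w .+= η .* y[i] .* x
            b += η * y[i]
        end
    end
    return w, b
end

# Toy usage: a linearly separable 1-D problem.
X = [1.0 2.0 -1.0 -2.0]          # 1×4 matrix, one observation per column
y = [1, 1, -1, -1]
w, b = perceptron(X, y)
sign.(vec(w' * X) .+ b)          # [1.0, 1.0, -1.0, -1.0]
```

Making this "coherent with the ecosystem" would then mostly mean swapping the hand-rolled loop over columns for the ecosystem's data-access and iteration utilities.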
These two traits exist for margin-based losses, but I couldn't find their docstrings in LossFunctions.jl to port over here. @Evizero do you have some references that you could share?
I somehow completely missed this discussion. I know we had a lengthy conversation at JuliaCon about why we weren't going to import StatsBase. Can we add here for posterity what changed?
As we evolve the interface, it is quite important to have clear and precise documentation for the currently implemented concepts. The docstrings already do a great job explaining the concepts, but we need an official documentation with Documenter.jl sharing our motivations for the interface design and how these concepts interact.
The build fails on Julia v1.0. What should we do about it? https://travis-ci.org/github/JuliaML/LearnBase.jl/builds/674968423
If you are OK with it, I can drop support for Julia v1.0 and require Julia >= v1.1 moving forward.
I encountered this issue when playing with Reinforce.jl, and the essential reason is in LearnBase.
Issue description:
When a concrete type in LearnBase derived from AbstractSet is displayed automatically in the REPL or IJulia (i.e., with no semicolon at the end), an error is raised like the following:

```
Error showing value of type LearnBase.DiscreteSet{Array{Int64,1}}:
ERROR: MethodError: no method matching iterate(::LearnBase.DiscreteSet{Array{Int64,1}})
...(a lot more, omitted here)
```
How to reproduce

```julia
julia> using LearnBase

julia> ds = LearnBase.DiscreteSet([1, 2, 3])
```
Note that, if you suppress the output with a semicolon and then print it manually with print(ds), then no error happens and the printed result is LearnBase.DiscreteSet{Array{Int64,1}}([1, 2, 3]).
Reason for the error
The reason is that when a variable is displayed automatically in the REPL or IJulia, the display function is used. That is, if you print the output with display(ds), the same error is induced. It seems that, for subtypes of AbstractSet, the default display method tries to iterate over each element. However, there is no default implementation in Julia to iterate an AbstractSet (see the documentation).
Possible fix
Two obvious fixes are possible:

1. Implement a Base.iterate method for each related type in LearnBase. For example, after implementing Base.iterate for DiscreteSet by iterating DiscreteSet.items, the displayed output of the above ds is

```
LearnBase.DiscreteSet{Array{Int64,1}} with 3 elements:
  1
  2
  3
```

However, an iteration method may make little sense for LearnBase.IntervalSet.

2. Customize display by implementing the MIME show method for relevant types (see the documentation). For example,

```julia
Base.show(io::IO, ::MIME"text/plain", set::LearnBase.IntervalSet) =
    print(io, "$(typeof(set)):\n ", "lo = $(set.lo)\n ", "hi = $(set.hi)\n")
```

will display a LearnBase.IntervalSet(-1.0, 1.0) as

```
LearnBase.IntervalSet{Float64}:
  lo = -1.0
  hi = 1.0
```
My suggestion is that:

1. Implement the MIME show method for all subtypes of AbstractSet pertaining to this issue.
2. For types where iteration makes sense (e.g., DiscreteSet), also implement Base.iterate. Another benefit is that, with iteration support, those types can be used in a for loop naturally.

I can make a PR if you think the above suggestion is reasonable.
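For reference, the iteration part of the fix could be sketched as follows. The DiscreteSet below is a standalone toy mimicking the LearnBase type (the real one subtypes AbstractSet and lives in the package):

```julia
# Standalone sketch of the Base.iterate fix: forward iteration to the wrapped
# items. This toy DiscreteSet is defined here only for illustration; the real
# LearnBase.DiscreteSet subtypes AbstractSet.
struct DiscreteSet{T<:AbstractVector}
    items::T
end

Base.iterate(s::DiscreteSet, state...) = iterate(s.items, state...)
Base.length(s::DiscreteSet) = length(s.items)
Base.eltype(::Type{DiscreteSet{T}}) where {T} = eltype(T)

collect(DiscreteSet([1, 2, 3]))  # [1, 2, 3]
```

With iterate, length, and eltype in place, the default show for set-like types can enumerate the elements, and the type works in for loops and with collect.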
Let's try this out.
I transferred as little code to this package as I think is absolutely needed.
Let the discussion on what is missing / should be changed / should be added begin.
To start off: I chose to only define the base class Loss here in LearnBase, and will define ModelLoss and ParameterLoss in MLModels instead. The motivation is that if one programs something that falls into the ModelLoss/ParameterLoss framework, one probably needs to import MLModels anyway. For example, there are a lot of property functions there, such as isnemitski, that are useful or in some cases even needed to implement an algorithm properly (at least in some cases with SVMs).
Dear all,
In this issue I would like to discuss a refactoring of LearnBase.jl to accommodate more general problems under transfer learning settings. Before I can do this, I would like to get your feedback on a few minor changes. These changes should facilitate a holistic view of the interface, and should help shape the workflow that developers are expected to follow (see #28).
Below are a few suggestions for improvement that I would like to consider.
1. Split the main LearnBase.jl file into smaller source files with more specific concepts. For example, I'd like to review the Cost interface in a separate file called costs.jl. Similarly, we could move the data orientation interface to a separate file orientation.jl and include these two files in LearnBase.jl.

2. Can we get rid of all exports in the module? I understand that this module is intended for use by developers who would import LearnBase; const LB = LearnBase in their code. Exporting all the names in LearnBase.jl can lead to problems downstream, like the fact that LossFunctions.jl was not exporting the abstract SupervisedLoss type, and then users of LossFunctions.jl would also need to import LearnBase.jl just to get access to the name. My suggestion here is to define the interface without exports; each package in JuliaML can then export the relevant concepts.

3. The interface for learning models is currently spread over various different Julia ecosystems. In most cases, there are two functions that developers need to implement (e.g. fit/predict, model/update, fit/transform). I would like to do a literature review of the existing approaches and generalize this to transfer learning settings. This generalization shouldn't force users to subtype their models from some Model type. A traits-based interface is ideal for developers who want to plug in their models after the fact, and for developers interested in fitting entire pipelines (e.g. AutoMLPipeline.jl).
I would like to start addressing (1) and (2) in the following weeks. In order to address (3) I need more time to investigate and brainstorm a more general interface.
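To make the traits-based idea in point (3) concrete, an opt-in mechanism (all names below are hypothetical, not a proposed final API) might look like:

```julia
# Hypothetical sketch of a traits-based model interface: models opt in
# without subtyping an abstract Model type. All names are illustrative.
abstract type FitStyle end
struct Fittable    <: FitStyle end
struct NotFittable <: FitStyle end

fitstyle(::Type) = NotFittable()          # default: not part of the interface

isfittable(m) = fitstyle(typeof(m)) isa Fittable

# A third-party model opts in after the fact, no subtyping required:
struct MyRidge
    λ::Float64
end
fitstyle(::Type{MyRidge}) = Fittable()

(isfittable(MyRidge(0.1)), isfittable("not a model"))  # (true, false)
```

Because the opt-in is a method definition rather than a supertype, existing types in other packages can join the interface without changing their type hierarchy, which is exactly what pipeline tools need.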
@tbreloff there is a small handful of untested lines in your new code. Could you maybe add some tests for them when you have a chance?