Comments (9)
Nice, I think this could make for more compelling examples than "generate random uniform data"! 👍
It's worth pointing out that mlpack already has a number of distribution-like classes: `GaussianDistribution`, `GammaDistribution`, `LaplaceDistribution`, `DiscreteDistribution`, and so forth. (See `src/mlpack/core/dists/`.) Now, it would be cool to generate data directly from one of these distribution classes, but there are some issues: those distribution classes are typically aimed at (1) generating random samples via `Random()`, and (2) evaluating probabilities via `Probability()`, but that second function is totally irrelevant here; we just want to generate datasets. Even the signature of (1) is not quite right, as for existing distributions it just generates a single point.
So, certainly some additional infrastructure is necessary to generate labeled synthetic datasets, but I do think that whatever we write should be "aware" of the distribution code and make use of it when possible in the implementation (and add new distributions as needed).
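To make the gap concrete, here is a rough sketch of the kind of wrapper being described: it turns a single-point sampler into a routine that fills a labeled dataset. It uses only the C++ standard library as a stand-in. `LabeledData`, `SamplePoint`, and `GenerateBlobs` are hypothetical names, not part of mlpack, and a real implementation would presumably call a distribution's `Random()` and store the points in an Armadillo matrix instead.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical container (not an mlpack type): one point per entry, with a
// parallel vector holding the class label of each point.
struct LabeledData
{
  std::vector<std::vector<double>> points;
  std::vector<std::size_t> labels;
};

// Stand-in for a distribution's single-point sampler: draws one point from an
// isotropic Gaussian centred at `mean`.
inline std::vector<double> SamplePoint(const std::vector<double>& mean,
                                       const double stddev,
                                       std::mt19937& rng)
{
  std::vector<double> point(mean.size());
  for (std::size_t d = 0; d < mean.size(); ++d)
  {
    std::normal_distribution<double> dist(mean[d], stddev);
    point[d] = dist(rng);
  }
  return point;
}

// Draws `pointsPerClass` points around each mean by calling the single-point
// sampler repeatedly; the label of a point is the index of its mean.
inline LabeledData GenerateBlobs(const std::vector<std::vector<double>>& means,
                                 const std::size_t pointsPerClass,
                                 const double stddev,
                                 const unsigned int seed)
{
  assert(stddev > 0.0);
  std::mt19937 rng(seed);
  LabeledData data;
  for (std::size_t c = 0; c < means.size(); ++c)
  {
    for (std::size_t i = 0; i < pointsPerClass; ++i)
    {
      data.points.push_back(SamplePoint(means[c], stddev, rng));
      data.labels.push_back(c);
    }
  }
  return data;
}
```

With this sketch, `GenerateBlobs({{0.0, 0.0}, {10.0, 10.0}}, 5, 1.0, 42)` would return ten two-dimensional points, the first five labeled 0 and the last five labeled 1. The point is only that the "generate many labeled points" layer can sit on top of an unchanged single-point sampler.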
A minor pedantic thought is that after #3269, pretty much everything in mlpack is directly in the `mlpack::` namespace for convenience (with the exception of a couple of things in `util::` and a couple of things in `data::`). So, I'd personally prefer to avoid a `simulate::` namespace.
At least personally I wouldn't worry about Open Question (2) too much; I think if we provide something relatively barebones at first, it will get immediately used in the documentation, and that's probably good enough for now.
from mlpack.
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
from mlpack.
Sounds like an expansion for distributions is in order to handle multi-point generation. With respect to `random()`, is this using a poorly spec'd PRNG?
On the note of namespaces, maybe this should go under `util::` or where the train/test split is found?
from mlpack.
> Sounds like an expansion for distributions is in order to handle multi-point generation.
Possibly. It would be great to keep things unified, but if that doesn't make sense (or if adapting the older distributions is too much work), in my view it's okay to keep them different.
> With respect to `random()` is this using a poorly spec'd PRNG?
It uses `std::mt19937`; I'm not sure whether that qualifies as "poor" (I am not an RNG expert).
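For context on the PRNG question: `std::mt19937` is the 32-bit Mersenne Twister from the C++ standard library, and the standard fully specifies its raw output sequence, so a fixed seed gives the same raw values on any conforming implementation. The sketch below checks the reference value the standard gives for a default-seeded engine. `NthValue` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper: returns the n-th raw output (1-based) of a
// std::mt19937 engine constructed with the given seed.
inline std::uint_fast32_t NthValue(const std::uint_fast32_t seed,
                                   const std::size_t n)
{
  assert(n >= 1);
  std::mt19937 rng(seed);
  rng.discard(n - 1);  // Skip the first n - 1 outputs.
  return rng();
}
```

The standard requires the 10000th output of a default-constructed `std::mt19937` (default seed 5489) to be 4123659995, so `NthValue(5489u, 10000)` should return that value. The distribution adaptors such as `std::normal_distribution` are not specified this tightly, so samples drawn through them may differ between standard library implementations even with the same seed.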
> On the note of namespaces, maybe this should go under `util::` or where the train/test split is found?
I really think a flat namespace is fine, since there aren't really going to be any naming conflicts, but `Split()` is in the `data::` namespace (as are `Load()` and `Save()`), and I suppose we could use that too. `util::` is primarily for internal mlpack tooling, but this would be user-facing.
from mlpack.
@rcurtin I am trying to find a good beginner's issue. Do you think this feature request could be implemented by a beginner as a way to learn about mlpack?
from mlpack.
Some of this can be a great way to jump into the codebase.
from mlpack.
There is an active PR, #3647, for the regression case.
from mlpack.