
dimensionalityreduction.jl's Introduction

DimensionalityReduction.jl

**The DimensionalityReduction package is deprecated. It has been superseded by the new package MultivariateStats.**


Algorithms

  • Principal Component Analysis (PCA)

PCA Usage

using DimensionalityReduction

# simulate 100 random observations
# rotate and scale as well
X = randn(100,2) * [0.8 0.7; 0.9 0.5]
Xpca = pca(X)

Each row of X represents one data point (e.g., one repetition of the experiment), and each column represents one of the measured variables.

Attributes:

Xpca.rotation                # principal components
Xpca.scores                  # rotated X
Xpca.standard_deviations     # square roots of the eigenvalues
Xpca.proportion_of_variance  # fraction of variance brought by each principal component
Xpca.cumulative_variance     # cumulative proportion of variance
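For readers without the (deprecated) package, the attributes listed above can be reproduced with a small sketch in current Julia (1.x) using only the LinearAlgebra and Statistics standard libraries. pca_svd_sketch is a hypothetical helper written for illustration, not the package's pca(): it centers but does not scale the data, and it returns a named tuple whose field names mirror the attributes above.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not the package's pca()): PCA of a row-observation
# matrix X (n observations × d variables) via the thin SVD of the centered data.
function pca_svd_sketch(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    Xc = X .- mean(X; dims=1)          # center each column (variable)
    F = svd(Xc)                        # Xc = U * Diagonal(S) * V'
    rotation = F.V                     # columns are the principal directions
    scores = Xc * rotation             # data expressed in PC coordinates
    sdev = F.S ./ sqrt(n - 1)          # square roots of the covariance eigenvalues
    variances = sdev .^ 2
    proportion = variances ./ sum(variances)
    cumulative = cumsum(proportion)
    return (rotation = rotation, scores = scores,
            standard_deviations = sdev,
            proportion_of_variance = proportion,
            cumulative_variance = cumulative)
end

# Same simulated data as the example above.
X = randn(100, 2) * [0.8 0.7; 0.9 0.5]
res = pca_svd_sketch(X)
```

Each principal direction is only defined up to sign, so rotation and scores may differ in sign from another implementation while describing the same components.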

By default, pca() uses a singular value decomposition (SVD). Alternatively, pcaeig(X) computes the eigenvectors of the covariance matrix directly.
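The eigendecomposition route can be sketched the same way. This is an illustration under stated assumptions, not the package's pcaeig(): pca_eig_sketch is hypothetical, it sorts the eigenvalues of the sample covariance matrix in decreasing order, and the last lines cross-check them against the SVD route (squared singular values of the centered data divided by n - 1).

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not the package's pcaeig()): principal components
# from an eigendecomposition of the sample covariance matrix.
function pca_eig_sketch(X::AbstractMatrix{<:Real})
    C = cov(X)                              # d × d; columns of X are variables
    E = eigen(Symmetric(C))                 # eigenvalues come back in ascending order
    order = sortperm(E.values; rev=true)    # largest variance first
    variances = E.values[order]
    rotation = E.vectors[:, order]          # columns are the principal directions
    return variances, rotation
end

X = randn(100, 3) * [0.8 0.7 0.1; 0.9 0.5 0.2; 0.1 0.3 0.6]
variances, rotation = pca_eig_sketch(X)

# Cross-check against the SVD route on the centered data.
Xc = X .- mean(X; dims=1)
svd_variances = svdvals(Xc) .^ 2 ./ (size(X, 1) - 1)
```

The SVD of the centered data avoids forming the covariance matrix explicitly, which is generally the numerically safer choice; the eigendecomposition only works on a d × d matrix, which may be convenient when there are far more observations than variables.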

pca() centers and rescales the input data by default. This is controlled by the center and scale keyword arguments:

pca(X::Matrix; center::Bool=true, scale::Bool=true)

Centering subtracts each variable's mean, and scaling divides each variable by its standard deviation.
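A minimal sketch of the centering and scaling step, assuming the row-observation layout described above. standardize_sketch is a hypothetical helper: its center and scale keywords mirror the package's keyword names, and it uses the sample standard deviation (n - 1 denominator), which is an assumption and may differ from the package's choice.

```julia
using Statistics

# Hypothetical helper: center and/or scale each column (variable) of X.
function standardize_sketch(X::AbstractMatrix{<:Real}; center::Bool=true, scale::Bool=true)
    Z = center ? X .- mean(X; dims=1) : float.(X)   # subtract column means
    if scale
        Z = Z ./ std(X; dims=1)                       # divide by column standard deviations
    end
    return Z
end

X = randn(100, 2) * [0.8 0.7; 0.9 0.5]
Z = standardize_sketch(X)                  # centered and scaled
Zc = standardize_sketch(X; scale=false)    # centered only
```

A column with zero variance would be divided by zero here, so production code would need to guard against constant variables.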

If scale is true (the default), the principal components are also scaled back to the original space of the data and saved to Xpca.rotation.

To overlay the principal components on top of the data with PyPlot:

using PyPlot
plot( X[:,1], X[:,2], "r." )  # point cloud

# get data center
ctr = mean( X, 1 )

# plot principal components as lines
#  weight by their standard deviation
PCs = Xpca.rotation
for v=1:2
	weight = Xpca.standard_deviations[v]
	plot( ctr[1] + weight * [0, PCs[1,v]], 
		  ctr[2] + weight * [0, PCs[2,v]],
		  linewidth = 2)
end

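To plot the scores on the first two principal components with PyPlot (a score plot; a full biplot would also overlay the variable loadings):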

using PyPlot
scores = Xpca.scores[:,1:2]
plot( scores[:,1], scores[:,2], "r." )

To plot the scores on the first two principal components with Gadfly:

using Gadfly
scores = Xpca.scores[:,1:2]
pl = plot(x=scores[:,1],y=scores[:,2], Geom.point)
draw(PNG("pca.png", 6inch, 6inch), pl)

Starting from a DataFrame:

using RDatasets
iris = data("datasets", "iris")
iris = convert(Array,DataArray(iris[:,1:4]))
Xpca = pca(iris)

ICA Usage

ICA has been deprecated.

t-SNE Usage

t-SNE has been deprecated.

NMF

NMF has been moved into a separate package, NMF.jl.

dimensionalityreduction.jl's People

Contributors

andreasnoack, bloody76, cbecker, csaid, delafont, iainnz, johnmyleswhite, lindahua, simonster


dimensionalityreduction.jl's Issues

Package name too long?

Among all Julia packages, this one has the longest name (23 characters).

I find it pretty cumbersome in an interactive session.

How about renaming it to DimReduction? To me, this is nicer to read and type.

REQUIRE DataFrames?

Why does this package require DataFrames? I cannot find any reference (except in a comment) to the package; though I am no DataFrames expert.

More of a curiosity than anything; I was surprised to see it installed alongside this tool.

[PkgEval] DimensionalityReduction may have a testing issue on Julia 0.3 (2014-07-14)

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3). The results of this script are used to generate a package listing enhanced with testing results.

On Julia 0.3

  • On 2014-07-12 the testing status was Tests pass.
  • On 2014-07-14 the testing status changed to Tests fail, but package loads.

Tests pass. means that PackageEvaluator found the tests for your package, executed them, and they all passed.

Tests fail, but package loads. means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using worked.

This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.

Test log:

INFO: Installing ArrayViews v0.4.6
INFO: Installing DataArrays v0.1.12
INFO: Installing DataFrames v0.5.6
INFO: Installing DimensionalityReduction v0.1.0
INFO: Installing GZip v0.2.13
INFO: Installing Reexport v0.0.1
INFO: Installing SortingAlgorithms v0.0.1
INFO: Installing StatsBase v0.5.3
INFO: Package database updated
ERROR: `/` has no method matching /(::Float64, ::Array{Float64,2})
 in evaluate_source_estimation at /home/idunning/pkgtest/.julia/v0.3/DimensionalityReduction/test/ica.jl:16
 in include at ./boot.jl:245
 in include_from_node1 at ./loading.jl:128
 in anonymous at no file:10
 in include at ./boot.jl:245
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:285
 in _start at ./client.jl:354
while loading /home/idunning/pkgtest/.julia/v0.3/DimensionalityReduction/test/ica.jl, in expression starting on line 56
while loading /home/idunning/pkgtest/.julia/v0.3/DimensionalityReduction/runtests.jl, in expression starting on line 8
INFO: Package database updated

Note: this may be due to the removal of deprecated functions in Julia 0.3-rc1 (JuliaLang/julia#7609).

Inconsistency in PCA scaling

Even though pcaeig() and pca() both take center and scale arguments, pcaeig() ignores the value of scale.

EDIT: I see that C = scale ? cor(X) : cov(X) is supposed to take care of the scaling in pcaeig(), though I got different results when I tried it earlier. I will investigate this further.

Also, do others agree that eigenvalues should be scaled back to the original space?

I can take care of this in the next few days or next week, but I would like to know whether scaling the eigenvalues back makes sense to others (we could also add an option for this).
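The scale ? cor(X) : cov(X) line works because the correlation matrix of the raw data equals the covariance matrix of the standardized data, so both routes yield the same eigenvalues. A small check in current Julia (Statistics and LinearAlgebra), using an arbitrary illustrative mixing matrix that gives the columns very different scales:

```julia
using LinearAlgebra, Statistics

# Columns with deliberately different scales.
X = randn(200, 3) * [2.0 0.3 0.0; 0.0 1.0 0.4; 0.0 0.0 5.0]

# Route 1: eigendecompose the correlation matrix of the raw data.
vals_cor = eigvals(Symmetric(cor(X)))

# Route 2: standardize each column (sample std), then use the covariance matrix.
Z = (X .- mean(X; dims=1)) ./ std(X; dims=1)
vals_cov = eigvals(Symmetric(cov(Z)))
```

When comparing eigenvectors from two routes, note that each is determined only up to sign (and, for repeated eigenvalues, up to a rotation within the eigenspace); that is one common source of apparently different results, though not necessarily the one reported in this issue.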

better description line

The Julia package listing currently shows only the package name and description, which for many packages, including DimensionalityReduction ("Methods for dimensionality reduction"), are redundant. Would it make sense to change the description to read "PCA, ICA, etc." or something more, well, descriptive? I searched and searched, didn't find PCA or ICA, and only learned of your hard work here after posting a question asking where they were.

Plan for v0.2.0

This package is an important foundation for our efforts towards a machine learning ecosystem. I plan to spend some time working on this package soon.

Here is a tentative plan:

  • Remove dependence on DataFrames

    In particular, this package should provide core dimensionality reduction algorithms that operate on ordinary arrays. The dependence on DataFrames should be removed.

  • Consistent interface

    To conform with other machine learning packages, this package should use column-major data set format (i.e. each column being an observation).

  • Improve PCA

    Implement multiple PCA algorithms (e.g. based on the covariance matrix, SVD, transposed data, etc.), and a high-level pca function that selects an appropriate method based on the input data.

  • Linear Discriminant Analysis

  • Independent Component Analysis

    It seems that a FastICA algorithm has already been implemented, but tests still need to be added.

  • Canonical Correlation Analysis

  • Factor Analysis & Probabilistic PCA

  • Classic MDS

    It is already implemented but may need some testing.

  • Separate NMF to another package.

    Nonnegative Matrix Factorization in itself is a big field that deserves its own package. I have created a package NMF.jl for this purpose, and implemented a more sophisticated framework there.

  • Move manifold embedding functions to a separate package. These are a different family of methods.
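The "transposed data" PCA variant mentioned under Improve PCA can be sketched for the column-observation layout proposed above. When there are more variables than observations (d > n), it is cheaper to eigendecompose the n × n Gram matrix and map its eigenvectors back to the variable space. pca_columns_sketch and its simple d <= n selection rule are hypothetical illustrations, not the package's eventual design.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: PCA on a d × n matrix whose columns are observations.
# Uses the d × d scatter matrix when d <= n, otherwise the n × n Gram matrix.
function pca_columns_sketch(X::AbstractMatrix{<:Real}, k::Integer)
    d, n = size(X)
    Xc = X .- mean(X; dims=2)                      # center each variable (row)
    if d <= n
        E = eigen(Symmetric(Xc * Xc'))             # d × d scatter matrix
        order = sortperm(E.values; rev=true)[1:k]
        λ = E.values[order]
        P = E.vectors[:, order]
    else
        G = eigen(Symmetric(Xc' * Xc))             # n × n Gram matrix ("transposed data")
        order = sortperm(G.values; rev=true)[1:k]
        λ = G.values[order]
        # Map each Gram eigenvector v to the unit direction Xc * v / sqrt(λ).
        P = (Xc * G.vectors[:, order]) ./ permutedims(sqrt.(λ))
    end
    return P, λ ./ (n - 1)                         # directions (d × k), variances
end

X_wide = randn(50, 10)          # d = 50 variables, n = 10 observations (columns)
P, variances = pca_columns_sketch(X_wide, 3)
```

If Xc' * Xc * v = λ * v, then u = Xc * v / sqrt(λ) is a unit-length eigenvector of Xc * Xc' with the same eigenvalue; this needs λ > 0, so k should not exceed the rank of the centered data (at most n - 1).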

These are the basics. I believe we can release v0.1.0 when these are ready.

cc: @johnmyleswhite

Error with version from METADATA.jl

Hi John,

With the version from the standard installation (Pkg.add("DimensionalityReduction")), I get the following error:

ERROR: julia_pkgdir not defined
 in include at boot.jl:238
at /home/afabisch/.julia/DimensionalityReduction/src/DimensionalityReduction.jl:5
at /home/afabisch/Projekte/6473507/VisualizeMNIST.jl:5

With the latest version from the repository I don't have this problem.

MDS type not defined

The function mds(D::Array{T,2}) at DimensionalityReduction/src/mds.jl:22 returns a type MDS(X, D, k) which is not defined in the package. The typedef was removed in commit cbbbfca. Was this an accident or an intended removal of a feature?
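For reference, classical (Torgerson) MDS can be sketched in a few lines of current Julia. The result type below is hypothetical: the issue only shows the constructor call MDS(X, D, k), so the field meanings (embedding, input distances, target dimension) and the one-column-per-point layout are assumptions, and the type is named MDSResultSketch to avoid suggesting it is the removed definition.

```julia
using LinearAlgebra

# Hypothetical result type; field meanings are assumptions, not the removed MDS type.
struct MDSResultSketch
    X::Matrix{Float64}   # embedded coordinates, k × n (one column per point)
    D::Matrix{Float64}   # input distance matrix, n × n
    k::Int               # embedding dimension
end

# Classical MDS from a matrix of pairwise Euclidean distances.
function classical_mds_sketch(D::AbstractMatrix{<:Real}, k::Integer)
    n = size(D, 1)
    J = I - fill(1.0 / n, n, n)                # centering matrix
    B = -0.5 .* (J * (D .^ 2) * J)             # double-centered Gram matrix
    E = eigen(Symmetric(B))
    order = sortperm(E.values; rev=true)[1:k]  # top-k eigenpairs
    λ = max.(E.values[order], 0.0)             # clip tiny negative eigenvalues
    Y = sqrt.(λ) .* permutedims(E.vectors[:, order])   # scale each row by sqrt(λ)
    return MDSResultSketch(Y, Matrix{Float64}(D), k)
end

# Five points in the plane (columns) and their pairwise distances.
pts = [0.0 1.0 0.0 2.0 3.0; 0.0 0.0 1.0 2.0 -1.0]
n = size(pts, 2)
D = [norm(pts[:, i] - pts[:, j]) for i in 1:n, j in 1:n]
res = classical_mds_sketch(D, 2)
```

With exact Euclidean distances and k equal to the true dimension, the embedding reproduces the input distances up to a rotation or reflection of the original points.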

ICA error

I ran the example given:

using DimensionalityReduction

# Generate true sources
S_true = rand(5,1000);

# Mixing matrix
H_true = randn(5, 5);

# Generate observed signals
X = H_true*S_true;
results = ica(X);

... and then computed:

Xhat = results.H * results.S;

Since there's no noise here, we should get a near-perfect reconstruction, but X - Xhat shows large errors.
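One scale-free way to quantify "large errors" is the relative Frobenius-norm error norm(X - Xhat) / norm(X). The sketch below does not call ica (the ICA code is deprecated); relative_reconstruction_error is a hypothetical helper. It shows that an exact factorization gives essentially zero error even after the factors are permuted and rescaled, which are the ambiguities inherent to ICA, so those ambiguities alone cannot explain a large reconstruction error.

```julia
using LinearAlgebra

# Hypothetical helper: relative Frobenius-norm error of the factorization X ≈ H * S.
relative_reconstruction_error(X, H, S) = norm(X - H * S) / norm(X)

S_true = rand(5, 1000)
H_true = randn(5, 5)
X = H_true * S_true

# The true factors reproduce X exactly.
err_exact = relative_reconstruction_error(X, H_true, S_true)

# Permute and rescale the factors: H * M and M \ S still multiply back to X.
perm = [2, 1, 3, 5, 4]
M = Matrix(1.0I, 5, 5)[:, perm] * Diagonal([2.0, -1.0, 0.5, 3.0, 1.5])
err_permuted = relative_reconstruction_error(X, H_true * M, M \ S_true)
```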

[PkgEval] Your package doesn't have a test/runtests.jl file

PackageEvaluator.jl is a script that runs nightly.
It attempts to load all Julia packages and run their tests (if available) on both the stable
version of Julia (0.3) and the nightly build of the unstable version (0.4).
The results of this script are used to generate a package listing
enhanced with testing results. This service also benefits package developers by notifying them if
their package breaks for some reason (caused by e.g. changes in Julia, changes in dependencies,
or broken binary dependencies.)

Currently PackageEvaluator attempts to find your test scripts using a heuristic, preferring the
standardized test/runtests.jl whenever present. Using test/runtests.jl allows people to test
your package simply with Pkg.test("DimensionalityReduction"), with any testing-only dependencies being
installed by looking at test/REQUIRE.

Your package doesn't appear to have a test/runtests.jl file. PackageEvaluator is going to move
away from auto-detecting tests and will instead only test packages with a test/runtests.jl
file. This change will take place in about a month.

You can:

  • Add the file and tag a new version. You may in fact have already added this file but not
    tagged a new version. PackageEvaluator only tests your latest tagged version, so you must tag
    for the file to be detected.
  • Choose to do nothing. PackageEvaluator will stop attempting to test your package, and the testing
    status will be reported as "not possible".
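A minimal test/runtests.jl might look like the sketch below. It uses current Julia syntax (the Test standard library; Julia 0.3-era packages used `using Base.Test` instead), and the checks are illustrative PCA properties, not the package's actual tests.

```julia
# test/runtests.jl (illustrative sketch)
using Test
using LinearAlgebra, Statistics

# using DimensionalityReduction   # the package under test would be loaded here

@testset "PCA sanity checks (illustrative)" begin
    X = randn(100, 3)
    Xc = X .- mean(X; dims=1)
    F = svd(Xc)
    # Principal directions are orthonormal.
    @test F.V' * F.V ≈ Matrix(1.0I, 3, 3)
    # Covariance eigenvalues equal squared singular values divided by n - 1.
    @test sort(eigvals(Symmetric(cov(X))); rev=true) ≈ F.S .^ 2 ./ (size(X, 1) - 1)
end
```

Pkg.test("DimensionalityReduction") runs this file; in the workflow described above, testing-only dependencies were listed in test/REQUIRE.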

If you'd like help or more information, please just reply to this issue.
