
parallelsparseregression.jl's Introduction

ParallelSparseRegression

A Julia library for parallel sparse regression using shared memory. It implements solvers for regression problems including least squares, ridge regression, the lasso, non-negative least squares, and the elastic net. Planned additions include fast methods for computing regularization paths.

Using the Alternating Direction Method of Multipliers (ADMM), each of these problems reduces to repeatedly computing the prox of each term in the objective. For these regression problems, the prox of each term can be computed efficiently in parallel.
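As background (this is the standard ADMM splitting, consistent with the description above), writing the objective as f(x) + g(z) with the consensus constraint x = z gives the iterations:

```latex
% Standard ADMM iterations for  minimize f(x) + g(z)  subject to  x = z,
% with penalty parameter rho (the rho passed to Params):
\begin{aligned}
x^{k+1} &= \operatorname{prox}_{f/\rho}\bigl(z^{k} - u^{k}\bigr) \\
z^{k+1} &= \operatorname{prox}_{g/\rho}\bigl(x^{k+1} + u^{k}\bigr) \\
u^{k+1} &= u^{k} + x^{k+1} - z^{k+1}
\end{aligned}
```

For the lasso, f(x) = ||Ax - b||^2 has a prox that is a ridge-regularized least squares solve, carried out with parallel matrix-vector products (hence the IterativeSolvers dependency), while g(z) = lambda*||z||_1 has a prox that is elementwise soft-thresholding, trivially parallel across entries.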

Installation

To install, just open a Julia prompt and call

Pkg.clone("git@github.com:madeleineudell/ParallelSparseRegression.jl.git")

You'll also need a version of IterativeSolvers with support for caching temporary variables:

Pkg.clone("git@github.com:madeleineudell/IterativeSolvers.jl.git")

Usage

Before you begin, start all the processes you want to participate in multiplying by your matrix. Performance will degrade if you add more processes than there are hardware threads on your shared-memory machine.

addprocs(3)
using ParallelSparseRegression

We will solve a sparse non-negative least squares problem.

m,n,p = 100,20,.1
A = sprand(m,n,p)
x0 = Base.shmem_randn(n)
b = A*x0
rho = 1
quiet = false
maxiters = 100

params = Params(rho,quiet,maxiters)
z = nnlsq(A,b; params=params)

We can verify that the solution obtained is better than merely thresholding the entries of the least squares solution at zero (here the least squares solution is x0 itself, since b = A*x0 exactly).

println("Norm of Az-b is $(norm(A*z-b))")
xp = max(x0,0)
println("Norm of A(x)_+ -b is $(norm(A*xp-b))")

parallelsparseregression.jl's People

Contributors

madeleineudell

parallelsparseregression.jl's Issues

maxiter vs maxiters

IterativeSolvers uses "maxiter", ParallelSparseRegression uses "maxiters". Should we harmonize the two to avoid confusion for end users?

Write unit tests

This should be easy to do using cvxpy. In test/test.jl, just add

using PyCall
@pyimport cvxpy

to compute the solution using a verified method and compare the results.
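If the Python dependency is undesirable, a small pure-Julia reference solver could serve the same role. The sketch below (written in the same old-Julia style as the rest of this page; nnls_ref is a hypothetical name, not part of the package) solves non-negative least squares by projected gradient descent:

```julia
# Reference solver for  minimize ||A*x - b||^2  subject to  x >= 0,
# via projected gradient descent. Intended only as a slow, verified
# baseline to compare against nnlsq in tests.
function nnls_ref(A, b; iters=5000)
    At = A'
    L = 2 * norm(full(A))^2        # Lipschitz constant of the gradient 2*A'*(A*x - b)
    x = zeros(size(A, 2))
    for k in 1:iters
        g = 2 * (At * (A*x - b))   # gradient of ||A*x - b||^2
        x = max(x - g/L, 0)        # gradient step, then project onto x >= 0
    end
    x
end
```

A test could then assert that norm(nnlsq(A, b) - nnls_ref(A, b)) is below a small tolerance.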

lasso is not able to handle matrices instead of vectors

Here is an example that could be added to the test suite; for my problem I need to pass matrices rather than vectors to the lasso solver:

using ParallelSparseRegression

dimensions, components, samples = 180, 100, 1500

dictionary = sprandn(dimensions, components, 0.1)
original_code = sprandn(components, samples, 0.1)
data = dictionary * original_code

# minimize ||data - dictionary*code||_2^2 + lambda*||code||_1
# over the variable code.
code = lasso(dictionary, data, 1)

# FIXME(Ariel): Is this the best way to check it found the real solution?
@assert code == original_code
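Until matrix right-hand sides are supported, one workaround (a sketch, assuming the vector signature lasso(A, b, lambda) used elsewhere on this page) is to solve for each column of the right-hand side independently:

```julia
using ParallelSparseRegression

# Workaround sketch: solve one lasso problem per column of `data`,
# since lasso currently expects a vector right-hand side.
code = zeros(size(dictionary, 2), size(data, 2))
for j in 1:size(data, 2)
    code[:, j] = lasso(dictionary, full(data[:, j]), 1)
end
```

The columns are independent problems, so this loop could itself be distributed (e.g. with pmap) if the per-column solves are not already saturating the workers.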

Parallel execution may be slower than serial

@madeleineudell Thanks for this package; here is the output of my initial testing:

Sample program:

using ParallelSparseRegression

m,n,p = 2048,1024,.1
A = sprand(m,n,p)
x0 = Base.shmem_randn(n)
b = A*x0
rho = 1
lambda = 1
quiet = false
maxiters = 100

params = Params(rho,quiet,maxiters)

# Lasso
@time z_lasso = lasso(A,b,lambda; params=params)

Running the program above with different addprocs settings gives:

Output without addprocs:

1000 : 1.76e+00 1.27e-01 5.54e-03 4.09e+01
elapsed time: 24.422318823 seconds (6440755392 bytes allocated)

Output with addprocs(3):

1000 : 2.15e+00 1.12e-01 6.07e-03 4.65e+01
elapsed time: 90.979009048 seconds (12805856436 bytes allocated)

Output with addprocs(7):

1000 : 1.75e+00 1.47e-01 5.74e-03 4.21e+01
elapsed time: 228.324713722 seconds (28927210844 bytes allocated)

Full output with values for every iteration:

https://gist.github.com/ingenieroariel/9095001

Making params optional broke examples

m,n,p = 100,20,.1
A = sprand(m,n,p)
x0 = Base.shmem_randn(n)
b = A*x0
rho = 1
quiet = false
maxiters = 100

params = Params(rho,quiet,maxiters)
z = nnlsq(A,b; params=params)

Running the above file now returns:

ERROR: params not defined

Compute regularization paths

Start with a very high regularizer, so the problem is easy to solve, and decrease it; each iteration should then be very fast. Design make_prox_* so that changing lambda incurs little overhead. Allow the user to choose lambda_min and lambda_max, but also provide good defaults that encourage interesting (e.g. non-zero) solutions.
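A sketch of such a path computation, in the style of the examples above. Note the assumptions: the z0 warm-start keyword is hypothetical (it is exactly the kind of low-overhead re-solve this issue asks make_prox_* to enable), and lambda_max is only a heuristic upper bound:

```julia
using ParallelSparseRegression

# Sketch of a warm-started regularization path for the lasso.
# NOTE: the z0 warm-start keyword does not exist in the current API.
lambda_max = 2 * norm(A'*b, Inf)   # heuristic: near this value the solution is all zeros
lambdas = logspace(log10(lambda_max), log10(1e-3 * lambda_max), 20)
path = cell(length(lambdas))
z = zeros(size(A, 2))
for (i, lambda) in enumerate(lambdas)
    z = lasso(A, b, lambda; params=params, z0=z)  # warm start from the previous solution
    path[i] = copy(z)
end
```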

Error: pids not defined

When I run test/test.jl, I get the error:

ERROR: pids not defined
in make_prox_lsq at /Users/stevend2/.julia/ParallelSparseRegression/src/prox.jl:51
in nnlsq at /Users/stevend2/.julia/ParallelSparseRegression/src/regression.jl:6
in include_from_node1 at loading.jl:120
while loading /Users/stevend2/PythonProjects/ParallelSparseRegression.jl/test/test.jl, in expression starting on line 15

This also showed up when I was using ParallelSparseMatMul. How do I fix this?

Undefined reference error when running tests.

Cannot run the tests after the latest updates. (My system does not show a stack trace; I am trying to reproduce on one that does.)

➜  test git:(master) julia test.jl
ERROR: access to undefined reference
while loading /Users/x/.julia/ParallelSparseRegression/test/test.jl, in expression starting on line 15
