MP-LAMP stands for Massive Parallel LAMP, which is a parallel version of LAMP. LAMP stands for Limitless-Arity Multiple-testing Procedure.
To set up MP-LAMP on Amazon EC2, follow these steps.
- Create an Amazon EC2 instance using the Amazon Linux image.
- Download mp-lamp.
- Uncompress it.
- Move to the top of the uncompressed directory.
- Run the following command.
$ bash aws/aws_installer_single.sh
Currently, an Intel CPU and Linux are assumed. If you run into trouble during installation, please send us the error message and a description of your environment.
Tool | Recommended version |
---|---|
Compiler | g++ (4.3 or later) |
MPI library | OpenMPI, MPICH, MVAPICH or Intel MPI |
Build tool | SCons 2.0.0 or later (Python is required for SCons) |
Boost library | Boost 1.55.0 or later |
gflags | gflags 2.0 or later |
Notes:
- For gcc, 4.9.3 or later is preferable. Older versions of gcc produce a slightly slower binary.
- The latest version of gflags requires CMake as its build tool. For users not familiar with CMake, we recommend gflags v2.0, which can be installed with configure and make.
- Make sure the prerequisites above are satisfied.
- Copy local.sample.cfg to local.cfg and edit it appropriately.
  - [compilers]
    - single: compiler for non-parallel code (g++ or icpc)
    - parallel: compiler for MPI code (typically mpicxx)
    - options: additional compiler options (added to CXXFLAGS)
    - libs: additional library options (added to LDFLAGS)
  - [paths]
    - include and library paths
    - Not needed if no library is installed in a non-default location.
- Sample local.cfg
[compilers]
single=g++
parallel=mpicxx
# an example for linux.
option=-DGTEST_USE_OWN_TR1_TUPLE=1 -DHAVE_CLOCK_GETTIME
libs=-lrt
# an example for Mac
# option=-msse4.2 -mpopcnt -march=corei7
[paths]
# include=/path/to/your/include_directory
# library=/path/to/your/include_library
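Before running the build, it can help to confirm that local.cfg parses and names both compilers. The following is a minimal sketch, not part of mp-lamp: the function name `check_local_cfg` is hypothetical, and it assumes local.cfg is plain INI syntax that Python's `configparser` can read, as the sample above suggests.

```python
import configparser

def check_local_cfg(text):
    """Return a list of problems found in local.cfg contents (empty if none)."""
    # Interpolation is disabled so that a '%' in compiler flags is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("compilers"):
        return ["missing [compilers] section"]
    problems = []
    # Only the two compiler entries are checked; other keys are left alone.
    for key in ("single", "parallel"):
        if not parser.get("compilers", key, fallback="").strip():
            problems.append("[compilers] %s is not set" % key)
    return problems

sample = """\
[compilers]
single=g++
parallel=mpicxx
option=-DGTEST_USE_OWN_TR1_TUPLE=1 -DHAVE_CLOCK_GETTIME
libs=-lrt
[paths]
# include=/path/to/your/include_directory
"""

print(check_local_cfg(sample))
```

To check a real file, pass `open("local.cfg").read()` instead of the inline sample.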
- Once local.cfg is ready, go to the top directory of lamp_search and run
$ scons
or, for a parallel build (here with 4 jobs),
$ scons -j 4
The parallel binary mp-lamp will then be ready.
Note: to run the parallel version, use mpiexec as shown in the following example.
- mp-lamp can be used from the command line.
For example, with 32 processes:
$ mpiexec -hostfile ${machinefile} -np 32 ./mp-lamp --item item_file.csv --pos positive_file.csv --a 0.05 --show_progress --log
- --item: item data file
- --pos: positive data file
- --p {"fisher", "chi", "u_test"}: Selects the statistical test: Fisher's exact test ("fisher"), the chi-squared test ("chi"), or the Mann-Whitney U test ("u_test"). The default is "fisher".
- --alternative {"greater", "less", "two.sided"}: Selects the alternative hypothesis. The default is "greater".
- --a: significance level (default 0.05)
- --show_progress: It is advised to turn on show_progress for long jobs.
- --log: Shows a breakdown of the execution time. Most users do not need it, but it can help diagnose problems when mp-lamp is unexpectedly slow.
- Item data file format. By default, mp-lamp reads item data in the following CSV format. The first line lists the item names, and each remaining line begins with the name of a transaction.
#gene,TF1,TF2,TF3,TF4
A,1,1,1,0
B,1,1,1,0
C,1,0,0,1
D,0,0,0,0
E,1,1,1,0
F,1,0,0,0
G,1,1,1,1
H,0,0,0,0
I,0,1,0,1
J,0,0,1,0
K,0,0,0,1
L,0,0,0,1
M,0,0,0,1
N,1,1,1,0
O,0,0,0,0
- Positive data file format. An example of the positive data format corresponding to the item data file is shown below. The first line must start with a "#". The current version crashes if the number of lines does not match the item file.
#gene,expression
A,1
B,1
C,0
D,0
E,1
F,0
G,1
H,1
I,0
J,0
K,0
L,0
M,1
N,1
O,0
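Because a line-count mismatch crashes the current version, it may be worth checking both files before launching a long job. The sketch below is not part of mp-lamp; the helper names `read_rows` and `check_inputs` are hypothetical, and it checks only the two requirements stated above (a first line starting with "#" in the positive file, and matching line counts).

```python
import csv
import io

def read_rows(text):
    """Parse CSV text into (header, data_rows), skipping blank lines."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header = rows[0] if rows else []
    return header, rows[1:]

def check_inputs(item_text, pos_text):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    _, item_rows = read_rows(item_text)
    pos_header, pos_rows = read_rows(pos_text)
    # Requirement 1: the positive file's first line starts with '#'.
    if not pos_header or not pos_header[0].startswith("#"):
        problems.append("positive file: first line must start with '#'")
    # Requirement 2: both files describe the same number of transactions.
    if len(item_rows) != len(pos_rows):
        problems.append("row count mismatch: %d item rows, %d positive rows"
                        % (len(item_rows), len(pos_rows)))
    return problems

# Tiny inline data for illustration only (not the sample files).
item_text = "#gene,TF1,TF2\nA,1,1\nB,1,0\nC,0,0\n"
pos_ok = "#gene,expression\nA,1\nB,1\nC,0\n"
pos_short = "gene,expression\nA,1\nB,1\n"

print(check_inputs(item_text, pos_ok))
print(check_inputs(item_text, pos_short))
```

For real data, read each file with `open(path).read()` and pass the two strings to `check_inputs`.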
- Sample command and output of the 2-process parallel version solving the toy data. Do not forget to invoke the command using mpiexec or mpirun.
$ mpiexec -np 2 ./mp-lamp --item ./samples/sample_data/sample_item.csv --pos ./samples/sample_data/sample_expression_over1.csv --a 0.05 --show_progress
# item file : ./samples/sample_data/sample_item.csv
# positive file: ./samples/sample_data/sample_expression_over1.csv
# # of transactions= 15 # of items= 4 # of total positives= 7 max freq= 7 max positive= 5 max items in trans.= 4
# preprocess end
# lambda=6 cs_thr[lambda]= 7 pmin_thr[lambda-1]= 0.00699301 num_expand= 1 elapsed_time=0.000616
# 1st phase start
# lambda=6 closed_set_num[n>=lambda]= 4 cs_thr[lambda]= 7 pmin_thr[lambda-1]= 0.00699301 num_expand= 1 elapsed_time=0.000661
# 1st phase end
# lambda=6 num_expand= 2 elapsed_time=0.001023
# 2nd phase start
# lambda=5 int_sig_lev=0.0125 elapsed_time=0.001052
# 2nd phase end
# closed_set_num= 5 sig_lev=0.01 num_expand= 3 elapsed_time=0.001165
# 3rd phase end
# sig_lev=0.01 elapsed_time=0.001564
# time all= 0.006031 time search= 0.001858
# min. sup=5 correction factor=5
# number of significant patterns=1
# pval (raw) pval (corr) freq pos # items items
0.00699301 0.034965 5 5 3 TF1 TF2 TF3
- The current version does not work with "mpiexec -np 1". Please use at least two processes for the parallel version.
- The current version targets only data with a small number of transactions. For data with more than 100,000 transactions, please wait for future updates.
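The p-values in the sample output above can be reproduced by hand, which is a useful sanity check on a new installation. The sketch below assumes the default settings (Fisher's exact test, alternative "greater") and assumes that the corrected p-value is the raw p-value multiplied by the reported correction factor, which is how the numbers in the output relate. The function name `fisher_greater` is hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def fisher_greater(total, positives, freq, pos_in_pattern):
    """One-sided (greater) Fisher's exact test via the hypergeometric tail.

    total          : number of transactions
    positives      : number of positive transactions
    freq           : transactions containing the pattern
    pos_in_pattern : positive transactions containing the pattern
    """
    upper = min(freq, positives)
    tail = sum(comb(positives, k) * comb(total - positives, freq - k)
               for k in range(pos_in_pattern, upper + 1))
    return tail / comb(total, freq)

# Values from the sample output: 15 transactions, 7 positives,
# pattern {TF1, TF2, TF3} with freq=5 and pos=5, correction factor 5.
raw = fisher_greater(15, 7, 5, 5)
corrected = raw * 5

print("%.8f %.6f" % (raw, corrected))  # 0.00699301 0.034965
```

Here the tail has a single term, C(7,5) / C(15,5) = 21 / 3003, and the corrected value stays below the significance level 0.05, consistent with the one significant pattern reported.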
For bug reports, comments, or requests, please contact:
- yoshizoe(AT)acm.org
MP-LAMP is an open source code project licensed under the Revised BSD license.