scilifelab / facs

Fast and Accurate Classification of Sequences using Bloom filters

Home Page: http://facs.scilifelab.se/

License: Other

Python 13.06% JavaScript 0.50% C 20.89% Makefile 0.62% Jupyter Notebook 64.68% Jinja 0.25%

facs's People

brainstorm tzcoolman arvestad guillermo-carrasco blahah inodb okulev espre05 dreycey

facs's Issues

Memory benchmarking

It would be nice to also have some numbers on memory usage.
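One low-effort way to get such numbers from the Python test suite is to read the peak resident set size of each child process after it exits. A minimal sketch, assuming a POSIX system (the `resource` module is not available on Windows); the facs command line in the comment is a placeholder:

```python
import resource
import subprocess
import sys


def peak_child_rss_kb(cmd):
    """Run `cmd` to completion and return the peak resident set size (kB)
    of the largest child process waited for so far.

    getrusage(RUSAGE_CHILDREN) covers every terminated, waited-for child,
    so measure one command per fresh Python process to keep runs separate.
    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # bytes -> kB
    return peak

# Hypothetical usage (paths are placeholders):
# peak_child_rss_kb(["./facs", "query", "-q", "sample.fastq", "-r", "eschColi_K12.bloom"])
```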

facs.remove() does not find contaminated read on testsuite

@tzcoolman, can you test facs against tests/data/synthetic_fastq/test200.fastq and let me know why the E. coli read does not end up in the _contam file? Could it be related to the k-mer length?

(facs)[roman@biologin facs 0]$ ./facs remove -q ../tests/data/synthetic_fastq/test200.fastq -r ../tests/data/bloom/eschColi_K12.bloom
source->../tests/data/synthetic_fastq/test200.fastq
obj_file->../tests/data/bloom/eschColi_K12.bloom
prefix->(null)
match->../tests/data/synthetic_fastq/test200_eschColi_K12_contam.fastq
mis->../tests/data/synthetic_fastq/test200_eschColi_K12_clean.fastq
finish processing...
(facs)[roman@biologin facs 0]$ cat ../tests/data/synthetic_fastq/test200_eschColi_K12_contam.fastq
(facs)[roman@biologin facs 0]$

Thanks!
Roman

No need to normalize k-mers to lower case

@brainstorm

In bloom.c, functions bloom_add and bloom_check:

I think there is no need to normalize k-mers to lower case. Could you tell me your thoughts?
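Whether normalization can go depends on whether both sides are guaranteed to use the same case. Reference FASTA files often soft-mask repeats in lowercase (UCSC assemblies do this, for example), so a k-mer stored in lowercase would not match the same k-mer queried in uppercase unless one side is normalized. The Python sketch below uses a set as a stand-in for the Bloom filter and normalizes to upper case; it illustrates the effect and is not the bloom.c code:

```python
def kmers(seq, k, normalize=True):
    """Yield all k-mers of `seq`, upper-casing the sequence first if asked."""
    if normalize:
        seq = seq.upper()  # once per sequence, not once per k-mer
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build(reference, k, normalize=True):
    # A plain set stands in for the Bloom filter: same add/check semantics,
    # minus the false positives.
    return set(kmers(reference, k, normalize))


def hits(filter_, read, k, normalize=True):
    """Count how many k-mers of `read` are present in the filter."""
    return sum(1 for km in kmers(read, k, normalize) if km in filter_)


reference = "ACGTacgtACGT"  # lowercase = soft-masked region
read = "ACGTACGTACGT"
print(hits(build(reference, 4, normalize=False), read, 4, normalize=False))  # 3 of 9 k-mers
print(hits(build(reference, 4), read, 4))  # 9 of 9 k-mers
```

If normalization is kept, doing it once per sequence rather than inside every add/check call is likely the cheaper option, since each base is part of up to k k-mers.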

Staging area/Sanity check

Test that all prerequisites are in place before starting analysis.
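A pre-flight check could collect every missing dependency and report them together, before any work starts. A minimal Python sketch; the programs and files to check (for example bowtie2, simNGS, or a .bloom file) are supplied by the caller:

```python
import os
import shutil


def missing_prerequisites(programs, files):
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    for prog in programs:
        if shutil.which(prog) is None:
            problems.append("program not found on PATH: %s" % prog)
    for path in files:
        if not os.path.isfile(path):
            problems.append("file not found: %s" % path)
        elif not os.access(path, os.R_OK):
            problems.append("file not readable: %s" % path)
    return problems
```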

For testing purposes, instead of spiking a single static E. coli read, we could generate a random one using SimNGS. To make sure there is no bias, we can parametrise SimNGS so that it does not introduce any random errors.

Another good feature would be spiking a read from any of the supported organisms instead of only E. coli (but, of course, not from the target organism).
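The spiking step itself is easy to make reproducible. A Python sketch, assuming records are held as complete FASTQ record strings; producing the spiked read (simNGS with errors disabled, from any non-target organism) is not shown:

```python
import random


def spike_read(records, spike, seed=None):
    """Return (new_records, position): a copy of `records` with `spike`
    inserted at a random position.

    A fixed `seed` makes the position reproducible, which keeps benchmark
    datasets predictable; the input list is left unmodified.
    """
    rng = random.Random(seed)
    out = list(records)
    pos = rng.randint(0, len(out))  # inclusive: the spike may also go last
    out.insert(pos, spike)
    return out, pos
```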

facs build

The -l option does not work, and there are no instructions on how the output is handled, i.e. how each generated Bloom filter file will be named and placed. It is also unclear how the -o flag should be handled in conjunction with the -l flag.
Has support for using directories been removed?
It would be nice to have more information on parameters and output, e.g.:
Set -k to 15 nucleotides
Set -e to 0.005
Using file/list as input
Bloom filter(s) will be written to ...
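A possible naming rule, sketched in Python; it is hypothetical and not what facs build does today: one `<reference stem>.bloom` per list entry, placed in the -o directory, plus a parameter summary printed before the build starts:

```python
import os

FASTA_EXTS = (".fasta", ".fa", ".fna")


def planned_bloom_path(reference, outdir):
    """Hypothetical rule: <outdir>/<reference basename minus .gz and FASTA
    extension>.bloom"""
    name = os.path.basename(reference)
    if name.endswith(".gz"):
        name = name[:-3]
    for ext in FASTA_EXTS:
        if name.endswith(ext):
            name = name[:-len(ext)]
            break
    return os.path.join(outdir, name + ".bloom")


def build_summary(references, k, error_rate, outdir):
    """Lines a build run could print before starting."""
    lines = ["Set -k to %d nucleotides" % k, "Set -e to %g" % error_rate,
             "Using %d reference(s) as input" % len(references)]
    for ref in references:
        lines.append("Bloom filter will be written to %s"
                     % planned_bloom_path(ref, outdir))
    return lines
```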

Have FACS report full paths to sample

Hello Enze!

Can you tweak FACS so that it reports absolute paths to the sample(s)?

For instance:

/home/Enze/facs/tests/data/synthetic_fastq/simngs_phiX_100.fastq

instead of just: simngs_phiX_100.fastq?

... still with correct JSON formatting of course ;)
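A minimal Python sketch of the requested behaviour, assuming the report is built as a dict before serialization (the helper name is hypothetical). `os.path.abspath` resolves the path against the current working directory, and `json.dumps` handles any escaping, so the output stays valid JSON:

```python
import json
import os


def report_json(sample, stats):
    """Serialize a report whose "sample" field is an absolute path."""
    record = {"sample": os.path.abspath(sample)}
    record.update(stats)
    return json.dumps(record, indent=4)
```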

Thanks!

Fix fastq_screen test

The tests for fastq_screen are failing, apparently when generating the fastq_screen.conf file:

ERROR
Runs fastq_screen tests against synthetically generated fastq files folder ... ERROR

======================================================================
ERROR: Downloads and installs fastq_screen locally, generates fastq_screen.conf file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/bubo/home/h25/guilc/facs/tests/test_fastqscreen.py", line 62, in test_1_fetch_fastqscreen
    shutil.move(fscreen_path, self.progs)
NameError: global name 'fscreen_path' is not defined

======================================================================
ERROR: Runs fastq_screen tests against synthetically generated fastq files folder
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/bubo/home/h25/guilc/facs/tests/test_fastqscreen.py", line 72, in test_2_run_fastq_screen
    cfg = open(os.path.join(self.progs, "fastq_screen.conf"), 'rU')
IOError: [Errno 2] No such file or directory: '/bubo/home/h25/guilc/facs/tests/data/bin/fastq_screen.conf'

----------------------------------------------------------------------

omp pragmas ignored during compilation

During compilation (using the current GitHub code on our cluster) all the omp pragmas are ignored, so that the final binary runs single-threaded.

e.g.:

big_query.c:159: warning: ignoring #pragma omp parallel
big_query.c:161: warning: ignoring #pragma omp single
big_query.c:165: warning: ignoring #pragma omp task

I've got OpenMP installed (/usr/lib/libgomp.so.1 is present). Can you suggest what else might be causing this, or what I can do to fix it?

Many thanks

Support for paired-end reads

FACS should support paired-end reads the same way fastq_screen does (bowtie -1 -2 ...). Reading the two files sequentially (invoking FACS twice) is not efficient; it should be done at once, in a single command.
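Processing both mates in one pass amounts to reading the two files in lockstep. A Python sketch of that reading pattern (function names are hypothetical; the C implementation would do the same with two file handles):

```python
import itertools

_MISSING = object()


def fastq_records(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines;
    an open file object works, as does a list of strings."""
    it = iter(lines)
    for header in it:
        seq, _plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError("truncated FASTQ record: %r" % header)
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def read_pairs(mates1, mates2):
    """Walk both mate files once, in lockstep, yielding (mate1, mate2).

    zip_longest (rather than zip) makes a read-count mismatch between the
    two files raise an error instead of being silently truncated."""
    for r1, r2 in itertools.zip_longest(fastq_records(mates1),
                                        fastq_records(mates2),
                                        fillvalue=_MISSING):
        if r1 is _MISSING or r2 is _MISSING:
            raise ValueError("mate files contain different numbers of reads")
        yield r1, r2
```

Usage would look like `for m1, m2 in read_pairs(open("r1.fq"), open("r2.fq")): ...`, with placeholder file names.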

Thanks @arvestad, @henrikstranneheim and @vals for the input.

Trouble installing: omp.h missing

I tried to download and install FACS using the recommended command:
git clone https://github.com/SciLifeLab/facs && cd facs && make -j8
but it turns out "omp.h" is missing. What do I need to install or set for OpenMP to be found?

facs classify -h

Wrong header: it says "./facs remove" but should say "./facs classify".
-t tolerance rate, default is 0.0005. See issue #15.
There is a bad address message when running facs classify -h.

Auto tests with human or mouse references?

I don't think so... They are too big and cannot be executed on machines with less than 5 GB of memory.
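For a rough sense of why, the textbook optimal Bloom filter needs m = -n ln(p) / (ln 2)^2 bits for n distinct k-mers at false-positive rate p. The Python sketch below applies that formula; FACS's own sizing rules may differ, so treat the results as estimates only:

```python
import math


def bloom_size_bytes(n_items, fp_rate):
    """Optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def optimal_hash_count(n_items, fp_rate):
    """k = (m / n) * ln 2, rounded to a whole number of hash functions."""
    bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return max(1, round(bits / n_items * math.log(2)))
```

At p = 0.005 (the -e value suggested in the facs build issue) this is about 11 bits per k-mer, so a reference with roughly 3 × 10^9 k-mers (human-genome scale) would need around 4 GB before any other overhead.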

facs remove -o

When using the -o option:
-o /proj/b2012037/private/henrik/Example/Example1

yields:
/proj/b2012037/private/henrik/Example/Example1Example1_Homo_sapiens.GRCh37.70_nochr_clean.fastq

This makes "-o" look more like an output-directory option.
When '-o' is specified, the suffix "Bloom_filter.clean" or "Bloom_filter.contaminants" should be appended to the specified file name (if one is given), e.g. Example1_Homo_sapiens_GRCh37.clean and Example1_Homo_sapiens_GRCh37.contaminants. If the "-o" path ends in "/", indicating a directory, then the complete name as in the first example should be used.
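The proposed -o rule, sketched in Python for clarity. The function name is hypothetical; the directory case reuses the <sample>_<bloom>_clean.fastq / _contam.fastq pattern from the test200 output in the facs.remove() issue:

```python
import os


def remove_outputs(out, sample, bloom):
    """Return (clean_path, contaminants_path) for a given -o value.

    - `out` ends with '/': treat it as a directory and use the default
      names <sample>_<bloom>_clean.fastq / _contam.fastq inside it.
    - otherwise: treat it as a file prefix and append
      _<bloom>.clean / _<bloom>.contaminants.
    """
    sample_name = os.path.splitext(os.path.basename(sample))[0]
    bloom_name = os.path.splitext(os.path.basename(bloom))[0]
    if out.endswith("/"):
        stem = os.path.join(out, "%s_%s" % (sample_name, bloom_name))
        return stem + "_clean.fastq", stem + "_contam.fastq"
    stem = "%s_%s" % (out, bloom_name)
    return stem + ".clean", stem + ".contaminants"
```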

Crash when a newline is the last entry in the FASTQ file
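One plausible cause of this kind of crash is a parser that treats the blank line after the last record as the start of a new record. The Python sketch below (not the FACS C parser) shows the defensive behaviour: trailing newlines are stripped before lines are grouped into records, and a genuinely incomplete record is reported instead of being read past:

```python
def parse_fastq(text):
    """Parse FASTQ text into (header, sequence, quality) tuples."""
    # Only trailing line breaks are dropped, so empty sequence/quality lines
    # inside a record (zero-length reads) are still handled correctly.
    lines = text.rstrip("\r\n").splitlines()
    if len(lines) % 4:
        raise ValueError("truncated FASTQ: %d lines" % len(lines))
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record #%d" % (i // 4 + 1))
        records.append((header, seq, qual))
    return records
```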

facs remove

Throws a segmentation fault when using the Example2 dataset.

Bowtie2 not working properly

As mentioned in PR 118, bowtie2 apparently does not run on UPPMAX when executing the tests.

It does run from the command line, though.

facs query -h

The help text is not up to date. Is the only option really:
Options:
-r reference bloom filter to query against

python: returning a JSON document is gone

Somewhere in between those two builds:

https://travis-ci.org/SciLifeLab/facs/builds/13155812#L629
https://travis-ci.org/SciLifeLab/facs/builds/13128914#L629

The python JSON output for facs.query got lost:

$ python
Python 2.7.1 (r271:86832, Feb 14 2011, 14:03:18) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import facs
>>> facs.query('../tests/data/synthetic_fastq/simngs_phiX_100.fastq', '../tests/data/bloom/eschColi_K12.bloom')

''
>>>

@tzcoolman Could you please help me fix this one? It seems to have originated in pull request #75: while the MPI changes were being introduced, something slipped under my radar :-/

It is probably related to the new handling of the report() function in that PR (https://github.com/SciLifeLab/facs/pull/75/files#diff-f011c1f4c85573e7323c6a9d9dc721e8L226), but I am not sure...

This directly affects the benchmarks, since the JSON results are not being reported, so we need this fixed ASAP.
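Until the report() path is fixed, a regression check along these lines could catch an empty or malformed report in the test suite. It is a Python sketch; the expected keys are those in the facs JSON output shown elsewhere in these issues, and a test would call it as `parse_report(facs.query(sample, bloom))`:

```python
import json

EXPECTED_KEYS = ("total_read_count", "contaminated_reads", "total_hits",
                 "contamination_rate")


def parse_report(output):
    """Parse the string returned by facs.query, failing loudly if it is
    empty, not a JSON object, or missing an expected field."""
    if not output or not output.strip():
        raise ValueError("facs.query returned an empty report")
    report = json.loads(output)  # also rejects non-JSON values such as -nan
    if not isinstance(report, dict):
        raise ValueError("report is not a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in report]
    if missing:
        raise ValueError("report lacks fields: " + ", ".join(missing))
    return report
```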

Thanks Enze!

python: Bad address when inverting arguments for facs.query

An endless stream of error messages is displayed:

Problem reading Bloom filter: Bad address
Problem reading Bloom filter: Bad address
Problem reading Bloom filter: Bad address
(...)

This happens when the arguments to facs query are accidentally swapped, i.e.:

facs.query('../tests/data/bloom/eschColi_K12.bloom',
           '../tests/data/synthetic_fastq/simngs_phiX_100.fastq')

Instead of:

facs.query('../tests/data/synthetic_fastq/simngs_phiX_100.fastq',
           '../tests/data/bloom/eschColi_K12.bloom')
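The more robust fix is presumably in the C reader (stop at the first read error instead of retrying). On the Python side, a cheap guard could reject swapped arguments before they reach C. A sketch, assuming Bloom filters always use the .bloom extension; `query_arg_problem` and `safe_query` are hypothetical helpers, and the query function is passed in:

```python
def query_arg_problem(sample, bloom):
    """Return a message if the arguments look wrong, otherwise None."""
    if not bloom.endswith(".bloom"):
        if sample.endswith(".bloom"):
            return ("arguments look swapped: expected "
                    "facs.query(sample, bloom_filter)")
        return "not a .bloom file: %s" % bloom
    return None


def safe_query(query, sample, bloom):
    """Validate the arguments, then call query(sample, bloom)."""
    problem = query_arg_problem(sample, bloom)
    if problem:
        raise ValueError(problem)
    return query(sample, bloom)
```

Usage would be `safe_query(facs.query, sample, bloom)`.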

refactor simNGS to generate dm3 and spiked ecoli read

Generate predictable datasets that can be used for benchmarking.

FACS not compiling on OS X

To reproduce, just run make python on OS X 10.8.4. If FACS is going to officially support OS X, we should integrate testing in an OS X environment into Travis CI.

Giving a sequence to classify (query) rather than a file with sequences

In general, I'm surprised that most sequence aligners/mappers/searchers can only take a file (e.g. a FASTQ file) as input. Sometimes I want to classify a particular sequence on demand.

For me, it would be ideal if in the python interface you could in addition to doing

facs.query("contaminated_sample.fastq.gz", "ecoli.bloom")

You could also do something along the lines of

facs.query(seq="ATACGTTACATAACATTGAAACTGGAGGGGGAAAGAAAACCAAAAGACCAGCTTGTTCCTTCACATGGCAC", bloom="ecoli.bloom")

and it would return True or False.
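Conceptually this only needs the same k-mer lookups that are run for each read of a file. A Python sketch, with a plain set standing in for the Bloom filter and a hypothetical `threshold` (fraction of k-mers that must hit) as the decision rule; FACS's actual scoring may differ:

```python
def classify_sequence(seq, kmer_filter, k, threshold=0.5):
    """Return True if at least `threshold` of the k-mers of `seq` are found
    in `kmer_filter` (any container supporting `in`)."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return False  # sequence shorter than k: nothing to look up
    hits = sum(1 for i in range(total) if seq[i:i + k] in kmer_filter)
    return hits / total >= threshold
```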

stdout or saving into the file

@brainstorm
@arvestad

Should we let the user decide whether contaminated or clean sequences are printed, or should we directly save both contaminated and clean reads to files, as we did before?

Merge remove and remove_l in the same codebase (remove.c & remove.h)

And fix compiler warnings:

cc -c   simple_remove.c -o simple_remove.o -O3 -DFIFO -D_FILE_OFFSET_BITS=64 -D_LARGE_FILE -Wall -fopenmp -g -DNODEBUG -lm -lz
cc -c   simple_remove_l.c -o simple_remove_l.o -O3 -DFIFO -D_FILE_OFFSET_BITS=64 -D_LARGE_FILE -Wall -fopenmp -g -DNODEBUG -lm -lz
simple_remove_l.c: In function 'fasta_process_ml':
simple_remove_l.c:225: warning: unused variable 'sign'
simple_remove_l.c: In function 'save_result_ml':
simple_remove_l.c:289: warning: suggest parentheses around '&&' within '||'
simple_remove_l.c:289: warning: value computed is not used
simple_remove_l.c:292: warning: suggest parentheses around '&&' within '||'
simple_remove_l.c:292: warning: value computed is not used
simple_remove_l.c: In function 'all_save':
simple_remove_l.c:427: warning: implicit declaration of function 'save_result'

Templatize deconseq configuration

Using Jinja/Mako or the like:

https://github.com/SciLifeLab/facs/pull/76/files#diff-5373e37fece1c8f8f30e196037f1dd2dR158

Transparency of output

Currently, if -o is not specified, the output is written to the same directory as the binary. Cluttering the binary's directory is bad; I think we should use the current directory the script was called from instead.
There is no indication of what the output file will be called when the default is used. This is not obvious and should be documented in the help text. The name should also reflect which module was used, e.g. Testing1_check.info, Testing1_query.info.
The output file contains a few numbers but no documentation of what they stand for. It would be great to have a tab-separated file with explanatory annotations for each number, so that users know what each one means, in a format that is easy to parse.
The output from FACS 2.0 should also include which dataset was used in the analysis, and it should be easy to see and parse which statistics belong to which Bloom filter when several were used in the same analysis.
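A Python sketch of a self-documenting, tab-separated report: one row per (sample, Bloom filter) pair, optional "# column: description" lines at the top, and a default file name in the current directory that includes the module name. Column and helper names are illustrative, and the descriptions are supplied by the caller rather than guessed here:

```python
import csv
import io
import os


def default_output_path(sample, module, ext=".tsv"):
    """<cwd>/<sample stem>_<module><ext>, e.g. Testing1_query.tsv"""
    stem = os.path.splitext(os.path.basename(sample))[0]
    return os.path.join(os.getcwd(), "%s_%s%s" % (stem, module, ext))


def write_tsv(handle, reports, descriptions=None):
    """Write `reports` (dicts sharing the same keys) as TSV, preceded by
    '# column: description' lines for every described column."""
    if not reports:
        return
    columns = list(reports[0])
    for name in columns:
        if descriptions and name in descriptions:
            handle.write("# %s: %s\n" % (name, descriptions[name]))
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(reports)


def tsv_string(reports, descriptions=None):
    buf = io.StringIO()
    write_tsv(buf, reports, descriptions)
    return buf.getvalue()
```

When writing to a real file, opening it with newline="" is what the csv module documentation recommends.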

facs remove -h

Header says "---contamination remove---". Should say "---facs remove---".
"Tolerant rate" is perhaps not the ideal name; "threshold value" might be better.
(The program will automatically select a value if you don't provide any.) I would prefer: The program will automatically estimate a proper threshold value from the reference size and K-mer length.
-l a list containing all bloom files. --> -l a list containing all Bloom filter names in..., followed by a description of the expected list format.
-r reference bloom filter file or dir --> -r Bloom filter file or directory
!!! either -r or -l can only be allowed each time !!!. See build issue!

Make sure the number of threads is the same in tests

Add a threads attribute to the FACS JSON output, with the underlying code based on:

http://scv.bu.edu/~kadin/Tutorials/Alliance/OpenMP/omp_set_num_threads.html

Check/set the environment variable OMP_NUM_THREADS accordingly before running runtests on FACS.

The threads attribute is already present in fastq_screen
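On the test-runner side, pinning the thread count can be done by passing an explicit environment to each subprocess and recording the value alongside the results. A Python sketch; it records what was requested via OMP_NUM_THREADS, whereas reporting what the C code actually used would need omp_get_max_threads() inside FACS:

```python
import os


def env_with_threads(n_threads, base_env=None):
    """Copy of the environment with OMP_NUM_THREADS pinned, so every tool
    in a benchmark run starts with the same thread count."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def annotate_threads(report, env):
    """Return a copy of a parsed JSON report with a `threads` attribute."""
    annotated = dict(report)
    annotated["threads"] = int(env["OMP_NUM_THREADS"])
    return annotated
```

Each run would then be started as `subprocess.run(cmd, env=env)` with the same `env`.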

test_simngs.py fails if executed alone

Hi,

This test fails if executed alone before any other test has been executed:


Generates a synthetic library and runs with built-in simNGS runfile ... ERROR

======================================================================
ERROR: Generates a synthetic library and runs with built-in simNGS runfile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/guillem/facs/tests/test_simngs.py", line 57, in test_2_run_simNGS
    p1 = subprocess.Popen(cl1, stdout=subprocess.PIPE)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

----------------------------------------------------------------------

The problem is that the folder tests/data has not been created.
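Creating the folder in the test's own setup removes the ordering dependency. A Python 3 sketch (os.makedirs gained exist_ok in 3.2; the traceback above is from Python 2.7, where an os.path.isdir check would be needed instead). It assumes tests run from the repository root, and the helper and class names are illustrative:

```python
import os
import unittest


def ensure_data_dir(root="."):
    """Create <root>/tests/data if needed and return its path."""
    path = os.path.join(root, "tests", "data")
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path


class SimNGSTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Do not rely on an earlier test having created the folder.
        cls.data_dir = ensure_data_dir()
```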

Thanks!

License

I have not found any license information. Am I missing something?

Add plots for the original FACS 1.0 dataset

We should get some metrics (runtime/accuracy) from the old dataset (the one used in the original paper) with the current version of FACS.

Preferably in an automated/reproducible way, or manually if that is too cumbersome.

@tzcoolman, @henrikstranneheim, can you take care of that?

Automate Deconseq runs via testsuite

Use the existing structure to create a Deconseq test integrated with the general test suite:

http://deconseq.sourceforge.net/

facs remove Example3

*** glibc detected *** ./facs: free(): invalid pointer: 0x00002abf680db010 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3fa6875916]
./facs[0x402c52]
./facs[0x407dc8]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3fa681ecdd]
./facs[0x4016d9]
======= Memory map: ========
00400000-0040c000 r-xp 00000000 00:15 3107197168 /lynx/cvol/v26/b2012037/private/facs/facs/facs
0060b000-0060c000 rw-p 0000b000 00:15 3107197168 /lynx/cvol/v26/b2012037/private/facs/facs/facs
00981000-0132c000 rw-p 00000000 00:00 0 [heap]
38ad000000-38ad015000 r-xp 00000000 08:03 416403 /lib64/libz.so.1.2.3
38ad015000-38ad214000 ---p 00015000 08:03 416403 /lib64/libz.so.1.2.3
38ad214000-38ad215000 r--p 00014000 08:03 416403 /lib64/libz.so.1.2.3
38ad215000-38ad216000 rw-p 00015000 08:03 416403 /lib64/libz.so.1.2.3
3fa6400000-3fa6420000 r-xp 00000000 08:03 394073 /lib64/ld-2.12.so
3fa661f000-3fa6620000 r--p 0001f000 08:03 394073 /lib64/ld-2.12.so
3fa6620000-3fa6621000 rw-p 00020000 08:03 394073 /lib64/ld-2.12.so
3fa6621000-3fa6622000 rw-p 00000000 00:00 0
3fa6800000-3fa6989000 r-xp 00000000 08:03 394074 /lib64/libc-2.12.so
3fa6989000-3fa6b89000 ---p 00189000 08:03 394074 /lib64/libc-2.12.so
3fa6b89000-3fa6b8d000 r--p 00189000 08:03 394074 /lib64/libc-2.12.so
3fa6b8d000-3fa6b8e000 rw-p 0018d000 08:03 394074 /lib64/libc-2.12.so
3fa6b8e000-3fa6b93000 rw-p 00000000 00:00 0
3fa6c00000-3fa6c83000 r-xp 00000000 08:03 412474 /lib64/libm-2.12.so
3fa6c83000-3fa6e82000 ---p 00083000 08:03 412474 /lib64/libm-2.12.so
3fa6e82000-3fa6e83000 r--p 00082000 08:03 412474 /lib64/libm-2.12.so
3fa6e83000-3fa6e84000 rw-p 00083000 08:03 412474 /lib64/libm-2.12.so
3fa7400000-3fa7417000 r-xp 00000000 08:03 394568 /lib64/libpthread-2.12.so
3fa7417000-3fa7617000 ---p 00017000 08:03 394568 /lib64/libpthread-2.12.so
3fa7617000-3fa7618000 r--p 00017000 08:03 394568 /lib64/libpthread-2.12.so
3fa7618000-3fa7619000 rw-p 00018000 08:03 394568 /lib64/libpthread-2.12.so
3fa7619000-3fa761d000 rw-p 00000000 00:00 0
3fa7c00000-3fa7c07000 r-xp 00000000 08:03 394947 /lib64/librt-2.12.so
3fa7c07000-3fa7e06000 ---p 00007000 08:03 394947 /lib64/librt-2.12.so
3fa7e06000-3fa7e07000 r--p 00006000 08:03 394947 /lib64/librt-2.12.so
3fa7e07000-3fa7e08000 rw-p 00007000 08:03 394947 /lib64/librt-2.12.so
3fa8400000-3fa840d000 r-xp 00000000 08:03 264890 /usr/lib64/libgomp.so.1.0.0
3fa840d000-3fa860c000 ---p 0000d000 08:03 264890 /usr/lib64/libgomp.so.1.0.0
3fa860c000-3fa860d000 rw-p 0000c000 08:03 264890 /usr/lib64/libgomp.so.1.0.0
3fa9000000-3fa9016000 r-xp 00000000 08:03 412532 /lib64/libgcc_s-4.4.6-20120305.so.1
3fa9016000-3fa9215000 ---p 00016000 08:03 412532 /lib64/libgcc_s-4.4.6-20120305.so.1
3fa9215000-3fa9216000 rw-p 00015000 08:03 412532 /lib64/libgcc_s-4.4.6-20120305.so.1
2abf07949000-2abf0794a000 rw-p 00000000 00:00 0
2abf07968000-2abf0796d000 rw-p 00000000 00:00 0
2abf0796d000-2abf27be7000 r--s 00000000 00:15 2726700565 /lynx/cvol/v1/b2012094/Innocentive/data/Example/Example3.fq
2abf27be7000-2ac0bac37000 rw-p 00000000 00:00 0
2ac0bac37000-2ac0bac38000 ---p 00000000 00:00 0
2ac0bac38000-2ac0bae38000 rw-p 00000000 00:00 0
2ac0bae38000-2ac0bae39000 ---p 00000000 00:00 0
2ac0bae39000-2ac0bb039000 rw-p 00000000 00:00 0
2ac0bb039000-2ac0bb03a000 ---p 00000000 00:00 0
2ac0bb03a000-2ac0bb23a000 rw-p 00000000 00:00 0
2ac0bb23a000-2ac0bb23b000 ---p 00000000 00:00 0
2ac0bb23b000-2ac0bb43b000 rw-p 00000000 00:00 0
2ac0bb43b000-2ac0bb43c000 ---p 00000000 00:00 0
2ac0bb43c000-2ac0bb63c000 rw-p 00000000 00:00 0
2ac0bb63c000-2ac0bb63d000 ---p 00000000 00:00 0
2ac0bb63d000-2ac0bb83d000 rw-p 00000000 00:00 0
2ac0bb83d000-2ac0bb83e000 ---p 00000000 00:00 0
2ac0bb83e000-2ac0bba43000 rw-p 00000000 00:00 0
2ac0bc000000-2ac0bcbf4000 rw-p 00000000 00:00 0
2ac0bcbf4000-2ac0c0000000 ---p 00000000 00:00 0
2ac0c4000000-2ac0c4a2b000 rw-p 00000000 00:00 0
2ac0c4a2b000-2ac0c8000000 ---p 00000000 00:00 0
2ac0cc000000-2ac0cca3a000 rw-p 00000000 00:00 0
2ac0cca3a000-2ac0d0000000 ---p 00000000 00:00 0
2ac0d4000000-2ac0d4a48000 rw-p 00000000 00:00 0
2ac0d4a48000-2ac0d8000000 ---p 00000000 00:00 0
2ac0dc000000-2ac0de76c000 rw-p 00000000 00:00 0
2ac0de76c000-2ac0e0000000 ---p 00000000 00:00 0
2ac0e4000000-2ac0e49ae000 rw-p 00000000 00:00 0
2ac0e49ae000-2ac0e8000000 ---p 00000000 00:00 0
2ac0ec000000-2ac0ec9ef000 rw-p 00000000 00:00 0
2ac0ec9ef000-2ac0f0000000 ---p 00000000 00:00 0
7fff76134000-7fff7614a000 rw-p 00000000 00:00 0 [stack]
7fff761ff000-7fff76200000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
/var/spool/slurmd/job3017171/slurm_script: line 17: 19178 Aborted (SIGABRT) (core dumped)

simNGS mapping for chromosomes

Different organisms generate a number of reads proportional to their number of chromosomes; we should account for that.
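If the intent is for read counts to follow genome size rather than chromosome count, one option is to allocate a fixed total across chromosomes in proportion to their lengths. A Python sketch with largest-remainder rounding, so the per-chromosome counts add up exactly; chromosome names and lengths would come from the reference (e.g. a FASTA index):

```python
def reads_per_chromosome(lengths, total_reads):
    """Split `total_reads` across chromosomes in proportion to their lengths.

    lengths: dict mapping chromosome name -> length in bases
    """
    genome = sum(lengths.values())
    exact = {c: total_reads * n / genome for c, n in lengths.items()}
    counts = {c: int(x) for c, x in exact.items()}
    leftover = total_reads - sum(counts.values())
    # Hand the remaining reads to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts
```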

Regression after merging query and remove code

The tests broke after the last update. @tzcoolman, please validate that everything still works by running the test suite (make tests) before merging into SciLifeLab.

https://travis-ci.org/SciLifeLab/facs/builds/5291012:

(...)
{
    "total_read_count": 0,
    "contaminated_reads": 0,
    "total_hits": 0,
    "contamination_rate": -nan,
    "bloom_filename":"/home/travis/build/SciLifeLab/facs/tests/data/bloom/phiX.bloom"
}
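The -nan comes from dividing by a total_read_count of 0, so the underlying regression is that no reads were counted; the division only exposes it. Independently of that fix, guarding the division keeps the report valid JSON, which has no NaN value. A minimal Python sketch (the C code would apply the same check before printing):

```python
def contamination_rate(contaminated, total):
    """Fraction of reads flagged as contaminated, or None when no reads
    were processed (serialized as JSON null instead of -nan)."""
    if total == 0:
        return None
    return contaminated / total
```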

fastq_screen.py

@brainstorm @guillermo-carrasco
Roman, I think that in this test file you added all the reference libraries to the .conf file. But if you run the test without specifying which reference library should be used, this could cause conflicts in the final results. Are you sure this is a good idea?

Remove the generation of *.info files

All the relevant information is already present on stdout, in JSON-like formatted output:

{
        "total_read_count": 1804624,
        "contaminated_reads": 4253,
        "total_hits": 2148086,
        "contamination_rate": 0.002357,
        "bloom_file":"facs/tests/data/bloom/Arabidopsis_thaliana_TAIR10.bloom"
}

Not only do the *.info files add no valuable information beyond what is already present, they also get overwritten (only the last run is kept) if multiple queries are performed against the same sample.

FP and FN for Fastq_screen

As far as I know, fastq_screen only reports the proportion of reads that match one or more reference genomes. How can we determine the FP and FN counts when using our synthetic dataset?
@brainstorm
@guillermo-carrasco
@arvestad
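A single proportion cannot separate FP from FN, because false positives and false negatives can cancel each other out in the total. With a synthetic dataset the true label of every read is known, so the counts follow once per-read calls are available from each tool. A Python sketch; it assumes read IDs can be used to match a tool's per-read output against the simulation truth:

```python
def confusion_counts(truth, flagged, all_reads):
    """Compare per-read calls with the known simulation truth.

    truth:     IDs of reads known to be contaminants (e.g. the spiked reads)
    flagged:   IDs the tool classified as contaminants
    all_reads: IDs of every read in the sample
    """
    truth, flagged, all_reads = set(truth), set(flagged), set(all_reads)
    return {
        "TP": len(truth & flagged),
        "FP": len(flagged - truth),
        "FN": len(truth - flagged),
        "TN": len(all_reads - truth - flagged),
    }
```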

Are we not supporting FASTA format anymore?

@brainstorm
@arvestad
@henrikstranneheim
I saw you have merged your changes into scilifelab/facs, and that you deleted fasta_read_check and fasta_full_check. Renaming some variables is fine, but are you sure we don't need both functions any more?

facs remove

I still can't get facs remove to finish within 1 hour on the Example1 dataset after updating to the latest version.

facs build -h

The header of the output still says "Bloom build"; it should say "facs build".
File name is spelled inconsistently: choose either "filename" or "file name", and do not use both in the same document.
!!! either -r or -l can only be allowed each time !!! would be better as !!! Use either -r or -l !!!
The help text says the K-mer default is 21, but isn't the default automatically estimated from the reference size? The help text should reflect that. Also, why is K-mer spelled K_mer?
-e error rate: this should be more specific, e.g. "Bloom filter false positive frequency", and maybe the flag name should be changed as well.

Merge query.c and check.c into check.c

There is no need to keep two files for the bloom check functionality, it might confuse newcomers.

Further filtering/cleaning and unification of results in iriscouch for the benchmark

Now that the reporting functionality works single-threaded for all programs, we should put some effort into normalizing/unifying the data reported to the CouchDB database and plotting it accordingly via matplotlib or D3.

I have disabled the dummy fastq files generation since what we want is only simNGS-generated reads (i.e: simngs_hostORG_contamORG_numREADS.fastq) for the different species with a single ecoli read spiked in there (for instance).

@guillermo-carrasco, @b97pla: let me know if this sketch needs more clarification; I will be happy to adapt the code to your needs.
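Normalizing results is easier when sample names are parsed into fields rather than compared as strings. A Python sketch for the simngs_hostORG_contamORG_numREADS.fastq convention; the function name is hypothetical, and the organism part is kept as a single string because names such as eschColi_K12 contain underscores, so splitting host from contaminant on "_" would be ambiguous:

```python
import os
import re

SIMNGS_NAME = re.compile(r"^simngs_(?P<organisms>.+)_(?P<reads>\d+)\.fastq$")


def parse_simngs_name(filename):
    """Return {"organisms": str, "reads": int}, or None if the name does not
    follow the simngs_..._<reads>.fastq pattern."""
    match = SIMNGS_NAME.match(os.path.basename(filename))
    if match is None:
        return None
    return {"organisms": match.group("organisms"),
            "reads": int(match.group("reads"))}
```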

gzip support lacking for "facs remove"

When Daniel Lundin was looking into using FACS to screen for rRNA in his dataset, he found that "facs query" supports reading from gzipped files but "facs remove" does not. Why is this?
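On the C side, zlib's gzopen/gzread also read uncompressed files transparently, and the build already links against zlib (-lz), so one route is to use those calls in remove as well. The same idea in Python, sketched by detecting the gzip magic bytes rather than trusting the file extension (the helper name is hypothetical):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def open_reads(path):
    """Open a FASTA/FASTQ file for text reading, decompressing on the fly
    when the file starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")
```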

Facs version

The help text says FACS version 0.1; it should say 2.0.

Python's FACS remove needs to capture stdout/stderr

Right now this is not handled properly, since the output of facs remove cannot be captured from Python:

http://stackoverflow.com/questions/2420317/why-does-my-hello-world-python-c-module-work-correctly-in-everything-but-idle

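#include <Python.h>

/* Write a NUL-terminated C string to Python's sys.stdout (instead of the
   C-level stdout) so that Python code, IDLE, or a test can capture it. */
void writeout(const char* nullterminated)
{
    PyObject* sysmod = PyImport_ImportModuleNoBlock("sys");
    PyObject* pystdout = sysmod ? PyObject_GetAttrString(sysmod, "stdout") : NULL;
    PyObject* result = pystdout
        ? PyObject_CallMethod(pystdout, "write", "s", nullterminated) : NULL;
    if (result == NULL)
        PyErr_Clear();  /* best effort: do not leave a pending exception */
    Py_XDECREF(result);
    Py_XDECREF(pystdout);
    Py_XDECREF(sysmod);
}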

Command line options

Trying out facs query for the first time, I was surprised to get an error message for:

./facs query -b hg19.bloom -q /proj/b2012094/Innocentive/data/Example/Example1.fq
query: invalid option -- 'b'

(hg19.bloom is a filter for hg19 which I previously built using facs build) The instructions say:

Options:
-b reference Bloom filter to query against
-q FASTA/FASTQ file containing the query
-l input list containing all Bloom filters, one per line
-r single input Bloom filters
-t threshold value
-s sampling rate, default is 1 so it reads the whole query file

so I thought I should use the -b flag, but apparently I am supposed to use -r. Perhaps the command line option instructions could be clarified a bit?
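One way to make the help text unambiguous is to generate it from the option definitions themselves, so it cannot drift from what the parser accepts. A Python argparse sketch of what `facs query` could expose, using only the flags listed in the current help text (minus the stale -b) and making -r/-l mutually exclusive; it illustrates the idea and is not the C implementation:

```python
import argparse


def build_query_parser():
    """Option parser whose --help output always matches accepted flags."""
    parser = argparse.ArgumentParser(prog="facs query")
    refs = parser.add_mutually_exclusive_group(required=True)
    refs.add_argument("-r", metavar="BLOOM",
                      help="single reference Bloom filter to query against")
    refs.add_argument("-l", metavar="LIST",
                      help="file listing Bloom filters, one per line")
    parser.add_argument("-q", metavar="READS", required=True,
                        help="FASTA/FASTQ file containing the query")
    parser.add_argument("-t", type=float, metavar="THRESHOLD",
                        help="threshold value")
    parser.add_argument("-s", type=float, default=1.0, metavar="RATE",
                        help="sampling rate (default: 1, reads the whole query file)")
    return parser
```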

Deconseq does not report filter and returns incorrect contamination rate

Contamination rate should never be >100...

{
   "_id": "e206960e0662df946a20e43b4f000c36",
   "_rev": "1-15137438404e50191513c6ee194c92e4",
   "sample": "tests/data/synthetic_fastq/simngs_phiX_1000000.fastq",
   "contamination_rate": 1301,
   "start_timestamp": "2013-12-02 12:08:17.599624Z",
   "total_reads": 500000,
   "end_timestamp": "2013-12-02 12:15:26.696080Z"
}
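A sanity check run before records are stored in the database would flag both problems in this record. A Python sketch; the name of the field holding the filter/reference (`reference` below) is a hypothetical choice, and the 0 to 100 bound follows this issue's reading of the rate as a percentage (FACS itself reports a fraction, e.g. 0.002357):

```python
def record_problems(record, filter_key="reference"):
    """Return a list of problems found in a benchmark record."""
    problems = []
    if filter_key not in record:
        problems.append("no %r field: which filter/reference was used?" % filter_key)
    rate = record.get("contamination_rate")
    if rate is None:
        problems.append("no contamination_rate")
    elif not 0 <= rate <= 100:
        problems.append("contamination_rate out of range: %r" % rate)
    return problems
```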

python: do not bail out when file is not found

After a file not being found, the python interface should ideally return to the python interpreter:

>>> facs.query("jarl", "karl")
karl: No such file or directory
>>>

Instead of exiting to the underlying shell as it does now:

>>> facs.query("jarl", "karl")
karl: No such file or directory
$
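The conventional fix belongs in the C extension: instead of exiting the process, set an exception (for example with PyErr_SetFromErrnoWithFilename(PyExc_IOError, path)) and return NULL. Until then, a Python-side guard can check the paths first. A minimal Python 3 sketch (on Python 2, raise IOError instead); `query_or_raise` is a hypothetical helper that takes the query function as an argument:

```python
import errno
import os


def query_or_raise(query, sample, bloom):
    """Check both paths before calling query(sample, bloom), so a missing
    file raises a catchable exception and the interpreter keeps running."""
    for path in (sample, bloom):
        if not os.path.exists(path):
            raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
    return query(sample, bloom)
```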