
pheniqs's Introduction

Pheniqs

Pheniqs is a flexible generic barcode classifier for high-throughput next-gen sequencing that caters to a wide variety of experimental designs and has been designed for efficient data processing.

Questions? lior [dot] galanti [ at sign ] nyu.edu or just open a ticket.

Citing Pheniqs: Pheniqs 2.0: accurate, high-performance Bayesian decoding and confidence estimation for combinatorial barcode indexing

Please visit the Pheniqs website for more information. You might also want to check the intro talk given by Lior Galanti on April 29, 2021 for the NYU gencore.

Powerful and intuitive syntax

  • Classifies standard barcode types: Sample, Cellular, and Molecular Index
  • Directly writes barcodes to standard or custom BAM fields
  • Addresses index tags in arbitrary locations along reads
  • Easily accommodates custom barcode types, eliminating the need for pre- or post-processing
  • Easily handles any number of combinatorial barcode tags

Noise and quality aware probabilistic classifier

  • Increased accuracy over standard edit distance methods
  • Reports classification error probabilities in SAM auxiliary tags
  • Modular design allows addition of new classifiers

Robust engineering

  • Multithreaded C++ implementation optimized for speed
  • POSIX standard stream integration
  • Directly interfaces with low level HTSLib C API
  • Performance scales linearly with the number of available processing cores

Easy to install or build

  • Stable releases available from Bioconda
  • Custom package manager can build dependencies and binaries from scratch
  • Easily installed on clusters or cloud without elevated permissions
  • Portable compiled binaries available
  • Available in a Docker container

Easy to use

  • Simple command line syntax with autocomplete
  • Reusable, inheritance enabled, JSON encoded configuration
  • Preconfigured barcode library sets
  • Reads and writes multiple file formats: FASTQ, SAM/BAM/CRAM
  • Fast standalone file format interconversion
  • Helper scripts to assist in configuration file bootstrapping
  • Facilitates more robust and reproducible downstream analysis

Pheniqs runs on all modern POSIX systems and provides an easy to learn command line interface with autocomplete and an extensible reusable configuration syntax. Pheniqs is an ideal utility to pre- and post-process sequence reads for other bioinformatics tools, and it may also be used simply to rapidly and efficiently interconvert a variety of standard sequence file formats without invoking any of its barcode processing features.

For more advanced users and sequencing core managers, we provide detailed build instructions and a custom package manager to easily build portable, statically linked Pheniqs binaries for deployment on computing clusters. Developers can find code examples and API documentation that enable them to extend Pheniqs with new classification algorithms and take advantage of the optimized multithreaded pipeline.

Pheniqs is open sourced and free for academic use under the terms of the NYU license agreement.


pheniqs's People

Contributors

jerowe, kriscgun, moonwatcher, twaddlac


pheniqs's Issues

demultiplexing by multiple barcode positions

Hello!

Pheniqs is an amazing piece of software and seems to be just the right tool for custom demultiplexing of Illumina runs.
But I have trouble understanding the configuration options and possibilities, especially regarding output.

I would like to demultiplex a FASTQ file based on barcodes in multiple positions (combinatorial barcoding) and write the output to separate files for each barcode combination. While it seems I can achieve the decoding by modifying the following configuration, I can't find an option to write output to separate files based on barcode combinations rather than segments:

{
    "PL": "ILLUMINA",
    "cellular": [
        {
            "algorithm": "pamld",
            "base": "SPLiT-seq 96",
            "comment": "First round ",
            "confidence threshold": 0.99,
            "noise": 0.05,
            "transform": {
                "token": [
                    "0::8"
                ]
            }
        },
        {
            "algorithm": "pamld",
            "base": "SPLiT-seq 96",
            "comment": "Second round",
            "confidence threshold": 0.99,
            "noise": 0.05,
            "transform": {
                "token": [
                    "0:12:20"
                ]
            }
        }
    ],
    "import": [
        "splitseq_core_barcodes.json"
    ],
    "molecular": [
        {
            "transform": {
                "token": [
                    "0::"
                ]
            }
        }
    ],
    "template": {
        "transform": {
            "token": [
                "0::40"
            ]
        }
    }
}

I would appreciate any help regarding demultiplexing output configuration options or strategies.

Thanks a lot for your help,
Simon
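
For readers following the configuration above, here is a minimal Python sketch of how the two cellular token patterns address different positions on the same read segment. It assumes that a token reads as `segment:start:end` with Python-style slicing (zero-based, inclusive start, exclusive end, empty bounds left open). The helper names and the example read are hypothetical, and the sketch only illustrates token addressing, not how Pheniqs routes output.

```python
def parse_token(token):
    """Parse 'segment:start:end' into (segment index, slice); empty bounds stay open."""
    segment, start, end = token.split(":")
    return int(segment), slice(int(start) if start else None,
                               int(end) if end else None)


def extract(token, segments):
    """Return the bases that a token pattern selects from a list of read segments."""
    index, span = parse_token(token)
    return segments[index][span]


# Hypothetical single-segment read: 8 nt barcode, 4 nt linker, 8 nt barcode, then insert.
segments = ["ACGTACGT" + "TTTT" + "GGGGCCCC" + "AAAATTTTGGGGCCCCAAAA"]

first_round = extract("0::8", segments)      # bases 0-7 of segment 0
second_round = extract("0:12:20", segments)  # bases 12-19 of segment 0
combination = (first_round, second_round)    # one key per barcode combination
print(combination)
```

Under these assumptions the pair `(first_round, second_round)` is the natural key for a barcode combination, which is the unit the question asks to split output by.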

Trouble replicating basic behavior

Hi team. This looks like a really useful tool. Unfortunately, I'm having a hard time replicating even basic behaviors of pheniqs. Hopefully this is just due to my own misunderstanding. But as one of the simplest examples, here is one configuration file I've tried using

{
    "input": [
        "I1_mini.fastq.gz",
        "R1_mini.fastq.gz",
        "R2_mini.fastq.gz"
    ],
    "template": {
        "transform": { "token": [ "0::" ] }
    }
}

From this very basic test, I would expect to see only reads from I1_mini.fastq.gz in the SAM output (essentially a small adaptation of Example 1.4). However, here are the first few lines I get:

@HD     VN:1.0  SO:unknown      GO:query
@PG     ID:pheniqs      PN:pheniqs      CL:pheniqs mux --config pheniqs_test_config.json        VN:2.0.4
@RG     ID:undetermined PU:undetermined
A01125:65:HLGNKDRXX:1:1101:1090:1000    77      *       0       0       *       *       0       0       CATGCGAT        FFFFFFFF        FI:i:1  TC:i:3
A01125:65:HLGNKDRXX:1:1101:1090:1000    13      *       0       0       *       *       0       0       CTTAACTTCGGTGTCGGCCCGTAATC      FFFFFFFFFFFFFFFFFFFFFFFFFF      FI:i:2      TC:i:3
A01125:65:HLGNKDRXX:1:1101:1090:1000    141     *       0       0       *       *       0       0       ATGTTTGGGTTCATTTTTCTTTGCATAATCCAGGGAATCATAAATCATGCCAAAGCCAGTTGTCTTGCCACCACCAAAATGAGTTCTGAAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF     FI:i:3  TC:i:3

As you can see, reads from all 3 input fastq files are present. Can you point out what I am missing?

Thanks!
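
As an aid to reading the SAM records above, the following Python sketch decodes the FLAG column (77, 13 and 141) using the bit definitions from the SAM specification. The function name and the short labels are illustrative. It only shows how to tell which segment each record represents and does not explain why all three segments were written.

```python
# FLAG bit definitions from the SAM specification (labels shortened for display).
SAM_FLAG_BITS = {
    0x1: "multiple segments",
    0x2: "properly aligned",
    0x4: "unmapped",
    0x8: "next segment unmapped",
    0x10: "reverse strand",
    0x20: "next segment reverse strand",
    0x40: "first segment",
    0x80: "last segment",
    0x100: "secondary alignment",
    0x200: "failed quality checks",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


for flag in (77, 13, 141):
    print(flag, decode_flag(flag))
```

Flag 77 carries the "first segment" bit, 141 the "last segment" bit, and 13 carries neither, which marks a middle segment. This agrees with the FI:i:1, FI:i:2 and FI:i:3 tags and TC:i:3 in the records.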

Citing Pheniqs

Hello,

Do you have a preferred way of citing Pheniqs? (sorry if I missed it somewhere on the website)

Thanks :)

Jo

error while installing pheniqs under centos 6 using ppkg.py

I tried to install Pheniqs under CentOS 6 using the ppkg.py script, but it exited with a CRITICAL error:

`./tool/ppkg.py build build/trunk_static.json
DEBUG:Pipeline:loading /home/nd48/pheniqs-master/build/trunk_static.json
DEBUG:Pipeline:creating directory /home/nd48/pheniqs-master/bin/trunk_static
DEBUG:Pipeline:creating directory /home/nd48/pheniqs-master/bin/trunk_static/install
DEBUG:Pipeline:creating directory /home/nd48/pheniqs-master/bin/trunk_static/package
INFO:Package:unpacking zlib 1.2.11
INFO:Package:configuring make environment zlib 1.2.11
DEBUG:Package:./configure --prefix=/home/nd48/pheniqs-master/bin/trunk_static/install
INFO:Package:building with make zlib 1.2.11
DEBUG:Package:make
INFO:Package:installing with make zlib 1.2.11
DEBUG:Package:make install
INFO:Package:unpacking bz2 1.0.6
INFO:Package:building bz2 1.0.6 dynamic library
DEBUG:Package:make --file Makefile-libbz2_so PREFIX=/home/nd48/pheniqs-master/bin/trunk_static/install
DEBUG:Package:copying /home/nd48/pheniqs-master/bin/trunk_static/package/bzip2-1.0.6/libbz2.so.1.0.6 to /home/nd48/pheniqs-master/bin/trunk_static/install/lib/libbz2.so.1.0.6
INFO:Package:symlinking libbz2.so.1.0.6 to /home/nd48/pheniqs-master/bin/trunk_static/install/lib/libbz2.so.1.0
INFO:Package:symlinking libbz2.so.1.0.6 to /home/nd48/pheniqs-master/bin/trunk_static/install/lib/libbz2.so.1
INFO:Package:building with make bz2 1.0.6
DEBUG:Package:make PREFIX=/home/nd48/pheniqs-master/bin/trunk_static/install
INFO:Package:installing with make bz2 1.0.6
DEBUG:Package:make install PREFIX=/home/nd48/pheniqs-master/bin/trunk_static/install
DEBUG:Package:fetching https://tukaani.org/xz/xz-5.2.4.tar.bz2
INFO:Package:downloaded archive saved xz 5.2.4 50ad451279404fb5206e23c7b1ba9c4aa858c994
INFO:Package:unpacking xz 5.2.4
INFO:Package:configuring make environment xz 5.2.4
DEBUG:Package:./configure --prefix=/home/nd48/pheniqs-master/bin/trunk_static/install --enable-static
INFO:Package:building with make xz 5.2.4
DEBUG:Package:make
INFO:Package:installing with make xz 5.2.4
DEBUG:Package:make install
DEBUG:Package:fetching https://github.com/ebiggers/libdeflate/archive/v1.0.tar.gz
INFO:Package:downloaded archive saved libdeflate 1.0 17da81b2a058906f087e797fc69399c606a2c011
INFO:Package:unpacking libdeflate 1.0
INFO:Package:building with make libdeflate 1.0
DEBUG:Package:make CC=gcc
DEBUG:Package:copying /home/nd48/pheniqs-master/bin/trunk_static/package/libdeflate-1.0/libdeflate.a to /home/nd48/pheniqs-master/bin/trunk_static/install/lib
DEBUG:Package:copying /home/nd48/pheniqs-master/bin/trunk_static/package/libdeflate-1.0/libdeflate.h to /home/nd48/pheniqs-master/bin/trunk_static/install/include
DEBUG:Package:copying /home/nd48/pheniqs-master/bin/trunk_static/package/libdeflate-1.0/libdeflate.so to /home/nd48/pheniqs-master/bin/trunk_static/install/lib
DEBUG:Package:fetching https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2
INFO:Package:downloaded archive saved htslib 1.9 21be5187203df30637dda2e1133cae2e833ef050
INFO:Package:unpacking htslib 1.9
INFO:Package:configuring make environment htslib 1.9
DEBUG:Package:./configure --prefix=/home/nd48/pheniqs-master/bin/trunk_static/install --disable-libcurl
INFO:Package:building with make htslib 1.9
DEBUG:Package:make
INFO:Package:installing with make htslib 1.9
DEBUG:Package:make install
DEBUG:Package:fetching https://github.com/miloyip/rapidjson/archive/v1.1.0.tar.gz
INFO:Package:downloaded archive saved rapidjson 1.1.0 a3e0d043ad3c2d7638ffefa3beb30a77c71c869f
INFO:Package:unpacking rapidjson 1.1.0
DEBUG:Package:copying rapidjson 1.1.0 header files to /home/nd48/pheniqs-master/bin/trunk_static/install/include
DEBUG:Package:fetching https://codeload.github.com/biosails/pheniqs/zip/HEAD
INFO:Package:downloaded archive saved pheniqs 2.0-trunk None
INFO:Package:unpacking pheniqs 2.0-trunk
INFO:Package:building with make pheniqs 2.0-trunk
DEBUG:Package:make PREFIX=/home/nd48/pheniqs-master/bin/trunk_static/install with-static=1 PHENIQS_ZLIB_VERSION=1.2.11 PHENIQS_BZIP2_VERSION=1.0.6 PHENIQS_XZ_VERSION=5.2.4 PHENIQS_LIBDEFLATE_VERSION=1.0 PHENIQS_HTSLIB_VERSION=1.9 PHENIQS_RAPIDJSON_VERSION=1.1.0
2
version.h generated with PHENIQS_VERSION 2.0.3
configuration.h command line interface configuration generated.
zsh completion _pheniqs generated.
g++ -Wall -Wsign-compare -I/home/nd48/pheniqs-master/bin/trunk_static/install/include -std=c++11 -O3 -c -o json.o json.cpp

cc1plus: error: unrecognized command line option "-std=c++11"
make: *** [json.o] Error 1

CRITICAL:main:make returned 2`

Below is the error log,
more bin/trunk_static/error
configure: WARNING: GCS support not enabled: requires libcurl support
configure: WARNING: S3 support not enabled: requires libcurl support
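
The `cc1plus: error: unrecognized command line option "-std=c++11"` line in the build log means the installed g++ does not know that flag. GCC accepts the `-std=c++11` spelling from release 4.7 onward; older releases only offered `-std=c++0x`. The Python sketch below turns that rule into a quick check. It assumes the input is the output of `g++ -dumpversion`, and the function names are hypothetical.

```python
import re


def parse_gcc_version(text):
    """Parse text such as '4.4.7' or '9' into a (major, minor, patch) tuple."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def accepts_std_cxx11(version_text):
    """True if a GCC of this version recognizes the -std=c++11 flag (4.7 or newer)."""
    return parse_gcc_version(version_text)[:2] >= (4, 7)


for version in ("4.4.7", "4.8.5", "9"):
    print(version, accepts_std_cxx11(version))
```

If the check fails for the system compiler, building with a newer GCC toolchain is the usual way out.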

Compile on linux?

Pinging @nizardrou

The makefile specifically references clang, which is an OS X only thing. Can I use gcc as a drop-in replacement?

I will try some things on my side, but what I would really like to see from your side is a minimal Docker image with Pheniqs installed.

demultiplexing based on primer

Hi,

I just discovered the tool and used it to demultiplex an Illumina MiSeq run, and it seemed to work well (only 6% of reads with unclear barcodes for 91 pooled samples).

On the same run, however, I also pooled two different primer pairs. The primer sequences are part of the biological read.

I was wondering if pheniqs could be used for that? I use cutadapt but the PHRED error model would be a nicer way than just allowing for a certain number of mismatches.

edit: for primer search, IUPAC wildcard characters would ideally need to be supported
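
To make the IUPAC request concrete, here is a small Python sketch of wildcard-aware primer matching with a mismatch budget. This is not a Pheniqs feature and it ignores base qualities. The primer sequences, names and threshold are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, read):
    """Count positions where the read base is not allowed by the primer's IUPAC code."""
    if len(read) < len(primer):
        raise ValueError("read is shorter than the primer")
    return sum(base not in IUPAC[code] for code, base in zip(primer, read))


def assign_primer(read, primers, max_mismatches=1):
    """Return the name of the best matching primer, or None if none is close enough."""
    name, primer = min(primers.items(), key=lambda item: mismatches(item[1], read))
    return name if mismatches(primer, read) <= max_mismatches else None


primers = {"pair_A": "ACGTRY", "pair_B": "TTGACN"}  # hypothetical primers
print(assign_primer("ACGTACGGGG", primers))  # R allows A, Y allows C
print(assign_primer("TTGACTAAAA", primers))  # N allows any base
```

A quality-aware version would replace the mismatch count with a probability computed from the PHRED scores, which is what the question is asking for.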

--help bug

I just updated my pheniqs install and have noticed a strange bug. When I run
pheniqs mux --help
the normal help is displayed. However, if I do it a second time the help does not display and the terminal seems to be stuck in a loop of printing blank lines until I force quit. This continues every subsequent time I try to pull up the help. This first happened to me in gnome-terminal but I get the same behavior using xterm (e.g. it only worked the first time). The bug does not seem to affect the normal functioning of pheniqs (e.g. I can still run a demultiplexing command and it works properly). The bug persisted through multiple uninstall / reinstall attempts. I am not experiencing this behavior with any other command line programs. I installed using miniconda and I am running Linux mint 18.1.

IO error

Hi, I am new to pheniqs and receiving the following error when trying to demultiplex Illumina reads with i5-i7 pairings on my campus computer cluster. I am getting all the .fastq.gz files I would like as output, but am unable to unzip them. I am not sure if that is due to the error. Thank you for any help you can provide.

IO error : failed to open ./ST2_8_07_H.R1.fastq.gz for writing with error code 24
Full Command: pheniqs mux --config pheniqs_ex.json
Memory (kb): 769164
# SWAP (freq): 0
# Waits (freq): 4063
CPU (percent): 36%
Time (seconds): 59.77
Time (hh:mm:ss.ms): 0:59.77
System CPU Time (seconds): 5.47
User CPU Time (seconds): 16.24

[Linux@vaughan sge.pheniqs.ex]$ pheniqs --version
pheniqs version git-HEAD
zlib 1.2.11
bzlib 1.0.8
xzlib 5.2.5
libdeflate 1.6
rapidjson 1.1.0
htslib 1.11

json file attached
pheniqs_ex.json.txt
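
If the error code in the message above is the operating system's errno (an assumption), then on Linux code 24 is EMFILE, "Too many open files". The Python sketch below looks up the code and compares the number of distinct output paths in a configuration with the per-process open file limit. The `config` dictionary is a hypothetical stand-in for the attached file, and `count_output_paths` is an illustrative helper.

```python
import errno
import os
import resource  # Unix only


def describe_errno(code):
    """Return the symbolic name and message the operating system gives an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def count_output_paths(node):
    """Count distinct strings listed under any "output" key of a nested configuration."""
    paths = set()

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                if key == "output" and isinstance(child, list):
                    paths.update(p for p in child if isinstance(p, str))
                else:
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(node)
    return len(paths)


# Hypothetical configuration with two samples and two output files each.
config = {
    "sample": {
        "codec": {
            "@S1": {"output": ["S1.R1.fastq.gz", "S1.R2.fastq.gz"]},
            "@S2": {"output": ["S2.R1.fastq.gz", "S2.R2.fastq.gz"]},
        }
    }
}

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(describe_errno(24))
print(count_output_paths(config), "output files; soft limit on open files:", soft_limit)
```

When the number of output files approaches the soft limit, raising it with `ulimit -n` before running the job, where the cluster allows it, is the usual remedy.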

output knitted and corrected barcodes to fastq

Hi @moonwatcher,

This tool is super helpful as I am trying to implement a new single cell protocol called scifi-RNA-seq. I have been able to get the desired barcode manipulations and correction to output to an organized BAM. Unfortunately, my immediate downstream step requires FASTQ input, so I need to either have Pheniqs output FASTQs with the corrected barcodes and RNA insert in one run, or set up a second run after generating a corrected BAM that quickly takes the BAM and outputs FASTQs.

I have been trying to get the first option to work, but I can only seem to address the input tokens and create fastqs with the raw barcodes. I'm not sure how I can refer to the corrected barcodes.

For the second option, I don't understand how to access sequences in specific tags from a BAM input file. I saw some tutorials addressing interleaved BAM/CRAM, but my data is not interleaved and is one main read with a bunch of barcode stuff in tags.

I am attaching a small test case below. I use scifi_main.json as the config file for generating the correcttest.bam file. scifi_main_fastq.json was my attempt at outputting FASTQs with correction which resulted in uncorrected output (R1_correcttest.fastq and R2_correcttest.fastq).

Thank you!

scifi_pheniqs.tar.gz

Pheniqs only processes a small fraction of reads

Hi, I am trying to demux a PE run using inline barcodes on read 2:

{
    "input": [
        "LS01-2018-09-24-LS-UA-CAAF-001_R1_001.fastq.gz",
        "LS01-2018-09-24-LS-UA-CAAF-001_R2_001.fastq.gz"
    ],
    "transform": { 
        "token": [ "0::", "1:14:" ],
        "segment pattern": [ "0", "1" ]
    },
    "multiplex": {
        "transform": { "token": [ "1::12" ] },
        "codec": {
            "@TATGAACGTCCG": { "barcode": [ "TATGAACGTCCG" ], "output": [ "BC4CAAF1-10m-20180912-1120_L001_R1_001.fastq.gz", "BC4CAAF1-10m-20180912-1120_L001_R2_001.fastq.gz" ] },
            "@CCACATTGGGTC": { "barcode": [ "CCACATTGGGTC" ], "output": [ "BC1CAAF1-200m-20180912-0920_L001_R1_001.fastq.gz", "BC1CAAF1-200m-20180912-0920_L001_R2_001.fastq.gz" ] },
            "@TCAGTCAGATGA": { "barcode": [ "TCAGTCAGATGA" ], "output": [ "BC4CAAF1-10m-20180912-1015_L001_R1_001.fastq.gz", "BC4CAAF1-10m-20180912-1015_L001_R2_001.fastq.gz" ] },
            "@AAGTCACACACA": { "barcode": [ "AAGTCACACACA" ], "output": [ "BC1CAAF1-200m-20180912-1020_L001_R1_001.fastq.gz", "BC1CAAF1-200m-20180912-1020_L001_R2_001.fastq.gz" ] },
            "@GCTGTGATTCGA": { "barcode": [ "GCTGTGATTCGA" ], "output": [ "BC2CAAF2-200m-20180912-0920_L001_R1_001.fastq.gz", "BC2CAAF2-200m-20180912-0920_L001_R2_001.fastq.gz" ] }
        },
        "undetermined": { "output": [ "LS01-undetermined_L001_R1_001.fastq.gz", "LS01-undetermined_L001_R2_001.fastq.gz" ] },
        "algorithm": "mdd",
        "include filtered": true
    }
}

However, I always get back only 116 reads, despite the fact that the two input files have thousands of reads. I have tried with PAML too, same results.

{
    "multiplex": {
        "average classified distance": 0.219298245614035,
        "average pf classified distance": 0.219298245614035,
        "classified": [
            {
                "ID": "AAGTCACACACA",
                "PU": "AAGTCACACACA",
                "average distance": 0.461538461538461,
                "average pf distance": 0.461538461538461,
                "barcode": [
                    "AAGTCACACACA"
                ],
                "concentration": 0.198,
                "count": 13,
                "index": 1,
                "pf count": 13,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.114035087719298,
                "pf pooled fraction": 0.112068965517241,
                "pooled classified fraction": 0.114035087719298,
                "pooled fraction": 0.112068965517241
            },
            {
                "ID": "CCACATTGGGTC",
                "PU": "CCACATTGGGTC",
                "average distance": 0.071428571428571,
                "average pf distance": 0.071428571428571,
                "barcode": [
                    "CCACATTGGGTC"
                ],
                "concentration": 0.198,
                "count": 28,
                "index": 2,
                "pf count": 28,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.245614035087719,
                "pf pooled fraction": 0.241379310344827,
                "pooled classified fraction": 0.245614035087719,
                "pooled fraction": 0.241379310344827
            },
            {
                "ID": "GCTGTGATTCGA",
                "PU": "GCTGTGATTCGA",
                "average distance": 0.260869565217391,
                "average pf distance": 0.260869565217391,
                "barcode": [
                    "GCTGTGATTCGA"
                ],
                "concentration": 0.198,
                "count": 46,
                "index": 3,
                "pf count": 46,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.403508771929824,
                "pf pooled fraction": 0.396551724137931,
                "pooled classified fraction": 0.403508771929824,
                "pooled fraction": 0.396551724137931
            },
            {
                "ID": "TATGAACGTCCG",
                "PU": "TATGAACGTCCG",
                "average distance": 1.0,
                "average pf distance": 1.0,
                "barcode": [
                    "TATGAACGTCCG"
                ],
                "concentration": 0.198,
                "count": 1,
                "index": 4,
                "pf count": 1,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.008771929824561,
                "pf pooled fraction": 0.008620689655172,
                "pooled classified fraction": 0.008771929824561,
                "pooled fraction": 0.008620689655172
            },
            {
                "ID": "TCAGTCAGATGA",
                "PU": "TCAGTCAGATGA",
                "average distance": 0.153846153846153,
                "average pf distance": 0.153846153846153,
                "barcode": [
                    "TCAGTCAGATGA"
                ],
                "concentration": 0.198,
                "count": 26,
                "index": 5,
                "pf count": 26,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.228070175438596,
                "pf pooled fraction": 0.224137931034482,
                "pooled classified fraction": 0.228070175438596,
                "pooled fraction": 0.224137931034482
            }
        ],
        "classified count": 114,
        "classified fraction": 0.982758620689655,
        "classified pf fraction": 1.0,
        "count": 116,
        "pf classified count": 114,
        "pf classified fraction": 0.982758620689655,
        "pf count": 116,
        "pf fraction": 1.0,
        "unclassified": {
            "ID": "undetermined",
            "PU": "undetermined",
            "count": 2,
            "index": 0,
            "pf count": 2,
            "pf fraction": 1.0,
            "pf pooled classified fraction": 0.017543859649122,
            "pf pooled fraction": 0.017241379310344,
            "pooled classified fraction": 0.017543859649122,
            "pooled fraction": 0.017241379310344
        }
    }
}
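
A quick consistency check of the report above can be sketched in Python. The per-barcode counts are copied from the report, while the function name and the trimmed-down dictionary are illustrative. The classified counts add up to 114 and, with the 2 unclassified reads, to the reported total of 116. This suggests, though it does not prove, that only 116 reads reached the classifier rather than reads being dropped after classification.

```python
def summarize(report):
    """Recompute the totals of a demultiplexing report from its per-barcode counts."""
    section = report["multiplex"]
    classified = sum(entry["count"] for entry in section["classified"])
    unclassified = section["unclassified"]["count"]
    total = classified + unclassified
    return {
        "classified": classified,
        "unclassified": unclassified,
        "total": total,
        "classified fraction": classified / total,
    }


# Counts copied from the report; all other fields are omitted for brevity.
report = {
    "multiplex": {
        "classified": [
            {"ID": "AAGTCACACACA", "count": 13},
            {"ID": "CCACATTGGGTC", "count": 28},
            {"ID": "GCTGTGATTCGA", "count": 46},
            {"ID": "TATGAACGTCCG", "count": 1},
            {"ID": "TCAGTCAGATGA", "count": 26},
        ],
        "unclassified": {"count": 2},
        "count": 116,
    }
}

summary = summarize(report)
print(summary)
print("matches reported total:", summary["total"] == report["multiplex"]["count"])
```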

Validate failed

My JSON configuration isn't being parsed, and the output doesn't provide enough information to figure out what's wrong. I've tried to model it as closely as I could on the website example. Python's JSON validation shows no errors.
My setting is the following: I have two FASTQ files from Illumina paired-end sequencing. Read 1 contains the sequenced DNA, and read 2 contains three ranges of multiplex barcodes and one range of molecular barcode.
Any help with figuring out what I'm doing wrong would be great!

The config file (changed extension to txt to upload):
pheniqsConf.txt

Pheniqs on very large barcode spaces

Hello,

Thank you for developing and maintaining Pheniqs, I'm excited to try it in a number of data processing workflows. Would you be able to provide some pointers on whether, and how, Pheniqs's PAMLD decoder could be used for very large barcode spaces of 2-3 billion barcodes? I'm working with a dataset with an 18 base barcode with a small number of non-permitted sequences (no homopolymers longer than 5 bases). This is more akin to a random UMI sequence than to the smaller whitelists of barcodes used in the examples in this repo. The Pheniqs 2.0 paper describes the intractability of very large initial whitelists, and mentions that other strategies should be considered for an initial first pass through the data to reduce the barcode space. Are there any approaches or tools which you have found to work well for these scenarios? Let me know if I can provide any more information about my particular use case.

Thanks,
Derek

Desirable future features

  • BCL input requires either sourcing code from bcl2fastq or reimplementing the specs if license is an issue
  • Aligned input: passing through alignment fields is very tricky when tokenizing the read, because the CIGAR string will need to be adjusted and recalculating the alignment score will depend on the aligner's logic. It can, however, be easy when the segment is not manipulated.

Quadruple indexing, variable index length

Hi,

We are running a variant of the Adapterama II protocol (http://doi.org/10.7717/peerj.7786) which uses a quadruple index approach (two truseq indexes, two custom indexes). The custom indexes have different lengths to increase the sequence diversity at each base position in the amplicon pools.

I am pretty sure I could adapt the tutorial for the quadruple approach if the custom indexes were the same length. However, I am unsure how to do this with variable custom index lengths.

Any advice/help on how to do this would be appreciated! Thanks!

-shane

Last record missing in barcode corrected BAM file

Hi developers,

I encountered a mysterious issue when using Pheniqs for barcode correction.

Basically, when performing the correction, I first prepare a CRAM file containing the barcodes to be corrected and a JSON file according to the tutorial, then run Pheniqs. However, the resulting BAM file has one record fewer than the CRAM, and the missing one always seems to be the last record in the CRAM.

This happens only occasionally. For example, I ran the same code 6 times and caught one occurrence (bam3):

$ samtools view -c pbmc_500_10p_2_aa.corrected.bam
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam1
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam2
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam3
449999
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam4
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam5
450000

Not a big matter, but still annoying. I can provide more info if you would like to look into it.

Thanks,

--Kai

Tutorial info not correct?

Hi there,

I tried to follow the tutorial here: https://biosails.github.io/pheniqs/illumina_vignette

The following snippet won't pass the config validation:

"undetermined": {
        "output": [
            "H7LT2DSXX_l01_undetermined_s01.fastq.gz",
            "H7LT2DSXX_l01_undetermined_s02.fastq.gz"
        ]
    }

The config validator complains that there is no "barcode". How should the "barcode" be specified for "undetermined"?

Thanks,

EOF error

Hi,
I was running Pheniqs the other day on cat'd fastq.gz files from different lanes, great tool! It runs really well until the very end, where it generates an EOF error:

Pheniqs mux --config run_demux.json
[W::bgzf_read_block] [W::bgzf_read_block] [W::bgzf_read_block] [W::bgzf_read_block] EOF marker is absent. The input is probably truncatedEOF marker is absent. The input is probably truncatedEOF marker is absent. The input is probably truncatedEOF marker is absent. The input is probably truncated

{
    "multiplex": {
        "average classified confidence": 0.999889611317004,
        "average pf classified confidence": 0.999889611317004,
        "classified": [
            {
                "ID": "GATCGTGT",
                "PU": "GATCGTGT",
                "average confidence": 0.999936941876671,
                "average pf confidence": 0.999936941876671,
                "barcode": [
                    "GATCGTGT"
                ],
                "concentration": 0.49,
                "count": 52793403,
                "index": 1,
                "low conditional confidence count": 107840070,
                "low confidence count": 107,
                "pf count": 52793403,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.436594251439055,
                "pf pooled fraction": 0.11722381582157,
                "pooled classified fraction": 0.436594251439055,
                "pooled fraction": 0.11722381582157
            },
            {
                "ID": "AGATATAA",
                "PU": "AGATATAA",
                "average confidence": 0.999852933930732,
                "average pf confidence": 0.999852933930732,
                "barcode": [
                    "AGATATAA"
                ],
                "concentration": 0.49,
                "count": 68127573,
                "index": 2,
                "low conditional confidence count": 221602938,
                "low confidence count": 49,
                "pf count": 68127573,
                "pf fraction": 1.0,
                "pf pooled classified fraction": 0.563405748560944,
                "pf pooled fraction": 0.151272197204688,
                "pooled classified fraction": 0.563405748560944,
                "pooled fraction": 0.151272197204688
            }
        ],
        "classified count": 120920976,
        "classified fraction": 0.268496013026259,
        "classified pf fraction": 1.0,
        "count": 450364140,
        "low conditional confidence count": 329443008,
        "low confidence count": 156,
        "pf classified count": 120920976,
        "pf classified fraction": 0.268496013026259,
        "pf count": 450364140,
        "pf fraction": 1.0,
        "unclassified": {
            "ID": "undetermined",
            "PU": "undetermined",
            "count": 329443164,
            "index": 0,
            "pf count": 329443164,
            "pf fraction": 1.0,
            "pf pooled classified fraction": 2.724450090445846,
            "pf pooled fraction": 0.73150398697374,
            "pooled classified fraction": 2.724450090445846,
            "pooled fraction": 0.73150398697374
        }
    }
}

Would you perhaps be able to tell me how to solve this error? On my side I downloaded the files multiple times to make sure the FASTQ files weren't corrupted.
Thanks!
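
The warning above comes from HTSlib's BGZF reader, which expects a BGZF file to end with a fixed 28-byte empty block, the EOF marker defined in the SAM specification. The Python sketch below checks whether a file ends with that marker. The function name and the throwaway demonstration files are hypothetical. A file compressed with plain gzip rather than bgzip never carries the marker, so its absence alone does not prove truncation.

```python
import os
import tempfile

# The 28-byte empty BGZF block that marks end of file (SAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def ends_with_bgzf_eof(path):
    """Return True if the file's last 28 bytes are the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


# Demonstration on two throwaway files, one with and one without the marker.
with tempfile.TemporaryDirectory() as workdir:
    complete = os.path.join(workdir, "complete.gz")
    truncated = os.path.join(workdir, "truncated.gz")
    with open(complete, "wb") as handle:
        handle.write(b"payload" + BGZF_EOF)
    with open(truncated, "wb") as handle:
        handle.write(b"payload")
    complete_ok = ends_with_bgzf_eof(complete)
    truncated_ok = ends_with_bgzf_eof(truncated)

print(complete_ok, truncated_ok)
```

Running the check on each concatenated input shows which files trigger the warning. Whether the data itself is intact is a separate question that `gzip -t` can answer.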

Install failure with pheniqs-tools (ppkg.py)

Not sure if this is the expected behaviour, but I was running ppkg.py on Dalma (after loading the appropriate build modules) like so,
./tool/ppkg.py build build/trunk_static.json

And this is what was printed on the screen,
DEBUG:Pipeline:loading /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/build/trunk_static.json
DEBUG:Pipeline:creating directory /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static
DEBUG:Pipeline:creating directory /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install
DEBUG:Pipeline:creating directory /home/gencore/.pheniqs/download
DEBUG:Pipeline:creating directory /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/package
DEBUG:Package:fetching https://zlib.net/zlib-1.2.11.tar.gz
INFO:Package:downloaded archive saved zlib 1.2.11 e6d119755acdf9104d7ba236b1242696940ed6dd
INFO:Package:unpacking zlib 1.2.11
INFO:Package:configuring make environment zlib 1.2.11
DEBUG:Package:./configure --prefix=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install
INFO:Package:building with make zlib 1.2.11
DEBUG:Package:make
INFO:Package:installing with make zlib 1.2.11
DEBUG:Package:make install
DEBUG:Package:fetching https://fossies.org/linux/misc/bzip2-1.0.6.tar.gz
INFO:Package:downloaded archive saved bz2 1.0.6 3f89f861209ce81a6bab1fd1998c0ef311712002
INFO:Package:unpacking bz2 1.0.6
INFO:Package:building bz2 1.0.6 dynamic library
DEBUG:Package:make --file Makefile-libbz2_so PREFIX=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install
DEBUG:Package:copying /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/package/bzip2-1.0.6/libbz2.so.1.0.6 to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/lib/libbz2.so.1.0.6
INFO:Package:symlinking libbz2.so.1.0.6 to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/lib/libbz2.so.1.0
INFO:Package:symlinking libbz2.so.1.0.6 to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/lib/libbz2.so.1
INFO:Package:building with make bz2 1.0.6
DEBUG:Package:make PREFIX=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install
INFO:Package:installing with make bz2 1.0.6
DEBUG:Package:make install PREFIX=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install
DEBUG:Package:fetching https://tukaani.org/xz/xz-5.2.4.tar.bz2
INFO:Package:downloaded archive saved xz 5.2.4 50ad451279404fb5206e23c7b1ba9c4aa858c994
INFO:Package:unpacking xz 5.2.4
INFO:Package:configuring make environment xz 5.2.4
DEBUG:Package:./configure --prefix=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install --enable-static
INFO:Package:building with make xz 5.2.4
DEBUG:Package:make
INFO:Package:installing with make xz 5.2.4
DEBUG:Package:make install
DEBUG:Package:fetching https://github.com/ebiggers/libdeflate/archive/v1.0.tar.gz
INFO:Package:downloaded archive saved libdeflate 1.0 17da81b2a058906f087e797fc69399c606a2c011
INFO:Package:unpacking libdeflate 1.0
INFO:Package:building with make libdeflate 1.0
DEBUG:Package:make CC=gcc
DEBUG:Package:copying /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/package/libdeflate-1.0/libdeflate.a to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/lib
DEBUG:Package:copying /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/package/libdeflate-1.0/libdeflate.h to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/include
DEBUG:Package:copying /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/package/libdeflate-1.0/libdeflate.so to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/lib
DEBUG:Package:fetching https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2
INFO:Package:downloaded archive saved htslib 1.9 21be5187203df30637dda2e1133cae2e833ef050
INFO:Package:unpacking htslib 1.9
INFO:Package:configuring make environment htslib 1.9
DEBUG:Package:./configure --prefix=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install --disable-libcurl
INFO:Package:building with make htslib 1.9
DEBUG:Package:make
INFO:Package:installing with make htslib 1.9
DEBUG:Package:make install
DEBUG:Package:fetching https://github.com/miloyip/rapidjson/archive/v1.1.0.tar.gz
INFO:Package:downloaded archive saved rapidjson 1.1.0 a3e0d043ad3c2d7638ffefa3beb30a77c71c869f
INFO:Package:unpacking rapidjson 1.1.0
DEBUG:Package:copying rapidjson 1.1.0 header files to /scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install/include
DEBUG:Package:fetching https://codeload.github.com/biosails/pheniqs/zip/HEAD
INFO:Package:downloaded archive saved pheniqs 2.0-trunk None
INFO:Package:unpacking pheniqs 2.0-trunk
INFO:Package:building with make pheniqs 2.0-trunk
DEBUG:Package:make PREFIX=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install with-static=1 PHENIQS_ZLIB_VERSION=1.2.11 PHENIQS_BZIP2_VERSION=1.0.6 PHENIQS_XZ_VERSION=5.2.4 PHENIQS_LIBDEFLATE_VERSION=1.0 PHENIQS_HTSLIB_VERSION=1.9 PHENIQS_RAPIDJSON_VERSION=1.1.0
INFO:Package:installing with make pheniqs 2.0-trunk
DEBUG:Package:make install PREFIX=/scratch/nd48/Pheniqs/pheniqs_latest/pheniqs-master/bin/trunk_static/install

And here is the error log in the bin folder (bin/trunk_static/error):

In function ‘mainSort’:
blocksort.c:347:6: warning: inlining failed in call to ‘mainGtU.part.0’: call is unlikely and code size would grow [-Winline]
Bool mainGtU ( UInt32 i1,
^
cc1: warning: called from here [-Winline]
blocksort.c:347:6: warning: inlining failed in call to ‘mainGtU.part.0’: call is unlikely and code size would grow [-Winline]
Bool mainGtU ( UInt32 i1,
^
cc1: warning: called from here [-Winline]
blocksort.c:347:6: warning: inlining failed in call to ‘mainGtU.part.0’: call is unlikely and code size would grow [-Winline]
Bool mainGtU ( UInt32 i1,
^
cc1: warning: called from here [-Winline]
bzip2.c: In function ‘testStream’:
bzip2.c:557:37: warning: variable ‘nread’ set but not used [-Wunused-but-set-variable]
Int32 bzerr, bzerr_dummy, ret, nread, streamNo, i;
^
bzip2.c: In function ‘testStream’:
bzip2.c:557:37: warning: variable ‘nread’ set but not used [-Wunused-but-set-variable]
Int32 bzerr, bzerr_dummy, ret, nread, streamNo, i;
^
configure: WARNING: GCS support not enabled: requires libcurl support
configure: WARNING: S3 support not enabled: requires libcurl support

Installation problem

I would like to test Pheniqs but I am having trouble compiling the program. When I clone the GitHub repository and run make, I get the following error:

drl@rhombus2 ~/software/pheniqs $ make
Generate command line interface configuration
Generate version.h with 1.0.a2b1725c3a8793fe13dc039efc20188418f94e77
clang++ -c -std=c++11 -O3 -Wall -Wsign-compare  -c -o constant.o constant.cpp
clang++ -c -std=c++11 -O3 -Wall -Wsign-compare  -c -o model.o model.cpp
clang++ -c -std=c++11 -O3 -Wall -Wsign-compare  -c -o feed.o feed.cpp
clang++ -c -std=c++11 -O3 -Wall -Wsign-compare  -c -o environment.o environment.cpp
clang++ -c -std=c++11 -O3 -Wall -Wsign-compare  -c -o pipeline.o pipeline.cpp
clang++ -c -std=c++11 -O3 -Wall -Wsign-compare  -c -o pheniqs.o pheniqs.cpp
clang++ constant.o model.o feed.o environment.o pipeline.o pheniqs.o -o pheniqs -lhts -lz
/usr/bin/ld: pipeline.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:81: recipe for target 'pheniqs' failed
make: *** [pheniqs] Error 1

I am not sure how to address this, and any assistance would be appreciated.

Help understanding json config for basic demultiplexing

Hi,

I'm wondering if you'd be able to help me understand how to construct a correct JSON config file. My goal is to demultiplex based on 6bp I1 indices from 4 files provided by our sequencing center (we have I2, but it's meaningless in this case).

What I've tried:

  • Constructing a JSON file based on the templates in the documentation, and then running it through the Python script you provided:

pheniqs/tool/pheniqs-io-api.py -c 211123_run064_pheniqs_test.json -LS -F fastq --compression gz > 211129_run064_pheniqs_post-io-api.json

Where I'm confused

When I run pheniqs mux -F fastq --compression gz -c 211129_run064_pheniqs_post-io-api.json it just prints the gzip-compressed output to standard output. Yet the config JSON produced by your Python script lists filenames as output. When I run the validate step, it also reports that output will be written to stdout, which is a bit confusing to me. I've provided both configs in plain text below.

Please accept my apologies for the vague query, but I am a bit stumped. I'm unfamiliar with JSON-formatted configs and am used to providing this information in a TSV/CSV, so I am not quite sure where to start troubleshooting. Even if you can only tell me that this must be due to some stray punctuation or a missing section in my original config, that would really help me understand how pheniqs works.

Thanks a lot for your help,
Jesse
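
For readers who, like the poster, keep their sample sheet as a TSV, the repetitive "codec" section can be generated rather than typed by hand. The following is a minimal sketch, not part of Pheniqs: the function name codec_from_tsv, the two-column layout (library name, barcode) and the segments parameter are assumptions made for illustration, and the output file naming simply mirrors the pattern used in the config posted below.

```python
import csv
import io
import json


def codec_from_tsv(tsv_text, flowcell_id, segments=4):
    """Build a "codec" dictionary from tab-separated (name, barcode) rows.

    Hypothetical helper: the key layout ("LB", "barcode", "output") copies
    the config in this thread, it is not taken from the Pheniqs source.
    """
    codec = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines in the sheet
        name, barcode = row
        codec["@" + name] = {
            "LB": name,
            "barcode": [barcode],
            # one output file name per segment, numbered s01, s02, ...
            "output": [
                "{}_{}_s{:02d}.fastq.gz".format(flowcell_id, name, i)
                for i in range(1, segments + 1)
            ],
        }
    return codec


# Two rows taken from the config in this thread
sheet = "11_15\tAGTCAA\n11_200\tATGTCA\n"
codec = codec_from_tsv(sheet, "JMSNRSCF")
print(json.dumps(codec, indent=4))
```

The printed dictionary can be pasted under the "codec" key of the decoder section. Whether four output files per entry is what Pheniqs expects for a given "template" is a separate question that this sketch does not settle.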

Original JSON file I constructed with your template:

{
	"PL" : "ILLUMINA",
	"PM" : "NovaSeq_2x250_UCDavis_run064",
	"base input url" : ".",
	"filter incoming qc fail" : true,
	"flowcell id" : "JMSNRSCF",
	"report url" : "211123_JMSNRSCF_demux_sample_report.json",
	"input" : [
		"JMSNRSCF_S1_L001_R1_001.fastq.gz",
		"JMSNRSCF_S1_L001_I1_001.fastq.gz",
		"JMSNRSCF_S1_L001_I2_001.fastq.gz",
		"JMSNRSCF_S1_L001_R2_001.fastq.gz"
	],
	"template" : {
		"transform" : {
			"token" : [ "0::", "3::" ]
		}
	},
	"sample" : {
		"transform" : {
			"token" : [ "1::6" ]
		},
		"algorithm" : "pamld",
		"confidence threshold" : 0.95,
		"noise" : 0.05,
		"codec" : {
            "@11_15": {
                "LB": "11_15",
                "barcode": [
                    "AGTCAA"
                ],
                "output": [
                    "JMSNRSCF_11_15_s01.fastq.gz",
                    "JMSNRSCF_11_15_s02.fastq.gz",
                    "JMSNRSCF_11_15_s03.fastq.gz",
                    "JMSNRSCF_11_15_s04.fastq.gz"
                ]
            },
            "@11_200": {
                "LB": "11_200",
                "barcode": [
                    "ATGTCA"
                ],
                "output": [
                    "JMSNRSCF_11_200_s01.fastq.gz",
                    "JMSNRSCF_11_200_s02.fastq.gz",
                    "JMSNRSCF_11_200_s03.fastq.gz",
                    "JMSNRSCF_11_200_s04.fastq.gz"
                ]
            },
            "@11_25": {
                "LB": "11_25",
                "barcode": [
                    "CGTACG"
                ],
                "output": [
                    "JMSNRSCF_11_25_s01.fastq.gz",
                    "JMSNRSCF_11_25_s02.fastq.gz",
                    "JMSNRSCF_11_25_s03.fastq.gz",
                    "JMSNRSCF_11_25_s04.fastq.gz"
                ]
            },
            "@2_120": {
                "LB": "2_120",
                "barcode": [
                    "AGTCAA"
                ],
                "output": [
                    "JMSNRSCF_2_120_s01.fastq.gz",
                    "JMSNRSCF_2_120_s02.fastq.gz",
                    "JMSNRSCF_2_120_s03.fastq.gz",
                    "JMSNRSCF_2_120_s04.fastq.gz"
                ]
            },
            "@2_400": {
                "LB": "2_400",
                "barcode": [
                    "ATCTCA"
                ],
                "output": [
                    "JMSNRSCF_2_400_s01.fastq.gz",
                    "JMSNRSCF_2_400_s02.fastq.gz",
                    "JMSNRSCF_2_400_s03.fastq.gz",
                    "JMSNRSCF_2_400_s04.fastq.gz"
                ]
            },
            "@2_45": {
                "LB": "2_45",
                "barcode": [
                    "AGGAAT"
                ],
                "output": [
                    "JMSNRSCF_2_45_s01.fastq.gz",
                    "JMSNRSCF_2_45_s02.fastq.gz",
                    "JMSNRSCF_2_45_s03.fastq.gz",
                    "JMSNRSCF_2_45_s04.fastq.gz"
                ]
            },
            "@7_120": {
                "LB": "7_120",
                "barcode": [
                    "ATTCCT"
                ],
                "output": [
                    "JMSNRSCF_7_120_s01.fastq.gz",
                    "JMSNRSCF_7_120_s02.fastq.gz",
                    "JMSNRSCF_7_120_s03.fastq.gz",
                    "JMSNRSCF_7_120_s04.fastq.gz"
                ]
            },
            "@7_200": {
                "LB": "7_200",
                "barcode": [
                    "GTGGCC"
                ],
                "output": [
                    "JMSNRSCF_7_200_s01.fastq.gz",
                    "JMSNRSCF_7_200_s02.fastq.gz",
                    "JMSNRSCF_7_200_s03.fastq.gz",
                    "JMSNRSCF_7_200_s04.fastq.gz"
                ]
            },
            "@7_25": {
                "LB": "7_25",
                "barcode": [
                    "GTGAAA"
                ],
                "output": [
                    "JMSNRSCF_7_25_s01.fastq.gz",
                    "JMSNRSCF_7_25_s02.fastq.gz",
                    "JMSNRSCF_7_25_s03.fastq.gz",
                    "JMSNRSCF_7_25_s04.fastq.gz"
                ]
            },
            "@7_45": {
                "LB": "7_45",
                "barcode": [
                    "ATGAGC"
                ],
                "output": [
                    "JMSNRSCF_7_45_s01.fastq.gz",
                    "JMSNRSCF_7_45_s02.fastq.gz",
                    "JMSNRSCF_7_45_s03.fastq.gz",
                    "JMSNRSCF_7_45_s04.fastq.gz"
                ]
            },
            "@8_200": {
                "LB": "8_200",
                "barcode": [
                    "ATGAGC"
                ],
                "output": [
                    "JMSNRSCF_8_200_s01.fastq.gz",
                    "JMSNRSCF_8_200_s02.fastq.gz",
                    "JMSNRSCF_8_200_s03.fastq.gz",
                    "JMSNRSCF_8_200_s04.fastq.gz"
                ]
            },
            "@8_800": {
                "LB": "8_800",
                "barcode": [
                    "CCGTCC"
                ],
                "output": [
                    "JMSNRSCF_8_800_s01.fastq.gz",
                    "JMSNRSCF_8_800_s02.fastq.gz",
                    "JMSNRSCF_8_800_s03.fastq.gz",
                    "JMSNRSCF_8_800_s04.fastq.gz"
                ]
            },
            "@9_200": {
                "LB": "9_200",
                "barcode": [
                    "ACTTCC"
                ],
                "output": [
                    "JMSNRSCF_9_200_s01.fastq.gz",
                    "JMSNRSCF_9_200_s02.fastq.gz",
                    "JMSNRSCF_9_200_s03.fastq.gz",
                    "JMSNRSCF_9_200_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev11": {
                "LB": "BGT_PL01_rev11",
                "barcode": [
                    "GCGGAC"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev11_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev11_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev11_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev11_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev12": {
                "LB": "BGT_PL01_rev12",
                "barcode": [
                    "TTTCAC"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev12_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev12_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev12_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev12_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev13": {
                "LB": "BGT_PL01_rev13",
                "barcode": [
                    "CCGGTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev13_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev13_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev13_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev13_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev14": {
                "LB": "BGT_PL01_rev14",
                "barcode": [
                    "ATCGTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev14_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev14_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev14_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev14_s04.fastq.gz"
                ]
            },
            "@BGT_PL02_rev15": {
                "LB": "BGT_PL02_rev15",
                "barcode": [
                    "TGAGTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL02_rev15_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev15_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev15_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev15_s04.fastq.gz"
                ]
            },
            "@BGT_PL02_rev16": {
                "LB": "BGT_PL02_rev16",
                "barcode": [
                    "CGCCTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL02_rev16_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev16_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev16_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev16_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev01": {
                "LB": "CAP_PL01_rev01",
                "barcode": [
                    "CGTGAT"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev01_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev01_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev01_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev01_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev02": {
                "LB": "CAP_PL01_rev02",
                "barcode": [
                    "ACATCG"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev02_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev02_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev02_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev02_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev03": {
                "LB": "CAP_PL01_rev03",
                "barcode": [
                    "GCCTAA"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev03_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev03_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev03_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev03_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev04": {
                "LB": "CAP_PL01_rev04",
                "barcode": [
                    "TGGTCA"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev04_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev04_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev04_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev04_s04.fastq.gz"
                ]
            },
            "@Colette_rev21": {
                "LB": "Colette_rev21",
                "barcode": [
                    "TTCGTC"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev21_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev21_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev21_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev21_s04.fastq.gz"
                ]
            },
            "@Colette_rev22": {
                "LB": "Colette_rev22",
                "barcode": [
                    "CCAACT"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev22_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev22_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev22_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev22_s04.fastq.gz"
                ]
            },
            "@Colette_rev23": {
                "LB": "Colette_rev23",
                "barcode": [
                    "TCAGTT"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev23_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev23_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev23_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev23_s04.fastq.gz"
                ]
            },
            "@Colette_rev24": {
                "LB": "Colette_rev24",
                "barcode": [
                    "CTGACC"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev24_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev24_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev24_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev24_s04.fastq.gz"
                ]
            },
            "@Delaney_rev19": {
                "LB": "Delaney_rev19",
                "barcode": [
                    "GGAACT"
                ],
                "output": [
                    "JMSNRSCF_Delaney_rev19_s01.fastq.gz",
                    "JMSNRSCF_Delaney_rev19_s02.fastq.gz",
                    "JMSNRSCF_Delaney_rev19_s03.fastq.gz",
                    "JMSNRSCF_Delaney_rev19_s04.fastq.gz"
                ]
            },
            "@Delaney_rev20": {
                "LB": "Delaney_rev20",
                "barcode": [
                    "CGAAAC"
                ],
                "output": [
                    "JMSNRSCF_Delaney_rev20_s01.fastq.gz",
                    "JMSNRSCF_Delaney_rev20_s02.fastq.gz",
                    "JMSNRSCF_Delaney_rev20_s03.fastq.gz",
                    "JMSNRSCF_Delaney_rev20_s04.fastq.gz"
                ]
            },
            "@Rae_rev05": {
                "LB": "Rae_rev05",
                "barcode": [
                    "CACTGT"
                ],
                "output": [
                    "JMSNRSCF_Rae_rev05_s01.fastq.gz",
                    "JMSNRSCF_Rae_rev05_s02.fastq.gz",
                    "JMSNRSCF_Rae_rev05_s03.fastq.gz",
                    "JMSNRSCF_Rae_rev05_s04.fastq.gz"
                ]
            },
            "@mystery12264": {
                "LB": "mystery12264",
                "barcode": [
                    "ACTGAT"
                ],
                "output": [
                    "JMSNRSCF_mystery12264_s01.fastq.gz",
                    "JMSNRSCF_mystery12264_s02.fastq.gz",
                    "JMSNRSCF_mystery12264_s03.fastq.gz",
                    "JMSNRSCF_mystery12264_s04.fastq.gz"
                ]
            },
            "@mystery17075": {
                "LB": "mystery17075",
                "barcode": [
                    "GTGGCC"
                ],
                "output": [
                    "JMSNRSCF_mystery17075_s01.fastq.gz",
                    "JMSNRSCF_mystery17075_s02.fastq.gz",
                    "JMSNRSCF_mystery17075_s03.fastq.gz",
                    "JMSNRSCF_mystery17075_s04.fastq.gz"
                ]
            },
            "@mystery4368": {
                "LB": "mystery4368",
                "barcode": [
                    "CGCCTG"
                ],
                "output": [
                    "JMSNRSCF_mystery4368_s01.fastq.gz",
                    "JMSNRSCF_mystery4368_s02.fastq.gz",
                    "JMSNRSCF_mystery4368_s03.fastq.gz",
                    "JMSNRSCF_mystery4368_s04.fastq.gz"
                ]
            },
            "@mystery4895": {
                "LB": "mystery4895",
                "barcode": [
                    "GGGGGG"
                ],
                "output": [
                    "JMSNRSCF_mystery4895_s01.fastq.gz",
                    "JMSNRSCF_mystery4895_s02.fastq.gz",
                    "JMSNRSCF_mystery4895_s03.fastq.gz",
                    "JMSNRSCF_mystery4895_s04.fastq.gz"
                ]
            },
            "@mystery8619": {
                "LB": "mystery8619",
                "barcode": [
                    "AGTTCC"
                ],
                "output": [
                    "JMSNRSCF_mystery8619_s01.fastq.gz",
                    "JMSNRSCF_mystery8619_s02.fastq.gz",
                    "JMSNRSCF_mystery8619_s03.fastq.gz",
                    "JMSNRSCF_mystery8619_s04.fastq.gz"
                ]
            },
            "@mystery9788": {
                "LB": "mystery9788",
                "barcode": [
                    "CCGTCC"
                ],
                "output": [
                    "JMSNRSCF_mystery9788_s01.fastq.gz",
                    "JMSNRSCF_mystery9788_s02.fastq.gz",
                    "JMSNRSCF_mystery9788_s03.fastq.gz",
                    "JMSNRSCF_mystery9788_s04.fastq.gz"
                ]
            }
		}
	}
}
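
Several entries in the codec above carry the same barcode sequence: AGTCAA appears under both @11_15 and @2_120, ATGAGC under both @7_45 and @8_200, GTGGCC under both @7_200 and @mystery17075, CCGTCC under both @8_800 and @mystery9788, and CGCCTG under both @BGT_PL02_rev16 and @mystery4368. Two libraries that share a single-segment barcode cannot be told apart by that barcode alone. Whether this is related to the stdout behaviour described above is not established here. The following is a minimal sketch for spotting such collisions before running anything; the function name duplicate_barcodes is hypothetical, and the input is assumed to be the dictionary found under the "codec" key after loading the file with json.load.

```python
from collections import defaultdict


def duplicate_barcodes(codec):
    """Return {barcode: [entry keys]} for barcodes used by more than one entry.

    Multi-segment barcodes are joined with "-" so that each entry is
    compared as a whole.
    """
    seen = defaultdict(list)
    for key, entry in codec.items():
        seen["-".join(entry["barcode"])].append(key)
    return {seq: keys for seq, keys in seen.items() if len(keys) > 1}


# A small excerpt of the codec above
codec = {
    "@11_15": {"barcode": ["AGTCAA"]},
    "@11_200": {"barcode": ["ATGTCA"]},
    "@2_120": {"barcode": ["AGTCAA"]},
    "@7_45": {"barcode": ["ATGAGC"]},
    "@8_200": {"barcode": ["ATGAGC"]},
}
for sequence, keys in duplicate_barcodes(codec).items():
    print(sequence, "is shared by", ", ".join(keys))
```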

The output of your python processing script:

{
    "PL": "ILLUMINA",
    "PM": "NovaSeq_2x250_UCDavis_run064",
    "base input url": ".",
    "filter incoming qc fail": true,
    "flowcell id": "JMSNRSCF",
    "input": [
        "JMSNRSCF_S1_L001_R1_001.fastq.gz",
        "JMSNRSCF_S1_L001_I1_001.fastq.gz",
        "JMSNRSCF_S1_L001_I2_001.fastq.gz",
        "JMSNRSCF_S1_L001_R2_001.fastq.gz"
    ],
    "report url": "211123_JMSNRSCF_demux_sample_report.json",
    "sample": {
        "algorithm": "pamld",
        "codec": {
            "@11_15": {
                "LB": "11_15",
                "barcode": [
                    "AGTCAA"
                ],
                "output": [
                    "JMSNRSCF_11_15_s01.fastq.gz",
                    "JMSNRSCF_11_15_s02.fastq.gz",
                    "JMSNRSCF_11_15_s03.fastq.gz",
                    "JMSNRSCF_11_15_s04.fastq.gz"
                ]
            },
            "@11_200": {
                "LB": "11_200",
                "barcode": [
                    "ATGTCA"
                ],
                "output": [
                    "JMSNRSCF_11_200_s01.fastq.gz",
                    "JMSNRSCF_11_200_s02.fastq.gz",
                    "JMSNRSCF_11_200_s03.fastq.gz",
                    "JMSNRSCF_11_200_s04.fastq.gz"
                ]
            },
            "@11_25": {
                "LB": "11_25",
                "barcode": [
                    "CGTACG"
                ],
                "output": [
                    "JMSNRSCF_11_25_s01.fastq.gz",
                    "JMSNRSCF_11_25_s02.fastq.gz",
                    "JMSNRSCF_11_25_s03.fastq.gz",
                    "JMSNRSCF_11_25_s04.fastq.gz"
                ]
            },
            "@2_120": {
                "LB": "2_120",
                "barcode": [
                    "AGTCAA"
                ],
                "output": [
                    "JMSNRSCF_2_120_s01.fastq.gz",
                    "JMSNRSCF_2_120_s02.fastq.gz",
                    "JMSNRSCF_2_120_s03.fastq.gz",
                    "JMSNRSCF_2_120_s04.fastq.gz"
                ]
            },
            "@2_400": {
                "LB": "2_400",
                "barcode": [
                    "ATCTCA"
                ],
                "output": [
                    "JMSNRSCF_2_400_s01.fastq.gz",
                    "JMSNRSCF_2_400_s02.fastq.gz",
                    "JMSNRSCF_2_400_s03.fastq.gz",
                    "JMSNRSCF_2_400_s04.fastq.gz"
                ]
            },
            "@2_45": {
                "LB": "2_45",
                "barcode": [
                    "AGGAAT"
                ],
                "output": [
                    "JMSNRSCF_2_45_s01.fastq.gz",
                    "JMSNRSCF_2_45_s02.fastq.gz",
                    "JMSNRSCF_2_45_s03.fastq.gz",
                    "JMSNRSCF_2_45_s04.fastq.gz"
                ]
            },
            "@7_120": {
                "LB": "7_120",
                "barcode": [
                    "ATTCCT"
                ],
                "output": [
                    "JMSNRSCF_7_120_s01.fastq.gz",
                    "JMSNRSCF_7_120_s02.fastq.gz",
                    "JMSNRSCF_7_120_s03.fastq.gz",
                    "JMSNRSCF_7_120_s04.fastq.gz"
                ]
            },
            "@7_200": {
                "LB": "7_200",
                "barcode": [
                    "GTGGCC"
                ],
                "output": [
                    "JMSNRSCF_7_200_s01.fastq.gz",
                    "JMSNRSCF_7_200_s02.fastq.gz",
                    "JMSNRSCF_7_200_s03.fastq.gz",
                    "JMSNRSCF_7_200_s04.fastq.gz"
                ]
            },
            "@7_25": {
                "LB": "7_25",
                "barcode": [
                    "GTGAAA"
                ],
                "output": [
                    "JMSNRSCF_7_25_s01.fastq.gz",
                    "JMSNRSCF_7_25_s02.fastq.gz",
                    "JMSNRSCF_7_25_s03.fastq.gz",
                    "JMSNRSCF_7_25_s04.fastq.gz"
                ]
            },
            "@7_45": {
                "LB": "7_45",
                "barcode": [
                    "ATGAGC"
                ],
                "output": [
                    "JMSNRSCF_7_45_s01.fastq.gz",
                    "JMSNRSCF_7_45_s02.fastq.gz",
                    "JMSNRSCF_7_45_s03.fastq.gz",
                    "JMSNRSCF_7_45_s04.fastq.gz"
                ]
            },
            "@8_200": {
                "LB": "8_200",
                "barcode": [
                    "ATGAGC"
                ],
                "output": [
                    "JMSNRSCF_8_200_s01.fastq.gz",
                    "JMSNRSCF_8_200_s02.fastq.gz",
                    "JMSNRSCF_8_200_s03.fastq.gz",
                    "JMSNRSCF_8_200_s04.fastq.gz"
                ]
            },
            "@8_800": {
                "LB": "8_800",
                "barcode": [
                    "CCGTCC"
                ],
                "output": [
                    "JMSNRSCF_8_800_s01.fastq.gz",
                    "JMSNRSCF_8_800_s02.fastq.gz",
                    "JMSNRSCF_8_800_s03.fastq.gz",
                    "JMSNRSCF_8_800_s04.fastq.gz"
                ]
            },
            "@9_200": {
                "LB": "9_200",
                "barcode": [
                    "ACTTCC"
                ],
                "output": [
                    "JMSNRSCF_9_200_s01.fastq.gz",
                    "JMSNRSCF_9_200_s02.fastq.gz",
                    "JMSNRSCF_9_200_s03.fastq.gz",
                    "JMSNRSCF_9_200_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev11": {
                "LB": "BGT_PL01_rev11",
                "barcode": [
                    "GCGGAC"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev11_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev11_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev11_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev11_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev12": {
                "LB": "BGT_PL01_rev12",
                "barcode": [
                    "TTTCAC"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev12_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev12_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev12_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev12_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev13": {
                "LB": "BGT_PL01_rev13",
                "barcode": [
                    "CCGGTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev13_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev13_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev13_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev13_s04.fastq.gz"
                ]
            },
            "@BGT_PL01_rev14": {
                "LB": "BGT_PL01_rev14",
                "barcode": [
                    "ATCGTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL01_rev14_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev14_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev14_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL01_rev14_s04.fastq.gz"
                ]
            },
            "@BGT_PL02_rev15": {
                "LB": "BGT_PL02_rev15",
                "barcode": [
                    "TGAGTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL02_rev15_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev15_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev15_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev15_s04.fastq.gz"
                ]
            },
            "@BGT_PL02_rev16": {
                "LB": "BGT_PL02_rev16",
                "barcode": [
                    "CGCCTG"
                ],
                "output": [
                    "JMSNRSCF_BGT_PL02_rev16_s01.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev16_s02.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev16_s03.fastq.gz",
                    "JMSNRSCF_BGT_PL02_rev16_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev01": {
                "LB": "CAP_PL01_rev01",
                "barcode": [
                    "CGTGAT"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev01_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev01_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev01_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev01_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev02": {
                "LB": "CAP_PL01_rev02",
                "barcode": [
                    "ACATCG"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev02_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev02_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev02_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev02_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev03": {
                "LB": "CAP_PL01_rev03",
                "barcode": [
                    "GCCTAA"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev03_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev03_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev03_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev03_s04.fastq.gz"
                ]
            },
            "@CAP_PL01_rev04": {
                "LB": "CAP_PL01_rev04",
                "barcode": [
                    "TGGTCA"
                ],
                "output": [
                    "JMSNRSCF_CAP_PL01_rev04_s01.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev04_s02.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev04_s03.fastq.gz",
                    "JMSNRSCF_CAP_PL01_rev04_s04.fastq.gz"
                ]
            },
            "@Colette_rev21": {
                "LB": "Colette_rev21",
                "barcode": [
                    "TTCGTC"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev21_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev21_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev21_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev21_s04.fastq.gz"
                ]
            },
            "@Colette_rev22": {
                "LB": "Colette_rev22",
                "barcode": [
                    "CCAACT"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev22_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev22_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev22_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev22_s04.fastq.gz"
                ]
            },
            "@Colette_rev23": {
                "LB": "Colette_rev23",
                "barcode": [
                    "TCAGTT"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev23_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev23_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev23_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev23_s04.fastq.gz"
                ]
            },
            "@Colette_rev24": {
                "LB": "Colette_rev24",
                "barcode": [
                    "CTGACC"
                ],
                "output": [
                    "JMSNRSCF_Colette_rev24_s01.fastq.gz",
                    "JMSNRSCF_Colette_rev24_s02.fastq.gz",
                    "JMSNRSCF_Colette_rev24_s03.fastq.gz",
                    "JMSNRSCF_Colette_rev24_s04.fastq.gz"
                ]
            },
            "@Delaney_rev19": {
                "LB": "Delaney_rev19",
                "barcode": [
                    "GGAACT"
                ],
                "output": [
                    "JMSNRSCF_Delaney_rev19_s01.fastq.gz",
                    "JMSNRSCF_Delaney_rev19_s02.fastq.gz",
                    "JMSNRSCF_Delaney_rev19_s03.fastq.gz",
                    "JMSNRSCF_Delaney_rev19_s04.fastq.gz"
                ]
            },
            "@Delaney_rev20": {
                "LB": "Delaney_rev20",
                "barcode": [
                    "CGAAAC"
                ],
                "output": [
                    "JMSNRSCF_Delaney_rev20_s01.fastq.gz",
                    "JMSNRSCF_Delaney_rev20_s02.fastq.gz",
                    "JMSNRSCF_Delaney_rev20_s03.fastq.gz",
                    "JMSNRSCF_Delaney_rev20_s04.fastq.gz"
                ]
            },
            "@Rae_rev05": {
                "LB": "Rae_rev05",
                "barcode": [
                    "CACTGT"
                ],
                "output": [
                    "JMSNRSCF_Rae_rev05_s01.fastq.gz",
                    "JMSNRSCF_Rae_rev05_s02.fastq.gz",
                    "JMSNRSCF_Rae_rev05_s03.fastq.gz",
                    "JMSNRSCF_Rae_rev05_s04.fastq.gz"
                ]
            },
            "@mystery12264": {
                "LB": "mystery12264",
                "barcode": [
                    "ACTGAT"
                ],
                "output": [
                    "JMSNRSCF_mystery12264_s01.fastq.gz",
                    "JMSNRSCF_mystery12264_s02.fastq.gz",
                    "JMSNRSCF_mystery12264_s03.fastq.gz",
                    "JMSNRSCF_mystery12264_s04.fastq.gz"
                ]
            },
            "@mystery17075": {
                "LB": "mystery17075",
                "barcode": [
                    "GTGGCC"
                ],
                "output": [
                    "JMSNRSCF_mystery17075_s01.fastq.gz",
                    "JMSNRSCF_mystery17075_s02.fastq.gz",
                    "JMSNRSCF_mystery17075_s03.fastq.gz",
                    "JMSNRSCF_mystery17075_s04.fastq.gz"
                ]
            },
            "@mystery4368": {
                "LB": "mystery4368",
                "barcode": [
                    "CGCCTG"
                ],
                "output": [
                    "JMSNRSCF_mystery4368_s01.fastq.gz",
                    "JMSNRSCF_mystery4368_s02.fastq.gz",
                    "JMSNRSCF_mystery4368_s03.fastq.gz",
                    "JMSNRSCF_mystery4368_s04.fastq.gz"
                ]
            },
            "@mystery4895": {
                "LB": "mystery4895",
                "barcode": [
                    "GGGGGG"
                ],
                "output": [
                    "JMSNRSCF_mystery4895_s01.fastq.gz",
                    "JMSNRSCF_mystery4895_s02.fastq.gz",
                    "JMSNRSCF_mystery4895_s03.fastq.gz",
                    "JMSNRSCF_mystery4895_s04.fastq.gz"
                ]
            },
            "@mystery8619": {
                "LB": "mystery8619",
                "barcode": [
                    "AGTTCC"
                ],
                "output": [
                    "JMSNRSCF_mystery8619_s01.fastq.gz",
                    "JMSNRSCF_mystery8619_s02.fastq.gz",
                    "JMSNRSCF_mystery8619_s03.fastq.gz",
                    "JMSNRSCF_mystery8619_s04.fastq.gz"
                ]
            },
            "@mystery9788": {
                "LB": "mystery9788",
                "barcode": [
                    "CCGTCC"
                ],
                "output": [
                    "JMSNRSCF_mystery9788_s01.fastq.gz",
                    "JMSNRSCF_mystery9788_s02.fastq.gz",
                    "JMSNRSCF_mystery9788_s03.fastq.gz",
                    "JMSNRSCF_mystery9788_s04.fastq.gz"
                ]
            }
        },
        "confidence threshold": 0.95,
        "noise": 0.05,
        "transform": {
            "token": [
                "1::6"
            ]
        },
        "undetermined": {
            "output": [
                "JMSNRSCF_undetermined_s01.fastq.gz",
                "JMSNRSCF_undetermined_s02.fastq.gz",
                "JMSNRSCF_undetermined_s03.fastq.gz",
                "JMSNRSCF_undetermined_s04.fastq.gz"
            ]
        }
    },
    "template": {
        "transform": {
            "token": [
                "0::",
                "3::"
            ]
        }
    }
}
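The `token` entries in the configuration above (`1::6`, `0::`, `3::`) can be read as slices over the input segments. Below is a minimal sketch, assuming each token has the form `<segment index>:<start>:<end>` with a zero-based segment index, an inclusive start and an exclusive end, much like Python slicing. The helper names `parse_token` and `extract` and the example reads are hypothetical and are not part of Pheniqs.

```python
def parse_token(pattern):
    """Split '<segment>:<start>:<end>' into (segment, start, end).

    Empty start or end fields become None, meaning "from the beginning"
    or "to the end" of the segment (assumed slice-like semantics).
    """
    fields = pattern.split(":")
    if len(fields) != 3 or fields[0] == "":
        raise ValueError(f"malformed token pattern: {pattern!r}")
    segment = int(fields[0])
    start = int(fields[1]) if fields[1] else None
    end = int(fields[2]) if fields[2] else None
    return segment, start, end


def extract(pattern, segments):
    """Return the part of the input segments that the token selects."""
    segment, start, end = parse_token(pattern)
    return segments[segment][start:end]


# Four made-up input segments, standing in for one read of a 4-segment run.
segments = ["ACGTACGTAC", "GCCTAATT", "TTTT", "GGGGCCCC"]

barcode = extract("1::6", segments)                         # first 6 bases of segment 1
template = [extract(p, segments) for p in ("0::", "3::")]   # segments 0 and 3 in full
```

Under these assumptions the barcode token selects the first six cycles of the second input segment, and the template is built from the first and fourth segments in full.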

Quickstart Example not working for me

Hi!
I would like to use your tool to demultiplex my files. However, when I try the Quickstart example I cannot demultiplex the given files. I am working on a Mac with Python 3.6.
In the attached folder I run
pheniqs mux --config pheniqs.json > report.txt

In the end I would simply like to demultiplex a single FASTQ file (my-example) with a sample code at position 1:5 of the read.

If you could point me to my mistake that would be highly appreciated.
Best,
Adriano

pheniqs_quickstart.zip
my-example.zip

Troubleshooting "SequenceError" error

Hi,

I keep getting a "SequenceError" message that results in "Aborted (core dumped)" while I'm trying to run the demux command. Any idea why this might be the case?

Thanks!

Support for passthrough auxiliary tags

Pheniqs manipulates some of the SAM auxiliary tags during demultiplexing, but there is still the question of how to handle pre-existing tags when the inputs are SAM.

Demultiplexing previously processed SAM files can benefit from carrying over auxiliary tags from input to output. During demultiplexing, input read segments are rearranged into new output read segments. This raises the question of which auxiliary tags of each input segment should be replicated on which output segment.

One brute-force option is to designate a leader segment and copy only its tags to all output segments.

A more subtle option is to extend the transform syntax to list which tags to copy, for instance: <input segment index>[:<two character auxiliary tag code>]+
For example, 0:BC:QT would mean: copy BC and QT from input segment 0.
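The proposed syntax could be parsed roughly as follows. This is only a sketch of the idea in this issue, not existing Pheniqs behavior: the function name `parse_passthrough` is hypothetical, and tag codes are validated against the two-character pattern that the SAM specification uses for optional field tags.

```python
import re

# SAM optional field tags are two characters: a letter, then a letter or digit.
TAG_CODE = re.compile(r"[A-Za-z][A-Za-z0-9]")


def parse_passthrough(spec):
    """Parse '<input segment index>[:<tag code>]+' into (segment, [tag codes])."""
    fields = spec.split(":")
    if len(fields) < 2:
        raise ValueError(f"expected at least one tag code in {spec!r}")
    segment = int(fields[0])  # raises ValueError if not an integer
    if segment < 0:
        raise ValueError(f"segment index must not be negative in {spec!r}")
    tags = fields[1:]
    for code in tags:
        if not TAG_CODE.fullmatch(code):
            raise ValueError(f"invalid auxiliary tag code {code!r} in {spec!r}")
    return segment, tags


segment, tags = parse_passthrough("0:BC:QT")  # copy BC and QT from input segment 0
```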

In terms of implementation, Pheniqs does not need to decode tags it does not interact with. It is sufficient for an Auxiliary object to keep an unordered_map (a hash table) from the 2 character tag code to a byte array pointer. During decoding the pointer to the byte array is populated, and during encoding the byte array can be copied as is.
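The idea can be illustrated with a small model in which a Python dict stands in for the unordered_map. The class below is a hypothetical stand-in and does not reflect the actual Pheniqs Auxiliary class. It assumes each tag arrives as its two-character code plus the undecoded remainder of the BAM auxiliary field (the type character followed by the value bytes).

```python
class PassthroughAuxiliary:
    """Keeps undecoded auxiliary fields so they can be copied to the output as is."""

    def __init__(self):
        # tag code -> raw bytes (type character followed by the value)
        self.raw = {}

    def decode(self, fields):
        """Remember the raw bytes of every tag without interpreting them."""
        for code, payload in fields:
            self.raw[code] = payload

    def encode(self, wanted):
        """Emit the requested tags, in order, by copying the stored bytes."""
        return b"".join(
            code.encode("ascii") + self.raw[code]
            for code in wanted
            if code in self.raw
        )


aux = PassthroughAuxiliary()
aux.decode([
    ("BC", b"ZACGT\x00"),  # type Z: NUL-terminated string
    ("QT", b"ZIIII\x00"),
    ("NM", b"C\x02"),      # type C: unsigned 8-bit integer
])
copied = aux.encode(["BC", "QT"])  # NM is stored but not requested
```

Because the value bytes are never parsed, the per-tag cost is one hash lookup and one byte copy.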

The marginal cost of this will probably be unnoticeable because it will take place while processing threads are waiting for IO.

Build pheniqs as conda package

ping @nizardrou

/opt/conda/conda-bld/pheniqs_1531757120804/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++ -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -Wall -Wsign-compare -I/opt/conda/conda-bld/pheniqs_1531757120804/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/opt/conda/conda-bld/pheniqs_1531757120804/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include -fdebug-prefix-map=/opt/conda/conda-bld/pheniqs_1531757120804/work=/usr/local/src/conda/pheniqs-2.1.0 -fdebug-prefix-map=/opt/conda/conda-bld/pheniqs_1531757120804/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place=/usr/local/src/conda-prefix -std=c++11 -O3 -c -o interface.o interface.cpp
In file included from interface.h:26:0,
                 from interface.cpp:22:
version.h:2:26: warning: ISO C++11 requires whitespace after the macro name
 #define PHENIQS_VERSION_H\n
                          ^
version.h:4:1: error: stray '\' in program
 \n#endif /* PHENIQS_VERSION_H */\n
 ^
version.h:4:3: error: stray '#' in program
 \n#endif /* PHENIQS_VERSION_H */\n
   ^
version.h:4:33: error: stray '\' in program
 \n#endif /* PHENIQS_VERSION_H */\n
                                 ^
version.h:1:0: error: unterminated #ifndef
 #ifndef PHENIQS_VERSION_H

version.h:4:2: error: 'n' does not name a type; did you mean 'yn'?
 \n#endif /* PHENIQS_VERSION_H */\n
  ^
  yn
interface.cpp: In function 'std::__cxx11::string get_cwd()':
interface.cpp:52:23: error: 'InternalError' was not declared in this scope
                 throw InternalError("error " + to_string(errno) + " when probing working directory");
                       ^~~~~~~~~~~~~
interface.cpp: In member function 'virtual void Interface::load_sub_action(const Value&)':
interface.cpp:1107:20: error: 'InternalError' was not declared in this scope
     } else { throw InternalError("interface action must declare a name"); }
[conda@f4ec53437ebf ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/opt/rh/devtoolset-2/root/usr --mandir=/opt/rh/devtoolset-2/root/usr/share/man --infodir=/opt/rh/devtoolset-2/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,fortran,lto --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj --with-isl=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/isl-install --with-cloog=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/cloog-install --with-mpc=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/mpc-install --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)
