qmarcou / igor

IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data. Find full documentation at:

Home Page: https://qmarcou.github.io/IGoR/

License: GNU General Public License v3.0

Makefile 0.51% Shell 0.93% M4 1.33% C++ 8.55% C 85.95% Perl 1.85% XSLT 0.01% Python 0.87%

inference igor recombination immunology simulation hypermutation

igor's Introduction

IGoR: Inference and Generation Of Repertoires

This repository contains all sources and models useful to infer V(D)J recombination related processes for TCR or BCR sequencing data using IGoR.

IGoR’s exhaustive documentation can be found here!

Download the latest release here.

Quick summary

IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data such as:

Recombination model probability distribution
Hypermutation model
Best candidates recombination scenarios
Generation probabilities of sequences (even hypermutated)

The following paper describes the methodology, performance tests and some new biological results obtained with IGoR:

High-throughput immune repertoire analysis with IGoR, Nature Communications, (2018) Quentin Marcou, Thierry Mora, Aleksandra M. Walczak

Its heavily object-oriented and modular design is intended to ensure long-term support and extensibility to new tasks in assessing TCR and BCR features on modern parallel architectures.

IGoR is a free (as in freedom) software released under the GNU-GPLv3 license.

Version

Download the latest release here.

Documentation

A comprehensive documentation of IGoR is available on the internet at: http://qmarcou.github.io/IGoR or in your local installation of IGoR as the ./docs/index.html file.

Contact

For any question or issue please open an issue or email us.

Copying

Free use of IGoR is granted under the terms of the GNU General Public License version 3 (GPLv3).

igor's Issues

NaN likelihood and custom code section

Hello,

I tried to construct a BCR light chain model by modifying the C++ code in the “run_demo” section. The modified code compiled but failed when running inference (I used IMGT germline genes and 1000 mouse kappa chain sequences as test input). It always shows -nan for the likelihoods, and all values in the final_marginals files are 0. I modified the “probability threshold” as suggested, but it didn't help. Some parameters need to be set when calling the C++ API; is there detailed documentation for the C++ API?

Thanks,
Brian

Compilation Issue with igor.1 missing

When running make:

make[2]: *** No rule to make target 'igor.1', needed by 'all-am'. Stop.

When the autogen.sh runs, I get the following:

pandoc -s -f markdown -t man README.md -o igor.1
pandoc: README.md: openFile: does not exist (No such file or directory)

Doing:
asciidoc -b docbook README.adoc
pandoc -f docbook -t markdown_strict README.xml -o README.md
(from https://tinyapps.org/blog/nix/201701240700_convert_asciidoc_to_markdown.html)

Seems to resolve the issue.

Perhaps these steps should be added to the autogen.sh? Or am I missing some dependency?

CDR3 anchors information and other models/ref_genome information

Hi Quentin,

How did you obtain the indices of where the anchor index is for the J and V genes in your models? Is there implementation or a feature in IGoR so that it can do this? If not, what was the method you used? IMGT doesn't appear to contain the anchor indices information.

Additionally, how did you decide which genes to include and which to leave out? For example, let's consider the information in models/human/bcr_heavy/ref_genome. I am attempting to see where you got these values from using IMGT, specifically the hyperlinks in the table two-thirds of the way down the page at http://www.imgt.org/vquest/refseqh.html. F+ORF+all P IGHV Human has 477 sequences (http://www.imgt.org/genedb/GENElect?query=7.2+IGHV&species=Homo+sapiens). Your genomicVs.fasta file in the igor_1-3-0/models/human/bcr_heavy/ref_genome directory has only 97. Why is it so small compared to the IMGT files? Were those the only ones available at the time?

Thanks,
Zach

make check failures but make install worked

Hi,

I installed IGoR 1-4, apparently successfully: make and make install raised no errors. I used gcc 6.3.1 20170216 on an HPC running CentOS Linux release 7.6.1810. However, make check stopped with the message below. Is this something I should be concerned about when running IGoR?

Making check in cdf
make[3]: Entering directory '/gscratch/stf/zachmon/software/igor_1-4-0/libs/gsl_sub/cdf'
make  test
make[4]: Entering directory '/gscratch/stf/zachmon/software/igor_1-4-0/libs/gsl_sub/cdf'
make[4]: *** No rule to make target '../randist/libgslrandist.la', needed by 'test'.  Stop.
make[4]: Leaving directory '/gscratch/stf/zachmon/software/igor_1-4-0/libs/gsl_sub/cdf'
Makefile:966: recipe for target 'check-am' failed
make[3]: *** [check-am] Error 2
make[3]: Leaving directory '/gscratch/stf/zachmon/software/igor_1-4-0/libs/gsl_sub/cdf'
Makefile:790: recipe for target 'check-recursive' failed
make[2]: *** [check-recursive] Error 1
make[2]: Leaving directory '/gscratch/stf/zachmon/software/igor_1-4-0/libs/gsl_sub'
Makefile:357: recipe for target 'check-recursive' failed
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory '/gscratch/stf/zachmon/software/igor_1-4-0/libs'
Makefile:553: recipe for target 'check-recursive' failed
make: *** [check-recursive] Error 1

Thanks,
Zach

Failed to create the man page - where is README.md?

Hello,

The installation fails when igor.1, the man page, is not available. After a glance at autogen.sh I understand that it is generated with pandoc from README.md, but I cannot locate the README.md file. I tried using README.adoc instead, but that bails out with too many errors, so I conclude that a file may be missing from the archive. Then again, the build_release script has a comment about README.md but then modifies README.adoc, which suggests that the two are the same file. I must admit that I would not mind seeing igor.1 distributed directly.

I spotted this issue while preparing a Debian package for IGoR, and I hope the package is of interest to you. If there are other tools for immune repertoire interpretation that you would like to see in the distribution, please let me know. MIXCR and changeo are next. A (partial) overview of biological software already shipping with Debian and its derivative distributions is listed on https://blends.debian.org/med/tasks/bio.

Many thanks and regards,

Steffen

Option to print just CDR3s

When creating very large synthetic repertoires, sometimes we want to save space and generate just the CDR3s. I found a way to modify the code to comment out the lines that write out the full sequences (lines 542 and 612 in the version of GenModel.cpp from late February)

outfile_ind_seq<<seq<<";"<<sequence.first<<endl;
outfile_ind_seq<<index<<";"<<(*iter).first<<endl;

Would it be possible to add a command line option to control this behavior (e.g. --no_full_seq or --cdr3_only)?

Issue: sample size of more than 100000 sequences

Hello,

I'm trying to run Igor on my T cell receptor beta chain sequences, and everything works great until my sample size is above 100,000 sequences.

I'm getting the following error when using -evaluate command:

[IGoR] ERROR: Exception caught while reading J alignments before inference/evaluation. Make sure alignments were carried previously using "-align --J" or "-align --all" with similar path parameters (working directory, batchname, ...)

I have done -align -all, just like I did for all my other samples, and the J_alignments file was generated in the aligns folder and looks fine. I tried splitting the sample into 4 files and processing each separately, which worked perfectly, so it shouldn't be a problem with the sequences. It is only when I use the whole file that I get that error.

Do you have any advice on how to get around this, or are there limitations on file size?
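The workaround mentioned above (splitting the sample and processing each part separately) could be scripted roughly as follows. This is a minimal sketch, assuming the input is a FASTA file; the function name split_fasta_records is hypothetical and not part of IGoR, and the sketch says nothing about why the full file fails.

```python
def split_fasta_records(text, n_chunks):
    """Split FASTA text into at most n_chunks pieces, keeping records whole."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])        # a header starts a new record
        elif line.strip() and records:
            records[-1].append(line)      # sequence line of the latest record
    # Ceiling division so that every record lands in some chunk.
    size = max(1, -(-len(records) // n_chunks))
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    return ["\n".join("\n".join(rec) for rec in chunk) + "\n" for chunk in chunks]
```

Each returned string can be written to its own file and passed to a separate run with its own batch name.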

Thanks,
Kristina

Format of genomic references files

I have a question regarding the input FASTA files for the genomic references. Are these files supposed to follow a specific format, like IMGT's annotation for example? It looks like IGoR expects the header to be the gene name of the sequences. That would mean that I have to apply some pre-processing to the IMGT reference files before I can use them. Is this correct?
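If such pre-processing turns out to be needed, it could look like the sketch below. It assumes IMGT-style headers in which fields are separated by '|' and the gene/allele name is the second field (as in M11951|TRBV24-1*01|Homo sapiens|...); the function names are hypothetical, and whether IGoR actually requires bare gene names is exactly the open question here.

```python
def gene_name_from_imgt_header(header):
    """Return the gene/allele field of an IMGT-style FASTA header."""
    fields = header.lstrip(">").split("|")
    # IMGT headers list the accession first and the gene/allele name second.
    return fields[1].strip() if len(fields) > 1 else fields[0].strip()


def simplify_fasta_headers(text):
    """Rewrite every header line to '>gene_name'; sequence lines are untouched."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(">" + gene_name_from_imgt_header(line))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```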

Cheers, Wout

Separating pygor from IGoR repo.

Hi Quentin,

I created this separate issue (based on pull request #26) to continue our discussion of pygor development.

Since I'm working a lot with IGoR and pygor at the moment (check out the following git repo: https://github.com/penuts7644/TcrDataComparison), I would like to be more involved with the development of pygor and its codebase. Many of the code pieces I'm writing in the git repo above could be additions to pygor, especially as my thesis work progresses.

To make pygor and IGoR easier to maintain, I propose the following:

Separate pygor into another GitHub repository, from there pygor can be maintained and developed on its own, separately from IGoR.
New versions of pygor can be committed to PyPI without having to make a new release of IGoR. This also applies the other way around where a new IGoR release doesn't have to release a new version of pygor, even when nothing has changed to pygor's code. This streamlines pygor's installation process (in my case for server use).
Have the possibility to expand pygor's codebase without IGoR.
Let pygor have its own documentation including documentation webpage.

I would love to spend more of my time on the development process of pygor and I have no problem to take some initiative to set all of this up.
Please let me know what you think 😄

Cheers, Wout

General guidelines for Pgen

Hi Quentin,

I've been playing around with IGoR a bit after having read the paper and I'm wondering if you can give some suggestions about how to go through the process of estimating Pgen for small datasets:

Say for instance we have anywhere from 50 to a few hundred human single-chain TCR sequences that are also epitope-specific. Would you recommend simply using the model that comes with IGoR to estimate Pgen, or do you anticipate any improvement in first using -infer to update the model, even though there are relatively few sequences and they are not representative of random selection from the repertoire?
I recall from the paper that you recommend considering at least 50 scenarios for each somatic recombination event. But when I set --scenarios to any value, I can't see evidence in the logs or the output that the number of scenarios I specified is actually being used. Perhaps I'm looking in the wrong place. Can you advise? Or perhaps estimating Pgen doesn't benefit from considering more than the 10 most likely scenarios?

Thanks!

Insertions prevent consideration of alignments

Top-scoring alignments with scores well over the relevant threshold seem to be discarded when they also contain insertions. For instance, the following V alignment will not be included in the n_V_aligns for seq_index 4 in inference_logs.txt: 4;TRAV29/DV502;1128;-4;{52,114};{};{0,111,218,219};275;0;272
But the alignment will be included if the insertions are no longer present: 4;TRAV29/DV502;1128;-4;{};{};{0,111,218,219};275;0;272

Thus far I've only tested this with V alignments, but it seems that even a single insertion, regardless of where it is in the sequence, has this effect. I'm wondering if this is intentional or if I've missed some optional toggle to allow alignment insertions. I'm unsure of the broad implications that this would have, but for my purposes it prevents estimation of Pgen for sequences that do have high-quality alignments but also have putative insertions.
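To compare alignment rows with and without insertions, the rows can be parsed as sketched below. The sketch assumes the column layout seq_index;gene_name;score;offset;insertions;deletions;mismatches;length;5_p_align_offset;3_p_align_offset; the function names are hypothetical and this is not IGoR's own parser.

```python
FIELDS = ["seq_index", "gene_name", "score", "offset", "insertions",
          "deletions", "mismatches", "length",
          "5_p_align_offset", "3_p_align_offset"]


def parse_index_set(text):
    """Turn '{52,114}' into [52, 114] and '{}' into []."""
    inner = text.strip().strip("{}")
    return [int(x) for x in inner.split(",")] if inner else []


def parse_alignment_line(line):
    """Parse one semicolon-separated alignment row into a dict."""
    # Split the index off the left and the eight trailing fields off the
    # right, so the gene name stays in one piece whatever it contains.
    seq_index, rest = line.strip().split(";", 1)
    parts = rest.rsplit(";", 8)
    record = dict(zip(FIELDS, [seq_index] + parts))
    for key in ("seq_index", "offset", "length",
                "5_p_align_offset", "3_p_align_offset"):
        record[key] = int(record[key])
    record["score"] = float(record["score"])
    for key in ("insertions", "deletions", "mismatches"):
        record[key] = parse_index_set(record[key])
    return record
```

Filtering on a non-empty record["insertions"] then lists the rows affected by the behaviour described above.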

wrong default path for genomic templates in 1.2

Hi, Quentin!

Version 1.2 seems to crash on the demo example because it looks for genomic templates in the wrong place by default. The same code runs fine on the previous version.

WDPATH=. ./igor -set_wd $WDPATH -batch foo -species human -chain beta -align --all

returns following:

Batch name set to: foo_
Species parameter set to: human
Chain parameter set to: beta
Working directory set to: "./"
[IGoR] ERROR: Exception caught while reading TRB V genomic templates.
[IGoR] ERROR: File not found: /usr/local/share/igor/models/human/tcr_beta/ref_genome/genomicVs.fasta

Best, Misha.

Questions in alignment results

Hi Quentin @qmarcou,

I am a new bioinformatics postdoc in Prof. Frederick Alt's lab. I have read your 2018 Nature Communications paper introducing the fantastic tool IGoR, and I am now trying to apply IGoR to our mouse data. However, I have some questions about IGoR, starting with the align step.

(1) I have run through the IGoR demo, written a script to display the alignments, and compared them with IgBLAST results. I made two strange observations: 1) The D and J alignments often have a one-bp mismatch at the 5' end, which is dropped in IgBLAST. 2) Although I saw no mismatch in the alignment, IGoR -align reported many mismatch indices.

For example, the third read sequence in the demo (seq_index = 2) is TCCCCAACCAGACAGCTCTTTACTTCTGTGCCACCAGTGACCCGGGTACAACGACGAGCA, and IGoR top alignments include:

#seq_index;gene_name;score;offset;insertions;deletions;mismatches;length;5_p_align_offset;3_p_align_offset
2;M11951|TRBV24-1*01|Homo sapiens|F|V-REGION|189..476|288 nt|1| | | | |288+0=288| | |;195;-244;{};{};{40,41,42};40;0;39
2; TRBD1*01;25;8;{};{};{8,9,15,16,17,18,19};6;9;14
2;M14159|TRBJ2-7*01|Homo sapiens|F|J-REGION|2316..2362|47 nt|2| | | | |47+0=47| | |;35;48;{};{};{49,50,52};8;52;59

With my scripts, I displayed the alignment, as shown below

V D J target length = 288 12 47
V TRBV24-1*01, obs 1:40:40 , ref 245:500:256  (1-based start:end:length)
obs TCCCCAACCAGACAGCTCTTTACTTCTGTGCCACCAGTGA
ref ........................................TTTG
D TRBD1*01, obs 10:15:6 , ref 2:7:6
obs AGACAG
ref G.....
J TRBJ2-7*01, obs 53:60:8 , ref 5:12:8
obs GACGAGCA
ref T.......

On the other hand, IgBLAST result is

V D J target length = 288 1000 47
V TRBV24-1*01, obs 1:40:40 , ref 245:284:40
obs TCCCCAACCAGACAGCTCTTTACTTCTGTGCCACCAGTGA
ref ........................................
D -, obs 41:40:0 , ref 1:0:0
obs 
ref 
J TRBJ2-7*01, obs 54:60:7 , ref 6:12:7
obs ACGAGCA
ref .......

I looked at some other examples and see the one-bp mismatch at the 5' end of D or J in the IGoR align results but not in the IgBLAST results, and I do not know why. I am also not sure why IGoR reports more mismatch indices than are actually displayed (maybe they lie outside the alignment region?).
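The spans in the listing above can be reproduced from the alignment rows with a small helper. This is a sketch under an assumption inferred only from the three example rows, not from IGoR's documentation: all indices are 0-based read positions, and the template position equals the read position minus the offset. The function name aligned_region is hypothetical.

```python
def aligned_region(offset, five_p, three_p, mismatches):
    """Map an alignment row onto 1-based read and template coordinates.

    Assumption: template position = read position - offset (0-based).
    Returns the read span, the template span and the mismatch indices
    that fall inside the aligned part of the read.
    """
    read_span = (five_p + 1, three_p + 1)
    ref_span = (five_p - offset + 1, three_p - offset + 1)
    inside = [m for m in mismatches if five_p <= m <= three_p]
    return read_span, ref_span, inside
```

Under that assumption, the V row has no mismatch inside read positions 0 to 39, while the D and J rows each have exactly one, at their first aligned position. That matches the single 5' mismatch displayed for D and J and would be consistent with the remaining indices lying outside the aligned region.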

(2) In order to run IGoR -align, I am preparing input files. Along with three VDJ sequence fasta files, I saw V_gene_CDR3_anchors.csv and J_gene_CDR3_anchors.csv containing anchor_index for most V,J segments. I examined IMGT annotation (eg. http://www.imgt.org/ligmdb/view?id=U66059) and guessed the anchor_index for Vs is the start position of 2nd-CYS, and the anchor_index for Js is the start position of J-MOTIF (both 0-based). Is this guess correct?
I read IGoR document webpage, which says "The index should correspond to the first letter of the cysteine (for V) or tryptophan/phenylalanin (for J) for the nucleotide sequence of the gene.", and "If the considered sequences are nucleotide CDR3 sequences (delimited by its anchors on 3' and 5' sides) using the command --ntCDR3 alignments will be performed using gene anchors information as offset bounds.". So if I do not use --ntCDR3, is it necessary to provide the anchor_index for V and J?

Thank you very much for your kind attention. Looking forward to your reply.

Best regards,
Adam Yongxin Ye

about "iteration"of -evaluate

Hi Quentin,

When I tried IGoR on my small test dataset, I found that -evaluate has no option to set the number of iterations N (as shown on the webpage), unlike -infer --N_iter 5, which applies only to inference. As a result, -evaluate iterates only once according to my results, and the mean log-likelihood it reports (~ -17) is not the best one: with 5 iterations in -infer the likelihood reaches a plateau (~ -13, better than -17). How can I set the number of iterations in -evaluate? Or should I pass final_parms.txt and final_marginals.txt to -set_custom_model for -evaluate? In your web documentation, -set_custom_model is described as follows:

Use a custom model as a baseline for inference or evaluation. Note that this will override custom genomic templates for inference and evaluation. Alternatively, providing only the model parameters file will lead IGoR to create model marginals initialized to a uniform distribution

This suggests that it takes a custom model supplied by the user. Thanks a lot!

Best,

Decen

Limiting IGoR's CPU usage

Installation issue on Mac OS X

First of all, congratulations for this fantastic software!
Unfortunately, I had a small issue during installation on a MacBook Pro 15" (late 2011, OS X version 10.11.6). The last step of the installation procedure (the make install command) aborted while trying to create the share/igor directory in my home directory:
mkdir: /Users/lupo/share/igor: No such file or directory
The same occurred on a newer MacBook laptop, so it would seem to be a general issue when trying to install IGoR on Mac OS X. Notice also that an analogous error occurred when trying to install IGoR in the default place with the administrator privileges.
I found a workaround, that is manually creating that directory and then launching again the make install command. IGoR seems to work perfectly now!
Finally, I would suggest the following guide, which I found very useful when trying to install the GNU gcc compiler on Mac OS X directly from the source (i.e. not via MacPorts or HomeBrew).

Install error

Hi, I get the following errors when installing IGoR:

include/jemalloc/internal/../jemalloc.h:59:28: error: redefinition of 'aligned_alloc'

# define je_aligned_alloc aligned_alloc

src/jemalloc.c:1978:1: note: in expansion of macro 'je_aligned_alloc'
je_aligned_alloc(size_t alignment, size_t size) {
^~~~~~~~~~~~~~~~
In file included from include/jemalloc/internal/jemalloc_internal_decls.h:50:0,
from include/jemalloc/internal/jemalloc_preamble.h:5,
from src/jemalloc.c:2:
/root/anaconda3/x86_64-conda_cos6-linux-gnu/sysroot/usr/include/stdlib.h:513:21: note: previous definition of 'aligned_alloc' was here
static inline void* aligned_alloc (size_t al, size_t sz)

Could you help me?
My CentOS version is 7.0.
The commands I ran:
./configure
make (fails with the errors above)

Visualising insertion distributions with IGoR

Hi there,

I'm trying to access the insertion/deletion/gene-choice/etc probabilities from an inferred IGoR model for visualisation using pygor. I imported the pygor.models.genmodel.GenModel object and ran

g = GenModel(<path_to_parms_file>,<path_to_marginals_file>)

for the final_parms.txt and final_marginals.txt files in the igor/evaluate directory. Sticking with VD-insertions for now as an example, I could find two entries that looked promising:

g.marginals[0]["vd_ins"] gives me a vector of what look like probabilities, presumably of different numbers of insertions.
g.get_event("vd_ins", True).realizations gives me a list of Event_realization objects of the same length as the marginal array from the previous point. Each realisation R has an associated integer accessible by R.value (or R.index - the two are always the same).

To get from here to a plot I need some more information about how data is organised in the GenModel object, which I'd be grateful if you could shed some light on:

Are the probabilities in the marginals array in order (i.e. element 0 is the probability of 0 insertions, element 1 the probability of 1 insertion, and so on)? If not, what is the order based on?
Are the realization.value integers the number of insertions associated with each realization? What does the order of realizations in the event tell me?
What's the relationship between the array of probabilities and the list of realizations?

Some of these questions are probably redundant. :-)

Thanks!
Will

Allelic variants in generated repertoire

Hi Quentin

I haven't managed to find any information about how IGoR handles the allelic variants present in the models. In the default models some IGHV, TRAV and TRBV genes have several alleles (up to 7), and all of them have non-zero probability. Obviously, having more than 2 alleles of one gene in one repertoire is not realistic (chimeras aside).

It seems that I have to edit the models manually so that each gene has no more than 2 alleles. What is the proper way to do this? Should I just set the probabilities of all alleles except the two most frequent to zero and recalculate the probabilities of those two?
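The renormalisation step proposed above could be sketched as follows. This only illustrates the arithmetic on a plain dictionary of allele probabilities for one gene; it does not read or write IGoR model files, the function name keep_top_alleles is hypothetical, and whether this is the recommended way to edit a model remains the open question. Any other model entries that depend on the removed alleles are not handled.

```python
def keep_top_alleles(allele_probs, n_keep=2):
    """Zero all but the n_keep most probable alleles, preserving the total mass."""
    total = sum(allele_probs.values())
    kept = sorted(allele_probs, key=allele_probs.get, reverse=True)[:n_keep]
    kept_mass = sum(allele_probs[a] for a in kept)
    if kept_mass == 0:
        return dict(allele_probs)   # nothing to rescale
    # Rescale the kept alleles so the gene keeps its original total probability.
    return {a: (allele_probs[a] * total / kept_mass if a in kept else 0.0)
            for a in allele_probs}
```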

Best, Anna

NaN values for (alpha) chain recombination probabilities!

Hi Quentin,

I noticed that for calculating alpha chain probabilities I have the problem of almost always getting nan-values in the output. I looked at the sequences but I do not see a pattern. At the same time nan-values are an exception for my beta-chains.
Do you have an idea about what is going wrong there?

Best,
Jonas

request the command line for "-run_demo"

Hi Qmarcou,

Could you please post the command lines equivalent to "-run_demo"? It is not easy for a layman like me to know where to start. If I (and other laymen like me) had the original commands, I could follow them to learn. At the beginning of TCR analysis it is really hard to understand the code and descriptions, since I have no experience with this. Thanks a lot!

Best,

Decen

Continuing a random number sequence

In our research, we anticipate the need to generate very large synthetic repertoires. Since this process can take a long time, it would be nice to have the ability to pick up the random number sequence where we left off so that the repertoire generation does not need to be done as a single compute job.

Although we can probably choose a new seed for each run - for a 64 bit random number generator it is highly unlikely that we would choose a seed that overlaps the previous sequence - it would be better to continue where we stopped.
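The idea of continuing a random number sequence across jobs can be illustrated with Python's standard library generator. This is only a sketch of the concept: it does not use IGoR's C++ generator, and the seed and sample counts are arbitrary.

```python
import pickle
import random


def generate(rng, n):
    """Draw n numbers from the given generator."""
    return [rng.random() for _ in range(n)]


rng = random.Random(42)
first_part = generate(rng, 3)
checkpoint = pickle.dumps(rng.getstate())   # could be written to disk between jobs

resumed = random.Random()
resumed.setstate(pickle.loads(checkpoint))  # restore the saved generator state
second_part = generate(resumed, 3)

reference = generate(random.Random(42), 6)  # one uninterrupted run
```

Here first_part followed by second_part is identical to reference, which is the behaviour requested above: a second job picks up the stream exactly where the first one stopped.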

This is a low-priority request and we are happy to assist.

Segmentation Fault in -run_demo

Hello,

I believe I have installed IGoR successfully with ./config --prefix=$(pwd) && make && make install. When running the demo, I run into a segmentation fault after alignment during construction of the model. The alignment files seem to populate properly and have contents in them. Below is the output from command line in -run_demo:

Running demo code
Working directory set to: "/tmp/"
Reading genomic templates
Reading sequences and aligning
mkdir: cannot create directory ‘/tmp/igor_demo/’: File exists
V_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
D_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
J_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Alignments procedure lasted: 10.6972 seconds
for 300 TCRb sequences of 60bp(from murugan and al), against 89 Vs,3 Ds, and 15 Js full sequences
Construct the model
Segmentation fault

We appreciate your help with this tool and look forward to running it soon. Below is the output and stack trace from gdb:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file ./bin/igor
Reading symbols from /users/PAS0472/osu8725/tools/igor_1-4-0/bin/igor...done.
(gdb) run -run_demo
Starting program: /users/PAS0472/osu8725/tools/igor_1-4-0/./bin/igor -run_demo
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/apps/gnu/8.4.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
        add-auto-load-safe-path /apps/gnu/8.4.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/users/PAS0472/osu8725/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/users/PAS0472/osu8725/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
Running demo code
Working directory set to: "/tmp/"
Reading genomic templates
Reading sequences and aligning
Detaching after fork from child process 116403.
mkdir: cannot create directory ‘/tmp/igor_demo/’: File exists
[New Thread 0x2aaaacba8700 (LWP 116404)]
[New Thread 0x2aaaacda9700 (LWP 116405)]
[New Thread 0x2aaaacfaa700 (LWP 116406)]
[New Thread 0x2aaaad1ab700 (LWP 116407)]
[New Thread 0x2aaaad3ac700 (LWP 116408)]
[New Thread 0x2aaaad5ad700 (LWP 116409)]
[New Thread 0x2aaaad7ae700 (LWP 116410)]
[New Thread 0x2aaaad9af700 (LWP 116411)]
[New Thread 0x2aaaadbb0700 (LWP 116412)]
[New Thread 0x2aaaaddb1700 (LWP 116413)]
[New Thread 0x2aaaadfb2700 (LWP 116414)]
[New Thread 0x2aaaae1b3700 (LWP 116415)]
[New Thread 0x2aaaae3b4700 (LWP 116416)]
[New Thread 0x2aaaae5b5700 (LWP 116417)]
[New Thread 0x2aaaae7b6700 (LWP 116418)]
[New Thread 0x2aaaae9b7700 (LWP 116419)]
[New Thread 0x2aaaaebb8700 (LWP 116420)]
[New Thread 0x2aaaaedb9700 (LWP 116421)]
[New Thread 0x2aaaaefba700 (LWP 116422)]
[New Thread 0x2aaaaf1bb700 (LWP 116423)]
[New Thread 0x2aaaaf3bc700 (LWP 116424)]
[New Thread 0x2aaaaf5bd700 (LWP 116425)]
[New Thread 0x2aaaaf7be700 (LWP 116426)]
[New Thread 0x2aaaaf9bf700 (LWP 116427)]
[New Thread 0x2aaaafbc0700 (LWP 116428)]
[New Thread 0x2aaaafdc1700 (LWP 116429)]
[New Thread 0x2aaaaffc2700 (LWP 116430)]
[New Thread 0x2aaab01c3700 (LWP 116431)]
[New Thread 0x2aaab03c4700 (LWP 116432)]
[New Thread 0x2aaab05c5700 (LWP 116433)]
[New Thread 0x2aaab07c6700 (LWP 116434)]
[New Thread 0x2aaab09c7700 (LWP 116435)]
[New Thread 0x2aaab0bc8700 (LWP 116436)]
[New Thread 0x2aaab0dc9700 (LWP 116437)]
[New Thread 0x2aaab0fca700 (LWP 116438)]
[New Thread 0x2aaab11cb700 (LWP 116439)]
[New Thread 0x2aaab13cc700 (LWP 116440)]
[New Thread 0x2aaab15cd700 (LWP 116441)]
[New Thread 0x2aaab17ce700 (LWP 116442)]
V_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
D_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
J_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Alignments procedure lasted: 11.541 seconds
for 300 TCRb sequences of 60bp(from murugan and al), against 89 Vs,3 Ds, and 15 Js full sequences
Construct the model

Program received signal SIGSEGV, Segmentation fault.
0x0000000000451b66 in operator= (other=..., this=0x7fffffff7590) at Utils.h:139
139                     for(int i = 0 ; i != rows*cols ; i++){
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64
(gdb) bt
#0  0x0000000000451b66 in operator= (other=..., this=0x7fffffff7590) at Utils.h:139
#1  Dinucl_markov::Dinucl_markov(Gene_class) () at Dinuclmarkov.cpp:43
#2  0x000000000041e516 in main () at main.cpp:1621
#3  0x00002aaaabdfc545 in __libc_start_main () from /lib64/libc.so.6
#4  0x00000000004281d9 in _start () at Dinuclmarkov.cpp:583

Thanks again,

Altan

How to get the clonotype of each sequence?

Hi, I am starting to learn the IGoR software. I use the following commands:
(1).igor -set_wd $WDPATH -batch foo -read_seqs ./demo/demo.fasta
(2).igor -set_wd $WDPATH -batch foo -species human -chain IGL -align --all -threads 25
(3).igor -set_wd $WDPATH -batch foo -species human -chain IGL -evaluate -output --scenarios 10
(4).igor -set_wd $WDPATH -batch bar -species human -chain IGL -generate 100
and I get the foo_output:
seq_index;scenario_rank;scenario_proba_cond_seq;GeneChoice_V_gene_Undefined_side_prio7_size69;GeneChoice_J_gene_Undefined_side_prio6_size7;Deletion_V_gene_Three_prime_prio5_size21;Deletion_J_gene_Five_prime_prio5_size23;Insertion_VJ_gene_Undefined_side_prio4_size41;DinucMarkov_VJ_gene_Undefined_side_prio3_size16;Mismatches
39959;1;1;(4);(3);(8);(7);(4);(1,3,1,3);(1,2,3,5,7,8,9,11,13,14,15,16,18,19,20,125,156)
38201;1;0.382984;(41);(2);(2);(4);(0);();(0,2,3,4,6,7,8,9,12,14,17,18,19,48,58,67,166)
38201;2;0.245643;(41);(1);(2);(4);(0);();(0,2,3,4,6,7,8,9,12,14,17,18,19,48,58,67,166)
38201;3;0.193471;(41);(2);(2);(5);(1);(3);(0,2,3,4,6,7,8,9,12,14,17,18,19,48,58,67,166)

How can I get clonotype information for each sequence, such as:

clonotype identifier	representative query sequence name	count	frequency (%)	CDR3 nucleotide sequence	CDR3 amino acid sequence	productive status	chain type	V gene	D gene	J gene
1a	number:1_length:188_4051	7	0.0241	CAGTCCTATGACAGCAGCCTGAGTGGTGCGGTG	QSYDSSLSGAV	No	VL	IGLV1-40*01,IGLV1-40*02	N/A	IGLJ3*02
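As a first step towards a per-sequence summary, the scenario output shown above can be reduced to the top-ranked scenario of each sequence. This is a minimal sketch assuming the semicolon-separated layout of foo_output with seq_index and scenario_rank columns; the function name best_scenarios is hypothetical. The parenthesised values are kept as raw strings, and turning them into gene names or CDR3 sequences is not covered here.

```python
def best_scenarios(lines):
    """Return {seq_index: row dict} for the rank-1 scenario of each sequence."""
    header = lines[0].strip().split(";")
    best = {}
    for line in lines[1:]:
        if not line.strip():
            continue                       # skip blank lines
        row = dict(zip(header, line.strip().split(";")))
        if int(row["scenario_rank"]) == 1:
            best[int(row["seq_index"])] = row
    return best
```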

Standardize CSV output file separator.

I have been working with the CSV files that IGoR produces as output, but they are inconsistent in their choice of column separator. For example, generated_realizations_werr.csv and generated_seqs_werr.csv use the ; character as separator, while the generated_seqs_werr_CDR3_info.csv file uses a , comma to separate the fields. Is it possible to make this more consistent by using ; for all of these files?
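Until the separators are unified, a reader can guess the separator from the header line. This is a minimal sketch, assuming only ';' and ',' occur as separators and that the header contains at least one of them; the function names are hypothetical.

```python
import csv
import io


def detect_delimiter(header_line, candidates=(";", ",")):
    """Pick the candidate separator that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_table(text):
    """Parse CSV text into rows, using the separator guessed from the header."""
    lines = text.splitlines()
    delimiter = detect_delimiter(lines[0]) if lines else ";"
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```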

Confusion regarding ntCDR3 / set_CDR3_anchors

Hi @qmarcou,

I'm a bit confused about use of the ntCDR3 option in standard analysis and have had trouble inferring how exactly to use it in typical analysis by looking through the code. If for instance I have a set of CDR3 sequences that are additionally annotated with V and J information, does --ntCDR3 allow for alignment and downstream analysis (in particular, Pgen calculation) while maintaining the known V/J annotations? Any chance for a brief tutorial on this option included in the demo?

Thanks for your help!

File not found when IGoR is installed locally

Hello, I am getting the following error:

[IGoR] ERROR: Exception caught while reading TRA V genomic templates.
[IGoR] ERROR: File not found: /usr/local/share/igor/models/human/tcr_alpha/ref_genome/genomicVs.fasta
[IGoR] ERROR: Use "man igor", "igor -help" or visit https://github.com/qmarcou/IGoR to see available commands and their effects

It appears to be related to installing igor locally since these files do exist but they are located at: ~/share/igor/models/human/tcr_alpha/ref_genome/genomicVs.fasta

Thanks for your help!

read-seqs input parameter improvement

Hi Quentin,

According to the documentation for the -read_seqs parameter, the input CSV file should be formatted with the sequence index as the first column and the sequence as the second, separated by a semicolon ';'.

I would think that I would be able to pass in a CSV file with multiple semicolon separated columns and that IGoR will only use the first two. However, what happens is that each line is only separated on the first semicolon character found in that line. This means that the second column is combined with the remaining columns.

Example:

The line index;sequence;other_data turns into: index as first column and sequence;other_data as second column.
I would expect the following to happen: index as first column and sequence as second column.

Is there a reason for this behaviour?
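In the meantime, extra columns can be dropped before the file is handed to IGoR. This is a minimal sketch of such a pre-processing step; the function name keep_index_and_sequence is hypothetical and not part of IGoR.

```python
def keep_index_and_sequence(lines, sep=";"):
    """Yield 'index;sequence' lines, dropping any further columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                     # skip empty lines
        fields = line.split(sep)
        yield sep.join(fields[:2])       # keep only the first two columns
```

The yielded lines can be written to a new file that contains exactly the two columns the documentation describes.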

Cheers, Wout

Chain IGoR commands

Hi Quentin,

Is it possible to chain IGoR commands? When I try to run something like this, it seems to work fine...

igor -set_wd $WDPATH -batch foo -read_seqs test.txt -species human -chain beta -align --all -evaluate -output --scenarios 10

or do I have to separate the commands and run them one by one?

Cheers, Wout

Add built-in support for mouse IgH analysis

Hi, Quentin
Could you add a database to IGoR so that it supports mouse IgH analysis? In addition, I have some mouse IgH sequencing data; could you show me how to build a customized mouse IgH model step by step? Highly appreciated!

Best
Zhou

Segmentation fault during inference

Hi Quentin,

I have a recurrent "Segmentation fault" that crashes IGoR with no error message during inference. It happens both on my laptop and on the cluster, usually at the start of an iteration (rarely the first one). The iteration at which the crash happens seems to be random, even with the same sequences; sometimes the run even finishes without a crash (especially with a small number of sequences, say < 10000). The alignment part always goes smoothly. I tested that it is not tied to a particular sequence. Additionally, when there is no crash, final_marginals.txt is produced and looks reasonable.

I'm providing IGoR with my own genomic files and model_parms.txt and running it with the following commands:

cmd="igor -set_wd $wdpath -batch foo"
$cmd -read_seqs $sequences
cmd="$cmd -set_genomic --V $mygenomicVs --J $mygenomicsJs --D $mygenomicsDs -set_CDR3_anchors --V $myanchorsV --J $myanchorsJ"  
$cmd -align --V
$cmd -align --D --thresh 30
$cmd -align --J
$cmd -infer -set_custom_model $mymodel_params

I'm using the current version of IGoR, compiled with

./configure --prefix=$HOME/.local/ && make && make install

I've tried to run it through valgrind to pinpoint where the error happens, so I've compiled IGoR with debug flags:

./configure --prefix=$HOME/.local/ CPPFLAGS=-DDEBUG CXXFLAGS="-g -O0" && make && make install

And ran the same command as before (without the alignment part)

cmd="valgrind --leak-check=yes $cmd"
$cmd -infer -set_custom_model $mymodel_params

At the start of the inference process (just after "Initialization of probability bounds over."), I start to get multiple valgrind errors, with the first one being:

==24496== Invalid read of size 4
==24496==    at 0x169635: Dinucl_markov::update_event_internal_probas(std::unique_ptr<long double [], std::default_delete<long double []> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&) (Dinuclmarkov.cpp:479)
==24496==    by 0x186731: GenModel::infer_model(std::vector<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > >, std::allocator<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > > > > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, bool, double, double) [clone ._omp_fn.0] (GenModel.cpp:315)
==24496==    by 0x599C97D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24496==    by 0x4E436DA: start_thread (pthread_create.c:463)
==24496==    by 0x5EEE88E: clone (clone.S:95)
==24496==  Address 0x13d50240 is 0 bytes after a block of size 209,760 alloc'd
==24496==    at 0x4C3089F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24496==    by 0x1D93B9: Model_marginals::Model_marginals(Model_marginals const&) (Model_marginals.cpp:48)
==24496==    by 0x18543E: GenModel::infer_model(std::vector<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > >, std::allocator<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > > > > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, bool, double, double) [clone ._omp_fn.0] (GenModel.cpp:193)
==24496==    by 0x599C97D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24496==    by 0x4E436DA: start_thread (pthread_create.c:463)
==24496==    by 0x5EEE88E: clone (clone.S:95)
==24496==

That's it I think, if you need my genes/CDR3/model_parms files, or the sequences, just ask.

Additional question: is this the "right way" to provide IGoR with a species that is not supplied?

Thanks,

Thomas

error : make/make install

1 error generated.
make[1]: *** [igor-Coverageerrcounter.o] Error 1
make: *** [install-recursive] Error 1

Hello qmarcou,
As suggested, I ran make clean, make and then make install. I am still getting the same errors. Below is the log file.
[error_logs.txt](https://github.com/qmarcou/IGoR/files/2917457/error_logs.txt)

Thanks

Compilation issue with GCC 8.1 on Mac OS X

Hi,

I'm having a problem installing the software on a MacBook Pro 17" (late 2011, OS X version 10.10.5). I have GCC 8.1 installed via Homebrew. To begin I run ./configure CC=gcc-8 CXX=g++-8 and then I run make. During the make process, however, I receive the following error, causing the process to crash:

Undefined symbols for architecture x86_64:
"comp_nt_int(int const&, int const&)", referenced from:
Deletion::iterate(double&, Enum_fast_memory_map<Seq_type, double>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Int_Str const&, Enum_fast_memory_map<int, unsigned long>&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::shared_ptr<Rec_Event const>, int>, std::allocator<std::pair<std::shared_ptr<Rec_Event const>, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::shared_ptr<Rec_Event const>, int>, std::allocator<std::pair<std::shared_ptr<Rec_Event const>, int> > > > > > const&, std::shared_ptr<Rec_Event*>&, std::unique_ptr<long double [], std::default_delete<long double []> >&, std::unique_ptr<long double [], std::default_delete<long double []> > const&, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > const&, Enum_fast_memory_map<Seq_type, Int_Str*>&, Enum_fast_memory_dual_key_map<Seq_type, Seq_side, int>&, std::shared_ptr<Error_rate>&, std::map<unsigned long, std::shared_ptr<Counter>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<Counter> > > >&, std::unordered_map<std::tuple<Event_type, Gene_class, Seq_side>, std::shared_ptr<Rec_Event>, std::hash<std::tuple<Event_type, Gene_class, Seq_side> >, std::equal_to<std::tuple<Event_type, Gene_class, Seq_side> >, std::allocator<std::pair<std::tuple<Event_type, Gene_class, Seq_side> const, std::shared_ptr<Rec_Event> > > > const&, Enum_fast_memory_map<Event_safety, bool>&, Enum_fast_memory_map<Seq_type, std::vector<int, std::allocator<int> >*>&, double&, double&) in igor-Deletion.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make[2]: *** [igor] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

I've also attached the entire make log in case this snippet isn't satisfactory.
make.log

finding non-productive sequences(original/no hypermutation/no insertion/no deletions)

Hi Quentin,

According to your published article, how can the non-productive sequences (for new model construction) be identified and extracted from the raw data? Do you have any good ideas?

In your article, page 3, you mentioned: "By contrast, V and J usage varied moderately but significantly across individuals,......., suggesting possible primer-dependent biases." How should this be understood? After selection, the surviving T cells are MHC-dependent, and MHCs differ substantially between individuals.
Thanks!

Cheers,

Decen

using the --coverage error

-infer -output --coverage

Batch name set to: foo_
Species parameter set to: human
Chain parameter set to: beta
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
Running -infer -output --Pgen works, but --coverage returns this error. Why?

Missing unknown subargument error for -output

Embarrassed to say I spent more time than I will admit trying to figure out why the -output --pgen option was not producing a Pgen_counts.csv file before I finally realized that the command is case-sensitive -output --Pgen. Might be worthwhile to report errors when oblivious people like me include subargs and subsubargs that don't exist!
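
The kind of check requested here can be sketched outside IGoR as a small pre-flight script. This is only an illustration: the set of sub-arguments below contains just the ones named in these issues (not IGoR's full list), and `check_subargs` is a hypothetical helper.

```python
def check_subargs(given, known):
    """Return (subarg, suggestion) pairs for every sub-argument not in `known`.

    `suggestion` is the known spelling that differs only by letter case,
    or None when nothing similar exists.
    """
    by_lower = {name.lower(): name for name in known}
    problems = []
    for arg in given:
        if arg not in known:
            problems.append((arg, by_lower.get(arg.lower())))
    return problems

# Assumption: only the -output sub-arguments mentioned in these issues.
KNOWN_OUTPUT_SUBARGS = {"--scenarios", "--Pgen", "--coverage"}

print(check_subargs(["--pgen", "--scenarios"], KNOWN_OUTPUT_SUBARGS))
```

A wrapper script could refuse to launch igor whenever the returned list is non-empty and print the suggested spelling.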

Compilation issue with Mac OS Mojave

I am having trouble compiling IGoR on my Mac running macOS Mojave 10.14.3

$ ./configure CC=/usr/local/Cellar/gcc/7.3.0_1/bin/gcc-7 CXX=/usr/local/Cellar/gcc/7.3.0_1/bin/g++-7
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-apple-darwin18.2.0
checking host system type... x86_64-apple-darwin18.2.0
checking how to print strings... printf
checking for style of include used by make... GNU
checking for gcc... /usr/local/Cellar/gcc/7.3.0_1/bin/gcc-7
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... configure: error: in `/Users/jgagnon/igor_1-3-0':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.

I am trying to use gcc-7 based on a prior issue I saw regarding gcc-8 but I am getting a different error. Attached is the log.

Thanks for your help!

config.log

what's the definition of model or contents of model?

Hi Quentin,

As a beginner, I have to ask this kind of basic question. What exactly does the model contain when learning? As far as I can tell, IGoR can learn from a dataset to get all the V, D, J genes, junctions, CDR3 lengths, and insertion and deletion information, followed by the -infer/-evaluate process. This step is fulfilled by -align. Is that right?
However, in your article and at https://qmarcou.github.io/IGoR/ or in man igor, there are a lot of index/indices. I am a bit confused by this universal "index". What is it? Thanks!

Best,

Decen

Using IGoR for species not supplied

Hi Quentin,

In your docs, you stated in the third paragraph of Advanced Usage:

Note that changing the GeneChoice realizations can be done automatically (without manually editing the recombination parameter file) by supplying the desired set of genomic templates to IGoR using the -set_genomic command. This could be used e.g. to define a model for a chain in a species for which IGoR does not supply a model, starting from a model for this chain from another species.

What exactly do you mean in the last sentence? It's unclear. Do you mean, perhaps, that the commands I've written below would allow for model_parms.txt and the like to be produced for a new species?

Example bash script template and IGoR commands

We first set up a convenient variable for setting all the genomic options.

genomeDirectory=/path/to/genomeInformationForNewSpecies
setCustomGenome="-set_genomic --V ${genomeDirectory}/genomicVs.fasta\
                -set_genomic --D ${genomeDirectory}/genomicDs.fasta\
                -set_genomic --J ${genomeDirectory}/genomicJs.fasta\
                -set_CDR3_anchors --V ${genomeDirectory}/V_gene_CDR3_anchors.csv\
                -set_CDR3_anchors --J ${genomeDirectory}/J_gene_CDR3_anchors.csv"

We then run IGoR employing these options.

igor -set_wd workingDirectory -batch batchName -read_seqs inputFile
igor -set_wd workingDirectory $setCustomGenome -batch batchName -align --all
igor -set_wd workingDirectory $setCustomGenome -species human\
 -chain heavy_naive -batch batchName -infer -output --scenarios 10 --Pgen

We initialize a model for the new species from another (predefined) species, here the human model in IGoR.

Recommendations for inference options

For a new species, what values do you recommend for N_iter and L_thresh in the inference stage? Currently, N_iter defaults to five and my Pgen values are mostly NaNs. Would increasing N_iter, or adjusting L_thresh, improve this?
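
To compare settings, it helps to quantify "mostly NaNs" rather than eyeball it. The sketch below counts NaN entries in the second column of a semicolon-separated file; the column layout (a header line, then index;value) is an assumption about the Pgen output file, and `nan_fraction` is a hypothetical helper.

```python
import io
import math

def nan_fraction(handle, delimiter=";"):
    """Return (number of NaN values, number of data rows) for column two."""
    n_nan = n_rows = 0
    for line_no, line in enumerate(handle, start=1):
        fields = line.strip().split(delimiter)
        if line_no == 1 or len(fields) < 2:
            continue  # skip the header line and malformed lines
        n_rows += 1
        if math.isnan(float(fields[1])):
            n_nan += 1
    return n_nan, n_rows

# Toy data only; the header names are assumed, not taken from IGoR's docs.
toy = io.StringIO("seq_index;Pgen_estimate\n0;1.2e-09\n1;nan\n2;-nan\n3;0\n")
print(nan_fraction(toy))
```

Running this after each change to N_iter or L_thresh shows whether the proportion of NaN values actually moves.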

make new database

Hi, I would like to know whether IGoR can create a new database from given FASTA data. If so, how can this be done?

Using the --coverage output subarg

Hi,

I've been unable to use the --coverage subarg of output without generating an error:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)

I was wondering if you could provide an example use of this subarg and the resulting output? Thanks!

TRA CDR3 Anchors

Hi qmarcou,

I noticed there are no CDR3 anchors present for TRA (*_gene_CDR3_anchors.csv contains only the headers). Are these anchors available somewhere? If not, do you have some pointers on how to obtain them?

Thanks in advance.

Pim

IGoR sequence generation - random number generation issue

I’ve been using the IGoR software to generate simulated T cell beta chain repertoires and ran into an issue that I believe is related to the random number generation. My colleagues at Vanderbilt University Medical Center, Cinque Soto and James Crowe, suggested I get in touch with you.

We need to generate very large repertoires to compare with the results obtained from deep immunome sequencing being carried out at Vanderbilt. When I was processing IGoR output, I noticed that the same results are generated every 52,895,649 sequences. We discovered this when calculating the number of unique clonotypes as a function of the number of sequences – after 52M+ reads, the number of clonotypes was unchanged.

Here’s the igor command line that we’ve been using. Note that I tried this with two different seeds and get the same behavior and same cycle length (52,895,649 sequences).

igor -threads -1 -set_wd $PWD -batch tcr_beta -species human -chain beta \
  -generate 1100000000 --CDR3 --seed 12345678 --noerr

For reasons that I don’t understand, the first few sequences are not repeated, but after a short time the random number generator settles into a predictable cycle.

The example grep output below shows how the 750th sequence is repeated. As you probably know, the “-n” option to grep prefixes the output with the line number for the matching line. You’ll notice that the difference in line numbers is always a multiple of 52,895,649 (e.g. 1,057,913,732 - 752 = 52,895,649 x 20)

$ grep -n 'TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT' generated_seqs_noerr_CDR3_info.csv

752:750,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
52896401:52896399,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
105792050:105792048,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
158687699:158687697,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
211583348:211583346,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
264478997:264478995,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
317374646:317374644,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
370270295:370270293,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
423165944:423165942,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
476061593:476061591,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
528957242:528957240,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
581852891:581852889,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
634748540:634748538,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
687644189:687644187,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
740539838:740539836,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
793435487:793435485,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
846331136:846331134,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
899226785:899226783,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
952122434:952122432,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
1005018083:1005018081,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
1057913732:1057913730,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0

$ grep -n '(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)' generated_realizations_noerr.csv

752:750;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
52896401:52896399;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
105792050:105792048;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
158687699:158687697;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
211583348:211583346;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
264478997:264478995;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
317374646:317374644;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
370270295:370270293;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
423165944:423165942;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
476061593:476061591;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
528957242:528957240;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
581852891:581852889;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
634748540:634748538;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
687644189:687644187;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
740539838:740539836;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
793435487:793435485;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
846331136:846331134;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
899226785:899226783;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
952122434:952122432;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
1005018083:1005018081;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
1057913732:1057913730;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)

Any help would be greatly appreciated. Let me add that this is fantastic software. We’ve been looking for a rigorous way to generate simulated repertoires and we’re convinced that IGoR is the right way to do this.
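
The cycle length can be checked directly from the grep line numbers by taking the greatest common divisor of the gaps between successive hits. The sketch below uses a subset of the line numbers listed above; `repeat_period` is an illustrative helper name.

```python
from functools import reduce
from math import gcd

def repeat_period(line_numbers):
    """Greatest common divisor of the gaps between successive occurrences."""
    gaps = [b - a for a, b in zip(line_numbers, line_numbers[1:])]
    return reduce(gcd, gaps)

# A subset of the grep -n line numbers reported above.
hits = [752, 52896401, 105792050, 158687699, 1057913732]

period = repeat_period(hits)
print(period)                            # 52895649
print((hits[-1] - hits[0]) // period)    # 20
```

The same function can be applied to the line numbers of any other repeated sequence to confirm that it shares the same period.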

IGoR crashes when reading mouse V CDR3 anchors

Hi @qmarcou!

I am trying to run IGoR with the standard mouse beta chain model but have some problems with the alignment:

igor -set_wd . -batch foo -read_seqs myseqs.fasta -species mouse -chain beta -align --all

yields the following output:

Batch name set to: foo_
FASTA extension detected for the input sequence file
Species parameter set to: mouse
Chain parameter set to: beta
Working directory set to: "./"
[IGoR] ERROR: Exception caught while reading V CDR3 anchors.
[IGoR] ERROR: stoi
[IGoR] ERROR: Use "man igor", "igor -help" or visit https://bitbucket.org/qmarcou/igor to see available commands and their effects.
[IGoR] ERROR: Please report any bug to: [email protected]
[IGoR] ERROR: Terminating IGoR...

Note that I don't have this problem when I just read the sequences with
igor -set_wd . -batch foo -read_seqs myseqs.fasta
or try the alignment on the human model with
igor -set_wd . -batch foo -read_seqs myseqs.fasta -species human -chain beta -align --all
But I do get the same error about V CDR3 anchors even when I try to align only the J genes:
igor -set_wd . -batch foo -read_seqs myseqs.fasta -species mouse -chain beta -align --J

So the issue seems to be in the mouse model itself somewhere and probably has something to do with the V_gene_CDR3_anchors.csv file.

I looked into the V_gene_CDR3_anchors.csv file provided, and noticed that some of the TRBV genes that occur in the genomicVs.fasta file and the model_parms.txt file are missing from the anchor file (for example, TRBV11*01). Could this be why IGoR crashes? Or am I doing something wrong?

Thanks!

Inge
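
Both suspicions (genes missing from the anchor file, and an anchor value that cannot be parsed as an integer, which would be consistent with the "stoi" message) can be checked with a short script. This is a sketch under stated assumptions: FASTA headers are taken to hold only the gene name, the anchor file is taken to be semicolon-separated with one header line, and the toy data below is invented for illustration.

```python
import io

def fasta_gene_names(handle):
    """Collect names from FASTA header lines (the text after '>')."""
    return {line[1:].strip() for line in handle if line.startswith(">")}

def read_anchors(handle, delimiter=";"):
    """Return ({gene: index}, [(line number, row) whose index is not an integer])."""
    anchors, bad_rows = {}, []
    for line_no, line in enumerate(handle, start=1):
        fields = line.strip().split(delimiter)
        if line_no == 1 or fields == [""]:
            continue  # skip the header line and blank lines
        try:
            anchors[fields[0]] = int(fields[1])
        except (IndexError, ValueError):
            bad_rows.append((line_no, line.strip()))
    return anchors, bad_rows

# Toy stand-ins for genomicVs.fasta and V_gene_CDR3_anchors.csv.
fasta = io.StringIO(">TRBV1*01\nACGT\n>TRBV11*01\nACGT\n")
anchor_csv = io.StringIO("gene;anchor_index\nTRBV1*01;270\nTRBV2*01;\n")

anchors, bad_rows = read_anchors(anchor_csv)
missing = sorted(fasta_gene_names(fasta) - set(anchors))
print("genes without an anchor:", missing)
print("rows with a non-integer anchor:", bad_rows)
```

With the real files, replace the two `io.StringIO` objects with `open(...)` handles on the model's FASTA and anchor files.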

Errors: python to parse the output results?

Hi @qmarcou

I have installed pygor using the command line you provided. When I tried to use the pygor submodule to display the results, igor reported an error:
decen@bio:~/Downloads/seq/igor/foo3_output$ igor -pygor best_scenarios_counts.csv
[IGoR] ERROR: Unknown IGoR command line argument "-pygor"
[IGoR] ERROR: Use "man igor", "igor -help" or visit https://github.com/qmarcou/IGoR to see available commands and their effects.
[IGoR] ERROR: Please report any bug by opening an issue on https://github.com/qmarcou/IGoR or email: [email protected]
[IGoR] ERROR: Terminating IGoR...
The "-pygor" argument is not recognized.
If I run the script on its own with the command below:
. /home/decen/packages/igor_1-3-0/pygor/pygor/counters/bestscenarios/bestscenarios.py csv.file
it also shows errors:
from: can't read /var/mail/...models.genmodel
from: can't read /var/mail/...utils
bash: /home/decen/packages/igor_1-3-0/pygor/pygor/counters/bestscenarios/bestscenarios.py: line 32: syntax error near unexpected token `('
bash: /home/decen/packages/igor_1-3-0/pygor/pygor/counters/bestscenarios/bestscenarios.py: line 32: `def read_bestscenarios_values(scenarios_file, model_parms_file):'

Could you please help with this?

Thanks a lot!

Best,

Decen

Support output to AIRR tsv standard

There seems to be a community wish to establish standard formats.
The AIRR group propose a standard TSV format: http://docs.airr-community.org/en/latest/datarep/rearrangements.html

Model edge gene choice relations differ

Hi Quentin,

I was recently looking into some model comparisons and noticed that the default IGoR human TCRB model has edges between the V-gene choice and both the D-gene and J-gene choices, as well as between the J-gene choice and the D-gene choice:

%GeneChoice_V_gene_Undefined_side_prio7_size89;GeneChoice_D_gene_Undefined_side_prio6_size3
%GeneChoice_V_gene_Undefined_side_prio7_size89;GeneChoice_J_gene_Undefined_side_prio7_size15
%GeneChoice_J_gene_Undefined_side_prio7_size15;GeneChoice_D_gene_Undefined_side_prio6_size3

However, the human TCRB model that OLGA supplies by default, and the ones I'm constructing locally, only have the edge between the J-gene choice and the D-gene choice:

%GeneChoice_J_gene_Undefined_side_prio7_size15;GeneChoice_D_gene_Undefined_side_prio6_size3

Do you have any idea why this is and how it is possible to make a model with the additional gene choice edges?

Cheers, Wout
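
The difference between the two models can be listed mechanically by parsing the edge lines into (parent, child) pairs and taking a set difference. The sketch below works on the lines quoted above only; it assumes each edge line has the form `%parent;child`, and `parse_edges` and `gene` are illustrative helpers.

```python
def parse_edges(lines):
    """Parse '%parent;child' lines into a set of (parent, child) tuples."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line.startswith("%"):
            continue  # ignore anything that is not an edge line
        parent, child = line[1:].split(";")
        edges.add((parent, child))
    return edges

def gene(event_name):
    """Extract the gene letter, e.g. 'V' from 'GeneChoice_V_gene_...'."""
    return event_name.split("_")[1]

igor_default = parse_edges([
    "%GeneChoice_V_gene_Undefined_side_prio7_size89;GeneChoice_D_gene_Undefined_side_prio6_size3",
    "%GeneChoice_V_gene_Undefined_side_prio7_size89;GeneChoice_J_gene_Undefined_side_prio7_size15",
    "%GeneChoice_J_gene_Undefined_side_prio7_size15;GeneChoice_D_gene_Undefined_side_prio6_size3",
])
other_model = parse_edges([
    "%GeneChoice_J_gene_Undefined_side_prio7_size15;GeneChoice_D_gene_Undefined_side_prio6_size3",
])

extra = sorted(f"{gene(p)}->{gene(c)}" for p, c in igor_default - other_model)
print(extra)  # ['V->D', 'V->J']
```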

Paired end sequences and MiXCR.

Hi Quentin,

Can IGoR process paired end sequences? I didn't see this option in the manual input file section, see below,

Can be a fasta file, a csv file (with the sequence index as first column and the sequence in the second separated by a semicolon ';') or a text file with one sequence per line (format recognition is based on the file extension).

The IGoR paper mentions that it can take grouped unique sequences from MiXCR. Does that mean CDR3 sequences before clone assembly in MiXCR? Could you please elaborate on this?

Thanks,
Poorva

[Alignments] Ill defined reversed offset upon in/dels

Insertions or deletions in sequences introduce a shift in the final offset which is counter intuitive when the offset is defined as a reversed offset.
The ad-hoc fix is to slightly widen the constraint on offsets to compensate for a few in/dels; however, this is crude and makes the aligner lose some precision.

Installation problems

Hello!

I've got the following problems with installing on our Ubuntu server:

./configure results in a warning "configure: WARNING: no configuration information is in igor_src"
make install fails with minmax.c:26:28: fatal error: gsl/gsl_minmax.h: No such file or directory

I use linuxbrew, so brew install gsl (installing GNU Scientific Library) and then make install solved my problem. I mean I've got a functional binary in igor_src (without man pages), and make install exited with

libtool: install: /usr/bin/install -c igor /usr/local/bin/igor
/usr/bin/install: cannot remove '/usr/local/bin/igor': Permission denied
Makefile:376: recipe for target 'install-binPROGRAMS' failed
make[2]: *** [install-binPROGRAMS] Error 1
make[2]: Leaving directory '/home/mikesh/distr/igor_1-1-0/igor_src'
Makefile:873: recipe for target 'install-am' failed
make[1]: *** [install-am] Error 2
make[1]: Leaving directory '/home/mikesh/distr/igor_1-1-0/igor_src'
Makefile:509: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1
