crowelab / pyir
Immunoglobulin and T-Cell receptor rearrangement software
License: Other
Referenced code: line 175 in commit 1a4d9b8
Hello, I am looking to run PyIR on mice, but while going through the code I noticed that mouse is no longer an option that can be passed into the tool. Was this intentional, or does the tool no longer work on mice?
Thanks,
Dear PyIR maintainer,
Thanks for maintaining this tool. I installed it and gave it a test run on one of my FASTA files, but got an empty output from it.
$ pyir myfasta.fa --outfmt tsv -m 8
4,555,603 sequences successfully split into 4554 pieces
Starting process pool using 8 processors
0%| | 0/4555603 [04:45<?, ?seq/s]
4,555,603 sequences processed in 297.47 seconds, 15,314 sequences / s
Zipping up final output
Analysis complete, result file: myfasta.tsv.gz
Do you have any idea why this happens?
Thanks,
Hi, I am looking at the ABhelix-derived data from your paper "High frequency of shared clonotypes in human B cell receptor repertoires". The reads corresponding to IgA and IgM cover enough of the C gene for isotype identification, but the reads in the IgG1-IgG4 files cover only the last 5 nucleotides of the C gene. How were the IgG1-IgG4 isotypes determined?
There is a column-name misalignment in the TSV output file. The TSV header has fewer fields than the actual number of output columns, so every column name after the newly added "complete_vdj" is shifted away from the data it is meant to label.
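One way to confirm the mismatch (or an empty result file) is to compare the header width with the field count of every data row. The sketch below uses only the standard library. `check_tsv_columns` and `report` are hypothetical helpers, not part of PyIR, and they assume the output is plain tab-separated text with no quoting.

```python
import csv
import gzip
from collections import Counter


def check_tsv_columns(path):
    """Return (header_width, Counter of data-row widths) for a .tsv or .tsv.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text, split only on tabs
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        row_widths = Counter(len(row) for row in reader)
    return len(header), row_widths


def report(path):
    """Print how many rows have each width and flag widths that differ from the header."""
    header_width, row_widths = check_tsv_columns(path)
    print(f"header fields: {header_width}")
    if not row_widths:
        print("no data rows")
    for width, count in sorted(row_widths.items()):
        marker = "" if width == header_width else "  <-- differs from header"
        print(f"{count} row(s) with {width} fields{marker}")
```

For example, `report("myfasta.tsv.gz")` prints a single "no data rows" line for an empty result, and flags every row width that is larger than the header.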
Dear developer,
I noticed that PyIR currently supports input in the form of a single FASTA/FASTQ file. I'm working with paired-end Whole Exome Sequencing (WES) data and was wondering about the best approach to use this data with PyIR. Should I preprocess the paired-end data into a single file using tools like pRESTO before inputting it into PyIR?
Thanks for your guidance!
Dear developer,
I noticed that when I input the same TCR FASTA file (50 sequences; I used a small file so that the IgBLAST web server could run it) into both PyIR and the IgBLAST server, IgBLAST recognized the CDR3 sequences in all 50 sequences, while PyIR recognized only 24 of them. Is this caused by PyIR's quality-control function? Would the issue simply go away if the FASTA/FASTQ file were of high quality?
Below was my code in PyIR:
PyIR(query=FILE, args=['--outfmt', 'tsv', '-r', 'TCR', '--species', 'human'])
Hi, thank you for creating PyIR! Would you be open to providing an example PyIR output so we can add it as a parsing option to immunarch (https://github.com/immunomind/immunarch)? I tried to download files from the Wiki, but some of them are not available, and I have trouble with the PyIR installation. Having a few PyIR outputs would let us quickly implement the parser and help PyIR users easily explore their AIRR data. Thank you, and let me know if you have any questions!
Issue to track the parser: immunomind/immunarch#84
-- Vadim
Hi,
Thanks for creating this software. Is there any functionality for annotating B cell isotypes? If not, are there any plans to add this in future versions?
Thanks
Hi there,
thanks for creating PyIR! I am experiencing an odd issue where TCR beta chains are annotated perfectly out of the box, but alpha chains are not. Alpha chains are always marked as complete_vdj=F, most gene calls are missing, and most region sequences (fwr1 to junction) are missing. Is there something obvious I might be doing wrong? All the sequences I used came from filtered, complete 10X output.
Many thanks,
Andreas
Is murine support no longer available?
For those who find this repository by means other than the BMC Bioinformatics publication, it may be helpful to have a LICENSE file available. I see the publication has the following:
License: Free to academics
Any restrictions to use by non-academics: Yes; non academics should contact the author for permission to use the software or license options for incorporation into software that is being sold for profit.
Given that we have a list of protein sequences rather than nucleotide sequences, could we use the blastp program to search immunoglobulin protein sequences, as shown on the IgBLAST website?
When building the Docker image, the Ig tests fail with what looks like an improperly written regex:
Starting process pool using 4 processors
22%|██▏ | 108/500 [00:00<00:03, 109.25seq/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.7/site-packages/pyir/igblast.py", line 20, in run
return igblast_run.run_single_process(fasta_input_file, fastq_input_file)
File "/usr/local/lib/python3.7/site-packages/pyir/igblast.py", line 144, in run_single_process
total_parsed = parser.parse()
File "/usr/local/lib/python3.7/site-packages/pyir/parsers.py", line 615, in parse
self.output = parser.parse(line, self.output, previous_line_whitespace, self.seq_dict)
File "/usr/local/lib/python3.7/site-packages/pyir/parsers.py", line 92, in parse
self.hits.append({'gene': matches.group(1), 'bit_score': float(matches.group(2)), 'e_value':float(matches.group(3))})
ValueError: could not convert string to float: 'sapiens|IGHV8|P|V-REGION'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/pyir", line 14, in <module>
py_ir.run()
File "/usr/local/lib/python3.7/site-packages/pyir/factory.py", line 65, in run
output_pieces = self.run_pool(input_pieces, fastq_input_pieces, total_seqs)
File "/usr/local/lib/python3.7/site-packages/pyir/factory.py", line 218, in run_pool
for x in imap:
File "/usr/lib64/python3.7/multiprocessing/pool.py", line 748, in next
raise value
ValueError: could not convert string to float: 'sapiens|IGHV8|P|V-REGION'
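The value in the error, 'sapiens|IGHV8|P|V-REGION', suggests that the hit-line regex splits on whitespace while the germline identifier itself contains a space (as in "Homo sapiens"), so the second capture group receives part of the gene name instead of the bit score. Below is a sketch of a more tolerant parser. It assumes every hit line ends with the bit score followed by the e-value; `parse_hit_line` is hypothetical and is not the code in PyIR's parsers.py.

```python
import re

# Greedy "(.*\S)" lets the gene identifier contain spaces; the last two
# whitespace-separated tokens are taken as the bit score and the e-value.
HIT_LINE = re.compile(r"^\s*(.*\S)\s+(\S+)\s+(\S+)\s*$")


def _to_float(token):
    # Some BLAST versions print tiny e-values as "e-100" with no leading digit.
    return float("1" + token if token.startswith("e") else token)


def parse_hit_line(line):
    """Split one hit line into gene, bit score and e-value."""
    match = HIT_LINE.match(line)
    if match is None:
        raise ValueError(f"not a hit line: {line!r}")
    gene, bit_score, e_value = match.groups()
    return {"gene": gene, "bit_score": _to_float(bit_score), "e_value": _to_float(e_value)}
```

Anchoring the numeric fields at the end of the line, rather than counting tokens from the start, keeps the parser independent of how many spaces a database's identifiers happen to contain.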
This publication suggests that the default IgBLAST word_size (9) is already too high, yet PyIR hardcodes it to 11.
I'm running into real data where this causes errors.
Please consider adding an argument that lets users set their desired word_size.
http://simlab.biomed.drexel.edu/papers_published/discrimination_zhang_2015.pdf
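A minimal sketch of such an argument using argparse. The `--word_size` flag name is an assumption, not an existing PyIR option; the default of 11 mirrors the value the report says PyIR currently hardcodes, so omitting the flag leaves behaviour unchanged. The value would be forwarded to igblastn's `-word_size` option, and igblastn may still reject values below its own lower bound.

```python
import argparse


def positive_int(text):
    """argparse type that accepts only integers >= 1."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError(f"word_size must be positive, got {value}")
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="pyir-sketch")
    # Hypothetical option; default 11 keeps the currently hardcoded behaviour
    parser.add_argument("--word_size", type=positive_int, default=11,
                        help="word size forwarded to igblastn -word_size")
    return parser


def igblast_word_size_args(namespace):
    """Extra igblastn command-line arguments for the chosen word size."""
    return ["-word_size", str(namespace.word_size)]
```

For example, `igblast_word_size_args(build_parser().parse_args(["--word_size", "7"]))` yields `["-word_size", "7"]`, which would be appended to the igblastn command.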
Hi,
I have been able to run PyIR only on human. When I set up the database for another species with "pyir setup" and then run a PyIR BLAST search, it does not work; it always defaults back to human. I noticed that in PyIR's arg_parse.py the species choices include only human. I tried uncommenting line 175 (choices=['human']) and commenting out line 176 (choices=['human','mouse', 'rabbit', 'rat', 'rhesus_monkey' ]), but PyIR still doesn't work for any species other than human. The pyir setup command seems to run fine, so it looks like something other than the species choices is preventing PyIR from finding the database. Any suggestions?
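argparse's `choices` rejects any `--species` value that is not in the list, so a hardcoded `['human']` blocks other species before any database lookup happens. One possible approach is to derive the choices from what was actually installed. The sketch below assumes a hypothetical layout with one database sub-directory per species under `db_root`; that layout and both helper names are illustrations, not PyIR's real setup. As the report notes, widening the choices alone did not help, so other code paths may also need changes.

```python
import argparse
import os


def available_species(db_root):
    """Species that have a database sub-directory under db_root (hypothetical layout)."""
    if not os.path.isdir(db_root):
        return []
    return sorted(
        name for name in os.listdir(db_root)
        if os.path.isdir(os.path.join(db_root, name))
    )


def build_parser(db_root):
    species = available_species(db_root) or ["human"]
    parser = argparse.ArgumentParser(prog="pyir-sketch")
    # argparse exits with an error for any --species value not listed in `choices`
    parser.add_argument(
        "--species",
        choices=species,
        default="human" if "human" in species else species[0],
    )
    return parser
```

Keeping the choices in sync with the installed databases means a species added by a setup step becomes selectable without editing the parser by hand.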
Hi PyIR team,
With PyIR 1.3, we have noticed that many queries/alignments are missing FR4. A closer look indicates that many of these sequences do contain FR4, but it is not reported by PyIR 1.3.
Below is one example annotated by PyIR 1.0 and PyIR 1.3, respectively. Only the minimal information needed to reproduce the behavior is included. The PyIR 1.0 result has the proper end of FR4 and agrees much better with the IgBLAST web tool.
PyIR 1.0
"Raw Sequence":"CAACATCCGAGCAGGGTTATCTGGTCTGATGGCTCAAACACAGCGACCTCGGGTGGGAACACGTTTTTCAGGTCCTCTGTGACCGTGAGCCTGGTGCCCGGCCCGAAGTACTGCTCGTAGGATTCCAACCCCCACTCCCATCTAGCACTGCAGATGTAGAAGCTGCTGTCTTCAGGATGGGCACTGGTCACTGTCAGAGTGGACAAGGTCAGGCTTGCATGGTTGATGAGAAACTTGTCCTTCTCGACGCCTTGCTCGTATGTGGCCTTGGAGCCCTCATTGGAAGTTGCCATCAGCATGAGACTCTGTTTCGGGAACTGACGATACCAAAACATAGTTGTGGCCTGAAAGTCCAGGGAACGGCACTCGATCTTCACAGAGGTTCCACTCTTACAGATAACCCTGCTCGGATGTTGAGAGACGACAGCACCAAGCCCGGAGCCTGGCCCCAGAAGCAGCAGAAGCAGCAGCATCTTCCGTGATGGCCTCACACCACCTTCTCTGGGGAGAGTTCAGAGCGCAGAGC",
"CDR1":{
"from":174.0,
"to":191.0,
"length":18.0,
"matches":18.0,
"mismatches":0.0,
"gaps":0.0,
"percent identity":100.0,
"AA":"DFQATT",
"NT":"GACTTTCAGGCCACAACT"
},
"FR2":{
"from":192.0,
"to":242.0,
"length":51.0,
"matches":51.0,
"mismatches":0.0,
"gaps":0.0,
"percent identity":100.0,
"AA":"MFWYRQFPKQSLMLMAT",
"NT":"ATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACT"
},
"CDR2":{
"from":243.0,
"to":263.0,
"length":21.0,
"matches":21.0,
"mismatches":0.0,
"gaps":0.0,
"percent identity":100.0,
"AA":"SNEGSKA",
"NT":"TCCAATGAGGGCTCCAAGGCC"
},
"FR3":{
"from":264.0,
"to":377.0,
"length":114.0,
"matches":114.0,
"mismatches":0.0,
"gaps":0.0,
"percent identity":100.0,
"AA":"TYEQGVEKDKFLINHASLTLSTLTVTSAHPEDSSFYIC",
"NT":"ACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGC"
},
"CDR3":{
"from":378.0,
"to":386.0,
"length":9.0,
"matches":9.0,
"mismatches":0.0,
"gaps":0.0,
"percent identity":100.0,
"AA":"SARWEWGLESYEQY",
"NT":"AGTGCTAGATGGGAGTGGGGGTTGGAATCCTACGAGCAGTAC"
"FR4":{
"AA":"FGPGTRLTVT",
"NT":"TTCGGGCCGGGCACCAGGCTCACGGTCACAG"
PyIR 1.3
"sequence":"GCTCTGCGCTCTGAACTCTCCCCAGAGAAGGTGGTGTGAGGCCATCACGGAAGATGCTGCTGCTTCTGCTGCTTCTGGGGCCAGGCTCCGGGCTTGGTGCTGTCGTCTCTCAACATCCGAGCAGGGTTATCTGTAAGAGTGGAACCTCTGTGAAGATCGAGTGCCGTTCCCTGGACTTTCAGGCCACAACTATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACTTCCAATGAGGGCTCCAAGGCCACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGCAGTGCTAGATGGGAGTGGGGGTTGGAATCCTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGACCAGATAACCCTGCTCGGATGTTG",
"fwr1":"GGTGCTGTCGTCTCTCAACATCCGAGCAGGGTTATCTGTAAGAGTGGAACCTCTGTGAAGATCGAGTGCCGTTCCCTG",
"fwr1_aa":"GAVVSQHPSRVICKSGTSVKIECRSL",
"cdr1":"GACTTTCAGGCCACAACT",
"cdr1_aa":"DFQATT",
"fwr2":"ATGTTTTGGTATCGTCAGTTCCCGAAACAGAGTCTCATGCTGATGGCAACT",
"fwr2_aa":"MFWYRQFPKQSLMLMAT",
"cdr2":"TCCAATGAGGGCTCCAAGGCC",
"cdr2_aa":"SNEGSKA",
"fwr3":"ACATACGAGCAAGGCGTCGAGAAGGACAAGTTTCTCATCAACCATGCAAGCCTGACCTTGTCCACTCTGACAGTGACCAGTGCCCATCCTGAAGACAGCAGCTTCTACATCTGC",
"fwr3_aa":"TYEQGVEKDKFLINHASLTLSTLTVTSAHPEDSSFYIC",
"fwr4":"",
"fwr4_aa":"",
"cdr3":"AGTGCTAGATGGGAGTGGGGGTTGGAATCCTACGAGCAGTAC",
"cdr3_aa":"SARWEWGLESYEQY",
UPDATE: A closer look shows that this only affects the parser for IgBLAST's AIRR output. The legacy parser still works correctly but doesn't support the tsv outfmt. It would be nice to have the AIRR parser return FR4.
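Until the AIRR parser fills in FR4, a downstream workaround could recover it from coordinates that the AIRR format already carries. The sketch below assumes each record includes the AIRR Rearrangement fields `sequence`, `cdr3_end` and `j_sequence_end` (1-based, closed coordinates). It also assumes that FR4 runs from the base after the CDR3 through the end of the J alignment; that boundary is an assumption and should be checked against IgBLAST's own FR4 calls, such as the PyIR 1.0 result above. `derive_fwr4` is a hypothetical helper.

```python
def derive_fwr4(record):
    """Return FR4 nucleotides for one AIRR-style record (a dict).

    A non-empty "fwr4" is returned unchanged. Otherwise "sequence" is sliced
    between "cdr3_end" and "j_sequence_end" (1-based, closed coordinates).
    Returns "" when either coordinate is missing or unusable.
    """
    if record.get("fwr4"):
        return record["fwr4"]
    sequence = record.get("sequence") or ""
    try:
        # float() first so values such as "386.0" or 386.0 are also accepted
        cdr3_end = int(float(record["cdr3_end"]))
        j_end = int(float(record["j_sequence_end"]))
    except (KeyError, TypeError, ValueError):
        return ""
    if j_end <= cdr3_end:
        return ""
    # 1-based closed [cdr3_end + 1, j_end] == 0-based half-open [cdr3_end, j_end)
    return sequence[cdr3_end:j_end]
```

Returning an empty string for unusable coordinates matches how the missing field already appears in the 1.3 output (`"fwr4":""`), so downstream code needs no new special case.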
Hi, I am trying to download Adaptive Biotechnologies data sets (FASTA) from the publication "High frequency of shared clonotypes in human B cell receptor repertoires". But the link does not seem to work.
Hi,
I am trying to run PyIR on many files simultaneously. It is a great wrapper for IgBlast and I have been able to parse the json files easily. But, I am running into a space issue with the tmpdir.
Line 26 of factory.py contains this command:
args['tmp_dir'] = tempfile.mkdtemp()
This appears to be the root cause of the "No space left on device" error shown below. Some of the jobs work, but most fail. I am running these jobs on a cluster.
My workaround for now is to set the environment variable TMPDIR to a directory with sufficient space, but others may run into this issue as well.
Error output:
File "/data/omicscore/Easterhoff-Easterhoff-20190501/scripts/PyIR/bin/./pyir", line 14, in <module>
py_ir.run()
File "/data/omicscore/Easterhoff-Easterhoff-20190501/scripts/conda/envs/py36/lib/python3.6/site-packages/pyir/factory.py", line 53, in run
total_seqs, input_pieces, fastq_input_pieces = self.split_input_file(input_format)
File "/data/omicscore/Easterhoff-Easterhoff-20190501/scripts/conda/envs/py36/lib/python3.6/site-packages/pyir/factory.py", line 159, in split_input_file
Bio.SeqIO.write(seq, current_pieces[proc_index], 'fasta')
File "/data/omicscore/Easterhoff-Easterhoff-20190501/scripts/conda/envs/py36/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 529, in write
fp.write(format_function(record))
OSError: [Errno 28] No space left on device
Thanks,
Jen
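`tempfile.mkdtemp()` with no `dir` argument uses `tempfile.gettempdir()`, which checks TMPDIR, TEMP and TMP before the platform default, so setting TMPDIR before launching each job is a valid workaround. Note that Python caches that lookup in `tempfile.tempdir`, so changing TMPDIR from inside an already-running process may have no effect. A sketch of an explicit alternative, assuming a hypothetical `base_dir` option (for example a `--tmp_dir` flag) that PyIR does not currently expose:

```python
import os
import tempfile


def make_work_dir(base_dir=None):
    """Create a scratch directory under base_dir, or under tempfile's default.

    base_dir is hypothetical: when it is None, tempfile falls back to
    TMPDIR/TEMP/TMP or the platform default (the current behaviour).
    """
    if base_dir is not None:
        os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix="pyir_", dir=base_dir)
```

On a cluster, pointing `base_dir` at node-local scratch or a project directory keeps many concurrent jobs from filling a small shared /tmp.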
These are my args:
'--species', 'human', '-r', 'TCR', '--legacy', '--sequence_type','prot'
When I reverse-translate any given sequence into nucleotides and run it with the nucl sequence type, I get correct output.
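The reverse-translation workaround described above can be sketched with one fixed codon per amino acid from the standard genetic code. This is a simplification: reverse translation is ambiguous, so the result is just one of many nucleotide sequences encoding the same protein, and any identity or mutation counts reported for it reflect the arbitrary codon choice rather than a real sequence. `CODON_FOR` and `reverse_translate` are hypothetical names.

```python
# One fixed codon per amino acid (standard genetic code); "*" is a stop codon.
CODON_FOR = {
    "A": "GCT", "C": "TGT", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAT", "P": "CCT", "Q": "CAA", "R": "CGT",
    "S": "TCT", "T": "ACT", "V": "GTT", "W": "TGG", "Y": "TAT",
    "*": "TAA",
}


def reverse_translate(protein):
    """Return one nucleotide sequence that encodes `protein`.

    Picks a single arbitrary codon per residue, so this is not the sequence
    the receptor was actually encoded by.
    """
    codons = []
    for residue in protein.strip().upper():
        if residue not in CODON_FOR:
            raise ValueError(f"unsupported residue {residue!r}")
        codons.append(CODON_FOR[residue])
    return "".join(codons)
```

For example, `reverse_translate("MA")` returns `"ATGGCT"`; ambiguous residues such as X are rejected rather than guessed.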