simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

Home Page: https://htseq.readthedocs.io/en/release_0.11.1/

License: GNU General Public License v3.0

Python 47.47% Shell 1.15% C++ 47.48% Makefile 0.56% SWIG 3.34%

htseq's Issues

CI tidying-up

Hey Simon,

If that's OK with you, I'd start by fixing the status of Travis on both branches, 'master' and 'python3'. Having good tests in place is essential for fixing bugs down the road.

For instance, the python2 Travis worker requires gcc 5.0, but many folks will be stuck with gcc 4.8+.
Is that OK?

How to perform exon level quantifications?

Hi,

This is my htseq-count command:

htseq-count -f bam \
-r name -s yes \
-a 10 -t exon -i exon_id \
-m union \
sample_Aligned.out.bam gencode.v23.annotation_subset.gff > sample_Aligned.out.table.txt

I converted the Gencode v23 GTF, which looks like this:

chr1	HAVANA	gene	229431245	229434098	.	-	.	gene_id "ENSG00000143632.14"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; level 2; havana_gene "OTTHUMG00000038006.3";
chr1	HAVANA	transcript	229431245	229434094	.	-	.	gene_id "ENSG00000143632.14"; transcript_id "ENST00000366684.7"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ACTA1-001"; level 2; protein_id "ENSP00000355645.3"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS1578.1"; havana_gene "OTTHUMG00000038006.3"; havana_transcript "OTTHUMT00000092781.1";
chr1	HAVANA	exon	229434004	229434094	.	-	.	gene_id "ENSG00000143632.14"; transcript_id "ENST00000366684.7"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ACTA1-001"; exon_number 1; exon_id "ENSE00001433404.2"; level 2; protein_id "ENSP00000355645.3"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS1578.1"; havana_gene "OTTHUMG00000038006.3"; havana_transcript "OTTHUMT00000092781.1";
chr1	HAVANA	exon	229432987	229433127	.	-	.	gene_id "ENSG00000143632.14"; transcript_id "ENST00000366684.7"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "ACTA1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "ACTA1-001"; exon_number 2; exon_id "ENSE00001380587.1"; level 2; protein_id "ENSP00000355645.3"; tag "basic"; transcript_support_level "1"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS1578.1"; havana_gene "OTTHUMG00000038006.3"; havana_transcript "OTTHUMT00000092781.1";

to gff using dexseq_prepare_annotation:

python dexseq_prepare_annotation.py gencode.v23.annotation_subset.gtf gencode.v23.annotation_subset.gff

And these are the first few lines of the gff file:

chr1    dexseq_prepare_annotation.py    aggregate_gene  229431245   229434098   .   -   .   gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431245   229431248   .   -   .   transcripts "ENST00000366684.7"; exonic_part_number "001"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431249   229431642   .   -   .   transcripts "ENST00000366683.3+ENST00000366684.7"; exonic_part_number "002"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431721   229431862   .   -   .   transcripts "ENST00000366683.3+ENST00000366684.7"; exonic_part_number "003"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431863   229431902   .   -   .   transcripts "ENST00000366684.7"; exonic_part_number "004"; gene_id "ENSG00000143632.14"
chr1    dexseq_prepare_annotation.py    exonic_part 229431994   229432185   .   -   .   transcripts "ENST00000366684.7"; exonic_part_number "005"; gene_id "ENSG00000143632.14"

When I run my htseq-count command with this file, all reads are assigned to these categories:

__no_feature    85273631
__ambiguous 0
__too_low_aQual 0
__not_aligned   2970335
__alignment_not_unique  12964119

How do I do exon level quantifications using htseq-count?
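
One possible reason for every read landing in __no_feature, offered as an assumption rather than a confirmed diagnosis: the command selects features with -t exon and -i exon_id, whereas the converted GFF quoted above contains only aggregate_gene and exonic_part records and no exon_id attribute. The sketch below uses only the standard library to tally feature types (column 3) and attribute keys (column 9). The two sample lines are abbreviated from the output above (re-typed with tabs), and the helper name summarize_gff is hypothetical.

```python
from collections import Counter

# Abbreviated from the dexseq_prepare_annotation.py output quoted above.
GFF_LINES = [
    'chr1\tdexseq_prepare_annotation.py\taggregate_gene\t229431245\t229434098'
    '\t.\t-\t.\tgene_id "ENSG00000143632.14"',
    'chr1\tdexseq_prepare_annotation.py\texonic_part\t229431245\t229431248'
    '\t.\t-\t.\ttranscripts "ENST00000366684.7"; exonic_part_number "001"; '
    'gene_id "ENSG00000143632.14"',
]


def summarize_gff(lines):
    """Count feature types (column 3) and attribute keys (column 9)."""
    types, keys = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        types[fields[2]] += 1
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr:
                keys[attr.split()[0]] += 1
    return types, keys


types, keys = summarize_gff(GFF_LINES)
print("feature types:", dict(types))
print("attribute keys:", dict(keys))
print("has 'exon' features:", "exon" in types)
print("has 'exon_id' attribute:", "exon_id" in keys)
```

If neither "exon" nor "exon_id" occurs in the file, a run with -t exon -i exon_id has nothing to count against. Note that dexseq_count.py, which ships with DEXSeq, is written for this GFF layout.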

releasing 0.7.0 on PyPI

@simon-anders I made HTSeq compatible with pip for both the python2 and python3 branches and tagged 2 new releases (one for each python version) on the github repo.

To complete the modernization of HTSeq, we have to push the master 0.7.0 release (python2) onto PyPI, which is currently stuck at 0.6.1:

https://pypi.python.org/pypi/HTSeq

That requires separate credentials which I do not have, so could you push it over there please?

NOTE: because of bugs in old setuptools, one cannot just pip install HTSeq but must first install Cython and matplotlib by hand. Details are in the current README.md file, so it would be helpful if you could include some of that on the PyPI page.

NOTE: another nice thing would be if you could mention the python3 branch for folks that prefer that.

--nonunique all

Hi, I have been trying to get the counts for features that are very close together, but reads will likely overlap (ChIP-seq, not RNA). I have been having errors when I try to use the --nonunique option. This is the error I get:

$ htseq-count -f bam -r pos -s reverse -t transcript -i transcript_id --nonunique-all ARPE19_ChIP_Control01.final.sorted.bam /projects/scratch/skcm/ARPE19/RNA/Homo_sapiens.GRCh38.90.gtf >> Control_01.reverse.transcript.counts.txt

Usage: htseq-count [options] alignment_file gff_file

htseq-count: error: no such option: --nonunique-all

I have tried updating my version with pip install --upgrade HTSeq, but I am still getting the error. I am using Python 2.7.3.

Is there a solution for this?

Thank you,

Dave
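
The title of this report writes the option as "--nonunique all" (option and value as two tokens), while the command above passes "--nonunique-all" as a single token. The sketch below shows how a command-line parser treats the two spellings. The argparse parser here is a hypothetical stand-in, not htseq-count's own parser, and whether the installed version supports the option at all is a separate question.

```python
import argparse

# Hypothetical stand-in parser with a single option that takes a value.
parser = argparse.ArgumentParser(prog="demo-count")
parser.add_argument("--nonunique", choices=["none", "all"], default="none")

# Option and value as two tokens: parsed as intended.
ok, extra_ok = parser.parse_known_args(["--nonunique", "all"])

# Hyphenated single token: not recognised as the option defined above.
bad, extra_bad = parser.parse_known_args(["--nonunique-all"])

print(ok.nonunique, extra_ok)
print(bad.nonunique, extra_bad)
```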

new SAM field 'I'

Guillermo Parada from Sanger reported on 1 Feb 2016:

[...]
I am writing to kindly inform you that dexseq_count.py is currently incompatible with the SAM output of STAR (probably the most widely used mapping tool nowadays). This is the error that I get:

Traceback (most recent call last):
  File "$/dexseq_count.py", line 187, in <module>
    for a in reader( sam_file ):
  File "$/miniconda/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 536, in __iter__
    algnt = SAM_Alignment.from_SAM_line( line )
  File "_HTSeq.pyx", line 1311, in HTSeq._HTSeq.SAM_Alignment.from_SAM_line (src/_HTSeq.c:25196)
  File "_HTSeq.pyx", line 1180, in HTSeq._HTSeq._parse_SAM_optional_field_value (src/_HTSeq.c:23178)
ValueError: ("SAM optional field with illegal type letter ':'", 'line 12 of file Index1/STAR.worm.results/Index1.Aligned.out.sam.C_SJ')

I figured out that this error is due to two optional fields in the STAR SAM output that are not currently supported by the HTSeq library. This is what the STAR SAM output looks like:

HISEQ:98:HA4PLADXX:1:1101:1160:84026    163     chrI    15064313        255     101M    =       15064403        125     GCATCGAGTATCGATGAAGAACGCAGCTTGCTGCGTTACTTACCACGAATTGCAGACGCTTAGAGTGGTGAAATTTCGAACGCATAGCACCAACTGGGCCT   BBBFFFFFFFFFFIIFIFIFFFFFFFIBFFBFFFFIIIFFFIIIIIIFFFFIFIIIFFFFFFFFFBBB7<<BBFBBFFBFFFFBFFBBFFBBFBBFFBFBB   NH:i:1  HI:i:1  AS:i:132  nM:i:0  NM:i:0  MD:Z:101        jM:B:c,-1       jI:B:i,-1

The last two optional fields are responsible for this incompatibility. Fortunately, I can continue my work by manually removing those fields (jM:B:c,-1 and jI:B:i,-1), but I thought it would be good to report this to your group, as other users are probably experiencing similar difficulties. In my experience, using the pysam library instead of HTSeq can make your script more robust to such odd optional fields.
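
For illustration, here is a minimal sketch of how the B (array) type can be handled when parsing SAM optional fields of the form TAG:TYPE:VALUE. It is standalone Python, not HTSeq's parser. The function name parse_optional_field is hypothetical, and hex strings (type H) are simply returned as text.

```python
def parse_optional_field(field):
    """Parse one SAM optional field of the form TAG:TYPE:VALUE."""
    # Split at most twice so that ':' inside a Z value is preserved.
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        return tag, int(value)
    if type_code == "f":
        return tag, float(value)
    if type_code in ("A", "Z", "H"):
        return tag, value
    if type_code == "B":
        # The first item is the element subtype, the rest are the array values.
        subtype, *items = value.split(",")
        convert = float if subtype == "f" else int
        return tag, [convert(x) for x in items]
    raise ValueError("unsupported type code %r in %r" % (type_code, field))


for f in ["NH:i:1", "MD:Z:101", "jM:B:c,-1", "jI:B:i,-1"]:
    print(parse_optional_field(f))
```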

htseq-count: my_showwarning() takes from 3 to 5 positional arguments but X were given

Hello! I'm trying to use htseq-count to count reads from a SAM file, but it always fails.

Error occured when processing SAM input (line 537 of file lnc1-r.sortn.sam):
my_showwarning() takes from 3 to 5 positional arguments but 6 were given
[Exception type: TypeError, raised in warnings.py:99]

The script is
#!/bin/bash
#SBATCH -N 1 -c 16
exe=htseq-count
$exe -s no lnc1-r.sortn.sam gencode.all.gtf

I generated the SAM file with hisat2, using this script:

#!/bin/bash
#SBATCH -N 1 -c 16
exe=hisat2
$exe -p 16 -x /data/wu/zeng/RNAseq/lncrna/GRCh38revised/GRCh38r -1 /data/wu/zeng/RNAseq/lncrna/1r/z-1-34_L7_1.clean.fq -2 /data/wu/zeng/RNAseq/lncrna/1r/z-1-34_L7_2.clean.fq -S lnc1-r.sam

I converted the SAM file to BAM, sorted it by name, and converted the BAM back to SAM.

The genome data and gtf were downloaded from Gencode.

memmap_dir not passed on to ChromVector.create

Hi,

I found that when creating a GenomicArray with
GenomicArray(chromsize, stranded=False, storage='memmap', memmap_dir='my/path')
the .nmm files are created in the current directory ./ instead of
in the desired folder my/path.

I have solved this issue in PR #46 .

Best,
Wolfgang

Can't import HTSeq with version 0.9.0

After updating to version 0.9.0, I get the following error when trying to import HTSeq:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/HTSeq/__init__.py", line 9, in <module>
    from _HTSeq import *
  File "__init__.pxd", line 155, in init HTSeq._HTSeq (src/_HTSeq.c:41483)
ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96

pickling does not work

Thank you very much for supporting python3!

In the python3 release (installed via pip), pickling does not seem to work with either pickle or dill.

The unpickled object always reports: "Unable to get repr for <class 'HTSeq._HTSeq.Sequence'>"

I hope you can solve this issue, as otherwise multiprocessing is not much fun :(

Pysam error

Using HTSeq 0.6.1p2 and pysam 0.9, I got this error, but only on some BAM files from TCGA (aligned with STAR). The error arises with both python 2.7 and above (2.7.3). The error seems to be caused by reads like this one:
UNC11-SN627_70:1:1101:3483:1995 69 * 0 0 * * 0 0 NGGTGCTTTATTCTCCACAGAGTGATACATGCTAAGGTGGGTTGGGCTTG #:99:;<<<;D:7DDDDDDDDD@DDDDDDDDDDDDDDDDD6:DDDDDDDD NH:i:0 HI:i:0 AS:i:49 nM:i:0 uT:A:4 RG:Z::110325_UNC11-SN627_0070_AB02DVABXX.1

import HTSeq
bamfile = HTSeq.BAM_Reader("file.bam")
for read in bamfile:
    pass

Traceback (most recent call last):
  File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 556, in <module>
    main()
  File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 529, in main
    cvg_bam_unstranded(bamfile)
  File "/u/home/a/annibal/project/scripts/SINEs_find.py", line 195, in cvg_bam_unstranded
    for read in file:
  File "/u/home/a/annibal/.local/lib/python2.7/site-packages/HTSeq/__init__.py", line 947, in __iter__
    yield SAM_Alignment.from_pysam_AlignedSegment( pa, sf )
  File "python2/src/HTSeq/_HTSeq.pyx", line 1338, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedSegment (src/_HTSeq.c:29512)
  File "pysam/libcalignmentfile.pyx", line 1635, in pysam.libcalignmentfile.AlignmentFile.getrname (pysam/libcalignmentfile.c:18288)
  File "pysam/libcalignmentfile.pyx", line 699, in pysam.libcalignmentfile.AlignmentFile.get_reference_name (pysam/libcalignmentfile.c:9258)
ValueError: reference_id -1 out of range 0<=tid<2779

I have tried upgrading HTSeq to 0.7.2 and pysam to 0.11.2.1, but the problem persists.
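
The read quoted above has FLAG 69 and `*` as its reference name, which marks it as unmapped. The traceback fails while looking up a reference name for reference_id -1. The sketch below shows how such records can be recognised from the FLAG column alone. It is standalone Python that uses neither HTSeq nor pysam, and the function name is_unmapped is hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def is_unmapped(sam_line):
    """Return True if the FLAG column (column 2) has the unmapped bit set."""
    flag = int(sam_line.split()[1])
    return bool(flag & UNMAPPED)


# First columns of the read quoted above (FLAG 69, RNAME '*').
line = "UNC11-SN627_70:1:1101:3483:1995 69 * 0 0 * * 0 0"
print(is_unmapped(line))
```

Filtering such records out before they reach the reader is one possible workaround, under the assumption that unmapped reads are not needed downstream.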

Problem with utf-8 Unicode

Hi,

I am trying to run htseq-count on macOS, using python3.6. When I type the command line
htseq-count --stranded=no file.bam file.gft
htseq-count starts the run and, after processing the gtf file, gives me this error:
Error occurred when reading beginning of SAM/BAM file
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
[Exception type: UnicodeDecodeError, raised in codecs.py:321]

The bam file is sorted by name

Any idea how to solve this?
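
A hint that may help, stated as an assumption rather than a confirmed diagnosis: byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), and BAM files are BGZF-compressed. So the message suggests the BAM file is being decoded as text, which could happen if the input is treated as SAM (for example when the format is not given as bam via -f bam, as used in other reports on this page). The sketch reproduces the same decoding failure with the standard library only. The helper name looks_compressed is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def looks_compressed(first_bytes):
    """True if the data starts with the gzip magic number (BGZF is a gzip variant)."""
    return first_bytes[:2] == GZIP_MAGIC


# Stand-in for the start of a BAM file: every gzip stream begins with these two bytes.
data = gzip.compress(b"@HD\tVN:1.0\n")
print(looks_compressed(data))

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    # Same failure mode as in the report: byte 0x8b at position 1.
    print(err)
```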

pysam api supporting

I would like to compare the cigarstring of two segments, and query_alignment_sequence is very useful for ignoring soft-clipped bases. All these properties are in the pysam.AlignedSegment API; would you consider keeping them?

Problem with HTSeq-count with anything other than ENSEMBL

Hi,

I'm currently analyzing RNA-seq data and have been trying to use htseq-count for read counting. My problem is that feature counts are always 0, even though the program runs through without error.
The only combination that works is a SAM file (with human ENSEMBL alignments) together with the ENSEMBL human GTF file.
If I run a SAM file produced by aligning to the NCBI human genome with the NCBI GFF3 file, I get nothing. I adjusted the default settings to support the NCBI GFF3 file (changing --idattr and --type accordingly) instead of the default ENSEMBL GTF file.
In addition, feature counts are again 0 when using the ENSEMBL GTF file with a SAM file whose reads were aligned to FASTA from other sources (for example the RFAM database or a collection of bacterial genomes).

I need to count my reads from alignments to the RFAM database and a collection of bacterial genomes, but I'm a bit lost as to how to do that or how to modify the settings, since even the "easy" combination, NCBI human alignment SAM file + NCBI GFF3 file, is not yielding results.

How could I fix this?
Isn't htseq-count supposed to work with any GTF file and SAM files aligned to any FASTA, provided the settings have been adjusted?
I'm a beginner in python and RNA-seq, so writing my own script for this is a bit beyond my reach, and I really think there is something I have completely missed in the basic usage.

BRs,

Anna
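
One frequent cause of all-zero counts, which may or may not apply here, is that the sequence names in the alignment file and in the annotation differ (for example "1" versus "chr1" versus an accession such as "NC_000001.11"). The sketch below compares the two sets of names using only the standard library. The miniature inputs and both function names are hypothetical.

```python
def sequence_names_from_sam(lines):
    """Reference names from column 3 of SAM records (header lines skipped)."""
    names = {l.split("\t")[2] for l in lines if l.strip() and not l.startswith("@")}
    return names - {"*"}


def sequence_names_from_gff(lines):
    """Sequence names from column 1 of GFF/GTF records (comments skipped)."""
    return {l.split("\t")[0] for l in lines if l.strip() and not l.startswith("#")}


# Hypothetical miniature inputs with different naming schemes.
sam = ["r1\t0\tNC_000001.11\t100\t255\t50M\t*\t0\t0\tACGT\tIIII"]
gff = ['1\tensembl\texon\t90\t200\t.\t+\t.\tgene_id "g1"']

shared = sequence_names_from_sam(sam) & sequence_names_from_gff(gff)
print("shared sequence names:", shared or "none")
```

An empty intersection means no read can overlap any feature, whatever --idattr and --type are set to.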

AttributeError, raised in _HTSeq.pyx:1358

Hi simon-anders,

I'm using htseq-count 0.9.1 on conda and I got this error when I tried to use it on a sorted, indexed bam file (using bowtie2 2.3.0 with --sensitive-local --trim5 10 followed by samtools 1.7 view -1 -bS and samtools sort, then samtools index):

$htseq-count -f bam -r pos -t gene -i ID -s no sample.sort.bam genome.gff
5337 GFF lines processed.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
Error occured when processing SAM input (record #416636 in file sample.sort.bam):
'NoneType' object has no attribute 'encode'
[Exception type: AttributeError, raised in _HTSeq.pyx:1358]

It is similar to this unanswered question on Biostars.

I will appreciate any help on this.

Thanks.

error running htseq-qa -t bam alignment.bam

I am running anaconda ->Python 2.7.11 :: Anaconda 2.5.0 (64-bit)

I installed htseq and pysam as follows:

conda install -c bioconda pysam
conda install -c bioconda htseq

My input file alignment.bam is a BAM file generated by bwa aln, sorted with samtools sort, with PCR duplicates removed using samtools rmdup.

When I run
htseq-qa -t bam alignment.bam

I get this error:
..
..
..
15200000 reads processed
15400000 reads processed
15600000 reads processed
15800000 reads processed
16000000 reads processed
16200000 reads processed
16400000 reads processed
16600000 reads processed
16800000 reads processed
16930142 reads processed
Traceback (most recent call last):
  File "/home/cmb-panasas2/tkitapci/libraries/lib_for_python/anaconda2_2_5_0/bin/htseq-qa", line 4, in <module>
    __import__('pkg_resources').run_script('HTSeq===0.6.1p1', 'htseq-qa')
  File "/home/cmb-panasas2/tkitapci/libraries/lib_for_python/anaconda2_2_5_0/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/home/cmb-panasas2/tkitapci/libraries/lib_for_python/anaconda2_2_5_0/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/auto/rcf-40/tkitapci/.local/lib/python2.7/site-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/EGG-INFO/scripts/htseq-qa", line 5, in <module>
    HTSeq.scripts.qa.main()
  File "/home/rcf-40/tkitapci/.local/lib/python2.7/site-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/scripts/qa.py", line 204, in main
    norm=pyplot.normalize( 0, 1 ) )
AttributeError: 'module' object has no attribute 'normalize'

Thanks for the help!
Hamdi

matplotlib.normalize

Adam Tebbe wrote:

Hello Simon,

I was trying out your HTSeq library (installed version 0.6.1 from the public pypi repository). I am specifically trying out the htseq-qa script and have found that it does not run as expected.

Traceback (most recent call last):
  File "/usr/local/bin/htseq-qa", line 5, in <module>
    HTSeq.scripts.qa.main()
  File "/usr/local/lib/python2.7/site-packages/HTSeq/scripts/qa.py", line 204, in main
    norm=pyplot.normalize( 0, 1 ) )
AttributeError: 'module' object has no attribute 'normalize'

I found that in version 1.5.1 of matplotlib, this call should be pyplot.Normalize instead of pyplot.normalize. There are three places in HTSeq/scripts/qa.py where this call is made.

I noticed that the code repository is hosted on svn, which I have not used in many years. I'm happy to contribute a patch file or updated code if that would help correct this issue.

Thanks,
Adam Tebbe
Immuneering Corporation

New release including fix for qa script in Python 3?

I am getting the following error in the qa script in Python 3 using version 0.7.2:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Julian/Anaconda/envs/imfusion-dev/lib/python3.5/site-packages/HTSeq/scripts/qa.py", line 148
    print i, "reads processed"
          ^
SyntaxError: Missing parentheses in call to 'print'

I saw that this issue was fixed in a recent commit. Will there be a new version that includes this fix?
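
For context, the failing line uses the Python 2 print statement. A form that behaves the same on both Python 2 and Python 3 is a single parenthesised, pre-formatted string, sketched below. The helper name progress_message is hypothetical and this is not the actual change made in the repository.

```python
def progress_message(i):
    """Build the progress line that qa.py prints in the traceback above."""
    return "%d reads processed" % i


# A single parenthesised string prints identically on Python 2 and Python 3.
print(progress_message(100000))
```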

python 2 vs 3 behaviour

I've just been trying HTSeq 0.7 on python 3 and noticed some differences in string handling. For example, this statement works on python 2 but not on 3, unless we encode the string to bytes.

seq = 'AAATTTGCCGCG'
seq = seq.encode() # needed for python 3
myseq = HTSeq.Sequence(seq, 'name')

The error Argument 'seq' has incorrect type (expected bytes, got str) is thrown on python 3.5.
The same applies when reading a SAM file using HTSeq.SAM_Reader(samfile): the resulting iterator yields bytes instead of strings for the sequence field. These issues can be handled with the seq.decode() and seq.encode() methods, but I'm not sure whether this should be needed.

Could you clarify whether this behaviour is intended or a bug?
Thanks for updating your package for Python 3 support BTW.
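
Until the intended behaviour is clarified, calling code can normalise sequences at the boundary. The sketch below shows two small helpers for that. The names to_bytes and to_text are hypothetical and not part of HTSeq, and ASCII is assumed as the encoding for nucleotide sequences.

```python
def to_bytes(seq, encoding="ascii"):
    """Return seq as bytes, encoding it only if it is text."""
    return seq if isinstance(seq, bytes) else seq.encode(encoding)


def to_text(seq, encoding="ascii"):
    """Return seq as text, decoding it only if it is bytes."""
    return seq.decode(encoding) if isinstance(seq, bytes) else seq


seq = "AAATTTGCCGCG"
print(to_bytes(seq))
print(to_text(to_bytes(seq)))
```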

Pysam compatibility issue

I recently upgraded to Pysam v. 0.9.0 and now HTSeq fails to correctly search for alignments within a GenomicInterval from a BAM_Reader object.

Here's the error code and an example.

>>> inbam = HTSeq.BAM_Reader("test.bam")
>>> window = HTSeq.GenomicInterval("chrM", 9904, 9926, "-")
>>> window
<GenomicInterval object 'chrM', [9904,9926), strand '-'>
>>> for almt in inbam[window]:
...     print almt
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/HTSeq-0.6.1-py2.7-macosx-10.10-intel.egg/HTSeq/__init__.py", line 976, in __getitem__
    if not self.sf._hasIndex():
AttributeError: 'pysam.csamfile.Samfile' object has no attribute '_hasIndex'

I tracked down the _hasIndex function in pysam and the problem is that the _hasIndex function was renamed to has_index in the calignmentfile.pyx. This change happened in pysam version 0.8.4 I believe.

I edited line 977 of the HTSeq/__init__.py script, changing

if not self.sf._hasIndex():
    raise ValueError, "The .bam-file has no index, random-access is disabled!"

to

if not self.sf.has_index():
    raise ValueError, "The .bam-file has no index, random-access is disabled!"

After reinstalling HTSeq, the error went away and my scripts ran normally again. I issued a pull request which fixes this error.
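
A variant that tolerates both method names is sketched below. The two classes are hypothetical stand-ins for file objects from older and newer pysam versions, so the snippet runs without pysam installed. It is one possible shim, not the change made in the pull request.

```python
def has_index(samfile):
    """Call has_index() if present, otherwise fall back to _hasIndex()."""
    method = getattr(samfile, "has_index", None) or getattr(samfile, "_hasIndex")
    return method()


# Hypothetical stand-ins for file objects from older and newer pysam versions.
class OldStyleFile:
    def _hasIndex(self):
        return True


class NewStyleFile:
    def has_index(self):
        return False


print(has_index(OldStyleFile()), has_index(NewStyleFile()))
```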

Trouble installing HTSeq (python 3.5) on Mac

Hello,

I am new to python and am trying to install HTSeq on my Mac.
Before starting, I installed xcode and numpy 1.14.5.

I figured I first need to install pysam, which is the step where I am stuck right now.

pip install pysam

Command "/Users/suzanne/env/bin/python3 -u -c "import setuptools, tokenize;file='/private/var/folders/yp/vd8kgzyj00x4b5dzcq4_vghh0000gn/T/pip-install-b8sgc50v/pysam/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /private/var/folders/yp/vd8kgzyj00x4b5dzcq4_vghh0000gn/T/pip-record-u3cbv062/install-record.txt --single-version-externally-managed --compile --install-headers /Users/suzanne/env/include/site/python3.7/pysam" failed with error code 1 in /private/var/folders/yp/vd8kgzyj00x4b5dzcq4_vghh0000gn/T/pip-install-b8sgc50v/pysam/

pip install HTSeq

Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/yp/vd8kgzyj00x4b5dzcq4_vghh0000gn/T/pip-install-eirjc94y/HTSeq/

Can someone help me out?

Thanks in advance!

Do not automatically add "[part]" or "revcomp_of_" to name when slicing a segment or using get_reverse_complement()

When I slice a segment or use get_reverse_complement() in HTSeq, it adds an unnecessary "[part]" or "revcomp_of_" to the name, and I have to write extra code to remove them. I don't think this behaviour is practical.

Homepage URL broken

On the top of the GitHub page, there is a description and link to the htseq's homepage at http://www-huber.embl.de/users/anders/HTSeq/

This URL appears to be down. I double checked at Down For Everyone Or Just Me.

Error when iterating FastqReader(fastq.gz)

  File "foo_iter_fastqgz.py", line 9, in <module>
    for read in FastqReader(fpath):
  File "/ifs/home/yy1533/SHARED_APP/miniconda3/lib/python3.6/site-packages/HTSeq/__init__.py", line 392, in __iter__
    if not qual.endswith( "\n" ):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str

There was no problem when running FastqReader(fastq), but an error was reported when using a fastq.gz file.

Py3.6 with HTSeq installed via pip install HTSeq.
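
The TypeError above is what Python 3 raises when a bytes line is tested against a str suffix. The sketch below shows that gzip.open yields bytes in its default binary mode and str in text mode ('rt'). How HTSeq opened the file in the affected version is not shown in the report, so treat the connection as an assumption. The file name demo.fastq.gz is made up for the example.

```python
import gzip
import os
import tempfile

record = "@read1\nACGT\n+\nIIII\n"

# Write a tiny gzip-compressed FASTQ file to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
with gzip.open(path, "wt") as fh:
    fh.write(record)

with gzip.open(path, "rb") as fh:   # binary mode (the default for gzip.open)
    binary_line = fh.readline()
with gzip.open(path, "rt") as fh:   # text mode
    text_line = fh.readline()

print(type(binary_line).__name__, type(text_line).__name__)
print(text_line.endswith("\n"))     # fine: str compared with str
# binary_line.endswith("\n") would raise the TypeError shown above on Python 3.
```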

* in column 7 of sam is not recognized

In SAM files, one often encounters a * in column 7, and this conforms to the SAM specification. HTSeq crashes on the *. If you replace the * in column 7 with a zero (0), HTSeq is happy.
I suggest that HTSeq should accept the * in column 7, but I have not studied all the consequences.
What do you think?
Thank you
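
For reference, column 7 of a SAM record is RNEXT, the reference name of the mate: `*` means the information is unavailable and `=` means the same reference as RNAME. The sketch below shows one way a parser could interpret the column. It is standalone Python, not HTSeq code, and the function name resolve_mate_reference is hypothetical.

```python
def resolve_mate_reference(rname, rnext):
    """Interpret SAM column 7 (RNEXT) relative to column 3 (RNAME)."""
    if rnext == "*":
        return None      # mate reference unavailable
    if rnext == "=":
        return rname     # mate is on the same reference as this read
    return rnext


print(resolve_mate_reference("chr1", "*"))
print(resolve_mate_reference("chr1", "="))
print(resolve_mate_reference("chr1", "chr2"))
```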

Iterating a FastaReader object MUCH slower using python 3.x

I've been using HTSeq to read fasta files and I noticed a huge increase in time when iterating a fasta object between python 2.7 (HTSeq version 0.6.1p1) and 3.4 (HTSeq version 0.7.2).

For instance :

for contig in HTSeq.FastaReader( "genomeX.fasta" ):
	#do nothing
	pass

This task takes 0.01 sec using python 2.7 compared to 6.7 sec with python 3.4 (bacterial genome, 2.2 Mb).

Did you ever notice such behavior?

Issues with "Mate records missing" with position-sorted bam files

I've installed htseq-count from github on our local computing cluster and am attempting to use it in conjunction with bowtie2, with and without deduplication (using picard), to generate count profiles for predicted genes in a metagenomic assembly.

I run bowtie2 to map reads and pipe the results through samtools to generate a position-sorted bam file:
bowtie2-build assembly.fna assembly.fna
bowtie2 -x assembly.fna -1 reads.1.fastq -2 reads.2.fastq -p 8 | samtools view -hb | samtools sort - -o assembly.fna.pos.bam

Using a standard gtf as gene location input for each contig, I run htseq-count:
htseq-count -r pos -t CDS -f bam assembly.fna.pos.bam assembly.gtf > assembly.fna.pos.count

...but each time I see a warning suggesting a VERY high number of reads without mates
Warning: Mate records missing for 10524978 records; first such record: <SAM_Alignment object: Paired-end read 'NB501065:182:H53TNBGX3:2:12108:1776:2682' aligned to contig-120_2940:[1721,1767)/+>.

When I sort the bam file by query name and change the htseq-count flag to -r name, things run fine:
bowtie2 -x assembly.fna -1 reads.1.fastq -2 reads.2.fastq -p 8 | samtools view -hb | samtools sort -n - -o assembly.fna.qname.bam

Any help is appreciated, even if this is a stupid question and there's something glaringly wrong with my calls to bowtie/samtools/htseq-count.

Thanks!

htseq input bam format

Hi,
I think htseq requires the BAM file to be name-sorted; is that correct? If I input a coordinate-sorted BAM file, will I get a warning or an error?

Thanks
Best Regards
Hamdi

readthedocs broken by pysam

Pysam changed something in 0.11 or 0.11.2, and now our docs on readthedocs.org no longer build correctly, complaining about some missing compression libs (bzip2 for now), which pysam uses for CRAM support. I opened an issue upstream: pysam-developers/pysam#465. Let's keep monitoring it and hope they fix it on their end first.

htseq-0.9.1 :: make documentation broken due to doc beeing a link

Hello,

While building htseq-0.9.1 from the release archive (https://github.com/simon-anders/htseq/archive/release_0.9.1.tar.gz),

make doc (i.e. make html) gave the following error message:

[gensoft@c467b007697e htseq-release_0.9.1]$ cd doc
[gensoft@c467b007697e doc]$ make html
mkdir -p _build/html _build/doctrees _static
sphinx-build -b html -d _build/doctrees   . _build/html
Running Sphinx v1.5.3

Exception occurred:
  File "conf.py", line 52, in <module>
    version = open( "../VERSION" ).readlines()[0].rstrip()
IOError: [Errno 2] No such file or directory: '../VERSION'

As doc is a link to python2/doc, it is correct that ../VERSION does not exist.

The fix was easy via ln -s, but you may want to implement a cleaner solution.

regards

Eric
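
One possible cleaner approach, sketched under assumptions: instead of hard-coding "../VERSION", the configuration could search upwards from the real location of the doc directory until a VERSION file is found. The directory layout below (VERSION at the top level, docs in python2/doc) is inferred from the report, and the helper name find_upwards is hypothetical.

```python
import os
import tempfile


def find_upwards(start_dir, filename):
    """Walk from start_dir towards the filesystem root until filename is found."""
    current = os.path.realpath(start_dir)
    while True:
        candidate = os.path.join(current, filename)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:
            raise IOError("%s not found above %s" % (filename, start_dir))
        current = parent


# Hypothetical layout mirroring the report: VERSION at the top, docs in python2/doc.
root = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(root, "python2", "doc"))
with open(os.path.join(root, "VERSION"), "w") as fh:
    fh.write("0.9.1\n")

path = find_upwards(os.path.join(root, "python2", "doc"), "VERSION")
with open(path) as fh:
    version = fh.readlines()[0].rstrip()
print(version)
```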

HTSeq error with dexseq_count.py and Python 3

Hello,

I was trying to run DEXSeq in count reads mode (dexseq_count.py) and found that it fails if Python 3 is used but works with Python 2.7. I think the issue may be HTSeq-related, so I am reporting it here.

The error I got is below (using dexseq 1.24.0 with htseq 0.9.1). Others have reported a similar error here: https://support.bioconductor.org/p/106685/

Fatal error: Exit code 1 ()
Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/mulled-v1-683dc60555985982bcd77839627baa1ef503a6a15fb8cf27f5b492bbedd69ac4/bin/dexseq_count.py", line 98, in <module>
    features[f.iv] += f
  File "python3/src/HTSeq/_HTSeq.pyx", line 451, in HTSeq._HTSeq.ChromVector.__iadd__
  File "python3/src/HTSeq/_HTSeq.pyx", line 466, in HTSeq._HTSeq.ChromVector.apply
  File "python3/src/HTSeq/_HTSeq.pyx", line 449, in HTSeq._HTSeq.ChromVector.__iadd__.addval
TypeError: unhashable type: 'GenomicFeature'
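
A general Python 3 behaviour that may be relevant here (an assumption about HTSeq internals that the report does not confirm): a class that defines __eq__ without also defining __hash__ becomes unhashable on Python 3, whereas Python 2 keeps the default hash. The two classes below are hypothetical and only demonstrate the language rule.

```python
class WithEqOnly:
    """Defines __eq__ but not __hash__: unhashable on Python 3."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name


class WithEqAndHash(WithEqOnly):
    """Adds a __hash__ that is consistent with __eq__."""

    def __hash__(self):
        return hash(self.name)


try:
    {WithEqOnly("exon1")}           # putting the object into a set needs a hash
except TypeError as err:
    print(err)

# Two equal, hashable objects collapse into one set element.
print(len({WithEqAndHash("exon1"), WithEqAndHash("exon1")}))
```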

Mate not appeared for duplication reads

I'm checking the read mates using pair_SAM_alignments. When a read is flagged as a duplicate, the other mate isn't always available. Am I misunderstanding how to use it?

sam_reader = HTSeq.SAM_Reader("duplication.sorted.sam")
for first, second in HTSeq.pair_SAM_alignments(sam_reader):
    if first is not None and first.mate_aligned and first.pcr_or_optical_duplicate:
        print('second:', second) # Always None

htseq-qa AttributeError

Dear all,

I get an error message when trying to call htseq-qa (version 0.8.0, see below).

The program was called on Ubuntu 16.04 LTS. htseq-count works well
on this system.

When I run htseq-qa (version 0.6.0) on our cluster computer (Linux SLES) on the same sam file,
it works well.

What could be the reason for the error message below?

Libra66@pc65472:/data/bowtie_workflow$ htseq-qa SRR1619533.sam

Error occured in: file SRR1619533.sam closed

Traceback (most recent call last):
  File "/home/Libra66/.local/bin/htseq-qa", line 4, in <module>
    __import__('pkg_resources').run_script('HTSeq==0.8.0', 'htseq-qa')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/home/Libra66/.local/lib/python2.7/site-packages/HTSeq-0.8.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/htseq-qa", line 5, in <module>
    HTSeq.scripts.qa.main()
  File "/home/Libra66/.local/lib/python2.7/site-packages/HTSeq-0.8.0-py2.7-linux-x86_64.egg/HTSeq/scripts/qa.py", line 136, in main
    r.add_bases_to_count_array( base_arr_A )
AttributeError: 'HTSeq._HTSeq.SAM_Alignment' object has no attribute 'add_bases_to_count_array'

BAM_Reader Error: CIGAR code

Two new characters (‘=’ and ‘X’) have recently been added to the
CIGAR string in SAM/BAM format. When I try to use the HTSeq
BAM_Reader on most BAM files, I get the following error:

  File "/Library/Python/2.7/site-packages/HTSeq/__init__.py", line 947, in __iter__
    yield SAM_Alignment.from_pysam_AlignedSegment( pa, sf )
  File "python2/src/HTSeq/_HTSeq.pyx", line 1326, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedSegment (src/_HTSeq.c:30549)
  File "python2/src/HTSeq/_HTSeq.pyx", line 1177, in HTSeq._HTSeq.build_cigar_list (src/_HTSeq.c:27110)
ValueError: Unknown CIGAR code '=' encountered.

It would be very helpful if you could address this issue in an
update. Page 5 of this document contains the updated CIGAR string:
https://samtools.github.io/hts-specs/SAMv1.pdf
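
As an illustration of what accepting the two codes involves, here is a minimal CIGAR parser that recognises the operation set M, I, D, N, S, H, P, = and X. It is standalone Python and not HTSeq's build_cigar_list. The function name parse_cigar and the example string are made up for the sketch.

```python
import re

CIGAR_OPS = "MIDNSHP=X"  # operation codes, including the newer '=' and 'X'
_TOKEN = re.compile(r"(\d+)([%s])" % CIGAR_OPS)


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; '*' means unavailable."""
    if cigar == "*":
        return []
    pairs = [(int(n), op) for n, op in _TOKEN.findall(cigar)]
    # Reject strings containing characters the pattern did not consume.
    if "".join("%d%s" % p for p in pairs) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return pairs


print(parse_cigar("10=1X39=2S"))
```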

Install locally on CentOS

Hi,
How can I install htseq locally on CentOS? I don't have root privileges on my institution's cluster.
thanks,
-madza

GFF_Reader does not take unicode str

Given a unicode string as the gtf path, GFF_Reader reports an error when trying to iterate over the features in it.

In the "FileOrSequence" class, the following code:
def __iter__( self ):
    self.line_no = 1
    if isinstance( self.fos, str ):
        if self.fos.lower().endswith( ( ".gz" , ".gzip" ) ):
            lines = gzip.open( self.fos, 'rt' )
        else:
            lines = open( self.fos )
    else:
        lines = self.fos

The problem is "if isinstance( self.fos, str ):"; it should be "if isinstance( self.fos, basestring ):" in order to cover both 'str' and 'unicode'.
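
Since basestring exists only on Python 2, a check that works on both major versions needs a small fallback. The sketch below shows one common pattern. The names string_types and is_path are hypothetical and not taken from HTSeq.

```python
try:
    string_types = (basestring,)   # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)          # Python 3: str is already Unicode text


def is_path(file_or_sequence):
    """True if the argument is a file name rather than an iterable of lines."""
    return isinstance(file_or_sequence, string_types)


print(is_path(u"annotation.gtf"), is_path(["line1", "line2"]))
```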

linting

Linting and indentations should be standard across the codebase

how to merge exons parts to actual exons?

Hi,

I have a bunch of samples from various tumor types in which I want to get the exon level expression of the gene IDO1. My aim is not to do any differential expression of any sort because there are no control samples. I prepared the gencode v23 gtf to gff using the following command:

python2.7 dexseq_prepare_annotation.py gencode.v23.annotation.gtf gencode.v23.annotation.gff

This gave me 33 exonic parts for the gene IDO1:

chr8	dexseq_prepare_annotation.py	aggregate_gene	39902275	39928444	.	+	.	gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39902275	39902370	.	+	.	transcripts "ENST00000518804.5"; exonic_part_number "001"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39902371	39902374	.	+	.	transcripts "ENST00000518804.5+ENST00000519154.5"; exonic_part_number "002"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39902375	39902379	.	+	.	transcripts "ENST00000518804.5+ENST00000522495.5+ENST00000519154.5"; exonic_part_number "003"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39902380	39902448	.	+	.	transcripts "ENST00000518804.5+ENST00000522840.1+ENST00000522495.5+ENST00000519154.5"; exonic_part_number "004"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39908989	39909030	.	+	.	transcripts "ENST00000522840.1"; exonic_part_number "005"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39909179	39909348	.	+	.	transcripts "ENST00000518804.5+ENST00000522840.1"; exonic_part_number "006"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39913001	39913148	.	+	.	transcripts "ENST00000522495.5"; exonic_part_number "007"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39913284	39913402	.	+	.	transcripts "ENST00000518237.5"; exonic_part_number "008"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39913403	39913773	.	+	.	transcripts "ENST00000518237.5+ENST00000519154.5"; exonic_part_number "009"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39913774	39913868	.	+	.	transcripts "ENST00000518237.5+ENST00000522840.1+ENST00000519154.5"; exonic_part_number "010"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39913869	39913888	.	+	.	transcripts "ENST00000518237.5+ENST00000522840.1+ENST00000253513.11+ENST00000519154.5"; exonic_part_number "011"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39913889	39914009	.	+	.	transcripts "ENST00000518804.5+ENST00000522495.5+ENST00000519154.5+ENST00000518237.5+ENST00000522840.1+ENST00000253513.11"; exonic_part_number "012"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39917875	39917884	.	+	.	transcripts "ENST00000518804.5+ENST00000522495.5+ENST00000519154.5+ENST00000518237.5+ENST00000522840.1+ENST00000253513.11"; exonic_part_number "013"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39917885	39917929	.	+	.	transcripts "ENST00000518237.5+ENST00000522495.5+ENST00000519154.5+ENST00000518804.5+ENST00000521636.1+ENST00000522840.1+ENST00000253513.11"; exonic_part_number "014"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39917930	39917970	.	+	.	transcripts "ENST00000518804.5+ENST00000522495.5+ENST00000519154.5+ENST00000518237.5+ENST00000521636.1+ENST00000253513.11"; exonic_part_number "015"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39918088	39918138	.	+	.	transcripts "ENST00000518804.5+ENST00000522495.5+ENST00000519154.5+ENST00000518237.5+ENST00000521636.1+ENST00000253513.11"; exonic_part_number "016"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39918139	39918207	.	+	.	transcripts "ENST00000518237.5+ENST00000521636.1+ENST00000522495.5+ENST00000519154.5+ENST00000253513.11"; exonic_part_number "017"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39918815	39918933	.	+	.	transcripts "ENST00000518237.5+ENST00000521636.1+ENST00000522495.5+ENST00000519154.5+ENST00000253513.11"; exonic_part_number "018"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39918934	39918944	.	+	.	transcripts "ENST00000253513.11"; exonic_part_number "019"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39920100	39920114	.	+	.	transcripts "ENST00000518237.5+ENST00000521636.1+ENST00000522495.5+ENST00000519154.5"; exonic_part_number "020"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39922552	39922651	.	+	.	transcripts "ENST00000521480.1+ENST00000522495.5+ENST00000519154.5+ENST00000518237.5+ENST00000521636.1+ENST00000253513.11"; exonic_part_number "021"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39922652	39922654	.	+	.	transcripts "ENST00000521636.1+ENST00000519154.5"; exonic_part_number "022"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39922655	39922911	.	+	.	transcripts "ENST00000521636.1"; exonic_part_number "023"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39923469	39923586	.	+	.	transcripts "ENST00000518237.5+ENST00000521480.1+ENST00000522495.5+ENST00000253513.11"; exonic_part_number "024"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39923587	39923590	.	+	.	transcripts "ENST00000521480.1+ENST00000253513.11"; exonic_part_number "025"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39923591	39923931	.	+	.	transcripts "ENST00000521480.1"; exonic_part_number "026"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39924720	39924720	.	+	.	transcripts "ENST00000523779.1"; exonic_part_number "027"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39924721	39924772	.	+	.	transcripts "ENST00000518237.5+ENST00000523779.1+ENST00000522495.5+ENST00000253513.11"; exonic_part_number "028"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39925223	39925371	.	+	.	transcripts "ENST00000518237.5+ENST00000523779.1+ENST00000522495.5+ENST00000253513.11"; exonic_part_number "029"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39927830	39928138	.	+	.	transcripts "ENST00000518237.5+ENST00000523779.1+ENST00000522495.5+ENST00000253513.11"; exonic_part_number "030"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39928139	39928426	.	+	.	transcripts "ENST00000518237.5+ENST00000522495.5+ENST00000253513.11"; exonic_part_number "031"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39928427	39928431	.	+	.	transcripts "ENST00000518237.5+ENST00000253513.11"; exonic_part_number "032"; gene_id "ENSG00000131203.12"
chr8	dexseq_prepare_annotation.py	exonic_part	39928432	39928444	.	+	.	transcripts "ENST00000253513.11"; exonic_part_number "033"; gene_id "ENSG00000131203.12"

Then, I ran dexseq_count.py like this:

python2.7 dexseq_count.py -f bam -p yes -r pos -s no gencode.v23.annotation.gff sample1_Aligned.out.sorted.bam output.txt

And these are the corresponding counts for all 33 exonic parts of IDO1:

ENSG00000131203.12:001	1
ENSG00000131203.12:002	1
ENSG00000131203.12:003	1
ENSG00000131203.12:004	3
ENSG00000131203.12:005	0
ENSG00000131203.12:006	0
ENSG00000131203.12:007	0
ENSG00000131203.12:008	0
ENSG00000131203.12:009	0
ENSG00000131203.12:010	2
ENSG00000131203.12:011	2
ENSG00000131203.12:012	3
ENSG00000131203.12:013	1
ENSG00000131203.12:014	1
ENSG00000131203.12:015	1
ENSG00000131203.12:016	1
ENSG00000131203.12:017	1
ENSG00000131203.12:018	0
ENSG00000131203.12:019	0
ENSG00000131203.12:020	2
ENSG00000131203.12:021	6
ENSG00000131203.12:022	1
ENSG00000131203.12:023	1
ENSG00000131203.12:024	4
ENSG00000131203.12:025	0
ENSG00000131203.12:026	0
ENSG00000131203.12:027	0
ENSG00000131203.12:028	3
ENSG00000131203.12:029	4
ENSG00000131203.12:030	34
ENSG00000131203.12:031	29
ENSG00000131203.12:032	2
ENSG00000131203.12:033	2

However, UCSC shows only 10 exons for IDO1. Since I only need exon-level expression and not differential expression, is there a way to modify the GFF so that it keeps just the actual exon coordinates? Alternatively, is there a way to merge these 33 exonic parts to get counts for the 10 exons? Please advise.

Thanks!
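
One rough way to approach the merging question, sketched in plain Python: collect the exonic parts that lie inside a given exon and add up their counts. The function name `sum_parts_in_exon` and the exon interval used below (parts 001 to 004, which are contiguous and all list ENST00000518804.5) are assumptions made for illustration; they are not part of DEXSeq or HTSeq. As far as I understand, a read overlapping two adjacent parts is counted in each of them, so the sum is at best an upper bound and not a true exon count.

```python
# Exonic parts of ENSG00000131203.12 as (start, end, part_number), with
# 1-based inclusive coordinates copied from the GFF shown above.
parts = [
    (39902275, 39902370, "001"),
    (39902371, 39902374, "002"),
    (39902375, 39902379, "003"),
    (39902380, 39902448, "004"),
    (39908989, 39909030, "005"),
]

# Counts for the same parts, copied from the dexseq_count.py output above.
counts = {"001": 1, "002": 1, "003": 1, "004": 3, "005": 0}


def sum_parts_in_exon(exon_start, exon_end, parts, counts):
    """Sum the counts of all exonic parts lying fully inside one exon."""
    total = 0
    for start, end, number in parts:
        if exon_start <= start and end <= exon_end:
            total += counts[number]
    return total


# Hypothetical exon interval spanning parts 001-004 (an assumption).
exon_total = sum_parts_in_exon(39902275, 39902448, parts, counts)
print(exon_total)
```

If exact per-exon counts matter, counting against the original exon features directly is probably safer than summing parts.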

get_sam_line() error

When calling alignment.get_sam_line() I got the following error:

python3/src/HTSeq/_HTSeq.pyx in HTSeq._HTSeq.SAM_Alignment.get_sam_line()

TypeError: sequence item 9: expected str instance, bytes found

Running HTSeq v0.9.1 with Python 3.6.4.
The offending line seems to be the return statement in get_sam_line(), where a string is formed by joining several items (the seq and qualstr fields are bytes, not str).
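
The error can be reproduced outside HTSeq, which may help when testing a fix. The sketch below is plain Python, not the actual `_HTSeq.pyx` code; the helper `to_text` and the example field values are made up for illustration. In a SAM line the sequence is the tenth field (index 9), which matches the "sequence item 9" in the message.

```python
def to_text(field):
    """Return a str for a field that may be either bytes or str."""
    if isinstance(field, bytes):
        return field.decode("ascii")
    return field


# Eleven mandatory SAM fields; the last two (seq, qual) are bytes,
# mimicking the situation described in the report. Values are invented.
fields = ["read1", "0", "chr1", "100", "255", "4M", "*", "0", "0",
          b"ACGT", b"IIII"]

# Joining mixed str/bytes items raises TypeError under Python 3.
try:
    "\t".join(fields)
    join_failed = False
except TypeError:
    join_failed = True

# Decoding the bytes fields first makes the join succeed.
sam_line = "\t".join(to_text(f) for f in fields)
print(join_failed, sam_line)
```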

Bug in samouts when `-f bam`

Hi all,

I have been trying to use htseq-count with the --samout option in my pipelines, and while testing I came upon what looks like a bug: when using both --samout and -f bam, the output SAM consists of only the tags, not the alignments:

module load htseq/0.9.0
module load samtools/1.5
ESSENTIAL_GENES="/fsimb/groups/imb-kettinggr/genomes/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes_and_transposons.WBcel235.38.gtf"
aln="N2_wt_rep1_untreated.bam"
exp=$(echo $(basename $aln) | sed 's/.bam//')
echo $exp

htseq-count --samout=${exp}.test2.sam -s yes -f bam -m intersection-nonempty ${aln} ${ESSENTIAL_GENES} > ${exp}.counts

head ${exp}.test2.sam
        XF:Z:__too_low_aQual
        XF:Z:__no_feature
        XF:Z:__no_feature
        XF:Z:WBGene00023193
        XF:Z:WBGene00023193
        XF:Z:__no_feature
        XF:Z:WBGene00023193
        XF:Z:WBGene00023193
        XF:Z:WBGene00023193
        XF:Z:WBGene00023193

However, piping the alignments to htseq-count returns the expected output:

samtools view -h ${aln} | htseq-count --samout=${exp}.test1.sam -s yes -f sam -m intersection-nonempty - ${ESSENTIAL_GENES} > ${exp}.counts

head ${exp}.test1.sam
NB501946:201:H3HMGAFXY:1:21112:24049:15895      16      I       3115    0       22M     *       0      0TGATGTTCTACGCTTAAATTTT  EEEEEEEEEEEEEEEEEEEEEA  XA:i:0  MD:Z:22 NM:i:0  XM:i:2  XF:Z:__too_low_aQual
NB501946:201:H3HMGAFXY:2:11202:12920:18180      16      I       3685    255     21M     *       0      0TATCTACTAGGAATAACTCGA   EEEEEEEEEEEEEEEEEEEEA   XA:i:0  MD:Z:21 NM:i:0  XF:Z:__no_feature
NB501946:201:H3HMGAFXY:2:21105:7328:11769       0       I       3738    255     21M     *       0      0TGTAAAATAGAGGATCAGACC   AAEEEEEEEAAEE6EEEEEEE   XA:i:0  MD:Z:21 NM:i:0  XF:Z:__no_feature
NB501946:201:H3HMGAFXY:2:11112:5811:6494        16      I       3746    255     21M     *       0      0AGAGGATCAGACCCAAAATTC   EEEEEEEEEEEEEEEEEEEEA   XA:i:0  MD:Z:21 NM:i:0  XF:Z:WBGene00023193
NB501946:201:H3HMGAFXY:2:21204:5691:10822       16      I       3746    255     22M     *       0      0AGAGGATCAGACCCAAAATTCA  EEEEEEEEEEEEEAEEEEEEEA  XA:i:0  MD:Z:22 NM:i:0  XF:Z:WBGene00023193
NB501946:201:H3HMGAFXY:1:21209:12787:3406       0       I       3747    255     21M     *       0      0GAGGATCAGACCCAAAATTCA   AEEEEEEEEEEEEEEEEEEEE   XA:i:0  MD:Z:21 NM:i:0  XF:Z:__no_feature
NB501946:201:H3HMGAFXY:4:11512:6158:17507       16      I       3747    255     21M     *       0      0GAGGATCAGACCCAAAATTCA   EEEEEEEEEEEEEEEEEEEEA   XA:i:0  MD:Z:21 NM:i:0  XF:Z:WBGene00023193
NB501946:201:H3HMGAFXY:3:11503:17322:16736      16      I       3747    255     20M     *       0      0GAGGATCAGACCCAAAATTC    EEEEEEEEEEEEEEEEEEEA    XA:i:0  MD:Z:20 NM:i:0  XF:Z:WBGene00023193
NB501946:201:H3HMGAFXY:3:21403:11554:1616       16      I       3747    255     24M     *       0      0GAGGATCAGACCCAAAATTCAGCC        EEEAAEAEEEEE6EEEEEEEEEEA        XA:i:0  MD:Z:24 NM:i:0  XF:Z:WBGene00023193
NB501946:201:H3HMGAFXY:3:11607:5533:12910       16      I       3747    255     31M     *       0      0GAGGATCAGACCCAAAATTCAGCCCGCGAAG EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA XA:i:0  MD:Z:31 NM:i:0  XF:Z:WBGene00023193

Cheers,
António

Install HTSeq with Conda on Windows

Hi, is there any chance to install the new version on Windows?

If I execute this statement:
pip install HTSeq

in the Anaconda prompt as admin, I get this error message:

Collecting HTSeq
Using cached https://files.pythonhosted.org/packages/f8/87/85bc27f3d96ff4cd9144cbbfb5eb7573654edd0f6970f5c49d15a17e5d6f/HTSeq-0.10.0.tar.gz
Complete output from command python setup.py egg_info:
'.' is not recognized as an internal or external command,
operable program or batch file.
'.' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 154, in save_modules
    yield saved
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 195, in setup_context
    yield
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "C:\Users\dhg8103\AppData\Local\Temp\easy_install-nj3cyveq\pysam-0.14.1\setup.py", line 223, in <module>
  File "C:\Users\dhg8103\AppData\Local\Temp\easy_install-nj3cyveq\pysam-0.14.1\setup.py", line 69, in run_make_print_config
    # Update version from VERSION file into module
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\dhg8103\AppData\Local\Temp\pip-install-lcnym_i4\HTSeq\setup.py", line 200, in <module>
    **kwargs
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\__init__.py", line 128, in setup
    _install_setup_requires(attrs)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\__init__.py", line 123, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\dist.py", line 504, in fetch_build_eggs
    replace_conflicting=True,
  File "D:\Anaconda\anaconda3\lib\site-packages\pkg_resources\__init__.py", line 774, in resolve
    replace_conflicting=replace_conflicting
  File "D:\Anaconda\anaconda3\lib\site-packages\pkg_resources\__init__.py", line 1057, in best_match
    return self.obtain(req, installer)
  File "D:\Anaconda\anaconda3\lib\site-packages\pkg_resources\__init__.py", line 1069, in obtain
    return installer(requirement)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\dist.py", line 571, in fetch_build_egg
    return cmd.easy_install(req)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\command\easy_install.py", line 673, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\command\easy_install.py", line 699, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\command\easy_install.py", line 884, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\command\easy_install.py", line 1152, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\command\easy_install.py", line 1138, in run_setup
    run_setup(setup_script, args)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 253, in run_setup
    raise
  File "D:\Anaconda\anaconda3\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 195, in setup_context
    yield
  File "D:\Anaconda\anaconda3\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 166, in save_modules
    saved_exc.resume()
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 141, in resume
    six.reraise(type, exc, self._tb)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\_vendor\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 154, in save_modules
    yield saved
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 195, in setup_context
    yield
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "D:\Anaconda\anaconda3\lib\site-packages\setuptools\sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "C:\Users\dhg8103\AppData\Local\Temp\easy_install-nj3cyveq\pysam-0.14.1\setup.py", line 223, in <module>
  File "C:\Users\dhg8103\AppData\Local\Temp\easy_install-nj3cyveq\pysam-0.14.1\setup.py", line 69, in run_make_print_config
    # Update version from VERSION file into module
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "D:\Anaconda\anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
symlinking folders for python3
# pysam: cython is available - using cythonize if necessary
# pysam: htslib mode is shared
# pysam: HTSLIB_CONFIGURE_OPTIONS=None
# pysam: htslib configure options: None

I don't have any clue how to set up the symlinked folders for python3 (the solutions I found on Google did not help me).

If I download HTSeq-0.10.0.tar.gz and try to execute pip install setup.py (also in the Anaconda prompt), I get the following error:

Collecting setup.py
Could not find a version that satisfies the requirement setup.py (from versions: )
No matching distribution found for setup.py

Thanks & Regards,

Marko

StepVector.so Error Message

Hi,

I am trying to install htseq-count on an Ubuntu VM.

I get the following error message if I simply run htseq-count after installing HTSeq:

Traceback (most recent call last):
  File "/usr/bin/htseq-count", line 3, in <module>
    import HTSeq.scripts.count
  File "/usr/lib/python2.7/site-packages/HTSeq/__init__.py", line 9, in <module>
    from _HTSeq import *
  File "_HTSeq.pyx", line 14, in init HTSeq._HTSeq (src/_HTSeq.c:34537)
  File "/usr/lib/python2.7/site-packages/HTSeq/StepVector.py", line 26, in <module>
    _StepVector = swig_import_helper()
  File "/usr/lib/python2.7/site-packages/HTSeq/StepVector.py", line 22, in swig_import_helper
    _mod = imp.load_module('_StepVector', fp, pathname, description)
ImportError: /usr/lib/python2.7/site-packages/HTSeq/_StepVector.so: undefined symbol: _ZNSt12out_of_rangeC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I have tried the following installation strategies:

pip install htseq (error message above)
install from PyPI source (error message above)
install from git code (error message: gcc: error: src/_HTSeq.c: No such file or directory)

I've tried uninstalling and re-installing numpy, and I've also tried installing htseq as root. When installing from source, I've tried installing for all users and for just myself (I also get the same error if I run 'python setup.py build' and test htseq-count in the 'build' folder).

Can you please help me troubleshoot?

Thanks,
Charles

installation issue?

Hi,

I am not sure how to fix the following problem on Ubuntu 16.04.
I installed HTSeq and the necessary dependencies using pip install.
When I try to run htseq-count I get the following error:

Traceback (most recent call last):
  File "/home/opt/anaconda/bin/htseq-count", line 3, in <module>
    import HTSeq.scripts.count
  File "/home/opt/anaconda/lib/python2.7/site-packages/HTSeq/__init__.py", line 12, in <module>
    from _HTSeq import *
  File "__init__.pxd", line 164, in init HTSeq._HTSeq
ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96

Is this a common issue?

Thanks in advance!

SAM_Alignment : not_primary_alignment not taking flag 0x800 into account

Hello,

In the SAM_Alignment class, the not_primary_alignment attribute only takes the flag 0x100 into account to determine whether a read is primary or not. However, the SAM spec also defines flag 0x800 as "supplementary alignment". This flag is used by programs such as BWA-MEM to mark other alignments reported for the same read, which would fit the definition of "not primary".

I think either the presence of this flag should make not_primary_alignment true, or the variable should be renamed to avoid confusion.

On a related note, in the documentation, the attribute is spelled "nor_primary_alignment" instead of "not_primary_alignment".

Version: HTSeq 0.7.2 (from PyPI) on Python 3.5
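
To make the proposal concrete, here is a small sketch of the flag test in plain Python. The bit values 0x100 (secondary) and 0x800 (supplementary) come from the SAM spec; the function name `is_not_primary` is made up and is not HTSeq's implementation.

```python
# FLAG bits as defined in the SAM specification.
SECONDARY = 0x100       # "not primary alignment"
SUPPLEMENTARY = 0x800   # "supplementary alignment"


def is_not_primary(flag):
    """True if either the secondary or the supplementary bit is set."""
    return bool(flag & (SECONDARY | SUPPLEMENTARY))


# 16 = reverse strand only; 256 = secondary; 2064 = supplementary + reverse.
for flag in (16, 256, 2064):
    print(flag, is_not_primary(flag))
```

Whether supplementary alignments should be folded into the same attribute or exposed separately is a design choice; a separate attribute would keep existing behaviour unchanged.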

iv.chrom AttributeError

Hi there,

I just installed HTSeq 0.10.0 and checked out your guide, using the example data. Whenever I try to print aln.iv.chrom for non-aligned reads, I get this error message:

Traceback (most recent call last):
  File "/home/hermi/workspace/Coverage/HTseq/ReadCounts.py", line 46, in <module>
    print aln.iv.chrom
AttributeError: 'NoneType' object has no attribute 'chrom'

I get this both when I use the example data and my own files. I wonder if this might cause any problems downstream?

Cheers,
Maike
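
The traceback suggests that `iv` is simply None for reads without an alignment, so a guard before accessing `chrom` may be all that is needed. The sketch below uses namedtuple stand-ins (`Interval`, `Alignment`) instead of real HTSeq objects so that it runs on its own; those stand-ins and the helper `chrom_or_none` are assumptions for illustration only.

```python
from collections import namedtuple

# Stand-ins for HTSeq objects, used only so the sketch runs without HTSeq.
Interval = namedtuple("Interval", ["chrom", "start", "end", "strand"])
Alignment = namedtuple("Alignment", ["read_name", "iv"])


def chrom_or_none(aln):
    """Return the chromosome name, or None when the read has no interval."""
    if aln.iv is None:
        return None
    return aln.iv.chrom


alignments = [
    Alignment("read1", Interval("chr1", 100, 150, "+")),
    Alignment("read2", None),  # mimics a non-aligned read
]

for aln in alignments:
    print(aln.read_name, chrom_or_none(aln))
```

With real data the same check (`aln.iv is not None`) should work inside the loop over the reader.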

Problem getting part of sequence

Hey guys, thanks for the great work on this library. I wanted to ask for help with the following problem: I'm trying to get part of a sequence using an interval from a SAM file.

This is part of my code:

genome = HTSeq.FastaReader(genome)
sam = HTSeq.SAM_Reader(sam)
for line in sam:
    read = None
    read = line.iv.copy()
    if read.strand == '+':
        read.strand = '-'
        read.end = read.start +1
        read.start = read.start -10
    elif read.strand == '-':
        read.strand = '+'
        read.start = read.end+1
        read.end = read.end +12
    print(genome[read].seq)

and I'm getting this error:

Traceback (most recent call last):
  File "filter.py", line 89, in <module>
    main()
  File "filter.py", line 85, in main
    filter(args.genome, args.sam)
  File "filter.py", line 59, in filter
    print(genome[read].seq)
  File "/usr/lib64/python3.4/site-packages/HTSeq-0.8.0-py3.4-linux-x86_64.egg/HTSeq/__init__.py", line 357, in __getitem__
    ans = list( FastaReader( fasta ) )
  File "/usr/lib64/python3.4/site-packages/HTSeq-0.8.0-py3.4-linux-x86_64.egg/HTSeq/__init__.py", line 290, in __iter__
    for line in FileOrSequence.__iter__( self ):
  File "/usr/lib64/python3.4/site-packages/HTSeq-0.8.0-py3.4-linux-x86_64.egg/HTSeq/__init__.py", line 54, in __iter__
    lines = open( self.fos )
FileNotFoundError: [Errno 2] No such file or directory: '>I:1758-1768\nTTAATTTTCAA\n'

It keeps the interval sequence in a string, but then tries to open this string as a file.

Any help would be greatly appreciated.
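
A possible workaround while this is open: read the genome into memory once and slice it directly. The sketch below is plain Python with a toy dict standing in for the loaded FASTA; the function `fetch`, the toy sequence, and the clamping of negative start positions are assumptions for illustration, not HTSeq behaviour.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def fetch(genome, chrom, start, end, strand="+"):
    """Return genome[chrom][start:end], reverse-complemented for '-'.

    Coordinates are 0-based and end-exclusive; a negative start is
    clamped to 0 so that it does not wrap around to the sequence end.
    """
    start = max(start, 0)
    sub = genome[chrom][start:end]
    if strand == "-":
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


# Toy genome: chromosome name -> sequence string (invented data).
genome = {"I": "AACCGGTT"}

print(fetch(genome, "I", 2, 6))        # forward strand slice
print(fetch(genome, "I", 0, 4, "-"))   # reverse complement of the slice
print(fetch(genome, "I", -10, 3))      # start clamped to 0
```

For a real genome the dict could be filled by iterating over the FASTA file once, at the cost of holding all sequences in memory.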

Adding ability for using cores

Have there been any discussions about letting this program use multiple cores?

samout: output bam files

Hi all,

I am wondering if you could consider adding an option to output the annotated SAM alignments in BAM format (with header). This would remove the intermediate step of converting the file to BAM, which is, afaik, the most common way of storing alignments these days.

Cheers,
António

HTSeq: Failed to import 'numpy'. when building snap

I am trying to build a snap package on Ubuntu that includes HTSeq. It seems to be the only thing that won't work. I get the following error:

Setup script for HTSeq: Failed to import 'numpy'.
Please install numpy and then try again to install HTSeq.

Obviously numpy and the other dependencies are being installed as well, and the package installs fine on my normal system. This is using Python 2.7. Could there be something else missing that is falsely flagged as a numpy error? You might not be familiar with snaps, but I do think it's an HTSeq problem, as all the other packages build OK.
Any ideas?

htseq-count Read claims to have an aligned mate which could not be found in an adjacent line.

Example:

Read SN860:669:C8F8HACXX:7:1101:2485:2172.firstrun.1 claims to have an aligned mate which could not be found in an adjacent line.

The program continues to run even after spitting out these warnings.
Does it skip the affected reads and continue counting?
I'm also confused about what these other warnings mean:

Warning: 23877510 reads with missing mate encountered.
30111080 SAM alignment pairs processed.

thanks

download HTSeq in windows

I have used HTSeq on Linux. Now I need it on Windows, but I get this error when I try to install it:
os.symlink(py_fdn+fdn, fdn)
AttributeError: 'module' object has no attribute 'symlink'

In stackoverflow I found this answer:
https://stackoverflow.com/questions/1236172/py2app-error-module-object-has-no-attribute-symlink
"os.symlink is only available on Unix and Unix-like operating systems (including the Mac), not Windows."

Is that so?
If so, how can HTSeq be made available for Windows?
http://htseq.readthedocs.io/en/release_0.9.1/install.html#ms-windows
On
https://pypi.python.org/pypi/HTSeq#downloads
all the download files are for Mac and Linux. I downloaded the source file; is this the one that works on Windows?
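
For what it's worth, the AttributeError matches Python 2 on Windows, where `os.symlink` does not exist; on Python 3 the function exists but may still fail without the right privileges. A setup script could fall back to copying in both cases. The sketch below shows that idea with a made-up helper `link_or_copy` and temporary folder names; it is not taken from HTSeq's setup.py.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Symlink src to dst where possible; otherwise copy the directory."""
    try:
        os.symlink(src, dst)
    except (AttributeError, NotImplementedError, OSError):
        # No os.symlink (Python 2 on Windows) or no permission to link.
        shutil.copytree(src, dst)


# Demonstrate in a throwaway directory (all names are hypothetical).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "python3_src")
dst = os.path.join(workdir, "src")
os.mkdir(src)

link_or_copy(src, dst)
linked = os.path.isdir(dst)  # True for both a symlink and a copy
print(linked)

shutil.rmtree(workdir)
```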
