
hgvs's Introduction

HGVS variant name parsing and generation

The Human Genome Variation Society (HGVS) promotes the discovery and sharing of genetic variation in the human population. As part of facilitating variant sharing, the society has produced a series of recommendations for how to name and refer to variants within research publications and clinical settings. A compilation of these recommendations is available on their website.

This library provides a simple Python API for parsing, formatting, and normalizing HGVS names. Handling HGVS names involves a surprising number of non-trivial steps, so there is a need for well-tested libraries that encapsulate them.

HGVS name example

In most next-generation sequencing applications, variants are first discovered and described in terms of their genomic coordinates such as chromosome 7, position 117,199,563 with reference allele G and alternative allele T. According to the HGVS standard, we can describe this variant as NC_000007.13:g.117199563G>T. The first part of the name is a RefSeq ID NC_000007.13 for chromosome 7 version 13. The g. denotes that this is a variant described in genomic (i.e. chromosomal) coordinates. Lastly, the chromosomal position, reference allele, and alternative allele are indicated. For simple single nucleotide changes the > character is used.
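The genomic form described above can be assembled directly from VCF-style fields. The following is a minimal sketch and not the library's own formatter; the function name `format_genomic_snv` is hypothetical.

```python
def format_genomic_snv(refseq_id, position, ref, alt):
    """Build a genomic (g.) HGVS name for a single nucleotide substitution."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    # RefSeq ID, "g." for genomic coordinates, position, then ref>alt.
    return "{0}:g.{1}{2}>{3}".format(refseq_id, position, ref, alt)


name = format_genomic_snv("NC_000007.13", 117199563, "G", "T")
print(name)  # NC_000007.13:g.117199563G>T
```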

More commonly, a variant will be described using a cDNA or protein style HGVS name. In the example above, the variant in cDNA style is named NM_000492.3:c.1438G>T. Here again, the first part of the name refers to a RefSeq sequence, this time mRNA transcript NM_000492 version 3. Optionally, the gene name can also be given as NM_000492.3(CFTR). The c. indicates that this is a cDNA name, and the coordinate indicates that this mutation occurs at position 1438 along the coding portion of the spliced transcript (i.e. position 1 is the first base of ATG translation start codon). Briefly, the protein style of the variant name is NP_000483.3:p.Gly480Cys which indicates the change in amino-acid coordinates (480) along an amino-acid sequence (NP_000483.3) and gives the reference and alternative amino-acid alleles (Gly and Cys, respectively).

The standard also specifies custom name formats for many mutation categories such as insertions (NM_000492.3:c.1438_1439insA), deletions (NM_000492.3:c.1438_1440delGGT), duplications (NM_000492.3:c.1438_1440dupGGT), and several other more complex genomic rearrangements.

While many of these names appear simple to parse or generate, there are many corner cases, especially with cDNA HGVS names. For example, variants before the start codon should have negative cDNA coordinates (NM_000492.3:c.-4G>C), and variants after the stop codon also have their own format (NM_000492.3:c.*33C>T). Variants within introns are indicated by the closest exonic base with an additional genomic offset such as NM_000492.3:c.4243-20A>G (the variant is 20 bases in the 5' direction of the cDNA coordinate 4243). Lastly, all coordinates and alleles are specified on the strand of the transcript. This library properly handles all logic necessary to convert genomic coordinates to and from HGVS cDNA coordinates.
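To make the three cDNA coordinate styles concrete, here is a small sketch that splits a single coordinate into a landmark, a base position, and an intronic offset. The pattern and the function `parse_cdna_coord` are simplified illustrations written for this example, not the regexes the library actually uses.

```python
import re

# Simplified pattern for one cDNA coordinate: an optional "-" (before the start
# codon) or "*" (after the stop codon), a base position, and an optional signed
# intronic offset.
CDNA_COORD = re.compile(r"^(?P<prefix>[-*]?)(?P<base>\d+)(?P<offset>[+-]\d+)?$")


def parse_cdna_coord(text):
    """Return (landmark, base, offset) for a cDNA coordinate string."""
    match = CDNA_COORD.match(text)
    if match is None:
        raise ValueError("not a cDNA coordinate: %r" % text)
    base = int(match.group("base"))
    offset = int(match.group("offset") or 0)
    prefix = match.group("prefix")
    if prefix == "-":
        # Upstream of the start codon: negative coordinate.
        return ("start_codon", -base, offset)
    if prefix == "*":
        # Downstream of the stop codon: counted from the stop codon.
        return ("stop_codon", base, offset)
    return ("start_codon", base, offset)


print(parse_cdna_coord("-4"))       # ('start_codon', -4, 0)
print(parse_cdna_coord("*33"))      # ('stop_codon', 33, 0)
print(parse_cdna_coord("4243-20"))  # ('start_codon', 4243, -20)
```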

Another important consideration of any library that handles HGVS names is variant normalization. The HGVS standard aims to provide "uniform and unequivocal" description of variants. Namely, two people discovering a variant should be able to arrive at the same name for it. Such a property is very useful for checking whether a variant has been seen before and connecting all known relevant information. For SNPs, this property is fairly easy to achieve. However, for insertions and deletions (indels) near repetitive regions, many indels are equivalent (e.g. it doesn't matter which AT in a run of ATATATAT was deleted). The VCF file format has chosen to uniquely specify such indels by using the most left-aligned genomic coordinate. Therefore, compliant variant callers that output VCF will have applied this normalization. The HGVS standard also specifies a normalization for such indels. However, it states that indels should use the most 3' position in a transcript. For genes on the positive strand, this is the opposite direction specified by VCF. This library properly implements both kinds of variant normalization and allows easy conversion between HGVS and VCF style variants. It also handles many other cases of normalization (e.g. the HGVS standard recommends indicating an insertion with the dup notation instead of ins if it can be represented as a tandem duplication).
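The two normalization directions can be illustrated on a plain string. This is a sketch of the idea only, on the forward strand and without the VCF anchor base; `shift_deletion` and `apply_deletion` are hypothetical helpers, not functions from this library.

```python
def shift_deletion(sequence, start, length, direction):
    """Move a deletion of `length` bases at 0-based `start` as far left or right
    as possible without changing the sequence that results from it."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if direction == "left":
        # Shifting left is safe while the base entering the window on the left
        # equals the base leaving it on the right.
        while start > 0 and sequence[start - 1] == sequence[start + length - 1]:
            start -= 1
    else:
        # Shifting right is safe while the base leaving the window on the left
        # equals the base entering it on the right.
        while (start + length < len(sequence)
               and sequence[start] == sequence[start + length]):
            start += 1
    return start


def apply_deletion(sequence, start, length):
    """Return the sequence with `length` bases removed at `start`."""
    return sequence[:start] + sequence[start + length:]


sequence = "GGATATATATCC"
leftmost = shift_deletion(sequence, 4, 2, "left")    # VCF-style position
rightmost = shift_deletion(sequence, 4, 2, "right")  # 3'-most on a plus-strand transcript
print(leftmost, rightmost)  # 2 8
```

Both positions describe the same resulting sequence, which is why a fixed convention is needed to obtain a unique name.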

Example usage

Below is a minimal example of parsing and formatting HGVS names. In addition to the name itself, two other pieces of information are needed: the genome sequence (needed for normalization), and the transcript model or a callback for fetching the transcript model (needed for transcript coordinate calculations). This library makes as few assumptions as possible about how this external data is stored. In this example, the genome sequence is read using the pyfaidx library and transcripts are read from a RefSeqGenes flat-file using methods provided by hgvs.

import pyhgvs as hgvs
import pyhgvs.utils as hgvs_utils
from pyfaidx import Fasta

# Read genome sequence using pyfaidx.
genome = Fasta('hg19.fa')

# Read RefSeq transcripts into a python dict.
with open('pyhgvs/data/genes.refGene') as infile:
    transcripts = hgvs_utils.read_transcripts(infile)

# Provide a callback for fetching a transcript by its name.
def get_transcript(name):
    return transcripts.get(name)

# Parse the HGVS name into genomic coordinates and alleles.
chrom, offset, ref, alt = hgvs.parse_hgvs_name(
    'NM_000352.3:c.215A>G', genome, get_transcript=get_transcript)
# Returns variant in VCF style: ('chr11', 17496508, 'T', 'C')
# Notice that since the transcript is on the negative strand, the alleles
# are reverse complemented during conversion.

# Format an HGVS name.
chrom, offset, ref, alt = ('chr11', 17496508, 'T', 'C')
transcript = get_transcript('NM_000352.3')
hgvs_name = hgvs.format_hgvs_name(
    chrom, offset, ref, alt, genome, transcript)
# Returns 'NM_000352.3(ABCC8):c.215A>G'
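The strand handling noted in the comments above amounts to reverse complementing the alleles. A minimal sketch of that step, independent of the library (the helper `reverse_complement` is written here for illustration):

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(allele):
    """Return the reverse complement of a DNA allele."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


# c.215A>G is written on the transcript strand. For a transcript on the
# negative strand, the genomic (VCF-style) alleles are the reverse complements.
genomic_ref = reverse_complement("A")
genomic_alt = reverse_complement("G")
print(genomic_ref, genomic_alt)  # T C
```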

The hgvs library can also perform just the parsing step and provide a parse tree of the HGVS name.

import pyhgvs as hgvs

hgvs_name = hgvs.HGVSName('NM_000352.3:c.215-10A>G')

# fields of the HGVS name are available as attributes:
#
# hgvs_name.transcript = 'NM_000352.3'
# hgvs_name.kind = 'c'
# hgvs_name.mutation_type = '>'
# hgvs_name.cdna_start = hgvs.CDNACoord(215, -10)
# hgvs_name.cdna_end = hgvs.CDNACoord(215, -10)
# hgvs_name.ref_allele = 'A'
# hgvs_name.alt_allele = 'G'

Install

This library can be installed using the setup.py file as follows:

python setup.py install

Tests

Test cases can be run by running

python setup.py nosetests

Requirements

This library requires at least Python 2.6, but otherwise has no external dependencies.

The library does assume that the genome sequence is available through a pyfaidx-compatible Fasta object. For an example of writing a wrapper for a different genome sequence back-end, see pyhgvs.tests.genome.MockGenome.


hgvs's Issues

parse_hgvs_name() crashes if start>end

I have trouble converting chr19:g.10291325_10291323dup (rs147441348) into chrom, pos, ref, alt using parse_hgvs_name(). The traceback is

Traceback (most recent call last):
  File "XXX", line 76, in main
    get_transcript=get_transcript)
  File "xxxx/pyhgvs/__init__.py", line 1360, in parse_hgvs_name
    chrom, start, end, ref, alt = get_vcf_allele(hgvs, genome, transcript)
  File "xxxx/pyhgvs/__init__.py", line 672, in get_vcf_allele
    alt = ref[0] + alt
IndexError: string index out of range

pyhgvs is unable to retrieve the ref bases, which is likely caused by get_genomic_sequence(), which in turn does not support start coordinates larger than end coordinates. Now, I am not sure this is wrong. However, I can paste chr19:g.10291325_10291323dup into Alamut in my case and find the variant. Exchanging start/end seems to yield the correct result, too.

update of genes.refGene files

I need to use an updated version of RefSeq. Is there a script available to download the current version of the 'genes.refGene' file, or should I build it by hand? Thank you. Angela

AttributeError: 'module' object has no attribute 'read_transcripts'

Hello, I installed 'hgvs' using:
pip install 'hgvs'
pip install 'pygr'

But I get the following error. How can I fix it?

[root@bio-x-2 hgvs]# python
Python 2.7.5 (default, Sep 15 2016, 22:37:39)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import pyhgvs as hgvs
>>> import hgvs.utils as hgvs_utils
>>> from pygr.seqdb import SequenceFileDB
>>> hgvs_utils.read_transcripts()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'read_transcripts'

how to get pdot

Hello, I see from example usage how to get HGVS cdot from REF/ALT. Is there a built-in function to get the pdot? Thanks.

add pip install

having an option to pip install pyhgvs would make package management much easier.

Incorrect translation when the HGVS string does not contain a reference or alt allele

I've come across this problem with strings such as NM_007294.3:c.1209dup - which IMHO should actually be NM_007294.3:c.1209dupT (which is how ClinVar represents the variant), but mutalyzer claims that NM_007294.3:c.1209dup is valid HGVS... When I parse its name with

chrom, offset, ref, alt = hgvs.parse_hgvs_name(variant, genome, get_transcript=get_transcript)

I get the results that ref and alt are both 'C', where alt should be 'CC'. If there's a way around this, please let me know!

Thanks!

hgvs/pyhgvs/data/genes.refGene file

Hello,

How do I create the file hgvs/pyhgvs/data/genes.refGene? The bundled file is out of date and I want to update it.

I want to use the latest transcripts.

Syntax error when trying to parse valid R variant

import hgvs.parser
hp = hgvs.parser.Parser()
hp.parse_hgvs_variant("NM13423:r.831_832ins831+1_831+60")
...
ometa.runtime.ParseError:
NM13423:r.831_832ins831+1_831+60
^
Parse error at line 1, column 20: Syntax error. trail: [rna_iupac rna rna_ins rna_edit r_posedit r_variant hgvs_variant]
...
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
hgvs.exceptions.HGVSParseError: NM13423:r.831_832ins831+1_831+60: char 20: Syntax error

Incorrect HGVS to VCF conversion for some genomic indels

Hi, genomic indels are often wrong because get_coords() adjustment of start/end is only done for indels if self.kind == 'c'

Testing against examples from the ClinGen allele registry:

http://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/allele?hgvsOrDescriptor=NM_000492.3%3Ac.1155_1156dupTA

    'NM_000492.3:c.1155_1156dupTA' # correct resolves to ('chr7', 117182104, 'A', 'AAT')
    # Same as above but without optional trailing base - issue #32
    'NM_000492.3:c.1155_1156dup' # Error - resolves to ('chr7', 117182107, 'A', 'A')
    # Genomic coordinate of above
    "chr7:g.117182108_117182109dup" # Error - resolves to ('7', 117182109, 'A', 'A')

    # Genomic coordinate of above but shifted with optional base suffix
    "chr7:g.117182105_117182106dupAT" # Error - resolves to ('7', 117182106, 'T', 'T')

I would do a pull request but I've been working with existing pull request #25 and it doesn't look like this project is being updated anymore. If you merge #25 please ping this issue and I'll make a pull request.

Fixes are to remove test for if self.kind == 'c': in get_coords()

I've patched my fork: https://github.com/sacgf/hgvs

'dict' object has no attribute 'tx_position'

python hgvs-convert.py

DEBUG seqdb._create_seqLenDict: Building sequence length index...
Traceback (most recent call last):
  File "hgvs-convert.py", line 35, in <module>
    print(hgvs.parse_hgvs_name("NM_000352.3:c.215A>G",genome,transcripts))
  File "/home/josephv/Pythonmodules/lib/python2.7/site-packages/pyhgvs-0.9.4-py2.7.egg/pyhgvs/__init__.py", line 1365, in parse_hgvs_name
    chrom, start, end, ref, alt = get_vcf_allele(hgvs, genome, transcript)
  File "/home/josephv/Pythonmodules/lib/python2.7/site-packages/pyhgvs-0.9.4-py2.7.egg/pyhgvs/__init__.py", line 662, in get_vcf_allele
    chrom, start, end = hgvs.get_vcf_coords(transcript)
  File "/home/josephv/Pythonmodules/lib/python2.7/site-packages/pyhgvs-0.9.4-py2.7.egg/pyhgvs/__init__.py", line 1181, in get_vcf_coords
    chrom, start, end = self.get_coords(transcript)
  File "/home/josephv/Pythonmodules/lib/python2.7/site-packages/pyhgvs-0.9.4-py2.7.egg/pyhgvs/__init__.py", line 1142, in get_coords
    chrom = transcript.tx_position.chrom
AttributeError: 'dict' object has no attribute 'tx_position'

The script I am using is

import pyhgvs as hgvs
import pyhgvs.utils as hgvs_utils
from pygr.seqdb import SequenceFileDB

genome = SequenceFileDB('/ifs/e63data/offitlab/Human_Decoy_REF/hs37d5.fa')

with open('/ifs/e63data/offitlab/REFGENE/sorted.curated_geneTrack_wo_chr_sorted.refgene') as infile:
    transcripts = hgvs_utils.read_transcripts(infile)

def get_transcript(name):
    return transcripts.get(name)

print(hgvs.parse_hgvs_name("NM_000352.3:c.215A>G",genome,transcripts))

AttributeError: 'module' object has no attribute 'utils'

In your example, the line

transcripts = hgvs.utils.read_transcripts('genes.refGene')

is throwing the error:
transcripts = hgvs.utils.read_transcripts('genes.refGene')
AttributeError: 'module' object has no attribute 'utils'

Any thoughts?

hgvs_utils not installing?

OS X 10.11.3
python 2.7.10

Or am I supposed to install this separately?

I git cloned hgvs and ran python setup.py install

>>> import hgvs.utils as hgvs_utils
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named hgvs.utils

error naming CFTR:c.1521_1523delCTT

Using hg18.fa and the provided genes.refGene in the git repo. I don't think this is a problem but let me know if you think it is.

chrom, offset, ref, alt = ('chr7', 116986881, 'TCTT', 'T')
transcript = get_transcript('NM_000492.3')
hgvs_name = hgvs.format_hgvs_name(
    chrom, offset, ref, alt, genome, transcript)
print(hgvs_name)
#returns NM_000492.3(CFTR):c.-133267_-133265delCTT

However I don't think this is correct. Shouldn't it be CFTR:c.1521_1523delCTT?
Good news: I tried an alternative form of FDel508 and got the same result.

#NM_000492.3 is the transcript for CFTR
chrom, offset, ref, alt = ('chr7', 11698688, 'ATCT', 'A')
transcript = get_transcript('NM_000492.3')
hgvs_name = hgvs.format_hgvs_name(
    chrom, offset, ref, alt, genome, transcript)
print(hgvs_name)
#returns NM_000492.3(CFTR):c.-133267_-133265delCTT

So I think the position it is counting from may simply be off. Any thoughts? Thanks! Let me know if I can help contribute!

No module read_transcripts in hgvs_utils

My code is an exact copy of the README.md file on your site. I can't get your package to work as directed.
>>> import pyhgvs as hgvs
>>> import hgvs.utils as hgvs_utils
>>> hgvs_utils.read_transcripts
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'read_transcripts'

I am trying to use Ensembl transcripts as well, and the documentation is rather sparse on that.

Add NC_ALLELE parse

Awesome work! Thanks!

There are some variants which have no mRNA or cDNA hgvs,

eg. rs716274,NC_000011.9:g.103418158A>G

NC_ALLELE is empty and not being processed now.

Reference HGVS without reference base leads to wrong coordinates and reference allele

The current regex treats the last digit as a ref digit, ie it uses it to multiply "N" that many times. This makes the coordinate wrong as the last digit is cut off, eg:

In [6]: HGVSName("NC_000017.11:g.50199235=")                                                                                                                               
Out[6]: HGVSName('NC_000017.11:g.5019923NNNNN=')

In [7]: HGVSName("NM_018090.5:c.462=")                                                                                                                                     
Out[7]: HGVSName('NM_018090.5:c.46NN=')

Unit test test_hgvs_names.py

# Copy pasted from BRCA1:c.101A= test with "A" removed

    ('BRCA1:c.101=', True,
     {
         'gene': 'BRCA1',
         'kind': 'c',
         'cdna_start': CDNACoord(101),
         'cdna_end': CDNACoord(101),
         'ref_allele': '',
         'alt_allele': '',
         'mutation_type': '=',
     }),

# Copy pasted from BRCA1:g.101A= test with "A" removed

    ('BRCA1:g.101=', True,
     {
         'gene': 'BRCA1',
         'kind': 'g',
         'start': 101,
         'end': 101,
         'ref_allele': '',
         'alt_allele': '',
         'mutation_type': '=',
     }),


Currently fails with:

AssertionError: CDNACoord(10, 0) != CDNACoord(101, 0)

Fix is to add a new regex just above the existing "No change" regexes, ie in HGVSRegex:

CDNA_ALLELE = [
    CDNA_START + EQUAL, 
    # old regexes
]

GENOMIC_ALLELE = [
    COORD_START + EQUAL,
    # old regexes
]

I am not sure whether protein HGVS names are affected, or whether the reference needs to be specified there, i.e. whether "p.1000=" is valid or not.
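The backtracking behaviour behind this bug can be reproduced with two simplified patterns. These are stand-ins written for illustration and are assumed to resemble the library's "no change" regexes only in spirit: one accepts a reference given as bases or as a digit count, and the proposed one accepts a bare coordinate.

```python
import re

# Assumed shape of the existing pattern: coordinate, then a reference given
# either as bases or as a digit count, then "=".
NO_CHANGE_WITH_REF = re.compile(r"^(?P<start>\d+)(?P<ref>[ACGTN]+|\d+)=$")
# Proposed extra pattern, tried first: coordinate followed directly by "=".
NO_CHANGE_BARE = re.compile(r"^(?P<start>\d+)=$")


def parse_no_change(allele, patterns):
    """Return (start, ref) from the first pattern that matches."""
    for pattern in patterns:
        match = pattern.match(allele)
        if match:
            groups = match.groupdict()
            return int(groups["start"]), groups.get("ref", "")
    raise ValueError("no pattern matched %r" % allele)


# With only the old pattern, the regex backtracks and gives up the last digit
# of the coordinate so that the reference group can match something.
old = parse_no_change("101=", [NO_CHANGE_WITH_REF])
# Trying the bare pattern first keeps the whole coordinate.
new = parse_no_change("101=", [NO_CHANGE_BARE, NO_CHANGE_WITH_REF])
print(old, new)  # (10, '1') (101, '')
```

Names that do carry a reference base, such as "101A=", still fall through to the second pattern.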

dup longer than 100 bases converted back to delins (due to hardcoding of 100 in code)

Expected: Converting a long HGVS dup to variant coordinates then back again will make a dup
Actual: A long dup is converted to a delins:

from pyhgvs import parse_hgvs_name, variant_to_hgvs_name

g_hgvs_str = "NC_000001.10:g.235611675_235611994dup"
c_hgvs_str = "NM_003193.4(TBCE):c.1411_1501dup"


chrom, offset, ref, alt = parse_hgvs_name(g_hgvs_str, f, None)
g_hgvs_name = variant_to_hgvs_name(chrom, offset, ref, alt, f, None)

print(f"{g_hgvs_str=} => {g_hgvs_name=}")

chrom, offset, ref, alt = parse_hgvs_name(c_hgvs_str, f, transcript)
c_hgvs_name = variant_to_hgvs_name(chrom, offset, ref, alt, f, transcript)

print(f"{c_hgvs_str=} => {c_hgvs_name=}")

Output:

g_hgvs_str='NC_000001.10:g.235611675_235611994dup' => g_hgvs_name=HGVSName('g.235611773_235611774ins320')
c_hgvs_str='NM_003193.4(TBCE):c.1411_1501dup' => c_hgvs_name=HGVSName('NM_003193.4(TBCE):c.1491+18_1491+19ins320')

This is because hgvs_justify_indel only looks at a hardcoded 100 bases around the indel.

If you change the code to:

    size = max(len(ref), len(alt)) + 1
    start = max(offset - size, 0)
    end = offset + size

It keeps the dup:

g_hgvs_str='NC_000001.10:g.235611675_235611994dup' => g_hgvs_name=HGVSName('g.235611675_235611994dup320')
c_hgvs_str='NM_003193.4(TBCE):c.1411_1501dup' => c_hgvs_name=HGVSName('NM_003193.4(TBCE):c.1411_1501dup320')

Rename repository to pyhgvs?

This package was renamed from hgvs to pyhgvs a while ago, but the GitHub url still uses hgvs. Switching is actually pretty low-cost, since GH sets up redirects from the old name to the new name, so old links don't break. Even git pull/push still works (I've done this with a few repositories in the past).

how to get coordinate of "AB026906.1:c.40_42del" by hgvs code

i have used genes.refGene(#26 (comment)) and hg19.fa

genes.refGene does not have "AB026906.1" transcript

Error :
Traceback (most recent call last):
  File "first_py.py", line 38, in <module>
    hgvs_name, genome, get_transcript=get_transcript)
  File "build/bdist.linux-x86_64/egg/pyhgvs/__init__.py", line 1356, in parse_hgvs_name
ValueError: transcript is required

HGVS / genome coordinate conversion does not account for cDNA alignment gaps

RefSeq transcript sequences can be different from the reference sequence (even if they agree with 1 build they can be different across builds). These sequences are aligned against the genome to produce exon coordinates in GFF releases.

This alignment can sometimes produce insertions / deletions (5-10% of transcripts), eg in the GFF file there is a “cDNA match” string that records the alignment, and has a “Gap” entry:

NC_000002.12    RefSeq  cDNA_match      73385758        73386192        431.411 +       .       ID=daa36283c6058f57b6347eb074291b21;Target=NM_015120.4 1 438 +;assembly_bases_aln=5003;assembly_bases_seq=5003;consensus_splices=44;exon_identity=0.999768;for_remapping=2;gap_count=1;identity=0.999768;idty=0.993151;matches=12925;num_ident=12925;num_mismatch=0;pct_coverage=99.9768;pct_coverage_hiqual=99.9768;pct_identity_gap=99.9768;pct_identity_ungap=100;product_coverage=1;rank=1;splices=44;weighted_identity=0.999771;Gap=M185 I3 M250

NM_015120.4 has cDNA_match Gap=M185 I3 M250, meaning 185 bases matched, then 3 bases were inserted, then matching resumed for 250 bases. You can see how this affects PyHGVS conversion downstream from the gaps:

2:73385942 A>T: NM_015120.4(ALMS1):c.74A>T (correct)
2:73385943 A>T: NM_015120.4(ALMS1):c.75A>T (off by 3, VEP gives NM_015120.4:c.78A>T)
2:73385944 G>C: NM_015120.4(ALMS1):c.76G>C (off by 3, VEP gives NM_015120.4:c.79G>C)
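The off-by-3 shift can be reproduced by walking the Gap string. The sketch below handles a single plus-strand cDNA_match block and returns 1-based transcript positions (not c. coordinates); the function `genomic_to_transcript_pos` is hypothetical and is not part of PyHGVS.

```python
import re


def genomic_to_transcript_pos(genomic_pos, block_start, target_start, gap):
    """Map a genomic position inside one plus-strand cDNA_match block to a
    1-based transcript position, following the GFF3 Gap operations."""
    if genomic_pos < block_start:
        raise ValueError("position lies before the aligned block")
    genomic_cursor = block_start  # next genomic base not yet consumed
    target_cursor = target_start  # next transcript base not yet consumed
    for op, length in re.findall(r"([MID])(\d+)", gap):
        length = int(length)
        if op == "M":
            if genomic_pos < genomic_cursor + length:
                return target_cursor + (genomic_pos - genomic_cursor)
            genomic_cursor += length
            target_cursor += length
        elif op == "I":
            # Bases present in the transcript but absent from the genome.
            target_cursor += length
        else:
            # "D": bases present in the genome but absent from the transcript.
            if genomic_pos < genomic_cursor + length:
                return None
            genomic_cursor += length
    raise ValueError("position lies after the aligned block")


# Values taken from the cDNA_match line above (Target=NM_015120.4 1 438 +).
before_gap = genomic_to_transcript_pos(73385942, 73385758, 1, "M185 I3 M250")
after_gap = genomic_to_transcript_pos(73385943, 73385758, 1, "M185 I3 M250")
print(before_gap, after_gap)  # 185 189
```

Two adjacent genomic bases land four transcript positions apart, which matches the c.74 and c.78 names reported by VEP above.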

Running UTA locally

Hi, I would prefer to run UTA locally, and I have downloaded and installed Docker and the PostgreSQL Docker image. But Docker is quite new to me, and I am not sure how to run the database. Could you help me with this? Thanks

Announcing cdot - a way to load lots of transcripts fast

I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS

https://github.com/SACGF/cdot

You can either download a JSON.gz file, or use a REST service. To use it:

import pyhgvs
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"])  # Uses local JSON file
pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)

README.md (import hgvs error)

I noticed that the code in the README didn't work for me; it looks like this was resolved in the examples1.py file. In the second line, use

import pyhgvs.utils as hgvs_utils
instead of
import hgvs.utils as hgvs_utils

single base pair insertion name comes up as slightly off

Getting a systematic issue:
Every cdna name from vcf records is correct except for single base pair insertion.

Should be               Getting
CFTR:c.1006_1007insG    CFTR:c.1007insG
CFTR:c.1029_1030insG    CFTR:c.1030insG
CFTR:c.1660_1661insA    CFTR:c.1661insA
CFTR:c.3883_3884insG    CFTR:c.3884insG

So it's close, but it doesn't get the first coordinate. Multi-bp insertions are correct. Any idea why there is a difference?
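For reference, the expected form names both bases flanking the insertion site. A minimal sketch for exonic positions only, using a hypothetical helper `format_insertion` that is not part of the library:

```python
def format_insertion(prefix, left_flank, inserted):
    """Name an insertion by the two positions that flank the inserted bases."""
    right_flank = left_flank + 1
    return "{0}{1}_{2}ins{3}".format(prefix, left_flank, right_flank, inserted)


name = format_insertion("CFTR:c.", 1006, "G")
print(name)  # CFTR:c.1006_1007insG
```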

Need updated version of genes.refGene

Hi, I have some variants in HGVS format that use the NM_004364.4 transcript.

This transcript is not in the pyhgvs/data/genes.refGene file.

Can you please tell me how I can get an updated file or add this transcript to the file?

Thank you

Regards

Issue with installing in Ubuntu

I seem to have an issue installing HGVS when running "python setup.py install" I encounter the following:

Traceback (most recent call last):
  File "setup.py", line 35, in <module>
    main()
  File "setup.py", line 30, in main
    parse_requirements('requirements-dev.txt')],
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1200, in parse_requirements
    skip_regex = options.skip_requirements_regex
AttributeError: 'NoneType' object has no attribute 'skip_requirements_regex'

Python3 version

Any chance of a Python 3 version? This library seems much better than biocommons hgvs, since the latter requires a connection to UTA resources.

License

Thanks for sharing this very useful library!

Would you mind adding a license for this software?

how to Check whether p.val meets the HGVS specification

I'm trying to localize all the variants in CIVIC.

But I'm not sure whether some of the variants meet the HGVS standard.

This is an outstanding project, but I haven't seen an example of parsing protein-level variants in the README.

I would like to know whether the library can do this, and I'd be grateful for any other suggestions.

Unable to parse a HGVS variant in format that VEP accepts

pyhgvs.InvalidHGVSName: Invalid HGVS cDNA allele "3252delC+3263insC"

VEP's web interface was able to translate that just fine, so I'm assuming that is the correct HGVS format. I gave it the variant as such:

ENST00000333535:c.3252delC+3263insC

a format which worked for all of my other variants. Just a PSA unless there is some older/newer format version for this kind of variant of which I am unaware.

how to create or find "genes.refGene" file for hg19 and hg38

How can I create or find a "genes.refGene" file for hg19 and hg38?
I got "genes.refGene" files from UCSC, but they are not working in my case.

The error shown is:

Traceback (most recent call last):
  File "first_py.py", line 38, in <module>
    hgvs_name, genome, get_transcript=get_transcript)
  File "build/bdist.linux-x86_64/egg/pyhgvs/__init__.py", line 1356, in parse_hgvs_name
ValueError: transcript is required

get_transcripts()

I am running the sample script from GitHUB but using my local version of refGene and Human Genome reference.

import pyhgvs as hgvs
import pyhgvs.utils as hgvs_utils
from pygr.seqdb import SequenceFileDB

genome = SequenceFileDB('hs37d5.fa')

with open('sorted.curated_geneTrack_wo_chr_sorted.refgene') as infile:
    transcripts = hgvs_utils.read_transcripts(infile)

def get_transcript(name):
    return transcripts.get(name)

chrom, offset, ref, alt = hgvs.parse_hgvs_name('NM_000352.3:c.215A>G', genome, get_transcript=get_transcript)
print(chrom, offset, ref, alt)

I am encountering this error:

  File "hgvs-convert.py", line 34, in <module>
    chrom, offset, ref, alt = hgvs.parse_hgvs_name('NM_000352.3:c.215A>G', genome, get_transcript=get_transcript)
  File "build/bdist.linux-x86_64/egg/pyhgvs/__init__.py", line 1356, in parse_hgvs_name
ValueError: transcript is required
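This error appears when the get_transcript callback returns None for the requested name. One possible cause, stated here as an assumption and not confirmed in this thread, is that a locally built refGene file keys transcripts without a version suffix (NM_000352 and not NM_000352.3). A sketch of a callback that falls back to the unversioned name follows; `make_get_transcript` is a hypothetical helper, and the dict stands in for the result of read_transcripts.

```python
def make_get_transcript(transcripts):
    """Build a lookup callback that retries without the version suffix."""
    def get_transcript(name):
        transcript = transcripts.get(name)
        if transcript is None and "." in name:
            # Retry with the accession only, e.g. "NM_000352.3" -> "NM_000352".
            transcript = transcripts.get(name.split(".", 1)[0])
        return transcript
    return get_transcript


# Stand-in for the dict returned by read_transcripts (assumed unversioned keys).
transcripts = {"NM_000352": "transcript-object"}
get_transcript = make_get_transcript(transcripts)
print(get_transcript("NM_000352.3"))  # transcript-object
print(get_transcript("NM_999999.1"))  # None
```

Ignoring the version can give wrong coordinates if the transcript model changed between versions, so a lookup like this is best treated as a diagnostic aid.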
