
hgvs's Introduction

hgvs - manipulate biological sequence variants according to Human Genome Variation Society recommendations

Important: biocommons packages require Python 3.8+.

The hgvs package provides a Python library to parse, format, validate, normalize, and map sequence variants according to Variation Nomenclature (aka Human Genome Variation Society) recommendations.

Specifically, the hgvs package focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package does not attempt to cover the full scope of HGVS recommendations. Please refer to issues for limitations.

Features

  • Parsing is based on formal grammar.
  • An easy-to-use object model that represents most variant types (SNVs, indels, dups, inversions, etc.) and concepts (intronic offsets, uncertain positions, intervals)
  • A variant normalizer that rewrites variants in canonical forms and substitutes reference sequences (if reference and transcript sequences differ)
  • Formatters that generate HGVS strings from internal representations
  • Tools to map variants between genome, transcript, and protein sequences
  • Reliable handling of regions with genome-transcript discrepancies
  • Pluggable data providers support alternative sources of transcript mapping data
  • Extensive automated tests, including those for all variant types and "problematic" transcripts
  • Easily installed using remote data sources. Installation with local data sources is straightforward and completely obviates network access

Important Notes

  • You are encouraged to browse issues. All known issues are listed there. Please report any issues you find.
  • Use a pip package specification to stay within minor releases. For example, hgvs>=1.5,<1.6. hgvs uses Semantic Versioning.

Installing HGVS Locally

Important: For more detailed installation and configuration instructions, see the HGVS readthedocs

Prerequisites

libpq
python3
postgresql

Examples for installation:

macOS:

brew install libpq
brew install python3
brew install postgresql@14

Ubuntu:

sudo apt install gcc libpq-dev python3-dev

Installation Steps

By default, hgvs uses remote data sources, which makes installation easy. If you would like to use local instances of the data sources, see the readthedocs.

  1. Create a virtual environment using your preferred method.

    Example:

     python3 -m venv venv
    
  2. Run the following commands in your virtual environment:

     source venv/bin/activate
     pip install --upgrade setuptools
     pip install hgvs
    

See Installation instructions for details, including instructions for installing Universal Transcript Archive (UTA) and SeqRepo locally.

Examples and Usage

See examples and readthedocs for usage.

Contributing

The hgvs package is intended to be a community project. Please see Contributing to get started in submitting source code, tests, or documentation. Thanks for getting involved!

See Also

Other packages that manipulate HGVS variants:

hgvs's Issues

add more tests for grammar rules

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


This ticket has two goals:

  • ensure that we can parse a set of valid variant strings that we have chosen to support
  • ensure that each of the grammar rules has a reasonable set of tests.

Think code coverage for the hgvs grammar.

tests/data/gauntlet contains an incomplete list of test variants that provides a starting point for testing. We don't currently support all of these. It would be good to keep a list of variants that we should parse but don't (yet).

One possibility is to name a test class for the rule being tested, with a method for each specific test. Docstrings will help here to identify what we're trying to test.

We can then write a script to pull out rule names and match them against tests to look for gaps.

This approach is just a suggestion -- other ideas are welcome!

Links

  • imported from: CORE-60 (Invitae access required)

hgvs - stringifying a delins with a single AA insertion drops the delins

Originally reported by: Rudolph Rico (Bitbucket: rrico, GitHub: rrico)


If an hgvsp tag with a delins is parsed, the delins is dropped in the case where the insertion is only 1 AA long. (2+ are OK.)

See the following iPython excerpt:

In [36]: foo = 'NP_12345.1:p.Ala10_Ile12delinsGly'
In [38]: var_p = p.parse_hgvs_variant(foo)
In [39]: str(var_p)
Out[39]: 'NP_12345.1:p.Ala10_Ile12Gly'

In [47]: foo2 = 'NP_12345.1:p.Ala10_Ile12delinsGlyArg'
In [48]: var_p2 = p.parse_hgvs_variant(foo2)
In [49]: str(var_p2)
Out[49]: 'NP_12345.1:p.Ala10_Ile12delinsGlyArg'
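
A round-trip check along these lines could catch this class of bug. The sketch below is an illustration only: `fake_parse` is a stand-in that merely mimics the behavior reported above, and in real use the `parse` argument would presumably be the parser's `parse_hgvs_variant` method shown in the excerpt.

```python
def roundtrip_failures(parse, variants):
    """Return (input, output) pairs where str(parse(s)) differs from s."""
    failures = []
    for s in variants:
        out = str(parse(s))
        if out != s:
            failures.append((s, out))
    return failures


def fake_parse(s):
    """Stand-in parser (not hgvs code) that reproduces the reported symptom."""
    head, sep, inserted = s.partition("delins")
    if sep and len(inserted) == 3:  # exactly one three-letter amino acid
        return head + inserted      # the 'delins' keyword is lost
    return s


failures = roundtrip_failures(fake_parse, [
    "NP_12345.1:p.Ala10_Ile12delinsGly",
    "NP_12345.1:p.Ala10_Ile12delinsGlyArg",
])
```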

Links

  • imported from: CORE-92 (Invitae access required)

implement VariantValidator

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Three levels of validation:

  • syntactic -- does it parse?
  • intrinsic -- is it internally consistent?
  • extrinsic -- does it validate against external data?

Syntactic validation comes with parsing. This issue should create a class that provides intrinsic and extrinsic validation. Examples (brainstorm for more):

I: end>=start (careful with offsets)
I: range for ins is length 1
I: length of del = length of range

E: is the ac valid?
E: if the variant includes ref seq, does it agree with the sequence?
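
As a rough sketch of the intrinsic checks listed above: `SimpleVariant` is a hypothetical, simplified stand-in (integer positions, no intronic offsets), not the hgvs object model. The "ins" rule is interpreted here as "the range spans two adjacent positions", which is an assumption about what "length 1" means.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleVariant:
    """Hypothetical simplified variant; offsets are deliberately ignored."""
    start: int
    end: int
    edit_type: str              # e.g. "del", "ins", "sub"
    ref: Optional[str] = None   # deleted sequence, if the variant states it


def intrinsic_errors(v: SimpleVariant) -> List[str]:
    """Return a list of internal-consistency problems (empty if none)."""
    errors = []
    if v.end < v.start:
        errors.append("end < start")
    if v.edit_type == "ins" and v.end - v.start != 1:
        errors.append("ins must span two adjacent positions")
    if (v.edit_type == "del" and v.ref is not None
            and len(v.ref) != v.end - v.start + 1):
        errors.append("del length != range length")
    return errors
```

Extrinsic checks (valid accession, reference agreement) would need a data provider and are left out of this sketch.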

Links

  • imported from: CORE-39 (Invitae access required)

build self-contained test set with sqlite database for hgvs

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Build a test set from ClinVar/Clinvitae/dbSNP that covers a broad set of variant types and a small number of genes.

Using those selections, build a small sqlite database subset for genes+transcripts that is sufficient for testing, and commit it with the repo. The goal is to have broad, self-contained testing.
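
The subsetting step might look something like the following sketch. The `transcript` table, its columns, and the gene and accession names are all made up for illustration; the real source schema would differ.

```python
import sqlite3

SCHEMA = "CREATE TABLE transcript (ac TEXT PRIMARY KEY, gene TEXT, seq TEXT)"


def build_subset(src, dest, genes):
    """Copy transcript rows for the selected genes from src into dest."""
    if not genes:
        raise ValueError("at least one gene is required")
    dest.execute(SCHEMA)
    placeholders = ",".join("?" for _ in genes)
    rows = src.execute(
        "SELECT ac, gene, seq FROM transcript "
        f"WHERE gene IN ({placeholders}) ORDER BY ac",
        list(genes),
    ).fetchall()
    dest.executemany("INSERT INTO transcript VALUES (?, ?, ?)", rows)
    dest.commit()
    return len(rows)


# Demo with in-memory databases and placeholder data.
src = sqlite3.connect(":memory:")
src.execute(SCHEMA)
src.executemany("INSERT INTO transcript VALUES (?, ?, ?)", [
    ("NM_000001.1", "GENE_A", "ATGGCC"),
    ("NM_000002.1", "GENE_B", "ATGTTT"),
    ("NM_000003.1", "GENE_C", "ATGAAA"),
])
dest = sqlite3.connect(":memory:")
copied = build_subset(src, dest, ["GENE_A", "GENE_C"])
```

For a file committed with the repo, `dest` would be opened on a path instead of ":memory:".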

Links

  • imported from: CORE-40 (Invitae access required)

biopython warns during sequence translation when length mod 3 != 0

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Please squash this warning:

/home/reece/.virtualenvs/default-2.7/local/lib/python2.7/site-packages/Bio/Seq.py:1971: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
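
The warning text itself suggests trimming the sequence explicitly. A minimal sketch of that option, using plain strings and a hypothetical helper name (the sequence would be trimmed before being handed to the translation call):

```python
def trim_to_codons(seq):
    """Drop trailing bases so that len(seq) is a multiple of three."""
    return seq[: len(seq) - len(seq) % 3]


trimmed = trim_to_codons("ATGGC")  # two trailing bases form a partial codon
```

Whether trimming or padding with trailing N is the right choice depends on what the caller expects for the final partial codon.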

Links

  • imported from: CORE-55 (Invitae access required)

hgvs can't parse p.Met1?

Originally reported by: Rudolph Rico (Bitbucket: rrico, GitHub: rrico)


This may fall under the category of not supporting '?' but I'm calling this out since it's a common case (at least in the cvids). You can create variants of the form "Met1?" but you can't actually parse them.

p = hgvs.parser.Parser()
p.parse_hgvs_variant('NP_999999.1:p.Met1?')
ParseError: 
NP_999999.1:p.Met1?
                  ^
Parse error at line 1, column 18: expected one of 'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Ter', 'Thr', 'Trp', 'Tyr', 'Val', '_', 'del', 'delins', 'dup', 'ins', or a digit. trail: [pro_dup pro_edit p_posedit p_variant hgvs_variant]

Links

  • imported from: CORE-89 (Invitae access required)
  • parent task: issue #90

hgvs parser: needs to parse p.0/p.=/p.?

Originally reported by: Rudolph Rico (Bitbucket: rrico, GitHub: rrico)


Parsing p.0 or p.? or p.= fails as follows. (extra space between colon and p to try to avoid the automatic emoticon.)

e.g.
p.parse_hgvs_variant('NP_123456: p.=')

Parse error at line 1, column 12: expected one of '(', 'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', or 'Val'. trail: [aa1 aa13 def_p_pos p_pos def_p_interval p_interval p_posedit p_variant hgvs_variant]
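
To make the target inputs concrete, here is a small regex-based sketch that recognizes the special protein forms mentioned in these tickets (p.0, p.=, p.?, and p.Met1?). It is only an illustration of which strings should be accepted; the package parses with a formal grammar, and the accession pattern used here is a simplifying assumption.

```python
import re

# Assumed accession shape: two capitals, underscore, digits, optional version.
SPECIAL_P = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+(?:\.\d+)?):p\.(?P<edit>0|=|\?|Met1\?)$"
)


def parse_special_p(s):
    """Return (accession, edit) for the special forms, else None."""
    m = SPECIAL_P.match(s)
    return (m.group("ac"), m.group("edit")) if m else None
```

Strings such as 'NP_123456:p.=' and 'NP_999999.1:p.Met1?' are accepted; ordinary edits fall through to None and would be left to the regular parser.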

Links

  • imported from: CORE-66 (Invitae access required)

Update docs

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


  • -fix doc errors (make docs)- --done
  • -fix class references- --done
  • -add link to build status graphic- --done
  • -update docs to use pip install hgvs, bdi, uta- --done
  • -remove reST- --done
  • -ext.todos- -- abandoned
  • -use rtd theme- -- decided I didn't like rtd theme as much
  • -put license in doc- --done
  • -restructure docs: toc, overview, description, example, getting started, reference, license,contact,more info,issues-

Links

  • imported from: CORE-97 (Invitae access required)

implement HGVS normalization

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


The first goal for this issue is to identify what sorts of normalization we need to implement.

Examples:

  • left shuffling
  • right shuffling
  • variant type rewriting, e.g., ins -> dup
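
The three examples above can be sketched on plain strings. This is a simplified illustration with 0-based, half-open coordinates and hypothetical function names, not the package's normalizer; a deletion of seq[start:end] is shifted while the resulting sequence stays the same.

```python
def shuffle_right(seq, start, end):
    """Shift the deleted interval seq[start:end] as far right as possible."""
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


def shuffle_left(seq, start, end):
    """Shift the deleted interval seq[start:end] as far left as possible."""
    while start > 0 and seq[start - 1] == seq[end - 1]:
        start -= 1
        end -= 1
    return start, end


def is_dup(seq, pos, inserted):
    """True if inserting `inserted` before index pos repeats the bases just before it."""
    n = len(inserted)
    return pos >= n and seq[pos - n:pos] == inserted


seq = "GCACACAT"
right = shuffle_right(seq, 1, 3)   # deleting the first "CA" of the repeat
left = shuffle_left(seq, *right)
```

Each shift preserves the edited sequence, so all positions between `left` and `right` describe the same deletion.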

Links:


classify (or shorten) long-running hgvs tests

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Testing times have ballooned in the hgvs package, and it's making me test less frequently. Please shorten long-running tests to representative subsets.

TestHgvsCToPSubsetCvidsPlus and TestHgvsCToPAllCvids are particularly sluggish.

If you want to keep the long-running flavors for additional testing that does NOT run by default, that's fine.
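
One possible way to keep exhaustive tests out of the default run is an opt-in environment variable with unittest's skip decorator. The variable name and the test classes below are hypothetical placeholders, not the existing test suite.

```python
import os
import unittest

# Hypothetical opt-in switch; exhaustive tests run only when it is set to "1".
RUN_SLOW = os.environ.get("HGVS_RUN_SLOW_TESTS") == "1"


class TestQuickSubset(unittest.TestCase):
    def test_representative_cases(self):
        """A small representative subset that always runs."""
        self.assertEqual(len("ATG") % 3, 0)


@unittest.skipUnless(RUN_SLOW, "set HGVS_RUN_SLOW_TESTS=1 to run exhaustive tests")
class TestExhaustive(unittest.TestCase):
    def test_all_cases(self):
        """Placeholder for the long-running, exhaustive checks."""
        self.assertTrue(True)
```

With this layout the default run reports the exhaustive class as skipped rather than silently omitting it.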

Alternatively, I think the real reason to run those exhaustive tests is to satisfy Emily, not to check the build, and that use would probably be better served by ./sbin/test-runner (or similar). (It's not a good name, admittedly.)

Thanks!

Links

  • imported from: CORE-95 (Invitae access required)

update parser and variant representation

Originally reported by: Rudolph Rico (Bitbucket: rrico, GitHub: rrico)


Lumping these into one issue for now - can split into smaller pieces later if desired.

If you run all the cvid data through parser.Parser().parse_hgvs_variant, the following categories all fail to parse:

Met1? (already in a separate Jira)
ins
delins
dup
ext

These can all be created via the converter; it's the act of parsing the string that's failing.

Links

  • imported from: CORE-90 (Invitae access required)
  • subtasks: issue #94, issue #91, issue #89, issue #65

hgvsc to hgvsp: handle case where deletion crosses utr/cds boundary

Originally reported by: Rudolph Rico (Bitbucket: rrico, GitHub: rrico)


Edge case for converter; if a deletion crosses the following:
5'utr-cds
cds-3'utr

the code currently reports this as a p.?.

In the former case, this should be Met1?
In the latter case, this should be a fs/ext.

e.g.

NM_130839.2:c.2616_*11del15
NP_570854.1:p.*873Glnext*39   expected
NP_570854.1:p.?   actual
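
The boundary test described above could be sketched as follows. The function name, labels, and coordinate convention (1-based, inclusive transcript positions) are assumptions for illustration; the converter's real logic is not shown in this ticket.

```python
def classify_deletion(del_start, del_end, cds_start, cds_end):
    """Say which UTR/CDS boundary, if any, a deletion crosses.

    All positions are 1-based, inclusive transcript coordinates.
    """
    if del_start < cds_start <= del_end:
        return "5utr-cds"   # start codon affected; ticket expects Met1?
    if del_start <= cds_end < del_end:
        return "cds-3utr"   # stop codon affected; ticket expects fs/ext
    return None             # no boundary crossed
```

A deletion that swallows the whole CDS satisfies both conditions; this sketch reports the 5' case first.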

Links

  • imported from: CORE-64 (Invitae access required)

hgvs stringifies some hgvsp "*" as "Ter"

Originally reported by: Rudolph Rico (Bitbucket: rrico, GitHub: rrico)


tested version: bitbucket checkin 9edea6b

There are at least some instances where the string representation of an hgvsp tag is still "Ter" instead of "*"
e.g.
NP_006149.2:p.Glu140Ter
NP_000194.2:p.Ter654Argext*51 (note - ext aren't yet supported in the grammar, but something to keep in mind when they are)

e.g.
In [1]: import hgvs.parser
In [2]: p = hgvs.parser.Parser()
In [3]: hgvsc = 'NP_006149.2:p.Glu140*'
In [4]: hgvsp = 'NP_006149.2:p.Glu140*'
In [5]: p.parse_hgvs_variant(hgvsp)
Out[5]: Variant(ac=NP_006149.2, type=p, posedit=Glu140Ter)
In [6]: var_p = p.parse_hgvs_variant(hgvsp)
In [7]: str(var_p)
Out[7]: 'NP_006149.2:p.Glu140Ter'

Links

  • imported from: CORE-63 (Invitae access required)
