ncov-recombinant's Issues

Identify parent lineages with LAPIS cov-spectrum

So I have a new idea for identifying the parent lineage.

England/MILK-3796834/2022 is an XM recombinant, with regions predicted by sc2rf:

  • 44:17410|Omicron/21K
  • 21618:29510|Omicron/21L

From nextclade, the mutations by region are as follows:

  • 44:17410
    • C241T,C2470T,A2832G,C3037T,T5386G,G8393A,C10029T,C10449A,A11537G,C12513T,T13195C,C14408T,C15240T
  • 21618:29510
    • C21618T,G21987A,T22200G,G22578A,C22674T,T22679C,C22686T,A22688G,G22775A,A22786C,G22813T,T22882G,G22992A,C22995A,A23013C,A23040G,A23055G,A23063T,T23075C,A23403G,C23525T,T23599G,C23604A,C23854A,G23948T,A24424T,T24469A,C25000T,C25416T,C25584T,C26060T,C26270T,C26577G,G26709A,C26858T,A27259C,G27382C,A27383T,T27384C,C27807T,A28271T,C28311T,G28487A,G28881A,G28882A,G28883C,A29510C

And if we query these mutations in cov-spectrum with LAPIS...

Parent 1 | 44:17410

Parent 1 is most likely BA.1.1.10 (72%, 647/894).

https://lapis.cov-spectrum.org/open/v1/sample/aggregated?fields=pangoLineage&nucMutations=C241T,C2470T,A2832G,C3037T,T5386G,G8393A,C10029T,C10449A,A11537G,C12513T,T13195C,C14408T,C15240T

{
  "errors":[],
  "info": {
    "apiVersion":1, 
    "dataVersion":1656461191,
    "deprecationDate":null,
    "deprecationInfo":null,
    "acknowledgement":null
  },
  "data":[
    {"pangoLineage":"B.1.1","count":1},
    {"pangoLineage":"BA.1.1.18","count":30},
    {"pangoLineage":"BA.1.1.12","count":3},
    {"pangoLineage":"BA.1.1.10","count":647},
    {"pangoLineage":"BA.1.1","count":186},
    {"pangoLineage":"BA.1","count":27}
  ]
}
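
The query and the percentage above can be reproduced in a few lines. This is a minimal sketch, not part of the pipeline: the function names are hypothetical, the endpoint and parameters are copied from the URL above, and no request is actually sent (the response data is pasted in from the JSON above).

```python
from urllib.parse import urlencode

# Endpoint copied from the query URL above.
LAPIS_AGGREGATED = "https://lapis.cov-spectrum.org/open/v1/sample/aggregated"


def build_lapis_query(mutations):
    """Build the aggregated-by-lineage query URL for a list of nucleotide mutations."""
    params = {
        "fields": "pangoLineage",
        "nucMutations": ",".join(mutations),
    }
    # Keep commas unescaped so the URL has the same form as the one above.
    return LAPIS_AGGREGATED + "?" + urlencode(params, safe=",")


def lineage_proportions(response):
    """Return (lineage, count, proportion) rows from a response, largest count first."""
    rows = response["data"]
    total = sum(row["count"] for row in rows)
    ranked = sorted(rows, key=lambda row: row["count"], reverse=True)
    return [(row["pangoLineage"], row["count"], row["count"] / total) for row in ranked]


# The "data" list of the Parent 1 response shown above.
parent_1 = {
    "data": [
        {"pangoLineage": "B.1.1", "count": 1},
        {"pangoLineage": "BA.1.1.18", "count": 30},
        {"pangoLineage": "BA.1.1.12", "count": 3},
        {"pangoLineage": "BA.1.1.10", "count": 647},
        {"pangoLineage": "BA.1.1", "count": 186},
        {"pangoLineage": "BA.1", "count": 27},
    ]
}

for lineage, count, proportion in lineage_proportions(parent_1):
    print(f"{lineage}\t{count}\t{proportion:.1%}")
```

Fetching the URL returned by `build_lapis_query` with any HTTP client should give a response of the same shape, assuming the endpoint is still available.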

Parent 2 | 21618:29510

Parent 2 is most likely BA.2 (83%, 38/46). However, the only runner-up is BA.2.12.1 (17%, 8/46), which falls within BA.2.

https://lapis.cov-spectrum.org/open/v1/sample/aggregated?fields=pangoLineage&nucMutations=C21618T,G21987A,T22200G,G22578A,C22674T,T22679C,C22686T,A22688G,G22775A,A22786C,G22813T,T22882G,G22992A,C22995A,A23013C,A23040G,A23055G,A23063T,T23075C,A23403G,C23525T,T23599G,C23604A,C23854A,G23948T,A24424T,T24469A,C25000T,C25416T,C25584T,C26060T,C26270T,C26577G,G26709A,C26858T,A27259C,G27382C,A27383T,T27384C,C27807T,A28271T,C28311T,G28487A,G28881A,G28882A,G28883C,A29510C

{
  "errors": [],
  "info":{
    "apiVersion":1,
    "dataVersion":1656461191,
    "deprecationDate":null,
    "deprecationInfo":null,
    "acknowledgement":null
  },
  "data":[
    {"pangoLineage":"BA.2","count":38},
    {"pangoLineage":"BA.2.12.1","count":8}
  ]
}

Resolving

There are a few options to resolve the proportions:

  • Exclude lineages by a hard cut-off (<1%, <10%, etc.)
  • Take the highest proportion lineage.
  • Consider lineages in descending order, and report a lineage if it is a sub-lineage of the one with the highest proportion.

Parent 1 (44:17410):

Lineage   | Count | Proportion | Note
BA.1.1.10 | 647   | 72%        | Report
BA.1.1    | 186   | 21%        | Not sub-lineage
BA.1.1.18 | 30    | 3%         | Not sub-lineage
BA.1      | 27    | 3%         | Not sub-lineage
BA.1.1.12 | 3     | <1%        | Exclude
B.1.1     | 1     | <1%        | Exclude

Parent 2 (21618:29510):

Lineage   | Count | Proportion | Note
BA.2      | 38    | 83%        | Ignore, has sub-lineage
BA.2.12.1 | 8     | 17%        | Report, is sub-lineage
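
One way to combine the hard cut-off with the sub-lineage rule is sketched below. This is only an illustration of the idea, not the pipeline's implementation: `resolve_parent` and `is_sublineage` are hypothetical names, the 1% cut-off is taken from the "Exclude" rows above, and the sub-lineage test is a naive name-prefix check that does not expand Pango aliases (for example BA to B.1.1.529).

```python
def is_sublineage(candidate, parent):
    """Naive check: 'BA.2.12.1' descends from 'BA.2' because it extends its name.

    Assumption: Pango aliases are NOT expanded, so this only works within one alias.
    """
    return candidate.startswith(parent + ".")


def resolve_parent(counts, min_proportion=0.01):
    """Pick one lineage from a {lineage: count} mapping.

    1. Drop lineages below the hard cut-off.
    2. Start from the lineage with the highest proportion.
    3. Walk the rest in descending order and switch to a lineage
       whenever it is a sub-lineage of the one currently reported.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no lineage counts to resolve")
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    kept = [lineage for lineage, count in ranked if count / total >= min_proportion]
    if not kept:
        return None  # every lineage fell below the cut-off
    reported = kept[0]
    for lineage in kept[1:]:
        if is_sublineage(lineage, reported):
            reported = lineage
    return reported


# Counts copied from the two LAPIS responses above.
parent_1 = {"B.1.1": 1, "BA.1.1.18": 30, "BA.1.1.12": 3,
            "BA.1.1.10": 647, "BA.1.1": 186, "BA.1": 27}
parent_2 = {"BA.2": 38, "BA.2.12.1": 8}

print(resolve_parent(parent_1))  # BA.1.1.10
print(resolve_parent(parent_2))  # BA.2.12.1
```

Both results match the "Report" rows above. A real implementation would likely need a proper lineage hierarchy instead of string prefixes.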

Relax Nextclade filtering

  • Samples are being excluded because they have no labelled private mutations.
  • Experiment with keeping samples that also have privateNucMutations.unlabeledSubstitutions or privateNucMutations.reversionSubstitutions
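
The relaxed filter could look roughly like the sketch below. It assumes the Nextclade TSV output with a `seqName` column and a `privateNucMutations.labeledSubstitutions` column alongside the two columns named above; the function names and the toy rows are hypothetical and only illustrate the difference between the strict and relaxed rules.

```python
import csv
import io

# Column names assumed from the Nextclade TSV output.
PRIVATE_COLUMNS = [
    "privateNucMutations.labeledSubstitutions",
    "privateNucMutations.unlabeledSubstitutions",
    "privateNucMutations.reversionSubstitutions",
]


def keep_sample(row, columns=PRIVATE_COLUMNS):
    """Keep a sample if any of the given private-mutation columns is non-empty."""
    return any((row.get(column) or "").strip() for column in columns)


def filter_nextclade(tsv_text, columns=PRIVATE_COLUMNS):
    """Return the seqName of every row that passes keep_sample."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["seqName"] for row in reader if keep_sample(row, columns)]


# Toy table (made-up values): A has a labelled mutation, B only an unlabelled one, C none.
header = ["seqName"] + PRIVATE_COLUMNS
rows = [
    ["sample_A", "C241T", "", ""],
    ["sample_B", "", "A2832G", ""],
    ["sample_C", "", "", ""],
]
toy_tsv = "\n".join("\t".join(fields) for fields in [header] + rows)

strict = filter_nextclade(toy_tsv, columns=PRIVATE_COLUMNS[:1])  # labelled only
relaxed = filter_nextclade(toy_tsv)                              # any private mutation
print(strict)   # ['sample_A']
print(relaxed)  # ['sample_A', 'sample_B']
```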

Positive Controls

This is the list of implemented positive controls:

  • XA | Alpha recombinant (poor lineage accuracy)
  • XB | Conflicting designation issue
  • XC | Alpha recombinant (poor lineage accuracy)
  • XD
  • XE
  • XG
  • XH
  • XJ
  • XK | No public genomes
  • XL
  • XM
  • XN
  • XP
  • XQ
  • XR
  • XS
  • XT | No public genomes | Restricted to South Africa
  • XU | No public genomes | Restricted to India, Japan, Australia
  • XW | ...
  • XY | ...
  • XZ | ...
  • XAA | ...
  • XAB | ...
  • XAC | ...
  • XAD | ...
  • XAE | ...
  • XAF | ...
  • XAG | ...
  • XAH | ...
  • XAJ | ...
  • XAK | ...
  • XAL | ...
  • XAM | ...
  • XAN | ...
  • XAP | ...
  • XAQ | ...
  • XAR | ...
  • XAS | ...
  • XAT | ...
  • XAU | ...
  • XAV | ...
  • XAW | ...
  • XAY | ...
  • XAZ | ‼ Priority | Large Lineage

Identify parent lineages with RIPPLES

I'd like to give RIPPLES another try, now that I'm more proficient with the matUtils commands.

Steps:

  • 1. Create a text file of cluster ids. This will be the first observed sequence for each recombinant lineage.
  • 2. Run ripples on those cluster ids.
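
Step 1 could be sketched as follows. The record fields (`strain`, `lineage`, `date`) and the toy values are hypothetical, "first observed" is taken to mean the earliest collection date, and the ripples invocation itself is left out because its options are not described here.

```python
def first_observed(records):
    """Map each lineage to the strain with the earliest date.

    Assumes ISO dates (YYYY-MM-DD), which sort correctly as strings.
    """
    earliest = {}
    for record in records:
        lineage = record["lineage"]
        if lineage not in earliest or record["date"] < earliest[lineage]["date"]:
            earliest[lineage] = record
    return {lineage: record["strain"] for lineage, record in earliest.items()}


def cluster_ids_text(records):
    """One cluster id (strain name) per line, ordered by lineage name."""
    chosen = first_observed(records)
    return "\n".join(chosen[lineage] for lineage in sorted(chosen)) + "\n"


# Toy metadata (made-up strains and dates).
records = [
    {"strain": "strain_1", "lineage": "XM", "date": "2022-03-10"},
    {"strain": "strain_2", "lineage": "XM", "date": "2022-02-20"},
    {"strain": "strain_3", "lineage": "XE", "date": "2022-01-19"},
]

print(cluster_ids_text(records), end="")
```

The resulting text can be written to a file and passed to ripples as the list of samples to test.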

Rules with empty log files

The following rules currently produce empty log files. If checked, log content has been added:

  • nextclade
  • sc2rf_recombinants
  • faToVcf
  • usher_metadata
  • summary
  • plot?

Add versions to reporting

Recombinant lineages change extremely rapidly. Include program versions for:

  • ncov-recombinant
  • nextclade
  • sc2rf
  • usher

Include dataset versions for:

Upgrade UShER to v0.5.6

Something has changed in the implementation of UShER subtree extraction because an initial upgrade from v0.5.3 to v0.5.6 crashed the pipeline. But v0.5.6 has some interesting new options for subtrees, so I'll look into it!

Detect duplicate sequences in subtrees

  • I'd like the option to detect duplicate strains based on a matching column (ex. genbank_accession or gisaid_epi_isl).
  • And then label or remove them from the subtree JSON.
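
A rough sketch of the detection step is below. It works on plain metadata rows rather than on the subtree JSON itself, and the function names, the `strain` field, and the toy accessions are all hypothetical.

```python
from collections import defaultdict


def find_duplicates(records, column):
    """Group strain names by the value in `column`; return groups with more than one strain."""
    groups = defaultdict(list)
    for record in records:
        value = (record.get(column) or "").strip()
        if value:  # rows without an accession cannot be matched
            groups[value].append(record["strain"])
    return {value: strains for value, strains in groups.items() if len(strains) > 1}


def drop_duplicates(records, column):
    """Keep the first record for each value in `column`; keep all rows with no value."""
    seen = set()
    kept = []
    for record in records:
        value = (record.get(column) or "").strip()
        if value and value in seen:
            continue
        if value:
            seen.add(value)
        kept.append(record)
    return kept


# Toy metadata (made-up strains and accessions).
records = [
    {"strain": "strain_1", "genbank_accession": "ACC0001"},
    {"strain": "strain_1_resubmitted", "genbank_accession": "ACC0001"},
    {"strain": "strain_2", "genbank_accession": "ACC0002"},
    {"strain": "strain_3", "genbank_accession": ""},
]

print(find_duplicates(records, "genbank_accession"))
print([r["strain"] for r in drop_duplicates(records, "genbank_accession")])
```

Labelling instead of removing would only need the output of `find_duplicates`; which copy to keep is a design choice this sketch settles by taking the first one seen.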

Update tutorial strains

The tutorial strains have high levels of ambiguity, so maybe I should replace them with different sequences. On the other hand, they are good examples of how the pipeline handles this ambiguity.

  • XM-example-2: Ns around breakpoints
  • proposed467-example-2: Ns around breakpoints
  • miscBA1BA2Post17k-example-1: Ns around breakpoints and IUPAC ambiguity
  • 19955,20055 are common sites to be ambiguous.
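
To check a candidate replacement sequence for the kinds of ambiguity listed above, something like the following sketch could be used. The function name and the toy sequence are hypothetical; positions are 1-based, and gap characters are not counted.

```python
# N plus the IUPAC nucleotide ambiguity codes.
AMBIGUOUS = set("NRYSWKMBDHV")


def ambiguous_sites(sequence):
    """Return (position, base) for every N or IUPAC ambiguity code, 1-based."""
    return [
        (position, base)
        for position, base in enumerate(sequence.upper(), start=1)
        if base in AMBIGUOUS
    ]


# Toy sequence (made up), not a real genome.
toy = "ACGTNNRACGY"
print(ambiguous_sites(toy))  # [(5, 'N'), (6, 'N'), (7, 'R'), (11, 'Y')]
```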

Plot substitutions as ticks on breakpoints figure

This helps in interpreting lineages that have been split into multiple clusters (ex. XM).

[Figure: breakpoints_clade]

This helps in interpreting lineages with the same breakpoints and parents but different substitutions. For example, XQ and XR differ by one substitution (around position 17500).

[Figure: breakpoints_lineage]
