
tdp_publicdb's Introduction

tdp_publicdb


This repository contains the database connector to the bioinfoDB.hg38 database and the respective SQL queries to show the views and visualizations.

Installation

git clone https://github.com/caleydo/tdp_publicdb.git
cd tdp_publicdb
npm install

Testing

npm test

Building

npm run build

This repository is part of Phovea, a platform for developing web-based visualization applications. For tutorials, API docs, and more information about the build and deployment process, see the documentation page.

tdp_publicdb's People

Contributors

anitasteiner, awernibi, bikramkawan, dg-datavisyn, dv-usama-ansari, dvdanielamoitzi, dvdanielrehberger, dvmichaelpeterseil, dvmoritzschoefl, dvvanessastoiber, keckelt, mstreit, oltionchampari, puehringer, rumersdorfer, sgratzl, steiner-anita, thinkh, zichner

tdp_publicdb's Issues

Domain for Boxplot Aggregated Score does not update

Environment

  • Release number or git hash: 9.1.2-20221027-011632
  • Browser: Chrome 106.0.5249.119
  • Deployed / Local: deployed

Steps to reproduce the bug

  1. Add an aggregated score of type boxplot

Observed Behavior

Domain values remain at 0 to 100, which causes the boxplots to be extremely skewed and illegible.

Expected Behavior

Boxplot visualizations are properly displayed

This might be related to #204, which fixed the domain values for numeric scores.
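
One possible direction is to derive the domain from the box plot statistics themselves instead of keeping the fixed 0 to 100 range. The following is only a minimal sketch under that assumption: the `BoxPlotStats` shape and the `computeBoxPlotDomain` helper are hypothetical and not taken from the tdp_publicdb code base.

```typescript
// Hypothetical shape of one aggregated box plot value (assumption, not the real data model).
interface BoxPlotStats {
  min: number;
  q1: number;
  median: number;
  q3: number;
  max: number;
}

// Derive a [min, max] domain from all rows.
// Rows without a value (null) and non-finite numbers are ignored.
// If no usable value exists, the fallback domain is returned.
function computeBoxPlotDomain(
  rows: Array<BoxPlotStats | null>,
  fallback: [number, number] = [0, 100]
): [number, number] {
  let lo = Infinity;
  let hi = -Infinity;
  for (const row of rows) {
    if (row == null) {
      continue;
    }
    if (Number.isFinite(row.min)) {
      lo = Math.min(lo, row.min);
    }
    if (Number.isFinite(row.max)) {
      hi = Math.max(hi, row.max);
    }
  }
  return lo <= hi ? [lo, hi] : fallback;
}
```

With a helper like this, the column domain would follow the loaded data, and the fixed range would only be used when no values are available.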

Update to postgres 10

The new hg38 release requires Postgres 10. We need to update the docker-compose.partial.yml.

Update requirements and dependencies

We should use the latest versions for both requirements and dependencies, except for the TypeScript, tslint, and d3 dependencies, which will be updated in a separate step.

Error when adding a score column with additional filters

  • Release number or git hash: f045a07
  • Web browser version and OS: Chrome 77
  • Environment (local or deployed): local and deployed

Steps to reproduce

  1. Open https://ordino.caleydoapp.org
  2. Select Cancer Gene Census
  3. Select one or multiple genes
  4. Open Copy Number view
  5. Filter for Organ: bladder (or any other filter)
  6. Apply filter

Same behavior applies when starting with cell lines and switching to genes.

Observed behavior

  • Shows an error message
  • The selected gene is not added as score column

Expected behavior

  • No error message is shown, and the score column for the filtered rows is added

Combined view not working for Tissue Samples

@zichner commented on Mon Oct 02 2017

Release: dev and stable

Description
Combined view only shows NaN for tissue samples

Steps to reproduce

  1. Open list of all TCGA Tumor samples
  2. Select multiple samples
  3. Open Combined View
  4. Select TPM and Relative Copy Number as data subtype

-> Only empty columns are shown (all values are NaN).

By default, scores should be computed for all entities, not only filtered ones

@zichner commented on Thu Oct 12 2017

If a filtered list contains fewer than 1000 entities, scores are only computed for this filtered subset. A checkbox was added to allow the user to still compute scores for all entities (#471).

By default, this checkbox should be ticked, i.e., scores should be computed for the entire list, since users seem to get confused if scores are missing after removing a filter criterion.


@zichner commented on Fri Oct 13 2017

It might also be good to invert the logic of the checkbox, so that instead of ticking it to compute on all entities, one ticks it to compute only on the current subset.
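
The inverted checkbox logic could look roughly like the sketch below. This is only an illustration of the proposal: `idsToScore`, its parameters, and the `MAX_SUBSET_SIZE` constant are hypothetical names, and the handling of large subsets is an assumption derived from the 1000-entity limit described above.

```typescript
// Limit described in the issue: only subsets smaller than this are scored on their own.
const MAX_SUBSET_SIZE = 1000;

// Hypothetical helper that decides for which entity ids a score is computed.
// `onlyCurrentSubset` models the inverted checkbox and defaults to false,
// so scores are computed for the entire list unless the user opts in.
function idsToScore(
  allIds: string[],
  filteredIds: string[],
  onlyCurrentSubset: boolean = false
): string[] {
  const subsetIsSmall = filteredIds.length < MAX_SUBSET_SIZE;
  return onlyCurrentSubset && subsetIsSmall ? filteredIds : allIds;
}
```

With this default, removing a filter later would not leave rows without scores, which is the confusion described in the first comment.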

Add cell line score "PRISM drug screen"

With this issue the PRISM drug screen data (https://depmap.org/portal/prism/) should be added as a cell line score in Ordino.

User path

  1. Open list of cell lines
  2. Add column menu
  3. Select PRISM Drug Screen Score
  4. Select one or multiple drugs via auto complete
  5. Select one or multiple scores (Activity Area, IC50, or EC50)
  6. Add one or multiple score columns

Integration details

  • The data is already in our DB
    • We must add the related SQL schema as Alembic revision
    • Corresponding views: cellline.tdp_drugscore and public.tdp_drug
  • For now, there can be a dedicated entry in the add column dialog (e.g., PRISM Drug Screen Score); more drug screen data sets will likely follow later, and an additional selection will then be needed
    • This score is similar to the Depletion Screen Score for genes
  • When clicking on PRISM Drug Screen Score a dialog should open, where one selects
    • one of the several hundred drugs. Ideally, for each drug, the name, the mode-of-action, and the target gene(s) should be listed in the auto-complete
    • one of the three available scores (Activity Area, IC50, or EC50)

Implementation pointers

  • SQL queries:
    • depletion_tables = ['depletionscore']
      depletion_attributes = ['rsa', 'ataris', 'ceres']
      cellline_data = DataEntity(cellline.schema, tables, attributes, operators)
      tissue_data = DataEntity(tissue.schema, tables, attributes, operators)
      cellline_depletion = DataEntity(cellline.schema, depletion_tables, depletion_attributes, operators)
    • # depletion scores
      create_gene_sample_score(views, gene, cellline, cellline_depletion, 'depletion_', callback=lambda x: x.filter('depletionscreen'))
      create_gene_sample_score(views, cellline, gene, cellline_depletion, 'depletion_', inline_aggregate_sample_filter=True, callback=lambda x: x.filter('depletionscreen'))
  • Score dialog
    • tdp_publicdb/src/config.ts

      Lines 367 to 401 in 920738f

      export const depletion: IDataTypeConfig = {
        id: 'depletion',
        name: 'Depletion Screen ',
        tableName: 'depletionscore',
        query: 'depletion_score',
        dataSubtypes: [
          {
            id: 'rsa',
            name: 'DRIVE RSA (ER McDonald III et al., Cell, 2017)',
            type: dataSubtypes.number,
            domain: [-3, 3],
            missingValue: NaN,
            constantDomain: false,
            useForAggregation: 'rsa'
          },
          {
            id: 'ataris',
            name: 'DRIVE ATARiS (ER McDonald III et al., Cell, 2017)',
            type: dataSubtypes.number,
            domain: [0, 10000],
            missingValue: NaN,
            constantDomain: false,
            useForAggregation: 'ataris'
          },
          {
            id: 'ceres',
            name: 'Avana CERES (Robin M. Meyers et al., Nature Genetics, 2017)',
            type: dataSubtypes.number,
            domain: [0, 10000],
            missingValue: NaN,
            constantDomain: false,
            useForAggregation: 'ceres'
          }
        ]
      };
    • // Factories for depletion scores for DRIVE data
      export function createSingleDepletionScoreDialog(pluginDesc: IPluginDesc, extra: any, countHint?: number) {
        return createScoreDialog(pluginDesc, extra, FORM_SINGLE_SCORE_DEPLETION.slice(), countHint);
      }
      export function createSingleDepletionScore(data: ISingleScoreParam, pluginDesc: IPluginDesc): IScore<number>|IScore<any>[] {
        return initializeScore(data, pluginDesc, (parameter, dataSource, oppositeDataSource) => new SingleDepletionScore(parameter, dataSource, oppositeDataSource));
      }
    • class SingleDepletionScore extends ASingleScore implements IScore<any> {
        constructor(parameter: ISingleScoreParam, dataSource: IDataSourceConfig, oppositeDataSource: IDataSourceConfig) {
          super(parameter, dataSource, oppositeDataSource);
        }
        protected getViewPrefix(): string {
          return 'depletion_';
        }
        protected createFilter(): IParams {
          return {
            depletionscreen: this.dataSubType.id === 'ceres' ? 'Avana' : 'Drive'
          };
        }
      }
    • tdp_publicdb/phovea.js

      Lines 544 to 559 in c341197

      registry.push('tdpScore', prefix + '_depletion_single_score', function () {
        return import('./src/scores/SingleScore');
      }, {
        name: 'Depletion Screen Score (Single)',
        idtype: idType,
        primaryType: idType,
        oppositeType: 'Ensembl',
        factory: 'createSingleDepletionScoreDialog'
      });
      registry.push('tdpScoreImpl', prefix + '_depletion_single_score', function () {
        return import('./src/scores/SingleScore');
      }, {
        factory: 'createSingleDepletionScore',
        primaryType: idType,
        oppositeType: 'Ensembl'
      });

Definition of done

  • SQL schema for the new views is added as alembic revision
  • Score must be listed in the add column dialog
  • Score dialog allows multi-selection of drugs and attributes/scores
  • Score adds multiple columns (drugs x scores)
  • Scores show the correct results for the corresponding cell lines
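
The "drugs x scores" requirement from the list above can be pictured as a simple cross product. The sketch below is only an illustration: `PrismScore`, `ScoreColumn`, `buildScoreColumns`, and the label format are hypothetical and not part of the existing score API.

```typescript
// The three score types named in the issue.
type PrismScore = 'Activity Area' | 'IC50' | 'EC50';

// Hypothetical description of one column to be added to the ranking.
interface ScoreColumn {
  drug: string;
  score: PrismScore;
  label: string;
}

// Create one column per (drug, score) combination, grouped by drug.
function buildScoreColumns(drugs: string[], scores: PrismScore[]): ScoreColumn[] {
  const columns: ScoreColumn[] = [];
  for (const drug of drugs) {
    for (const score of scores) {
      columns.push({drug, score, label: `${drug} (${score})`});
    }
  }
  return columns;
}
```

Selecting two drugs and three scores in the dialog would then yield six score columns.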

Filtering for predefined named sets not working within Copy Number detail view

@zichner commented on Wed Oct 11 2017

Release: dev

Steps to reproduce

  1. Open list of all cell lines
  2. Select A549
  3. Open Copy Number detail view (-> Copy number values are shown)
  4. Filter detail view by Predefined Named Set normal chromosome protein coding human genes
    -> No copy number information is shown

I guess this is due to a bug in the SQL statement:

SELECT d.ensg AS id, d.relativecopynumber AS score 
FROM cellline.tdp_copynumber d 
INNER JOIN cellline.tdp_cellline s ON d.celllinename = s.celllinename 
INNER JOIN public.tdp_gene g ON d.ensg = g.ensg 
WHERE g.species = 'human' 
AND d.celllinename = 'A549' 
AND d.celllinename = ANY(ARRAY(SELECT celllinename FROM cellline.tdp_panelassignment WHERE panel = 'normal chromosome protein coding human genes'))

The last part should filter genes but is filtering cell lines.

Please check the code that builds the SQL queries again very thoroughly to avoid this kind of bug (the SQL queries are one of the most crucial parts of the application).

PS: Interestingly, filtering for predefined named sets is working in the Combined View. However, shouldn't both SQL queries be almost identical?

Column label wrong for scores aggregated by frequency

@zichner commented on Tue Sep 05 2017

Steps to reproduce

  1. Open list of "Cancer gene census" genes
  2. Add aggregated cell line score column
    • Tumor type: bladder carcinoma
    • Data type: AA Mutated
    • Aggregation: Frequency

-> Column header is "AA Mutated <0 Frequency", but should be "AA Mutated Frequency" (the "<0" should be removed)

Also, the column header tooltip contains "Frequency: <0", but should contain "Aggregation: Frequency".

Add Column to GeneList Tour is failing

  • Release number or git hash: de96c5c
  • Web browser version and OS: Chrome 91, Linux
  • Environment (local or deployed): both

Steps to reproduce

  1. Open https://ordino-daily.caleydoapp.org/#/
  2. Go to Onboarding Tours
  3. Select Add Column to GeneList Tour

Observed behavior

  • In one step, the tour fails because one of the root selectors has been changed from col-sm-auto to col-sm-12

Expected behavior

  • The tour works without any errors

Migrate to Jest

We should migrate from Karma to Jest for testing due to compatibility issues.

After the update to Typescript 3.8, we can use the latest version.
When using ts-jest, it is important to pin the version (without ^). The reason is that its major version follows Jest, which means that minor releases of ts-jest can contain breaking changes. See the docs.

Make repository public

Observed behavior

  • This repository cannot be published to npm, due to this flag in package.json:
"private": true,

Expected behavior

  • Option should be set to false in order to be able to publish it to npm.

Replace `Unknown` with missing value dash for `AA mutated`, `DNA mutated`, and `Copy Number Class`

Coming from https://github.com/Caleydo/tdp_bi_bioinfodb/issues/1085

As far as I can see, AA mutated can currently only be Non Mutated, Mutated and Unknown, i.e., there is no null. All values that are either null in the DB or are not existing in the DB are set to Unknown.

I think this should be changed in a way that everything which is currently Unknown (either null in DB or not in DB) becomes null in Ordino and gets shown as a dash.
This should be the case for AA mutated, DNA mutated, as well as Copy Number Class.
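
The proposed mapping could be sketched as follows. The helper names `normalizeCategory` and `renderCategory` are hypothetical, and the plain hyphen used as the dash is an assumption; the actual rendering of missing values is up to the ranking component.

```typescript
// Map both a missing value and the legacy 'Unknown' category to null.
function normalizeCategory(value: string | null | undefined): string | null {
  if (value == null || value === 'Unknown') {
    return null;
  }
  return value;
}

// Render a normalized category; null is shown as a dash (assumed character).
function renderCategory(value: string | null): string {
  return value === null ? '-' : value;
}
```

For example, `renderCategory(normalizeCategory('Unknown'))` gives the dash, while `Mutated` and `Non Mutated` pass through unchanged.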

Checklist

  • Create a session that contains an Unknown filter before the refactoring
  • Refactor code
  • Check if existing and new provenance graphs still work after the refactoring

Integrate alembic and add initial database schema

Coming from PR #63, we should add Alembic for SQLAlchemy. The first Alembic revision should create the initial Ordino database schema without data.

Tasks

  • Connect to Ordino public DB
  • Export schema pg_dump -U postgres -s postgres > exportFile.dmp (see https://www.postgresql.org/docs/10/app-pgdump.html and https://stackoverflow.com/a/31602990)
  • Initialize alembic
  • Write initial alembic revision that creates the schema in the upgrade method
  • Always add IF NOT EXISTS to prevent errors due to duplicates
  • Write downgrade method that drops tables and views
  • Comment the content of downgrade so that we do not delete data by accident (but if somebody wants to use it, the method is ready to use)
  • Test it

Definition of Done

  • Creating a new Ordino instance and running alembic head should create the schema in the local docker publicdb container
  • Ordino should start (with no data)

Update to Typescript 3.8

Observed behavior

Updating the typescript and typedoc versions according to the snippet below without using stricter linting makes it possible to use the current typescript version with a minimum amount of errors.

"typedoc": "~0.16.9",
"typescript": "~3.8.1-rc",

Expected behavior

Updating the dependencies in the package.json and adding the following line of code in tsconfig.json (as well as fixing appearing type errors) should be sufficient for the usage of Typescript 3.8 as a first step.

"downlevelIteration": true, // required as long as target is `es5`

Integrate Depletion screen data

@awernibi commented on Thu Sep 07 2017

Storing depletion screen data requires a DB update with the following SQL script.

set search_path = cellline,public;

DROP table PROCESSEDDEPLETIONSCORE CASCADE;
DROP table DEPLETIONSCREEN;

create table DEPLETIONSCREEN (
DEPLETIONSCREEN      TEXT                 not null,
DEPLETIONSCREENDESCRIPTION TEXT                 null,
constraint PK_DEPLETIONSCREEN primary key (DEPLETIONSCREEN)
);

create table PROCESSEDDEPLETIONSCORE (
ENSG                 TEXT                 not null,
CELLLINENAME         TEXT                 not null,
DEPLETIONSCREEN      TEXT                 not null,
RSA                  FLOAT4               null,
ATARIS               FLOAT4               null,
constraint PK_PROCESSEDDEPLETIONSCORE primary key (CELLLINENAME, ENSG, DEPLETIONSCREEN)
);

alter table PROCESSEDDEPLETIONSCORE
   add constraint FK_PROCESSE_REFERENCE_DEPLETIO foreign key (DEPLETIONSCREEN)
      references DEPLETIONSCREEN (DEPLETIONSCREEN)
      on delete restrict on update restrict;

alter table PROCESSEDDEPLETIONSCORE
   add constraint FK_PROCESSE_REFERENCE_GENE foreign key (ENSG)
      references GENE (ENSG)
      on delete restrict on update restrict;

alter table PROCESSEDDEPLETIONSCORE
   add constraint FK_PROCESSED_DEPLETION_CELLLINE foreign key (CELLLINENAME)
      references CELLLINE (CELLLINENAME)
      on delete restrict on update restrict;

CREATE INDEX idx_processeddepletionscore ON cellline.processeddepletionscore(ensg);

CREATE VIEW cellline.processeddepletionscoreview AS
SELECT d.ensg, symbol, celllinename, depletionscreen, rsa, ataris FROM cellline.processeddepletionscore d
JOIN gene g ON (d.ENSG = g.ENSG);

The Novartis Drive data set (http://dx.doi.org/10.1016/j.cell.2017.07.005) can be found in the AWS ordino database. In order to have a local version, copy the content of tables DEPLETIONSCREEN and PROCESSEDDEPLETIONSCORE from db schema cellline (~2.7 million rows).


@awernibi commented on Fri Sep 08 2017

Two views have been created:

CREATE VIEW cellline.tdp_depletionscreen AS
  SELECT depletionscreen, depletionscreendescription
  FROM cellline.depletionscreen;

CREATE VIEW cellline.tdp_depletionscore AS
  SELECT ensg, symbol, celllinename, depletionscreen, rsa, ataris
  FROM cellline.processeddepletionscoreview ;

@zichner commented on Fri Sep 15 2017

In order to make this data available in Ordino, please add the two menus:

  • Aggregated RNAi Screen Score
  • Single RNAi Screen Score

to the gene and cell line list.

The Score dialogs should (for now) be analogous to Aggregated Cellline Score/Aggregated Gene Score as well as Single Cellline Score/Single Gene Score.

The Data Types should be DRIVE RSA as well as DRIVE ATARiS. The actual data is stored in the DB view cellline.tdp_depletionscore (as described above).

dTiles: Co-Expression view shows "Select two or more genes" when rearranging dashboard

  • Release number or git hash:
  • Web browser version and OS:
  • Environment (local or deployed): deployed master (and probably dev)

Steps to reproduce

  1. Open dTiles (dtiles.app.datavisyn.io)
  2. Select 1 gene
  3. Open the Co-Expression view (the view should show the message "Select two or more genes")
  4. Select another gene (the Co-Expression view renders the plots)
  5. Add another view and split them

Observed behavior

  • The Co-Expression view shows the initial message ("Select two ...") again
  • When changing the view's parameters, the view rerenders again

Expected behavior

  • It should render the same plots as before

Add pgAdmin container for development purposes

We should prepare a pgAdmin container for development purposes in the docker-compose.partial.yml. The container should be commented out by default, so that developers can activate it on demand.

The following sample config can be adapted.

  pgadmin:
    links:
      - db_publicdb:publicdb
    image: dpage/pgadmin4:latest
    ports:
      - "5050:80"
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin
      - PGADMIN_DEFAULT_PASSWORD=admin
    networks:
      - db_publicdb

PubMed detail view not working

Environment

Steps to reproduce the bug

  1. Open list of all genes
  2. Select one gene
  3. Open "PubMed" detail view

->Error: pubmed.ncbi.nlm.nih.gov refused connection

If showing PubMed results within Ordino is not possible anymore, please modify the detail view so that it just provides a link (analogous to the "Human Protein Atlas" detail view).

Workspace does not work due to old folder in dist files

  • Release number or git hash: 78f859a
  • Web browser version and OS: Chrome 103, Ubuntu 22.04
  • Environment (local or deployed): local

Steps to reproduce

  1. Setup dTiles workspace using yo phovea:setup-workspace datavisyn/dTiles_product -b develop_aws
  2. Run npm start and docker-compose up
  3. Add dburl in config.json

Observed behavior

  • localhost:8080 only shows a blank screen and an error in the console

Expected behavior

  • dTiles runs normally

Remove dTiles Dependency

We can remove dTiles from the dependencies in the package.json since it is not used at the moment and has no priority for deployment.

Error in GeneSymbolDetector and IDTypeDetector with non-string values

  • Release number or git hash: v7.0.0
  • Web browser version and OS: Chrome 84
  • Environment (local or deployed): both

Steps to reproduce

  1. Upload a CSV or XLSX that contains boolean values

Observed behavior

Uncaught (in promise) TypeError: v.trim is not a function
    at Object.detectIDType (GeneSymbolDetector.ts:9)
    at valuetype_idtype.ts:98
    at Generator.next (<anonymous>)
    at fulfilled (tslib.es6.js:71)

if (!v || v.trim().length === 0) {
  continue; //skip empty samples
}

Uncaught (in promise) TypeError: v.trim is not a function
    at Object.detectIDType (IDTypeDetector.ts:19)
    at valuetype_idtype.ts:98
    at Generator.next (<anonymous>)
    at fulfilled (tslib.es6.js:71)

if (v == null || v.trim().length === 0) {
  continue; //skip empty samples
}

Expected behavior

The detectors should check for strings before using trim().
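
A type guard along the following lines would avoid the TypeError. This is a minimal sketch, not the actual detector code: `isBlankSample` and `usableSamples` are hypothetical helpers, and skipping non-string values entirely is an assumption about the desired behavior.

```typescript
// True for null/undefined and for strings that are empty after trimming.
// Non-string values such as booleans or numbers never reach trim().
function isBlankSample(v: unknown): boolean {
  return v == null || (typeof v === 'string' && v.trim().length === 0);
}

// Keep only non-empty string samples; everything else is skipped (assumed behavior).
function usableSamples(data: unknown[]): string[] {
  return data.filter((v): v is string => typeof v === 'string' && !isBlankSample(v));
}
```

Inside the detectors, the existing check could then become something like `if (typeof v !== 'string' || v.trim().length === 0) { continue; }`.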

Adding a New Single Score Column breaks the ranking

Environment

  • Release number or git hash: 7de3f13
  • Browser: Chrome 103
  • Deployed / Local: both

Steps to reproduce the bug

  1. Open https://ordino-daily.caleydoapp.org/#/
  2. Click Start Analysis
  3. Open normal chromosome protein coding human genes
  4. Add a Cell Line Score (Single) column
  5. Select any cell line
  6. Select any data type

Observed Behavior

  • The operation is aborted and the user is returned to the dataset cards

Expected Behavior

  • The score column is added to the ranking normally

Update to postgres 12

The new hg38 release requires Postgres 12. We need to update the docker-compose.partial.yml.

Add mutation zygosity information

@zichner commented on Thu Sep 14 2017

Currently, we have 4 types of mutation information:

  • DNA Mutated
  • DNA Mutation
  • AA Mutated
  • AA Mutation

Please add a fifth one: Zygosity

The corresponding information is stored in the cellline / tissue view tdp_mutation

LEFT JOIN vs INNER JOIN in aggregated score DB query

@zichner commented on Tue Oct 24 2017

Release: dev

When building an SQL query for a cell line list score that aggregates across a set of genes, the gene table is joined by an INNER JOIN whereas the gene set table is joined by a LEFT JOIN.

Here is an example:

/api/tdp/db/publicdb/cellline_gene_frequency_copynumberclass_score/score?attribute=copynumberclass&species=human&table=copynumber&target=Cellline&value=-2&filter_panel_ensg=Cancer+Gene+Census
SELECT d.celllinename AS id, SUM((copynumberclass in (-2))::INT4) as count, COUNT(copynumberclass) as total 
FROM cellline.tdp_copynumber d 
INNER JOIN public.tdp_gene s ON d.ensg = s.ensg 
LEFT JOIN public.tdp_geneassignment ga ON (d.ensg = ga.ensg)  
WHERE s.species = 'human' AND ga.genesetname = 'Cancer Gene Census' GROUP BY d.celllinename;

Is there a reason for using different joins?
If not, please change the second join to an INNER JOIN as well, which is much faster in some cases.
Additionally, you can harmonize the ON clauses (either always use parentheses or never).
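
Because the WHERE clause filters on ga.genesetname, rows that the LEFT JOIN pads with NULL are discarded anyway, so both join types should return the same rows for this query. The following in-memory sketch illustrates that reasoning; the row types, helper functions, and sample data are made up for illustration and do not come from the database.

```typescript
interface CopyNumberRow { celllinename: string; ensg: string; }
interface GeneAssignmentRow { ensg: string; genesetname: string; }
type JoinedRow = CopyNumberRow & { genesetname: string | null };

// INNER JOIN ... ON d.ensg = ga.ensg: only matching pairs are emitted.
function innerJoin(d: CopyNumberRow[], ga: GeneAssignmentRow[]): JoinedRow[] {
  const out: JoinedRow[] = [];
  for (const row of d) {
    for (const a of ga) {
      if (row.ensg === a.ensg) {
        out.push({...row, genesetname: a.genesetname});
      }
    }
  }
  return out;
}

// LEFT JOIN ... ON d.ensg = ga.ensg: unmatched left rows are kept with a null gene set.
function leftJoin(d: CopyNumberRow[], ga: GeneAssignmentRow[]): JoinedRow[] {
  const out: JoinedRow[] = [];
  for (const row of d) {
    const matches = ga.filter((a) => a.ensg === row.ensg);
    if (matches.length === 0) {
      out.push({...row, genesetname: null});
    }
    for (const a of matches) {
      out.push({...row, genesetname: a.genesetname});
    }
  }
  return out;
}

// WHERE ga.genesetname = '<name>': null-padded rows never pass this filter.
function whereGeneSet(rows: JoinedRow[], name: string): JoinedRow[] {
  return rows.filter((r) => r.genesetname === name);
}
```

Applying `whereGeneSet` to the output of either join gives identical rows, which is why switching to an INNER JOIN does not change the result of this particular query.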


@zichner commented on Mon Oct 30 2017

Just FYI: This issue is the most relevant at the moment. As soon as this change is made and deployed, I can start checking the DB queries for correctness and we can see if all steps of the described use cases are running fast enough or if we need to modify the DB queries or the use cases.


@mstreit commented on Mon Oct 30 2017

We discussed it this morning. @lehnerchristian will take care of it today.

Ordino welcome tour not working

Environment

  • Browser: Firefox, Edge Chromium
  • deployed/local: Ordino-daily

Steps to reproduce the bug

  1. Start Ordino Welcome Tour and click through it

-> In step 5, only the gray overlay is visible and the dialog is no longer shown. Consequently, Ordino becomes completely unresponsive.

I guess the problem is caused by changes in LineUp v4.
