ucscxena / python-scripts
License: Apache License 2.0
input:
xena genomicMatrix file - supplied on the command line
gene file (each row is a gene) - supplied on the command line
output:
upper quantile normalized genomicMatrix file,
the offset parameter for each sample
download a xena gene expression genomicMatrix for a TCGA dataset, such as http://ec2-52-23-185-93.compute-1.amazonaws.com/datapages/?dataset=TCGA.COAD.sampleMap/HiSeqV2&host=https://tcga.xenahubs.net
sample ids are the column headers; gene names are in the first column
use the unit shown at the url to revert the values to what they were before the log transformation, e.g. "log2(norm_count+1)" -> "norm_count", where 1 is the pseudo count
using the reverted values, for each column of the downloaded file, the script ranks only the genes that belong to the gene file and takes the value at the upper (top) 75% rank (100% is the gene with the highest value) as the upper quantile value for that column. It then divides every value in the column by the upper quantile value, multiplies by 1 million, adds the pseudo count (1 in the example above), and log2 transforms the result
log2(x+1) => x => log2(x/uq * 1E6 +1)
a good gene list (genes that are expressed in at least 90% of TCGA samples and 90% of GTEx samples) is https://github.com/ucscXena/python-scripts/blob/master/geneLists/GTEX_TCGA_genes
output the upper quantile normalized matrix
output the upper quantile value for each column in a separate file in this format
column id upper quantile value
column id upper quantile value
optional: accept a probeMap file for the case where the downloaded genomicMatrix file does not use gene names. The probeMap file can be downloaded and contains the mapping between genes and the identifiers in the first column of the matrix.
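The steps above can be sketched in Python as follows. This is a minimal in-memory illustration, not the script itself: the function names (`percentile`, `uq_normalize`, `format_offsets`) are hypothetical, the percentile uses linear interpolation (the text does not say which convention is intended), and the tab-separated layout of the offsets output is an assumption. File reading, missing values, probeMap lookup, and columns whose upper quantile is 0 are not handled.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a non-empty list.

    Assumption: the interpolation convention is not specified in the text.
    """
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def uq_normalize(matrix, samples, gene_set, pseudo=1.0, q=0.75, scale=1e6):
    """Upper quantile normalize log2(x + pseudo) values.

    matrix:   dict mapping gene name -> list of values, one per sample
    samples:  list of sample ids, in column order
    gene_set: genes used to compute each column's upper quantile
    Returns (normalized matrix, dict sample -> upper quantile value).
    """
    uq = {}
    for j, sample in enumerate(samples):
        # Revert log2(x + pseudo) -> x, using only genes in the gene list.
        reverted = [2 ** row[j] - pseudo
                    for gene, row in matrix.items() if gene in gene_set]
        uq[sample] = percentile(reverted, q)

    normalized = {}
    for gene, row in matrix.items():
        # log2(x + pseudo) => x => log2(x / uq * 1E6 + pseudo)
        normalized[gene] = [
            math.log2((2 ** value - pseudo) / uq[sample] * scale + pseudo)
            for value, sample in zip(row, samples)
        ]
    return normalized, uq


def format_offsets(uq, samples):
    """One line per column: column id, then its upper quantile value.

    Assumption: the two fields are separated by a tab.
    """
    return ["%s\t%s" % (sample, uq[sample]) for sample in samples]


if __name__ == "__main__":
    # Toy data: one sample whose reverted values are 0, 1, 3, 7, 15.
    toy = {"A": [0.0], "B": [1.0], "C": [2.0], "D": [3.0], "E": [4.0]}
    norm, offsets = uq_normalize(toy, ["s1"], {"A", "B", "C", "D", "E"})
    print("\n".join(format_offsets(offsets, ["s1"])))
```

Every gene in the matrix is normalized, but only genes in the gene list contribute to the upper quantile, which matches the description above. A real script would also need to decide what to do with genes in the gene list that are absent from the matrix.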
build and publish our python xena API, so that others and we ourselves can write python code that connects to a xenahub.
start with https://github.com/jingchunzhu/cgDataNew/tree/master/xena
materials to include in the package or module:
xenaQuery.py
xenaAPI.py
dataset_obj/
name of module: xenaPython