
🧮🧬 Alex Diaz-Papkovich, PhD 🧬🧮

I'm a statistician and data scientist. I'm currently at Brown University working as a postdoctoral research associate at the Data Science Institute with Sohini Ramachandran. My PhD work was at McGill University in Quantitative Life Sciences with Simon Gravel, where I studied topological data analysis methods for genetic data. You can find my published research on Google Scholar.

I also enjoy collecting data on a variety of topics. Some of my side projects include tracking the length of the Rideau Canal skating season and collecting news stories about traffic violence.

Some of my academic research:

Non-linear dimensionality reduction for visualizing population genetic data

UMAP is an efficient method for visualizing biobank data. It can reveal structure in your data, such as population structure, that is related to factors like demographic history or biobank sampling methodology. When you colour the visualizations by other data, like geography or phenotypic measures, you can see lots of patterns and study them further. You can also work in 3D and get creative, for example by converting UMAP's $(x,y,z)$ coordinates to RGB values to create colour maps.
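As a rough illustration of the coordinate-to-colour idea, here is a minimal sketch that min-max scales each axis of a 3D embedding to $[0, 1]$ and reads the result as RGB. The function name `embedding_to_rgb` is hypothetical, the per-axis min-max scaling is one assumed choice among several, and the random array only stands in for a real 3D UMAP embedding; this is not necessarily the mapping used in the paper.

```python
import numpy as np


def embedding_to_rgb(coords):
    """Min-max scale each column of an (n, 3) array to [0, 1] and read it as RGB.

    Hypothetical helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an array of shape (n_samples, 3)")
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # constant axis: avoid dividing by zero (channel becomes 0)
    return (coords - lo) / span


# Stand-in data: random points in place of a real 3D UMAP embedding.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(500, 3))

colours = embedding_to_rgb(embedding)  # one (R, G, B) row per individual
```

Each row of `colours` can then be used as the colour of the matching individual in another plot, such as a map of sampling locations, so that individuals who sit close together in the embedding get similar colours.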

Paper: UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts, Diaz-Papkovich et al., PLOS Genetics, 2019.

Related GitHub repositories: gt-dimred

Stratification of biobank data

Though UMAP tends to generate clusters, it is not a clustering algorithm. To extract clusters from UMAP data, we use a density-based method called HDBSCAN. We can use the resulting clusters to stratify biobank data, which helps us get a better grasp of the population structure in our data, study how methods like polygenic scores transfer between populations, and run quality control (QC) on biobank data.

Preprint: Topological stratification of continuous genetic variation in large biobanks, Diaz-Papkovich et al., bioRxiv, 2023.

Related GitHub repositories: topstrat

Alex Diaz-Papkovich's Projects

1kgp_dimred

Interactive demonstration of how to use PCA, t-SNE, and UMAP on genotype data from the 1000 Genomes Project.

death_by_car

Tracking collisions between vehicles and pedestrians/cyclists in Canada.

dim_red

Dimension reduction and visualisations for genetic data.

end

Exploration of particle tracking focused on extra-nucleolar droplets (ENDs)

gt-dimred

Genotype dimension reduction research. Code for manuscript "UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts"

hgss_regression_workshop

A workshop on introductory linear regression in R developed for graduate students in human genetics. Covers the basics of the concept, its statistical foundation, and some R code to illustrate it.

intro_to_r_stats

This is a workshop that was part of the Fall 2022 McGill Initiative in Computational Medicine.

online-cv

A minimal Jekyll Theme to host your resume (CV)

skateway

Tracking the length of the Rideau Canal skating season over the years.

topstrat

Genotype dimension reduction and clustering research. Code for manuscript "Topological stratification of continuous genetic variation in large biobanks"
