
Hi, I'm Daniel 👋

Senior Scientist at Pacific Biosciences (PacBio). Previously, I was a PhD candidate in the Department of Computer Science at Johns Hopkins University. Before that I was a Bioinformatics Scientist at ARUP Laboratories, where I worked on cell-free circulating tumor DNA (ctDNA) analysis and clinical genomics after my training in Physics [BS] and Biophysics/Computational Biology [MS]. I've worked with biological data (sequence, molecular modeling, metabolomics, transcriptomics, metagenomics) and telecommunications data, as well as graph algorithms, machine learning, and numerical optimization.

🔭 I've worked on similarity search, clustering, and indexing for large-scale biological data, as well as SIMD/GPU-accelerated and randomized algorithms. Most recently, I've been developing methods for human genetics, including long RNA-seq, VNTRs, and haplotype phasing.

😄 Pronouns: He/Him/His

A quick tour of my interests

  1. Practical randomized algorithms

This ranges from libraries providing sketch data structures and coresets to projects using random projections and DCI.

My work on coresets and clustering is primarily part of the minicore project, which aims to provide a standard utility for coreset construction and weighted clustering, especially for exponential family models and shortest-paths metrics.

  2. Computational Biology

The bonsai project provides methods for metagenomic analysis, along with k-mer encoding/decoding and I/O, while Dashing performs scalable sketching and comparison of sequence data.
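As a rough illustration of what k-mer encoding and decoding can look like, here is a minimal sketch that packs a DNA k-mer into a 64-bit integer at two bits per base. The function names (`base_code`, `encode_kmer`, `decode_kmer`) and the A/C/G/T to 0/1/2/3 mapping are assumptions made for this example; they are not taken from bonsai, whose actual interface may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration only; not bonsai's API.

// Map A/C/G/T (either case) to 0..3; any other character maps to 4.
inline std::uint8_t base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return 4;
    }
}

// Pack a k-mer of length 1..32 into a 64-bit word, two bits per base.
// The first base ends up in the highest-order bits that are used.
// Returns std::nullopt for empty input, k > 32, or a non-ACGT character.
inline std::optional<std::uint64_t> encode_kmer(std::string_view s) {
    if (s.empty() || s.size() > 32) return std::nullopt;
    std::uint64_t v = 0;
    for (char c : s) {
        const std::uint8_t b = base_code(c);
        if (b > 3) return std::nullopt;
        v = (v << 2) | b;
    }
    return v;
}

// Unpack the low 2*k bits of v back into an uppercase DNA string.
inline std::string decode_kmer(std::uint64_t v, unsigned k) {
    assert(k >= 1 && k <= 32);
    std::string out(k, 'A');
    for (unsigned i = 0; i < k; ++i) {
        out[k - 1 - i] = "ACGT"[v & 3u];  // lowest two bits hold the last base
        v >>= 2;
    }
    return out;
}
```

With this mapping, `encode_kmer("ACGT")` yields the bit pattern `00 01 10 11`, and `decode_kmer` inverts it when given the same `k`. Packing k-mers this way lets them be hashed and compared as plain integers.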

BMFtools performs molecular demultiplication over barcoded sequencing data, reducing error rates while eliminating redundant information. Designed for ctDNA, this method can reduce error rates by orders of magnitude, allowing confident detection of very rare events.

scavenger provides Rust implementations, built on tch-rs, of VAEs for count-based data, applied to single-cell transcriptomics.

I also co-developed pbfusion, a fast tool for characterizing transcriptional abnormalities.

  3. General C++

Most of my projects fall into this category, serving as tools I can reuse in various projects.

Some of my favorites:

  • vec provides type-generic abstractions over x86-64 vectorization, making it easy to write fast, portable code.
  • kspp is an RAII-based variant of kstring from klib, with extra niceties that make appending printf-style formatted text easy.
  • aesctr provides STL-style random number generators built on fast AES-CTR and wyhash.
  • circularqueue provides a range-based circular queue container that uses power-of-two sizes.
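To show why power-of-two sizes are convenient for a circular queue, here is a minimal sketch in which index wrap-around is a bitwise AND with `capacity - 1` instead of a modulo. The class name `Pow2Ring` and its members are hypothetical and written only for this example; they do not reflect the circularqueue library's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class for illustration; not the circularqueue library's API.
// Assumes T is default-constructible and move-assignable.
template <typename T>
class Pow2Ring {
    std::vector<T> buf_;     // storage; its size is always a power of two
    std::size_t mask_;       // buf_.size() - 1, used to wrap indices
    std::size_t head_ = 0;   // index of the oldest element
    std::size_t size_ = 0;   // number of stored elements

    static std::size_t round_up_pow2(std::size_t n) {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    // Double the capacity and move the elements to the front, oldest first.
    void grow() {
        std::vector<T> next(buf_.size() * 2);
        for (std::size_t i = 0; i < size_; ++i)
            next[i] = std::move(buf_[(head_ + i) & mask_]);
        buf_ = std::move(next);
        mask_ = buf_.size() - 1;
        head_ = 0;
    }

public:
    explicit Pow2Ring(std::size_t min_capacity = 4)
        : buf_(round_up_pow2(min_capacity)), mask_(buf_.size() - 1) {}

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return buf_.size(); }
    bool empty() const { return size_ == 0; }

    void push_back(T value) {
        if (size_ == buf_.size()) grow();
        buf_[(head_ + size_) & mask_] = std::move(value);  // wrap with a mask
        ++size_;
    }

    T pop_front() {
        assert(size_ > 0);
        T value = std::move(buf_[head_]);
        head_ = (head_ + 1) & mask_;
        --size_;
        return value;
    }
};
```

Because the buffer is reused as elements are pushed and popped, a steady-state workload allocates only when the queue outgrows its current capacity.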

Daniel Baker's Projects

10xdash

Pairwise similarity metrics for 10x barcoded RNASeq datasets

aesctr

C++ implementation of AES-CTR PRNG using SIMD, based on Samuel Neves' Implementation

bamcov

Quickly calculate and visualize sequence coverage in alignment files

bbhash

Bloom-filter based minimal perfect hash function library

big-bwt

Modified from gitlab.com/manzai/Big-BWT to support FASTA files

bioseq

Tokenizers and Machine Learning Models for biological sequence data

blaze_tensor

3D Tensors for Blaze (https://bitbucket.org/blaze-lib/blaze)

bonsai

Bonsai: Fast, flexible taxonomic analysis and classification

circularqueue

Circular Queue for minimizing memory allocations in deque applications

clhash

C library implementing the ridiculously fast CLHash hashing function

comanche

Component-based development framework for memory-centric storage systems

cqf

A General-Purpose Counting Filter: Counting Quotient Filter

cybmf

Old Utilities from BMFtools development

dashing

Fast and accurate genomic distances using HyperLogLog

dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
