PeterPham's Projects
Development repository for the Triton language and compiler
TrustLLM: Trustworthiness in Large Language Models
TryOnDiffusion: A Tale of Two UNets Implementation
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A playbook for systematically maximizing the performance of deep learning models.
Open deep learning compiler stack for cpu, gpu and specialized accelerators
💡 Semantic search and workflows powered by language models
TypeChat is a library that makes it easy to build natural language interfaces using types.
Typer, build great CLIs. Easy to code. Based on Python type hints.
Understanding Deep Learning - Simon J.D. Prince
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
[CVPR 2023 Best Paper] Planning-oriented Autonomous Driving
[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces
[NeurIPS'21] Unified tracking framework with a single appearance model. It supports Single Object Tracking (SOT), Video Object Segmentation (VOS), Multi-Object Tracking (MOT), Multi-Object Tracking and Segmentation (MOTS), Pose Tracking, Video Instance Segmentation (VIS), and class-agnostic MOT (e.g. TAO dataset).
This Unity package extends the functionality of the barracuda-inference-base package to perform object detection using YOLOX models.
QLoRA & LoRA finetuning, 2-5x faster with 70% less memory
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models, such as VideoCrafter, OpenSora, ModelScope, and StableVideoDiffusion, by finetuning them with reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, and Aesthetics.
The library provides a standard interface and defines data structures and rules for different video codec classes (video encoding and decoding).
Vector Quantization, in PyTorch
A Python vector database you just need - no more, no less.
Vehicle Speed Estimation
Versatile audio super resolution (any -> 48kHz) with AudioSR.
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
Convert your videos to DensePose and use them with MagicAnimate
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models