chiendb97 / tensorrt-llm

This project forked from nvidia/tensorrt-llm

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Home Page: https://nvidia.github.io/TensorRT-LLM

License: Apache License 2.0

Languages: C++ 99.32%, Python 0.51%, Cuda 0.14%, Shell 0.01%, C 0.01%, PowerShell 0.01%, Makefile 0.01%, Smarty 0.01%, CMake 0.01%, Dockerfile 0.01%

TensorRT-LLM

A TensorRT Toolbox for Optimized Large Language Model Inference

TensorRT-LLM Overview

TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. It also contains components to create Python and C++ runtimes that execute those TensorRT engines, and it includes a backend for integration with the NVIDIA Triton Inference Server, a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations, from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).
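As a rough illustration of the idea behind Tensor Parallelism, the sketch below splits a weight matrix column-wise across simulated ranks and checks that the concatenated partial results match the unsplit product. It is a toy in plain Python: lists stand in for GPU tensors, and the helper names (`matmul`, `split_columns`, `tensor_parallel_matmul`) are hypothetical and not part of TensorRT-LLM.

```python
# Illustrative sketch only: plain Python lists stand in for GPU tensors,
# and none of these helper names belong to TensorRT-LLM.

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(weight, num_ranks):
    """Split a weight matrix column-wise into num_ranks equal shards."""
    n = len(weight[0])
    assert n % num_ranks == 0, "column count must divide evenly across ranks"
    width = n // num_ranks
    return [[row[r * width:(r + 1) * width] for row in weight] for r in range(num_ranks)]


def tensor_parallel_matmul(x, weight, num_ranks):
    """Each simulated rank multiplies x by its own shard of the weight."""
    shards = split_columns(weight, num_ranks)
    partials = [matmul(x, shard) for shard in shards]  # one result per simulated rank
    # Concatenating along columns plays the role of an all-gather across ranks.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, num_ranks=2)
```

Here `sharded` and `full` hold the same values, which is the property that lets each rank store only a fraction of the weights. Pipeline Parallelism, by contrast, assigns whole layers to different ranks and is not shown.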

The architecture of the TensorRT-LLM Python API resembles that of the PyTorch API. It provides a functional module containing functions such as einsum, softmax, matmul, or view. The layers module bundles useful building blocks to assemble LLMs, such as an Attention block, an MLP, or an entire Transformer layer. Model-specific components, such as GPTAttention or BertAttention, can be found in the models module.
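The three-level layering described above (stateless functions, reusable layers, model-specific assemblies) can be sketched in plain Python. This is only an analogy for how the pieces stack: the names `matmul`, `softmax`, `Attention`, and `TinyModel` are defined here for illustration and do not reproduce the TensorRT-LLM signatures or classes.

```python
import math

# --- "functional" level: stateless operations on nested lists ---

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(rows):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in rows:
        exps = [math.exp(v - max(row)) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# --- "layers" level: a reusable block built from functional ops ---

class Attention:
    """Single-head scaled dot-product attention without learned projections."""

    def __call__(self, q, k, v):
        scale = 1.0 / math.sqrt(len(q[0]))
        scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
        return matmul(softmax(scores), v)


# --- "models" level: a hypothetical model that stacks layers ---

class TinyModel:
    def __init__(self, num_layers):
        self.layers = [Attention() for _ in range(num_layers)]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x, x, x)  # self-attention: queries, keys, values all come from x
        return x


x = [[1.0, 0.0], [0.0, 1.0]]
out = TinyModel(num_layers=2)(x)
```

The design point is that each level only depends on the one below it, so swapping in a model-specific attention variant would touch the models level without changing the functional operations.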

TensorRT-LLM comes with several popular models pre-defined. They can easily be modified and extended to fit custom needs. Refer to the Support Matrix for a list of supported models.

To maximize performance and reduce memory footprint, TensorRT-LLM allows models to be executed using different quantization modes (refer to the Support Matrix). TensorRT-LLM supports INT4 or INT8 weights with FP16 activations (a.k.a. INT4/INT8 weight-only) as well as a complete implementation of the SmoothQuant technique.
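To make the weight-only idea concrete, here is a minimal sketch of symmetric per-channel INT8 weight quantization, in which weights are stored as small integers plus one scale per output column and are dequantized before multiplying with floating-point activations. It assumes a simple max-based scale and uses plain Python floats in place of FP16; the function names are hypothetical, and this is not how TensorRT-LLM implements its kernels.

```python
# Illustrative sketch only: a max-based symmetric scheme is assumed here,
# and Python floats stand in for FP16 activations.

def quantize_int8_per_channel(weight):
    """Return integer weights in [-127, 127] and one scale per output column."""
    scales = []
    for col in zip(*weight):
        peak = max(abs(v) for v in col)
        scales.append(peak / 127.0 if peak > 0 else 1.0)  # avoid dividing by zero
    q_weight = [
        [max(-127, min(127, round(v / s))) for v, s in zip(row, scales)]
        for row in weight
    ]
    return q_weight, scales


def weight_only_matmul(x, q_weight, scales):
    """Dequantize the weights, then multiply with floating-point activations."""
    w = [[q * s for q, s in zip(row, scales)] for row in q_weight]
    return [[sum(a * b for a, b in zip(xr, col)) for col in zip(*w)] for xr in x]


weight = [[0.5, -1.2], [-0.3, 0.8]]
x = [[1.0, 2.0]]

q_weight, scales = quantize_int8_per_channel(weight)
y_quant = weight_only_matmul(x, q_weight, scales)

# Reference result with the original floating-point weights.
y_ref = [[sum(a * b for a, b in zip(xr, col)) for col in zip(*weight)] for xr in x]
```

The quantized result `y_quant` stays close to `y_ref` because each weight is off by at most half a scale step. SmoothQuant goes further by also quantizing activations, which this sketch does not cover.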

Getting Started

To get started with TensorRT-LLM, visit the documentation at https://nvidia.github.io/TensorRT-LLM.

Community

  • Model zoo (generated by TRT-LLM rel 0.9 a9356d4b7610330e89c1010f342a9ac644215c52)

Contributors

kaiyux, shixiaowei02, juney-nvidia, a5hwinjs, basiccoder, heandres, whitelok, minwhoo, sam-india-007, sjbae1999, tp5uiuc, superjomn, byshiue
