openhackathons-org / end-to-end-llm
This repository is AI Bootcamp material that consists of a workflow for LLMs.
License: Apache License 2.0
The current TRT-LLM materials discuss the hands-on aspects of getting from a model to deployment on a Triton server.
Given that TRT-LLM focuses on performance, a section discussing the performance aspects of TRT-LLM and the various optimisations available to the end user would be a useful addition.
cmd += ' -n 1 {} --model-repository={} --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix{} : '.format(tritonserver, model_repo, i)
to
cmd += ' -n 1 {} --model-repository={} --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix{} : '.format(tritonserver, model_repo, str(i) + os.environ['USER'])
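The rationale for the suggested change is that the Python-backend shared-memory region prefix must be unique per user on a shared system; appending the username avoids collisions when several users launch servers on the same node. A minimal sketch of the fixed command construction (the paths and the loop bound are illustrative assumptions, not values from the repo):

```python
import os

# Illustrative values -- the real script derives these from its environment.
tritonserver = "/opt/tritonserver/bin/tritonserver"
model_repo = "/workspace/model_repo"

cmd = ""
for i in range(2):  # e.g. one server instance per GPU
    # Appending the username to the shm-region prefix makes the
    # Python-backend shared-memory regions unique per user.
    cmd += (' -n 1 {} --model-repository={} --disable-auto-complete-config'
            ' --backend-config=python,shm-region-prefix-name=prefix{} : '
            .format(tritonserver, model_repo,
                    str(i) + os.environ.get('USER', 'user')))
```

Without the username suffix, two users running the lab on the same host would both create regions named `prefix0`, `prefix1`, … and the second launch would fail.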
An important aspect of deployment is that the model needs to be served to a wide range of users. Understanding throughput and latency, and comparing optimised deployments against the vanilla deployment, would give a better picture of the deployment requirements.
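To make such a comparison concrete, the raw numbers from a load test can be reduced to throughput and latency percentiles. A self-contained sketch (the latency values are made-up examples, not measurements from this repo):

```python
# Summarise per-request latencies collected from a load test window.
def summarize(latencies_s, window_s):
    """Return throughput (req/s) and p50/p95 latency from a list of latencies."""
    xs = sorted(latencies_s)

    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        idx = min(len(xs) - 1, max(0, int(round(p / 100 * (len(xs) - 1)))))
        return xs[idx]

    return {
        "throughput_rps": len(xs) / window_s,
        "p50_s": pct(50),
        "p95_s": pct(95),
    }

# Ten hypothetical request latencies observed in a 2-second window.
stats = summarize([0.12, 0.15, 0.11, 0.30, 0.14, 0.13, 0.50, 0.12, 0.16, 0.13],
                  window_s=2.0)
```

Reporting p50 alongside p95 matters because batching optimisations often improve throughput while lengthening the latency tail, which is exactly the vanilla-vs-optimised trade-off the comparison should expose.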
TRT-LLM does a great job of optimising the supported set of models, but a notebook or section discussing the workflow and steps to integrate a custom model would be very helpful for custom integrations.
The deployment guide states the following:
When you are inside the container, launch JupyterLab: jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace.
Open the browser at http://localhost:8888 and click on the Start_here.ipynb notebook.
But when building the container there is no actual Start_here.ipynb (unless you go to archived/workspace, which suggests to me that it is either deprecated, or it is not well defined where I should look for the notebook).
In Nemo_primer.ipynb, when running import nemo.collections.asr as nemo_asr, import nemo.collections.nlp as nemo_nlp, and import nemo.collections.tts as nemo_tts, I get the following error:
ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.2.
If I try to solve it with pip install tokenizers==0.13.1, I get this other error:
File /usr/local/lib/python3.10/dist-packages/pytorch_lightning/_graveyard/utilities.py:25
     17 def _get_gpu_memory_map() -> None:
     18     # TODO: Remove in v2.0.0
     19     raise RuntimeError(
     20         "pytorch_lightning.utilities.memory.get_gpu_memory_map was deprecated in v1.5 and is no longer supported"
     21         " as of v1.9. Use pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats instead."
     22     )
---> 25 pl.utilities.memory.get_gpu_memory_map = _get_gpu_memory_map

AttributeError: partially initialized module 'pytorch_lightning' has no attribute 'utilities' (most likely due to a circular import)
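The root cause of the first error is just a failed version-specifier check: the preinstalled tokenizers 0.15.2 falls outside the required range. A self-contained sketch of the check (real tools use the `packaging` library; this simplified version handles plain X.Y.Z versions only):

```python
# Check a version against the constraint from the error message:
#   tokenizers>=0.11.1,!=0.11.3,<0.14
def parse(v):
    """Turn 'X.Y.Z' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def satisfies(installed):
    return (parse("0.11.1") <= parse(installed) < parse("0.14")
            and installed != "0.11.3")

satisfies("0.15.2")  # False -- the preinstalled version, hence the ImportError
satisfies("0.13.1")  # True  -- but downgrading then trips the lightning error
```

This is why downgrading tokenizers alone does not help: it fixes this check but surfaces the pytorch_lightning circular-import failure above, so the whole dependency set needs to be pinned consistently.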
It might be helpful to pin the desired package versions in the pip installs inside Dockerfile_nemo, because
RUN pip install lightning
RUN pip install megatron.core
RUN pip install --upgrade nemoguardrails
RUN pip install openai
RUN pip install ujson
RUN pip install --upgrade --no-cache-dir gdown
may install new, incompatible versions of the libraries (incompatible, that is, with the tutorials shown in the notebooks).
This feature request is about creating content that demonstrates how to connect NeMo Guardrails to a Llama-2-7b-chat TensorRT engine deployed on Triton Inference Server. This approach avoids the need for an OpenAI key and bypasses the NeMo-LLM Service when using NeMo Guardrails to guard user prompts to/from the deployed model. The LangChain framework can be used to achieve this.
This feature is required to complete the end-to-end LLM pipeline. The process should include:
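One piece of such content would be the request a custom LangChain LLM wrapper sends to the Triton endpoint. A minimal sketch of building that request with only the standard library (the endpoint path and the field names `text_input`/`max_tokens` follow common TRT-LLM backend conventions but are assumptions here, and should be checked against the actual deployment):

```python
import json

def build_generate_request(prompt, model="ensemble", max_tokens=128):
    """Build the URL path and JSON body for a Triton generate call.

    A custom LangChain LLM wrapper could POST this body to the Triton
    server instead of calling OpenAI or the NeMo-LLM Service.
    """
    url = "/v2/models/{}/generate".format(model)
    body = json.dumps({"text_input": prompt, "max_tokens": max_tokens})
    return url, body

url, body = build_generate_request("Hello")
```

Guardrails would then wrap this call on both sides: checking the user prompt before it is sent, and the `text_output` of the response before it reaches the user.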
In workspace/jupyter_notebook/nemo/Multitask_Prompt_and_PTuning.ipynb, the code references megatron_gpt_prompt_learning_config.yaml, but I couldn't locate this file.
Is there a source where I can find the megatron_gpt_prompt_learning_config.yaml file?
Many unnecessary files and folders are included in the NeMo Guardrails lab, making navigation difficult. The lab should not contain the entire cloned repository, only a folder with the needed files, folders, and notebooks. The Deployment_Guide.md file should explicitly state the services and requirements (OpenAI and NeMo LLM Service) needed to run the lab.
NeMo container issues:
Start_Here.ipynb link conflicts for different containers.
No TRT-LLM or Triton version is pinned, so there are version conflicts.
Solved by #23.
The README file requires an update to match the copyedited version.