Topic: dpo (Goto Github)
Something interesting about dpo
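Most entries below concern Direct Preference Optimization (DPO), a method for aligning language models on preference pairs without training a separate reward model. As orientation, a minimal sketch of the DPO loss for a single preference pair, assuming the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model are already computed (function and argument names are illustrative, not from any repository listed here):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    beta scales the implicit reward (log-probability ratio).
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)), written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# With the policy equal to the reference, both ratios vanish
# and the loss is log(2); raising the chosen response's
# log-probability lowers the loss.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
improved = dpo_loss(-9.0, -12.0, -10.0, -12.0)
```

In practice, libraries such as TRL (used by several repositories below) compute this loss batched over token-level log-probabilities; the sketch only shows the per-pair objective.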
dpo,An open-source framework designed to adapt pre-trained Large Language Models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.
User: adithya-s-k
dpo,Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach
Organization: argilla-io
dpo,SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
User: armbues
dpo,Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon
User: armbues
dpo,A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Organization: contextualai
Home Page: https://arxiv.org/abs/2402.01306
dpo,Introducing Filtered Direct Preference Optimization (fDPO), which enhances language-model alignment with human preferences by discarding training samples of lower quality than those generated by the learning model
Organization: cyberagentailab
Home Page: https://arxiv.org/abs/2404.13846
dpo,EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets
User: daehankim
dpo,Project created during the React v2 immersion course
User: developermiranda
Home Page: https://dpoquiz.vercel.app/
dpo,Unofficial PHP wrapper for Direct Pay Online API
Organization: dipnot
dpo,This is the DPO Group plugin for Magento 2.
Organization: directpay-online
dpo,This is the DPO Group plugin for WHMCS.
Organization: directpay-online
dpo,This is the DPO Group plugin for Gravity Forms.
Organization: dpo-group
dpo,This is the DPO Pay plugin for WooCommerce.
Organization: dpo-group
dpo,Align a Large Language Model (LLM) with DPO loss
User: ducnh279
dpo,Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Organization: dvlab-research
dpo,This repository contains the source code used for finetuning the phi-2 LLM with several techniques, such as DPO.
User: eyess-glitch
dpo,Unofficial Go library for DPO Group
Organization: golang-malawi
Home Page: https://nndi.cloud/oss/go-dpo/
dpo,(MultiDi)Graph Morphism Example Generator
User: itnef
dpo,Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
User: junkangwu
dpo,Re-usable & scalable RLHF training pipeline with Dagster and Modal.
Organization: kyryl-opens-ml
Home Page: https://kyrylai.com/2024/06/10/rlhf-with-dagster-and-modal/
dpo,CodeUltraFeedback: aligning large language models to coding preferences
User: martin-wey
Home Page: https://arxiv.org/abs/2403.09032
dpo,Use PEFT or Full-parameter to finetune 300+ LLMs or 60+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Organization: modelscope
Home Page: https://swift.readthedocs.io/zh-cn/latest/LLM/index.html
dpo,Privacy Mapping Open Source Software
User: nycyberlawyer
Home Page: https://sites.google.com/mydpo.us/mydpous/home
dpo,This repository contains Jupyter Notebooks, scripts, and datasets used in our finetuning experiments. The project focuses on Direct Preference Optimization (DPO), a method that simplifies the traditional finetuning process by using the model itself as a feedback mechanism.
User: omarmnfy
dpo,Awesome tools and information for Data Protection Officers - GDPR professionals
User: pforret
dpo,Align Anything: Training All Modality Model with Feedback
Organization: pku-alignment
dpo,DPO Group Payment gateway PHP SDK
Organization: razor-informatics
Home Page: https://razorinformatics.co.ke
dpo,Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.
User: robinsmits
dpo,Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
User: rockeycoss
Home Page: https://arxiv.org/abs/2406.04314
dpo,Direct Preference Optimization of GPT-2 using the TRL library
User: sharathhebbar
dpo,MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
User: shibing624
dpo,Fine-tune large language models with the DPO algorithm; simple and easy to get started with.
User: sugarandgugu
dpo,[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
User: tianduowang
Home Page: https://arxiv.org/abs/2407.18248
dpo,A Deep Learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.
User: ukairia777
dpo,Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"
User: vicgalle
Home Page: https://arxiv.org/abs/2404.00495
dpo,A Laravel package to simplify using the DPO Payment API in your application. https://dpogroup.com
Organization: zepson-tech
Home Page: https://zepsonhost.com/