Topic: ai-safety Goto Github

Some thing interesting about ai-safety

👇 Here are 81 public repositories matching this topic...

agencyenterprise / promptinject

ai-safety,PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

Organization: agencyenterprise

ai-safety language-models ml-safety agi ai-alignment agi-alignment adversarial-attacks gpt-3 large-language-models machine-learning

ai-fail-safe / safe-reward

ai-safety,a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation

Organization: ai-fail-safe

ai-safety anomaly-detection ai-alignment fail-safe failsafe

ai4ce / flat

ai-safety,[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory

Organization: ai4ce

Home Page: https://ai4ce.github.io/FLAT/

deep-learning point-cloud lidar adversarial-attacks 3d-object-detection ai-safety trustworthy-ai trustworthy-machine-learning 3d-perception robotics autonomous-driving gnss

cdeiuk / bias-mitigation

ai-safety,Machine Learning Bias Mitigation

Organization: cdeiuk

Home Page: https://cdeiuk.github.io/bias-mitigation/

machine-learning ai-safety bias-correction fairness-ml fairness-ai machine-learning-models fairness-testing

cure-lab / contranet

ai-safety,This is the official implementation of ContraNet (NDSS2022).

Organization: cure-lab

adversarial-attacks ai-safety defense

dit7ya / awesome-ai-alignment

ai-safety,A curated list of awesome resources for getting-started-with and staying-in-touch-with Artificial Intelligence Alignment research.

User: dit7ya

awesome awesome-list ai-safety ai-alignment

dlmacedo / distinction-maximization-loss

ai-safety,A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inferences (i.e., do not increase inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.

User: dlmacedo

classification deep-learning machine-learning open-set-recognition out-of-distribution-detection pytorch robust-machine-learning trustworthy-ai trustworthy-machine-learning uncertainty-estimation

dlmacedo / entropic-out-of-distribution-detection

ai-safety,A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference time) and detection without classification accuracy drop, hyperparameter tuning, or collecting additional data.

User: dlmacedo

pytorch deep-learning out-of-distribution-detection out-of-distribution machine-learning trustworthy-ai ai-safety anomaly-detection novelty-detection robust-machine-learning

dynaroars / neuralsat

ai-safety,DPLL(T)-based Verification tool for DNNs

Organization: dynaroars

abstraction adversarial-attacks ai-assurance ai-safety dnn-verification dpll robustness robustness-verification sat-solver software-verification

ebagdasa / mithridates

ai-safety,Measure and Boost Backdoor Robustness

User: ebagdasa

ai-safety backdoor-attacks backdoor-resistance data-poisoning hyperparameter-tuning ml-defenses ml-security robustness

endlessloop2 / uc-ai-thinkathon-2023

ai-safety,Winning entry for the UC Chile AI Safety Thinkathon 2023. Coauthor @mon-b

User: endlessloop2

Home Page: https://endlessloop0.itch.io/investigating-the-relationship-between-priming-with-multiple-traits-and-language

ai ai-safety aisafety gpt-3 alignment

ezgikorkmaz / adversarial-reinforcement-learning

ai-safety,Reading list for adversarial perspective and robustness in deep reinforcement learning.

User: ezgikorkmaz

adversarial-attacks robust-machine-learning deep-reinforcement-learning adversarial-reinforcement-learning robust-reinforcement-learning ai-safety machine-learning-safety reinforcement-learning-safety safe-reinforcement-learning adversarial-policies

giskard-ai / awesome-ai-safety

ai-safety,📚 A curated list of papers & technical articles on AI Quality & Safety

Organization: giskard-ai

Home Page: https://giskard.ai

ai ai-alignment ai-safety artificial-intelligence llm llmops machine-learning ml mlops natural-language-processing

giskard-ai / giskard

ai-safety,🐢 Open-Source Evaluation & Testing framework for LLMs and ML models

Organization: giskard-ai

Home Page: https://docs.giskard.ai

mlops ml-validation ml-testing ai-testing ai-safety ml-safety llmops ethical-artificial-intelligence responsible-ai fairness-ai

google-research-datasets / aart-ai-safety-dataset

ai-safety,AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications

Organization: google-research-datasets

Home Page: https://arxiv.org/abs/2311.08592

ai-safety ml-fairness responsible-ai responsible-ml

guardrail-ml / guardrail

ai-safety,Build LLM apps safely and securely🛡️

Organization: guardrail-ml

Home Page: http://docs.useguardrail.com

ai-safety ai-security firewall guardrail hallucinations logging monitoring-tool openai-api

hendrycks / ethics

ai-safety,Aligning AI With Shared Human Values (ICLR 2021)

User: hendrycks

ai-safety machine-ethics ethical-ai gpt-3 ml-safety

iqtlabs / daisybell

ai-safety,Scan your AI/ML models for problems before you put them into production.

Organization: iqtlabs

bias-correction bias-detection model-poison cybersecurity ai-alignment ai-assurance ai-safety

jacksonkarel / recursive-other-improvement

ai-safety,

User: jacksonkarel

ai artificial-intelligence large-language-models llms code-generation agents ai-agents automl autonomous-agents ai-safety

jakobovski / ai-safety-cheatsheet

ai-safety,A compilation of AI safety ideas, problems, and solutions.

User: jakobovski

agi artificial-intelligence ai-safety agi-safety alignment

jehumtine / lawlia

ai-safety,LAWLIA is an open-source computational legal framework designed to revolutionize legal reasoning and analysis. It combines the power of large language models with a structured syntactical grammar to facilitate precise legal assessments, truth values, and verdicts. LAWLIA is the future of computational jurisprudence

User: jehumtine

agents ai computational-law computational-linguistics large-language-models law legal-agent legal-framework legal-system legal-analysis

jphall663 / awesome-machine-learning-interpretability

ai-safety,A curated list of awesome responsible machine learning resources.

User: jphall663

fairness xai interpretability transparency machine-learning data-science python r awesome awesome-list

lancopku / avg-avg

ai-safety,[Findings of EMNLP 2022] Holistic Sentence Embeddings for Better Out-of-Distribution Detection

Organization: lancopku

ai-safety natural-language-processing ood-detection robust-machine-learning trustworthy-machine-learning

lancopku / dan

ai-safety,[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks

Organization: lancopku

ai-safety backdoor-attacks backdoor-defense natural-language-processing

lets-make-safe-ai / make-safe-ai

ai-safety,How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

User: lets-make-safe-ai

agi ai ai-safety artificial-general-intelligence artificial-intelligence ai-alignment

luanademi / toumei

ai-safety,An interpretability library for pytorch

User: luanademi

Home Page: https://luanademi.github.io/toumei/

interpretability python pytorch feature-visualization deep-learning transformer modularity ai-safety

mccaffary / agi-safety-governance-practices

ai-safety,Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion"

User: mccaffary

ai-governance ai-safety artificial-intelligence artificial-intelligence-governance artificial-intelligence-safety expert-survey

megvii-research / fssd_ood_detection

ai-safety,Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)

Organization: megvii-research

ood-detection anomaly-detection ai-safety out-of-distribution-detection anomaly

microsoft / safenlp

ai-safety,Safety Score for Pre-Trained Language Models

Organization: microsoft

ai-safety fairness-ai nlp

nkluge-correa / aira

ai-safety,Aira is a series of chatbots developed as an experimentation playground for value alignment.

User: nkluge-correa

Home Page: https://nkluge-correa.github.io/Aira/

ai chatbot alignment ai-safety language-model natural-language-processing

normster / llm_rules

ai-safety,RuLES: a benchmark for evaluating rule-following in language models

User: normster

Home Page: https://eecs.berkeley.edu/~normanmu/llm_rules

ai-security gpt-4 ai-safety

ongov / ai-principles

ai-safety,Alpha principles for the ethical use of AI and Data Driven Technologies in Ontario | Proposition de principes pour une utilisation éthique des technologies axées sur les données en Ontario

Organization: ongov

government open-government ai artifical-intelligence ethical-artificial-intelligence ai-safety ml data-driven-decisions machine-learning

pair-code / farsight

ai-safety,In situ interactive widgets for responsible AI 🌱

Organization: pair-code

Home Page: https://pair-code.github.io/farsight/

ai ai-safety chatgpt chrome-extension gemini gemini-pro gpt-4 jupyter-notebook llm notebook responsible-ai

phelps-sg / llm-cooperation

ai-safety,Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023

User: phelps-sg

economics experimental-economics gametheory gpt-3 llm prisoners-dilemma ai-safety ai-alignment behavioral-economics gpt-4

pku-alignment / beavertails

ai-safety,BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

Organization: pku-alignment

Home Page: https://sites.google.com/view/pku-beavertails

ai-safety human-feedback human-feedback-data language-model large-language-model llm llms rlhf safe-rlhf safety

pku-alignment / safe-rlhf

ai-safety,Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Organization: pku-alignment

Home Page: https://pku-beaver.github.io

ai-safety alpaca datasets deepspeed large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback

pku-yuangroup / hallucination-attack

ai-safety,Attack to induce LLMs within hallucinations

Organization: pku-yuangroup

Home Page: http://arxiv.org/abs/2310.01469

adversarial-attacks llm hallucinations machine-learning nlp llm-safety ai-safety deep-learning

riceissa / aiwatch

ai-safety,Website to track people, organizations, and products (tools, websites, etc.) in AI safety

User: riceissa

Home Page: https://aiwatch.issarice.com/

aisafety ai-safety php database dataset data-portal ai-alignment mysql

romaingrx / second-order-jailbreak

ai-safety,NeurIPS workshop : We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.

User: romaingrx

Home Page: https://second-order-jailbreak.romaingrx.com

ai-safety multi-agents