
Awesome-Reasoning-Foundation-Models

A curated list of awesome large AI models, or foundation models, for reasoning. The accompanying survey is available as survey.pdf.

We organize current foundation models into three categories: language foundation models, vision foundation models, and multimodal foundation models. We then survey how these models are applied to reasoning tasks, including commonsense, mathematical, logical, causal, visual, audio, multimodal, and agent reasoning. Finally, we summarize the main reasoning techniques: pre-training, fine-tuning, alignment training, mixture of experts, in-context learning, and autonomous agents.

We welcome contributions that add more resources to this repository. Please submit a pull request if you want to contribute; see CONTRIBUTING for details.

0 Survey

This repository is primarily based on the following paper:

A Survey of Reasoning with Foundation Models

Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, and Zhenguo Li

If you find this repository helpful, please consider citing:

@article{sun2023survey,
  title={A Survey of Reasoning with Foundation Models},
  author={Sun, Jiankai and Zheng, Chuanyang and Xie, Enze and Liu, Zhengying and Chu, Ruihang and Qiu, Jianing and Xu, Jiaqi and Ding, Mingyu and Li, Hongyang and Geng, Mengzhe and others},
  journal={arXiv preprint arXiv:2312.11562},
  year={2023}
}

1 Relevant Surveys and Links

(Back-to-Top)

  • Combating Misinformation in the Age of LLMs: Opportunities and Challenges - [arXiv] [Link]

  • The Rise and Potential of Large Language Model Based Agents: A Survey - [arXiv] [Link]

  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants - [arXiv] [Tutorial]

  • A Survey on Multimodal Large Language Models - [arXiv] [Link]

  • Interactive Natural Language Processing - [arXiv] [Link]

  • A Survey of Large Language Models - [arXiv] [Link]

  • Self-Supervised Multimodal Learning: A Survey - [arXiv] [Link]

  • Large AI Models in Health Informatics: Applications, Challenges, and the Future - [arXiv] [Paper] [Link]

  • Towards Reasoning in Large Language Models: A Survey - [arXiv] [Paper] [Link]

  • Reasoning with Language Model Prompting: A Survey - [arXiv] [Paper] [Link]

  • Awesome Multimodal Reasoning - [Link]

2 Foundation Models

(Back-to-Top)

2.1 Language Foundation Models

Foundation Models (Back-to-Top)


2.2 Vision Foundation Models

Foundation Models (Back-to-Top)


2.3 Multimodal Foundation Models

Foundation Models (Back-to-Top)


2.4 Reasoning Applications

Foundation Models (Back-to-Top)


3 Reasoning Tasks

(Back-to-Top)

3.1 Commonsense Reasoning

Reasoning Tasks (Back-to-Top)


3.1.1 Commonsense Question and Answering (QA)

3.1.2 Physical Commonsense Reasoning

3.1.3 Spatial Commonsense Reasoning

3.1.x Benchmarks, Datasets, and Metrics


3.2 Mathematical Reasoning

Reasoning Tasks (Back-to-Top)


3.2.1 Arithmetic Reasoning

Mathematical Reasoning (Back-to-Top)

3.2.2 Geometry Reasoning

Mathematical Reasoning (Back-to-Top)

3.2.3 Theorem Proving

Mathematical Reasoning (Back-to-Top)

3.2.4 Scientific Reasoning

Mathematical Reasoning (Back-to-Top)

3.2.x Benchmarks, Datasets, and Metrics

Mathematical Reasoning (Back-to-Top)


3.3 Logical Reasoning

Reasoning Tasks (Back-to-Top)


3.3.1 Propositional Logic

  • 2022/09 | Propositional Reasoning via Neural Transformer Language Models - [Paper]

3.3.2 Predicate Logic

3.3.x Benchmarks, Datasets, and Metrics


3.4 Causal Reasoning

Reasoning Tasks (Back-to-Top)


3.4.1 Counterfactual Reasoning

3.4.x Benchmarks, Datasets, and Metrics


3.5 Visual Reasoning

Reasoning Tasks (Back-to-Top)


3.5.1 3D Reasoning

3.5.x Benchmarks, Datasets, and Metrics


3.6 Audio Reasoning

Reasoning Tasks (Back-to-Top)


3.6.1 Speech

3.6.x Benchmarks, Datasets, and Metrics


3.7 Multimodal Reasoning

Reasoning Tasks (Back-to-Top)


3.7.1 Alignment

3.7.2 Generation

3.7.3 Multimodal Understanding

3.7.x Benchmarks, Datasets, and Metrics


3.8 Agent Reasoning

Reasoning Tasks (Back-to-Top)


3.8.1 Introspective Reasoning

3.8.2 Extrospective Reasoning

3.8.3 Multi-agent Reasoning

3.8.4 Driving Reasoning

3.8.x Benchmarks, Datasets, and Metrics


3.9 Other Tasks and Applications

Reasoning Tasks (Back-to-Top)

3.9.1 Theory of Mind (ToM)

3.9.2 LLMs for Weather Prediction

  • 2022/09 | MetNet-2 | Deep learning for twelve hour precipitation forecasts - [Paper]

  • 2023/07 | Pangu-Weather | Accurate medium-range global weather forecasting with 3D neural networks - [Paper]

3.9.3 Abstract Reasoning

3.9.4 Defeasible Reasoning

3.9.5 Medical Reasoning

  • 2024/01 | CheXagent / CheXinstruct / CheXbench | Chen et al.
    CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
    [arXiv] [paper] [code] [project] [huggingface]

  • 2024/01 | EchoGPT | Chao et al.
    EchoGPT: A Large Language Model for Echocardiography Report Summarization
    [medRxiv] [paper]

  • 2023/10 | GPT4V-Medical-Report | Yan et al.
    Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
    [arXiv] [paper] [code]

  • 2023/10 | VisionFM | Qiu et al.
    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
    [arXiv] [paper]

  • 2023/09 | Yang et al.
    The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
    [arXiv] [paper]

  • 2023/09 | RETFound | Zhou et al., Nature
    A foundation model for generalizable disease detection from retinal images
    [paper] [code]

  • 2023/08 | ELIXR | Xu et al.
    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
    [arXiv] [paper]

  • 2023/07 | Med-Flamingo | Moor et al.
    Med-Flamingo: a Multimodal Medical Few-shot Learner
    [arXiv] [paper] [code]

  • 2023/07 | Med-PaLM M | Tu et al.
    Towards Generalist Biomedical AI
    [arXiv] [paper] [code]

  • 2023/06 | Endo-FM | Wang et al., MICCAI 2023
    Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
    [arXiv] [paper] [code]

  • 2023/06 | XrayGPT | Thawkar et al.
    XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models
    [arXiv] [paper] [code]

  • 2023/06 | LLaVA-Med | Li et al., NeurIPS 2023
    LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
    [arXiv] [paper] [code]

  • 2023/05 | HuatuoGPT | Zhang et al., Findings of EMNLP 2023
    HuatuoGPT, Towards Taming Language Model to Be a Doctor
    [arXiv] [paper] [code]

  • 2023/05 | Med-PaLM 2 | Singhal et al.
    Towards Expert-Level Medical Question Answering with Large Language Models
    [arXiv] [paper]

  • 2022/12 | Med-PaLM / MultiMedQA / HealthSearchQA | Singhal et al., Nature
    Large Language Models Encode Clinical Knowledge
    [arXiv] [paper]

3.9.6 Bioinformatics Reasoning

3.9.7 Long-Chain Reasoning


4 Reasoning Techniques

(Back-to-Top)

4.1 Pre-Training

Reasoning Techniques (Back-to-Top)

4.1.1 Data

a. Data - Text
b. Data - Image
c. Data - Multimodality

4.1.2 Network Architecture

a. Encoder-Decoder
b. Decoder-Only
c. CLIP Variants
d. Others

4.2 Fine-Tuning

Reasoning Techniques (Back-to-Top)

4.2.1 Data

4.2.2 Parameter-Efficient Fine-tuning

a. Adapter Tuning
b. Low-Rank Adaptation
c. Prompt Tuning
d. Partial Parameter Tuning
e. Mixture-of-Modality Adaption
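
To make the low-rank adaptation entry above concrete, here is a minimal numpy sketch (an illustration of the technique only, not any particular library's API; `lora_forward` is a name chosen here): the frozen weight `W` is augmented with a trainable rank-`r` product `A @ B`, so only 2·d·r parameters are updated instead of d².

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=2.0):
    """Forward pass: frozen weight W plus a trainable low-rank update A @ B."""
    r = A.shape[1]                      # rank of the adaptation (r << d)
    return x @ W + (alpha / r) * (x @ A) @ B

rng = np.random.default_rng(0)
d, r = 16, 2
W = rng.normal(size=(d, d))             # frozen pre-trained weight
A = rng.normal(size=(d, r)) * 0.01      # trainable down-projection
B = np.zeros((r, d))                    # trainable up-projection (zero-init)
x = rng.normal(size=(1, d))

# With B zero-initialised, the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W)
```

The zero-initialised up-projection is the usual trick that lets fine-tuning start from the pre-trained model's behaviour.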

4.3 Alignment Training

Reasoning Techniques (Back-to-Top)

4.3.1 Data

a. Data - Human
b. Data - Synthesis

4.3.2 Training Pipeline

a. Online Human Preference Training
b. Offline Human Preference Training
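
Both training pipelines above optimize against pairwise human preference data; a common building block is a Bradley-Terry-style pairwise objective, as used to train reward models. A minimal numpy sketch (illustrative only; `preference_loss` is a name chosen here, not from the survey):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: mean of -log sigmoid(r_chosen - r_rejected)."""
    margin = np.asarray(r_chosen) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-margin))))

# The loss shrinks as the model scores preferred responses above rejected ones.
assert preference_loss([2.0], [0.0]) < preference_loss([0.5], [0.0])
# An undecided pair (equal scores) costs log(2).
assert np.isclose(preference_loss([0.0], [0.0]), np.log(2.0))
```

Online methods compute these scores with a live policy during training, while offline methods fit them on a fixed preference dataset.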

4.4 Mixture of Experts (MoE)

Reasoning Techniques (Back-to-Top)
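
As a rough sketch of the routing idea behind MoE layers (a toy illustration under simplifying assumptions: experts are plain linear maps here, whereas real MoE layers use feed-forward experts and gates trained with load-balancing losses):

```python
import numpy as np

def moe_forward(x, expert_mats, gate_W, k=2):
    """Top-k gating: route x to the k highest-scoring experts, mix by softmax weight."""
    logits = x @ gate_W                          # one routing logit per expert
    topk = np.argsort(logits)[-k:]               # indices of the k best experts
    w = np.exp(logits[topk] - logits[topk].max())
    w /= w.sum()                                 # softmax over selected experts only
    return sum(wi * (x @ expert_mats[i]) for wi, i in zip(w, topk))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_W = rng.normal(size=(d, n_experts))
x = rng.normal(size=(d,))

y = moe_forward(x, expert_mats, gate_W, k=2)
assert y.shape == (d,)
```

Because only k of the n experts run per input, parameter count grows with n while per-token compute stays roughly constant.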


4.5 In-Context Learning

Reasoning Techniques (Back-to-Top)


4.5.1 Demonstration Example Selection

a. Prior-Knowledge Approach
b. Retrieval Approach
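
The retrieval approach above is commonly implemented as nearest-neighbor search over embeddings of candidate demonstrations. A minimal sketch (the function name and toy vectors are chosen here; it assumes embeddings have already been computed):

```python
import numpy as np

def select_demonstrations(query_vec, pool_vecs, k=3):
    """Pick the k pool examples whose embeddings are most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    p = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity to each candidate
    return np.argsort(-sims)[:k]      # indices, most similar first

pool = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
query = np.array([1.0, 0.05])
idx = select_demonstrations(query, pool, k=2)
# The two candidates closest to the query direction are selected.
assert set(idx.tolist()) == {0, 2}
```

Prior-knowledge approaches instead rank candidates by hand-designed criteria (e.g., complexity or diversity) rather than similarity to the query.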

4.5.2 Chain-of-Thought

a. Zero-Shot CoT
b. Few-Shot CoT
c. Multiple Paths Aggregation
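
The three variants above can be sketched as plain prompt construction plus a majority vote over sampled reasoning paths (a toy illustration; function names are chosen here, and the zero-shot trigger phrase is the standard "Let's think step by step"):

```python
from collections import Counter

def zero_shot_cot(question):
    """Zero-shot CoT: append a reasoning trigger to elicit step-by-step answers."""
    return f"Q: {question}\nA: Let's think step by step."

def few_shot_cot(question, demos):
    """Few-shot CoT: prepend worked examples whose answers show the reasoning chain."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    return f"{shots}\n\nQ: {question}\nA:"

def aggregate_paths(sampled_answers):
    """Multiple-path aggregation (self-consistency): majority vote over samples."""
    return Counter(sampled_answers).most_common(1)[0][0]

prompt = zero_shot_cot("If I have 3 apples and buy 2 more, how many do I have?")
assert prompt.endswith("Let's think step by step.")
assert aggregate_paths(["5", "5", "4"]) == "5"
```

In practice the prompts are sent to a language model sampled at nonzero temperature, and only the final answers extracted from each sampled path are voted on.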

4.5.3 Multi-Round Prompting

a. Learned Refiners
b. Prompted Refiners

4.6 Autonomous Agent

Reasoning Techniques (Back-to-Top)


awesome-reasoning-foundation-models's People

Contributors

chuanyang-zheng, jiaqixuac, reasoning-survey, sparkjiao, thanhtoantnt, xf-zhao, zhimin-z


awesome-reasoning-foundation-models's Issues

Add two relevant papers

Hi, thanks for your excellent survey.

Please consider adding two relevant papers to your repository and paper:

[1] Title: Making Large Language Models Better Reasoners with Alignment
Link: https://arxiv.org/pdf/2309.02144.pdf

This paper proposes a constrained preference alignment method to improve the reasoning ability of LLMs.

[2] Title: Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning
Link: https://arxiv.org/pdf/2312.08935.pdf

This paper proposes a framework to automatically construct training data for process reward models.

Thank you for your consideration. 😊

Two related references on LLM-generated misinformation

Congratulations on your recent solid survey paper! I am impressed by the depth and comprehensiveness of the survey paper.

I would greatly appreciate it if you could consider citing our work [1][2] where the survey notes that "LLMs can contribute to the dissemination of misinformation, both intentionally and unintentionally" in Section 6.2 (Interpretability and Transparency), in the "Hallucinations" part of Section 5 (Discussion: Challenges, Limitations, and Risks), or where Section 6.1 (Safety and Privacy) states that "various intended attacks have been identified, including ... disinformation".

You could also check out our project website: https://llm-misinformation.github.io/ Thanks a lot!

[1] Combating Misinformation in the Age of LLMs: Opportunities and Challenges https://arxiv.org/abs/2311.05656

  • TL;DR: A survey of the opportunities (can we utilize LLMs to combat misinformation?) and challenges (how do we combat LLM-generated misinformation?) of combating misinformation in the age of LLMs.
  • abstract: Misinformation such as fake news and rumors is a serious threat on information ecosystems and public trust. The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of combating misinformation. Generally, LLMs can be a double-edged sword in the fight. On the one hand, LLMs bring promising opportunities for combating misinformation due to their profound world knowledge and strong reasoning abilities. Thus, one emergent question is: how to utilize LLMs to combat misinformation? On the other hand, the critical challenge is that LLMs can be easily leveraged to generate deceptive misinformation at scale. Then, another important question is: how to combat LLM-generated misinformation? In this paper, we first systematically review the history of combating misinformation before the advent of LLMs. Then we illustrate the current efforts and present an outlook for these two fundamental questions respectively. The goal of this survey paper is to facilitate the progress of utilizing LLMs for fighting misinformation and call for interdisciplinary efforts from different stakeholders for combating LLM-generated misinformation.

[2] Can LLM-Generated Misinformation Be Detected? https://arxiv.org/abs/2309.13788

  • TL;DR: We discover that LLM-generated misinformation can be harder to detect for humans and detectors compared to human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm.
  • abstract: The advent of Large Language Models (LLMs) has made a transformative impact. However, the potential that LLMs such as ChatGPT can be exploited to generate misinformation has posed a serious concern to online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation. Then we categorize and validate the potential real-world methods for generating misinformation with LLMs. Then, through extensive empirical investigation, we discover that LLM-generated misinformation can be harder to detect for humans and detectors compared to human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm. We also discuss the implications of our discovery on combating misinformation in the age of LLMs and the countermeasures.

From Survey Authors: Welcome Community Contribution

Dear Community,
We hope this finds you all well.

Since uploading the survey, we have received valuable comments and suggestions. We sincerely thank you all for your effort and attention to our work. Below is our plan for integrating these suggestions into the survey. Details on how to contribute are at Contributing.

  • Regular paper update time: the first day of every month (e.g., 1 Dec, 1 Jan, and so on).
  • If there are any papers we missed, please do not hesitate to tell us:
    • The title of the paper.
    • (Optional) The paper link.
    • (Optional) The paper abstract.
    • (Optional) Which section or part is suitable for the paper.
  • If you have any other suggestions or comments, please also let us know.

Again, thank you very much for your attention to our work and for your valuable suggestions. Have a good day.
