Topic: multimodal-deep-learning (Goto Github)
Something interesting about multimodal-deep-learning
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Organization: alibabaresearch
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
User: ankurderia
The most flexible way to serve AI/ML models in production - build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multi-modal apps, RAG as a service, and more!
Organization: bentoml
Home Page: https://bentoml.com
An intelligent multimodal-learning-based system for video, product, and ad analysis, supporting downstream applications such as product recommendation and video retrieval.
Organization: cap-ntu
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
User: david-yoon
Home Page: https://arxiv.org/abs/1810.04635
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
User: davidhuji
A comprehensive reading list for Emotion Recognition in Conversations
Organization: declare-lab
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Organization: declare-lab
Official implementation of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis," accepted at EMNLP 2021.
Organization: declare-lab
Part of the HAKE project (HAKE-3D). Code for our CVPR 2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
User: dirtyharrylyl
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
User: drprojects
A collection of the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
User: dwctod
A collection of the latest ECCV results, including papers, code, and demo videos; recommendations welcome!
User: dwctod
Deep-learning-based content moderation from text, audio, video, and image input modalities.
User: fcakyon
Home Page: https://arxiv.org/abs/2212.04533
A Python package housing a collection of deep-learning multi-modal data fusion pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
User: florencejt
Home Page: https://fusilli.readthedocs.io/en/latest/
A curated list of awesome vision and language resources for earth observation.
Organization: geoaigroup
Home Page: https://geogroup.ai/
Implementation of the CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
User: haamoon
A list of academic resources on Multimodal ML for Music
User: ilaria-manco
A collection of 3D vision-and-language papers and datasets (e.g., 3D visual grounding, 3D question answering, and 3D dense captioning).
User: jianghaojun
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
User: jianghaojun
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch.
User: jrzaurin
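As an illustration of the wide-and-deep idea behind that package, here is a minimal hedged sketch (not pytorch-widedeep's actual API; class and parameter names below are made up): a linear "wide" path over raw tabular features is summed with a "deep" nonlinear path over the concatenated tabular and text representations.

```python
import torch
import torch.nn as nn

class ToyWideAndDeep(nn.Module):
    """Illustrative wide-and-deep fusion of tabular features and a text embedding."""

    def __init__(self, n_tab: int, n_text: int, hidden: int = 16):
        super().__init__()
        # "Wide" path: a plain linear model on the raw tabular features.
        self.wide = nn.Linear(n_tab, 1)
        # "Deep" path: an MLP over the concatenated modalities.
        self.deep = nn.Sequential(
            nn.Linear(n_tab + n_text, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tab: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Final prediction is the sum of the two paths.
        return self.wide(tab) + self.deep(torch.cat([tab, text], dim=-1))

model = ToyWideAndDeep(n_tab=5, n_text=8)
out = model(torch.randn(4, 5), torch.randn(4, 8))  # batch of 4 → shape (4, 1)
```

In practice the text (or image) tensor would come from a pretrained encoder rather than random noise; the sketch only shows the fusion step.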
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
User: kimmeen
Home Page: https://arxiv.org/abs/2310.01728
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
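A minimal sketch of the 1-bit weight idea that BitNet builds on (illustrative only, not this repository's or the paper's exact recipe): weights are binarized to their sign, scaled by the mean absolute value, with a straight-through estimator so gradients still flow to the underlying real-valued weights.

```python
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    # Per-tensor scale: mean absolute value of the real-valued weights.
    alpha = w.abs().mean()
    # The forward pass sees weights restricted to {-alpha, +alpha}.
    w_bin = alpha * torch.sign(w)
    # Straight-through estimator: binary values in the forward pass,
    # identity gradient in the backward pass.
    return w + (w_bin - w).detach()

w = torch.randn(4, 4, requires_grad=True)
wb = binarize_weights(w)        # numerically equal to alpha * sign(w)
loss = wb.sum()
loss.backward()                 # gradients reach the real-valued weights
```

Keeping a full-precision "shadow" copy of the weights and quantizing only in the forward pass is the standard trick that makes training with such extreme quantization possible.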
Towards Generalist Biomedical AI
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
User: leaplabthu
Home Page: https://arxiv.org/abs/2203.08481
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Organization: mahmoodlab
Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Organization: milvlg
Home Page: https://arxiv.org/abs/2303.01903
Evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Organization: mmmu-benchmark
Home Page: https://mmmu-benchmark.github.io/
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
Organization: om-ai-lab
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
User: omriav
Home Page: https://omriavrahami.com/blended-latent-diffusion-page/
Recent Advances in Vision and Language Pre-training (VLP)
User: phellonchen
Code accompanying our ECCV 2020 paper on 3D neural listeners.
User: referit3d
Home Page: https://referit3d.github.io/
A collection of resources on applications of multi-modal learning in medical imaging.
User: richard-peng-xia
Official codebase of our paper "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation" (CVPR 2024)
Organization: sail-sg
Home Page: https://zhongshsh.github.io/CLoT
LAVIS - A One-stop Library for Language-Vision Intelligence
Organization: salesforce
Code for our IEEE Access (2020) paper "Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion".
User: shamanez
Multimodal Sarcasm Detection Dataset
User: soujanyaporia
Home Page: https://www.aclweb.org/anthology/P19-1455/
Reference mapping for single-cell genomics
Organization: theislab
Home Page: https://docs.scarches.org/en/latest/
awesome grounding: A curated list of research papers in visual grounding
User: theshadow29
A tool for extracting multimodal features from videos.
Organization: thuiar
Code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates an English caption describing it.
User: vijayvee
End-to-end Training for Multimodal Recommendation Systems
Organization: westlake-repl
Paper List of Pre-trained Foundation Recommender Models
Organization: westlake-repl
Code and pretrained models for the ICLR 2023 paper "Contrastive Audio-Visual Masked Autoencoder".
User: yuangongnd
Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
User: yuewang-cuhk
A survey on multimodal learning research.
User: yutong-zhou-cv
(ෆ`꒳´ෆ) A survey on Text-to-Image Generation/Synthesis.
User: yutong-zhou-cv