Topic: multimodal-deep-learning (Goto Github)
Something interesting about multimodal-deep-learning
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Organization: alibabaresearch
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
User: ankurderia
The most flexible way to serve AI/ML models in production - build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multi-modal apps, RAG as a service, and more!
Organization: bentoml
Home Page: https://bentoml.com
An intelligent multimodal-learning-based system for video, product, and ad analysis, supporting downstream applications such as product recommendation and video retrieval.
Organization: cap-ntu
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
User: david-yoon
Home Page: https://arxiv.org/abs/1810.04635
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
User: davidhuji
A comprehensive reading list for Emotion Recognition in Conversations
Organization: declare-lab
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Organization: declare-lab
Official implementation of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis," accepted at EMNLP 2021.
Organization: declare-lab
Part of the HAKE project (HAKE-3D). Code for our CVPR 2020 paper "Detailed 2D-3D Joint Representation for Human-Object Interaction".
User: dirtyharrylyl
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
User: drprojects
A collection of the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
User: dwctod
A collection of the latest ECCV results, including papers, code, and demo videos; recommendations welcome!
User: dwctod
Deep-learning-based content moderation from text, audio, video, and image input modalities.
User: fcakyon
Home Page: https://arxiv.org/abs/2212.04533
A Python package housing a collection of deep-learning multi-modal data fusion pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
User: florencejt
Home Page: https://fusilli.readthedocs.io/en/latest/
A curated list of awesome vision and language resources for earth observation.
Organization: geoaigroup
Home Page: https://geogroup.ai/
Implementation of the CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"
User: haamoon
A list of academic resources on Multimodal ML for Music
User: ilaria-manco
A collection of 3D vision-and-language papers and datasets (e.g., 3D visual grounding, 3D question answering, and 3D dense captioning).
User: jianghaojun
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
User: jianghaojun
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch.
User: jrzaurin
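As an illustration of the wide-and-deep idea behind that package, here is a minimal hedged sketch (not pytorch-widedeep's actual API; class and parameter names below are made up): a linear "wide" path over raw tabular features is summed with a "deep" nonlinear path over the concatenated tabular and text representations.

```python
import torch
import torch.nn as nn

class ToyWideAndDeep(nn.Module):
    """Illustrative wide-and-deep fusion of tabular features and a text embedding."""

    def __init__(self, n_tab: int, n_text: int, hidden: int = 16):
        super().__init__()
        # "Wide" path: a plain linear model on the raw tabular features.
        self.wide = nn.Linear(n_tab, 1)
        # "Deep" path: an MLP over the concatenated modalities.
        self.deep = nn.Sequential(
            nn.Linear(n_tab + n_text, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tab: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Final prediction is the sum of the two paths.
        return self.wide(tab) + self.deep(torch.cat([tab, text], dim=-1))

model = ToyWideAndDeep(n_tab=5, n_text=8)
out = model(torch.randn(4, 5), torch.randn(4, 8))  # batch of 4 → shape (4, 1)
```

In practice the text (or image) tensor would come from a pretrained encoder rather than random noise; the sketch only shows the fusion step.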
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
User: kimmeen
Home Page: https://arxiv.org/abs/2310.01728
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
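A minimal sketch of the 1-bit weight idea that BitNet builds on (illustrative only, not this repository's or the paper's exact recipe): weights are binarized to their sign, scaled by the mean absolute value, with a straight-through estimator so gradients still flow to the underlying real-valued weights.

```python
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    # Per-tensor scale: mean absolute value of the real-valued weights.
    alpha = w.abs().mean()
    # The forward pass sees weights restricted to {-alpha, +alpha}.
    w_bin = alpha * torch.sign(w)
    # Straight-through estimator: binary values in the forward pass,
    # identity gradient in the backward pass.
    return w + (w_bin - w).detach()

w = torch.randn(4, 4, requires_grad=True)
wb = binarize_weights(w)        # numerically equal to alpha * sign(w)
loss = wb.sum()
loss.backward()                 # gradients reach the real-valued weights
```

Keeping a full-precision "shadow" copy of the weights and quantizing only in the forward pass is the standard trick that makes training with such extreme quantization possible.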
Towards Generalist Biomedical AI
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
User: leaplabthu
Home Page: https://arxiv.org/abs/2203.08481
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
Organization: mahmoodlab
Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Organization: milvlg
Home Page: https://arxiv.org/abs/2303.01903
Evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Organization: mmmu-benchmark
Home Page: https://mmmu-benchmark.github.io/
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
Organization: om-ai-lab
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
User: omriav
Home Page: https://omriavrahami.com/blended-latent-diffusion-page/
Recent Advances in Vision and Language Pre-training (VLP)
User: phellonchen
Code accompanying our ECCV 2020 paper on 3D neural listeners.
User: referit3d
Home Page: https://referit3d.github.io/
A collection of resources on applications of multi-modal learning in medical imaging.
User: richard-peng-xia
Official codebase of our paper "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation" (CVPR 2024)
Organization: sail-sg
Home Page: https://zhongshsh.github.io/CLoT
LAVIS - A One-stop Library for Language-Vision Intelligence
Organization: salesforce
Code for our IEEE Access (2020) paper "Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion".
User: shamanez
Multimodal Sarcasm Detection Dataset
User: soujanyaporia
Home Page: https://www.aclweb.org/anthology/P19-1455/
Reference mapping for single-cell genomics
Organization: theislab
Home Page: https://docs.scarches.org/en/latest/
awesome grounding: A curated list of research papers in visual grounding
User: theshadow29
A tool for extracting multimodal features from videos.
Organization: thuiar
Code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates an English caption describing it.
User: vijayvee
End-to-end Training for Multimodal Recommendation Systems
Organization: westlake-repl
Paper List of Pre-trained Foundation Recommender Models
Organization: westlake-repl
Code and pretrained models for the ICLR 2023 paper "Contrastive Audio-Visual Masked Autoencoder".
User: yuangongnd
Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
User: yuewang-cuhk
A survey on multimodal learning research.
User: yutong-zhou-cv
(ෆ`꒳´ෆ) A survey on Text-to-Image Generation/Synthesis.
User: yutong-zhou-cv