avinash2468 / medical-vqa
This project is a fork of anipal/nlp-cs7650-project.
A simple yet effective model for medical visual question answering that combines an image encoder (ConvNeXt) with a text encoder (BERT), matching state-of-the-art performance (72% accuracy) on the VQA-RAD dataset.
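The description suggests a common late-fusion design: pooled features from the image encoder and the text encoder are combined and fed to an answer classifier (VQA-RAD is often treated as answer classification). The sketch below illustrates that fusion step only, with numpy stand-ins for the encoder outputs; the dimensions, the answer-vocabulary size, and the concatenation-plus-linear-head fusion are all assumptions for illustration, not details taken from this repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the repo's actual dimensions are not stated here.
IMG_DIM = 768    # e.g. a pooled ConvNeXt feature vector (assumed)
TXT_DIM = 768    # e.g. a BERT [CLS] embedding (assumed)
N_ANSWERS = 100  # hypothetical answer-vocabulary size

def fuse_and_classify(img_feat, txt_feat, W, b):
    """Concatenate the two modality features and apply a linear answer head."""
    fused = np.concatenate([img_feat, txt_feat])  # shape (IMG_DIM + TXT_DIM,)
    logits = W @ fused + b                        # shape (N_ANSWERS,)
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

# Stand-ins for encoder outputs; in the real model these would come from
# ConvNeXt (image) and BERT (question text) respectively.
img_feat = rng.standard_normal(IMG_DIM)
txt_feat = rng.standard_normal(TXT_DIM)
W = rng.standard_normal((N_ANSWERS, IMG_DIM + TXT_DIM)) * 0.01
b = np.zeros(N_ANSWERS)

probs = fuse_and_classify(img_feat, txt_feat, W, b)
print(probs.shape)        # (100,)
print(round(float(probs.sum()), 6))  # probabilities sum to 1.0
```

More elaborate fusion schemes (element-wise product, cross-attention) are also common in medical VQA; concatenation is shown here only because it is the simplest baseline consistent with "combining" two encoders.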