

M-VADER
A Model for Diffusion with Multimodal Context
Microsoft Kosmos-1
A Multimodal Large Language Model
Jina AI
Build multimodal AI services via cloud native technologies
WIT by Google AI
WIT (Wikipedia-based Image Text) is a large multimodal, multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages (GitHub: google-research-datasets/wit).