TP2: Vision, Language and Multimodal Challenges
How to create AI agents capable of perception in real, complex environments with multiple combined modalities (text, speech, images, video, …).