SilVar: Speech-Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization

Tan-Hanh Pham | Le Hoang Nam | Phu-Vinh Nguyen | Chris Ngo | Truong-Son Hy

Paper Details:

Month: November
Year: 2025
Location: Suzhou, China
Venue: EMNLP
