NLPExplorer

ALVR - 2024

Total Papers:- 19

Total Papers across all years:- 35

Total Citations:- 0


How and where does CLIP process negation?

Vincent Quantmeyer | Pablo Mosteiro | Albert Gatt

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-Tailed Multi-Label Visual Recognition

Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

Malvina Nikandrou | Georgios Pantazopoulos | Ioannis Konstas | Alessandro Suglia

WISMIR3: A Multi-Modal Dataset to Challenge Text-Image Retrieval Approaches

Florian Schneider | Chris Biemann

Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR)

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

Gregor Geigle | Abhay Jain | Radu Timofte | Goran Glavaš

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool

Wiki-VEL: Visual Entity Linking for Structured Data on Wikimedia Commons

Improving Vision-Language Cross-Lingual Transfer with Scheduled Unfreezing

Max Reinhardt | Gregor Geigle | Radu Timofte | Goran Glavaš

Causal and Temporal Inference in Visual Question Generation by Utilizing Pre-trained Models

Zhanghao Hu | Frank Keller

VerbCLIP: Improving Verb Understanding in Vision-Language Models with Compositional Structures

Hadi Wazni | Kin Lo | Mehrnoosh Sadrzadeh

English-to-Japanese Multimodal Machine Translation Based on Image-Text Matching of Lecture Videos