Abstract: Video-based Human Activity Recognition (VHAR) is a core task in computer vision with a wide range of applications in healthcare, surveillance, and human–robot interaction. Traditional VHAR ...
Abstract: Multimodal relation extraction (MRE) aims to predict the semantic relation between two entities given a hybrid context consisting of a text and its related image. Though existing MRE methods have ...