Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog (Ranked 1st in the DSTC8-AVSD Challenge)

Publication
Proceedings of the 8th Dialog System Technology Challenge Workshop at AAAI 2020