OmniDenseCap: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Publication
In The International Conference on Machine Learning