Pham, L., Lam, P., Nguyen, T., Tang, H., & Schindler, A. (2024). A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection). arXiv preprint arXiv:2407.03110. Retrieved from https://arxiv.org/abs/2407.03110.