Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification