ViDA-MAN: Visual Dialog with Digital Human

Tong Shen 1
Jiawei Zuo 1
Fan Shi 1
Jin Zhang 2
Liqin Jiang 2
Meng Chen 1
Zhengchen Zhang 1
Wei Zhang 1
Xiaodong He 1
Tao Mei 1

1JD AI Research
2Migu Culture Technology

ACM-MM 2021, Best Demo





We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction that offers real-time audio-visual responses to instant speech inquiries. Compared to traditional text- or voice-based systems, ViDA-MAN offers human-like interactions (e.g., vivid voice, natural facial expressions, and body gestures). Given a speech request, the system responds with high-quality video at sub-second latency. To deliver an immersive user experience, ViDA-MAN seamlessly integrates multi-modal techniques including Automatic Speech Recognition (ASR), multi-turn dialog, Text To Speech (TTS), and talking-head video generation. Backed by a large knowledge base, ViDA-MAN can chat with users on a number of topics including chit-chat, weather, device control, news recommendation, and hotel booking, as well as answer questions via structured knowledge.
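The per-turn flow described above (speech in → ASR → multi-turn dialog → TTS → talking-head rendering → audio-visual response out) can be sketched as below. All function names and data shapes here are hypothetical stand-ins for illustration; the paper does not specify this interface.

```python
# Minimal sketch of a speech-in / video-out turn, assuming a staged
# pipeline: ASR -> multi-turn dialog -> TTS -> talking-head rendering.
# Every function below is a stub with a hypothetical signature.

def asr(audio: bytes) -> str:
    """Automatic Speech Recognition: speech audio -> text transcript (stub)."""
    return "what is the weather today"

def dialog(transcript: str, history: list) -> str:
    """Multi-turn dialog: produce a reply given the transcript and history (stub)."""
    history.append(transcript)
    return "Reply to: " + transcript

def tts(text: str) -> bytes:
    """Text To Speech: reply text -> synthesized speech audio (stub)."""
    return text.encode("utf-8")

def render_talking_head(audio: bytes) -> dict:
    """Talking-head generation: drive video frames from the reply audio (stub)."""
    return {"audio": audio, "num_frames": len(audio)}

def handle_turn(audio: bytes, history: list) -> dict:
    """One end-to-end turn: speech request in, audio-visual response out."""
    transcript = asr(audio)
    reply_text = dialog(transcript, history)
    reply_audio = tts(reply_text)
    return render_talking_head(reply_audio)

history: list = []
response = handle_turn(b"\x00\x01", history)
```

Keeping the stages behind narrow interfaces like this is what lets each component (ASR, dialog, TTS, renderer) be swapped or scaled independently while the turn-level latency budget is measured end to end.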


Paper

Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang
Meng Chen, Zhengchen Zhang, Wei Zhang, Xiaodong He, Tao Mei

ViDA-MAN: Visual Dialog with Digital Human

ACM-MM 2021, Best Demo

[Arxiv Paper]
[Bibtex Citation]
[Video Presentation]

Results