
3 Questions about DAVALab – project in the «DSI Infrastructures & Labs» series

DSI Infrastructures & Labs are shareable infrastructures or structural vessels for creating collaborative research environments related to digital transformation. Teodora Vuković briefly introduces DAVALab, one of the projects in this series.

What is the goal of DAVALab?
DAVALab (Digital Audio-Visual Annotation Lab) is a web-based infrastructure that enables automatic annotation of text, audio, and video data without programming skills. It provides the latest models for audiovisual and textual analysis in an accessible and scalable way, democratizing AI and empowering its users. Its basis is TIB-AV-A, cutting-edge software developed at TIB Hannover. At UZH, we are upgrading it with state-of-the-art AI annotation models.

How does your platform analyze text, speech, and video data, and who can use it?
DAVALab provides advanced multimodal analysis through an accessible web interface. It includes speech transcription, speaker recognition, text-to-speech alignment, syntactic annotation, and video analysis (face, pose, gaze, emotion). Researchers and students across disciplines can upload data, select annotations, view results in the app's timeline view, or export them for further analysis.

What potential do you see for research and teaching?
DAVALab lowers entry barriers by providing easy access to advanced AI tools that would otherwise require programming skills. It enables reproducible multimodal research workflows across disciplines, from linguistics and digital humanities to psychology, law, film studies, and AI. It also supports hands-on courses on AI methods and digital skills, using real-world speech and video data in a scalable infrastructure.

Learn more about DAVALab here.

All projects of the series «DSI Infrastructures & Labs» can be found here.


Teodora Vuković leads the Multimodal Technology Group at the University of Zurich (DSI / Department of Computational Linguistics / LiRI). She is the PI of DAVALab and develops AI-powered tools and infrastructures for automatic multimodal annotation and analysis of speech, text, and video data. Her work bridges NLP, computer vision, and research infrastructure.
