SPS SLTC/AASP Technical Committee Webinar

Conversational Speech Processing and Recognition: Speech Separation, End-to-End Modeling, and Speaker Diarization

23 July 2024, 1:00 PM - 2:30 PM (EDT)

Presented by Dr. Takuya Yoshioka


About this Topic:

Recognizing conversational speech means processing multi-talker, human-to-human communication. Natural conversations pose a range of challenges, and addressing them has driven progress in several areas, including speech separation, end-to-end modeling, speaker diarization, and the use of self-supervised models.

This webinar will introduce recent research advances in these domains as well as insights gained from applications of these methods to real-world commercial scenarios.

About the Presenter:

Takuya Yoshioka received the B.Eng., M.Inf., and Ph.D. degrees in informatics from Kyoto University, Kyoto, Japan, in 2004, 2006, and 2010, respectively. He has been the Director of Research at AssemblyAI Inc., US, since 2023, where he leads the company's model and algorithm development efforts, spanning ASR, speaker diarization, and NLP.

Prior to joining AssemblyAI, he led a research team at Microsoft Azure Cognitive Services Research, developing technologies for speech enhancement, speech generation, meeting transcription, and speaker diarization. Before this role, he conducted research in speech processing at Microsoft Research and NTT Communication Science Laboratories for more than 10 years.

Dr. Yoshioka received the Conference Best Paper Award for Industry from IEEE SPS in 2022 and led a winning team in the CHiME-3 Challenge in 2015.
