Moshi is a speech-text foundation model and full-duplex spoken dialogue framework.
Open Source framework for voice and multimodal conversational AI
Robust Speech Recognition via Large-Scale Weak Supervision