Moshi is a speech-text foundation model and full-duplex spoken dialogue framework.
Cutting-edge AI speech for 5¢ per minute.
Robust Speech Recognition via Large-Scale Weak Supervision