Model Demo Page

Abstract

We introduce Hello-Chat, an end-to-end Large Audio Language Model (LALM) tailored for real-world conversational scenarios. The model achieves state-of-the-art performance on specific understanding benchmarks and significantly outperforms existing open-source systems in prosodic naturalness, emotional accuracy, and interaction fluency. By explicitly modeling fine-grained acoustic perception and cross-modal alignment, Hello-Chat enables realistic, context-aware spoken interaction between users and AI.

Contents

Abstract
Model Architecture
Audio Samples (Zero-Shot)
Dialogue Demo (Multi-Turn, Zero-Shot)

System Overview

Figure 1: The overall architecture of Hello-Chat.

Audio Samples

Single-turn audio synthesis grouped by reference speaker.

[Audio samples: loaded from assets/single_turn.json]
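
The grouping by reference speaker described above can be illustrated with a minimal sketch. The schema of assets/single_turn.json is not shown on this page, so the record fields (`speaker`, `text`, `audio`) and the sample values below are hypothetical and chosen only for illustration; this is not the page's actual loader.

```python
from collections import defaultdict

# Hypothetical records: the real schema of assets/single_turn.json is not
# shown on this page, so these field names and values are assumptions.
samples = [
    {"speaker": "ref_A", "text": "Hello there.", "audio": "a1.wav"},
    {"speaker": "ref_B", "text": "Good morning.", "audio": "b1.wav"},
    {"speaker": "ref_A", "text": "How are you?", "audio": "a2.wav"},
]


def group_by_speaker(records):
    """Group sample records by reference speaker, keeping first-seen order."""
    groups = defaultdict(list)
    for record in records:
        groups[record["speaker"]].append(record)
    return dict(groups)


grouped = group_by_speaker(samples)
for speaker, items in grouped.items():
    # One line per reference speaker, listing that speaker's audio files.
    print(speaker, [item["audio"] for item in items])
```

With real data, the in-memory list would be replaced by the parsed contents of the JSON file (for example via `json.load`), assuming it holds a flat list of records.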

Multi-Turn Conversation

Context-aware, human-like conversation with unified text and speech generation (Tester ↔ Bot, zero-shot).

[Dialogue samples: loaded from assets/dialogue.json]