This project integrates the AudioLDM model into a web interface powered by ChatGPT and Gradio. Users enter a text prompt and receive generated audio (speech, music, or sound effects) in return. It is designed to streamline text-to-audio generation in a conversational, browser-based environment.
Key Features
ChatGPT-style conversational interface for text-to-audio generation
Generates both voice and musical/sound effect audio
Built on an AudioLDM backend
Supports launching via a local Gradio web server (see the sketch after this list)
Includes example scripts and a notebook for experimentation
Easy setup: install the environment, run the launch script, and open the interface in a browser
Stores generated outputs (e.g., WAV or MP3 files) in local folders
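As a rough illustration of how the Gradio launch and local output storage fit together, the sketch below wires an AudioLDM pipeline into a minimal text-to-audio web app. The pipeline class and checkpoint (the diffusers `AudioLDMPipeline` with `cvssp/audioldm-s-full-v2`) and the `outputs/` folder are illustrative assumptions, not necessarily the project's exact code.

```python
# Minimal sketch of a Gradio text-to-audio app around AudioLDM.
# Checkpoint name, output folder, and parameters are assumptions for illustration.
from pathlib import Path
import time

import gradio as gr
import scipy.io.wavfile
from diffusers import AudioLDMPipeline

OUTPUT_DIR = Path("outputs")        # local folder for generated audio files
OUTPUT_DIR.mkdir(exist_ok=True)

# Load an AudioLDM checkpoint once at startup (assumed checkpoint name).
pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")


def generate_audio(prompt: str):
    """Generate a short clip from a text prompt and save it as a WAV file."""
    result = pipe(prompt, num_inference_steps=25, audio_length_in_s=5.0)
    audio = result.audios[0]                     # mono waveform at 16 kHz
    path = OUTPUT_DIR / f"audio_{int(time.time())}.wav"
    scipy.io.wavfile.write(str(path), rate=16000, data=audio)
    return 16000, audio                          # (sample_rate, waveform) for gr.Audio


demo = gr.Interface(
    fn=generate_audio,
    inputs=gr.Textbox(label="Prompt", placeholder="e.g. rain falling on a tin roof"),
    outputs=gr.Audio(label="Generated audio"),
    title="Text-to-audio demo",
)

if __name__ == "__main__":
    demo.launch()  # serves a local web UI, typically at http://127.0.0.1:7860
```

Running the script starts a local server; opening the printed URL in a browser gives the prompt box and audio player, while each generation is also written to the local output folder.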