AI Vision Companion, packaged as a fork of F5-TTS for convenience. This repository features an AI vision companion/assistant that merges visual input capture with audio transcription and synthesis through various ...
F5-TTS: Diffusion Transformer with ConvNeXt V2, with faster training and inference. E2 TTS: Flat-UNet Transformer, the closest reproduction of the paper.