Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

How It Works

Nightingale’s pipeline transforms any audio or video file into a karaoke experience through several stages.

Pipeline Overview

flowchart TD
    A["🎵 Audio or video file"] --> B["UVR Karaoke / Demucs"]
    B --> |"vocals.ogg + instrumental.ogg"| C["LRCLIB"]
    C --> |"Fetches synced lyrics if available"| D["WhisperX (large-v3)"]
    D --> |"Transcription + word-level alignment"| E["Bevy App (Rust)"]
    E --> F["🎤 Plays instrumental + synced lyrics\nwith pitch scoring & backgrounds"]

Caching

Analysis results are cached at ~/.nightingale/cache/ using blake3 file hashes. Re-analysis only happens if the source file changes or is manually triggered from the UI.

Hardware Acceleration

The Python analyzer uses PyTorch and auto-detects the best backend:

BackendDeviceNotes
CUDANVIDIA GPUFastest
MPSApple SiliconmacOS; WhisperX alignment falls back to CPU
CPUAnySlowest but always works

The UVR Karaoke model uses ONNX Runtime and enables CUDA acceleration automatically on NVIDIA GPUs, or CoreML on Apple Silicon.

A song typically takes 2–5 minutes on GPU, 10–20 minutes on CPU.