Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

How It Works

Nightingale’s pipeline transforms any audio or video file into a karaoke experience through several stages.

Pipeline Overview

flowchart TD
    A["🎵 Audio or video file"] --> B["UVR Karaoke / Demucs"]
    A2["🎼 USDX bundle (.txt / .usdx)"] --> E["Tauri App (Rust + React)"]
    B --> |"vocals + instrumental"| C["LRCLIB"]
    C --> |"synced lyrics if available"| D["WhisperX (large-v3) or Parakeet v3 (exp.)"]
    D --> |"word-level alignment, CJK readings"| E
    E --> F["🎤 Plays instrumental + synced lyrics\nwith pitch scoring, key/tempo controls,\nmic mirroring, and audio-reactive backgrounds"]

USDX bundles bypass stem separation and transcription entirely — the .txt is parsed into a transcript JSON shaped exactly like the analyzer cache, so playback reuses the existing pipeline. See UltraStar Deluxe.

Analyzer Server

The analyzer is a long-lived Python process that Nightingale spawns once on startup and talks to over a token-authenticated loopback TCP socket using newline-delimited JSON (NDJSON). Per-song startup costs (model load, CUDA init, Python imports) are paid once at boot, after which analyze requests stream progress events and complete with done or error messages. This makes back-to-back analyses noticeably faster than the previous per-song subprocess model.

Caching

Analysis results are cached in your configured data folder (cache/) using blake3 file hashes. Re-analysis only happens if the source file changes, if you trigger it manually, or when creating shifted key/tempo variants.

Hardware Acceleration

The Python analyzer uses PyTorch and auto-detects the best backend:

BackendDeviceNotes
CUDANVIDIA GPUFastest
MPSApple SiliconmacOS; WhisperX alignment falls back to CPU
CPUAnySlowest but always works

The UVR Karaoke model uses ONNX Runtime and enables CUDA acceleration automatically on NVIDIA GPUs, or CoreML on Apple Silicon.

A song typically takes 2–5 minutes on GPU, 10–20 minutes on CPU.