Nightingale
Karaoke from any song in your music library, powered by neural networks.
Nightingale scans your music folder, separates lead vocals from instrumentals using the UVR Karaoke model (or Demucs), transcribes lyrics with word-level timestamps via WhisperX, and plays it all back with synchronized highlighting, pitch scoring, profiles, and dynamic backgrounds.
Ships as a single binary. No manual installation of Python, ffmpeg, or ML models required — everything is downloaded and bootstrapped automatically on first launch.

Key Features
- Stem Separation — isolates lead vocals from instrumentals
- Word-Level Lyrics — automatic transcription with alignment
- Pitch Scoring — real-time microphone input with star ratings
- Profiles — per-player score tracking
- Video Files — use video files with synchronized background playback
- 7 Background Themes — GPU shaders, Pixabay videos, source video
- Gamepad Support — full navigation via gamepad
- Self-Contained — zero manual dependency setup
Supported Platforms
| Platform | Target |
|---|---|
| Linux x86_64 | x86_64-unknown-linux-gnu |
| Linux aarch64 | aarch64-unknown-linux-gnu |
| macOS ARM | aarch64-apple-darwin |
| macOS Intel | x86_64-apple-darwin |
| Windows x86_64 | x86_64-pc-windows-msvc |
Getting Started
Installation
Download the latest release for your platform from the Releases page and run the binary.
Supported audio formats: .mp3, .flac, .ogg, .wav, .m4a, .aac, .wma.
Supported video formats: .mp4, .mkv, .avi, .webm, .mov, .m4v.
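When scanning a music folder, a file's extension is enough to tell audio from video. The sketch below classifies a file using the two lists above. The `MediaKind` enum and `classify` function are hypothetical names, not Nightingale's actual code. Case-insensitive matching is also an assumption.

```rust
use std::path::Path;

/// Hypothetical classification of a library file by extension.
#[derive(Debug, PartialEq)]
enum MediaKind {
    Audio,
    Video,
}

const AUDIO_EXTS: &[&str] = &["mp3", "flac", "ogg", "wav", "m4a", "aac", "wma"];
const VIDEO_EXTS: &[&str] = &["mp4", "mkv", "avi", "webm", "mov", "m4v"];

/// Returns the media kind for a supported file, or `None` if the
/// extension is missing or not in either list (matched case-insensitively).
fn classify(path: &Path) -> Option<MediaKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    if AUDIO_EXTS.contains(&ext.as_str()) {
        Some(MediaKind::Audio)
    } else if VIDEO_EXTS.contains(&ext.as_str()) {
        Some(MediaKind::Video)
    } else {
        None
    }
}
```

For example, `classify(Path::new("song.FLAC"))` yields `Some(MediaKind::Audio)`. Files without a recognized extension are ignored.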
First Launch
On first launch, Nightingale will set up its environment automatically:
- Downloads ffmpeg — needed for audio/video processing
- Downloads uv — Python package manager
- Installs Python 3.10 — via uv, isolated from your system Python
- Creates virtual environment — with PyTorch, WhisperX, Demucs, and UVR models
- Downloads ML models — stem separation and transcription models
- Pre-downloads video backgrounds — Pixabay videos for the first session
This process takes a few minutes and shows a progress screen. After setup completes, Nightingale is ready to use.

Adding Music
When prompted, select your music folder. Nightingale will scan it for supported audio and video files. You can change the folder later in the settings.
Analysis
Before a song can be played as karaoke, it needs to be analyzed:
- Select a song from the library
- Analysis runs automatically (stem separation → lyrics lookup → transcription)
- Results are cached — subsequent plays are instant
You can also queue multiple songs for batch analysis.
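Batch analysis can be pictured as a first-in, first-out queue that skips songs whose results are already cached. This is a sketch under assumptions. The `AnalysisQueue` type and the idea of identifying songs by a string key are illustrative and not taken from Nightingale's source.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical FIFO queue for batch analysis.
struct AnalysisQueue {
    pending: VecDeque<String>,
    queued: HashSet<String>,
}

impl AnalysisQueue {
    fn new() -> Self {
        Self { pending: VecDeque::new(), queued: HashSet::new() }
    }

    /// Enqueues a song unless it is already cached or already queued.
    /// Returns true if the song was added.
    fn enqueue(&mut self, song: &str, is_cached: bool) -> bool {
        if is_cached || self.queued.contains(song) {
            return false;
        }
        self.queued.insert(song.to_string());
        self.pending.push_back(song.to_string());
        true
    }

    /// Takes the next song to analyze, in the order it was queued.
    fn take_next(&mut self) -> Option<String> {
        let song = self.pending.pop_front()?;
        self.queued.remove(&song);
        Some(song)
    }
}
```

Tracking queued songs in a `HashSet` keeps duplicate checks cheap, even when a user queues the same song twice.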

Force Re-setup
If something goes wrong with the vendor environment, you can force a fresh setup:
nightingale --setup
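The data-storage layout lists a `vendor/.ready` marker that indicates setup is complete. The sketch below shows one way the launch-time decision could work. It assumes, without confirmation from the source, that setup runs whenever the marker is missing or `--setup` is passed. `needs_setup` is a hypothetical helper.

```rust
use std::path::Path;

/// Hypothetical check: run the bootstrap if `--setup` was passed or the
/// `vendor/.ready` marker under the data directory does not exist.
fn needs_setup<I, S>(args: I, data_dir: &Path) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let forced = args.into_iter().any(|a| a.as_ref() == "--setup");
    forced || !data_dir.join("vendor").join(".ready").exists()
}
```

A caller would pass `std::env::args().skip(1)` together with the resolved `~/.nightingale` path.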
Controls
Nightingale supports both keyboard and gamepad input. The UI adapts to your input method automatically.
Navigation
| Action | Keyboard | Gamepad |
|---|---|---|
| Move | Arrow keys | D-pad / Left stick |
| Confirm / Select | Enter | A (South) |
| Back / Cancel | Escape | B (East) / Start |
| Switch panel | Tab | — |
| Search songs | Type to filter | — |
Playback
| Action | Keyboard | Gamepad |
|---|---|---|
| Pause / Resume | Space | Start |
| Exit to menu | Escape | B (East) |
| Toggle guide vocals | G | — |
| Guide volume up/down | + / - | — |
| Cycle background theme | T | — |
| Cycle video flavor | F | — |
| Toggle microphone | M | — |
| Next microphone | N | — |
| Toggle fullscreen | F11 | — |
| Skip Intro / Skip Outro | On-screen buttons | A (South) |
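The playback table maps two input devices onto one set of actions. One way to model this is a single `match` that routes keyboard keys and gamepad buttons to the same action. The sketch uses hypothetical `Input` and `PlaybackAction` types, not Bevy's real input API.

```rust
/// Hypothetical input events; the real app would use Bevy's input types.
#[derive(Debug, Clone, Copy)]
enum Input {
    Key(char),
    Space,
    Escape,
    F11,
    PadStart,
    PadEast,
}

/// Playback actions from the table above.
#[derive(Debug, PartialEq)]
enum PlaybackAction {
    TogglePause,
    ExitToMenu,
    ToggleGuideVocals,
    GuideVolumeUp,
    GuideVolumeDown,
    CycleBackground,
    CycleVideoFlavor,
    ToggleMicrophone,
    NextMicrophone,
    ToggleFullscreen,
}

/// Maps a keyboard or gamepad input to a playback action, if any.
fn map_playback_input(input: Input) -> Option<PlaybackAction> {
    use PlaybackAction::*;
    match input {
        Input::Space | Input::PadStart => Some(TogglePause),
        Input::Escape | Input::PadEast => Some(ExitToMenu),
        Input::F11 => Some(ToggleFullscreen),
        Input::Key(c) => match c.to_ascii_lowercase() {
            'g' => Some(ToggleGuideVocals),
            '+' => Some(GuideVolumeUp),
            '-' => Some(GuideVolumeDown),
            't' => Some(CycleBackground),
            'f' => Some(CycleVideoFlavor),
            'm' => Some(ToggleMicrophone),
            'n' => Some(NextMicrophone),
            _ => None,
        },
    }
}
```

Because both devices resolve to the same action, the rest of the playback logic does not need to know which input method is in use.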
Gamepad Notes
- Full navigation of menus, song selection, and settings via gamepad
- D-pad and left stick both work for navigation
- Face buttons map to confirm (A/South) and cancel (B/East)
Keyboard & Gamepad Reference
This page provides a complete reference of all keyboard shortcuts and gamepad mappings.
Menu Navigation
In the main menu, sidebar, and settings screens:
- Arrow keys / D-pad: Move focus between items
- Enter / A button: Select the focused item
- Escape / B button: Go back or close overlays
- Tab: Switch between sidebar and main content area
- Type any text: Filter/search the song list
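Type-to-filter can be as simple as a case-insensitive substring match over each song's title and artist. The `Song` struct and `filter_songs` function are illustrative assumptions. The source does not say which fields Nightingale actually searches.

```rust
/// Hypothetical library entry.
struct Song {
    title: String,
    artist: String,
}

/// Returns songs whose title or artist contains `query`, ignoring case.
/// An empty query matches everything.
fn filter_songs<'a>(songs: &'a [Song], query: &str) -> Vec<&'a Song> {
    let q = query.to_lowercase();
    songs
        .iter()
        .filter(|s| s.title.to_lowercase().contains(&q) || s.artist.to_lowercase().contains(&q))
        .collect()
}
```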
During Playback
While a song is playing:
- Space / Start: Pause or resume playback
- Escape / B: Exit back to the song menu
- G: Toggle guide vocals on/off
- + / -: Adjust guide vocal volume
- T: Cycle through background themes (shaders, video, source)
- F: Cycle through Pixabay video flavors (Nature, Underwater, Space, City, Countryside)
- M: Toggle microphone for pitch scoring
- N: Switch to the next available microphone
- F11: Toggle fullscreen
Skip Buttons
During playback, on-screen skip buttons appear for intro and outro sections. These can be activated with Enter/A or clicked.
How It Works
Nightingale’s pipeline transforms any audio or video file into a karaoke experience through several stages.
Pipeline Overview
flowchart TD
A["🎵 Audio or video file"] --> B["UVR Karaoke / Demucs"]
B --> |"vocals.ogg + instrumental.ogg"| C["LRCLIB"]
C --> |"Fetches synced lyrics if available"| D["WhisperX (large-v3)"]
D --> |"Transcription + word-level alignment"| E["Bevy App (Rust)"]
E --> F["🎤 Plays instrumental + synced lyrics\nwith pitch scoring & backgrounds"]
Caching
Analysis results are cached at ~/.nightingale/cache/ using blake3 file hashes. Re-analysis only happens if the source file changes or is manually triggered from the UI.
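The cache is content-addressed: the source file's bytes are hashed and the digest names the cache entry. An unchanged file therefore maps to the same entry, and any edit produces a new one. Nightingale uses BLAKE3. To stay dependency-free, this sketch substitutes the standard library's `DefaultHasher`, which is not a cryptographic hash and whose output is not guaranteed to be stable across Rust releases. The `<cache_root>/<hash>/` layout is also an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Stand-in for a BLAKE3 digest: hashes the file contents with
/// `DefaultHasher` (illustration only; not stable or collision-resistant).
fn content_key(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical cache directory for a song: `<cache_root>/<hash>/`.
fn cache_dir_for(cache_root: &Path, bytes: &[u8]) -> PathBuf {
    cache_root.join(content_key(bytes))
}
```

With the `blake3` crate, `content_key` would return `blake3::hash(bytes).to_hex().to_string()` instead.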
Hardware Acceleration
The Python analyzer uses PyTorch and auto-detects the best backend:
| Backend | Device | Notes |
|---|---|---|
| CUDA | NVIDIA GPU | Fastest |
| MPS | Apple Silicon | macOS; WhisperX alignment falls back to CPU |
| CPU | Any | Slowest but always works |
The UVR Karaoke model uses ONNX Runtime and enables CUDA acceleration automatically on NVIDIA GPUs, or CoreML on Apple Silicon.
Analyzing a song typically takes 2–5 minutes on a GPU and 10–20 minutes on a CPU.
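The fallback order in the table works as a priority list: use the first available backend. The sketch below is in Rust for consistency with the rest of this page, although the real detection happens in the Python analyzer through PyTorch. The natural PyTorch probes would be `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. The `Available` flags here are hypothetical inputs, not actual probes.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Cuda,
    Mps,
    Cpu,
}

/// Hypothetical availability probe results.
struct Available {
    cuda: bool,
    mps: bool,
}

/// Picks the fastest available backend: CUDA, then MPS, then CPU.
fn pick_backend(avail: &Available) -> Backend {
    if avail.cuda {
        Backend::Cuda
    } else if avail.mps {
        Backend::Mps
    } else {
        Backend::Cpu
    }
}

/// Per the table, WhisperX alignment runs on CPU when the main backend is MPS.
fn alignment_backend(main: Backend) -> Backend {
    match main {
        Backend::Mps => Backend::Cpu,
        other => other,
    }
}
```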
Stem Separation
Nightingale separates lead vocals from instrumentals so you can sing along to the backing track.
Models
UVR Karaoke (Default)
The UVR (Ultimate Vocal Remover) Karaoke model is optimized specifically for karaoke use. It preserves backing vocals in the instrumental track, giving a more natural karaoke experience. It runs on ONNX Runtime, with automatic CUDA (NVIDIA) or CoreML (Apple Silicon) acceleration.
Demucs
Demucs by Facebook Research provides an alternative separation model. You can switch between models in the settings.
Video Files
When processing video files (.mp4, .mkv, etc.), Nightingale first extracts the audio track using ffmpeg, then runs stem separation on the extracted audio. The original video is preserved for synchronized background playback.
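Here is a hedged sketch of the audio-extraction step using `std::process::Command`. The flags are standard ffmpeg options: `-y` overwrites the output, `-i` names the input, and `-vn` drops the video stream. The exact arguments and output format Nightingale passes are not documented here, so they are assumptions. The function only builds the command, so it can be inspected without running ffmpeg.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) an ffmpeg command that extracts the audio
/// track of `video` into `out`. ffmpeg infers the output format from
/// `out`'s extension (for example `.wav`).
fn extract_audio_cmd(ffmpeg: &Path, video: &Path, out: &Path) -> Command {
    let mut cmd = Command::new(ffmpeg);
    cmd.arg("-y")
        .arg("-i")
        .arg(video)
        .arg("-vn")
        .arg(out);
    cmd
}
```

Calling `.status()` on the returned `Command` would run it. The `ffmpeg` path would point at the downloaded binary under `~/.nightingale/vendor/`.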
Guide Vocals
After separation, you can control how much of the lead vocals bleed through the instrumental:
- Toggle: Press G to turn guide vocals on/off
- Volume: Press + / - to adjust the guide vocal level
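Conceptually, the guide-vocal controls mix a scaled copy of the separated vocal stem back into the instrumental. The sketch below assumes both stems are decoded to `f32` samples at the same rate and length. It clamps the result to avoid clipping. The `GuideVocals` type and the 0.1 step per key press are hypothetical.

```rust
/// Hypothetical guide-vocal state driven by the G and +/- keys.
struct GuideVocals {
    enabled: bool,
    volume: f32, // 0.0 ..= 1.0
}

impl GuideVocals {
    const STEP: f32 = 0.1; // assumed increment per key press

    fn toggle(&mut self) {
        self.enabled = !self.enabled;
    }

    fn adjust(&mut self, up: bool) {
        let delta = if up { Self::STEP } else { -Self::STEP };
        self.volume = (self.volume + delta).clamp(0.0, 1.0);
    }

    /// Mixes one instrumental sample with the matching vocal sample.
    fn mix(&self, instrumental: f32, vocal: f32) -> f32 {
        let gain = if self.enabled { self.volume } else { 0.0 };
        (instrumental + gain * vocal).clamp(-1.0, 1.0)
    }
}
```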
This is useful for learning new songs or for singers who want a reference pitch.
Lyrics & Transcription
Nightingale provides word-level synchronized lyrics through two sources.
LRCLIB
LRCLIB is queried first for existing synced lyrics. When a match is found, lyrics are used directly without needing transcription. This is faster and often more accurate for well-known songs.
WhisperX Transcription
When LRCLIB doesn’t have lyrics for a song, Nightingale uses WhisperX with the large-v3 model to:
- Transcribe the isolated vocals into text
- Align each word to precise timestamps in the audio
This produces word-level timing information that drives the karaoke highlighting during playback.
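The two-source strategy is a lookup with a fallback. In this sketch, the two closures stand in for an LRCLIB request and a WhisperX run. The closures and the `Word`/`LyricsSource` types are hypothetical, and the real analyzer may combine the sources differently.

```rust
#[derive(Debug, PartialEq)]
enum LyricsSource {
    Lrclib,
    WhisperX,
}

/// Hypothetical timed word: text plus start/end in seconds.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    start: f64,
    end: f64,
}

/// Tries LRCLIB first; only runs transcription when no synced lyrics exist.
fn resolve_lyrics<F, G>(fetch_lrclib: F, transcribe: G) -> (LyricsSource, Vec<Word>)
where
    F: FnOnce() -> Option<Vec<Word>>,
    G: FnOnce() -> Vec<Word>,
{
    match fetch_lrclib() {
        Some(words) => (LyricsSource::Lrclib, words),
        None => (LyricsSource::WhisperX, transcribe()),
    }
}
```

Taking `transcribe` as a closure means the expensive step is never invoked when LRCLIB already has a match.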
Language Support
WhisperX supports a wide range of languages. The language is auto-detected from the audio. Nightingale includes CJK font support (Noto Sans CJK) for Chinese, Japanese, and Korean lyrics.
Highlighting
During playback, lyrics are displayed with word-by-word highlighting:
- Current word — highlighted in the accent color
- Sung words — shown in a completed state
- Upcoming words — shown in a dimmer color
- Next line — previewed below the current line
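Given word timestamps, each word's display state follows from the current playback time. This is a minimal sketch. It assumes each word carries start and end times in seconds, which is a hypothetical representation. A word is treated as current while `start <= t < end`.

```rust
#[derive(Debug, PartialEq)]
enum WordState {
    Sung,
    Current,
    Upcoming,
}

/// Classifies a word spanning `[start, end)` seconds at playback time `t`.
fn word_state(start: f64, end: f64, t: f64) -> WordState {
    if t >= end {
        WordState::Sung
    } else if t >= start {
        WordState::Current
    } else {
        WordState::Upcoming
    }
}
```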
Pitch Scoring
Nightingale includes real-time pitch scoring to gamify the karaoke experience.
How It Works
- Microphone input: select your microphone and toggle it on with M
- Pitch detection: your vocal pitch is analyzed in real time
- Comparison: your pitch is compared against the reference vocal track
- Scoring: accuracy is tracked throughout the song
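A common way to compare two pitches is their distance in cents, 1200 · log2(f / f_ref), which counts an octave as 1200 cents in any register. The sketch below scores the fraction of reference-voiced frames where the singer is within a tolerance. The 50-cent tolerance, octave folding, and frame-based design are assumptions for illustration, not Nightingale's documented algorithm.

```rust
/// Signed distance from `reference_hz` to `sung_hz` in cents.
fn cents(sung_hz: f64, reference_hz: f64) -> f64 {
    1200.0 * (sung_hz / reference_hz).log2()
}

/// Absolute distance in cents, ignoring which octave was sung (assumed leniency).
fn folded_cents(sung_hz: f64, reference_hz: f64) -> f64 {
    let c = cents(sung_hz, reference_hz).rem_euclid(1200.0);
    if c > 600.0 { 1200.0 - c } else { c }
}

/// Fraction of frames where the sung pitch is within `tolerance_cents` of
/// the reference. `None` marks an unvoiced frame (no pitch detected).
fn pitch_accuracy(frames: &[(Option<f64>, Option<f64>)], tolerance_cents: f64) -> f64 {
    let scored: Vec<bool> = frames
        .iter()
        .filter_map(|&(sung, reference)| {
            let reference = reference?; // skip frames where the reference is silent
            Some(match sung {
                Some(s) => folded_cents(s, reference) <= tolerance_cents,
                None => false, // singer silent while the reference is singing
            })
        })
        .collect();
    if scored.is_empty() {
        return 0.0;
    }
    scored.iter().filter(|&&hit| hit).count() as f64 / scored.len() as f64
}
```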
Star Ratings
At the end of each song, you receive a star rating based on your overall pitch accuracy. Ratings are saved to your profile’s scoreboard.

Microphone Selection
- Press M to toggle the microphone on/off
- Press N to cycle through available microphones
- The active microphone is shown in the HUD during playback
Per-Song Scoreboards
Each song maintains a scoreboard of past performances. Scores are tracked per profile, so multiple singers can compete on the same songs.
Backgrounds
Nightingale offers 7 background themes during playback, cycled with the T key.
GPU Shader Backgrounds
Five backgrounds are rendered in real-time using GPU shaders (WGSL):
- Plasma — flowing colorful plasma effect
- Aurora — northern lights animation
- Waves — undulating wave patterns
- Nebula — cosmic nebula clouds
- Starfield — deep space star field
These run at full frame rate and adapt to your display resolution.

Pixabay Video Backgrounds
Pre-downloaded video backgrounds from Pixabay in 5 flavors, cycled with the F key:
- Nature — forests, mountains, rivers
- Underwater — ocean, coral, sea life
- Space — galaxies, nebulae, Earth from orbit
- City — urban skylines, night cityscapes
- Countryside — rolling fields, sunsets
Videos are pre-downloaded during setup so they’re ready instantly.
Source Video Playback
When playing a video file (.mp4, .mkv, etc.), the original video plays as a synchronized background automatically. This is the default for video file sources.
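Pressing T steps through the seven themes and wraps around at the end. The sketch assumes that the source-video theme is skipped when the current song is audio-only; the source does not say this. `Theme` and `next_theme` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Theme {
    Plasma,
    Aurora,
    Waves,
    Nebula,
    Starfield,
    PixabayVideo,
    SourceVideo,
}

const THEMES: [Theme; 7] = [
    Theme::Plasma,
    Theme::Aurora,
    Theme::Waves,
    Theme::Nebula,
    Theme::Starfield,
    Theme::PixabayVideo,
    Theme::SourceVideo,
];

/// Returns the theme after `current`, wrapping around and skipping
/// `SourceVideo` when the song has no video track (assumed behavior).
fn next_theme(current: Theme, has_source_video: bool) -> Theme {
    let i = THEMES.iter().position(|&t| t == current).unwrap_or(0);
    (1..=THEMES.len())
        .map(|step| THEMES[(i + step) % THEMES.len()])
        .find(|&t| t != Theme::SourceVideo || has_source_video)
        .unwrap_or(current)
}
```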
Profiles
Nightingale supports multiple player profiles for tracking scores across different singers.
Creating Profiles
Create a new profile from the main menu. Each profile stores:
- Player name
- Per-song pitch scores and star ratings
- Score history
Switching Profiles
Switch between profiles from the sidebar. The active profile is shown in the UI and all new scores are saved to it.
Score Tracking
Scores are stored in ~/.nightingale/profiles.json. Each profile maintains separate scoreboards for every song, so multiple singers can compete on the same library.
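In memory, each profile might keep a per-song list of attempts, from which a best score and a cross-profile scoreboard can be derived. The field names, song key, and score representation below are hypothetical; the actual `profiles.json` schema is not documented here. Serialization (for example with serde) is omitted to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// One recorded performance (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct Attempt {
    accuracy: f64, // 0.0 ..= 1.0
    stars: u8,
}

#[derive(Debug, Default)]
struct Profile {
    name: String,
    /// Song key (for example a file hash) -> attempts in chronological order.
    scores: HashMap<String, Vec<Attempt>>,
}

impl Profile {
    fn record(&mut self, song: &str, attempt: Attempt) {
        self.scores.entry(song.to_string()).or_default().push(attempt);
    }

    /// Best attempt for a song by accuracy, if any.
    fn best(&self, song: &str) -> Option<&Attempt> {
        self.scores
            .get(song)?
            .iter()
            .max_by(|a, b| a.accuracy.total_cmp(&b.accuracy))
    }
}

/// Scoreboard for one song across profiles, highest accuracy first.
fn scoreboard<'a>(profiles: &'a [Profile], song: &str) -> Vec<(&'a str, f64)> {
    let mut rows: Vec<(&str, f64)> = profiles
        .iter()
        .filter_map(|p| p.best(song).map(|a| (p.name.as_str(), a.accuracy)))
        .collect();
    rows.sort_by(|a, b| b.1.total_cmp(&a.1));
    rows
}
```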
Configuration
Nightingale stores its configuration at ~/.nightingale/config.json.
Data Storage
Everything lives under ~/.nightingale/:
~/.nightingale/
├── cache/ # Stems, transcripts, lyrics per song
├── config.json # App settings
├── profiles.json # Player profiles and scores
├── videos/ # Cached Pixabay video backgrounds
├── sounds/ # Sound effects
├── vendor/
│ ├── ffmpeg # Downloaded ffmpeg binary
│ ├── uv # Downloaded uv binary
│ ├── python/ # Python 3.10 installed via uv
│ ├── venv/ # Virtual environment with ML packages
│ ├── analyzer/ # Extracted analyzer Python scripts
│ └── .ready # Marker indicating setup is complete
└── models/
├── torch/ # Demucs model cache
├── huggingface/ # WhisperX model cache
└── audio_separator/ # UVR Karaoke model cache
Video Backgrounds
Pixabay video backgrounds use the Pixabay API. In development, create a .env file at the project root with:
PIXABAY_API_KEY=your_key_here
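This dependency-free sketch reads the key from `.env`-style text. It skips blank lines and `#` comments, splits each line on the first `=`, and trims matching surrounding quotes. It is only an illustration: the project may load `.env` with a dedicated crate, and `env_value` is a hypothetical name.

```rust
/// Looks up `key` in `.env`-formatted text. Returns the value with
/// surrounding whitespace and matching single or double quotes removed.
fn env_value(contents: &str, key: &str) -> Option<String> {
    contents.lines().find_map(|line| {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            return None;
        }
        let (k, v) = line.split_once('=')?;
        if k.trim() != key {
            return None;
        }
        let v = v.trim();
        let unquoted = v
            .strip_prefix('"')
            .and_then(|s| s.strip_suffix('"'))
            .or_else(|| v.strip_prefix('\'').and_then(|s| s.strip_suffix('\'')))
            .unwrap_or(v);
        Some(unquoted.to_string())
    })
}
```

A real environment variable, read with `std::env::var("PIXABAY_API_KEY")`, would typically take precedence over the file.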
Theme
Toggle between dark and light themes from the sidebar. The theme preference is saved in the config.

Building from Source
Prerequisites
| Requirement | Version / packages |
|---|---|
| Rust | 1.85+ (edition 2024) |
| System libraries (Linux only) | libasound2-dev, libudev-dev, libwayland-dev, libxkbcommon-dev |
Development Build
git clone <repo-url> nightingale
cd nightingale
cargo build --release
Local Release
Linux / macOS
scripts/make-release.sh
Builds the release binary and packages it into nightingale-<target>.tar.gz.
Windows (PowerShell)
powershell -ExecutionPolicy Bypass -File scripts/make-release.ps1
Builds the release binary and packages it into nightingale-x86_64-pc-windows-msvc.zip.
CLI Flags
| Flag | Description |
|---|---|
| --setup | Force re-run of the first-launch bootstrap |
Troubleshooting
First Launch Takes Too Long
The initial setup downloads Python, PyTorch, ML models, and video backgrounds. On a slow connection this can take 10–20 minutes. Subsequent launches skip setup entirely.
Analysis Fails
If song analysis fails:
- Check the error message in the UI for details
- Ensure you have enough disk space (~5 GB for models and cache)
- Try re-running with --setup to reset the vendor environment
- GPU memory errors may occur with very long songs; CPU fallback will be used automatically
No Sound
- Verify your audio output device is correctly configured
- Check that the audio file format is supported (.mp3, .flac, .ogg, .wav, .m4a, .aac, .wma)
- Try a different audio file to rule out file-specific issues
Microphone Not Detected
- Press N to cycle through available microphones
- Ensure microphone permissions are granted to the application
- On macOS, check System Settings > Privacy & Security > Microphone
GPU Acceleration Not Working
The analyzer auto-detects the best backend:
- NVIDIA GPU: Requires CUDA-compatible drivers
- Apple Silicon: Uses MPS backend (some operations fall back to CPU)
- CPU: Always works as a fallback
Check the setup progress screen for which backend was detected.
Reset Everything
To completely reset Nightingale, delete the data directory:
rm -rf ~/.nightingale
The next launch will re-run setup from scratch.