Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Nightingale

Karaoke from any song in your music library, powered by neural networks.

Nightingale scans your music folder, separates lead vocals from instrumentals using the UVR Karaoke model (or Demucs), transcribes lyrics with word-level timestamps via WhisperX, and plays it all back with synchronized highlighting, pitch scoring, profiles, and dynamic backgrounds.

Ships as a single binary. No manual installation of Python, ffmpeg, or ML models required — everything is downloaded and bootstrapped automatically on first launch.

Nightingale playback

Key Features

  • Stem Separation — isolates lead vocals from instrumentals
  • Word-Level Lyrics — automatic transcription with alignment
  • Pitch Scoring — real-time microphone input with star ratings
  • Profiles — per-player score tracking
  • Video Files — use video files with synchronized background playback
  • 7 Background Themes — GPU shaders, Pixabay videos, source video
  • Gamepad Support — full navigation via gamepad
  • Self-Contained — zero manual dependency setup

Supported Platforms

PlatformTarget
Linux x86_64x86_64-unknown-linux-gnu
Linux aarch64aarch64-unknown-linux-gnu
macOS ARMaarch64-apple-darwin
macOS Intelx86_64-apple-darwin
Windows x86_64x86_64-pc-windows-msvc

Getting Started

Installation

Download the latest release for your platform from the Releases page and run the binary.

Supported audio formats: .mp3, .flac, .ogg, .wav, .m4a, .aac, .wma.

Supported video formats: .mp4, .mkv, .avi, .webm, .mov, .m4v.

First Launch

On first launch, Nightingale will set up its environment automatically:

  1. Downloads ffmpeg — needed for audio/video processing
  2. Downloads uv — Python package manager
  3. Installs Python 3.10 — via uv, isolated from your system Python
  4. Creates virtual environment — with PyTorch, WhisperX, Demucs, and UVR models
  5. Downloads ML models — stem separation and transcription models
  6. Pre-downloads video backgrounds — Pixabay videos for the first session

This process takes a few minutes and shows a progress screen. After setup completes, Nightingale is ready to use.

Setup progress

Adding Music

When prompted, select your music folder. Nightingale will scan it for supported audio and video files. You can change the folder later in the settings.

Analysis

Before a song can be played as karaoke, it needs to be analyzed:

  1. Select a song from the library
  2. Analysis runs automatically (stem separation → lyrics → transcription)
  3. Results are cached — subsequent plays are instant

You can also queue multiple songs for batch analysis.

Song library

Force Re-setup

If something goes wrong with the vendor environment, you can force a fresh setup:

nightingale --setup

Controls

Nightingale supports both keyboard and gamepad input. The UI adapts to your input method automatically.

ActionKeyboardGamepad
MoveArrow keysD-pad / Left stick
Confirm / SelectEnterA (South)
Back / CancelEscapeB (East) / Start
Switch panelTab
Search songsType to filter

Playback

ActionKeyboardGamepad
Pause / ResumeSpaceStart
Exit to menuEscapeB (East)
Toggle guide vocalsG
Guide volume up/down+ / -
Cycle background themeT
Cycle video flavorF
Toggle microphoneM
Next microphoneN
Toggle fullscreenF11
Skip Intro / Skip OutroOn-screen buttonsA (South)

Gamepad Notes

  • Full navigation of menus, song selection, and settings via gamepad
  • D-pad and left stick both work for navigation
  • Face buttons map to confirm (A/South) and cancel (B/East)

Keyboard & Gamepad Reference

This page provides a complete reference of all keyboard shortcuts and gamepad mappings.

In the main menu, sidebar, and settings screens:

  • Arrow keys / D-pad: Move focus between items
  • Enter / A button: Select the focused item
  • Escape / B button: Go back or close overlays
  • Tab: Switch between sidebar and main content area
  • Type any text: Filter/search the song list

During Playback

While a song is playing:

  • Space / Start: Pause or resume playback
  • Escape / B: Exit back to the song menu
  • G: Toggle guide vocals on/off
  • + / -: Adjust guide vocal volume
  • T: Cycle through background themes (shaders, video, source)
  • F: Cycle through Pixabay video flavors (Nature, Underwater, Space, City, Countryside)
  • M: Toggle microphone for pitch scoring
  • N: Switch to the next available microphone
  • F11: Toggle fullscreen

Skip Buttons

During playback, on-screen skip buttons appear for intro and outro sections. These can be activated with Enter/A or clicked.

How It Works

Nightingale’s pipeline transforms any audio or video file into a karaoke experience through several stages.

Pipeline Overview

flowchart TD
    A["🎵 Audio or video file"] --> B["UVR Karaoke / Demucs"]
    B --> |"vocals.ogg + instrumental.ogg"| C["LRCLIB"]
    C --> |"Fetches synced lyrics if available"| D["WhisperX (large-v3)"]
    D --> |"Transcription + word-level alignment"| E["Bevy App (Rust)"]
    E --> F["🎤 Plays instrumental + synced lyrics\nwith pitch scoring & backgrounds"]

Caching

Analysis results are cached at ~/.nightingale/cache/ using blake3 file hashes. Re-analysis only happens if the source file changes or is manually triggered from the UI.

Hardware Acceleration

The Python analyzer uses PyTorch and auto-detects the best backend:

BackendDeviceNotes
CUDANVIDIA GPUFastest
MPSApple SiliconmacOS; WhisperX alignment falls back to CPU
CPUAnySlowest but always works

The UVR Karaoke model uses ONNX Runtime and enables CUDA acceleration automatically on NVIDIA GPUs, or CoreML on Apple Silicon.

A song typically takes 2–5 minutes on GPU, 10–20 minutes on CPU.

Stem Separation

Nightingale separates lead vocals from instrumentals so you can sing along to the backing track.

Models

UVR Karaoke (Default)

The UVR (Ultimate Vocal Remover) Karaoke model is optimized specifically for karaoke use. It preserves backing vocals in the instrumental track, giving a more natural karaoke experience. Uses ONNX Runtime for inference, with automatic CUDA (NVIDIA) or CoreML (Apple Silicon) acceleration.

Demucs

Demucs by Facebook Research provides an alternative separation model. You can switch between models in the settings.

Video Files

When processing video files (.mp4, .mkv, etc.), Nightingale first extracts the audio track using ffmpeg, then runs stem separation on the extracted audio. The original video is preserved for synchronized background playback.

Guide Vocals

After separation, you can control how much of the lead vocals bleed through the instrumental:

  • Toggle: Press G to turn guide vocals on/off
  • Volume: Press + / - to adjust the guide vocal level

This is useful for learning new songs or for singers who want a reference pitch.

Lyrics & Transcription

Nightingale provides word-level synchronized lyrics through two sources.

LRCLIB

LRCLIB is queried first for existing synced lyrics. When a match is found, lyrics are used directly without needing transcription. This is faster and often more accurate for well-known songs.

WhisperX Transcription

When LRCLIB doesn’t have lyrics for a song, Nightingale uses WhisperX with the large-v3 model to:

  1. Transcribe the isolated vocals into text
  2. Align each word to precise timestamps in the audio

This produces word-level timing information that drives the karaoke highlighting during playback.

Language Support

WhisperX supports a wide range of languages. The language is auto-detected from the audio. Nightingale includes CJK font support (Noto Sans CJK) for Chinese, Japanese, and Korean lyrics.

Highlighting

During playback, lyrics are displayed with word-by-word highlighting:

  • Current word — highlighted in the accent color
  • Sung words — shown in a completed state
  • Upcoming words — shown in a dimmer color
  • Next line — previewed below the current line

Pitch Scoring

Nightingale includes real-time pitch scoring to gamify the karaoke experience.

How It Works

  1. Microphone input — select your microphone and toggle it on with M
  2. Pitch detection — your vocal pitch is analyzed in real-time
  3. Comparison — your pitch is compared against the reference vocal track
  4. Scoring — accuracy is tracked throughout the song

Star Ratings

At the end of each song, you receive a star rating based on your overall pitch accuracy. Ratings are saved to your profile’s scoreboard.

Star rating

Microphone Selection

  • Press M to toggle the microphone on/off
  • Press N to cycle through available microphones
  • The active microphone is shown in the HUD during playback

Per-Song Scoreboards

Each song maintains a scoreboard of past performances. Scores are tracked per profile, so multiple singers can compete on the same songs.

Backgrounds

Nightingale offers 7 background themes during playback, cycled with the T key.

GPU Shader Backgrounds

Five backgrounds are rendered in real-time using GPU shaders (WGSL):

  1. Plasma — flowing colorful plasma effect
  2. Aurora — northern lights animation
  3. Waves — undulating wave patterns
  4. Nebula — cosmic nebula clouds
  5. Starfield — deep space star field

These run at full frame rate and adapt to your display resolution.

Aurora background Waves background

Pixabay Video Backgrounds

Pre-downloaded video backgrounds from Pixabay in 5 flavors, cycled with the F key:

  1. Nature — forests, mountains, rivers
  2. Underwater — ocean, coral, sea life
  3. Space — galaxies, nebulae, Earth from orbit
  4. City — urban skylines, night cityscapes
  5. Countryside — rolling fields, sunsets

Videos are pre-downloaded during setup so they’re ready instantly.

Source Video Playback

When playing a video file (.mp4, .mkv, etc.), the original video plays as a synchronized background automatically. This is the default for video file sources.

Profiles

Nightingale supports multiple player profiles for tracking scores across different singers.

Creating Profiles

Create a new profile from the main menu. Each profile stores:

  • Player name
  • Per-song pitch scores and star ratings
  • Score history

Switching Profiles

Switch between profiles from the sidebar. The active profile is shown in the UI and all new scores are saved to it.

Score Tracking

Scores are stored in ~/.nightingale/profiles.json. Each profile maintains separate scoreboards for every song, so multiple singers can compete on the same library.

Configuration

Nightingale stores its configuration at ~/.nightingale/config.json.

Data Storage

Everything lives under ~/.nightingale/:

~/.nightingale/
├── cache/              # Stems, transcripts, lyrics per song
├── config.json         # App settings
├── profiles.json       # Player profiles and scores
├── videos/             # Cached Pixabay video backgrounds
├── sounds/             # Sound effects
├── vendor/
│   ├── ffmpeg          # Downloaded ffmpeg binary
│   ├── uv              # Downloaded uv binary
│   ├── python/         # Python 3.10 installed via uv
│   ├── venv/           # Virtual environment with ML packages
│   ├── analyzer/       # Extracted analyzer Python scripts
│   └── .ready          # Marker indicating setup is complete
└── models/
    ├── torch/          # Demucs model cache
    ├── huggingface/    # WhisperX model cache
    └── audio_separator/ # UVR Karaoke model cache

Video Backgrounds

Pixabay video backgrounds use the Pixabay API. In development, create a .env file at the project root with:

PIXABAY_API_KEY=your_key_here

Theme

Toggle between dark and light themes from the sidebar. The theme preference is saved in the config.

White theme

Building from Source

Prerequisites

ToolVersion
Rust1.85+ (edition 2024)
Linux onlylibasound2-dev, libudev-dev, libwayland-dev, libxkbcommon-dev

Development Build

git clone <repo-url> nightingale
cd nightingale
cargo build --release

Local Release

Linux / macOS

scripts/make-release.sh

Builds the release binary and packages it into nightingale-<target>.tar.gz.

Windows (PowerShell)

powershell -ExecutionPolicy Bypass -File scripts/make-release.ps1

Builds the release binary and packages it into nightingale-x86_64-pc-windows-msvc.zip.

CLI Flags

FlagDescription
--setupForce re-run of the first-launch bootstrap

Troubleshooting

First Launch Takes Too Long

The initial setup downloads Python, PyTorch, ML models, and video backgrounds. On a slow connection this can take 10–20 minutes. Subsequent launches skip setup entirely.

Analysis Fails

If song analysis fails:

  1. Check the error message in the UI for details
  2. Ensure you have enough disk space (~5 GB for models and cache)
  3. Try re-running with --setup to reset the vendor environment
  4. GPU memory errors may occur with very long songs — CPU fallback will be used automatically

No Sound

  • Verify your audio output device is correctly configured
  • Check that the audio file format is supported (.mp3, .flac, .ogg, .wav, .m4a, .aac, .wma)
  • Try a different audio file to rule out file-specific issues

Microphone Not Detected

  • Press N to cycle through available microphones
  • Ensure microphone permissions are granted to the application
  • On macOS, check System Settings > Privacy & Security > Microphone

GPU Acceleration Not Working

The analyzer auto-detects the best backend:

  • NVIDIA GPU: Requires CUDA-compatible drivers
  • Apple Silicon: Uses MPS backend (some operations fall back to CPU)
  • CPU: Always works as a fallback

Check the setup progress screen for which backend was detected.

Reset Everything

To completely reset Nightingale, delete the data directory:

rm -rf ~/.nightingale

The next launch will re-run setup from scratch.