Implement music downloading via yt-dlp in shanty-dl #7

Closed
opened 2026-03-17 15:21:52 -04:00 by connor · 0 comments
Owner

The shanty-dl crate is responsible for actually downloading music files. For the MVP, the only backend is yt-dlp, which is capable of downloading audio from YouTube, SoundCloud, Bandcamp, and hundreds of other sites. The user provides a specific song (or list of songs), and the crate handles downloading, converting to the desired format, and storing the result.

This issue covers:

  1. yt-dlp integration — shell out to yt-dlp (assumed to be installed on the system) to download audio. The crate should:

    • Accept a search query (e.g., "Pink Floyd Time") or a direct URL
    • Use yt-dlp's search feature (ytsearch:) to find the best match when given a query
    • Download as the best available audio quality
    • Convert to a target format (default: MP3 320kbps or OPUS, configurable)
    • Save to a configurable output directory
    • Embed available metadata (title, artist) into the downloaded file
  2. Download queue — integrate with the download_queue table in shanty-db:

    • Items can be added to the queue (manually or from the watchlist's "wanted" items)
    • The downloader processes the queue, updating status as it goes (pendingdownloadingcompleted / failed)
    • Failed downloads include error information for debugging
    • Support retrying failed downloads
  3. Post-download integration — after a successful download:

    • Add the file to the database via shanty-index (or at minimum insert a record)
    • If the download was for a watchlist item, update that item's status to downloaded
    • Optionally run the organizer (shanty-org) to move the file to the right location
  4. CLI interface — the shanty-dl binary should accept:

    • download <query_or_url> — download a single item
    • download --from-queue — process all pending items in the download queue
    • --format <mp3|opus|flac|best> — output format
    • --output <dir> — output directory
    • --dry-run — show what would be downloaded without doing it
    • queue add <query> — add an item to the download queue
    • queue list — show the current queue
    • queue retry — retry all failed downloads

Design Considerations

  • The download backend should be trait-based (DownloadBackend trait) so that future backends (torrents via transmission-rpc, Soulseek, NZBGet, etc.) can be added. The trait should define methods like search, download, supports_url, etc.
  • yt-dlp output parsing can be tricky — use --print-json or --dump-json for structured output.
  • Consider concurrent downloads (configurable limit, e.g., 3 simultaneous) for queue processing.
  • yt-dlp must be installed separately — the crate should check for its presence and give a clear error if missing.
  • Be mindful of YouTube rate limiting — don't hammer the API.

Acceptance Criteria

  • Given a search query, the downloader finds and downloads the correct song via yt-dlp
  • Given a direct URL, the downloader downloads the audio from that URL
  • Downloaded files are converted to the requested format
  • Downloads are recorded in the database
  • The download queue works — items can be added, processed, and retried
  • Failed downloads are logged with error details
  • DownloadBackend trait exists and yt-dlp is the first implementation
  • CLI interface works as specified
  • The crate checks for yt-dlp availability and errors clearly if missing
  • Concurrent queue processing works with a configurable limit
  • Integration test: download a short public-domain audio clip, verify file exists and is valid
The `shanty-dl` crate is responsible for actually downloading music files. For the MVP, the only backend is `yt-dlp`, which is capable of downloading audio from YouTube, SoundCloud, Bandcamp, and hundreds of other sites. The user provides a specific song (or list of songs), and the crate handles downloading, converting to the desired format, and storing the result. This issue covers: 1. **yt-dlp integration** — shell out to `yt-dlp` (assumed to be installed on the system) to download audio. The crate should: - Accept a search query (e.g., "Pink Floyd Time") or a direct URL - Use yt-dlp's search feature (`ytsearch:`) to find the best match when given a query - Download as the best available audio quality - Convert to a target format (default: MP3 320kbps or OPUS, configurable) - Save to a configurable output directory - Embed available metadata (title, artist) into the downloaded file 2. **Download queue** — integrate with the `download_queue` table in `shanty-db`: - Items can be added to the queue (manually or from the watchlist's "wanted" items) - The downloader processes the queue, updating status as it goes (`pending` → `downloading` → `completed` / `failed`) - Failed downloads include error information for debugging - Support retrying failed downloads 3. **Post-download integration** — after a successful download: - Add the file to the database via `shanty-index` (or at minimum insert a record) - If the download was for a watchlist item, update that item's status to `downloaded` - Optionally run the organizer (`shanty-org`) to move the file to the right location 4. **CLI interface** — the `shanty-dl` binary should accept: - `download <query_or_url>` — download a single item - `download --from-queue` — process all pending items in the download queue - `--format <mp3|opus|flac|best>` — output format - `--output <dir>` — output directory - `--dry-run` — show what would be downloaded without doing it - `queue add <query>` — add an item to the download queue - `queue list` — show the current queue - `queue retry` — retry all failed downloads ### Design Considerations - The download backend should be trait-based (`DownloadBackend` trait) so that future backends (torrents via `transmission-rpc`, Soulseek, NZBGet, etc.) can be added. The trait should define methods like `search`, `download`, `supports_url`, etc. - yt-dlp output parsing can be tricky — use `--print-json` or `--dump-json` for structured output. - Consider concurrent downloads (configurable limit, e.g., 3 simultaneous) for queue processing. - yt-dlp must be installed separately — the crate should check for its presence and give a clear error if missing. - Be mindful of YouTube rate limiting — don't hammer the API. ### Acceptance Criteria - [ ] Given a search query, the downloader finds and downloads the correct song via yt-dlp - [ ] Given a direct URL, the downloader downloads the audio from that URL - [ ] Downloaded files are converted to the requested format - [ ] Downloads are recorded in the database - [ ] The download queue works — items can be added, processed, and retried - [ ] Failed downloads are logged with error details - [ ] `DownloadBackend` trait exists and yt-dlp is the first implementation - [ ] CLI interface works as specified - [ ] The crate checks for yt-dlp availability and errors clearly if missing - [ ] Concurrent queue processing works with a configurable limit - [ ] Integration test: download a short public-domain audio clip, verify file exists and is valid
connor started working 2026-03-17 15:26:21 -04:00
connor added the HighPriorityMVP labels 2026-03-17 15:30:17 -04:00
connor worked for 2 hours 49 minutes 2026-03-17 18:15:21 -04:00
Sign in to join this conversation.
1 Participants
Notifications
Total Time Spent: 2 hours 49 minutes
connor
2 hours 49 minutes
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Shanty/Main#7