Implement online metadata lookup in `shanty-tag` #4


connor commented 2026-03-17 14:17:57 -04:00

The `shanty-tag` crate is responsible for filling in missing or incorrect metadata on music files. The MVP approach is "look online first": query online databases (primarily MusicBrainz) using whatever partial metadata is available (artist + title, album name, etc.) to find the correct tags.

This issue covers:

**MusicBrainz client**: implement a client that queries the MusicBrainz API to look up track, album, and artist metadata. The MusicBrainz API is free but rate limited (on average 1 request per second per client IP; requests over the limit are rejected with HTTP 503). The client should:
- Search by artist + title to find a matching recording
- Search by album name + artist to find a matching release
- Retrieve full metadata for a matched recording/release (title, artist, album, track number, year, genre, cover art URL via Cover Art Archive, MusicBrainz IDs)
- Respect rate limits (implement a rate limiter / request queue)
- Handle API errors gracefully
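
A minimal sketch of the rate limiter, assuming a blocking (non-async) client and using only the standard library. `RateLimiter` and `acquire` are hypothetical names, not part of any existing crate. Each call to `acquire` waits until at least `min_interval` has passed since the previous request was granted; for MusicBrainz that interval would be one second.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical limiter: enforces a minimum gap between consecutive requests.
/// For MusicBrainz the production value would be `Duration::from_secs(1)`.
pub struct RateLimiter {
    min_interval: Duration,
    last_request: Option<Instant>,
}

impl RateLimiter {
    pub fn new(min_interval: Duration) -> Self {
        RateLimiter { min_interval, last_request: None }
    }

    /// Blocks the current thread until the next request is allowed,
    /// then records the moment the request slot was granted.
    pub fn acquire(&mut self) {
        if let Some(last) = self.last_request {
            let elapsed = last.elapsed();
            if elapsed < self.min_interval {
                thread::sleep(self.min_interval - elapsed);
            }
        }
        self.last_request = Some(Instant::now());
    }
}
```

An async client would await a timer instead of blocking the thread (with Tokio, `tokio::time::sleep`), and an HTTP 503 response could additionally trigger a back-off before retrying.
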

**Tag matching logic**: given a track from the database (which may have only partial metadata), attempt to find a match online:
- If artist + title are available, search for the recording
- If only a filename is available, attempt to parse artist/title from the filename (common patterns like "Artist - Title.mp3")
- Score potential matches by similarity to existing metadata (fuzzy string matching)
- Allow a configurable confidence threshold — only apply tags if the match confidence is above the threshold
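
The matching steps can be sketched as follows, using only the standard library. The filename parser handles just the "Artist - Title" pattern, and fuzzy matching uses a normalized Levenshtein similarity with equal weight on artist and title. The function names, the similarity measure, and the weighting are all assumptions; a crate such as `strsim` provides ready-made string metrics.

```rust
use std::path::Path;

/// Splits a file name such as "Artist - Title.mp3" into (artist, title).
/// Only this one common pattern is handled; anything else returns None.
pub fn parse_filename(path: &str) -> Option<(String, String)> {
    let stem = Path::new(path).file_stem()?.to_str()?;
    let (artist, title) = stem.split_once(" - ")?;
    let (artist, title) = (artist.trim(), title.trim());
    if artist.is_empty() || title.is_empty() {
        return None;
    }
    Some((artist.to_string(), title.to_string()))
}

/// Levenshtein edit distance over Unicode scalar values (two-row DP).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Case-insensitive similarity in [0.0, 1.0], where 1.0 means identical.
pub fn similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (a.to_lowercase(), b.to_lowercase());
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(&a, &b) as f64 / max_len as f64
}

/// Hypothetical scoring: artist and title similarity weighted equally.
pub fn match_score(known: (&str, &str), candidate: (&str, &str)) -> f64 {
    0.5 * similarity(known.0, candidate.0) + 0.5 * similarity(known.1, candidate.1)
}

/// A score equal to the threshold counts as a pass (an assumption).
pub fn passes_threshold(score: f64, threshold: f64) -> bool {
    score >= threshold
}
```

With a threshold of 0.8, a title typed as "Karma Polize" still matches "Karma Police" (one edit in twelve characters), while an unrelated artist and title score far below it.
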

**Database update**: when a match is found and accepted, update the track (and album/artist) records in `shanty-db` with the new metadata, and store the MusicBrainz IDs for future reference.
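
Before anything is written to the database, it helps to compute the per-field differences between the stored record and the matched metadata; the same list can be printed for `--dry-run`. This is a std-only sketch: `TrackTags`, `FieldChange`, `plan_changes`, and `apply_changes` are hypothetical, and the fields shown are illustrative rather than the actual schema from issue #2.

```rust
/// Hypothetical subset of a track record; the field names are illustrative,
/// not the actual shanty-db schema.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TrackTags {
    pub title: Option<String>,
    pub artist: Option<String>,
    pub album: Option<String>,
    pub year: Option<u32>,
    pub musicbrainz_recording_id: Option<String>,
}

/// One proposed edit, suitable for printing in a dry run.
#[derive(Debug, PartialEq)]
pub struct FieldChange {
    pub field: &'static str,
    pub old: Option<String>,
    pub new: String,
}

/// Lists every field where the matched metadata has a value that differs from
/// the stored one. Fields the provider left empty never clear existing data.
pub fn plan_changes(current: &TrackTags, matched: &TrackTags) -> Vec<FieldChange> {
    let mut changes = Vec::new();
    let mut check = |field: &'static str, old: Option<String>, new: Option<String>| {
        if let Some(new) = new {
            if old.as_deref() != Some(new.as_str()) {
                changes.push(FieldChange { field, old, new });
            }
        }
    };
    check("title", current.title.clone(), matched.title.clone());
    check("artist", current.artist.clone(), matched.artist.clone());
    check("album", current.album.clone(), matched.album.clone());
    check("year", current.year.map(|y| y.to_string()), matched.year.map(|y| y.to_string()));
    check(
        "musicbrainz_recording_id",
        current.musicbrainz_recording_id.clone(),
        matched.musicbrainz_recording_id.clone(),
    );
    changes
}

/// Applies planned changes in memory; persisting them to the database is left out.
pub fn apply_changes(track: &mut TrackTags, changes: &[FieldChange]) {
    for change in changes {
        let value = Some(change.new.clone());
        match change.field {
            "title" => track.title = value,
            "artist" => track.artist = value,
            "album" => track.album = value,
            "year" => track.year = change.new.parse().ok(),
            "musicbrainz_recording_id" => track.musicbrainz_recording_id = value,
            _ => {}
        }
    }
}
```

Keeping "plan" and "apply" separate means `--dry-run` is simply planning without applying.
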

**File tag writing**: optionally write the updated metadata back into the music file's embedded tags (ID3, Vorbis comments, etc.). This should be opt-in, since some users may not want their files modified.

**CLI interface**: the `shanty-tag` binary should accept:
- A path to the database (optional, with a default)
- `--all` to tag all untagged or partially tagged tracks in the database
- `--track <id>` to tag a specific track
- `--dry-run` to show what would change without applying anything
- `--write-tags` to enable writing tags back to files
- `--confidence <0.0-1.0>` to set the match threshold (default ~0.8)
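
A dependency-free sketch of parsing these arguments; a real binary would more likely use a crate such as `clap`. The `Args` struct and `parse_args` function are hypothetical, and treating a bare (non-flag) argument as the database path is an assumption. Confidence values outside 0.0 to 1.0 are rejected.

```rust
use std::path::PathBuf;

/// Hypothetical parsed form of the shanty-tag command line.
#[derive(Debug, PartialEq)]
pub struct Args {
    pub db_path: Option<PathBuf>, // None means "use the default location"
    pub all: bool,
    pub track: Option<i64>,
    pub dry_run: bool,
    pub write_tags: bool,
    pub confidence: f64,
}

pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut parsed = Args {
        db_path: None,
        all: false,
        track: None,
        dry_run: false,
        write_tags: false,
        confidence: 0.8, // default threshold suggested in the issue
    };
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--all" => parsed.all = true,
            "--dry-run" => parsed.dry_run = true,
            "--write-tags" => parsed.write_tags = true,
            "--track" => {
                let value = iter.next().ok_or("--track requires an id")?;
                let id = value
                    .parse::<i64>()
                    .map_err(|_| format!("invalid track id: {}", value))?;
                parsed.track = Some(id);
            }
            "--confidence" => {
                let value = iter.next().ok_or("--confidence requires a value")?;
                let c = value
                    .parse::<f64>()
                    .map_err(|_| format!("invalid confidence: {}", value))?;
                // contains() is false for NaN, so "NaN" is rejected as well.
                if !(0.0..=1.0).contains(&c) {
                    return Err(format!("confidence must be within 0.0-1.0, got {}", c));
                }
                parsed.confidence = c;
            }
            flag if flag.starts_with("--") => return Err(format!("unknown flag: {}", flag)),
            path => parsed.db_path = Some(PathBuf::from(path)),
        }
    }
    Ok(parsed)
}
```

In `main`, this would be called as `parse_args(std::env::args().skip(1))`.
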

Design Considerations

The data backend should be trait-based so that alternative providers (Last.fm, Discogs, etc.) can be added later without changing the core logic. Define a `MetadataProvider` trait with methods such as `search_recording`, `search_release`, and `get_recording_details`.
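
One possible shape for the trait, sketched synchronously for brevity; a real network client would probably be async. The result types, the error variants, and the `name` method (for recording which provider supplied the metadata) are assumptions. `MockProvider` shows how canned responses can drive unit tests of the matching logic without network access.

```rust
/// Hypothetical error type covering the failure modes listed in the issue.
#[derive(Debug)]
pub enum ProviderError {
    RateLimited,
    Network(String),
    NotFound,
}

/// Hypothetical, trimmed-down result types.
#[derive(Debug, Clone, PartialEq)]
pub struct Recording {
    pub id: String, // provider-specific ID, e.g. a MusicBrainz recording MBID
    pub title: String,
    pub artist: String,
    pub album: Option<String>,
    pub year: Option<u32>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Release {
    pub id: String,
    pub title: String,
    pub artist: String,
}

pub trait MetadataProvider {
    /// Short name stored with the tags so users know where they came from.
    fn name(&self) -> &'static str;
    fn search_recording(&self, artist: &str, title: &str) -> Result<Vec<Recording>, ProviderError>;
    fn search_release(&self, artist: &str, album: &str) -> Result<Vec<Release>, ProviderError>;
    fn get_recording_details(&self, id: &str) -> Result<Recording, ProviderError>;
}

/// Canned-response provider for unit tests; never touches the network.
pub struct MockProvider {
    pub recordings: Vec<Recording>,
}

impl MetadataProvider for MockProvider {
    fn name(&self) -> &'static str {
        "mock"
    }

    fn search_recording(&self, artist: &str, title: &str) -> Result<Vec<Recording>, ProviderError> {
        Ok(self
            .recordings
            .iter()
            .filter(|r| r.artist.eq_ignore_ascii_case(artist) && r.title.eq_ignore_ascii_case(title))
            .cloned()
            .collect())
    }

    fn search_release(&self, _artist: &str, _album: &str) -> Result<Vec<Release>, ProviderError> {
        Ok(Vec::new())
    }

    fn get_recording_details(&self, id: &str) -> Result<Recording, ProviderError> {
        self.recordings.iter().find(|r| r.id == id).cloned().ok_or(ProviderError::NotFound)
    }
}

/// Core logic depends only on the trait, so providers are interchangeable.
/// Taking the first hit stands in for the fuzzy-scoring step.
pub fn lookup(provider: &dyn MetadataProvider, artist: &str, title: &str) -> Result<Recording, ProviderError> {
    let first = provider
        .search_recording(artist, title)?
        .into_iter()
        .next()
        .ok_or(ProviderError::NotFound)?;
    provider.get_recording_details(&first.id)
}
```
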
MusicBrainz requires a descriptive User-Agent header; use something like `Shanty/0.1.0 (https://github.com/your-repo)`.
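
A sketch of building a recording search request, assuming the MusicBrainz WS/2 search endpoint (`/ws/2/recording`) with a Lucene-syntax `query` on the `artist` and `recording` fields plus `fmt=json`. Values are quoted as Lucene phrases and percent-encoded by hand to stay dependency-free; the function names are hypothetical and the User-Agent URL is the placeholder from this issue.

```rust
/// Placeholder contact URL copied from the issue; replace with the real repository.
pub const USER_AGENT: &str = "Shanty/0.1.0 (https://github.com/your-repo)";

const SEARCH_BASE: &str = "https://musicbrainz.org/ws/2";

/// Wraps a value in double quotes as a Lucene phrase, escaping `"` and `\`.
fn lucene_phrase(value: &str) -> String {
    let mut out = String::from("\"");
    for c in value.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
    out
}

/// Percent-encodes every byte except the RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => out.push(byte as char),
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: search URL for recordings by artist and title.
pub fn recording_search_url(artist: &str, title: &str, limit: u32) -> String {
    let query = format!("artist:{} AND recording:{}", lucene_phrase(artist), lucene_phrase(title));
    format!("{}/recording?query={}&fmt=json&limit={}", SEARCH_BASE, percent_encode(&query), limit)
}
```

`USER_AGENT` would be sent as the `User-Agent` header on every request; under Cargo, the version part could come from `env!("CARGO_PKG_VERSION")` instead of being hard-coded.
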
Batch operations should be parallelized where possible, but respect API rate limits.
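
One way to combine parallel batch work with a global rate limit is a request queue: workers (reading files, scoring matches, writing to the database) run concurrently, but every API call is funneled through a single thread that paces requests. A std-only sketch; `ApiRequest`, `spawn_request_queue`, and `lookup_via_queue` are hypothetical, and the `fetch` closure stands in for the real HTTP call.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical queued request: the query plus a channel for the reply.
pub struct ApiRequest {
    pub query: String,
    pub reply: Sender<String>,
}

/// Spawns one thread that owns all API access and spaces requests at least
/// `min_interval` apart, no matter how many workers submit them.
pub fn spawn_request_queue<F>(min_interval: Duration, fetch: F) -> Sender<ApiRequest>
where
    F: Fn(&str) -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<ApiRequest>();
    thread::spawn(move || {
        let mut last: Option<Instant> = None;
        // The loop ends once every Sender (including clones) has been dropped.
        for req in rx {
            if let Some(t) = last {
                let elapsed = t.elapsed();
                if elapsed < min_interval {
                    thread::sleep(min_interval - elapsed);
                }
            }
            last = Some(Instant::now());
            // Ignore send errors: the worker may have stopped waiting.
            let _ = req.reply.send(fetch(req.query.as_str()));
        }
    });
    tx
}

/// Worker side: submit a query and block until the queue answers.
pub fn lookup_via_queue(queue: &Sender<ApiRequest>, query: &str) -> Option<String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    queue
        .send(ApiRequest { query: query.to_string(), reply: reply_tx })
        .ok()?;
    reply_rx.recv().ok()
}
```

Each worker holds a clone of the returned `Sender`; an async version could follow the same pattern with Tokio's channels.
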
Store which provider supplied the metadata so the user knows the source.
We should strongly consider providing (and exposing) routines that clean up and standardize artist, album, and song titles, for example by normalizing odd characters.
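
A std-only sketch of such cleanup routines. `clean_title` (for values that may be written back) maps typographic quotes and dashes to ASCII, drops zero-width and control characters, and collapses whitespace; `match_key` (for comparison only) additionally lowercases and strips punctuation. Both names and the exact character lists are assumptions; Unicode normalization (for example NFC via the `unicode-normalization` crate) would be a natural addition.

```rust
/// Hypothetical display-safe cleanup: typographic punctuation becomes ASCII,
/// invisible characters are dropped, and whitespace runs collapse to one space.
pub fn clean_title(raw: &str) -> String {
    let mapped: String = raw
        .chars()
        .filter_map(|c| match c {
            '\u{200B}' | '\u{FEFF}' => None, // zero-width space, byte-order mark
            '\u{2018}' | '\u{2019}' | '\u{201B}' => Some('\''),
            '\u{201C}' | '\u{201D}' => Some('"'),
            // hyphen, non-breaking hyphen, figure/en/em dash, minus sign
            '\u{2010}' | '\u{2011}' | '\u{2012}' | '\u{2013}' | '\u{2014}' | '\u{2212}' => Some('-'),
            c if c.is_whitespace() => Some(' '), // tabs, no-break spaces, etc.
            c if c.is_control() => None,
            c => Some(c),
        })
        .collect();
    mapped.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical comparison key (never written to tags): cleaned, lowercased,
/// and stripped of everything except letters, digits, and spaces.
pub fn match_key(raw: &str) -> String {
    let key: String = clean_title(raw)
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    key.split_whitespace().collect::<Vec<_>>().join(" ")
}
```
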

Acceptance Criteria

- [ ] MusicBrainz API client is implemented with proper rate limiting
- [ ] Given a track with artist + title, the tagger finds a matching recording and retrieves full metadata
- [ ] Fuzzy matching works: minor spelling differences don't prevent matches
- [ ] Database records are updated with new metadata and MusicBrainz IDs
- [ ] `--write-tags` actually writes metadata back into the music file
- [ ] `--dry-run` shows proposed changes without applying them
- [ ] Confidence threshold filtering works
- [ ] `MetadataProvider` trait exists, and MusicBrainz is the first implementation
- [ ] CLI interface works as specified
- [ ] Errors from the API (rate limits, network issues, no results) are handled gracefully
- [ ] Tests exist for matching logic (unit tests with mocked API responses)

Dependencies

- Issue #1 (workspace scaffolding)
- Issue #2 (shared database schema)
- Issue #3 (music indexing, so there are tracks in the DB to tag)


connor added the HighPriority MVP labels 2026-03-17 14:18:27 -04:00

connor started working 2026-03-17 14:45:44 -04:00

connor closed this issue 2026-03-17 15:22:06 -04:00

connor worked for 36 minutes

connor referenced this issue

2026-03-17 16:20:43 -04:00

Implement online music search in `shanty-search` #8

connor referenced this issue

2026-03-17 16:24:56 -04:00

Implement the Actix web backend for `shanty-web` #9