Speed up artist detail loading and "Watch All" operation #45

Open
opened 2026-03-21 14:21:16 -04:00 by connor · 0 comments
Owner

Description:

Root cause analysis

Loading an artist's detail page and running "Watch All" are extremely slow because of excessive sequential MusicBrainz (MB) API calls. The shared rate limiter (shanty-data/src/http.rs) enforces a minimum of 1.1 s between all MB requests across the app, so in a burst of sequential calls each extra call costs about 1.1 s.
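To make the cost model concrete, here is a minimal, synchronous sketch of a shared minimum-interval limiter. MinIntervalLimiter is a hypothetical name; the actual RateLimiter in shanty-data/src/http.rs is not shown in this issue and is probably async.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical minimum-interval limiter: all callers share one `last`
/// timestamp, so requests are spaced at least `interval` apart no matter
/// which code path issues them.
pub struct MinIntervalLimiter {
    interval: Duration,
    last: Mutex<Option<Instant>>,
}

impl MinIntervalLimiter {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last: Mutex::new(None) }
    }

    /// Blocks until `interval` has passed since the previous permit,
    /// then records the new permit time.
    pub fn acquire(&self) {
        // Holding the lock while sleeping serializes waiters, which is the
        // "one MB request at a time" behavior described above.
        let mut last = self.last.lock().unwrap();
        if let Some(prev) = *last {
            let elapsed = prev.elapsed();
            if elapsed < self.interval {
                thread::sleep(self.interval - elapsed);
            }
        }
        *last = Some(Instant::now());
    }
}
```

Because the limiter is global, every redundant request in the call chain below delays every other MB request in the app, not just the current page.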

Tracing the enrich_artist() call chain (shanty-web/src/routes/artists.rs:256-594):

For a typical artist with N release groups:

| Step | Function | MB API call | Count |
|------|----------|-------------|-------|
| 1 | Artist resolution (line ~295) | `GET /ws/2/artist/{mbid}?fmt=json` | 1 |
| 2 | Artist metadata (line ~335) | `GET /ws/2/artist/{mbid}?inc=url-rels&fmt=json` | 1 (DUPLICATE of step 1 for the same MBID) |
| 3 | Release groups (line ~369) | `GET /ws/2/release-group?artist={mbid}&type=album\|single\|ep&fmt=json&limit=100` | 1 |
| 4a | Resolve release (line ~197) | `GET /ws/2/release?release-group={rg_id}&fmt=json&limit=1` | Up to N (when `first_release_mbid` is None) |
| 4b | Fetch tracks (line ~204) | `GET /ws/2/release/{release_mbid}?inc=recordings&fmt=json` | N |

Total: 3 + up to 2N calls. For 20 release groups that is up to 43 calls, or roughly 47 seconds (43 × 1.1 s) of rate-limiter spacing alone.
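The totals follow directly from the table: steps 1 to 3 are fixed, step 4b runs once per release group, and step 4a runs once per group missing first_release_mbid. A small sketch of that arithmetic (function names are illustrative only):

```rust
/// MB requests made by `enrich_artist()` for an artist with `groups`
/// release groups, `unresolved` of which lack `first_release_mbid`
/// (steps 1-3, plus step 4a per unresolved group, plus step 4b per group).
pub fn enrich_calls(groups: u32, unresolved: u32) -> u32 {
    assert!(unresolved <= groups, "unresolved groups are a subset of all groups");
    3 + unresolved + groups
}

/// Back-of-envelope wait: calls × minimum spacing. Strictly, k calls need
/// only k - 1 gaps, so this slightly overstates the floor.
pub fn limiter_wait_secs(calls: u32, spacing_secs: f64) -> f64 {
    f64::from(calls) * spacing_secs
}
```

With fixes 1 and 3 in place (no step 2, no step 4a), the same 20-group artist would need 2 + 20 = 22 calls.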

The add_artist() function (shanty-watch/src/library.rs:63-137) has a separate problem:

  • It calls provider.get_artist_releases(&mbid, 100) which hits GET /ws/2/release/?artist={mbid}&fmt=json&limit=100
  • This returns individual releases (not release groups) — so an artist with 5 release groups but 30 individual releases (reissues, remasters, deluxe editions) triggers 30 get_release_tracks() calls
  • enrich_artist() uses release groups (deduplicated) but add_artist() uses releases (not deduplicated) — this is an architectural mismatch
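To illustrate the mismatch, collapsing individual releases to one per release group gives the number of track fetches add_artist() actually needs. This assumes each release row carries its release-group ID; Release and one_per_release_group() are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical shape of one row from the `/ws/2/release?artist=...` browse.
pub struct Release {
    pub id: String,
    pub release_group_id: String,
}

/// Keeps the first release seen for each release group, so reissues,
/// remasters and deluxe editions collapse to a single entry.
pub fn one_per_release_group(releases: &[Release]) -> Vec<&Release> {
    let mut seen = HashSet::new();
    releases
        .iter()
        // `insert` returns false for a group already seen, filtering it out.
        .filter(|r| seen.insert(r.release_group_id.as_str()))
        .collect()
}
```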

Specific fixes

  1. Deduplicate get_artist_info() in enrich_artist(): It's called at line ~295 (artist resolution) and again at line ~335 (artist metadata fetch) for the same MBID, so the second MB API request is redundant. Fix: if the artist was resolved by MBID in step 1, reuse that result for step 2 (making sure the step-1 request already includes the url-rels data that step 2 needs).

  2. Fix add_artist() to use release groups: Change shanty-watch/src/library.rs to call provider.get_artist_release_groups() instead of provider.get_artist_releases(). Then for each release group, resolve the release MBID and fetch tracks. This matches what enrich_artist() does and eliminates redundant work on reissues/remasters.

  3. Eliminate resolve_release_from_group() calls: The get_artist_release_groups() MB response (shanty-data/src/musicbrainz.rs, the MbReleaseGroupResponse struct) can include a releases array, and the code at line ~290 extracts the first release's MBID from it: first_release_mbid: rg.releases.and_then(|r| r.into_iter().next().map(|rel| rel.id)). But when the MB response omits the releases field (which it sometimes does), first_release_mbid is None, and each such release group triggers an extra API call via resolve_release_from_group(). Fix: request the releases include in the MB API call (add &inc=releases or adjust the query), or accept the extra call but cache the result.

  4. Cache at the MB client level: Add an in-memory LRU cache in MusicBrainzFetcher for recent responses (keyed by URL). This prevents redundant calls within a single enrichment request — e.g., if get_artist_info() and the resolution step both hit the same URL, the second one is a cache hit with zero delay.

  5. Show cached data immediately on the frontend: The two-phase loading in shanty-web/frontend/src/pages/artist.rs already loads a "quick" version first (?quick=true). Ensure the quick version returns cached tracklists (10-year TTL for watched artists) and only the slow version fetches uncached release groups. Consider loading uncached release groups one at a time and updating the UI progressively (WebSocket or polling).
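A minimal sketch of fix 1, assuming the step-1 response already carries the url-rels data that step 2 needs. ArtistInfo, FakeMb and artist_metadata() are hypothetical stand-ins, not shanty's actual types; FakeMb only counts simulated requests so the reuse is observable.

```rust
use std::cell::Cell;

/// Hypothetical artist record. Assumes step 1 requested everything step 2
/// needs (e.g. `inc=url-rels`), so one response can serve both steps.
#[derive(Clone, Debug, PartialEq)]
pub struct ArtistInfo {
    pub mbid: String,
    pub name: String,
    pub urls: Vec<String>,
}

/// Stand-in for the MB client; `calls` counts simulated HTTP requests.
pub struct FakeMb {
    pub calls: Cell<u32>,
}

impl FakeMb {
    pub fn get_artist_info(&self, mbid: &str) -> ArtistInfo {
        self.calls.set(self.calls.get() + 1);
        ArtistInfo {
            mbid: mbid.to_string(),
            name: "Example Artist".to_string(),
            urls: Vec::new(),
        }
    }
}

/// Step 2 (metadata): reuse the step-1 result when it is for the same
/// MBID; only hit MusicBrainz when step 1 produced nothing usable.
pub fn artist_metadata(mb: &FakeMb, mbid: &str, resolved: Option<ArtistInfo>) -> ArtistInfo {
    match resolved {
        Some(info) if info.mbid == mbid => info, // zero extra MB calls
        _ => mb.get_artist_info(mbid),          // fetch exactly once
    }
}
```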
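A sketch of fixes 2 and 3 together: add_artist() walks release groups instead of releases, uses first_release_mbid when the browse response has it, and otherwise resolves once and remembers the result. The Provider trait, its method signatures, ReleaseGroup and add_artist_tracks() are assumptions modeled on the names in this issue; the resolved map is in-memory here, whereas a real fix might persist it.

```rust
use std::collections::HashMap;

/// Hypothetical release-group summary mirroring `first_release_mbid`.
pub struct ReleaseGroup {
    pub id: String,
    pub first_release_mbid: Option<String>,
}

/// Hypothetical subset of the provider used by `add_artist()`.
/// Method names echo the issue; these signatures are assumptions.
pub trait Provider {
    fn get_artist_release_groups(&self, artist_mbid: &str) -> Vec<ReleaseGroup>;
    fn resolve_release_from_group(&self, rg_id: &str) -> Option<String>;
    fn get_release_tracks(&self, release_mbid: &str) -> Vec<String>;
}

/// Picks the release to fetch for one group: the MBID from the browse
/// response if present, else a remembered resolution, else one lookup.
fn release_for<P: Provider>(
    provider: &P,
    rg: &ReleaseGroup,
    resolved: &mut HashMap<String, String>,
) -> Option<String> {
    if let Some(id) = &rg.first_release_mbid {
        return Some(id.clone()); // free: already in the browse response
    }
    if let Some(id) = resolved.get(&rg.id) {
        return Some(id.clone()); // free: resolved on an earlier run
    }
    let id = provider.resolve_release_from_group(&rg.id)?; // one MB call
    resolved.insert(rg.id.clone(), id.clone());
    Some(id)
}

/// One track fetch per release group (not per release), so reissues and
/// deluxe editions cost nothing extra. Returns (release_group_id, tracks).
pub fn add_artist_tracks<P: Provider>(
    provider: &P,
    artist_mbid: &str,
    resolved: &mut HashMap<String, String>,
) -> Vec<(String, Vec<String>)> {
    let mut out = Vec::new();
    for rg in provider.get_artist_release_groups(artist_mbid) {
        if let Some(release_mbid) = release_for(provider, &rg, resolved) {
            out.push((rg.id, provider.get_release_tracks(&release_mbid)));
        }
    }
    out
}
```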
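A std-only sketch of the URL-keyed LRU from fix 4. UrlCache and get_or_fetch() are hypothetical; the fetch closure stands in for the rate-limited request inside MusicBrainzFetcher, and there is no expiry. One caveat from the table above: a URL-keyed cache only collapses byte-identical URLs, and steps 1 and 2 differ by inc=url-rels, so they would need to share one query string to benefit.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU keyed by request URL. Recency lives in a VecDeque
/// (front = least recently used); touching is O(n), which is acceptable
/// for the small number of entries one enrichment produces.
pub struct UrlCache {
    capacity: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>,
}

impl UrlCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `url` to the most-recently-used end.
    fn touch(&mut self, url: &str) {
        if let Some(pos) = self.order.iter().position(|u| u.as_str() == url) {
            let key = self.order.remove(pos).unwrap();
            self.order.push_back(key);
        }
    }

    pub fn get(&mut self, url: &str) -> Option<String> {
        let body = self.map.get(url).cloned()?;
        self.touch(url);
        Some(body)
    }

    pub fn put(&mut self, url: String, body: String) {
        if self.map.contains_key(&url) {
            self.touch(&url);
        } else {
            // Evict the least recently used entry when full.
            if self.map.len() == self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.order.push_back(url.clone());
        }
        self.map.insert(url, body);
    }

    /// Returns the cached body, or calls `fetch` (the rate-limited path) once.
    pub fn get_or_fetch<F: FnOnce(&str) -> String>(&mut self, url: &str, fetch: F) -> String {
        if let Some(hit) = self.get(url) {
            return hit;
        }
        let body = fetch(url);
        self.put(url.to_string(), body.clone());
        body
    }
}
```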

Key files

  • shanty-web/src/routes/artists.rs: enrich_artist() function (main bottleneck)
  • shanty-watch/src/library.rs: add_artist() function (uses wrong MB endpoint)
  • shanty-data/src/musicbrainz.rs: get_artist_release_groups(), resolve_release_from_group(), get_release_tracks()
  • shanty-data/src/http.rs: RateLimiter (shared across all MB calls)

Acceptance criteria:

  • get_artist_info() not called twice for same MBID in a single enrichment
  • add_artist() uses release groups (not individual releases) — no redundant work on reissues
  • resolve_release_from_group() eliminated or minimized (first_release_mbid populated from MB response)
  • Artist detail page loads in under 3 seconds for cached artists
  • "Watch All" completes in roughly N × 1.1 s, where N is the number of release groups (not the number of releases)
  • "Watch All" provides progress feedback to the user
  • Uncached artists show partial data quickly while remaining release groups load in background
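For fix 5 and the last two criteria, a sketch of the two phases: split_cached() returns cached tracklists immediately (the quick pass), and fetch_progressively() fetches the remaining release groups one at a time on a worker thread, emitting a Progress message after each that a WebSocket or polling endpoint could relay. All names here are hypothetical, and the real server is likely async rather than thread-based.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical progress message a WebSocket or polling endpoint could relay.
#[derive(Debug, PartialEq)]
pub struct Progress {
    pub done: usize,
    pub total: usize,
    pub release_group_id: String,
    pub tracks: Vec<String>,
}

/// Quick phase: split release groups into those whose tracklists are
/// already cached (returned immediately) and those still to fetch.
pub fn split_cached(
    rg_ids: &[String],
    cache: &HashMap<String, Vec<String>>,
) -> (Vec<(String, Vec<String>)>, Vec<String>) {
    let mut ready = Vec::new();
    let mut pending = Vec::new();
    for id in rg_ids {
        match cache.get(id) {
            Some(tracks) => ready.push((id.clone(), tracks.clone())),
            None => pending.push(id.clone()),
        }
    }
    (ready, pending)
}

/// Slow phase: fetch pending groups one at a time on a worker thread,
/// sending a `Progress` after each so the UI can update incrementally.
pub fn fetch_progressively<F>(pending: Vec<String>, fetch: F) -> mpsc::Receiver<Progress>
where
    F: Fn(&str) -> Vec<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let total = pending.len();
        for (i, id) in pending.into_iter().enumerate() {
            let tracks = fetch(&id); // each real call would wait on the shared limiter
            let msg = Progress { done: i + 1, total, release_group_id: id, tracks };
            if tx.send(msg).is_err() {
                break; // receiver dropped: stop spending MB quota
            }
        }
    });
    rx
}
```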
connor added the HighPriority label 2026-03-21 14:21:29 -04:00
connor started working 2026-03-21 14:22:48 -04:00
Reference: Shanty/Main#45