# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What Is Shanty?

Shanty is a self-hosted music management application ("better Lidarr"). It searches MusicBrainz for music metadata, downloads from YouTube via yt-dlp, tags and organizes files, and serves the library over the Subsonic protocol. It is a Cargo workspace where each component is both a standalone CLI tool and a library consumed by the web app.

## Development Commands

A `justfile` provides common workflows. Run `just` to see all targets.

```sh
just check          # fmt + lint + test (full pre-commit check)
just dev            # build frontend + run server
just build          # cargo build --workspace
just test           # cargo test --workspace
just lint           # cargo clippy --workspace -- -D warnings
just fmt            # cargo fmt
just frontend       # cd shanty-web/frontend && trunk build
just run            # cargo run --bin shanty
```

**Running tests for a single crate:**
```sh
cargo test --package shanty-db
cargo test --package shanty-tag
```

**Running a single test by name:**
```sh
cargo test --package shanty-tag test_tag_with_match
```

**Frontend build (Yew/Trunk → WASM):**
```sh
cd shanty-web/frontend && trunk build              # dev
cd shanty-web/frontend && trunk build --release    # optimized
```

**Running the server with verbose logging:**
```sh
cargo run --bin shanty -- -v      # info
cargo run --bin shanty -- -vv     # debug
cargo run --bin shanty -- -vvv    # trace
```

**MusicBrainz dump import subcommand:**
```sh
cargo run --bin shanty -- mb-import --download --data-dir /path/to/dumps
```

**Prerequisites:** Rust (stable, edition 2024), yt-dlp, ffmpeg, Python 3, ytmusicapi, Trunk. The `rust-toolchain.toml` pins stable and adds the `wasm32-unknown-unknown` target.

## Design Philosophy

1. **Modular crates.** Each crate is a library and a CLI binary. The web app imports the library side; the CLI binary is for standalone use. Crates are git submodules hosted at `ssh://connor@git.rcjohnstone.com:2222/Shanty/{name}.git`. The exceptions are `shanty-config` and `shanty-data`, which are local workspace crates (not submodules).

2. **MBID-first matching.** All matching and deduplication in the web app uses MusicBrainz recording MBIDs, never string-based name matching. Name matching is only used in standalone CLI tools as a fallback.

3. **Provider-swappable data layer.** All external API calls go through trait-based providers in `shanty-data`. Metadata, artist images, bios, lyrics, and cover art each have a trait with multiple implementations. The active provider is selected via config.

4. **Track-level watchlist.** When a user watches an artist or album, it is expanded into individual track `wanted_item` records via MusicBrainz, each with a recording MBID. This enables per-track status tracking through the pipeline.

5. **Release groups, not releases.** The UI shows deduplicated release groups (album concepts), not individual releases, which often include many reissues and regional editions. Release groups are filtered by secondary type; by default only studio releases are shown.
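
As an illustration of principles 2 and 4, the following is a minimal sketch of expanding an album into per-track wanted items while deduplicating by recording MBID. The `MbTrack` and `WantedItem` structs and the `expand_to_wanted` function are hypothetical simplifications, not the actual types in `shanty-watch`.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a track returned by MusicBrainz.
#[derive(Debug, Clone, PartialEq)]
pub struct MbTrack {
    pub recording_mbid: String,
    pub title: String,
}

/// Hypothetical, simplified stand-in for a `wanted_item` row.
#[derive(Debug, Clone, PartialEq)]
pub struct WantedItem {
    pub recording_mbid: String,
    pub title: String,
}

/// Expand a track list into per-track wanted items, skipping any recording
/// whose MBID is already wanted. Titles are never compared.
pub fn expand_to_wanted(
    tracks: &[MbTrack],
    already_wanted: &HashSet<String>,
) -> Vec<WantedItem> {
    let mut seen: HashSet<&str> = already_wanted.iter().map(String::as_str).collect();
    tracks
        .iter()
        // `insert` returns false when the MBID was already present.
        .filter(|t| seen.insert(t.recording_mbid.as_str()))
        .map(|t| WantedItem {
            recording_mbid: t.recording_mbid.clone(),
            title: t.title.clone(),
        })
        .collect()
}
```

Because only MBIDs are compared, two tracks with different titles but the same recording MBID collapse into one wanted item.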

## Workspace Structure

All crates live in the workspace root.

| Crate | Purpose |
|-------|---------|
| `shanty` (root) | Top-level binary entry point, Actix server setup, graceful shutdown, background task spawning |
| `shanty-config` | Shared config types (AppConfig), YAML loading/saving, environment variable overrides |
| `shanty-data` | Unified external data providers: MusicBrainz (remote + local hybrid), Wikipedia, fanart.tv, Last.fm, LRCLIB, Cover Art Archive, MB dump import |
| `shanty-db` | Sea-ORM + SQLite schema, migrations, query modules for all tables |
| `shanty-index` | Scan directories, extract metadata from audio files via lofty |
| `shanty-tag` | MusicBrainz lookup, fuzzy matching, file tag writing |
| `shanty-org` | File organization with configurable format templates |
| `shanty-watch` | Watchlist management, MusicBrainz discography expansion (artist/album to tracks) |
| `shanty-dl` | yt-dlp download backend, rate limiting, download queue processing, ytmusicapi search |
| `shanty-search` | SearchProvider trait, MusicBrainz search + release group listing |
| `shanty-playlist` | Playlist generation strategies (similar-artist, genre, random, smart rules) |
| `shanty-web` | Actix backend routes, Yew (WASM) frontend, background task modules |
| `shanty-notify` | Notifications via Apprise/webhooks (stub -- not yet implemented) |
| `shanty-serve` | Subsonic streaming (stub -- functionality is in shanty-web) |
| `shanty-play` | Built-in web player (stub -- not yet implemented) |

The frontend is at `shanty-web/frontend/` and is excluded from the Cargo workspace. It builds separately with Trunk to `shanty-web/static/`.

## Key Architectural Patterns

### Data Providers (shanty-data)

`shanty-data` owns all external API calls. Key traits:

- `MetadataFetcher` -- artist info, release tracks, release resolution (MusicBrainz)
- `ArtistImageFetcher` -- artist photos (Wikipedia, fanart.tv)
- `ArtistBioFetcher` -- artist biographies (Wikipedia, Last.fm)
- `LyricsFetcher` -- song lyrics (LRCLIB)
- `CoverArtFetcher` -- album art (Cover Art Archive)
- `SimilarArtistFetcher` -- similar artist data (Last.fm)

**HybridMusicBrainzFetcher** wraps `LocalMusicBrainzFetcher` (optional) + `MusicBrainzFetcher` (remote API). It tries the local SQLite database first and falls back to the rate-limited remote API. The local DB is populated by importing MusicBrainz JSON dumps.
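
The local-first fallback can be sketched as follows. This is a simplified, synchronous illustration: the `ArtistLookup` trait, the `Hybrid` wrapper, and `MapSource` are hypothetical names, and the real `MetadataFetcher` trait is presumably async with richer return types. Treating a local error the same as a local miss is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified lookup trait standing in for `MetadataFetcher`.
pub trait ArtistLookup {
    fn artist_name(&self, mbid: &str) -> Result<Option<String>, String>;
}

/// Local-first wrapper: consult the optional local source, then the remote one.
pub struct Hybrid<L, R> {
    pub local: Option<L>,
    pub remote: R,
}

impl<L: ArtistLookup, R: ArtistLookup> ArtistLookup for Hybrid<L, R> {
    fn artist_name(&self, mbid: &str) -> Result<Option<String>, String> {
        if let Some(local) = &self.local {
            // A local hit answers the query without touching the remote API.
            // A local miss or error falls through to the remote source.
            if let Ok(Some(name)) = local.artist_name(mbid) {
                return Ok(Some(name));
            }
        }
        self.remote.artist_name(mbid)
    }
}

/// In-memory source used only to exercise the sketch.
pub struct MapSource(pub HashMap<String, String>);

impl ArtistLookup for MapSource {
    fn artist_name(&self, mbid: &str) -> Result<Option<String>, String> {
        Ok(self.0.get(mbid).cloned())
    }
}
```

With `local: None`, the wrapper degrades to a plain remote fetcher, which matches the "optional" local database described above.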

### Web Server (shanty-web + root binary)

The root `shanty` binary sets up the Actix server, creates shared `AppState`, and spawns background tasks. The `shanty-web` crate provides the route handlers and frontend.

**AppState** holds: database connection, MusicBrainz client (hybrid), search provider, Wikipedia fetcher, shared config (behind `Arc<RwLock>`), task manager, scheduler info, Firefox login session state.

### Background Tasks

Four background loops run via `tokio::spawn` + sleep:

1. **cookie_refresh** -- refreshes YouTube cookies via headless Firefox (every 6 hours, configurable)
2. **pipeline_scheduler** -- runs the full download pipeline automatically (every 3 hours, configurable)
3. **monitor** -- checks monitored artists for new releases (every 12 hours, configurable)
4. **mb_update** -- re-imports MusicBrainz dumps if `auto_update` is enabled (weekly)

One-off tasks (index, tag, organize, download process, monitor check, MB import) are spawned on demand and tracked via `TaskManager`.

### Database

Sea-ORM with SQLite. Migrations run automatically on startup. Key tables:

- `artists` -- name (unique), musicbrainz_id (unique), monitored flag, top_songs/similar_artists (JSON)
- `albums` -- name, album_artist, year, genre, musicbrainz_id, artist_id FK
- `tracks` -- file_path (unique), all metadata fields, musicbrainz_id, artist_id/album_id FKs
- `wanted_items` -- item_type, name, musicbrainz_id, artist_id, status (Wanted/Available/Downloaded/Owned), user_id
- `download_queue` -- query, wanted_item_id FK, status, retry_count
- `search_cache` -- query_key (unique), provider, result_json, expires_at (used for MB data, lyrics, artist enrichment)
- `users` -- username, password_hash (bcrypt), role (Admin/User), subsonic_password (plaintext per Subsonic protocol)
- `playlists` / `playlist_tracks` -- saved playlists with ordered track references
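
To make the `wanted_items` status column concrete, here is a minimal sketch of the four statuses and of the promotion from Downloaded to Owned that the pipeline performs. The `WantedStatus` enum and `promote_downloaded` function are hypothetical; the real code operates on Sea-ORM entities rather than an in-memory slice.

```rust
/// The four `wanted_items` statuses named in the table list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WantedStatus {
    Wanted,
    Available,
    Downloaded,
    Owned,
}

/// Promote every `Downloaded` item to `Owned`; other statuses are untouched.
/// Returns how many items were promoted.
pub fn promote_downloaded(statuses: &mut [WantedStatus]) -> usize {
    let mut promoted = 0;
    for status in statuses.iter_mut() {
        if *status == WantedStatus::Downloaded {
            *status = WantedStatus::Owned;
            promoted += 1;
        }
    }
    promoted
}
```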

### Subsonic API

Mounted at `/rest/*` with separate authentication (username + MD5 token, per the Subsonic protocol spec). Supports browsing, streaming, playlists, search, cover art, and scrobbling. Opus files are auto-transcoded to MP3 via ffmpeg for client compatibility.

### Frontend

Yew 0.21 with client-side rendering (CSR, no SSR). Built with Trunk to WASM. The compiled output goes to `shanty-web/static/` and is served by Actix with SPA fallback (all non-API routes serve `index.html`).

## Data Flow (Pipeline)

The full automated pipeline, triggered by "Set Sail" in the UI or by the pipeline scheduler:

1. **Search** -- find artist/album on MusicBrainz
2. **Watch** -- add to watchlist (expands to individual track wanted_items with recording MBIDs)
3. **Sync** -- `shanty_dl::sync_wanted_to_queue` creates download_queue entries for wanted items
4. **Download** -- yt-dlp downloads via YouTube Music search (ytmusicapi Python script), creates track records in DB with MBIDs from wanted_items
5. **Index** -- scan library, extract metadata from new files
6. **Tag** -- MusicBrainz lookup by MBID (skips search since MBID is known), write tags to files
7. **Organize** -- move files to `{artist}/{album}/{track_number} - {title}.{ext}` in the library
8. **Promote** -- all Downloaded wanted_items are marked as Owned
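
The Organize step's path template can be illustrated with a small renderer. This is a sketch under assumptions: the `TrackMeta` struct, the `sanitize` rules, and the two-digit zero padding of `{track_number}` are hypothetical and may differ from what `shanty-org` actually does.

```rust
/// Hypothetical metadata needed to build a library path.
pub struct TrackMeta<'a> {
    pub artist: &'a str,
    pub album: &'a str,
    pub track_number: u32,
    pub title: &'a str,
    pub ext: &'a str,
}

/// Replace characters that are unsafe in file names. The exact rules Shanty
/// applies are not documented here; this character set is an assumption.
fn sanitize(component: &str) -> String {
    component
        .chars()
        .map(|c| match c {
            '/' | '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|' => '_',
            other => other,
        })
        .collect()
}

/// Fill the placeholders of a format template with sanitized values.
/// Placeholders are substituted one after another, so a value that itself
/// contains a placeholder such as `{title}` would be expanded again.
pub fn render(template: &str, m: &TrackMeta<'_>) -> String {
    template
        .replace("{artist}", &sanitize(m.artist))
        .replace("{album}", &sanitize(m.album))
        .replace("{track_number}", &format!("{:02}", m.track_number))
        .replace("{title}", &sanitize(m.title))
        .replace("{ext}", m.ext)
}
```

Sanitizing each component before substitution keeps a `/` inside an artist name from creating an extra directory level.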

## Configuration

YAML config file at `~/.config/shanty/config.yaml` (or `SHANTY_CONFIG` env var). Environment variables override YAML values. In Docker, the config file is at `/config/config.yaml`.

Key environment variables:
- `SHANTY_DATA_DIR` -- base directory for all application data (Docker: `/data`)
- `SHANTY_DATABASE_URL` -- SQLite connection string
- `SHANTY_LIBRARY_PATH` -- music library root
- `SHANTY_CONFIG` -- path to config YAML
- `SHANTY_WEB_PORT` / `SHANTY_WEB_BIND` -- server binding
- `SHANTY_LASTFM_API_KEY` -- Last.fm API key (for bios and similar-artist playlists)
- `SHANTY_FANART_API_KEY` -- fanart.tv API key (for artist images/banners)
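
A minimal sketch of how environment variables might be layered over YAML values is shown below. The `AppConfig` fields here are a hypothetical subset, and the `lookup` parameter is an assumption introduced so the logic can be exercised without touching the process environment; the real `apply_env_overrides` may have a different signature.

```rust
/// Hypothetical subset of `AppConfig`; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct AppConfig {
    pub library_path: String,
    pub web_port: u16,
    pub web_bind: String,
}

/// Apply environment overrides on top of values loaded from YAML.
/// `lookup` abstracts over `std::env::var` so the logic is testable.
pub fn apply_env_overrides(cfg: &mut AppConfig, lookup: impl Fn(&str) -> Option<String>) {
    if let Some(v) = lookup("SHANTY_LIBRARY_PATH") {
        cfg.library_path = v;
    }
    if let Some(v) = lookup("SHANTY_WEB_BIND") {
        cfg.web_bind = v;
    }
    // An unparsable port is ignored here, keeping the YAML value; how the
    // real code treats bad input is not specified.
    if let Some(port) = lookup("SHANTY_WEB_PORT").and_then(|v| v.parse::<u16>().ok()) {
        cfg.web_port = port;
    }
}
```

In production code the call would look like `apply_env_overrides(&mut cfg, |key| std::env::var(key).ok())`.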

The config is loaded once at startup and held in `Arc<RwLock<AppConfig>>`. It can be updated at runtime via the `/api/config` PUT endpoint, which writes back to the YAML file and updates the in-memory config.
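
The runtime update path can be sketched like this. It assumes `std::sync::RwLock` (the real code may use an async lock), a one-field `AppConfig`, and a hypothetical `persist` callback standing in for the YAML write. Persisting before swapping the in-memory value is a design choice of this sketch, not a confirmed detail of the `/api/config` handler.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical one-field stand-in for the real `AppConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct AppConfig {
    pub web_port: u16,
}

pub type SharedConfig = Arc<RwLock<AppConfig>>;

/// Replace the in-memory config after a successful persist step.
/// `persist` stands in for writing the YAML file.
pub fn update_config<E>(
    shared: &SharedConfig,
    new_cfg: AppConfig,
    persist: impl FnOnce(&AppConfig) -> Result<(), E>,
) -> Result<(), E> {
    // Write to disk first so memory never holds a config that failed to save.
    persist(&new_cfg)?;
    let mut guard = shared.write().expect("config lock poisoned");
    *guard = new_cfg;
    Ok(())
}

/// Read a value through the shared lock; readers do not block each other.
pub fn current_port(shared: &SharedConfig) -> u16 {
    shared.read().expect("config lock poisoned").web_port
}
```

Every clone of the `Arc` sees the new value after the write lock is released, so request handlers pick up the change without a restart.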

## Coding Standards

- **Rust edition 2024** with resolver v3
- `cargo clippy -- -D warnings` must pass
- `cargo fmt` for formatting
- All crates must compile independently
- Never use `#[allow(dead_code)]` -- remove dead code instead
- Never create local DB records for artists the user is just browsing (only persist when they watch)
- Use MBIDs for all matching in the web app, never name-based matching
- Artist credits: use the primary (first) artist only, never concatenate collaborators

## Testing

```sh
cargo test --workspace           # all tests
cargo test --package shanty-db   # single crate
```

The frontend is excluded from workspace tests (it has its own build process via Trunk).

**Test patterns used across crates:**
- **In-memory SQLite:** Integration tests call `Database::new("sqlite::memory:")` for fast, isolated DB testing. There is no fixtures directory -- data is inserted programmatically.
- **Mock providers:** Each crate that depends on external APIs defines its own mock trait implementations (e.g., `MockProvider` for `MetadataProvider`, `MockBackend` for `DownloadBackend`). Mocks are self-contained per crate.
- **Temp files:** `tempfile::TempDir` + `lofty` to create real MP3 files with valid ID3 tags for index/org/tag tests.
- **Integration tests:** `{crate}/tests/integration.rs` (async via `#[tokio::test]`)
- **Unit tests:** `#[cfg(test)]` modules inline in source files for pure functions (sanitization, parsing, normalization, template rendering).
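
The mock-provider pattern can be sketched as follows. The names `MetadataProvider` and `MockProvider` come from the list above, but the trait method, the builder helper, and `tag_title` are hypothetical; the real trait is likely async and returns richer types.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for the real `MetadataProvider` trait.
pub trait MetadataProvider {
    fn recording_title(&self, mbid: &str) -> Option<String>;
}

/// Canned responses keyed by MBID; no network access.
#[derive(Default)]
pub struct MockProvider {
    titles: HashMap<String, String>,
}

impl MockProvider {
    /// Builder-style helper for registering a canned response.
    pub fn with_recording(mut self, mbid: &str, title: &str) -> Self {
        self.titles.insert(mbid.to_string(), title.to_string());
        self
    }
}

impl MetadataProvider for MockProvider {
    fn recording_title(&self, mbid: &str) -> Option<String> {
        self.titles.get(mbid).cloned()
    }
}

/// Code under test depends on the trait, not on a concrete client.
pub fn tag_title(provider: &dyn MetadataProvider, mbid: &str) -> String {
    provider
        .recording_title(mbid)
        .unwrap_or_else(|| "Unknown".to_string())
}
```

Keeping the mock inside the crate that uses it avoids a shared test-utilities dependency, at the cost of some duplication between crates.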

## Important Constraints

- **MusicBrainz rate limit:** 1 request per 1.1 seconds for the remote API. Mitigated by the local SQLite database (imported from MB dumps) and aggressive caching in `search_cache`.
- **YouTube cookies expire** roughly every 2 weeks. They are auto-refreshed by headless Firefox every 6 hours when `cookie_refresh` is enabled.
- **Session key is random on startup** -- user sessions do not survive server restarts.
- **Subsonic password is stored in plaintext** per the Subsonic protocol specification. Users are warned about this in the UI.
- **Opus transcoding** for Subsonic clients transcodes the entire file into memory before streaming, which is not ideal for very large files.
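
The MusicBrainz spacing rule from the first constraint can be sketched as a small limiter that reports how long to wait before the next remote request. The `RateLimiter` type and its `reserve` method are hypothetical; passing `now` explicitly is an assumption made so the logic can be tested deterministically.

```rust
use std::time::{Duration, Instant};

/// Minimum spacing between remote MusicBrainz requests (1.1 seconds).
pub const MIN_INTERVAL: Duration = Duration::from_millis(1100);

/// Tracks when the most recent request was (or will be) sent.
#[derive(Default)]
pub struct RateLimiter {
    last: Option<Instant>,
}

impl RateLimiter {
    /// Return how long the caller must wait before sending a request at `now`.
    /// Records the send time, assuming the caller sleeps for the returned delay.
    pub fn reserve(&mut self, now: Instant) -> Duration {
        let wait = match self.last {
            Some(last) => MIN_INTERVAL.saturating_sub(now.saturating_duration_since(last)),
            None => Duration::ZERO,
        };
        self.last = Some(now + wait);
        wait
    }
}
```

A caller would sleep for the returned duration (for example with `tokio::time::sleep`) and then issue the request; hits served from the local database or `search_cache` never need to call `reserve`.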

## Making Changes

- Backend route changes: edit files in `shanty-web/src/routes/`
- Frontend changes: edit files in `shanty-web/frontend/src/`, then `cd shanty-web/frontend && trunk build`
- Config changes: update `shanty-config/src/lib.rs` (add field + default), update `apply_env_overrides` if adding env var support
- Database schema changes: add a migration in `shanty-db`, update entities and queries
- Adding a new external data source: add a provider implementation in `shanty-data` behind the appropriate trait
- Each crate submodule must be committed and pushed independently before updating the parent workspace