What You'll Build
A local, offline pipeline that turns structured tag prompts (instrument → timbre → FX → key → bars → BPM) into tempo-synced, bar-aligned music loops on your Radeon RX 7800 XT (RDNA3, Navi 32, gfx1101) through the ROCm stack. Foundation-1 is a fine-tune of stabilityai/stable-audio-open-1.0 trained for music-production workflows; the RC Stable Audio Tools fork handles the BPM/bar timing alignment automatically. The model is pure PyTorch, so it runs through PyTorch's native attention path on ROCm with no special kernels.
Hardware data: RX 7800 XT (16GB VRAM) · BF16/FP16 · PyTorch SDPA on ROCm 7.2 · ~7 GB usage per the HuggingFace model card (comfortable headroom on a 16 GB card) · See benchmark data
⚠️ This is a ROCm recipe, not CUDA. The RX 7800 XT runs on AMD's ROCm/HIP stack: there is no cu121/cu124/cu128 wheel here, no xformers install, and no FP8/FP4 path. You install PyTorch built for ROCm, not for CUDA. RDNA3 has no FP8/FP4 hardware (its WMMA units accept FP16, BF16, INT8, INT4 only), and at 16 GB Foundation-1's ~7 GB footprint stays comfortably under the ceiling, so you simply run the native BF16/FP16 weights. The attention path is PyTorch's scaled-dot-product attention (SDPA), which is exactly what stable-audio-tools uses on a PyTorch backend; no FlashAttention build is needed or wanted. If a guide tells you to pip install xformers, build a flash-attn wheel, or pick a cu12x torch wheel for this card, it's written for the wrong vendor.
ℹ️ Not a text-to-speech model. Foundation-1 is in our tts vertical because the catalogue groups all audio models together, but it generates one-shot music samples (bar-locked instrumental loops), not speech. It does not synthesize voices, words, or any spoken audio. For speech synthesis on this GPU, see Kokoro, VoxCPM, or Qwen3-TTS instead. Per its own HuggingFace card, it is "Structured text-to-sample generation for modern music production": a specialized model for music sample generation, not a general-purpose music generator.
⚠️ Split license — read before shipping. The code and the weights carry two different licenses; do not conflate them.
- Code (the RC Stable Audio Tools fork) is MIT-licensed — see the RC fork repository.
- Weights (Foundation-1 model) are released under the Stability AI Community License. The HuggingFace card states the model "is available for non-commercial use or limited commercial use by entities with annual revenues below USD $1M."
If you're a hobbyist or under the $1M revenue threshold you're clear; otherwise contact Stability AI for a commercial license before publishing or selling outputs. The MIT code permission does not extend the weights' usage terms, and the Stability license on the weights does not restrict the MIT-licensed code.
Requirements
| Component | Minimum | Tested |
|---|---|---|
| GPU | 8 GB VRAM (ROCm-supported AMD card, per HF card) | RX 7800 XT (16 GB, ~9 GB headroom) |
| RAM | 16 GB system RAM | — |
| Storage | ~3 GB (2.43 GB weights + venv + deps) | — |
| Driver | AMD ROCm 7.2.x on Linux | — |
| Python | 3.10 (3.11+ may fail SciPy resolution per the RC fork README) | — |
| PyTorch | 2.4+ built for ROCm (whl/rocm7.2 index — NOT a CUDA wheel) | — |
| Software | RC Stable Audio Tools fork or ComfyUI custom node | — |
Installation
This recipe follows the canonical workflow recommended on the Foundation-1 model card — the RC Stable Audio Tools fork, which auto-handles BPM/bar timing alignment — but installs PyTorch for ROCm instead of CUDA. For a ComfyUI alternative, see Troubleshooting.
1. Clone the RC Stable Audio Tools fork
git clone https://github.com/RoyalCities/RC-stable-audio-tools.git
cd RC-stable-audio-tools
2. Create a Python 3.10 virtual environment
python3.10 -m venv venv
source venv/bin/activate
3. Install PyTorch for ROCm (do this BEFORE the package install)
The RX 7800 XT (gfx1101) is an officially ROCm-supported GPU on Linux, so it uses the stable ROCm PyTorch wheel. Install torch from the ROCm wheel index first so the later pip install doesn't pull a CPU/CUDA build:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2
ℹ️ Verify the ROCm tag before you copy it. The rocmX.Y tag in that index moves over time (6.3 → 6.4 → 7.x). Read the current stable line at pytorch.org/get-started/locally (select ROCm) before running. AMD also ships its own Radeon-tuned wheels at repo.radeon.com if you prefer the AMD-recommended build; the upstream whl/rocm7.2 wheel above is the simplest canonical path on a supported card.
Confirm you got the ROCm build, not a CUDA or CPU one:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
It should print a +rocm7.2-style version suffix and True (ROCm masquerades as the cuda device namespace under HIP).
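For a slightly more explicit check than the one-liner, the build type can be read from torch's version metadata. This is a minimal sketch, assuming `torch.version.hip` is set on ROCm builds and `torch.version.cuda` on CUDA builds (each is `None` otherwise); the helper name `classify_torch_build` is illustrative and not part of any library.

```python
from typing import Optional


def classify_torch_build(hip: Optional[str], cuda: Optional[str]) -> str:
    """Label a PyTorch build from its torch.version.hip / torch.version.cuda fields."""
    if hip:
        return "rocm"
    if cuda:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        kind = classify_torch_build(torch.version.hip, torch.version.cuda)
        print(f"torch {torch.__version__}: {kind} build, "
              f"GPU visible: {torch.cuda.is_available()}")
        if kind != "rocm":
            # Hypothetical hint text; the fix itself is step 3 of this guide.
            print("Not a ROCm build; reinstall from the ROCm wheel index (step 3).")
```

Running it again after step 4 is a quick way to notice whether a dependency pin swapped the wheel.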
4. Install stable-audio-tools and the fork
pip install stable-audio-tools
pip install .
If this step replaces your ROCm torch with a CPU or CUDA wheel (it can, because of dependency pins), re-run step 3 to reinstall the ROCm build, then re-check torch.cuda.is_available().
5. Download Foundation-1 weights
Place both files inside a single subfolder of models/:
mkdir -p models/Foundation-1
cd models/Foundation-1
curl -L -o Foundation_1.safetensors \
https://huggingface.co/RoyalCities/Foundation-1/resolve/main/Foundation_1.safetensors
curl -L -o model_config.json \
https://huggingface.co/RoyalCities/Foundation-1/resolve/main/model_config.json
cd ../..
The safetensors file is 2.43 GB (2,426,992,388 bytes, per the x-linked-size header returned by a live HEAD request; the same size appears on the HF Files tab). This release ships only the 16-bit weights. Per the card, "Unlike prior releases where both 32-bit and 16-bit models were provided, this release includes only the 16-bit version. There is no quality loss, while reducing the model footprint." On RDNA3 these 16-bit weights load and run as BF16/FP16 directly; there is no quantization step to do and no FP8 path to chase.
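A truncated download is easy to miss, so it can help to compare the file on disk against the byte count quoted above. The following is a small sketch using only the standard library; the helper name `size_matches` is made up for illustration, and the expected size is the figure from this guide, which would change if the upstream file is ever re-uploaded.

```python
from pathlib import Path

# Size of Foundation_1.safetensors as quoted in this guide (bytes).
EXPECTED_BYTES = 2_426_992_388


def size_matches(path: Path, expected: int = EXPECTED_BYTES) -> bool:
    """Return True if the file exists and has exactly the expected byte count."""
    return path.is_file() and path.stat().st_size == expected


if __name__ == "__main__":
    # Path assumes the models/Foundation-1 layout created in step 5.
    ckpt = Path("models/Foundation-1/Foundation_1.safetensors")
    if size_matches(ckpt):
        print("checkpoint size OK")
    else:
        print("checkpoint missing or the wrong size; re-run the curl download")
```

A size match does not prove integrity the way a checksum would; it only catches interrupted or redirected downloads.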
Running
Launch the Gradio UI, pointing at the Foundation-1 checkpoint and config you just downloaded:
python run_gradio.py \
--model-config models/Foundation-1/model_config.json \
--ckpt-path models/Foundation-1/Foundation_1.safetensors
The Gradio interface opens in your browser. Foundation-1 uses a layered tag prompt schema documented on its model card:
[Instrument Family / Sub-Family], [Timbre], [Musical Behavior / Notation], [FX], [Key], [Bars], [BPM]
A working example prompt from the card's Audio Showcase:
Bass, FM Bass, Medium Delay, Medium Reverb, Low Distortion, Phaser, Sub Bass,
Bass, Upper Mids, Acid, Gritty, Wide, Dubstep, Thick, Silky, Warm, Rich,
Overdriven, Crisp, Deep, Clean, Pitch Bend, 303, 8 Bars, 140 BPM, E minor
Supported loop structures: 4 or 8 bars; supported BPMs: 100, 110, 120, 128, 130, 140, 150. The RC fork's BPM/bar selector locks generation duration to the prompt's musical structure automatically — per the card, "an 8-bar loop at 100 BPM ≈ 19 seconds" of output. The underlying Stable Audio Open base outputs 44.1 kHz stereo; Foundation-1 is constrained to the bar/BPM grid above.
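If you generate many prompts, a tiny helper can keep the tag order consistent and reject bar counts or tempos outside the supported grid. This is only a sketch: `build_prompt` and its parameters are hypothetical names, not part of the RC fork, and the ordering (descriptive tags, then key, bars, BPM) follows the schema line above rather than any documented parser.

```python
SUPPORTED_BARS = (4, 8)
SUPPORTED_BPM = (100, 110, 120, 128, 130, 140, 150)


def build_prompt(tags: list[str], key: str, bars: int, bpm: int) -> str:
    """Join descriptive tags, then key, bars, and BPM into one comma-separated prompt."""
    if bars not in SUPPORTED_BARS:
        raise ValueError(f"bars must be one of {SUPPORTED_BARS}, got {bars}")
    if bpm not in SUPPORTED_BPM:
        raise ValueError(f"bpm must be one of {SUPPORTED_BPM}, got {bpm}")
    # Drop empty tags and stray whitespace so the output has no blank fields.
    parts = [t.strip() for t in tags if t.strip()]
    parts += [key, f"{bars} Bars", f"{bpm} BPM"]
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(["Bass", "FM Bass", "Warm", "Medium Reverb"], "E minor", 8, 140))
```

The resulting string is pasted into the Gradio prompt field as usual; the helper does not talk to the model.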
Because the model is pure PyTorch, attention runs through PyTorch's built-in scaled-dot-product attention (SDPA) on ROCm — the default, no flag or extra package required. Do not install xformers or a flash-attn wheel for this; on RDNA3 they are the wrong path and SDPA is what the stack already uses.
Results
- Speed: The model card reports generation time on an RTX 3090 only: "On an RTX 3090, generation time is approximately ~7–8 seconds per sample." The RTX 3090 is an NVIDIA Ampere card on CUDA, which is comparable to the RDNA3 Radeon RX 7800 XT on ROCm in neither architecture nor runtime, so that figure does not transfer as a 7800 XT number. No generation-time measurement taken on an RX 7800 XT was found, so the Speed figure is omitted rather than borrowed from a different GPU and stack. Once a community benchmark lands it will appear at /check/foundation-1/rx-7800-xt; please contribute yours via the submission form.
- VRAM usage: ~7 GB during generation per the HF card: "Typical VRAM usage during generation is approximately ~7 GB. For reliable operation, a GPU with at least 8 GB of VRAM is recommended." On the RX 7800 XT's 16 GB that leaves roughly 9 GB free — comfortable enough to keep a generation session running without memory pressure.
- Output: stereo .wav loops aligned to the requested bar count and BPM. Per the model card limitations, percussion and drum sounds are out of scope for this release; the 10 instrument families covered are Synth, Keys, Bass, Bowed Strings, Mallet, Wind, Guitar, Brass, Vocal, and Plucked Strings.
For the full benchmark data, see /check/foundation-1/rx-7800-xt.
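To see the VRAM headroom on your own card before a session, free and total memory can be queried through torch. The sketch below assumes `torch.cuda.mem_get_info()` is available in your ROCm build (it returns free and total bytes for the current device) and treats the card's "~7 GB" figure loosely as 7 GiB; `headroom_gib` is an illustrative helper name.

```python
GIB = 1024 ** 3  # bytes per GiB


def headroom_gib(free_bytes: int, needed_gib: float = 7.0) -> float:
    """GiB left over after reserving the model's expected footprint (negative if short)."""
    return free_bytes / GIB - needed_gib


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        free, total = torch.cuda.mem_get_info()  # (free, total) in bytes
        print(f"free {free / GIB:.1f} GiB of {total / GIB:.1f} GiB, "
              f"headroom after a ~7 GiB model: {headroom_gib(free):.1f} GiB")
    else:
        print("no ROCm/CUDA device visible to torch")
```

A negative headroom value usually means another process (a desktop compositor, a second model) is already holding memory on the card.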
Troubleshooting
Gradio launches but reports torch.cuda.is_available() == False
On ROCm, torch.cuda.is_available() returning False almost always means a CPU-only or CUDA torch wheel got installed instead of the ROCm build (commonly because the pip install . step in installation pulled a different torch via a dependency pin). Reinstall the ROCm wheel:
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.2
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
It should print a +rocm7.2 suffix and True. (Under HIP, ROCm exposes the GPU through the cuda device namespace, so torch.cuda.* is the correct API even on AMD.) If it still reports False, confirm your system ROCm stack is installed (rocminfo should list gfx1101) and that your user is in the render/video groups.
A library ships only gfx1100 kernels and won't load on the 7800 XT
The 7800 XT is gfx1101 (Navi 32), while the flagship 7900 XTX is gfx1100 (Navi 31). Most of the ROCm stack ships kernels for both, but occasionally a library or prebuilt extension only carries gfx1100 kernels and refuses to run on gfx1101 with a "no kernel image is available" / missing-gfx1101-kernel error. The standard Linux-only fallback is to mask the card as gfx1100 at runtime:
HSA_OVERRIDE_GFX_VERSION=11.0.0 python run_gradio.py \
--model-config models/Foundation-1/model_config.json \
--ckpt-path models/Foundation-1/Foundation_1.safetensors
This is a legacy fallback, not a default — current PyTorch on the stable ROCm wheel runs Foundation-1 natively on gfx1101 without it. Only reach for it if a specific library refuses to load on the 7800 XT's gfx1101 target.
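If you launch the UI from a script rather than a shell, the same override can be applied to the child process only, leaving the parent environment untouched. This is a sketch under the assumptions already stated in this section (the variable name and the 11.0.0 value come from the command above); `env_with_gfx_override` and `launch_gradio` are hypothetical helper names.

```python
import os
import subprocess
import sys
from collections.abc import Mapping


def env_with_gfx_override(base: Mapping[str, str], version: str = "11.0.0") -> dict[str, str]:
    """Copy an environment mapping and set HSA_OVERRIDE_GFX_VERSION on the copy."""
    env = dict(base)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env


def launch_gradio(config: str, ckpt: str) -> int:
    """Run run_gradio.py with the override applied; returns the process exit code."""
    cmd = [sys.executable, "run_gradio.py",
           "--model-config", config, "--ckpt-path", ckpt]
    return subprocess.run(cmd, env=env_with_gfx_override(os.environ)).returncode
```

Calling `launch_gradio("models/Foundation-1/model_config.json", "models/Foundation-1/Foundation_1.safetensors")` from the repository root would mirror the shell command above. Scoping the override to one process keeps it from leaking into other ROCm workloads that run fine on gfx1101.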
Dependency resolution failures on Python 3.11+
The RC fork's README explicitly notes to use Python 3.10: "Newer versions (e.g. 3.11+) can fail dependency resolution due to pinned packages (notably older SciPy wheels)." Use a Python 3.10 venv as in step 2.
Prefer ComfyUI over Gradio
Two community ComfyUI custom nodes exist:
- Saganaki22/ComfyUI-Foundation-1: auto-downloads weights into ComfyUI/models/stable_audio/Foundation-1/ and ships example workflows. Install via ComfyUI Manager (recommended), or git clone into ComfyUI/custom_nodes/ and then run python install.py. The install script uses pip install stable-audio-tools --no-deps to protect your ComfyUI environment from the upstream's aggressive pandas==2.0.2 pin.
- SanDiegoDude/scg_Foundation-1-comfyUI: install via ComfyUI Manager (recommended); weights land at ComfyUI/models/audio_checkpoints/Foundation-1/.
If you go the ComfyUI route on this card, make sure ComfyUI's own PyTorch is the ROCm build (--index-url https://download.pytorch.org/whl/rocm7.2), exactly as in step 3 — the same ROCm-not-CUDA rule applies. The same ~7 GB VRAM envelope and 8 GB minimum apply regardless of front-end; on the 16 GB 7800 XT both nodes have comfortable headroom.
Want to share the card with a larger model? Enable INT4 / Low-VRAM Mode (TorchAO)
You do not need this on a 16 GB card — the default BF16 path fits Foundation-1's ~7 GB footprint with room to spare — but the RC fork ships an optional INT4 weight-only mode (via TorchAO) you can use if you want to run Foundation-1 alongside a larger model:
pip install torchao
The fork notes INT4 inference "can be very slow on Windows because Triton fast-kernels are usually unavailable (falls back to slower paths)." On Linux/ROCm the Triton-ROCm backend is present, but INT4 weight-only is still a memory-sharing convenience, not a speed win — on a 7800 XT you'd normally leave it off and run BF16. Note that RDNA3 has no FP8/FP4 hardware, so INT4 (which maps to the WMMA INT4 path) is the only sub-16-bit weight format that makes sense here; there is no FP8 option to consider. If TorchAO isn't installed, the INT4 toggle stays hidden in the UI.
Prompt produces drift or incoherent phrases
Per the model card's Limitations section, if generation duration doesn't match the prompt's bar/BPM structure (e.g. requesting an 8-bar loop but capping output at 5 seconds), output coherence degrades. The RC fork handles this alignment automatically. If you're using bare stable-audio-tools or a third-party UI, set the audio duration manually to bars × 4 × (60 / BPM) seconds, where 4 is the number of beats per bar; this matches the card's example of an 8-bar loop at 100 BPM ≈ 19 seconds. Also: keep prompts in the documented tag order, use 1–3 timbre descriptors, and always include both Bars and BPM.
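The duration arithmetic is simple enough to script when you are setting it by hand. A minimal sketch, assuming 4 beats per bar (which reproduces the card's 8-bar, 100 BPM ≈ 19 s example); `loop_seconds` is an illustrative name, not a function from stable-audio-tools.

```python
def loop_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Loop duration in seconds: bars x beats per bar x seconds per beat."""
    if bars <= 0 or bpm <= 0 or beats_per_bar <= 0:
        raise ValueError("bars, bpm, and beats_per_bar must all be positive")
    return bars * beats_per_bar * 60.0 / bpm


if __name__ == "__main__":
    # Print the duration for each supported bar count and tempo.
    for bars in (4, 8):
        for bpm in (100, 110, 120, 128, 130, 140, 150):
            print(f"{bars} bars @ {bpm} BPM: {loop_seconds(bars, bpm):.2f} s")
```

For example, `loop_seconds(8, 100)` gives 19.2 and `loop_seconds(4, 150)` gives 6.4; enter the value for your prompt as the duration in whichever front-end you use.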
Percussion or drum prompts produce garbage
By design. The card lists "Percussion and drum sounds are outside the scope of this release." Use a different tool (e.g. a drum sample library or a percussion-specific model) for drum loops.
No widely-reported issues on RX 7800 XT specifically — if you hit one, report it via the submission form.