# Common Issues

Solutions for frequent problems with Yule.
## Vulkan Not Found

Symptom: `yule run --backend vulkan` fails with a Vulkan initialization error, or `--backend auto` silently falls back to CPU.
Causes:
- No Vulkan-capable GPU driver installed
- Missing Vulkan ICD (Installable Client Driver) loader
- Running inside a VM without GPU passthrough
Fixes:
- Install your GPU vendor's latest driver (NVIDIA, AMD, or Intel)
- Verify Vulkan works: run `vulkaninfo` (from the Vulkan SDK or the `vulkan-tools` package)
- On Linux, ensure `libvulkan.so` is on `LD_LIBRARY_PATH`
- On Windows, Vulkan ships with the GPU driver; update your driver
- If no discrete GPU is available, use `--backend cpu`
## Model Too Large for VRAM

Symptom: out-of-memory error during model loading with the Vulkan backend.
Fixes:
- Use a smaller quantization: Q4_0 (~0.5 GB per billion params) vs Q6_K (~0.75 GB per billion params)
- Fall back to CPU: `--backend cpu` uses system RAM instead of VRAM
- Check available VRAM: `nvidia-smi` (NVIDIA) or `rocm-smi` (AMD)
Rule of thumb: A 7B model at Q4_K_M needs ~4.5 GB VRAM. A 13B model needs ~8 GB.
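The rule of thumb can be turned into a quick estimate. The GB-per-billion figures for Q4_0 and Q6_K come from this section; the Q4_K_M value and the flat overhead are illustrative assumptions chosen so a 7B model lands near the ~4.5 GB figure, not exact numbers for any runtime:

```python
# Back-of-the-envelope VRAM estimate from the figures in this section.
GB_PER_BILLION = {
    "Q4_0": 0.5,
    "Q4_K_M": 0.57,   # assumed: interpolated so 7B + overhead lands near ~4.5 GB
    "Q6_K": 0.75,
}

def estimate_vram_gb(quant: str, billions_of_params: float, overhead_gb: float = 0.5) -> float:
    """Rough VRAM needed: weights plus a flat allowance for KV cache and scratch buffers."""
    return billions_of_params * GB_PER_BILLION[quant] + overhead_gb

print(round(estimate_vram_gb("Q4_K_M", 7), 1))   # ~4.5, matching the rule of thumb
```

If the estimate exceeds your VRAM, drop to a smaller quant or `--backend cpu`.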
## Sandbox Blocks GPU

Symptom: the Vulkan backend fails when sandboxing is active, but works with `--no-sandbox`.

Cause: the sandbox restricts access to GPU device nodes by default. `yule run` enables GPU access automatically when `--backend` is not `cpu`, but `yule serve` disables GPU access since it's an API server.
Fixes:
- For `yule run`: GPU access should work automatically. If not, file a bug.
- For `yule serve`: GPU inference in the API server is not yet supported; the server runs CPU inference. Use `yule run` for GPU-accelerated inference.
- As a last resort, `--no-sandbox` disables all sandboxing (not recommended for untrusted models).
## Authentication Errors (401 Unauthorized)

Symptom: API requests return `401 Unauthorized`.
Common mistakes:
- Missing
Bearerprefix: The header must beAuthorization: Bearer yule_..., not justAuthorization: yule_... - Wrong token: Each
yule servesession generates a new token (printed to stderr on startup). Tokens from previous sessions are invalid. - Extra whitespace or newlines in the token string
Verify:
```sh
# Copy the token exactly from the serve output
TOKEN="yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc"
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/yule/health
```

## Model Parse Errors
Symptom: `PARSE FAILED` or `invalid magic` errors when loading a model.
Causes:
- File is not a valid GGUF file (wrong format, corrupt download, or truncated)
- Unsupported GGUF version (Yule supports v3)
- File is a safetensors/PyTorch model (not yet supported)
Fixes:
- Verify the file: `yule verify ./model.gguf`; if this fails, the file is corrupt
- Re-download the model: `yule pull publisher/repo`
- Ensure the file is GGUF format (`.gguf` extension, downloaded from a GGUF-tagged repo)
- Check the file size: if it's much smaller than expected, the download may have been truncated
## Slow Performance

Symptom: low tok/s during inference.
Checklist:
- Check which backend is active: look at the startup output for `backend: vulkan` vs `backend: cpu`. Vulkan gives ~8.5x speedup on quantized models.
- Does the model fit in VRAM? If the model spills to system RAM, GPU inference stalls. Use a smaller quant.
- Build in release mode: `cargo build --release`. Debug builds are 10-50x slower.
- AVX2 support: on x86_64, Yule auto-detects AVX2 for SIMD acceleration. Older CPUs without AVX2 use a scalar fallback (~3-5x slower).
- Transparent Huge Pages (Linux): check `cat /sys/kernel/mm/transparent_hugepage/enabled`. `[always]` or `[madvise]` is ideal; this reduces TLB misses for large model files.
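On Linux you can confirm AVX2 support by looking for the `avx2` flag in `/proc/cpuinfo`. A small sketch; the parsing helper is illustrative, not part of Yule:

```python
def has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Scan /proc/cpuinfo-style text for a CPU feature flag (e.g. 'avx2').
    Flags appear as a space-separated list after 'flags :' on x86_64."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.split(":", 1)[1].split()
    return False

# On a Linux machine:
# with open("/proc/cpuinfo") as f:
#     print("AVX2:", has_flag(f.read(), "avx2"))
```

If `avx2` is absent, the scalar fallback explains much of the slowdown and a GPU backend or smaller model is the practical fix.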
## SSE Streaming Not Working
Symptom: Streaming chat requests hang or return the entire response at once.
Causes:
- Client doesn't handle the `text/event-stream` content type
- A proxy or reverse proxy is buffering SSE events
- Firewall closing long-lived connections
Fixes:
- Set `stream: true` in the request body
- If behind nginx: add `proxy_buffering off;` and `proxy_cache off;` to the location block
- Test directly without a proxy first:

```sh
curl -N -H "Authorization: Bearer $TOKEN" \
  -d '{"messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:11434/yule/chat
```
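When debugging a client, it helps to parse the raw `text/event-stream` body yourself. The sketch below follows standard SSE framing (`data:` lines, blank line ends an event); the `token` field in the sample payloads is an assumption about the event shape, not a documented Yule schema:

```python
def parse_sse(raw: str) -> list[str]:
    """Split a text/event-stream body into the 'data:' payload of each event.
    Per SSE framing, an event is one or more 'data:' lines followed by a
    blank line; other fields (event:, id:, retry:) are ignored here."""
    events, data_lines = [], []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    return events

sample = 'data: {"token": "Hel"}\n\ndata: {"token": "lo"}\n\n'
print(parse_sse(sample))  # → ['{"token": "Hel"}', '{"token": "lo"}']
```

If `curl -N` shows events arriving incrementally but your client sees one big blob, the buffering is on your side, not the server's.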
## Audit Log Issues

Symptom: `yule audit` shows no records, or `--verify-chain` reports `BROKEN`.
Causes:
- Audit log hasn't been created yet (no inference has been run through the API server)
- Log file was manually edited or truncated
- Multiple Yule instances writing to the same log
Location: the audit log lives at `~/.yule/audit.jsonl`.
Fixes:
- Run at least one inference through `yule serve` to create the first record
- If the chain is broken, the log was tampered with. Back up the file and start fresh if needed.
- Don't run multiple `yule serve` instances simultaneously; they'll interleave writes and break the chain.
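This page doesn't document the audit record format, so the field names (`prev_hash`, `hash`), the genesis value, and the hashing scheme below are assumptions. The sketch only illustrates how a tamper-evident JSONL hash chain is typically verified, which is what `--verify-chain` does conceptually:

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    # Hash the record's content (minus its own 'hash' field); the linkage
    # comes from 'prev_hash' being part of the hashed body.
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def verify_chain(lines: list[str]) -> bool:
    """Walk a JSONL log: each record must reference the previous record's
    hash and carry a correct hash of its own content."""
    prev = "0" * 64   # assumed genesis value for the first record
    for line in lines:
        rec = json.loads(line)
        if rec.get("prev_hash") != prev or record_hash(rec) != rec.get("hash"):
            return False
        prev = rec["hash"]
    return True
```

The key property: editing or dropping any earlier record changes its hash, so every later record's `prev_hash` stops matching and the whole chain reports broken.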
## macOS-Specific Issues

### Seatbelt "operation not permitted"

The macOS sandbox uses `sandbox_init()`, which is permanent once applied. If you see permission errors after sandbox activation, the SBPL profile may be too restrictive for your use case. Use `--no-sandbox` as a workaround and file a bug with the specific error.
### Building on Apple Silicon

Yule's AVX2 SIMD kernels and assembly kernel are x86_64-only. On Apple Silicon (M1/M2/M3/M4), these are skipped automatically and a scalar fallback is used for CPU inference. GPU acceleration via Metal is planned but not yet implemented.