# Common Issues

Solutions for frequent problems with Yule.
## Vulkan Not Found

Symptom: `yule run --backend vulkan` fails with a Vulkan initialization error, or `--backend auto` silently falls back to CPU.
Causes:
- No Vulkan-capable GPU driver installed
- Missing Vulkan ICD (Installable Client Driver) loader
- Running inside a VM without GPU passthrough
Fixes:
- Install your GPU vendor's latest driver (NVIDIA, AMD, or Intel)
- Verify Vulkan works: run `vulkaninfo` (from the Vulkan SDK or the `vulkan-tools` package)
- On Linux, ensure `libvulkan.so` is on `LD_LIBRARY_PATH`
- On Windows, Vulkan ships with the GPU driver; update your driver
- If no discrete GPU is available, use `--backend cpu`
## Model Too Large for VRAM

Symptom: out-of-memory error during model loading with the Vulkan backend.
Fixes:
- Use a smaller quantization: Q4_0 (~0.5 GB per billion params) vs Q6_K (~0.75 GB per billion params)
- Fall back to CPU: `--backend cpu` uses system RAM instead of VRAM
- Check available VRAM: `nvidia-smi` (NVIDIA) or `rocm-smi` (AMD)
Rule of thumb: A 7B model at Q4_K_M needs ~4.5 GB VRAM. A 13B model needs ~8 GB.
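The rule of thumb can be turned into a quick estimate. The GB-per-billion figures for Q4_0 and Q6_K come from this section; the Q4_K_M value and the flat overhead are illustrative assumptions chosen so a 7B model lands near the ~4.5 GB figure, not exact numbers for any runtime:

```python
# Back-of-the-envelope VRAM estimate from the figures in this section.
GB_PER_BILLION = {
    "Q4_0": 0.5,
    "Q4_K_M": 0.57,   # assumed: interpolated so 7B + overhead lands near ~4.5 GB
    "Q6_K": 0.75,
}

def estimate_vram_gb(quant: str, billions_of_params: float, overhead_gb: float = 0.5) -> float:
    """Rough VRAM needed: weights plus a flat allowance for KV cache and scratch buffers."""
    return billions_of_params * GB_PER_BILLION[quant] + overhead_gb

print(round(estimate_vram_gb("Q4_K_M", 7), 1))   # ~4.5, matching the rule of thumb
```

If the estimate exceeds your VRAM, drop to a smaller quant or `--backend cpu`.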
## Sandbox Blocks GPU

Symptom: the Vulkan backend fails when sandboxing is active, but works with `--no-sandbox`.

Cause: the sandbox restricts access to GPU device nodes by default. `yule run` enables GPU access automatically when `--backend` is not `cpu`, but `yule serve` disables GPU access since it's an API server.
Fixes:
- For `yule run`: GPU access should work automatically. If not, file a bug.
- For `yule serve`: GPU inference in the API server is not yet supported; the server runs CPU inference. Use `yule run` for GPU-accelerated inference.
- As a last resort, `--no-sandbox` disables all sandboxing (not recommended for untrusted models).
## Authentication Errors (401 Unauthorized)

Symptom: API requests return `401 Unauthorized`.
Common mistakes:
- Missing
Bearerprefix: The header must beAuthorization: Bearer yule_..., not justAuthorization: yule_... - Wrong token: Each
yule servesession generates a new token (printed to stderr on startup). Tokens from previous sessions are invalid. - Extra whitespace or newlines in the token string
Verify:
```sh
# Copy the token exactly from the serve output
TOKEN="yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc"
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/yule/health
```

## Model Parse Errors
Symptom: `PARSE FAILED` or `invalid magic` errors when loading a model.
Causes:
- File is not a valid GGUF file (wrong format, corrupt download, or truncated)
- Unsupported GGUF version (Yule supports v3)
- File is a safetensors/PyTorch model (not yet supported)
Fixes:
- Verify the file: `yule verify ./model.gguf`; if this fails, the file is corrupt
- Re-download the model: `yule pull publisher/repo`
- Ensure the file is GGUF format (`.gguf` extension, downloaded from a GGUF-tagged repo)
- Check the file size: if it's much smaller than expected, the download may have been truncated
## Slow Performance

Symptom: low tok/s during inference.
Checklist:
- Check which backend is active: look at the startup output for `backend: vulkan` vs `backend: cpu`. Vulkan gives ~8.5x speedup on quantized models.
- Does the model fit in VRAM? If the model spills to system RAM, GPU inference stalls. Use a smaller quant.
- Build in release mode: `cargo build --release`. Debug builds are 10-50x slower.
- AVX2 support: on x86_64, Yule auto-detects AVX2 for SIMD acceleration. Older CPUs without AVX2 use a scalar fallback (~3-5x slower).
- Transparent Huge Pages (Linux): check `cat /sys/kernel/mm/transparent_hugepage/enabled`. `[always]` or `[madvise]` is ideal; this reduces TLB misses for large model files.
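On Linux you can confirm AVX2 support by looking for the `avx2` flag in `/proc/cpuinfo`. A small sketch; the parsing helper is illustrative, not part of Yule:

```python
def has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Scan /proc/cpuinfo-style text for a CPU feature flag (e.g. 'avx2').
    Flags appear as a space-separated list after 'flags :' on x86_64."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.split(":", 1)[1].split()
    return False

# On a Linux machine:
# with open("/proc/cpuinfo") as f:
#     print("AVX2:", has_flag(f.read(), "avx2"))
```

If `avx2` is absent, the scalar fallback explains much of the slowdown and a GPU backend or smaller model is the practical fix.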
## SSE Streaming Not Working
Symptom: Streaming chat requests hang or return the entire response at once.
Causes:
- Client doesn't handle the `text/event-stream` content type
- A proxy or reverse proxy is buffering SSE events
- Firewall closing long-lived connections
Fixes:
- Set `stream: true` in the request body
- If behind nginx: add `proxy_buffering off;` and `proxy_cache off;` to the location block
- Test directly without a proxy first:

```sh
curl -N -H "Authorization: Bearer $TOKEN" \
  -d '{"messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:11434/yule/chat
```
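When debugging a client, it helps to parse the raw `text/event-stream` body yourself. The sketch below follows standard SSE framing (`data:` lines, blank line ends an event); the `token` field in the sample payloads is an assumption about the event shape, not a documented Yule schema:

```python
def parse_sse(raw: str) -> list[str]:
    """Split a text/event-stream body into the 'data:' payload of each event.
    Per SSE framing, an event is one or more 'data:' lines followed by a
    blank line; other fields (event:, id:, retry:) are ignored here."""
    events, data_lines = [], []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    return events

sample = 'data: {"token": "Hel"}\n\ndata: {"token": "lo"}\n\n'
print(parse_sse(sample))  # → ['{"token": "Hel"}', '{"token": "lo"}']
```

If `curl -N` shows events arriving incrementally but your client sees one big blob, the buffering is on your side, not the server's.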
## Audit Log Issues

Symptom: `yule audit` shows no records, or `--verify-chain` reports `BROKEN`.
Causes:
- Audit log hasn't been created yet (no inference has been run through the API server)
- Log file was manually edited or truncated
- Multiple Yule instances writing to the same log
Location: the audit log lives at `~/.yule/audit.jsonl`.
Fixes:
- Run at least one inference through `yule serve` to create the first record
- If the chain is broken, the log was tampered with. Back up the file and start fresh if needed.
- Don't run multiple `yule serve` instances simultaneously; they'll interleave writes and break the chain.
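This page doesn't document the audit record format, so the field names (`prev_hash`, `hash`), the genesis value, and the hashing scheme below are assumptions. The sketch only illustrates how a tamper-evident JSONL hash chain is typically verified, which is what `--verify-chain` does conceptually:

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    # Hash the record's content (minus its own 'hash' field); the linkage
    # comes from 'prev_hash' being part of the hashed body.
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def verify_chain(lines: list[str]) -> bool:
    """Walk a JSONL log: each record must reference the previous record's
    hash and carry a correct hash of its own content."""
    prev = "0" * 64   # assumed genesis value for the first record
    for line in lines:
        rec = json.loads(line)
        if rec.get("prev_hash") != prev or record_hash(rec) != rec.get("hash"):
            return False
        prev = rec["hash"]
    return True
```

The key property: editing or dropping any earlier record changes its hash, so every later record's `prev_hash` stops matching and the whole chain reports broken.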
## macOS-Specific Issues

### Seatbelt "operation not permitted"

The macOS sandbox uses `sandbox_init()`, which is permanent once applied. If you see permission errors after sandbox activation, the SBPL profile may be too restrictive for your use case. Use `--no-sandbox` as a workaround and file a bug with the specific error.
### Building on Apple Silicon

Yule's AVX2 SIMD kernels and assembly kernel are x86_64-only. On Apple Silicon (M1/M2/M3/M4), these are skipped automatically and a scalar fallback is used for CPU inference. GPU acceleration via Metal is planned but not yet implemented.