Yule
Troubleshooting

Common Issues

Solutions for frequent problems with Yule

Vulkan Not Found

Symptom: yule run --backend vulkan fails with a Vulkan initialization error, or --backend auto silently falls back to CPU.

Causes:

  • No Vulkan-capable GPU driver installed
  • Missing Vulkan ICD (Installable Client Driver) loader
  • Running inside a VM without GPU passthrough

Fixes:

  1. Install your GPU vendor's latest driver (NVIDIA, AMD, or Intel)
  2. Verify Vulkan works: run vulkaninfo (from the Vulkan SDK or vulkan-tools package)
  3. On Linux, ensure libvulkan.so is on LD_LIBRARY_PATH
  4. On Windows, Vulkan comes with GPU drivers — update them
  5. If no discrete GPU is available, use --backend cpu
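Steps 2 and 5 can be combined into a quick probe. A minimal sketch: it only assumes vulkaninfo from vulkan-tools, and the commented-out yule invocation at the end is illustrative.

```shell
# Probe for a working Vulkan setup before choosing a backend.
# vulkaninfo ships with the Vulkan SDK or the vulkan-tools package.
if command -v vulkaninfo >/dev/null 2>&1 && vulkaninfo --summary >/dev/null 2>&1; then
  backend=vulkan
else
  backend=cpu   # no loader or working ICD found: fall back to CPU
fi
echo "selected backend: $backend"
# yule run --backend "$backend" <model>
```

If the probe selects cpu on a machine that definitely has a GPU, work through fixes 1-4 above before retrying.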

Model Too Large for VRAM

Symptom: Out-of-memory error during model loading with the Vulkan backend.

Fixes:

  1. Use a smaller quantization: Q4_0 (~0.5 GB/B params) vs Q6_K (~0.75 GB/B params)
  2. Fall back to CPU: --backend cpu — uses system RAM instead of VRAM
  3. Check available VRAM: nvidia-smi (NVIDIA) or rocm-smi (AMD)

Rule of thumb: A 7B model at Q4_K_M needs ~4.5 GB VRAM. A 13B model needs ~8 GB.
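The rule of thumb is just parameter count times a per-quantization rate. A sketch of the arithmetic; the ~0.65 GB per billion parameters for Q4_K_M is an interpolation between the Q4_0 and Q6_K rates above, not a measured value, and the estimate covers weights only (KV cache is extra).

```shell
# Approximate VRAM for the weights alone.
# Rates: Q4_0 ~0.5 GB/B params, Q4_K_M ~0.65, Q6_K ~0.75.
estimate_vram() {  # usage: estimate_vram <billions-of-params> <GB-per-B-rate>
  awk -v b="$1" -v rate="$2" 'BEGIN { printf "%.1f GB\n", b * rate }'
}
estimate_vram 7 0.65    # 7B at Q4_K_M: roughly 4.5 GB
estimate_vram 13 0.65   # 13B at Q4_K_M: roughly 8 GB
estimate_vram 7 0.5     # 7B at Q4_0: a smaller footprint
```

Compare the estimate against the free VRAM reported by nvidia-smi or rocm-smi, leaving headroom for the KV cache.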


Sandbox Blocks GPU

Symptom: Vulkan backend fails when sandboxing is active, but works with --no-sandbox.

Cause: The sandbox restricts access to GPU device nodes by default. yule run enables GPU access automatically when --backend is not cpu, but yule serve disables GPU access since it's an API server.

Fixes:

  1. For yule run: this should work automatically. If not, file a bug.
  2. For yule serve: GPU inference in the API server is not yet supported — the server runs CPU inference. Use yule run for GPU-accelerated inference.
  3. As a last resort: --no-sandbox disables all sandboxing (not recommended for untrusted models).
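Before blaming the sandbox, it helps to confirm the GPU device nodes exist at all. A sketch for Linux; /dev/dri is the standard kernel DRM interface, not anything Yule-specific. If the nodes are missing even outside the sandbox, the problem is the driver or VM configuration (see "Vulkan Not Found" above), not sandboxing.

```shell
# Check whether GPU render nodes are visible (Linux DRM device nodes).
if ls /dev/dri/renderD* >/dev/null 2>&1; then
  gpu_nodes=present
else
  gpu_nodes=missing   # no driver, no GPU, or no passthrough in this VM
fi
echo "GPU render nodes: $gpu_nodes"
```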

Authentication Errors (401 Unauthorized)

Symptom: API requests return 401 Unauthorized.

Common mistakes:

  1. Missing Bearer prefix: The header must be Authorization: Bearer yule_..., not just Authorization: yule_...
  2. Wrong token: Each yule serve session generates a new token (printed to stderr on startup). Tokens from previous sessions are invalid.
  3. Extra whitespace or newlines in the token string

Verify:

# Copy the token exactly from the serve output
TOKEN="yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc"
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/yule/health
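Mistake 3 (stray whitespace) is easy to guard against in scripts. A small sketch, reusing the example token above: strip all whitespace before building the header.

```shell
# A token pasted from the serve output sometimes carries spaces or a
# trailing newline; strip all whitespace before use.
RAW_TOKEN="  yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc
"
TOKEN=$(printf '%s' "$RAW_TOKEN" | tr -d '[:space:]')
echo "Authorization: Bearer $TOKEN"
```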

Model Parse Errors

Symptom: PARSE FAILED or invalid magic errors when loading a model.

Causes:

  • File is not a valid GGUF file (wrong format, corrupt download, or truncated)
  • Unsupported GGUF version (Yule supports v3)
  • File is a safetensors/PyTorch model (not yet supported)

Fixes:

  1. Verify the file: yule verify ./model.gguf — if this fails, the file is corrupt
  2. Re-download the model: yule pull publisher/repo
  3. Ensure the file is GGUF format (.gguf extension, downloaded from a GGUF-tagged repo)
  4. Check file size — if it's much smaller than expected, the download may have been truncated

Slow Performance

Symptom: Low tok/s during inference.

Checklist:

  1. Check which backend is active: Look at the startup output for backend: vulkan vs backend: cpu. Vulkan gives ~8.5x speedup on quantized models.
  2. Model fits in VRAM? If the model spills to system RAM, GPU inference stalls. Use a smaller quant.
  3. Build in release mode: cargo build --release — debug builds are 10-50x slower.
  4. AVX2 support: On x86_64, Yule auto-detects AVX2 for SIMD acceleration. Older CPUs without AVX2 use scalar fallback (~3-5x slower).
  5. Transparent Huge Pages (Linux): Check cat /sys/kernel/mm/transparent_hugepage/enabled; [always] or [madvise] is ideal. This reduces TLB misses for large model files.
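Items 4 and 5 can be checked in one pass. A sketch for Linux; /proc/cpuinfo and the transparent_hugepage sysfs file are standard kernel interfaces, and on other platforms they simply don't exist, so the script degrades gracefully.

```shell
# Item 4: does the CPU advertise AVX2?
if grep -qm1 avx2 /proc/cpuinfo 2>/dev/null; then
  avx2=yes
else
  avx2=no   # scalar fallback: expect roughly 3-5x slower CPU inference
fi
# Item 5: current Transparent Huge Pages mode.
thp=$(cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || echo unavailable)
echo "avx2: $avx2"
echo "thp:  $thp"   # [always] or [madvise] highlighted is ideal
```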

SSE Streaming Not Working

Symptom: Streaming chat requests hang or return the entire response at once.

Causes:

  • Client doesn't handle text/event-stream content type
  • Proxy or reverse proxy buffering SSE events
  • Firewall closing long-lived connections

Fixes:

  1. Set stream: true in the request body
  2. If behind nginx: add proxy_buffering off; and proxy_cache off; to the location block
  3. Test directly without a proxy first: curl -N -H "Authorization: Bearer $TOKEN" -d '{"messages":[{"role":"user","content":"hi"}],"stream":true}' http://localhost:11434/yule/chat
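Fix 2 spelled out as a config fragment. A sketch only: the /yule/ location and the upstream address mirror the examples on this page, but adjust both to your deployment.

```nginx
# Proxy SSE from Yule without buffering whole responses.
location /yule/ {
    proxy_pass http://127.0.0.1:11434;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;        # deliver events as they arrive
    proxy_cache off;
    proxy_read_timeout 1h;      # keep long-lived streams open
}
```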

Audit Log Issues

Symptom: yule audit shows no records, or --verify-chain reports BROKEN.

Causes:

  • Audit log hasn't been created yet (no inference has been run through the API server)
  • Log file was manually edited or truncated
  • Multiple Yule instances writing to the same log

Location: The audit log lives at ~/.yule/audit.jsonl.

Fixes:

  1. Run at least one inference through yule serve to create the first record
  2. If the chain is broken: the log was tampered with. Back up the file and start fresh if needed.
  3. Don't run multiple yule serve instances simultaneously — they'll interleave writes and break the chain.
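A quick way to see whether any records exist yet, assuming the default location above and one JSON record per line (as the .jsonl extension implies); this does not replace --verify-chain, it only distinguishes "empty log" from "broken chain":

```shell
# Count audit records, if the log exists at the default location.
log="$HOME/.yule/audit.jsonl"
if [ -s "$log" ]; then
  echo "records: $(wc -l < "$log")"
else
  echo "no audit log yet: run an inference through yule serve first"
fi
```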

macOS-Specific Issues

Seatbelt "operation not permitted"

The macOS sandbox uses sandbox_init(), which is permanent once applied. If you see permission errors after sandbox activation, the SBPL profile may be too restrictive for your use case. Use --no-sandbox as a workaround and file a bug with the specific error.

Building on Apple Silicon

Yule's AVX2 SIMD kernels and assembly kernel are x86_64-only. On Apple Silicon (M1/M2/M3/M4), these are skipped automatically and scalar fallback is used for CPU inference. GPU acceleration via Metal is planned but not yet implemented.
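A quick check of which CPU path a build on the current machine will take; the architecture names are standard uname output, and the mapping follows the description above.

```shell
# Which CPU kernel path applies on this machine?
arch=$(uname -m)
case "$arch" in
  x86_64|amd64)  echo "$arch: AVX2/assembly kernels eligible (if the CPU has AVX2)" ;;
  arm64|aarch64) echo "$arch: scalar fallback (Metal backend not yet implemented)" ;;
  *)             echo "$arch: scalar fallback" ;;
esac
```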
