# Security

Sandbox, Merkle verification, and authentication model.
## Process Sandbox

When `yule serve` or `yule run` starts (without `--no-sandbox`), the process is placed in a platform-specific sandbox before the model is loaded. Even parsing and weight loading run inside the sandbox.
### Windows — Job Objects

- Memory limit — 32GB default via `RLIMIT_AS` simulation; prevents runaway allocations
- No child spawning — `ActiveProcessLimit = 1`; the process can't fork or exec
- Kill-on-close — if the Job Object handle is closed (crash, parent exit), the process is terminated
- UI restrictions — clipboard, desktop switching, display settings, global atoms, user handles, and system parameters are all blocked
- RAII cleanup — `SandboxGuard` calls `CloseHandle` on drop
### Linux — seccomp-BPF + Landlock + rlimit

Three layers applied in order:

- rlimit — `RLIMIT_AS` caps virtual memory at 32GB
- Landlock (kernel 5.13+) — filesystem restriction:
  - Model file: read-only
  - `/dev/dri`, `/dev/nvidia*`: read + ioctl (only if `--backend vulkan`)
  - `/usr/lib`, `/lib`, `/proc/self`: read-only (dynamic linker, shared libraries)
  - Everything else: denied
  - Graceful degradation on older kernels
- seccomp-BPF — syscall allowlist (~60 base syscalls):
  - Memory management, file I/O, threads, signals, time, epoll
  - Networking syscalls (`socket`, `bind`, `listen`, `accept4`, etc.) only if `allow_network`
  - `ioctl` only if `allow_gpu` (Vulkan/DRM driver communication)
  - Default action: `EPERM` for unlisted syscalls (debuggable, not `SIGKILL`)
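Of the three layers, the rlimit cap is the easiest to illustrate, since Landlock and seccomp need kernel-specific bindings. The sketch below applies the same 32GB `RLIMIT_AS` cap using Python's stdlib `resource` module; it is an illustration of the mechanism, not Yule's implementation (which is not Python), and the `apply_memory_cap` helper name is invented here.

```python
import resource

MEMORY_CAP = 32 * 2**30  # 32 GiB, matching the sandbox default

def apply_memory_cap(limit: int = MEMORY_CAP) -> None:
    """Tighten the soft RLIMIT_AS cap without raising the hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # Never request more than the hard limit allows.
        limit = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

apply_memory_cap()
# From here on, allocations that would push the address space past the cap
# fail (malloc returns NULL; Python raises MemoryError) instead of thrashing.
```

Because `RLIMIT_AS` limits the virtual address space rather than resident memory, it also bounds large `mmap`-backed weight loads, which is why it is applied before the model is parsed.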
### macOS — Seatbelt + rlimit

- rlimit — `RLIMIT_AS` caps virtual memory at 32GB
- Seatbelt — dynamically-built SBPL profile via `sandbox_init()`:
  - `(deny default)` — everything denied unless explicitly allowed
  - Model file: read-only
  - System libraries (`/usr/lib`, `/System/Library`, `/Library/Apple`): read-only
  - `/dev/urandom`, `/dev/random`: read-only (CSPRNG)
  - GPU (`iokit-open`, `/Library/GPUBundles`): only if `allow_gpu`
  - Networking (`network-outbound`, `network-inbound`, `network-bind`): only if `allow_network`
  - The profile is permanent once applied — it cannot be undone
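For illustration, a minimal SBPL profile with the shape described above might look like this. The rule names follow standard Seatbelt syntax, but the literal paths and the exact rules are assumptions reconstructed from the bullets, not the profile Yule generates:

```lisp
(version 1)
(deny default)                              ; everything off unless allowed
(allow file-read* (literal "/path/to/model.gguf"))
(allow file-read* (subpath "/usr/lib")
                  (subpath "/System/Library")
                  (subpath "/Library/Apple"))
(allow file-read* (literal "/dev/urandom")  ; CSPRNG sources
                  (literal "/dev/random"))
;; appended only when allow_gpu is set:
;; (allow iokit-open)
;; (allow file-read* (subpath "/Library/GPUBundles"))
;; appended only when allow_network is set:
;; (allow network-outbound network-inbound network-bind)
```

Building the profile as a string at runtime is what lets the same code path emit different rules per command, and it is also why the sandbox is one-way: `sandbox_init()` has no corresponding teardown call.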
## Sandbox Configuration

The sandbox adapts based on the command:

| | `yule run` | `yule serve` |
|---|---|---|
| Network | Denied | Allowed (API server) |
| GPU | Allowed if `--backend != cpu` | Denied |
| Memory | 32GB cap | 32GB cap |
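The table above amounts to a small policy selector. This Python sketch is illustrative only — the names `SandboxPolicy` and `policy_for` are hypothetical, not Yule's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SandboxPolicy:
    allow_network: bool
    allow_gpu: bool
    memory_cap: int = 32 * 2**30  # both commands keep the 32GB cap

def policy_for(command: str, backend: str = "cpu") -> SandboxPolicy:
    if command == "serve":
        # The API server needs the network; serving denies the GPU.
        return SandboxPolicy(allow_network=True, allow_gpu=False)
    # `yule run` never touches the network; GPU only for non-CPU backends.
    return SandboxPolicy(allow_network=False, allow_gpu=backend != "cpu")
```

Deriving both flags from the command up front keeps the per-platform sandbox code (Job Objects, seccomp/Landlock, Seatbelt) free of command-specific branches.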
## Future Work
The current sandbox is in-process (Phase A). Phase B will implement a broker-target architecture:
- Broker (main process) — parses CLI args, validates model, spawns target
- Target (child process) — receives model file descriptor via IPC, runs inference, returns tokens
- Privilege separation — broker holds no model data, target has no network access
## Merkle Verification

At model load time, Yule builds a blake3 Merkle tree over all tensor data:

1. The tensor payload (everything after the GGUF header) is split into 1MB chunks
2. Each chunk is hashed with blake3
3. Leaf hashes are combined into a binary Merkle tree
4. The 256-bit root hash is stored in memory
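The steps above can be sketched as follows. Two assumptions to note: Python's stdlib has no blake3, so `blake2b` stands in for it here, and the handling of an odd node count at a level (duplicating the last hash) is a guess — Yule's actual tree may pad or promote differently.

```python
import hashlib

CHUNK = 1 << 20  # 1 MiB leaf size

def _h(data: bytes) -> bytes:
    # blake2b stands in for blake3 (not in the Python stdlib).
    return hashlib.blake2b(data, digest_size=32).digest()

def merkle_root(payload: bytes) -> bytes:
    # Steps 1-2: split the tensor payload into 1 MiB chunks; hash each chunk.
    level = [_h(payload[i:i + CHUNK]) for i in range(0, len(payload), CHUNK)]
    level = level or [_h(b"")]  # degenerate case: empty payload
    # Step 3: combine pairwise until one node remains (odd node: duplicate last).
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    # Step 4: the single remaining 256-bit hash is the root.
    return level[0]
```

Flipping a single byte anywhere in the payload changes one leaf hash and therefore the root, which is what makes the API-reported root usable as a tamper check.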
This root hash appears in every `/yule/chat` response under `integrity.model_merkle_root`. You can verify it matches the hash from `yule verify`:

```shell
# on disk
yule verify ./model.gguf
# → merkle root: ffc7e1fd6016a6f9...

# from the API
curl -H "Authorization: Bearer $TOKEN" http://localhost:11434/yule/model
# → "merkle_root": "ffc7e1fd6016a6f9..."
```

If someone swaps a tensor in the model file, the Merkle root changes. If the API returns a different root than the one you verified, the model has been tampered with.
## Authentication

The API uses blake3-derived capability tokens:

- On startup, 32 bytes of OS entropy are collected via `getrandom`
- Token derivation: `blake3(master_secret || counter || timestamp)`, truncated to 24 bytes, hex-encoded with a `yule_` prefix
- Only the blake3 hash of each token is stored — the server never keeps plaintext tokens in memory after generation
- Verification: hash the provided token and compare it against the stored hashes

Tokens look like: `yule_b49913e2c05162951af4f87d62c2c9a6555eb91299c7fdcc`

You can also provide your own token with `--token`, in which case its hash is stored the same way.
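The derivation and verification scheme can be sketched like this, again with `blake2b` standing in for blake3 (not in the Python stdlib) and with illustrative function names, not Yule's internals. The exact serialization of `counter` and `timestamp` into the hash input is an assumption:

```python
import hashlib
import hmac
import os
import time

def derive_token(master_secret: bytes, counter: int) -> str:
    # blake3(master_secret || counter || timestamp), truncated to 24 bytes,
    # hex-encoded with a "yule_" prefix. blake2b stands in for blake3.
    material = (master_secret
                + counter.to_bytes(8, "big")
                + int(time.time()).to_bytes(8, "big"))
    return "yule_" + hashlib.blake2b(material, digest_size=24).hexdigest()

def token_hash(token: str) -> bytes:
    # Only this hash is retained server-side; the plaintext is discarded.
    return hashlib.blake2b(token.encode(), digest_size=32).digest()

def verify(presented: str, stored_hashes: list[bytes]) -> bool:
    candidate = token_hash(presented)
    # Constant-time comparison against each stored hash.
    return any(hmac.compare_digest(candidate, h) for h in stored_hashes)

master_secret = os.urandom(32)  # 32 bytes of OS entropy at startup
token = derive_token(master_secret, counter=0)
```

Storing only hashes means a memory dump of the server after startup reveals nothing that can be replayed as a bearer token, while a presented token can still be checked in constant time.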