# CLI

## yule pull / list

Download models from HuggingFace and manage the local cache.

### yule pull

Download a GGUF model from HuggingFace and store it in the local cache.

```shell
yule pull <model>
```

#### Arguments
| Argument | Description |
|---|---|
| `model` | HuggingFace model reference (e.g. `bartowski/Llama-3.2-1B-Instruct-GGUF`) |
#### Options
| Flag | Default | Description |
|---|---|---|
| `--verify` | `true` | Compute Merkle root after download |
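The verification step computes a Merkle root over the downloaded file. As an illustration of the general technique only (the chunk size and hash function below are assumptions, not yule's documented parameters), a Merkle root can be built by hashing fixed-size chunks and then pairwise-hashing up the tree:

```python
import hashlib


def merkle_root(data: bytes, chunk_size: int = 1024 * 1024) -> str:
    """Compute a Merkle root over fixed-size chunks of `data`.

    Illustrative sketch: 1 MiB chunks and SHA-256 are assumptions,
    not yule's actual scheme.
    """
    # Leaf level: hash each chunk (an empty file gets one empty leaf).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    # Pairwise-hash upward until one root remains; an odd node is promoted.
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()
```

Verification then amounts to recomputing the root on disk and comparing it against the value printed by `yule pull`.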
#### Authentication
For gated models, set the `HF_TOKEN` environment variable:

```shell
export HF_TOKEN=hf_your_token_here
yule pull meta-llama/Llama-3.2-1B-Instruct-GGUF
```

#### Example
```shell
yule pull bartowski/Llama-3.2-1B-Instruct-GGUF
```

```
downloading: Llama-3.2-1B-Instruct-Q4_K_M.gguf
[========================================] 100% (1.24 GB)

model: bartowski/Llama-3.2-1B-Instruct-GGUF
file: Llama-3.2-1B-Instruct-Q4_K_M.gguf
size: 1.24 GB
merkle root: a3f8c1e9d0...
path: ~/.yule/models/bartowski/Llama-3.2-1B-Instruct-GGUF/Llama-3.2-1B-Instruct-Q4_K_M.gguf
```

#### Cache Location
Models are stored at `~/.yule/models/{publisher}/{repo}/`. Once pulled, `yule run` and `yule serve` can reference models by their registry name instead of a file path:
```shell
# these are equivalent after pulling
yule run bartowski/Llama-3.2-1B-Instruct-GGUF --prompt "Hello"
yule run ~/.yule/models/bartowski/Llama-3.2-1B-Instruct-GGUF/Llama-3.2-1B-Instruct-Q4_K_M.gguf --prompt "Hello"
```

### yule list
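Under this layout, resolving a registry name to a cached file is a directory lookup. A minimal sketch of that mapping (the rule for choosing among multiple `.gguf` files in one repo directory is an assumption; the examples above show a single file per repo):

```python
from pathlib import Path


def resolve_model(name: str,
                  cache_dir: Path = Path.home() / ".yule" / "models") -> Path:
    """Map a `publisher/repo` registry name to a cached .gguf file.

    Sketch only: picking the first .gguf alphabetically is an
    assumption, not yule's documented selection rule.
    """
    repo_dir = cache_dir / name  # ~/.yule/models/{publisher}/{repo}/
    ggufs = sorted(repo_dir.glob("*.gguf"))
    if not ggufs:
        raise FileNotFoundError(f"{name} is not cached; run: yule pull {name}")
    return ggufs[0]
```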
Show all models in the local cache.

```shell
yule list
```

#### Example
```
cached models:

bartowski/Llama-3.2-1B-Instruct-GGUF
  file: Llama-3.2-1B-Instruct-Q4_K_M.gguf
  size: 1.24 GB
  status: verified
  merkle: a3f8c1e9d0b2...

TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
  file: tinyllama-1.1b-chat-v1.0.Q4_0.gguf
  size: 637.81 MB
  status: verified
  merkle: ffc7e1fd6016...
```

If no models are cached:
```
no cached models
pull one with: yule pull bartowski/Llama-3.2-1B-Instruct-GGUF
```