GGUF
Best default and the most configurable format.
- Context length
- Keep in memory
- GPU offload layers
- MoE expert count when applicable
- CPU threads, KV-cache offload,
mmap, warmup skip, and seed - K/V cache quantization and Flash Attention
- The richest sampling controls, including temperature, top-p, top-k, min-p, and repetition penalty
