Noema
    Model Controls

    Model Settings

    Simple mode exposes only the basics. Advanced mode is where most tuning appears. In practice, GGUF is the most configurable format and AFM is the least.

    Most tunable
    GGUF

    Richest runtime and sampling control surface in Noema.

    Moderate
    MLX / CML

    Useful tuning remains, but backend behavior is more opinionated.

    Minimal
    ET

    Small on-device models with very few format-specific controls.

    Almost fixed
    AFM

    Built-in Apple model with fixed context and only light policy control.

    Mode split

    What Simple mode vs Advanced mode changes

    Simple mode

    Exposes the basics you are most likely to adjust quickly, such as temperature and the small set of format-specific settings Noema considers essential.

    Advanced mode

    Unlocks deeper backend behavior, richer sampling controls, and lower-level performance options. GGUF benefits the most from switching this on.

    Local formats

    Format-by-format settings reference

    GGUF

    Best default and the most configurable format.

    Highest tuning depth
    Simple mode
    • Context length
    • Keep in memory
    • GPU offload layers
    • MoE expert count when applicable
    Advanced mode
    • CPU threads, KV-cache offload, mmap, warmup skip, and seed
    • K/V cache quantization and Flash Attention
    • The richest sampling controls, including temperature, top-p, top-k, min-p, and repetition penalty
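    As a rough illustration, the advanced GGUF surface maps onto the runtime options that llama.cpp-style backends expose. The key names and values below are hypothetical, chosen to mirror the settings listed above rather than Noema's actual configuration keys:

    ```python
    # Illustrative GGUF runtime configuration. Key names are hypothetical
    # and mirror the settings listed above, not Noema's internal API.
    gguf_advanced = {
        "context_length": 8192,       # token window for prompt + response
        "keep_in_memory": True,       # keep weights resident in RAM
        "gpu_offload_layers": 32,     # layers pushed to the GPU
        "cpu_threads": 6,
        "kv_cache_offload": True,     # hold the K/V cache on the GPU
        "use_mmap": True,             # memory-map weights instead of copying
        "skip_warmup": False,
        "seed": 42,                   # fixed seed for reproducible sampling
        "kv_cache_quantization": "q8_0",  # trade cache precision for memory
        "flash_attention": True,
    }
    ```

    Settings like `use_mmap` and `gpu_offload_layers` trade memory footprint against load time and speed, which is why GGUF rewards switching Advanced mode on.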

    MLX

    Best for Apple Silicon speed with fewer low-level knobs.

    Moderate tuning
    User-adjustable settings
    • Context length
    • Seed
    • Tokenizer override
    • Shared sampling controls
    Behavior

    MLX is comparatively opinionated. You still get practical sampling control, but not the same backend surface as GGUF.

    ET / ExecuTorch

    Best for responsive small on-device models with minimal setup.

    Minimal tuning
    User-adjustable settings
    • Context length
    • Seed
    • Tokenizer override
    • Temperature
    Behavior

    Backend and delegate selection are automatic in the current build. ET is intentionally closer to “works well by default” than to “tune every layer.”

    CML / Core ML

    Best for Apple-native on-device execution with Apple accelerators.

    Processing-unit focused
    User-adjustable settings
    • Processing unit: All, CPU Only, CPU + GPU, or CPU + Neural Engine
    • Tokenizer override
    • Lighter shared sampling controls
    Behavior

    Context length is effectively fixed by the model's metadata or name, so the main format-specific choice is how the model uses Apple compute resources.
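    The processing-unit options correspond to Core ML's compute-unit choices. A hypothetical mapping from the labels above onto the `ComputeUnit` names used by coremltools (Noema's internal representation is an assumption here):

    ```python
    # Hypothetical mapping from Noema's processing-unit labels to the
    # compute-unit names coremltools exposes when loading a Core ML model.
    PROCESSING_UNITS = {
        "All": "ALL",                         # let Core ML pick per layer
        "CPU Only": "CPU_ONLY",               # most predictable, usually slowest
        "CPU + GPU": "CPU_AND_GPU",
        "CPU + Neural Engine": "CPU_AND_NE",  # ANE for supported layers
    }
    ```

    "All" lets the framework schedule each layer on whichever unit it judges fastest; the narrower options are useful when a model misbehaves on a particular accelerator.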

    AFM / Apple Foundation Model

    Best for the built-in Apple system model on supported devices.

    Lowest-friction setup
    User-adjustable settings
    • Guardrails: Default or Permissive Content Transformations
    Behavior

    Context is fixed at 4096. AFM is the least configurable format and is designed for the simplest system-integrated path.

    Shared controls

    Sampling settings you will see across models

    Shared sampling controls typically include temperature, top-p, top-k, min-p, and repetition penalty. GGUF exposes the richest set, MLX and remote-backed models expose a practical middle ground, and ET or AFM stay intentionally lighter.
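    How these controls interact can be sketched in a few lines. The function below is an illustrative pipeline over raw token scores, not Noema's actual implementation; names, defaults, and filter ordering are assumptions:

    ```python
    # Illustrative sketch of the shared sampling controls. Ordering and
    # defaults are assumptions, not Noema's actual pipeline.
    import math

    def sample_filter(logits, recent_ids, temperature=0.8, top_k=40,
                      top_p=0.95, min_p=0.05, repeat_penalty=1.1):
        """Return renormalized token probabilities after all filters."""
        logits = dict(logits)  # token_id -> raw score

        # Repetition penalty: push down tokens that appeared recently.
        for tid in recent_ids:
            if tid in logits:
                s = logits[tid]
                logits[tid] = s / repeat_penalty if s > 0 else s * repeat_penalty

        # Temperature: lower values sharpen the distribution, higher flatten it.
        scaled = {t: s / temperature for t, s in logits.items()}

        # Softmax to probabilities (shifted by the max for stability).
        m = max(scaled.values())
        exps = {t: math.exp(s - m) for t, s in scaled.items()}
        z = sum(exps.values())
        probs = {t: e / z for t, e in exps.items()}

        # Top-k: keep only the k most likely tokens.
        kept = sorted(probs, key=probs.get, reverse=True)[:top_k]

        # Min-p: drop tokens below a fraction of the best token's probability.
        best = probs[kept[0]]
        kept = [t for t in kept if probs[t] >= min_p * best]

        # Top-p (nucleus): smallest prefix whose cumulative mass reaches top_p.
        total, nucleus = 0.0, []
        for t in kept:
            nucleus.append(t)
            total += probs[t]
            if total >= top_p:
                break

        # Renormalize over the surviving tokens.
        z = sum(probs[t] for t in nucleus)
        return {t: probs[t] / z for t in nucleus}
    ```

    Formats with a lighter surface simply pin some of these knobs: ET exposes little beyond temperature, while GGUF lets you adjust every stage.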

    Remote-backed models

    Which remote types expose per-model settings

    LM Studio

    Exposes per-model remote settings including context length and standard sampling controls such as temperature, top-p, top-k, min-p, and repetition penalty.

    OpenRouter

    Also exposes per-model remote settings. Some models can additionally surface provider defaults, so Noema may start from the provider’s recommended behavior before you override it.