Noema
    Local Models

    Downloading New Models

    Noema supports five local model formats: GGUF, MLX, ET, CML, and AFM. They do not all install the same way, so the fastest path is to pick the format that matches your hardware and the amount of control you want.

    Most tunable
    GGUF

    Best default when you want portability, broad compatibility, and the richest tuning options.

    Apple speed
    MLX

    Best for fast local inference on Apple Silicon with fewer low-level knobs than GGUF.

    Minimal setup
    ET / CML

    Good for Apple-native and mobile-friendly deployments when you want predictable behavior over deep manual tuning.

    Almost fixed
    AFM

    Apple’s built-in Foundation Model for supported Apple Intelligence devices with the lowest-friction setup.

    Format guide

    Choose the right local format

    GGUF

    Best default and most portable.

    Broadest compatibility

    Pick GGUF when you want the widest hardware coverage and the most tuning. It is the format to reach for when you care about advanced performance controls, sampling precision, quantization choice, and runtime experimentation.

    Download path

    Open Explore, filter or search for a GGUF model, inspect quantization and size, then tap Download.
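
    If you are weighing quantization against download size, a rough rule of thumb is parameter count times bits per weight divided by eight. The Swift sketch below is only that back-of-envelope arithmetic; the function name and example figures are illustrative and not part of Noema.

        // Rough rule-of-thumb estimate of a GGUF file's size from its parameter
        // count and quantization bit width. Real files also carry embeddings,
        // metadata, and mixed-precision tensors, so treat this as a ballpark.
        func approximateGGUFSizeGiB(parametersInBillions: Double, bitsPerWeight: Double) -> Double {
            let bytes = parametersInBillions * 1_000_000_000 * bitsPerWeight / 8
            return bytes / 1_073_741_824  // bytes -> GiB
        }

        // A 7B model at ~4.5 bits per weight (typical of a Q4 build) lands near 3.7 GiB,
        // while the same model at 8 bits is closer to 6.5 GiB.
        print(approximateGGUFSizeGiB(parametersInBillions: 7, bitsPerWeight: 4.5))  // ≈ 3.7
        print(approximateGGUFSizeGiB(parametersInBillions: 7, bitsPerWeight: 8.0))  // ≈ 6.5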

    MLX

    Best for Apple Silicon speed.

    Fast on Apple hardware

    Pick MLX when you want quick local inference on Apple hardware with less backend fiddling. Noema exposes the core settings you are most likely to adjust while the backend stays comparatively opinionated.

    Download path

    Browse Explore for MLX builds when you want Apple-optimized local downloads without the extra GGUF tuning surface.

    ET / ExecuTorch

    Best for small, responsive on-device models.

    Runs well anywhere

    Pick ET when you want minimal setup and you are comfortable with fewer knobs. Delegate and backend selection are automatic in the current build, so the experience is intentionally lighter than GGUF or MLX.

    Download path

    Search Explore for ET variants when you want compact mobile-focused deployments with fast startup and fewer runtime choices.

    CML / Core ML

    Best for Apple-native execution on Apple accelerators.

    Apple-native runtime

    Pick CML when you want an Apple-first deployment path for iPhone, iPad, or visionOS and you care more about native execution than about maximum manual tuning. The main format-specific choice is the processing unit used at runtime.

    Download path

    Use Explore to find Core ML builds targeted at Apple devices, then choose the model that matches your supported runtime and storage budget.
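
    The processing-unit choice mentioned above maps onto Core ML's standard compute-unit options. The snippet below is a minimal sketch of how that choice is expressed when loading a compiled model with Apple's CoreML framework; the file name is a placeholder and the loading code is not Noema's own.

        import CoreML
        import Foundation

        // Pick which hardware Core ML may use for this model.
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .cpuAndNeuralEngine   // alternatives: .all, .cpuOnly, .cpuAndGPU

        do {
            // "LocalModel.mlmodelc" is a placeholder for a downloaded, compiled model.
            let url = URL(fileURLWithPath: "LocalModel.mlmodelc")
            let model = try MLModel(contentsOf: url, configuration: configuration)
            print("Loaded model with inputs: \(model.modelDescription.inputDescriptionsByName.keys)")
        } catch {
            print("Could not load Core ML model: \(error)")
        }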

    AFM / Apple Foundation Model

    Best for the system model on Apple Intelligence devices.

    Built into supported devices

    AFM is the lowest-friction option. It is Apple’s built-in system model, so it behaves more like enabling a local capability than downloading a conventional model package.

    Activation path

    There is typically no separate weight download in Noema. On supported Apple Intelligence devices, AFM appears as an available local model option once system requirements are satisfied.
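
    For reference, Apple's FoundationModels framework exposes the system model's availability directly, which is the kind of check that decides whether AFM shows up as an option. The sketch below uses that public API as Apple documents it; it is an illustration, not Noema's actual activation code.

        import FoundationModels

        // Check whether the built-in Apple Foundation Model is usable here,
        // and run a tiny request if it is.
        func checkSystemModel() async {
            let model = SystemLanguageModel.default

            switch model.availability {
            case .available:
                let session = LanguageModelSession()
                if let response = try? await session.respond(to: "Reply with one short sentence.") {
                    print(response.content)
                }
            case .unavailable(let reason):
                // Typical reasons: Apple Intelligence not enabled, unsupported
                // hardware, or model assets still downloading.
                print("System model unavailable: \(reason)")
            }
        }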

    At a glance

    What changes between formats

    Format | Best for                                                | Tuning level
    GGUF   | Portability, compatibility, and deep control.          | Most tunable
    MLX    | Fast Apple Silicon inference with moderate tuning.     | Moderate
    ET     | Small, responsive on-device models with minimal setup. | Minimal
    CML    | Apple-native execution with processing-unit choice.    | Moderate
    AFM    | System-integrated Apple model on supported devices.    | Almost fixed

    Download flow

    From Explore to active model

    1. Open Explore and search or filter by the format you want.
    2. Read the format badge first, then compare storage size, compatibility notes, and capabilities.
    3. For downloadable formats like GGUF, MLX, ET, and CML, tap Download and wait until the status changes to ready.
    4. For AFM, confirm the device supports Apple Intelligence and then activate the built-in model when it appears in the library.
    5. Switch into chat and verify the selected model in the picker before starting a conversation.

    Troubleshooting

    Common format-specific issues

    GGUF or MLX feels slow

    Switch to a smaller model, lighter quantization, or shorter context length. GGUF gives you the widest tuning surface if you need to trade quality for speed.
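
    Context length is usually the biggest lever after model size, because the KV cache grows linearly with it. The sketch below is a rough estimate only; the layer count, head count, and head size describe a typical 7B-class model and are illustrative defaults, not values read from any specific file.

        // Approximate KV cache size: two tensors (keys and values) per layer,
        // one entry per token, at the given precision.
        func approximateKVCacheMiB(contextTokens: Int,
                                   layers: Int = 32,
                                   kvHeads: Int = 8,
                                   headDimension: Int = 128,
                                   bytesPerValue: Int = 2) -> Double {
            let bytes = 2 * layers * contextTokens * kvHeads * headDimension * bytesPerValue
            return Double(bytes) / 1_048_576  // bytes -> MiB
        }

        // Halving the context halves the cache:
        print(approximateKVCacheMiB(contextTokens: 8192))  // ≈ 1024 MiB
        print(approximateKVCacheMiB(contextTokens: 4096))  // ≈ 512 MiB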

    ET or CML options seem limited

    That is expected. These formats intentionally expose fewer runtime controls than GGUF so setup stays predictable on-device.

    AFM does not appear

    Verify the device supports Apple Intelligence features and that the built-in system model is available on that OS and hardware combination.

    Not sure which format to start with

    Start with GGUF for the broadest flexibility. Move to MLX when you want Apple Silicon speed, or AFM when you want the simplest Apple-native option.