    Noema

    Downloading New Models

    Expand your AI capabilities by downloading additional language models from Hugging Face and other sources.

    Devices with chips older than A13 Bionic have limited GPU offload, which slows down GGUF model inference and prevents MLX support. For these devices, we recommend downloading small language models (SLMs) for better performance.
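A quick way to reason about the RAM recommendation above is to compare model size against device memory. The sketch below is illustrative only — the headroom figure is an assumption, not a value Noema uses internally:

```python
# Rough sketch: will a GGUF model fit in device RAM?
# A GGUF model is loaded (mostly) into memory, so the file size plus
# OS/app headroom must stay under total RAM. The 2 GB headroom figure
# is an illustrative assumption, not a value taken from Noema.

def fits_in_ram(model_size_gb: float, device_ram_gb: float,
                headroom_gb: float = 2.0) -> bool:
    return model_size_gb + headroom_gb <= device_ram_gb

# A ~2 GB SLM fits comfortably on a 4 GB device; a 5 GB model does not.
print(fits_in_ram(2.0, 4.0))  # True
print(fits_in_ram(5.0, 4.0))  # False
```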

    Accessing the Model Library

    Built-in Model Browser

    1. Open the "Models" tab in Noema
    2. Select the appropriate format (GGUF or MLX)
    3. Browse the curated model collection
    4. Models are pre-filtered for compatibility

    Understanding Quantization

    Quantization compresses a model's weights to lower precision, trading a small amount of quality for a much smaller download and lower RAM use. Size reductions below are relative to the full-precision original.

    Quantization   Quality      Size Reduction   Best For
    Q8_0           Highest      ~25%             High-end devices, best quality
    Q5_K_M         Very good    ~50%             Balanced performance
    Q4_K_M         Good         ~65%             Most devices, recommended
    Q3_K_M         Acceptable   ~75%             Older devices, limited RAM
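The "size reduction" column translates directly to on-disk size. The sketch below uses the approximate percentages from the table; the 14 GB FP16 figure is an illustrative example for a 7B-parameter model:

```python
# Sketch: estimate a quantized model's download size from the
# approximate reduction percentages in the table above.

REDUCTION = {"Q8_0": 0.25, "Q5_K_M": 0.50, "Q4_K_M": 0.65, "Q3_K_M": 0.75}

def quantized_size_gb(fp16_size_gb: float, quant: str) -> float:
    """Size after quantization, relative to the full-precision model."""
    return round(fp16_size_gb * (1 - REDUCTION[quant]), 2)

# A ~14 GB FP16 7B model at Q4_K_M shrinks to roughly 4.9 GB.
print(quantized_size_gb(14.0, "Q4_K_M"))  # 4.9
```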

    Download Process

    Step-by-Step

    1. Select Model: Choose from the browser or search results
    2. Choose Quantization: Pick the best size/quality balance for your device
    3. Review Details: Check model size, requirements, and description
    4. Start Download: Tap "Download" to begin the process
    5. Monitor Progress: Watch the download progress in the background
    6. Auto-Install: Model becomes available once download completes
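If you ever need to fetch a GGUF file outside the app, Hugging Face serves model files at a predictable "resolve" URL. The sketch below shows the pattern; the repository and filename are illustrative examples, not the only options Noema supports:

```python
# Sketch: build a direct-download URL for a file hosted on Hugging Face.
# Files live under https://huggingface.co/<repo>/resolve/<revision>/<file>.
# The repo and filename below are illustrative examples.

def hf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

url = hf_download_url("TheBloke/Llama-2-7B-GGUF", "llama-2-7b.Q4_K_M.gguf")
print(url)
```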

    Managing Downloaded Models

    Model Management

    • Model Info: View details, size, and performance specs
    • Delete Models: Remove unused models to free space
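When deciding which models to delete, it helps to see how much space each one occupies. A minimal sketch, assuming models are stored as `.gguf` files in some local directory (the path handling here is hypothetical, not Noema's actual storage layout):

```python
# Sketch: tally the on-disk size of downloaded GGUF files so you can
# decide which to delete. The directory layout is a hypothetical
# example, not Noema's internal storage scheme.
from pathlib import Path

def model_sizes_gb(models_dir: str) -> dict:
    """Map each .gguf filename in models_dir to its size in GB."""
    return {f.name: round(f.stat().st_size / 1e9, 2)
            for f in Path(models_dir).glob("*.gguf")}
```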

    Troubleshooting

    Potential Issues

    • Download fails: Check internet connection and available storage
    • Model won't load: Ensure sufficient RAM and close background apps
    • Slow performance: Try a smaller or more heavily quantized model (e.g. Q3_K_M)
    • Missing model: Verify download completed successfully
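The most common cause of a failed download is insufficient free storage. A minimal sketch of the pre-download check implied above, using Python's standard library (the 1 GB safety margin is an illustrative assumption):

```python
# Sketch: verify there is enough free storage before starting a large
# model download. The 1 GB safety margin is an illustrative assumption.
import shutil

def enough_storage(path: str, model_size_gb: float,
                   margin_gb: float = 1.0) -> bool:
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= model_size_gb + margin_gb

# Check whether a 4 GB model would fit on the root volume.
print(enough_storage("/", 4.0))
```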