Downloading New Models
Expand your AI capabilities by downloading additional language models from Hugging Face and other sources.
Devices with chips older than the A13 Bionic have limited GPU offload, which slows GGUF model inference and rules out MLX support entirely. On these devices, we recommend downloading small language models (SLMs) for better performance.
Accessing the Model Library
Built-in Model Browser
- 1. Open the "Models" tab in Noema
- 2. Select the model format you need (GGUF or MLX)
- 3. Browse the curated model collection; listings are pre-filtered for compatibility with your device
Understanding Quantization
Quantization | Quality | Size Reduction (vs. FP16) | Best For
---|---|---|---
Q8_0 | Highest | ~50% | High-end devices, best quality
Q5_K_M | Very Good | ~65% | Balanced performance
Q4_K_M | Good | ~70% | Most devices, recommended
Q3_K_M | Acceptable | ~75% | Older devices, limited RAM
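The size column above follows from simple arithmetic: a quantized model occupies roughly (parameter count × bits per weight ÷ 8) bytes. A quick sketch of that estimate, using approximate bits-per-weight figures (real GGUF files vary slightly because some tensors stay at higher precision):

```python
# Approximate effective bits per weight for common GGUF quantizations.
# These are rough figures for estimation only, not exact on-disk values.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.5, "Q4_K_M": 4.8, "Q3_K_M": 3.9}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Estimate model file size: parameters * bits per weight / 8 bits per byte."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

for quant in ("Q8_0", "Q5_K_M", "Q4_K_M", "Q3_K_M"):
    print(f"7B model @ {quant}: ~{estimated_size_gb(7.0, quant):.1f} GB")
```

This is why a 7B model that needs ~14 GB at FP16 fits comfortably on a phone at Q4_K_M (~4 GB).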
Download Process
Step-by-Step
- 1. Select Model: Choose from the browser or search results
- 2. Choose Quantization: Pick the best size/quality balance for your device
- 3. Review Details: Check model size, requirements, and description
- 4. Start Download: Tap "Download" to begin the process
- 5. Monitor Progress: Track the download's progress; downloads continue in the background
- 6. Auto-Install: The model becomes available automatically once the download completes
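Under the hood, model files on Hugging Face are served from a stable `resolve/` URL pattern, and a downloader should confirm free space before starting. A minimal sketch of both ideas (the repo and filename below are illustrative examples, not ones Noema necessarily uses):

```python
import shutil

def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL Hugging Face serves raw model files from."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

def has_room(path: str, file_size_bytes: int, headroom: float = 1.2) -> bool:
    """Require some headroom beyond the file size for temporary data during install."""
    return shutil.disk_usage(path).free > file_size_bytes * headroom

# Example: a hypothetical 4 GB Q4_K_M download.
url = hf_resolve_url("TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
                     "mistral-7b-instruct-v0.2.Q4_K_M.gguf")
print(url)
print("enough space:", has_room(".", 4 * 1024**3))
```

Noema performs these checks for you; the sketch only shows why a download can fail with "insufficient storage" before any bytes are transferred.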
Managing Downloaded Models
Model Management
- • Model Info: View details, size, and performance specs
- • Delete Models: Remove unused models to free space
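When deciding which models to delete, sorting by file size shows where the biggest savings are. A sketch of that idea (the directory path is hypothetical; Noema manages model storage inside the app):

```python
import os

def models_by_size(models_dir: str) -> list[tuple[str, int]]:
    """Return (filename, size_in_bytes) pairs, largest first."""
    entries = [(e.name, e.stat().st_size)
               for e in os.scandir(models_dir) if e.is_file()]
    return sorted(entries, key=lambda t: t[1], reverse=True)
```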
Troubleshooting
Potential Issues
- • Download fails: Check internet connection and available storage
- • Model won't load: Ensure sufficient RAM and close background apps
- • Slow performance: Try a smaller or more heavily quantized model
- • Missing model: Verify download completed successfully
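One quick way to spot an incomplete or corrupted download: every valid GGUF file begins with the four ASCII bytes "GGUF". A sketch of that check:

```python
def looks_like_gguf(path: str) -> bool:
    """A complete GGUF download starts with the ASCII magic bytes b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

A file that fails this check was likely truncated or is not a GGUF model at all, and should be re-downloaded.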