Noema
    Local Models

    Downloading New Models

    Noema supports five local model formats: GGUF, MLX, ET, CML, and AFM. They do not all install the same way, so the fastest path is to pick the format that matches your hardware and the amount of control you want.

    Most tunable
    GGUF

    Best default when you want portability, broad compatibility, and the richest tuning options.

    Apple speed
    MLX

    Best for fast local inference on Apple Silicon with fewer low-level knobs than GGUF.

    Minimal setup
    ET / CML

    Good for Apple-native and mobile-friendly deployments when you want predictable behavior over deep manual tuning.

    Almost fixed
    AFM

    Apple’s built-in Foundation Model for supported Apple Intelligence devices with the lowest-friction setup.

    Format guide

    Choose the right local format

    GGUF

    Best default and most portable.

    Broadest compatibility

    Pick GGUF when you want the widest hardware coverage and the most tuning. It is the format to reach for when you care about advanced performance controls, sampling precision, quantization choice, and runtime experimentation.

    Download path

    Open Explore, filter or search for a GGUF model, inspect quantization and size, then tap Download.
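
    If you are weighing quantization against download size, a rough rule of thumb is parameter count times bits per weight divided by eight. The Swift sketch below is only that back-of-envelope arithmetic; the function name and example figures are illustrative and not part of Noema.

        // Rough rule-of-thumb estimate of a GGUF file's size from its parameter
        // count and quantization bit width. Real files also carry embeddings,
        // metadata, and mixed-precision tensors, so treat this as a ballpark.
        func approximateGGUFSizeGiB(parametersInBillions: Double, bitsPerWeight: Double) -> Double {
            let bytes = parametersInBillions * 1_000_000_000 * bitsPerWeight / 8
            return bytes / 1_073_741_824  // bytes -> GiB
        }

        // A 7B model at ~4.5 bits per weight (typical of a Q4 build) lands near 3.7 GiB,
        // while the same model at 8 bits is closer to 6.5 GiB.
        print(approximateGGUFSizeGiB(parametersInBillions: 7, bitsPerWeight: 4.5))  // ≈ 3.7
        print(approximateGGUFSizeGiB(parametersInBillions: 7, bitsPerWeight: 8.0))  // ≈ 6.5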

    MLX

    Best for Apple Silicon speed.

    Fast on Apple hardware

    Pick MLX when you want quick local inference on Apple hardware with less backend fiddling. Noema exposes the core settings you are most likely to adjust while the backend stays comparatively opinionated.

    Download path

    Browse Explore for MLX builds when you want Apple-optimized local downloads without the extra GGUF tuning surface.

    ET / ExecuTorch

    Best for small, responsive on-device models.

    Runs well anywhere

    Pick ET when you want minimal setup and you are comfortable with fewer knobs. Delegate and backend selection are automatic in the current build, so the experience is intentionally lighter than GGUF or MLX.

    Download path

    Search Explore for ET variants when you want compact mobile-focused deployments with fast startup and fewer runtime choices.

    CML / Core ML

    Best for Apple-native execution on Apple accelerators.

    Apple-native runtime

    Pick CML when you want an Apple-first deployment path for iPhone, iPad, or visionOS and you care more about native execution than about maximum manual tuning. The main format-specific choice is the processing unit used at runtime.

    Download path

    Use Explore to find Core ML builds targeted at Apple devices, then choose the model that matches your supported runtime and storage budget.
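
    The processing-unit choice mentioned above maps onto Core ML's standard compute-unit options. The snippet below is a minimal sketch of how that choice is expressed when loading a compiled model with Apple's CoreML framework; the file name is a placeholder and the loading code is not Noema's own.

        import CoreML
        import Foundation

        // Pick which hardware Core ML may use for this model.
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .cpuAndNeuralEngine   // alternatives: .all, .cpuOnly, .cpuAndGPU

        do {
            // "LocalModel.mlmodelc" is a placeholder for a downloaded, compiled model.
            let url = URL(fileURLWithPath: "LocalModel.mlmodelc")
            let model = try MLModel(contentsOf: url, configuration: configuration)
            print("Loaded model with inputs: \(model.modelDescription.inputDescriptionsByName.keys)")
        } catch {
            print("Could not load Core ML model: \(error)")
        }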

    AFM / Apple Foundation Model

    Best for the system model on Apple Intelligence devices.

    Built into supported devices

    AFM is the lowest-friction option. It is Apple’s built-in system model, so it behaves more like enabling a local capability than downloading a conventional model package.

    Activation path

    There is typically no separate weight download in Noema. On supported Apple Intelligence devices, AFM appears as an available local model option once system requirements are satisfied.
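
    For reference, Apple's FoundationModels framework exposes the system model's availability directly, which is the kind of check that decides whether AFM shows up as an option. The sketch below uses that public API as Apple documents it; it is an illustration, not Noema's actual activation code.

        import FoundationModels

        // Check whether the built-in Apple Foundation Model is usable here,
        // and run a tiny request if it is.
        func checkSystemModel() async {
            let model = SystemLanguageModel.default

            switch model.availability {
            case .available:
                let session = LanguageModelSession()
                if let response = try? await session.respond(to: "Reply with one short sentence.") {
                    print(response.content)
                }
            case .unavailable(let reason):
                // Typical reasons: Apple Intelligence not enabled, unsupported
                // hardware, or model assets still downloading.
                print("System model unavailable: \(reason)")
            }
        }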

    At a glance

    What changes between formats

    Format | Best for                                                | Tuning level
    GGUF   | Portability, compatibility, and deep control.          | Most tunable
    MLX    | Fast Apple Silicon inference with moderate tuning.     | Moderate
    ET     | Small, responsive on-device models with minimal setup. | Minimal
    CML    | Apple-native execution with processing-unit choice.    | Moderate
    AFM    | System-integrated Apple model on supported devices.    | Almost fixed

    Download flow

    From Explore to active model

    1. Open Explore and search or filter by the format you want.
    2. Read the format badge first, then compare storage size, compatibility notes, and capabilities.
    3. For downloadable formats like GGUF, MLX, ET, and CML, tap Download and wait until the status changes to ready.
    4. For AFM, confirm the device supports Apple Intelligence and then activate the built-in model when it appears in the library.
    5. Switch into chat and verify the selected model in the picker before starting a conversation.

    Troubleshooting

    Common format-specific issues

    GGUF or MLX feels slow

    Switch to a smaller model, lighter quantization, or shorter context length. GGUF gives you the widest tuning surface if you need to trade quality for speed.
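
    Context length is usually the biggest lever after model size, because the KV cache grows linearly with it. The sketch below is a rough estimate only; the layer count, head count, and head size describe a typical 7B-class model and are illustrative defaults, not values read from any specific file.

        // Approximate KV cache size: two tensors (keys and values) per layer,
        // one entry per token, at the given precision.
        func approximateKVCacheMiB(contextTokens: Int,
                                   layers: Int = 32,
                                   kvHeads: Int = 8,
                                   headDimension: Int = 128,
                                   bytesPerValue: Int = 2) -> Double {
            let bytes = 2 * layers * contextTokens * kvHeads * headDimension * bytesPerValue
            return Double(bytes) / 1_048_576  // bytes -> MiB
        }

        // Halving the context halves the cache:
        print(approximateKVCacheMiB(contextTokens: 8192))  // ≈ 1024 MiB
        print(approximateKVCacheMiB(contextTokens: 4096))  // ≈ 512 MiB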

    ET or CML options seem limited

    That is expected. These formats intentionally expose fewer runtime controls than GGUF so setup stays predictable on-device.

    AFM does not appear

    Verify the device supports Apple Intelligence features and that the built-in system model is available on that OS and hardware combination.

    Not sure which format to start with

    Start with GGUF for the broadest flexibility. Move to MLX when you want Apple Silicon speed, or AFM when you want the simplest Apple-native option.