Noema
    Model Controls

    Model Settings

    Simple mode exposes only the basics. Advanced mode is where most tuning appears. In practice, GGUF is the most configurable format and AFM is the least.

    Most tunable
    GGUF

    Richest runtime and sampling control surface in Noema.

    Moderate
    MLX / CML

    Useful tuning remains, but backend behavior is more opinionated.

    Minimal
    ET

    Small on-device models with very few format-specific controls.

    Almost fixed
    AFM

    Built-in Apple model with fixed context and only light policy control.

    Mode split

    What Simple mode vs Advanced mode changes

    Simple mode

    Exposes the basics you are most likely to adjust quickly, such as temperature and the small set of format-specific settings Noema considers essential.

    Advanced mode

    Unlocks deeper backend behavior, richer sampling controls, and lower-level performance options. GGUF benefits the most from switching this on.

    Local formats

    Format-by-format settings reference

    GGUF

    Best default and the most configurable format.

    Highest tuning depth
    Simple mode
    • Context length
    • Keep in memory
    • GPU offload layers
    • MoE expert count when applicable
    Advanced mode
    • CPU threads, KV-cache offload, mmap, warmup skip, and seed
    • K/V cache quantization and Flash Attention
    • The richest sampling controls, including temperature, top-p, top-k, min-p, and repetition penalty
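    As a rough illustration, the advanced GGUF surface maps onto the runtime options that llama.cpp-style backends expose. The key names and values below are hypothetical, chosen to mirror the settings listed above rather than Noema's actual configuration keys:

    ```python
    # Illustrative GGUF runtime configuration. Key names are hypothetical
    # and mirror the settings listed above, not Noema's internal API.
    gguf_advanced = {
        "context_length": 8192,       # token window for prompt + response
        "keep_in_memory": True,       # keep weights resident in RAM
        "gpu_offload_layers": 32,     # layers pushed to the GPU
        "cpu_threads": 6,
        "kv_cache_offload": True,     # hold the K/V cache on the GPU
        "use_mmap": True,             # memory-map weights instead of copying
        "skip_warmup": False,
        "seed": 42,                   # fixed seed for reproducible sampling
        "kv_cache_quantization": "q8_0",  # trade cache precision for memory
        "flash_attention": True,
    }
    ```

    Settings like `use_mmap` and `gpu_offload_layers` trade memory footprint against load time and speed, which is why GGUF rewards switching Advanced mode on.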

    MLX

    Best for Apple Silicon speed with fewer low-level knobs.

    Moderate tuning
    User-adjustable settings
    • Context length
    • Seed
    • Tokenizer override
    • Shared sampling controls
    Behavior

    MLX is comparatively opinionated. You still get practical sampling control, but not the same backend surface as GGUF.

    ET / ExecuTorch

    Best for responsive small on-device models with minimal setup.

    Minimal tuning
    User-adjustable settings
    • Context length
    • Seed
    • Tokenizer override
    • Temperature
    Behavior

    Backend and delegate selection are automatic in the current build. ET is intentionally closer to “works well by default” than to “tune every layer.”

    CML / Core ML

    Best for Apple-native on-device execution with Apple accelerators.

    Processing-unit focused
    User-adjustable settings
    • Processing unit: All, CPU Only, CPU + GPU, or CPU + Neural Engine
    • Tokenizer override
    • Lighter shared sampling controls
    Behavior

    Context length is effectively fixed by the model's metadata or name, so the main format-specific choice is how the model uses Apple compute resources.
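    The processing-unit options correspond to Core ML's compute-unit choices. A hypothetical mapping from the labels above onto the `ComputeUnit` names used by coremltools (Noema's internal representation is an assumption here):

    ```python
    # Hypothetical mapping from Noema's processing-unit labels to the
    # compute-unit names coremltools exposes when loading a Core ML model.
    PROCESSING_UNITS = {
        "All": "ALL",                         # let Core ML pick per layer
        "CPU Only": "CPU_ONLY",               # most predictable, usually slowest
        "CPU + GPU": "CPU_AND_GPU",
        "CPU + Neural Engine": "CPU_AND_NE",  # ANE for supported layers
    }
    ```

    "All" lets the framework schedule each layer on whichever unit it judges fastest; the narrower options are useful when a model misbehaves on a particular accelerator.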

    AFM / Apple Foundation Model

    Best for the built-in Apple system model on supported devices.

    Lowest-friction setup
    User-adjustable settings
    • Guardrails: Default or Permissive Content Transformations
    Behavior

    Context is fixed at 4096. AFM is the least configurable format and is designed for the simplest system-integrated path.

    Shared controls

    Sampling settings you will see across models

    Shared sampling controls typically include temperature, top-p, top-k, min-p, and repetition penalty. GGUF exposes the richest set, MLX and remote-backed models expose a practical middle ground, and ET or AFM stay intentionally lighter.
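    How these controls interact can be sketched in a few lines. The function below is an illustrative pipeline over raw token scores, not Noema's actual implementation; names, defaults, and filter ordering are assumptions:

    ```python
    # Illustrative sketch of the shared sampling controls. Ordering and
    # defaults are assumptions, not Noema's actual pipeline.
    import math

    def sample_filter(logits, recent_ids, temperature=0.8, top_k=40,
                      top_p=0.95, min_p=0.05, repeat_penalty=1.1):
        """Return renormalized token probabilities after all filters."""
        logits = dict(logits)  # token_id -> raw score

        # Repetition penalty: push down tokens that appeared recently.
        for tid in recent_ids:
            if tid in logits:
                s = logits[tid]
                logits[tid] = s / repeat_penalty if s > 0 else s * repeat_penalty

        # Temperature: lower values sharpen the distribution, higher flatten it.
        scaled = {t: s / temperature for t, s in logits.items()}

        # Softmax to probabilities (shifted by the max for stability).
        m = max(scaled.values())
        exps = {t: math.exp(s - m) for t, s in scaled.items()}
        z = sum(exps.values())
        probs = {t: e / z for t, e in exps.items()}

        # Top-k: keep only the k most likely tokens.
        kept = sorted(probs, key=probs.get, reverse=True)[:top_k]

        # Min-p: drop tokens below a fraction of the best token's probability.
        best = probs[kept[0]]
        kept = [t for t in kept if probs[t] >= min_p * best]

        # Top-p (nucleus): smallest prefix whose cumulative mass reaches top_p.
        total, nucleus = 0.0, []
        for t in kept:
            nucleus.append(t)
            total += probs[t]
            if total >= top_p:
                break

        # Renormalize over the surviving tokens.
        z = sum(probs[t] for t in nucleus)
        return {t: probs[t] / z for t in nucleus}
    ```

    Formats with a lighter surface simply pin some of these knobs: ET exposes little beyond temperature, while GGUF lets you adjust every stage.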

    Remote-backed models

    Which remote types expose per-model settings

    LM Studio

    Exposes per-model remote settings including context length and standard sampling controls such as temperature, top-p, top-k, min-p, and repetition penalty.

    OpenRouter

    Also exposes per-model remote settings. Some models can additionally surface provider defaults, so Noema may start from the provider’s recommended behavior before you override it.