Changelog
Release 1.4
Released October 3, 2025
Noema is fully free with smarter performance tools
Release 1.4 makes every feature of Noema completely free and introduces FlashAttention, V cache quantization, detection of Mixture-of-Experts (MoE) versus dense models, benchmarking, a refreshed UI, and an updated llama.cpp core.
Highlights
- Noema is now completely free: unlimited web search and every other feature are included with no upsell.
- FlashAttention and V cache quantization dramatically cut memory usage so larger models fit on more devices.
- Model catalog now distinguishes Mixture-of-Experts and dense architectures for clearer deployment decisions.
- Benchmark any model or optimization setup to compare prompt processing speed and token throughput.
- Revamped input fields and remote endpoint forms deliver a clearer, more modern interface.
- Updated llama.cpp underpinnings keep compatibility with the latest upstream improvements.
Past releases
Catch up on earlier Noema updates and explore the releases that paved the way for today's improvements.