Changelog

    Release 1.4

    Released October 3, 2025

    Noema is fully free with smarter performance tools

    Release 1.4 makes every Noema feature free and adds FlashAttention, V cache quantization, detection of MoE versus dense models, benchmarking, a refreshed UI, and an updated llama.cpp core.

    Highlights

    • Noema is now completely free: unlimited web search and every other feature are included, with no upsell.
    • FlashAttention and V cache quantization cut memory usage during inference, so larger models fit on more devices.
    • Model catalog now distinguishes Mixture-of-Experts and dense architectures for clearer deployment decisions.
    • Benchmark any model or optimization setup to compare prompt processing speed and token throughput.
    • Revamped input fields and remote endpoint forms deliver a clearer, more modern interface.
    • Updated llama.cpp underpinnings keep compatibility with the latest upstream improvements.
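
    To give a feel for the memory effect of V cache quantization, here is a rough back-of-the-envelope sketch. It assumes the ggml storage formats used by llama.cpp (f16, q8_0, q4_0); the model dimensions and the helper name v_cache_bytes are hypothetical and chosen only for illustration. The cache types Noema actually exposes are not stated in these notes.

```python
# Rough V cache size estimate. Model dimensions used below are hypothetical.

# Bytes per stored value for a few ggml storage types:
#   f16  : 2 bytes per value
#   q8_0 : 32 values packed into 34 bytes (int8 values + one f16 scale)
#   q4_0 : 32 values packed into 18 bytes (4-bit values + one f16 scale)
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def v_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Approximate size of the V cache in bytes for a single sequence."""
    elements = n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BYTES_PER_ELEMENT[cache_type]


def to_mib(n_bytes):
    """Convert bytes to mebibytes."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads of size 128, 8192-token context.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192)
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = v_cache_bytes(cache_type=cache_type, **shape)
        print(f"{cache_type}: {to_mib(size):.0f} MiB")
```

    For this hypothetical shape the V cache comes to 512 MiB at f16, 272 MiB at q8_0, and 144 MiB at q4_0. The K cache and the model weights are separate and are not counted here.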

    Past releases

    Catch up on earlier Noema updates and explore the releases that paved the way for today's improvements.