Noema

    Dataset Integration

    Bring your own knowledge into Noema with Retrieval-Augmented Generation (RAG). This guide explains how to prepare sources, ingest them, and keep them up to date so your assistant answers from your latest material.

    Concept overview

    How RAG enhances your chats

    Retrieval-Augmented Generation lets Noema search your indexed documents whenever you ask a question. The most relevant snippets are added to the model prompt, so answers are grounded in your own material rather than only in the model's general training data.

    Supported inputs

    File types that index cleanly

    Documents

    • PDF, Markdown, and TXT notes
    • Word documents and lecture decks exported as PDF
    • Research papers or briefs

    Structured data

    • CSV tables for metrics, logs, or glossaries
    • JSON exports from knowledge bases
    • Curated text bundles from other tools

    Adding sources

    Ingest datasets in two ways

    Local files

    1. Open Explore and press the Import icon.
    2. Select Import from Files and pick documents from your device or iCloud.
    3. Keep Noema active while the importer chunks and embeds each page.
    4. Enable the dataset in chat settings to make it available to every conversation.

    Open Textbook Library

    1. Head to the Explore tab and browse curated academic titles.
    2. Add the books that match your domain; they index in the background.
    3. Review the summary once processing finishes to confirm availability.

    What happens inside

    The three stages of retrieval

    Document processing

    Files are chunked into passages and converted into embeddings so Noema can search semantically instead of relying on keywords.
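
    To make the chunking step concrete, here is a minimal Python sketch that splits text into overlapping word windows. It illustrates the general technique only: the function name `chunk_text`, the word-based windows, and the default `chunk_size` and `overlap` values are hypothetical, not Noema's actual importer settings, and the embedding step that follows is left to whatever model the app uses.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into passages of up to chunk_size words.

    Hypothetical sketch: consecutive chunks share `overlap` words so a
    sentence that straddles a boundary still appears whole in one passage.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

    For example, `chunk_text(doc, chunk_size=4, overlap=1)` on a ten-word document yields three passages, each sharing one word with its neighbor. Each passage would then be sent to an embedding model to produce the vector that gets stored.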

    Query matching

    Each prompt is also converted into an embedding. Noema compares that vector against your dataset's vectors to surface the closest-matching passages.
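
    Query matching is typically a nearest-neighbor search over the stored vectors. The sketch below assumes cosine similarity as the comparison metric (a common choice, not one this guide confirms for Noema) and uses hypothetical helper names; it scores every stored chunk vector against the query vector and returns the best matches.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3):
    """Return (chunk_index, score) pairs for the k most similar chunks, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

    Real embeddings have hundreds of dimensions, and large indexes often use an approximate nearest-neighbor structure instead of scoring every chunk, but the ranking idea is the same.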

    Context injection

    The most relevant excerpts are added to the model input, so responses can draw on and cite the exact sections that informed them.
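
    Context injection amounts to assembling a prompt from the retrieved passages. This hypothetical `build_prompt` sketch, with an assumed instruction wording, an assumed excerpts-before-question ordering, and an assumed `[n] (source)` citation format, shows how numbered excerpts tagged with their file names let the model point back to its sources.

```python
from dataclasses import dataclass


@dataclass
class Excerpt:
    source: str  # file the passage came from, e.g. "chapter-03.pdf"
    text: str    # the retrieved passage itself


def build_prompt(question: str, excerpts: list[Excerpt]) -> str:
    """Place numbered, source-tagged excerpts ahead of the user's question."""
    context = "\n".join(
        f"[{n}] ({ex.source}) {ex.text}" for n, ex in enumerate(excerpts, start=1)
    )
    return (
        "Answer using the excerpts below and cite them by number.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )
```

    Because each excerpt carries its source name, clearly named files make the resulting citations much easier to trace.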

    Best practices

    • Keep datasets focused on a single subject for sharp retrieval results.
    • Name files clearly so you can trace answers back to the right source.
    • Break massive PDFs into chapters to speed up processing.
    • Refresh datasets as your material changes to avoid stale context.
    • Periodically run sample questions to confirm the citations look correct.