Dataset Integration
Bring your own knowledge into Noema with Retrieval-Augmented Generation (RAG). This guide explains how to prepare sources, ingest them, and keep them fresh so your assistant answers from your latest context.
How RAG enhances your chats
Retrieval-Augmented Generation lets Noema search your indexed documents whenever you ask a question. Relevant snippets are appended to the model prompt so answers stay grounded in your own material rather than relying only on the model's general training data.
File types that index cleanly
Documents
- PDF, Markdown, and TXT notes
- Word documents and lecture decks exported as PDF
- Research papers or briefs
Structured data
- CSV tables for metrics, logs, or glossaries
- JSON exports from knowledge bases
- Curated text bundles from other tools
Ingest datasets in two ways
Local files
- Open the Explore tab and press the Import icon.
- Select Import from Files and pick documents from your device or iCloud.
- Keep Noema active while the importer chunks and embeds each page.
- Enable the dataset in chat settings to make it available to every conversation.
Open Textbook Library
- Head to the Explore tab and browse curated academic titles.
- Add the books that match your domain; they index in the background.
- Review the summary once processing finishes to confirm each book is available.
The three stages of retrieval
Document processing
Files are chunked into passages and converted into embeddings so Noema can search semantically instead of relying on keywords.
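The chunking step can be pictured with a short sketch. This is not Noema's importer, whose chunk size and splitting rules are not described here; the function name `chunk_text` and the `max_words` and `overlap` parameters are hypothetical, chosen only to show how a long text becomes overlapping passages before embedding.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into passages of up to max_words words.

    Consecutive passages share `overlap` words so a sentence that
    straddles a boundary still appears whole in at least one passage.
    All sizes here are illustrative assumptions, not Noema's settings.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current passage reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(120))
passages = chunk_text(sample, max_words=50, overlap=10)
```

Each passage would then be passed to an embedding model. The overlap trades a little extra storage for fewer passages that cut off mid-thought.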
Query matching
Each prompt is converted into its own embedding. The system compares it against your dataset vectors to surface the closest matches.
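One common way to compare a query embedding with stored vectors is cosine similarity. The sketch below assumes that metric and uses tiny hand-written vectors; the guide does not state which metric or embedding model Noema uses, and the names `cosine_similarity` and `top_matches` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec: list[float],
                passage_vecs: list[list[float]],
                k: int = 3) -> list[tuple[int, float]]:
    """Return (passage index, score) pairs for the k most similar passages."""
    scored = [(i, cosine_similarity(query_vec, vec))
              for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
passage_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matches = top_matches([1.0, 0.0], passage_vecs, k=2)
```

In this toy case the first passage points in the same direction as the query and ranks first, followed by the third passage, which overlaps with it only partly.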
Context injection
Relevant excerpts are appended to the model input so responses can cite the sections that informed them.
Best practices
- Keep datasets focused on a single subject for sharp retrieval results.
- Name files clearly so you can trace answers back to the right source.
- Break massive PDFs into chapters to speed up processing.
- Refresh datasets as your material changes to avoid stale context.
- Periodically run sample questions to confirm the citations look correct.
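If you manage source files outside Noema, one way to notice when a dataset needs refreshing is to compare content hashes between imports. The sketch below is a personal bookkeeping aid under that assumption, not a Noema feature; `fingerprint` and `changed_files` are hypothetical helper names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the content changes."""
    return hashlib.sha256(data).hexdigest()


def changed_files(previous: dict[str, str],
                  current: dict[str, bytes]) -> list[str]:
    """List file names that are new or whose content differs from the last import.

    `previous` maps file names to fingerprints saved at the last import;
    `current` maps file names to their present contents.
    """
    return sorted(
        name for name, data in current.items()
        if previous.get(name) != fingerprint(data)
    )


# Fingerprints recorded when the dataset was last imported.
saved = {"chapter1.md": fingerprint(b"Original text")}

# Contents today: chapter1 was edited and chapter2 is new.
today = {"chapter1.md": b"Revised text", "chapter2.md": b"New chapter"}
stale = changed_files(saved, today)
```

Any file name in the resulting list is a candidate for re-import, after which the saved fingerprints would be updated.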
