Enhanced Data Stores

What Enhanced Data Stores are, how they differ from online data sources, and when to use them to unlock vector search, OCR, audio transcription, and other advanced capabilities.

An Enhanced Data Store is a private, indexed document store that you build and own. Unlike online data sources, which are live connections to external services, an Enhanced Data Store pulls content in, processes it, and keeps it in an internal index purpose-built for advanced search and analysis.

Navigate to News → Data Sources → Enhanced Data Stores to manage your organization's Enhanced Data Stores.

What Is an Enhanced Data Store?

When you create an Enhanced Data Store, you define:

  1. A content source: where documents come from (uploaded files or an existing data source)
  2. A transform: how each document is processed before indexing
  3. A splitter: how each document is divided into searchable chunks
  4. An index technology: the search engine that powers queries

The result is a fully searchable document collection with capabilities that a live online connection does not offer, including semantic (vector) search, OCR, and audio transcription.

Content Sources

User Files

You upload documents directly from your computer. Any file type that the platform can process — PDFs, Word documents, plain text files, audio files, video files — can be added. The store grows as you upload more files.

Best for: Private research libraries, internal reports, archived documents, and recordings you want to make searchable.

Latest News from a Data Source

The store automatically fetches the most recent articles from an existing online data source (one that supports the Latest capability) and indexes them. You can set a cutoff date to control how far back to go.

Best for: Creating a deep, searchable archive of recent coverage from a specific news outlet or social channel.

Search Results from a Data Source

The store runs a specific keyword query against an existing data source and indexes everything it finds. You configure the search term when creating the store.

Best for: Building a focused collection of documents on a specific topic drawn from a larger data source.

Processing Pipeline

Every document that enters an Enhanced Data Store passes through a configurable processing pipeline.
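The order of the stages (transform, then split, then index) can be sketched in a few lines of Python. This is only an illustration of the flow described on this page: `run_pipeline`, its parameters, and the use of a plain list as the "index" are hypothetical and are not part of the product's API.

```python
from typing import Callable, List


def run_pipeline(
    document: str,
    transform: Callable[[str], str],
    splitter: Callable[[str], List[str]],
    index: List[str],
) -> List[str]:
    """Hypothetical sketch: transform a document, split it, then index the chunks."""
    text = transform(document)   # stage 1: pre-process the raw document
    chunks = splitter(text)      # stage 2: divide it into searchable chunks
    index.extend(chunks)         # stage 3: store the chunks (a list stands in for the index)
    return chunks


# Example: trim surrounding whitespace, then split on blank lines.
index: List[str] = []
run_pipeline(
    "First paragraph.\n\nSecond paragraph.\n",
    str.strip,
    lambda text: text.split("\n\n"),
    index,
)
```

Passing the stages in as functions mirrors the idea that each stage is chosen independently when the store is created.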

Transforms

A Transform pre-processes each document before it is split and indexed. Choose the transform that matches your content type:

| Transform | What it does |
| --- | --- |
| Default | Automatic format detection and text extraction; the right choice for most situations |
| None | Uses the document in its original form without any processing |
| HTML Open Graph metadata | Extracts Open Graph metadata (title, description, image) from HTML pages |
| HTML meta metadata | Extracts standard <meta> tag data from HTML pages |
| HTML header metadata | Extracts headings and structure from HTML pages |
| Markdown header metadata | Extracts headings and structure from Markdown documents |
| VTT metadata | Extracts metadata from WebVTT subtitle files |
| Extract audio from video | Uses FFmpeg to extract the audio track from a video file (use before audio transcription) |
| Audio transcription | Converts spoken audio into searchable text |
| Azure Document Intelligence | Uses Microsoft Azure AI to extract text and structure from complex PDFs and scanned documents (OCR) |
| Remove whitespace | Strips excess spaces and blank lines to reduce token usage |
| HTML minification | Compresses HTML content |
| Data source transform | Applies the transform defined by the underlying data source |
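To make one of these transforms concrete, here is a minimal sketch of what extracting Open Graph metadata involves. Open Graph data lives in `<meta property="og:..." content="...">` tags, so the example collects those with Python's standard `html.parser`. The class and function names are hypothetical, and this is not the platform's implementation, only an illustration of the idea.

```python
from html.parser import HTMLParser
from typing import Dict


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta> tags (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep only Open Graph properties, stored without the "og:" prefix.
        if prop.startswith("og:") and content is not None:
            self.metadata[prop[3:]] = content


def extract_open_graph(html: str) -> Dict[str, str]:
    """Hypothetical helper: return the Open Graph fields found in an HTML page."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.metadata


page = '<head><meta property="og:title" content="Quarterly report"></head>'
print(extract_open_graph(page))
```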

Splitters

A Splitter divides each processed document into chunks. Smaller, well-defined chunks improve search precision because the search engine can pinpoint the exact section of a long document that is relevant to a query.

| Splitter | How it works |
| --- | --- |
| Paragraphs | Splits on paragraph boundaries |
| Paragraph stacking (default) | Groups paragraphs together with overlap so context is preserved across chunk boundaries |
| Hierarchical | Splits by the document's heading structure, keeping each section together |
| Halves | Recursively splits the document in half until chunks are small enough |
| Whole document | Treats the entire document as a single chunk; best for short documents |
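The idea behind paragraph stacking can be sketched as follows. The sketch assumes paragraphs are separated by blank lines, that chunk size is measured in characters, and that the overlap is a number of trailing paragraphs carried into the next chunk. All three are assumptions made for illustration; the platform's actual limits and overlap rules are not described here, and the function names are hypothetical.

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def stack_paragraphs(paragraphs: List[str], max_chars: int = 1000, overlap: int = 1) -> List[str]:
    """Group paragraphs into chunks, repeating the last `overlap` paragraphs in the next chunk."""
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in paragraphs:
        candidate = current + [paragraph]
        if current and len("\n\n".join(candidate)) > max_chars:
            # The chunk is full: emit it, then start the next chunk with the overlap.
            chunks.append("\n\n".join(current))
            carried = current[-overlap:] if overlap > 0 else []
            current = carried + [paragraph]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


chunks = stack_paragraphs(split_paragraphs("aaaa\n\nbbbb\n\ncccc"), max_chars=10, overlap=1)
print(chunks)  # → ['aaaa\n\nbbbb', 'bbbb\n\ncccc']
```

Because the paragraph "bbbb" appears at the end of the first chunk and the start of the second, a query that matches text near the boundary still retrieves a chunk with its surrounding context.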

Index Technology

The indexed chunks are stored in a vector database that enables semantic search.

| Technology | Description |
| --- | --- |
| Flash | TopHack's proprietary vector search engine: fast, cost-effective, and the recommended choice |
| Azure AI Search | Microsoft Azure AI Search (coming soon) |
| OpenAI Vector Store | OpenAI's native vector store (coming soon) |

You also choose an embedding model during setup. The embedding model converts text into vectors; the choice of model affects the quality and language coverage of semantic search results.
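The mechanics of vector search (convert text to vectors, then rank chunks by similarity to the query vector) can be sketched with a toy example. Note the loud assumption: `toy_embed` below is a simple word-count vector used as a stand-in, not a real embedding model, so it matches shared words rather than meaning. A real embedding model produces dense vectors that also capture synonyms and paraphrases. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query: str, chunks: List[str]) -> List[str]:
    """Return chunks ordered from most to least similar to the query."""
    query_vector = toy_embed(query)
    return sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, toy_embed(chunk)),
        reverse=True,
    )


chunks = [
    "transcript of a podcast about football",
    "budget report for the city council",
]
print(rank_chunks("city budget", chunks)[0])  # → budget report for the city council
```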

Why Use an Enhanced Data Store?

| Need | How an Enhanced Data Store helps |
| --- | --- |
| Search your own files semantically | Upload files, apply transcription or OCR, and query by meaning |
| Make podcasts and videos searchable | Transcribe audio content and index it for keyword and semantic search |
| Extract text from scanned documents | Apply Azure Document Intelligence to index PDFs and images |
| Build a curated research archive | Pull content from a live data source into a private, reusable index |
| Search with higher precision | Fine-tune chunking and transforms to match your content type |

Relationship to Online Data Sources

An Enhanced Data Store appears as a regular data source in the search module: you can select it and query it just like any built-in or custom online source. The difference is that results come from your private index rather than a live external query.

When you set up an Enhanced Data Store that pulls from a live source, that source needs to support the Latest or Search capability. The Enhanced Data Store adds the Vector Search and Full-Text Search capabilities on top.
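The capability rule above amounts to a simple set check, sketched here. The capability names come from this page; the constants and the function are hypothetical and only illustrate the rule, not how the platform represents capabilities internally.

```python
from typing import Iterable

# A live source must offer at least one of these to feed an Enhanced Data Store.
REQUIRED_ANY = frozenset({"Latest", "Search"})

# Capabilities the Enhanced Data Store itself contributes.
ADDED_BY_STORE = frozenset({"Vector Search", "Full-Text Search"})


def can_feed_store(source_capabilities: Iterable[str]) -> bool:
    """Hypothetical check: True if the source supports Latest or Search."""
    return not REQUIRED_ANY.isdisjoint(source_capabilities)


print(can_feed_store({"Latest"}))         # → True
print(can_feed_store({"Vector Search"}))  # → False
```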

See About Data Sources for an overview of how online sources and Enhanced Data Stores fit together.