Data Source Capabilities

An overview of all data source capabilities, what each one means, and how the platform uses them when searching, browsing the latest news, or working with comments.

Every data source exposes a set of capabilities that describe what it can do. The platform uses these capabilities to decide which data sources are available for a given operation. For example, only sources with the Latest capability appear in the "latest news" feed, and only sources with at least one search capability can be queried in the search module.

Capabilities are displayed as pills on the data source list and in the search interface.
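
The selection rule described above can be sketched as a simple filter. This is an illustration only, not the platform's actual API: the Capability enum, the DataSource class, the function names, and the sample sources are all hypothetical. Only the capability names come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    # Names mirror the capabilities on this page; the enum itself is hypothetical.
    VECTOR_SEARCH = "Vector Search"
    FULL_TEXT_SEARCH = "Full-Text Search"
    API_SEARCH = "API Search"
    WEB_SEARCH = "Web Search"
    ENTITY_SEARCH = "Entity Search"
    LATEST = "Latest"
    COMMENTS = "Comments"
    AUDIO_TRANSCRIPTION = "Audio Transcription"
    OCR = "OCR"
    USER_DEFINED = "User-Defined"


# The five capabilities listed under "Search Capabilities".
SEARCH_CAPABILITIES = frozenset({
    Capability.VECTOR_SEARCH,
    Capability.FULL_TEXT_SEARCH,
    Capability.API_SEARCH,
    Capability.WEB_SEARCH,
    Capability.ENTITY_SEARCH,
})


@dataclass(frozen=True)
class DataSource:
    name: str
    capabilities: frozenset


def sources_for_latest_feed(sources):
    """Keep only sources that expose the Latest capability."""
    return [s for s in sources if Capability.LATEST in s.capabilities]


def sources_for_search(sources):
    """Keep only sources that expose at least one search capability."""
    return [s for s in sources if s.capabilities & SEARCH_CAPABILITIES]


# Made-up sources for illustration.
sources = [
    DataSource("Example RSS feed", frozenset({Capability.LATEST})),
    DataSource("Example file store", frozenset({Capability.FULL_TEXT_SEARCH, Capability.OCR})),
    DataSource("Example subreddit", frozenset({Capability.LATEST, Capability.COMMENTS})),
]

latest_sources = sources_for_latest_feed(sources)
searchable_sources = sources_for_search(sources)
```

In this sketch the RSS feed and the subreddit would appear in the "latest news" feed, while only the file store could be queried in the search module.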

Search Capabilities

There are multiple search capabilities because different data sources use different underlying search technologies. A data source can support more than one.

Vector Search uses AI-generated embeddings to find documents by meaning rather than exact keywords. When you search with a phrase or question, the platform converts it to a vector and retrieves the documents whose content is semantically closest.

When it applies: Data sources backed by an internal Enhanced Data Store index that has been processed with an embedding model. See Enhanced Data Stores for details.

Best for: Finding conceptually related content even when the exact words are not present in the document.
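
To make "semantically closest" concrete, here is a minimal sketch of nearest-neighbor retrieval by cosine similarity. It assumes the query and documents have already been converted to vectors. The three-dimensional vectors and document IDs below are made up, the function names are hypothetical, and the platform's actual embedding model and similarity measure are not specified on this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vector, index, top_k=2):
    """Return the IDs of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Made-up vectors standing in for real embeddings.
index = {
    "doc-interest-rates": [0.9, 0.1, 0.0],
    "doc-central-bank": [0.8, 0.3, 0.1],
    "doc-football": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

results = vector_search(query_vector, index)
```

The two finance-related documents rank first because their vectors point in nearly the same direction as the query, regardless of which words they contain.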

Full-Text Search matches documents that contain the exact keywords (or close variants) of your query. It is the classic keyword-based search.

When it applies: Data sources that expose a full-text index, such as uploaded file stores and certain pipeline-based sources.

Best for: Finding documents that contain a precise term, product name, or quoted phrase.

API Search sends your query to the data source's own search API and returns whatever that API returns. The platform acts as a relay — it passes the query through and displays the results.

When it applies: Sources such as Google Custom Search or Telex, which have their own search endpoints.

Best for: Leveraging the native search quality of a specific service.

Web Search uses a general-purpose web search engine (such as Google) to find pages on a specific domain or across the web.

When it applies: Google Site data sources and similar web-crawling types.

Best for: Broad discovery of pages on a website or across the internet when no structured API is available.

Entity Search filters documents by the entities they mention — people, organizations, places, and other named entities. Instead of a keyword query, you search for a specific entity and receive documents where that entity is present.

When it applies: Sources with entity extraction enabled.

Best for: Tracking coverage of a specific company, person, or location across multiple sources.
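
The difference from keyword search can be sketched as follows: documents carry a set of extracted entities, and the search checks membership in that set instead of matching text. The Entity and Document classes, the entity_search function, and the sample data are hypothetical. How the platform actually extracts and stores entities is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # for example "person", "organization", or "place"


@dataclass(frozen=True)
class Document:
    title: str
    entities: frozenset  # entities assumed to be extracted beforehand


def entity_search(entity, documents):
    """Return the documents in which the given entity is present."""
    return [doc for doc in documents if entity in doc.entities]


# Made-up entities and documents for illustration.
acme = Entity("Acme Corp", "organization")
berlin = Entity("Berlin", "place")

documents = [
    Document("Acme opens Berlin office", frozenset({acme, berlin})),
    Document("Berlin transit strike continues", frozenset({berlin})),
    Document("Acme reports record revenue", frozenset({acme})),
]

acme_coverage = entity_search(acme, documents)
```

Because the lookup is keyed on the entity and its kind, a document mentioning a different entity with the same name would not match in this sketch.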

Content Capabilities

Latest

The Latest capability means the data source can return its most recently published documents without a search query. The platform polls or fetches from the source on a schedule and makes the results available as a live feed.

When it applies: RSS and Atom feeds, social media channels (YouTube, Reddit, TikTok, Instagram, Facebook), and pipeline sources set up to ingest recent articles.

Best for: Monitoring a source for new content over time, powering "latest news" tiles on dashboards, and triggering alerts.

Sources with this capability show a Preview button on the News → Data Sources page, letting you check the most recent results at any time.
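
Conceptually, returning the latest documents needs no query, only an ordering by publication time. The sketch below assumes the documents have already been fetched and each carries a publication timestamp. The latest_documents function, the dictionary fields, and the sample data are hypothetical and do not reflect the platform's actual data model or polling schedule.

```python
from datetime import datetime, timezone


def latest_documents(documents, limit=3):
    """Return the most recently published documents, newest first."""
    ordered = sorted(documents, key=lambda doc: doc["published"], reverse=True)
    return ordered[:limit]


# Made-up documents, as if already fetched from a feed.
fetched = [
    {"title": "Older story", "published": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"title": "Newest story", "published": datetime(2024, 1, 3, 8, 30, tzinfo=timezone.utc)},
    {"title": "Middle story", "published": datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)},
]

preview = latest_documents(fetched, limit=2)
```

With a limit of 2, the preview contains "Newest story" followed by "Middle story".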

Comments

The Comments capability means the data source can retrieve reader or viewer comments attached to its documents. When you open an article in the News module, the platform uses this capability to fetch comments from the original platform (such as YouTube, Facebook, or Reddit).

When it applies: Social media data sources such as YouTube channels, Facebook pages and profiles, and Reddit subreddits and users.

Best for: Social Analytics, that is, understanding audience reactions, sentiment, and engagement around specific content.

Processing Capabilities

Audio Transcription

Audio Transcription means the data source can convert spoken audio (from podcasts, videos, or audio files) into searchable text. Documents processed with this capability have their spoken content indexed so it can be searched and analyzed like regular text.

When it applies: Enhanced Data Stores configured with an audio transcription transform, and data sources pointing to content with audio tracks.

Best for: Making podcast episodes and video content fully searchable without manual effort.

Optical Character Recognition (OCR)

OCR means the data source can extract text from images and scanned PDF documents. Files that are images or non-text PDFs are processed and the recognized text is indexed.

When it applies: Enhanced Data Stores configured with an OCR transform (such as Azure Document Intelligence).

Best for: Indexing scanned documents, slides, or other image-based content.

Other Capabilities

User-Defined

The User-Defined capability marks data sources that are custom-built by your organization rather than provided by the platform. These sources are fully controlled by your team.

When it applies: Any data source you create yourself on the News → Data Sources page.