Semantic content search (RAG)
Semantic search across the full text of every filing.
POST /api/v1/data/filings/content-search
Returns the most relevant passages from filings ranked by vector similarity
to the query. Each result includes raw text, source filing (docid), and a
page_id you can resolve to cite the original PDF or EDGAR page.
When to use this
- "What does Neste say about renewable-diesel capacity in 2024?"
- "Find any mention of supply-chain risk in Nokia's 2023 annual report."
- "Show me Apple's dividend policy language."
This is the right tool whenever the answer lives in prose rather than in a financial line item.
Body
| Field | Type | Description |
|---|---|---|
| query (required) | string | Natural-language search query, in any major language. |
| company | string | Optionally restrict to a specific company (name or ticker). |
| year | integer | Optionally restrict to a specific year. |
| document_type | string | Optionally filter by document type (for example, annual_report). |
| limit | integer (default: 5) | Number of passages to return (1–20). |
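The body fields above can be assembled as in the following minimal Python sketch. The helper name `build_search_request` is hypothetical, and the client-side checks are an assumption that simply mirrors the table (`query` required, `limit` between 1 and 20). The URL and headers are taken from the curl example in this section. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request

# URL taken from the curl example in this section.
API_URL = "https://api.clarifo.com/api/v1/data/filings/content-search"


def build_search_request(api_key, query, company=None, year=None,
                         document_type=None, limit=5):
    """Build (but do not send) a POST request for the content-search endpoint.

    Hypothetical helper: the name and the client-side validation are
    illustrative assumptions, not part of the API.
    """
    if not query:
        raise ValueError("query is required")
    if not 1 <= limit <= 20:
        raise ValueError("limit must be between 1 and 20")

    body = {"query": query, "limit": limit}
    # Include optional filters only when the caller sets them.
    optional = {"company": company, "year": year, "document_type": document_type}
    body.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; unset optional filters are left out of the body rather than sent as null.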
Example
curl
curl https://api.clarifo.com/api/v1/data/filings/content-search \
-H "Authorization: Bearer $CLARIFO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "renewable diesel production capacity 2024",
"company": "Neste",
"document_type": "annual_report",
"limit": 3
}'
Response
json
{
"results": [
{
"company_name": "Neste Oyj",
"year": "2023",
"document_type": "annual_report",
"content": "Renewable Products segment produced 3.3 million tonnes…",
"content_title": "Renewable Products",
"docid": "neste_2023_annual",
"page_id": "722df34e9f191a97_20",
"source_url": "https://…",
"score": 0.8921
}
],
"total": 1,
"query": "renewable diesel production capacity 2024"
}
Tips
- Write the query in natural language, in any major language. Retrieval works across languages: an English query can retrieve a Finnish passage, and vice versa.
- Cap limit to what your downstream LLM context can handle. Five passages is usually enough.
- Combine with the filings search endpoint when you need a deterministic scope.
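As a rough illustration of keeping results within a downstream LLM context, the sketch below turns a response into a cited context string under a character budget. The helper name `format_passages` and the `max_chars` budget are assumptions for illustration, not API features. The field names follow the example response in this section.

```python
def format_passages(response, max_chars=4000):
    """Render search results as a cited context string for an LLM prompt.

    Hypothetical helper: `max_chars` is an illustrative budget, not an API
    parameter. Field names follow the example response in this section.
    """
    blocks = []
    used = 0
    for result in response.get("results", []):
        citation = f"[{result['docid']} / {result['page_id']}]"
        block = f"{citation} {result['content_title']}\n{result['content']}"
        # Stop before the budget is exceeded rather than truncating a passage.
        # The budget counts passage text only, not the blank-line separators.
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)
```

Because results arrive ranked by score, stopping at the first passage that does not fit keeps the highest-ranked passages and drops the tail.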