Knowledge Sources
Deep dive into each knowledge source type, the processing pipeline, and webhook callbacks.
Source Types
URL Sources
Scrape and index any public web page.
{
  "name": "Product Documentation",
  "type": "url",
  "url": "https://docs.example.com/getting-started"
}
Processing: Fetches the page, extracts text content, chunks it, and generates embeddings.
Duplicate detection: If a URL has already been added to the Mind, the request will return 400 with code duplicate_source.
URLs are validated for SSRF protection. Private IP addresses, localhost, and internal domains are rejected.
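To illustrate the kind of check described above, here is a minimal client-side pre-screen that can catch obviously unacceptable URLs before a request is sent. The function name `looksPublicHttpUrl` and the `.local` / `.internal` suffixes are assumptions made for this sketch; Trigglio's actual validation rules are not spelled out here and remain authoritative. The sketch does not resolve DNS, so a public-looking hostname that points at a private address would still pass it.

```javascript
const net = require('node:net');

// Hypothetical pre-check: returns false for URLs that are clearly not public.
function looksPublicHttpUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;

  // The URL API keeps brackets around IPv6 literals; strip them before checking.
  const host = url.hostname.replace(/^\[|\]$/g, '').toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return false;
  // Assumed examples of internal suffixes; the real list is not documented.
  if (host.endsWith('.local') || host.endsWith('.internal')) return false;

  if (net.isIP(host) === 4) {
    const [a, b] = host.split('.').map(Number);
    if (a === 0 || a === 10 || a === 127) return false; // unspecified, private, loopback
    if (a === 172 && b >= 16 && b <= 31) return false;  // private 172.16.0.0/12
    if (a === 192 && b === 168) return false;           // private 192.168.0.0/16
    if (a === 169 && b === 254) return false;           // link-local
  }
  if (net.isIP(host) === 6) {
    if (host === '::1' || host === '::') return false;  // loopback, unspecified
    if (/^f[cd]/.test(host) || /^fe[89ab]/.test(host)) return false; // unique-local, link-local
  }
  return true;
}
```

A URL that passes this check can still be rejected by the API with `invalid_url`, so the server response should always be handled.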
Document Sources
Upload and index PDF, DOCX, TXT, or Markdown files.
Two-step process:
- Upload the file via the upload endpoint
- Create the knowledge source with the returned fileUrl
# Step 1: Upload
curl -X POST "https://trigglio.com/api/v1/minds/{mindId}/knowledge/upload" \
  -H "Authorization: Bearer tr_live_your_api_key" \
  -F "file=@/path/to/guide.pdf"
# Step 2: Create source
curl -X POST "https://trigglio.com/api/v1/minds/{mindId}/knowledge" \
  -H "Authorization: Bearer tr_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Guide",
    "type": "document",
    "fileUrl": "https://cdn.trigglio.com/minds/.../guide.pdf",
    "fileName": "guide.pdf",
    "mimeType": "application/pdf",
    "fileSize": 1048576
  }'
Supported formats:
| Format | MIME Type | Max Size |
|---|---|---|
| PDF | application/pdf | 50 MB |
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | 50 MB |
| TXT | text/plain | 50 MB |
| Markdown | text/markdown | 50 MB |
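The format table above can be turned into a small helper that builds the body for the second step. This is only a sketch: `buildDocumentSource` is a hypothetical name, the mapping from file extensions (including `.md`) to MIME types is an assumption, and "50 MB" is read as 50 × 1024 × 1024 bytes, which the documentation does not confirm.

```javascript
// MIME types from the supported-formats table, keyed by an assumed file extension.
const DOCUMENT_MIME_TYPES = new Map([
  ['pdf', 'application/pdf'],
  ['docx', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'],
  ['txt', 'text/plain'],
  ['md', 'text/markdown'],
]);

// Assumption: the 50 MB limit is interpreted as 50 MiB.
const MAX_DOCUMENT_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: validates locally, then returns the create-source payload.
function buildDocumentSource({ name, fileUrl, fileName, fileSize }) {
  const ext = fileName.includes('.') ? fileName.split('.').pop().toLowerCase() : '';
  const mimeType = DOCUMENT_MIME_TYPES.get(ext);
  if (!mimeType) {
    throw new Error(`Unsupported document type: ${fileName}`);
  }
  if (fileSize > MAX_DOCUMENT_BYTES) {
    throw new Error(`File exceeds the 50 MB limit: ${fileSize} bytes`);
  }
  return { name, type: 'document', fileUrl, fileName, mimeType, fileSize };
}
```

Checking locally avoids uploading a file that the create call would then reject, but the API's own `unsupported_document_type` response should still be handled.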
YouTube Sources
Index YouTube video transcripts automatically.
{
  "name": "Product Demo Video",
  "type": "youtube",
  "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
Supported URL formats:
- https://www.youtube.com/watch?v=VIDEO_ID
- https://youtu.be/VIDEO_ID
- https://www.youtube.com/embed/VIDEO_ID
- https://www.youtube.com/shorts/VIDEO_ID
Processing: Extracts captions if available. If no captions exist, downloads and transcribes the audio using Groq Whisper. The status will transition through processing → transcribing → ready.
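The four supported URL shapes can be recognized on the client before a source is created, which is also a convenient way to spot duplicates of the same video. The sketch below is one possible approach, not Trigglio's own parser; `extractYouTubeId` is a hypothetical name, and the character check on the ID is an assumption.

```javascript
// Hypothetical helper: returns the video ID, or null if the URL shape is not supported.
function extractYouTubeId(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable absolute URL
  }
  const host = url.hostname.replace(/^www\./, '');
  const segments = url.pathname.split('/').filter(Boolean);

  let id = null;
  if (host === 'youtu.be') {
    id = segments[0];                       // https://youtu.be/VIDEO_ID
  } else if (host === 'youtube.com') {
    if (segments[0] === 'watch') {
      id = url.searchParams.get('v');       // https://www.youtube.com/watch?v=VIDEO_ID
    } else if (segments[0] === 'embed' || segments[0] === 'shorts') {
      id = segments[1];                     // /embed/VIDEO_ID or /shorts/VIDEO_ID
    }
  }
  // Assumption: IDs consist only of letters, digits, "_" and "-".
  return id && /^[\w-]+$/.test(id) ? id : null;
}
```

All four forms of the example video resolve to the same ID, so comparing IDs rather than raw URLs catches the same video submitted under different URL shapes.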
Podcast Sources
Upload and transcribe audio files.
Two-step process (same as documents):
- Upload the audio file
- Create a podcast knowledge source with the returned fileUrl
{
  "name": "Episode 42: AI in Production",
  "type": "podcast",
  "fileUrl": "https://cdn.trigglio.com/minds/.../episode-42.mp3",
  "fileName": "episode-42.mp3",
  "mimeType": "audio/mpeg",
  "fileSize": 52428800
}
Supported audio formats: MP3, WAV, M4A, OGG (max 500 MB)
Processing Pipeline
Every knowledge source goes through this pipeline:
- Extraction — Content is fetched/parsed from the source
- Moderation — Content is checked for safety
- Chunking — Text is split into ~800 token chunks with 100 token overlap
- Embedding — Each chunk is converted to a vector embedding and stored
For audio sources (YouTube without captions, podcasts), there’s an additional transcription step using Groq Whisper before chunking.
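To make the chunking step concrete, the sketch below splits a token array into windows of 800 tokens in which each window repeats the last 100 tokens of the previous one. It illustrates the sliding-window idea only: `chunkTokens` is a hypothetical name, the tokens here are plain array items rather than output from whatever tokenizer Trigglio uses, and the real pipeline's boundaries are approximate ("~800").

```javascript
// Splits `tokens` into windows of `size` items; consecutive windows share `overlap` items.
function chunkTokens(tokens, size = 800, overlap = 100) {
  if (overlap >= size) {
    throw new RangeError('overlap must be smaller than size');
  }
  const step = size - overlap; // each new window starts this many tokens later
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window already reached the end
  }
  return chunks;
}

// Usage: 2,000 stand-in tokens yield windows starting at 0, 700, and 1400.
const demoTokens = Array.from({ length: 2000 }, (_, i) => i);
const demoChunks = chunkTokens(demoTokens);
console.log(demoChunks.map(c => c.length)); // [ 800, 800, 600 ]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some tokens twice.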
Status Values
| Status | Description |
|---|---|
| pending | Source created, waiting to be processed |
| processing | Content is being extracted and chunked |
| transcribing | Audio is being transcribed (YouTube/podcast) |
| ready | Processing complete, source is searchable |
| failed | Processing failed (check error field) |
Polling for Status
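async function waitForReady(mindId, sourceId, apiKey) {
  while (true) {
    const res = await fetch(
      `https://trigglio.com/api/v1/minds/${mindId}/knowledge/${sourceId}`,
      { headers: { 'Authorization': `Bearer ${apiKey}` } }
    );
    // Fail fast on HTTP errors instead of reading an error body as a source
    if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
    const { source } = await res.json();
    if (source.status === 'ready') return source;
    if (source.status === 'failed') throw new Error(source.error);
    // Respect the Retry-After header; fall back to 5 seconds if it is absent or not a number
    const retryAfter = Number.parseInt(res.headers.get('Retry-After') ?? '', 10);
    const delaySeconds = Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter : 5;
    await new Promise(r => setTimeout(r, delaySeconds * 1000));
  }
}
Webhook Callbacks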
Instead of polling, provide a callbackUrl when creating a knowledge source. Trigglio will POST to your URL when processing finishes.
Setup
Include callbackUrl in your create request:
{
  "name": "My Source",
  "type": "url",
  "url": "https://example.com",
  "callbackUrl": "https://your-server.com/webhooks/trigglio"
}
Callback URLs must use HTTPS and be publicly accessible. Private IP addresses are rejected.
Payload
{
  "event": "knowledge.processed",
  "sourceId": "src_abc123",
  "mindId": "mind_xyz789",
  "name": "My Source",
  "type": "url",
  "status": "ready",
  "chunkCount": 42,
  "error": null,
  "timestamp": "2024-06-15T10:30:00.000Z"
}
For failures, event will be knowledge.failed and error will contain the error message.
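A receiver typically branches on the `event` field of the payload. The sketch below shows one way to do that using only the fields documented above; `summarizeCallback` and the shape of its return value are hypothetical, and the default branch exists only because this sketch makes no assumption that the two documented events are the only ones.

```javascript
// Hypothetical helper: turns a callback payload into a small result object.
function summarizeCallback(payload) {
  switch (payload.event) {
    case 'knowledge.processed':
      return {
        ok: true,
        sourceId: payload.sourceId,
        message: `${payload.name} is ready with ${payload.chunkCount} chunks`,
      };
    case 'knowledge.failed':
      return {
        ok: false,
        sourceId: payload.sourceId,
        message: `${payload.name} failed: ${payload.error}`,
      };
    default:
      // Unknown events are reported rather than silently dropped.
      return {
        ok: false,
        sourceId: payload.sourceId ?? null,
        message: `Unhandled event: ${payload.event}`,
      };
  }
}
```

Keeping this logic in a pure function makes it easy to test with sample payloads, separately from the HTTP handler that receives the request.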
Signature Verification
Each callback includes an X-Trigglio-Signature header containing an HMAC-SHA256 signature. Verify it to ensure the request is authentic:
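import { createHmac, timingSafeEqual } from 'crypto';

// Note: JSON.stringify(body) must reproduce the exact bytes that were signed.
function verifySignature(body, signature, secret) {
  const expected = createHmac('sha256', secret)
    .update(JSON.stringify(body))
    .digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(String(signature), 'utf8');
  // Constant-time comparison avoids leaking the signature through timing
  return a.length === b.length && timingSafeEqual(a, b);
}
Retry Behavior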
Callbacks are fire-and-forget with a 10-second timeout. If your server doesn’t respond in time, the callback is not retried. Use status polling as a fallback.
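Because a missed callback is never re-sent, a receiver needs some way to notice that a callback is overdue and switch to polling. The sketch below tracks sources awaiting a callback and reports those that have waited longer than a grace period. `CallbackTracker`, its method names, and the 60-second default are all assumptions made for illustration; nothing here is part of the Trigglio API.

```javascript
// Hypothetical tracker for sources whose callback has not arrived yet.
class CallbackTracker {
  constructor(graceMs = 60000) {
    this.graceMs = graceMs;   // how long to wait before falling back to polling
    this.pending = new Map(); // sourceId -> time the source was created (ms)
  }

  // Call after creating a source with a callbackUrl.
  expect(sourceId, now = Date.now()) {
    this.pending.set(sourceId, now);
  }

  // Call when a callback for this source arrives; returns false if it was not tracked.
  resolve(sourceId) {
    return this.pending.delete(sourceId);
  }

  // Source IDs that have waited at least graceMs and should now be polled.
  overdue(now = Date.now()) {
    return [...this.pending]
      .filter(([, since]) => now - since >= this.graceMs)
      .map(([sourceId]) => sourceId);
  }
}
```

A periodic job could call `overdue()` and check the status of each returned ID, calling `resolve()` once the source reaches `ready` or `failed`.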
Limits
| Resource | Limit |
|---|---|
| Knowledge sources per Mind | Depends on plan (typically 10-50) |
| Document file size | 50 MB |
| Audio file size | 500 MB |
| Callback URL length | 2,048 characters |
Error Codes
| Code | Description |
|---|---|
| validation_error | Missing or invalid field |
| source_limit_reached | Too many sources on this Mind |
| storage_limit_exceeded | Account storage quota exceeded |
| duplicate_source | URL or video already added |
| invalid_url | URL failed SSRF or format validation |
| youtube_invalid_url | Could not extract YouTube video ID |
| unsupported_document_type | File format not supported |
| unsupported_audio_type | Audio format not supported |
| invalid_status | Cannot reprocess from current status |