
Knowledge Sources

Deep dive into each knowledge source type, the processing pipeline, and webhook callbacks.

Source Types

URL Sources

Scrape and index any public web page.

{
  "name": "Product Documentation",
  "type": "url",
  "url": "https://docs.example.com/getting-started"
}

Processing: Fetches the page, extracts text content, chunks it, and generates embeddings.

Duplicate detection: If a URL has already been added to the Mind, the request will return 400 with code duplicate_source.
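
As a rough sketch, the create request for a URL source and the duplicate check could be wrapped like this. The endpoint and body fields come from the examples on this page. The helper names are hypothetical, and the assumption that the error body exposes the code in a top-level `code` field is not confirmed here.

```javascript
// Hypothetical helper: builds the fetch() arguments for creating a URL source.
function buildUrlSourceRequest(mindId, apiKey, name, url) {
  return {
    endpoint: `https://trigglio.com/api/v1/minds/${encodeURIComponent(mindId)}/knowledge`,
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ name, type: 'url', url }),
    },
  };
}

// Hypothetical helper: true when a response looks like the duplicate case.
// ASSUMPTION: the error code is exposed as `code` in the JSON error body.
function isDuplicateSource(status, body) {
  return status === 400 && body != null && body.code === 'duplicate_source';
}

// Usage (not executed here):
//   const { endpoint, options } = buildUrlSourceRequest(mindId, apiKey, 'Docs', pageUrl);
//   const res = await fetch(endpoint, options);
//   if (isDuplicateSource(res.status, await res.json())) { /* already added */ }
```

Treating a duplicate as a non-fatal outcome is often convenient when the same list of URLs is submitted repeatedly.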

URLs are validated for SSRF protection. Private IP addresses, localhost, and internal domains are rejected.
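
A client-side pre-check can catch obviously rejected URLs before a request is sent. The sketch below is only an approximation: the server's actual validation rules are not documented on this page, the function name is hypothetical, and "internal domains" is assumed to mean hostnames ending in `.local`, `.internal`, or `.localhost`.

```javascript
// Client-side pre-check only; the server remains the source of truth.
function isLikelyPublicUrl(raw) {
  let parsed;
  try {
    parsed = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;

  const host = parsed.hostname.toLowerCase();
  // ASSUMPTION: these suffixes approximate "internal domains".
  if (host === 'localhost' || /\.(local|internal|localhost)$/.test(host)) return false;
  // IPv6 literals appear in brackets; this sketch conservatively rejects all of them.
  if (host.startsWith('[')) return false;

  // Dotted-quad IPv4 literal: reject loopback, private, and link-local ranges.
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 0 || a === 10 || a === 127) return false;
    if (a === 172 && b >= 16 && b <= 31) return false;
    if (a === 192 && b === 168) return false;
    if (a === 169 && b === 254) return false;
  }
  return true;
}
```

This does not resolve DNS, so a public-looking hostname that points at a private address would still pass here and be rejected by the API.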

Document Sources

Upload and index PDF, DOCX, TXT, or Markdown files.

Two-step process:

  1. Upload the file via the upload endpoint
  2. Create the knowledge source with the returned fileUrl

# Step 1: Upload
curl -X POST "https://trigglio.com/api/v1/minds/{mindId}/knowledge/upload" \
  -H "Authorization: Bearer tr_live_your_api_key" \
  -F "file=@/path/to/guide.pdf"

# Step 2: Create source
curl -X POST "https://trigglio.com/api/v1/minds/{mindId}/knowledge" \
  -H "Authorization: Bearer tr_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Guide",
    "type": "document",
    "fileUrl": "https://cdn.trigglio.com/minds/.../guide.pdf",
    "fileName": "guide.pdf",
    "mimeType": "application/pdf",
    "fileSize": 1048576
  }'

Supported formats:

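| Format   | MIME Type                                                                 | Max Size |
|----------|---------------------------------------------------------------------------|----------|
| PDF      | application/pdf                                                           | 50 MB    |
| DOCX     | application/vnd.openxmlformats-officedocument.wordprocessingml.document   | 50 MB    |
| TXT      | text/plain                                                                | 50 MB    |
| Markdown | text/markdown                                                             | 50 MB    |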
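
Checking a file against the supported formats before uploading avoids a wasted round trip. The sketch below uses the MIME types and size limit listed above. The function and constant names are hypothetical, and it assumes "50 MB" means 50 × 1,048,576 bytes, which is consistent with the `fileSize` of 1048576 in the example request but not stated explicitly.

```javascript
const SUPPORTED_DOCUMENT_TYPES = new Set([
  'application/pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'text/plain',
  'text/markdown',
]);

// ASSUMPTION: 1 MB is counted as 1,048,576 bytes.
const MAX_DOCUMENT_BYTES = 50 * 1024 * 1024;

// Returns { ok, reason }; reason is a local message, not an API error code.
function checkDocument(mimeType, fileSize) {
  if (!SUPPORTED_DOCUMENT_TYPES.has(mimeType)) {
    return { ok: false, reason: `unsupported MIME type: ${mimeType}` };
  }
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    return { ok: false, reason: 'fileSize must be a positive integer' };
  }
  if (fileSize > MAX_DOCUMENT_BYTES) {
    return { ok: false, reason: 'file exceeds the 50 MB limit' };
  }
  return { ok: true, reason: null };
}
```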

YouTube Sources

Index YouTube video transcripts automatically.

{
  "name": "Product Demo Video",
  "type": "youtube",
  "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Supported URL formats:

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/embed/VIDEO_ID
  • https://www.youtube.com/shorts/VIDEO_ID
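
To illustrate how the four URL shapes above map to a single video ID, here is a minimal sketch. The function name is hypothetical, and the character set accepted for IDs is an assumption; the API's own parser may be stricter or more lenient.

```javascript
// Returns the video ID for the four documented URL shapes, or null otherwise.
function extractYouTubeId(raw) {
  let u;
  try {
    u = new URL(raw);
  } catch {
    return null; // not a parseable absolute URL
  }
  const host = u.hostname.toLowerCase().replace(/^www\./, '');
  const parts = u.pathname.split('/').filter(Boolean);

  let id = null;
  if (host === 'youtu.be') {
    id = parts[0];
  } else if (host === 'youtube.com') {
    if (parts[0] === 'watch') id = u.searchParams.get('v');
    else if (parts[0] === 'embed' || parts[0] === 'shorts') id = parts[1];
  }
  // ASSUMPTION: IDs consist of letters, digits, underscores, and hyphens.
  return id && /^[A-Za-z0-9_-]+$/.test(id) ? id : null;
}
```

A `null` result here corresponds to the situation the API reports as `youtube_invalid_url`.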

Processing: Extracts captions if available. If no captions exist, downloads and transcribes the audio using Groq Whisper. The status will transition through processing → transcribing → ready.

Podcast Sources

Upload and transcribe audio files.

Two-step process (same as documents):

  1. Upload the audio file
  2. Create a podcast knowledge source with the returned fileUrl

{
  "name": "Episode 42: AI in Production",
  "type": "podcast",
  "fileUrl": "https://cdn.trigglio.com/minds/.../episode-42.mp3",
  "fileName": "episode-42.mp3",
  "mimeType": "audio/mpeg",
  "fileSize": 52428800
}

Supported audio formats: MP3, WAV, M4A, OGG (max 500 MB)


Processing Pipeline

Every knowledge source goes through this pipeline:

  1. Extraction — Content is fetched/parsed from the source
  2. Moderation — Content is checked for safety
  3. Chunking — Text is split into ~800 token chunks with 100 token overlap
  4. Embedding — Each chunk is converted to a vector embedding and stored

For audio sources (YouTube without captions, podcasts), there’s an additional transcription step using Groq Whisper before chunking.
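
The chunking step can be pictured as a sliding window over tokens. The sketch below is illustrative only: it operates on an already tokenized array, the function name is hypothetical, and the pipeline's real tokenizer and boundary rules are not documented here (the page says chunks are approximately 800 tokens, so real chunk sizes may vary).

```javascript
// Splits a token array into windows of `size` tokens, where each window
// repeats the last `overlap` tokens of the previous one.
function chunkTokens(tokens, size = 800, overlap = 100) {
  if (overlap >= size) throw new RangeError('overlap must be smaller than size');
  const step = size - overlap; // 700 new tokens per chunk with the defaults
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Example: 2,000 tokens yield windows starting at 0, 700, and 1400.
const demoTokens = Array.from({ length: 2000 }, (_, i) => i);
const demoChunks = chunkTokens(demoTokens);
console.log(demoChunks.map(c => c.length)); // [ 800, 800, 600 ]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which generally helps retrieval.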

Status Values

| Status       | Description                                   |
|--------------|-----------------------------------------------|
| pending      | Source created, waiting to be processed       |
| processing   | Content is being extracted and chunked        |
| transcribing | Audio is being transcribed (YouTube/podcast)  |
| ready        | Processing complete, source is searchable     |
| failed       | Processing failed (check error field)         |

Polling for Status

async function waitForReady(mindId, sourceId, apiKey) {
  while (true) {
    const res = await fetch(
      `https://trigglio.com/api/v1/minds/${mindId}/knowledge/${sourceId}`,
      { headers: { 'Authorization': `Bearer ${apiKey}` } }
    );
    const { source } = await res.json();

    // Terminal states: return the source or surface the processing error
    if (source.status === 'ready') return source;
    if (source.status === 'failed') throw new Error(source.error);

    // Respect the Retry-After header (in seconds); fall back to 5 seconds
    // when the header is missing or not a plain number
    const retryAfter = parseInt(res.headers.get('Retry-After') || '5', 10);
    const delaySeconds = Number.isNaN(retryAfter) ? 5 : retryAfter;
    await new Promise(r => setTimeout(r, delaySeconds * 1000));
  }
}
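
The polling loop above reads `Retry-After` as a number of seconds. The HTTP specification also allows the header to carry an HTTP-date, so a more defensive parser might look like the sketch below. Whether this API ever sends the date form is not stated on this page; the function name and the 5-second fallback parameter are illustrative.

```javascript
// Converts a Retry-After header value into a delay in milliseconds.
// Accepts delta-seconds ("5") or an HTTP-date; otherwise returns fallbackMs.
function retryAfterMs(value, fallbackMs = 5000, now = Date.now()) {
  if (value == null) return fallbackMs;
  const trimmed = String(value).trim();
  if (trimmed === '') return fallbackMs;

  // Delta-seconds form: a non-negative integer.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // HTTP-date form: wait until that moment, never a negative duration.
  const when = Date.parse(trimmed);
  if (Number.isNaN(when)) return fallbackMs;
  return Math.max(0, when - now);
}
```

Inside a polling loop, the result can be passed straight to `setTimeout`, for example `await new Promise(r => setTimeout(r, retryAfterMs(res.headers.get('Retry-After'))))`.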

Webhook Callbacks

Instead of polling, provide a callbackUrl when creating a knowledge source. Trigglio will POST to your URL when processing finishes.

Setup

Include callbackUrl in your create request:

{
  "name": "My Source",
  "type": "url",
  "url": "https://example.com",
  "callbackUrl": "https://your-server.com/webhooks/trigglio"
}

Callback URLs must use HTTPS and be publicly accessible. Private IP addresses are rejected.
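
A quick local check of a callback URL might look like the following sketch. It covers only the HTTPS requirement and the 2,048-character limit listed under Limits on this page; it does not detect private addresses or confirm that the URL is reachable from the public internet. The function name and messages are hypothetical.

```javascript
const MAX_CALLBACK_URL_LENGTH = 2048; // from the Limits section of this page

// Returns null when the URL passes the local checks, or a message describing
// the first problem found. These messages are local, not API error codes.
function checkCallbackUrl(raw) {
  if (typeof raw !== 'string' || raw.length === 0) return 'callbackUrl must be a non-empty string';
  if (raw.length > MAX_CALLBACK_URL_LENGTH) return 'callbackUrl exceeds 2,048 characters';
  let parsed;
  try {
    parsed = new URL(raw);
  } catch {
    return 'callbackUrl is not a valid absolute URL';
  }
  if (parsed.protocol !== 'https:') return 'callbackUrl must use HTTPS';
  return null;
}
```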

Payload

{
  "event": "knowledge.processed",
  "sourceId": "src_abc123",
  "mindId": "mind_xyz789",
  "name": "My Source",
  "type": "url",
  "status": "ready",
  "chunkCount": 42,
  "error": null,
  "timestamp": "2024-06-15T10:30:00.000Z"
}

For failures, event will be knowledge.failed and error will contain the error message.
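
A receiver typically branches on the `event` field. The sketch below shows one way to do that with the two event names and the payload fields described above; the function name and the returned messages are purely illustrative.

```javascript
// Turns a callback payload into a short log line.
function summarizeCallback(payload) {
  switch (payload.event) {
    case 'knowledge.processed':
      return `Source ${payload.sourceId} is ${payload.status} with ${payload.chunkCount} chunks`;
    case 'knowledge.failed':
      return `Source ${payload.sourceId} failed: ${payload.error}`;
    default:
      // Unknown events are reported rather than treated as errors.
      return `Ignoring unknown event: ${payload.event}`;
  }
}

console.log(summarizeCallback({
  event: 'knowledge.processed',
  sourceId: 'src_abc123',
  status: 'ready',
  chunkCount: 42,
}));
// prints "Source src_abc123 is ready with 42 chunks"
```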

Signature Verification

Each callback includes an X-Trigglio-Signature header containing an HMAC-SHA256 signature. Verify it to ensure the request is authentic:

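import { createHmac, timingSafeEqual } from 'crypto';

function verifySignature(body, signature, secret) {
  // Recompute the HMAC-SHA256 of the serialized payload
  const expected = createHmac('sha256', secret)
    .update(JSON.stringify(body))
    .digest('hex');

  // Compare in constant time; timingSafeEqual throws on unequal lengths
  const a = Buffer.from(expected);
  const b = Buffer.from(String(signature));
  return a.length === b.length && timingSafeEqual(a, b);
}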
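
To exercise a webhook handler locally, it can help to produce a signature yourself. The sketch below mirrors the verification example above, so it assumes the signature is the hex-encoded HMAC-SHA256 of `JSON.stringify(body)`; the exact bytes Trigglio signs are not specified on this page. The helper name and the secret are hypothetical, and `require` is used so the snippet runs as a plain Node.js script.

```javascript
const { createHmac } = require('crypto');

// ASSUMPTION: same scheme as the verification example (hex HMAC-SHA256
// over the JSON-serialized payload).
function signPayload(body, secret) {
  return createHmac('sha256', secret)
    .update(JSON.stringify(body))
    .digest('hex');
}

const samplePayload = {
  event: 'knowledge.processed',
  sourceId: 'src_abc123',
  status: 'ready',
};
// 'test_secret' is a placeholder, not a real credential.
const sampleSignature = signPayload(samplePayload, 'test_secret');

// POST samplePayload to your local handler with the header
// X-Trigglio-Signature set to sampleSignature.
console.log(sampleSignature.length); // 64 hex characters
```

Because the signature depends on the serialized form, a handler that re-serializes a parsed body will only match if key order and formatting are preserved.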

Retry Behavior

Callbacks are fire-and-forget with a 10-second timeout. If your server doesn’t respond in time, the callback is not retried. Use status polling as a fallback.


Limits

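| Resource                   | Limit                              |
|----------------------------|------------------------------------|
| Knowledge sources per Mind | Depends on plan (typically 10-50)  |
| Document file size         | 50 MB                              |
| Audio file size            | 500 MB                             |
| Callback URL length        | 2,048 characters                   |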

Error Codes

| Code                      | Description                           |
|---------------------------|---------------------------------------|
| validation_error          | Missing or invalid field              |
| source_limit_reached      | Too many sources on this Mind         |
| storage_limit_exceeded    | Account storage quota exceeded        |
| duplicate_source          | URL or video already added            |
| invalid_url               | URL failed SSRF or format validation  |
| youtube_invalid_url       | Could not extract YouTube video ID    |
| unsupported_document_type | File format not supported             |
| unsupported_audio_type    | Audio format not supported            |
| invalid_status            | Cannot reprocess from current status  |
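
In client code, these codes are often mapped to readable messages in one place. The sketch below copies the descriptions listed above into a lookup table; the helper names are hypothetical, and grouping the two limit-related codes as "quota" errors is an interpretation of their descriptions rather than something the API defines.

```javascript
const ERROR_DESCRIPTIONS = {
  validation_error: 'Missing or invalid field',
  source_limit_reached: 'Too many sources on this Mind',
  storage_limit_exceeded: 'Account storage quota exceeded',
  duplicate_source: 'URL or video already added',
  invalid_url: 'URL failed SSRF or format validation',
  youtube_invalid_url: 'Could not extract YouTube video ID',
  unsupported_document_type: 'File format not supported',
  unsupported_audio_type: 'Audio format not supported',
  invalid_status: 'Cannot reprocess from current status',
};

// Returns the documented description, or a generic message for unknown codes.
function describeError(code) {
  return Object.prototype.hasOwnProperty.call(ERROR_DESCRIPTIONS, code)
    ? ERROR_DESCRIPTIONS[code]
    : `Unknown error code: ${code}`;
}

// ASSUMPTION: these two codes relate to plan or account limits, so changing
// the request alone is unlikely to resolve them.
function isQuotaError(code) {
  return code === 'source_limit_reached' || code === 'storage_limit_exceeded';
}
```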