Knowledge Sources

Deep dive into each knowledge source type, the processing pipeline, and webhook callbacks.

Source Types

URL Sources

Scrape and index any public web page.


{
  "name": "Product Documentation",
  "type": "url",
  "url": "https://docs.example.com/getting-started"
}

Processing: Fetches the page, extracts text content, chunks it, and generates embeddings.

Duplicate detection: If the URL has already been added to the Mind, the request returns HTTP 400 with the error code duplicate_source.

URLs are validated to prevent SSRF: private IP addresses, localhost, and internal domains are rejected.
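A client can catch the most obvious rejections before calling the API. The sketch below is a rough local pre-check only: the helper name looksPublic is hypothetical, the ranges it blocks (loopback, RFC 1918, link-local) are an assumption about what "private" covers, and it does not resolve DNS, so the server's validation remains authoritative.

```javascript
// Hypothetical helper: a rough client-side pre-check, not Trigglio's validator.
function looksPublic(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;

  const host = parsed.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return false;
  if (host === '[::1]') return false; // IPv6 loopback

  // Literal IPv4 addresses in loopback, private, or link-local ranges
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 0 || a === 10 || a === 127) return false;
    if (a === 172 && b >= 16 && b <= 31) return false;
    if (a === 192 && b === 168) return false;
    if (a === 169 && b === 254) return false;
  }
  return true;
}
```

Running this before the create request saves a round trip for typos such as a pasted localhost URL; anything it lets through can still be rejected with invalid_url.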

Document Sources

Upload and index PDF, DOCX, TXT, or Markdown files.

Two-step process:

Upload the file via the upload endpoint
Create the knowledge source with the returned fileUrl


# Step 1: Upload
curl -X POST "https://trigglio.com/api/v1/minds/{mindId}/knowledge/upload" \
  -H "Authorization: Bearer tr_live_your_api_key" \
  -F "file=@/path/to/guide.pdf"
 
# Step 2: Create source
curl -X POST "https://trigglio.com/api/v1/minds/{mindId}/knowledge" \
  -H "Authorization: Bearer tr_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Guide",
    "type": "document",
    "fileUrl": "https://cdn.trigglio.com/minds/.../guide.pdf",
    "fileName": "guide.pdf",
    "mimeType": "application/pdf",
    "fileSize": 1048576
  }'
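The same two steps can be sketched in JavaScript. This is a minimal sketch under stated assumptions: the function name uploadAndCreateSource and the injected fetchImpl parameter are hypothetical, and the upload response is assumed to be JSON with a top-level fileUrl field, which the text above implies but does not spell out.

```javascript
// Sketch of the two-step flow. `fetchImpl` is injected so the function can be
// exercised without network access; pass the global `fetch` in real use.
// Assumption: the upload response is JSON with a top-level `fileUrl` field.
async function uploadAndCreateSource(fetchImpl, { mindId, apiKey, form, source }) {
  const base = `https://trigglio.com/api/v1/minds/${mindId}/knowledge`;
  const auth = { Authorization: `Bearer ${apiKey}` };

  // Step 1: upload the file as multipart/form-data (field name "file")
  const uploadRes = await fetchImpl(`${base}/upload`, {
    method: 'POST',
    headers: auth,
    body: form,
  });
  if (!uploadRes.ok) throw new Error(`Upload failed: HTTP ${uploadRes.status}`);
  const { fileUrl } = await uploadRes.json();

  // Step 2: create the knowledge source that points at the uploaded file
  const createRes = await fetchImpl(base, {
    method: 'POST',
    headers: { ...auth, 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...source, fileUrl }),
  });
  if (!createRes.ok) throw new Error(`Create failed: HTTP ${createRes.status}`);
  return createRes.json();
}
```

Here form would be a FormData instance with the file appended under the key file, and source carries the remaining fields (name, type, fileName, mimeType, fileSize).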

Supported formats:

Format	MIME Type	Max Size
PDF	`application/pdf`	50 MB
DOCX	`application/vnd.openxmlformats-officedocument.wordprocessingml.document`	50 MB
TXT	`text/plain`	50 MB
Markdown	`text/markdown`	50 MB
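Checking a file against this table before uploading avoids a wasted transfer. The sketch below is illustrative: checkDocument and its return shape are hypothetical, and it assumes "50 MB" means 50 * 1024 * 1024 bytes, which matches the binary-sized fileSize values in the examples but is not stated outright.

```javascript
// MIME types and the 50 MB cap come from the table above.
// Assumption: "50 MB" means 50 * 1024 * 1024 bytes.
const DOCUMENT_MIME_TYPES = new Set([
  'application/pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'text/plain',
  'text/markdown',
]);
const MAX_DOCUMENT_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: returns { ok: true } or { ok: false, reason }.
function checkDocument(mimeType, fileSize) {
  if (!DOCUMENT_MIME_TYPES.has(mimeType)) {
    return { ok: false, reason: 'unsupported type' };
  }
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    return { ok: false, reason: 'invalid size' };
  }
  if (fileSize > MAX_DOCUMENT_BYTES) {
    return { ok: false, reason: 'too large' };
  }
  return { ok: true };
}
```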

YouTube Sources

Index YouTube video transcripts automatically.


{
  "name": "Product Demo Video",
  "type": "youtube",
  "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Supported URL formats:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
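To see how a video ID can be pulled out of each of these four shapes, consider the following sketch. The function name extractYouTubeId is hypothetical, and the 11-character ID check is an assumption about YouTube IDs; the API's own parser may accept or reject different inputs.

```javascript
// Returns the video ID for the four URL shapes listed above, or null.
// Assumption: IDs are 11 characters of [A-Za-z0-9_-].
function extractYouTubeId(rawUrl) {
  let u;
  try {
    u = new URL(rawUrl);
  } catch {
    return null; // not a parseable URL
  }
  const host = u.hostname.replace(/^www\./, '');
  let id = null;
  if (host === 'youtu.be') {
    id = u.pathname.slice(1); // "/VIDEO_ID" -> "VIDEO_ID"
  } else if (host === 'youtube.com') {
    if (u.pathname === '/watch') {
      id = u.searchParams.get('v');
    } else {
      const m = u.pathname.match(/^\/(?:embed|shorts)\/([^/]+)/);
      id = m ? m[1] : null;
    }
  }
  return id && /^[A-Za-z0-9_-]{11}$/.test(id) ? id : null;
}
```

A null result here corresponds roughly to the youtube_invalid_url error the API returns when it cannot extract an ID.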

Processing: Extracts captions if available. If no captions exist, downloads the audio and transcribes it using Groq Whisper. In that case, the status transitions through processing → transcribing → ready.

Podcast Sources

Upload and transcribe audio files.

Two-step process (same as documents):

Upload the audio file
Create a podcast knowledge source with the returned fileUrl


{
  "name": "Episode 42: AI in Production",
  "type": "podcast",
  "fileUrl": "https://cdn.trigglio.com/minds/.../episode-42.mp3",
  "fileName": "episode-42.mp3",
  "mimeType": "audio/mpeg",
  "fileSize": 52428800
}

Supported audio formats: MP3, WAV, M4A, OGG (max 500 MB)

Processing Pipeline

Every knowledge source goes through this pipeline:

Extraction — Content is fetched/parsed from the source
Moderation — Content is checked for safety
Chunking — Text is split into ~800 token chunks with 100 token overlap
Embedding — Each chunk is converted to a vector embedding and stored

For audio sources (YouTube without captions, podcasts), there’s an additional transcription step using Groq Whisper before chunking.
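The chunking step can be pictured as a sliding window. The sketch below is an illustration only, not the service's implementation: chunkTokens is a hypothetical name, and it operates on an array of tokens supplied by the caller because the tokenizer used by the pipeline is not documented.

```javascript
// Sliding-window chunker: each chunk holds up to `size` tokens and shares
// `overlap` tokens with the previous chunk.
function chunkTokens(tokens, size = 800, overlap = 100) {
  if (overlap >= size) throw new RangeError('overlap must be smaller than size');
  const step = size - overlap; // 700 with the defaults
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```

With the defaults, a 2,000-token text yields three chunks starting at tokens 0, 700, and 1400, so the last 100 tokens of each chunk reappear at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.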

Status Values

Status	Description
`pending`	Source created, waiting to be processed
`processing`	Content is being extracted and chunked
`transcribing`	Audio is being transcribed (YouTube/podcast)
`ready`	Processing complete, source is searchable
`failed`	Processing failed (check `error` field)

Polling for Status


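// maxAttempts bounds the loop so a stuck source cannot poll forever
async function waitForReady(mindId, sourceId, apiKey, maxAttempts = 120) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(
      `https://trigglio.com/api/v1/minds/${mindId}/knowledge/${sourceId}`,
      { headers: { 'Authorization': `Bearer ${apiKey}` } }
    );
    // Surface HTTP errors instead of reading `source` from an error body
    if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
    const { source } = await res.json();

    if (source.status === 'ready') return source;
    if (source.status === 'failed') throw new Error(source.error);

    // Respect the Retry-After header (seconds); fall back to 5 seconds
    const retryAfter = Number.parseInt(res.headers.get('Retry-After') ?? '', 10);
    const delaySeconds = Number.isNaN(retryAfter) ? 5 : retryAfter;
    await new Promise(r => setTimeout(r, delaySeconds * 1000));
  }
  throw new Error(`Source ${sourceId} not ready after ${maxAttempts} attempts`);
}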
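Under RFC 9110, a Retry-After header may carry an HTTP-date instead of a number of seconds, and a plain integer parse turns a date into NaN. Whether this API ever sends the date form is not documented, so the following is a defensive sketch; retryDelayMs is a hypothetical helper name.

```javascript
// Converts a Retry-After header value into a delay in milliseconds.
// Accepts delay-seconds ("5") or an HTTP-date; falls back when the header
// is absent or unparseable.
function retryDelayMs(headerValue, fallbackMs = 5000, now = Date.now()) {
  if (headerValue == null || headerValue.trim() === '') return fallbackMs;
  const value = headerValue.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  if (Number.isNaN(date)) return fallbackMs;
  return Math.max(0, date - now); // never wait a negative amount
}
```

In a polling loop, the result can be passed straight to setTimeout in place of the parsed integer.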

Webhook Callbacks

Instead of polling, provide a callbackUrl when creating a knowledge source. Trigglio will POST to your URL when processing finishes.

Setup

Include callbackUrl in your create request:


{
  "name": "My Source",
  "type": "url",
  "url": "https://example.com",
  "callbackUrl": "https://your-server.com/webhooks/trigglio"
}

Callback URLs must use HTTPS and be publicly accessible. Private IP addresses are rejected.
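A quick local check can catch a malformed callbackUrl before the create request is sent. This is a sketch with a hypothetical helper name, checkCallbackUrl; it covers only the HTTPS requirement and the 2,048-character limit, and cannot prove that the URL is publicly reachable.

```javascript
// Local check for the documented callback rules: HTTPS only, at most
// 2,048 characters. The API remains the authority on what it accepts.
const MAX_CALLBACK_URL_LENGTH = 2048;

function checkCallbackUrl(rawUrl) {
  if (typeof rawUrl !== 'string') return { ok: false, reason: 'not a string' };
  if (rawUrl.length > MAX_CALLBACK_URL_LENGTH) return { ok: false, reason: 'too long' };
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return { ok: false, reason: 'unparseable' };
  }
  if (parsed.protocol !== 'https:') return { ok: false, reason: 'must use https' };
  return { ok: true };
}
```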

Payload


{
  "event": "knowledge.processed",
  "sourceId": "src_abc123",
  "mindId": "mind_xyz789",
  "name": "My Source",
  "type": "url",
  "status": "ready",
  "chunkCount": 42,
  "error": null,
  "timestamp": "2024-06-15T10:30:00.000Z"
}

For failures, event will be knowledge.failed and error will contain the error message.
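A receiver typically branches on the event field. The sketch below shows one way to do that; routeCallback and the { action, ... } return shape are hypothetical and chosen purely for illustration, while the field names come from the payload above.

```javascript
// Routes a parsed callback payload by its `event` field.
// The returned action names are illustrative, not part of the API.
function routeCallback(payload) {
  switch (payload.event) {
    case 'knowledge.processed':
      return {
        action: 'mark_ready',
        sourceId: payload.sourceId,
        chunkCount: payload.chunkCount,
      };
    case 'knowledge.failed':
      return {
        action: 'mark_failed',
        sourceId: payload.sourceId,
        error: payload.error,
      };
    default:
      // Unknown events are ignored rather than treated as errors
      return { action: 'ignore', sourceId: payload.sourceId };
  }
}
```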

Signature Verification

Each callback includes an X-Trigglio-Signature header containing an HMAC-SHA256 signature. Verify it to ensure the request is authentic:


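import { createHmac, timingSafeEqual } from 'crypto';

// `rawBody` must be the exact request body as received (string or Buffer),
// not a re-serialized object: JSON.stringify may not reproduce the signed bytes.
function verifySignature(rawBody, signature, secret) {
  const expected = createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signature ?? '');
  // Constant-time comparison; timingSafeEqual throws on unequal lengths
  return a.length === b.length && timingSafeEqual(a, b);
}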

Retry Behavior

Callbacks are fire-and-forget with a 10-second timeout. If your server doesn’t respond in time, the callback is not retried. Use status polling as a fallback.
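Because a missed callback is never resent, it helps to remember which sources are still awaiting one and poll the overdue ones. The following is a minimal sketch of that bookkeeping; sourcesToPoll and the graceMs threshold are hypothetical and should be tuned to your typical processing times.

```javascript
// `pending` maps sourceId -> creation time in milliseconds for sources whose
// callback has not arrived yet. Returns the IDs that have waited at least
// `graceMs` and should be checked by polling instead.
function sourcesToPoll(pending, nowMs, graceMs = 60000) {
  const overdue = [];
  for (const [sourceId, createdAtMs] of pending) {
    if (nowMs - createdAtMs >= graceMs) overdue.push(sourceId);
  }
  return overdue;
}
```

Add an entry to pending when a source is created, delete it when its callback arrives, and run sourcesToPoll on a timer to pick up anything that slipped through.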

Limits

Resource	Limit
Knowledge sources per Mind	Depends on plan (typically 10-50)
Document file size	50 MB
Audio file size	500 MB
Callback URL length	2,048 characters

Error Codes

Code	Description
`validation_error`	Missing or invalid field
`source_limit_reached`	Too many sources on this Mind
`storage_limit_exceeded`	Account storage quota exceeded
`duplicate_source`	URL or video already added
`invalid_url`	URL failed SSRF or format validation
`youtube_invalid_url`	Could not extract YouTube video ID
`unsupported_document_type`	File format not supported
`unsupported_audio_type`	Audio format not supported
`invalid_status`	Cannot reprocess from current status