Step by Step

Setup Tutorial

mission-briefing.md

TaskAGI Document & Image Processing Workflow Setup Guide

What This Agent Does

This powerful automation workflow transforms how you manage documents and images shared via Telegram. It automatically captures files and photos, extracts text using advanced AI vision technology, stores everything securely in Google Drive, and maintains a searchable index in Airtable—all triggered by a simple Telegram message. Whether you're processing receipts, contracts, whiteboards, or important documents, this agent handles the entire pipeline with zero manual intervention.

Key benefits include:

Instant text extraction from any document or image using Claude Vision AI
Automatic cloud storage organization with Google Drive integration
Searchable database of all processed items via Airtable indexing
Time savings of hours per week on manual data entry and file organization
Flexible search capability to retrieve previously processed documents directly from Telegram

Target use cases:

Receipt and invoice processing for expense tracking
Document digitization and archival
Whiteboard and handwritten note capture
Contract and legal document management
Research paper and article collection
Team knowledge base building

Who Is It For

This workflow is ideal for:

Professionals managing high volumes of documents and receipts
Teams needing centralized document storage with searchability
Researchers collecting and organizing reference materials
Businesses automating document intake and processing
Anyone who wants to eliminate manual file organization and data entry

No coding experience required—just basic familiarity with Telegram and cloud services.

Required Integrations

Why it's needed: Telegram serves as your primary interface for triggering the workflow and receiving results. Users send documents or photos through Telegram, and the bot responds with confirmation messages and search results.

Setup steps:

Create a Telegram Bot
- Open Telegram and search for @BotFather
- Send the command /newbot
- Follow the prompts to name your bot (e.g., "DocumentProcessorBot")
- Copy the API Token provided (format: 123456789:ABCdefGHIjklmnoPQRstuvWXYZ)
Enable Webhook in TaskAGI
- In TaskAGI, navigate to your workflow
- Locate the "Telegram Trigger" node
- Paste your API Token in the api_token field
- Copy the Webhook URL generated by TaskAGI
- Return to BotFather and send /setwebhook followed by your webhook URL
Test Bot Connectivity
- Send a test message to your bot in Telegram
- Verify the workflow triggers successfully

Configuration in TaskAGI:

Set parse_mode to HTML for formatted message responses
Enable allow_user_ids if you want to restrict access to specific users
Store your bot token securely—never share it publicly

Google Drive

Why it's needed: Google Drive provides secure cloud storage for all processed documents and images, ensuring they're backed up, organized, and accessible from anywhere.

Setup steps:

Create a Google Cloud Project
- Visit Google Cloud Console
- Click "Create Project" and name it (e.g., "TaskAGI Document Processor")
- Wait for the project to initialize
Enable Google Drive API
- In the Cloud Console, search for "Google Drive API"
- Click "Enable"
- Go to "Credentials" in the left sidebar
- Click "Create Credentials" → "Service Account"
- Fill in the service account name and click "Create and Continue"
- Grant the service account "Editor" role for Google Drive access
- Click "Continue" and then "Done"
Generate and Download Service Account Key
- In the Credentials page, find your service account
- Click on it, then go to the "Keys" tab
- Click "Add Key" → "Create new key"
- Choose JSON format and download the file
- Keep this file secure—it contains your credentials
Configure in TaskAGI
- In the "Upload Document to Drive" and "Upload Photo to Drive" nodes
- Paste the entire JSON key content in the credentials field
- Set folder_id to your target Google Drive folder ID (found in the folder's URL)
- Specify file_name using dynamic data: ${message.file_name} or ${message.photo_name}

Pro tip: Create a dedicated folder in Google Drive for this workflow (e.g., "TaskAGI Processed Documents") and use its ID for organization.

Anthropic (Claude Vision)

Why it's needed: Claude's advanced vision capabilities extract and transcribe all visible text from documents and images with exceptional accuracy, enabling full-text search and data extraction.

Setup steps:

Create Anthropic Account
- Visit Anthropic Console
- Sign up or log in with your account
- Verify your email address
Generate API Key
- Navigate to "API Keys" in your account settings
- Click "Create Key"
- Copy the generated key (format: sk-ant-...)
- Store it securely—never commit it to version control
Set Up Billing
- Add a payment method to your Anthropic account
- Set usage limits if desired to control costs
- Note: Vision analysis costs approximately $0.003 per image
Configure in TaskAGI
- In both "Claude Vision OCR" nodes (Document and Photo)
- Paste your API key in the api_key field
- Verify the model is set to claude-sonnet-4-5-20250929 (latest vision model)
- The prompt is pre-configured: "Extract and transcribe all text visible in this image"
- Leave max_tokens at 2048 for comprehensive text extraction

Airtable

Why it's needed: Airtable creates a searchable, queryable database of all processed documents, enabling users to find previously processed items directly through Telegram.

Setup steps:

Create Airtable Workspace
- Visit Airtable and sign up
- Create a new workspace for document processing
- Create a new base named "Document Index"
Design Your Table
- Create a table with these fields:
  - Document Name (Single line text)
  - File Type (Single select: Document, Photo)
  - Extracted Text (Long text)
  - Drive URL (URL)
  - Processed Date (Date)
  - Search Tags (Multiple select)
Generate API Token
- Go to Airtable Developer Hub
- Click "Generate Token"
- Grant permissions for data.records:read and data.records:write
- Copy your personal access token
Configure in TaskAGI
- In the "Index in Airtable" node, paste your token in api_token
- Set base_id to your base ID (found in Airtable API documentation)
- Set table_name to "Document Index"
- Map fields: name, type, extracted_text, drive_url, date
- In the "Search in Airtable" node, use the same credentials with view_name set to "Grid view"

Configuration Steps

Trigger Setup: Telegram Webhook

The workflow begins when a user sends a message to your Telegram bot containing a file or photo.

Node: Telegram Trigger
Configuration: API token and webhook URL (see Telegram integration section)
Output: Message object containing file metadata and user information

Decision Node: Check if File or Image

This node determines the processing path based on message content.

Node: Check if File or Image (core.if_condition)
Condition: ${message.document !== null || message.photo !== null}
True path: Continue to file/photo extraction
False path: Jump to search command check (handles text-only messages)

File Type Detection: Document vs. Photo

Two parallel extraction nodes handle different media types:

Extract File Metadata (documents): Captures file_id, file_name, file_size, mime_type
Extract Photo Metadata (photos): Captures file_id, photo_size, width, height

Both nodes feed into download operations that retrieve the actual files from Telegram's servers.

Download and Upload Operations

Download nodes use telegram.getFile to retrieve files from Telegram
Upload nodes use googledrive.uploadFile to store files in your Drive folder
Dynamic naming: Files are named using ${extracted_metadata.file_name} to maintain original names

OCR Processing: Text Extraction

The workflow includes intelligent OCR eligibility checking:

Check if OCR Eligible determines if the file type supports text extraction
Claude Vision OCR nodes process eligible files through Anthropic's API
Extracted text is formatted into readable, searchable content
Non-eligible files skip OCR and proceed directly to indexing

Data Merging and Indexing

Multiple data paths converge at merge nodes:

Merge Document Paths combines OCR results with file metadata
Merge All Paths unifies document and photo processing streams
Index in Airtable creates a searchable record with all extracted information

Search Functionality

When users send text-only messages:

Check Search Command detects search intent (keywords like "search", "find", "lookup")
Extract Search Query isolates the search terms
Search in Airtable queries the database for matching records
Format Search Results creates readable Telegram messages
Send Search Results returns matches to the user

Testing Your Agent

Pre-Launch Checklist

Before going live, verify each integration:

Telegram: Send a test message to your bot—verify it's received
Google Drive: Check that a test file appears in your designated folder
Anthropic: Confirm API key is valid and billing is active
Airtable: Verify the base and table exist with correct field names

Test Execution Steps

Test 1: Document Upload

Send a PDF or document file to your Telegram bot
Wait 10-15 seconds for processing
Verify success message is received
Check Google Drive for the uploaded file
Check Airtable for a new record with extracted text

Test 2: Photo Upload

Send a photo (screenshot, whiteboard, receipt) to the bot
Verify the photo appears in Google Drive
Confirm Airtable record contains extracted text from the image
Check that extracted text is accurate and complete

Test 3: Search Functionality

Send message: "search invoice"
Verify the bot returns matching records from Airtable
Confirm results include document names and extracted text snippets

Success Indicators

✅ Workflow is working correctly when:

Files appear in Google Drive within 30 seconds
Airtable records are created with complete metadata
Extracted text is accurate and searchable
Search queries return relevant results
Telegram messages confirm each step

Troubleshooting tips:

Check TaskAGI logs for specific error messages
Verify all API tokens are current and have proper permissions
Ensure Google Drive folder ID is correct
Confirm Airtable base and table names match exactly

Congratulations! Your document processing automation is now live and ready to save you hours of manual work.

Deploy This Agent Now

Telegram Document Archive with AI OCR

Need custom configuration?

INTEGRATED_MODULES

Setup Tutorial

TaskAGI Document & Image Processing Workflow Setup Guide

What This Agent Does

Who Is It For

Required Integrations

Telegram

Google Drive

Anthropic (Claude Vision)

Airtable

Configuration Steps

Trigger Setup: Telegram Webhook

Decision Node: Check if File or Image

File Type Detection: Document vs. Photo

Download and Upload Operations

OCR Processing: Text Extraction

Data Merging and Indexing

Search Functionality

Testing Your Agent

Pre-Launch Checklist

Test Execution Steps

Success Indicators

Related Agents

Telegram Expense Tracker AI Agent

Telegram News Article RAG Chat Bot

Telegram UGC Video Generator