TaskAGI Document & Image Processing Workflow Setup Guide
What This Agent Does
This powerful automation workflow transforms how you manage documents and images shared via Telegram. It automatically captures files and photos, extracts text using advanced AI vision technology, stores everything securely in Google Drive, and maintains a searchable index in Airtable—all triggered by a simple Telegram message. Whether you're processing receipts, contracts, whiteboards, or important documents, this agent handles the entire pipeline with zero manual intervention.
Key benefits include:
-
Instant text extraction from any document or image using Claude Vision AI
-
Automatic cloud storage organization with Google Drive integration
-
Searchable database of all processed items via Airtable indexing
-
Time savings of hours per week on manual data entry and file organization
-
Flexible search capability to retrieve previously processed documents directly from Telegram
Target use cases:
- Receipt and invoice processing for expense tracking
- Document digitization and archival
- Whiteboard and handwritten note capture
- Contract and legal document management
- Research paper and article collection
- Team knowledge base building
Who Is It For
This workflow is ideal for:
-
Professionals managing high volumes of documents and receipts
-
Teams needing centralized document storage with searchability
-
Researchers collecting and organizing reference materials
-
Businesses automating document intake and processing
-
Anyone who wants to eliminate manual file organization and data entry
No coding experience required—just basic familiarity with Telegram and cloud services.
Required Integrations
Telegram
Why it's needed: Telegram serves as your primary interface for triggering the workflow and receiving results. Users send documents or photos through Telegram, and the bot responds with confirmation messages and search results.
Setup steps:
-
Create a Telegram Bot
- Open Telegram and search for
@BotFather
- Send the command
/newbot
- Follow the prompts to name your bot (e.g., "DocumentProcessorBot")
- Copy the API Token provided (format:
123456789:ABCdefGHIjklmnoPQRstuvWXYZ)
-
Enable Webhook in TaskAGI
- In TaskAGI, navigate to your workflow
- Locate the "Telegram Trigger" node
- Paste your API Token in the
api_token field
- Copy the Webhook URL generated by TaskAGI
- Return to BotFather and send
/setwebhook followed by your webhook URL
-
Test Bot Connectivity
- Send a test message to your bot in Telegram
- Verify the workflow triggers successfully
Configuration in TaskAGI:
- Set
parse_mode to HTML for formatted message responses
- Enable
allow_user_ids if you want to restrict access to specific users
- Store your bot token securely—never share it publicly
Google Drive
Why it's needed: Google Drive provides secure cloud storage for all processed documents and images, ensuring they're backed up, organized, and accessible from anywhere.
Setup steps:
-
Create a Google Cloud Project
- Visit Google Cloud Console
- Click "Create Project" and name it (e.g., "TaskAGI Document Processor")
- Wait for the project to initialize
-
Enable Google Drive API
- In the Cloud Console, search for "Google Drive API"
- Click "Enable"
- Go to "Credentials" in the left sidebar
- Click "Create Credentials" → "Service Account"
- Fill in the service account name and click "Create and Continue"
- Grant the service account "Editor" role for Google Drive access
- Click "Continue" and then "Done"
-
Generate and Download Service Account Key
- In the Credentials page, find your service account
- Click on it, then go to the "Keys" tab
- Click "Add Key" → "Create new key"
- Choose JSON format and download the file
-
Keep this file secure—it contains your credentials
-
Configure in TaskAGI
- In the "Upload Document to Drive" and "Upload Photo to Drive" nodes
- Paste the entire JSON key content in the
credentials field
- Set
folder_id to your target Google Drive folder ID (found in the folder's URL)
- Specify
file_name using dynamic data: ${message.file_name} or ${message.photo_name}
Pro tip: Create a dedicated folder in Google Drive for this workflow (e.g., "TaskAGI Processed Documents") and use its ID for organization.
Anthropic (Claude Vision)
Why it's needed: Claude's advanced vision capabilities extract and transcribe all visible text from documents and images with exceptional accuracy, enabling full-text search and data extraction.
Setup steps:
-
Create Anthropic Account
- Visit Anthropic Console
- Sign up or log in with your account
- Verify your email address
-
Generate API Key
- Navigate to "API Keys" in your account settings
- Click "Create Key"
- Copy the generated key (format:
sk-ant-...)
- Store it securely—never commit it to version control
-
Set Up Billing
- Add a payment method to your Anthropic account
- Set usage limits if desired to control costs
- Note: Vision analysis costs approximately $0.003 per image
-
Configure in TaskAGI
- In both "Claude Vision OCR" nodes (Document and Photo)
- Paste your API key in the
api_key field
- Verify the model is set to
claude-sonnet-4-5-20250929 (latest vision model)
- The prompt is pre-configured: "Extract and transcribe all text visible in this image"
- Leave
max_tokens at 2048 for comprehensive text extraction
Airtable
Why it's needed: Airtable creates a searchable, queryable database of all processed documents, enabling users to find previously processed items directly through Telegram.
Setup steps:
-
Create Airtable Workspace
- Visit Airtable and sign up
- Create a new workspace for document processing
- Create a new base named "Document Index"
-
Design Your Table
- Create a table with these fields:
-
Document Name (Single line text)
-
File Type (Single select: Document, Photo)
-
Extracted Text (Long text)
-
Drive URL (URL)
-
Processed Date (Date)
-
Search Tags (Multiple select)
-
Generate API Token
- Go to Airtable Developer Hub
- Click "Generate Token"
- Grant permissions for
data.records:read and data.records:write
- Copy your personal access token
-
Configure in TaskAGI
- In the "Index in Airtable" node, paste your token in
api_token
- Set
base_id to your base ID (found in Airtable API documentation)
- Set
table_name to "Document Index"
- Map fields:
name, type, extracted_text, drive_url, date
- In the "Search in Airtable" node, use the same credentials with
view_name set to "Grid view"
Configuration Steps
Trigger Setup: Telegram Webhook
The workflow begins when a user sends a message to your Telegram bot containing a file or photo.
-
Node: Telegram Trigger
-
Configuration: API token and webhook URL (see Telegram integration section)
-
Output: Message object containing file metadata and user information
Decision Node: Check if File or Image
This node determines the processing path based on message content.
-
Node: Check if File or Image (core.if_condition)
-
Condition:
${message.document !== null || message.photo !== null}
-
True path: Continue to file/photo extraction
-
False path: Jump to search command check (handles text-only messages)
File Type Detection: Document vs. Photo
Two parallel extraction nodes handle different media types:
-
Extract File Metadata (documents): Captures
file_id, file_name, file_size, mime_type
-
Extract Photo Metadata (photos): Captures
file_id, photo_size, width, height
Both nodes feed into download operations that retrieve the actual files from Telegram's servers.
Download and Upload Operations
-
Download nodes use
telegram.getFile to retrieve files from Telegram
-
Upload nodes use
googledrive.uploadFile to store files in your Drive folder
-
Dynamic naming: Files are named using
${extracted_metadata.file_name} to maintain original names
OCR Processing: Text Extraction
The workflow includes intelligent OCR eligibility checking:
-
Check if OCR Eligible determines if the file type supports text extraction
-
Claude Vision OCR nodes process eligible files through Anthropic's API
-
Extracted text is formatted into readable, searchable content
-
Non-eligible files skip OCR and proceed directly to indexing
Data Merging and Indexing
Multiple data paths converge at merge nodes:
-
Merge Document Paths combines OCR results with file metadata
-
Merge All Paths unifies document and photo processing streams
-
Index in Airtable creates a searchable record with all extracted information
Search Functionality
When users send text-only messages:
-
Check Search Command detects search intent (keywords like "search", "find", "lookup")
-
Extract Search Query isolates the search terms
-
Search in Airtable queries the database for matching records
-
Format Search Results creates readable Telegram messages
-
Send Search Results returns matches to the user
Testing Your Agent
Pre-Launch Checklist
Before going live, verify each integration:
-
Telegram: Send a test message to your bot—verify it's received
-
Google Drive: Check that a test file appears in your designated folder
-
Anthropic: Confirm API key is valid and billing is active
-
Airtable: Verify the base and table exist with correct field names
Test Execution Steps
Test 1: Document Upload
- Send a PDF or document file to your Telegram bot
- Wait 10-15 seconds for processing
- Verify success message is received
- Check Google Drive for the uploaded file
- Check Airtable for a new record with extracted text
Test 2: Photo Upload
- Send a photo (screenshot, whiteboard, receipt) to the bot
- Verify the photo appears in Google Drive
- Confirm Airtable record contains extracted text from the image
- Check that extracted text is accurate and complete
Test 3: Search Functionality
- Send message: "search invoice"
- Verify the bot returns matching records from Airtable
- Confirm results include document names and extracted text snippets
Success Indicators
✅ Workflow is working correctly when:
- Files appear in Google Drive within 30 seconds
- Airtable records are created with complete metadata
- Extracted text is accurate and searchable
- Search queries return relevant results
- Telegram messages confirm each step
Troubleshooting tips:
- Check TaskAGI logs for specific error messages
- Verify all API tokens are current and have proper permissions
- Ensure Google Drive folder ID is correct
- Confirm Airtable base and table names match exactly
Congratulations! Your document processing automation is now live and ready to save you hours of manual work.