RAG simplified with Truto



Introduction
Building a RAG (retrieval-augmented generation) system that indexes a URL or an uploaded file is straightforward. But does that approach really give you an AI chatbot that can answer questions using internal data from Confluence pages, Jira tickets, Notion pages, or other SaaS tools?
I assume the answer is no, so you may want to explore a RAG provider that comes with connectors. Here are a few key questions to ask when evaluating one:
- Do they support all the connectors your customers need? 
- Can they quickly add new connectors before your lead loses momentum? 
- Do they offer the flexibility to choose your own embedding model and vector database? 
With Truto, building native integrations is effortless. Now we've also solved your RAG challenges, so you can sync data from virtually any SaaS tool into the vector database of your choice. (Fun fact: we already support 350+ integrations, and beyond syncing data to your vector database, you can also write data back to those tools based on user actions.)
Challenge #1 - File or Page Selection
You wouldn't want your AI model, or any third-party provider, accessing sensitive internal data, so it's best to restrict syncing to specific files or pages. Apps like Google Drive, SharePoint, and Box provide native file pickers, but what about integrations that don't?
Truto solves this with RapidForm, which lets users select exactly which files and pages to sync while connecting their account. We also support native file pickers for Google Drive and SharePoint (with more on the way!) for a seamless user experience.
Check out the Truto Change Log for a sneak peek at the native file pickers.
Challenge #2 - Content Retrieval
Every API has its own quirks in how it returns content. The Confluence API returns a page's content as a single record, the Notion API splits content into paginated blocks, and the SharePoint API returns the file itself rather than its extracted text.
Notion
Handling these API quirks is what we do best, so you don't have to. Take Notion: its API returns page content as fragmented blocks, which can be nested and are paginated. RapidBridge (also known as Sync Jobs) solves this by spooling the blocks in memory and merging them into a single record.
The Sync Job below shows how this works for Notion: it lists pages, recursively retrieves their content blocks, strips each block's raw remote_data, spools the results, and merges everything into one record per page.
[
  {
    "type": "request",
    "name": "list-pages",
    "resource": "knowledge-base/pages",
    "method": "list",
    "integrated_account_id": "{{args.integrated_account_id}}"
  },
  {
    "type": "add_context",
    "name": "add-page-name",
    "depends_on": "list-pages",
    "config": {
      "expression": "{ \"page_id\": resources.`knowledge-base`.pages.id, \"page_title\": resources.`knowledge-base`.pages.title, \"page_url\": resources.`knowledge-base`.pages.urls[type=\"view\"].url }"
    }
  },
  {
    "type": "request",
    "name": "get-page-content",
    "resource": "knowledge-base/page-content",
    "method": "list",
    "depends_on": "list-pages",
    "query": {
      "page": {
        "id": "{{resources.knowledge-base.pages.id}}"
      }
    },
    "recurse": {
      "if": "{{resources.knowledge-base.page-content.has_children:bool}}",
      "config": {
        "query": {
          "page_content_id": "{{resources.knowledge-base.page-content.id}}"
        }
      }
    },
    "integrated_account_id": "{{args.integrated_account_id}}"
  },
  {
    "name": "remove-remote-data",
    "type": "transform",
    "config": {
      "expression": "resources.`knowledge-base`.`page-content`.$sift(function($v, $k) { $k != 'remote_data' })"
    },
    "depends_on": "get-page-content"
  },
  {
    "name": "all-page-content",
    "type": "spool",
    "depends_on": "remove-remote-data"
  },
  {
    "name": "combine-page-content",
    "type": "transform",
    "config": {
      "expression": "{ \"file_id\": page_id, \"file_name\": page_title, \"content\": \"# \" & page_title & \"\n\n\" & $reduce($sortNodes(resources.`knowledge-base`.`page-content`, \"id\", \"parent.id\"), function($acc, $v) { $acc & $v.body.content }, \"\" ) }"
    },
    "depends_on": "all-page-content"
  }
]
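To make the spool-and-merge step concrete, here is a minimal Python sketch of the same idea outside Truto. It follows pagination cursors, recurses into blocks that have children, and concatenates block text under the page title, as combine-page-content does. The fetch_children callable and the block fields (id, has_children, text, next_cursor) are hypothetical stand-ins rather than Truto's or Notion's exact API, and the depth-first walk is our assumption of the order $sortNodes produces.

```python
from typing import Callable, Optional

# Hypothetical fetcher: (parent_id, cursor) -> {"results": [block, ...], "next_cursor": str or None},
# where each block looks like {"id": str, "has_children": bool, "text": str}.
Fetcher = Callable[[str, Optional[str]], dict]


def collect_blocks(parent_id: str, fetch_children: Fetcher) -> list[dict]:
    """Fetch all child blocks of parent_id across result pages, recursing depth-first."""
    blocks: list[dict] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_children(parent_id, cursor)
        for block in page["results"]:
            blocks.append(block)
            if block.get("has_children"):
                # Nested blocks follow their parent to preserve reading order.
                blocks.extend(collect_blocks(block["id"], fetch_children))
        cursor = page.get("next_cursor")
        if cursor is None:
            return blocks


def merge_page(page_id: str, title: str, fetch_children: Fetcher) -> dict:
    """Spool every block in memory, then merge them into one record per page."""
    blocks = collect_blocks(page_id, fetch_children)
    content = f"# {title}\n\n" + "".join(block["text"] for block in blocks)
    return {"file_id": page_id, "file_name": title, "content": content}


# Fake in-memory data: page "p1" spans two result pages, and block "b2" has a child.
FAKE = {
    ("p1", None): {"results": [{"id": "b1", "has_children": False, "text": "Intro. "}],
                   "next_cursor": "c2"},
    ("p1", "c2"): {"results": [{"id": "b2", "has_children": True, "text": "Setup: "}],
                   "next_cursor": None},
    ("b2", None): {"results": [{"id": "b3", "has_children": False, "text": "run the installer."}],
                   "next_cursor": None},
}

record = merge_page("p1", "Onboarding", lambda pid, cur: FAKE[(pid, cur)])
print(repr(record["content"]))  # '# Onboarding\n\nIntro. Setup: run the installer.'
```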
SharePoint
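SharePoint works differently: we need to download the file and parse it to extract its content. The job below tees the download stream into two copies, so one can be parsed now and the other uploaded to your object storage later.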
[
  {
    "type": "request",
    "name": "list-files",
    "resource": "file-storage/drive-items",
    "method": "get",
    "integrated_account_id": "{{args.integrated_account_id}}",
    "id": "{{drive_items.id}}",
    "query": {
      "drive": {
        "id": "{{drive_items.parentReference.driveId}}"
      },
      "workspace": {
        "id": "{{drive_items.parentReference.sharepointIds.siteId}}"
      }
    },
    "loop_on": "drive_items"
  },
  {
    "type": "add_context",
    "name": "add-file-name",
    "depends_on": "list-files",
    "config": {
      "expression": "{'file_name': resources.`file-storage`.`drive-items`.name, 'file_id': resources.`file-storage`.`drive-items`.id, 'file_type': resources.`file-storage`.`drive-items`.mime_type, 'file_size': resources.`file-storage`.`drive-items`.size, 'file_url': resources.`file-storage`.`drive-items`.urls[type=\"self\"].url, 'aws_file_path': $join([tenant_id,resources.`file-storage`.`drive-items`.workspace.id, resources.`file-storage`.`drive-items`.drive.id],\"/\") }"
    }
  },
  {
    "type": "request",
    "name": "download-file",
    "resource": "file-storage/drive-items",
    "method": "download",
    "depends_on": "list-files",
    "query": "{'file_url': resources.`file-storage`.`drive-items`.urls[type = 'download'].url, 'truto_response_format': 'stream'}",
    "integrated_account_id": "{{args.integrated_account_id}}"
  },
  {
    "name": "tee-stream",
    "type": "transform",
    "config": {
      "expression": "{ \"file_streams\": $teeStream(resources.`file-storage`.`drive-items`) }"
    },
    "depends_on": "download-file"
  },
  {
    "name": "transform-file-content",
    "type": "transform",
    "config": {
      "expression": "{ 'file_content': $parseDocument(resources.`file-storage`.`drive-items`[0].file_streams[0], file_type)}"
    },
    "depends_on": "tee-stream"
  }
]
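Why tee the stream? A streamed download can be read only once, but this job needs the bytes twice: for $parseDocument now and for the upload to object storage later. The Python sketch below shows the same idea with itertools.tee; fake_download, parse_text, and upload are hypothetical placeholders, and parse_text only decodes UTF-8, whereas real PDF or Office files need a proper document parser.

```python
import itertools
from typing import Iterator


def fake_download() -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed file download (yields byte chunks)."""
    yield b"Quarterly "
    yield b"report: "
    yield b"revenue up."


def parse_text(chunks: Iterator[bytes]) -> str:
    """Placeholder parser: treats the file as UTF-8 plain text."""
    return b"".join(chunks).decode("utf-8")


def upload(chunks: Iterator[bytes]) -> int:
    """Placeholder uploader: consumes the stream and returns the number of bytes 'written'."""
    return sum(len(chunk) for chunk in chunks)


# Split one single-use stream into two independent iterators.
parse_stream, upload_stream = itertools.tee(fake_download(), 2)

file_content = parse_text(parse_stream)  # plays the role of file_streams[0] -> $parseDocument
bytes_uploaded = upload(upload_stream)   # plays the role of file_streams.1 -> uploadObject
print(file_content, bytes_uploaded)
```

One design caveat: itertools.tee buffers whatever one copy has read and the other has not, so fully consuming one copy before the other keeps the whole file in memory.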
Challenge #3 - Embedding Generation
Once you have the content, the next challenge is generating embeddings. Can you embed the entire content in one go? Usually not: embedding models cap how much text they accept per input, and smaller, focused chunks make retrieval more precise. So you need to split the content into chunks first. (Yeah, we get the pain.)
The following Sync Job node chunks the file content using the RecursiveCharacterTextSplitter from LangChain's @langchain/textsplitters package, exposed here as $recursiveCharacterTextSplitter.
{
  "name": "file-content-chunk",
  "type": "add_context",
  "config": {
    "expression": "{'file_content_chunk': $recursiveCharacterTextSplitter(resources.`file-storage`.`drive-items`[0].file_content) }"
  },
  "depends_on": "transform-file-content"
}
Remember, you're not locked into this specific chunking method—you can easily swap it out for your own custom approach if needed.
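For intuition, or as a starting point for a custom splitter, here is a simplified Python sketch of recursive character splitting. It is not LangChain's implementation: it tries the coarsest separator first (blank lines, then newlines, then spaces), packs pieces into chunks of at most chunk_size characters, and recurses with finer separators only for pieces that are still too long. Unlike LangChain's splitter, it has no chunk overlap and drops separators at chunk boundaries; the default chunk_size of 100 is arbitrary.

```python
def split_recursive(text: str, chunk_size: int = 100,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring the coarsest separator that occurs in the text."""
    if len(text) <= chunk_size:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep and sep in text:
            finer = separators[i + 1:]
            break
    else:
        # No usable separator left: hard-cut into fixed-size character windows.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        if len(piece) > chunk_size:
            # Flush the chunk in progress, then split this piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(piece, chunk_size, finer))
        elif not current:
            current = piece
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            current += sep + piece  # piece still fits: extend the current chunk
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = ("First paragraph is short.\n\n"
        "Second paragraph is quite a bit longer than the first one, "
        "so it must be split further.")
chunks = split_recursive(text, chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```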
Next, we generate embeddings for these chunks. The Sync Job node below calls the Cohere Embed API with the embed-multilingual-light-v3.0 model:
{
  "name": "generate-embeddings",
  "type": "transform",
  "config": {
    "expression": "{ 'content_embeddings' : $generateEmbeddingsCohere({'model': 'embed-multilingual-light-v3.0', 'input_type': 'search_document', 'embedding_types': ['float'], 'texts': file_content_chunk }, args.cohere_api_key)}"
  },
  "depends_on": "file-content-chunk"
}
This approach is flexible: change the model attribute to use a different Cohere model, or replace the $generateEmbeddingsCohere() call to use another provider such as OpenAI. Note that input_type is set to search_document because these are documents being indexed; Cohere's v3 models expect search_query when you embed the user's question at query time.
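To see roughly what such a node sends, here is a Python sketch that builds an equivalent request to Cohere's v1 Embed endpoint using only the standard library, and pairs each returned vector with its chunk. We're assuming $generateEmbeddingsCohere() makes a similar call; the request is built but never sent, and sample_response is a hand-made stand-in with fake two-dimensional vectors, not real model output.

```python
import json
import urllib.request

COHERE_EMBED_URL = "https://api.cohere.com/v1/embed"


def build_embed_request(chunks: list[str], api_key: str,
                        model: str = "embed-multilingual-light-v3.0") -> urllib.request.Request:
    """Build (but do not send) a Cohere Embed request for document chunks."""
    body = {
        "model": model,
        "input_type": "search_document",  # use "search_query" when embedding user questions
        "embedding_types": ["float"],
        "texts": chunks,
    }
    return urllib.request.Request(
        COHERE_EMBED_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def pair_embeddings(response: dict) -> list[tuple[str, list[float]]]:
    """Zip each input text with its float vector from an embeddings-by-type response."""
    texts = response["texts"]
    vectors = response["embeddings"]["float"]
    if len(texts) != len(vectors):
        raise ValueError("texts and embeddings are misaligned")
    return list(zip(texts, vectors))


req = build_embed_request(["chunk one", "chunk two"], api_key="YOUR_COHERE_API_KEY")
# To actually call the API: json.load(urllib.request.urlopen(req))

# Hand-made response in the shape the qdrant-config node reads
# (content_embeddings.texts and content_embeddings.embeddings.float); the vectors are fake.
sample_response = {
    "texts": ["chunk one", "chunk two"],
    "embeddings": {"float": [[0.1, 0.2], [0.3, 0.4]]},
}
pairs = pair_embeddings(sample_response)
```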
Challenge #4 - Storing Embeddings
Now for the step where the magic happens: storing the embeddings in your own vector database. This is what makes retrieval possible, since a similarity search over these vectors finds the chunks most relevant to a user's question.
In the example below, the first node builds the payload according to Qdrant's upsert points request schema, with one point per chunk, a random UUID as its ID, and the chunk's embedding as its vector. The second node then calls the upsertPoints method of your Qdrant datastore, passing ordering=strong as a query parameter.
[
  {
    "name": "qdrant-config",
    "type": "transform",
    "config": {
      "expression": "{ 'qdrant_config': ($texts:= resources.`file-storage`.`drive-items`[0].content_embeddings.texts; $embeddings:= resources.`file-storage`.`drive-items`[0].content_embeddings.embeddings.float; $file_id:= file_id; $tenant_id:= $toNumber(tenant_id); $file_url:= file_url; $site_id:= site_id; $aws_file_path:= aws_file_path; {'query': {'ordering': 'strong'}, 'body': {'points': $texts#$i.{ 'id': $uuid(), 'payload': {'content': $, 'chunk_number': $i, 'organization_user_file_id': $file_id, 'user_id': $tenant_id, 'external_url': $file_url, 'updated_at': $now(), 'site_id': $site_id, 'saved_filename':$aws_file_path }, 'vector': $embeddings[$i]}}})}"
    },
    "depends_on": "generate-embeddings"
  },
  {
    "name": "qdrant-db",
    "type": "destination",
    "destination_type": "datastore",
    "method": "upsertPoints",
    "config": {
      "id": "{{args.qdrant_datastore_id}}",
      "config": {
        "query": "{{payload.records.0.qdrant_config.query:json}}",
        "body": "{{payload.records.0.qdrant_config.body:json}}"
      }
    },
    "run_if": "$exists(args.qdrant_datastore_id)",
    "resources_to_persist": [
      "qdrant-config"
    ]
  }
]
Each point's payload also stores the chunk text, the chunk number, the file's URL in the source app, and the path to the original file in your object storage. This metadata keeps your data organized, makes searches filterable (for example, by user_id or site_id), and lets your chatbot link answers back to their sources.
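If the JSONata is hard to follow, here is the same payload built in plain Python. This sketch mirrors the qdrant-config expression: one point per chunk, a random UUID as the point ID, the chunk's vector, and the metadata payload, plus the ordering=strong query string for Qdrant's upsert points endpoint (PUT /collections/{collection_name}/points). The collection name docs and all input values are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_upsert(texts: list[str], vectors: list[list[float]], *, file_id: str,
                 tenant_id: int, file_url: str, site_id: str, aws_file_path: str) -> dict:
    """Build a Qdrant upsert body with one point per chunk (mirrors qdrant-config)."""
    now = datetime.now(timezone.utc).isoformat()
    points = [
        {
            "id": str(uuid.uuid4()),  # Qdrant accepts UUID strings or unsigned integers as IDs
            "vector": vector,
            "payload": {
                "content": text,
                "chunk_number": i,
                "organization_user_file_id": file_id,
                "user_id": tenant_id,
                "external_url": file_url,
                "updated_at": now,
                "site_id": site_id,
                "saved_filename": aws_file_path,
            },
        }
        for i, (text, vector) in enumerate(zip(texts, vectors))
    ]
    return {"query": {"ordering": "strong"}, "body": {"points": points}}


upsert = build_upsert(
    ["chunk one", "chunk two"], [[0.1, 0.2], [0.3, 0.4]],
    file_id="file-123", tenant_id=42, file_url="https://example.com/file-123",
    site_id="site-1", aws_file_path="42/site-1/drive-9",
)
# Hypothetical collection name "docs"; upsert["body"] is the JSON request body.
path = "/collections/docs/points?" + urlencode(upsert["query"])
```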
Challenge #5 - Saving the Original File
You're not done yet. Keeping the source file is crucial, especially if you plan to display it as part of your chatbot's response.
Truto makes this easy by supporting file storage in Google Cloud Storage and any S3-compatible object storage. Once you configure your datastore in the Truto console, the Sync Job node below uploads the file to the specified path in your storage system. It uses the second copy of the teed download stream (file_streams.1), since the first copy was consumed by the parser.
{
  "name": "s3-storage",
  "type": "destination",
  "destination_type": "datastore",
  "method": "uploadObject",
  "config": {
    "id": "{{args.s3_datastore_id}}",
    "config": {
      "path": "{{aws_file_path}}",
      "file_name": "{{file_name}}",
      "content": "{{payload.records.0.file_streams.1}}"
    }
  },
  "run_if": "$exists(args.s3_datastore_id)",
  "resources_to_persist": [
    "tee-stream"
  ]
}
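Outside Truto, the equivalent upload with boto3 might look like the sketch below. It relies only on the S3 client's upload_fileobj(Fileobj, Bucket, Key) method, which streams a file-like object to S3. The bucket name is hypothetical, and joining aws_file_path and file_name with a slash is our assumption about how the path and file_name settings combine. A fake in-memory client stands in for boto3 so the example runs without AWS credentials.

```python
import io
from typing import BinaryIO, Protocol


class S3Like(Protocol):
    """The subset of boto3's S3 client that this sketch relies on."""
    def upload_fileobj(self, Fileobj: BinaryIO, Bucket: str, Key: str) -> None: ...


def object_key(aws_file_path: str, file_name: str) -> str:
    """Assumed key layout: <tenant_id>/<site_id>/<drive_id>/<file_name>."""
    return f"{aws_file_path.rstrip('/')}/{file_name}"


def save_original(client: S3Like, bucket: str, aws_file_path: str,
                  file_name: str, stream: BinaryIO) -> str:
    """Upload the original file stream and return the object key used."""
    key = object_key(aws_file_path, file_name)
    client.upload_fileobj(stream, bucket, key)
    return key


class FakeS3:
    """In-memory stand-in for boto3.client("s3")."""
    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], bytes] = {}

    def upload_fileobj(self, Fileobj: BinaryIO, Bucket: str, Key: str) -> None:
        self.objects[(Bucket, Key)] = Fileobj.read()


s3 = FakeS3()  # with real AWS credentials: s3 = boto3.client("s3")
key = save_original(s3, "my-rag-files", "42/site-1/drive-9", "handbook.pdf",
                    io.BytesIO(b"%PDF-1.7 sample bytes"))
```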
Getting Started
By now, you might be convinced that Truto is your ultimate solution for integrations and RAG. Truto isn't a single-purpose service; it covers every step of this workflow, from file selection to storage.
Still not convinced? Consider this: Truto's Unified APIs let your AI agent read and write data in your users' tools on their behalf. For example:
User Prompt: "Give me a list of Jira tickets assigned to me."
AI Agent Response:
- Ticket #1 – Backlogged 
- Ticket #2 – Pending from engineering 
- Ticket #3 – Resolved … and so on. 
User Prompt: "Leave a comment on all tickets pending from engineering, asking for follow-ups."
AI Agent Response: Calls the create method of the ticketing/comments Unified API for each ticket pending from engineering.
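Under the hood, the second response comes down to filtering the tickets and issuing one create request per match. The Python sketch below builds those requests without sending them. The endpoint URL, the integrated_account_id query parameter, the bearer token, and the body fields (ticket.id, content) are all assumptions made for illustration, not a confirmed Truto schema; check the Truto API reference before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# ASSUMPTION: illustrative endpoint and field names, not a confirmed Truto schema.
TRUTO_COMMENTS_URL = "https://api.truto.one/unified/ticketing/comments"


def follow_up_requests(tickets: list[dict], integrated_account_id: str,
                       api_token: str) -> list[urllib.request.Request]:
    """Build one create-comment request per ticket pending from engineering."""
    query = urlencode({"integrated_account_id": integrated_account_id})
    out = []
    for ticket in tickets:
        if ticket["status"] != "Pending from engineering":
            continue
        body = {
            "ticket": {"id": ticket["id"]},  # hypothetical field names
            "content": "Could you share an update on this ticket?",
        }
        out.append(urllib.request.Request(
            f"{TRUTO_COMMENTS_URL}?{query}",
            data=json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {api_token}",
                     "Content-Type": "application/json"},
            method="POST",
        ))
    return out


tickets = [
    {"id": "1", "status": "Backlogged"},
    {"id": "2", "status": "Pending from engineering"},
    {"id": "3", "status": "Resolved"},
    {"id": "4", "status": "Pending from engineering"},
]
reqs = follow_up_requests(tickets, "acct-123", "YOUR_TRUTO_TOKEN")
# Sending would be: for r in reqs: urllib.request.urlopen(r)
```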
Sounds amazing, right? Don’t wait any longer—experience the power of Truto for yourself and transform the way you manage integrations and data. Get started today!
Introduction
Building a RAG system that trains on a URL or file upload is straightforward. But does this approach truly meet your need for an AI chatbot that can access and answer questions using internal data from Confluence pages, Jira tickets, Notion pages, or other SaaS tools?
I assume the answer is no, so you may want to explore using a RAG provider along with connectors. Here are a few key questions to consider while looking for a RAG provider:
- Do they support all the connectors your customers need? 
- Can they quickly add new connectors before your lead loses momentum? 
- Do they offer the flexibility to choose your own embedding model and vector database? 
With Truto, building native integrations is effortless—we've perfected them. Now, we've also solved your RAG challenges, enabling seamless data syncing from virtually any SaaS tool. (Fun fact: We already support 350+ integrations, allowing instant data syncing to your vector database, and even enabling data write-backs based on user actions.)
Challenge #1 - File or Page selection
You wouldn’t want your AI model—or any third-party provider—accessing sensitive internal data, so it's best to restrict syncing to specific files or pages. While apps like Google Drive, SharePoint, and Box provide native file pickers, what about integrations that don’t?
Truto solves this with RapidForm, allowing users to select exactly which files and pages to sync during the connection process. Plus, we support native file pickers for Google Drive and SharePoint (with more on the way!) to ensure a seamless user experience.
Check out our change log for a sneak peek at the native file pickers: Truto Change Log.
Challenge #2 - Content retrieval
APIs can be unpredictable, each with its own quirks in handling responses. For example, the Confluence API delivers page content as a single record, the Notion API structures content in blocks requiring pagination, and the SharePoint API retrieves only the file itself—not its content.
Notion
At Truto, tackling these API challenges is what we do best—so you don’t have to. Take Notion, for example—its API delivers content in fragmented blocks, requiring pagination. We’ve solved this by spooling content in memory and seamlessly merging it into a single record, all thanks to RapidBridge (also known as Sync Job).
The code snippet below shows how our Notion Sync Job works: it fetches pages, recursively retrieves their content, spools the data, and merges everything into a unified record for each page.
{ "type":"request", "name":"list-pages", "resource":"knowledge-base/pages", "method":"list", "integrated_account_id":"{{args.integrated_account_id}}" }, { "type":"add_context", "name":"add-page-name", "depends_on":"list-pages", "config":{ "expression":"{ \"page_id\": resources.`knowledge-base`.pages.id, \"page_title\": resources.`knowledge-base`.pages.title, \"page_url\": resources.`knowledge-base`.pages.urls[type=\"view\"].url }" } }, { "type":"request", "name":"get-page-content", "resource":"knowledge-base/page-content", "method":"list", "depends_on":"list-pages", "query":{ "page":{ "id":"{{resources.knowledge-base.pages.id}}" } }, "recurse":{ "if":"{{resources.knowledge-base.page-content.has_children:bool}}", "config":{ "query":{ "page_content_id":"{{resources.knowledge-base.page-content.id}}" } } }, "integrated_account_id":"{{args.integrated_account_id}}" }, { "name":"remove-remote-data", "type":"transform", "config":{ "expression":"resources.`knowledge-base`.`page-content`.$sift(function($v, $k) { $k != remote_data })" }, "depends_on":"get-page-content" }, { "name":"all-page-content", "type":"spool", "depends_on":"remove-remote-data" }, { "name":"combine-page-content", "type":"transform", "config":{ "expression":"{ \"file_id\": page_id, \"file_name\": page_title, \"content\": \"# \" & page_title & \"\n\n\" & $reduce($sortNodes(resources.`knowledge-base`.`page-content`, \"id\", \"parent.id\"), function($acc, $v) { $acc & $v.body.content }, \"\" ) }" }, "depends_on":"all-page-content" }
SharePoint
With SharePoint, the process is a bit different—we need to download and parse the file to extract its content.
{ "type":"request", "name":"list-files", "resource":"file-storage/drive-items", "method":"get", "integrated_account_id":"{{args.integrated_account_id}}", "id":"{{drive_items.id}}", "query":{ "drive":{ "id":"{{drive_items.parentReference.driveId}}" }, "workspace":{ "id":"{{drive_items.parentReference.sharepointIds.siteId}}" } }, "loop_on":"drive_items" }, { "type":"add_context", "name":"add-file-name", "depends_on":"list-files", "config":{ "expression":"{'file_name': resources.`file-storage`.`drive-items`.name, 'file_id': resources.`file-storage`.`drive-items`.id, 'file_type': resources.`file-storage`.`drive-items`.mime_type, 'file_size': resources.`file-storage`.`drive-items`.size, 'file_url': resources.`file-storage`.`drive-items`.urls[type=\"self\"].url, 'aws_file_path': $join([tenant_id,resources.`file-storage`.`drive-items`.workspace.id, resources.`file-storage`.`drive-items`.drive.id],\"/\") }" } }, { "type":"request", "name":"download-file", "resource":"file-storage/drive-items", "method":"download", "depends_on":"list-files", "query":"{'file_url': resources.`file-storage`.`drive-items`.urls[type = 'download'].url, 'truto_response_format': 'stream'}", "integrated_account_id":"{{args.integrated_account_id}}" }, { "name":"tee-stream", "type":"transform", "config":{ "expression":"{ \"file_streams\": $teeStream(resources.`file-storage`.`drive-items`) }" }, "depends_on":"download-file" }, { "name":"transform-file-content", "type":"transform", "config":{ "expression":"{ 'file_content': $parseDocument(resources.`file-storage`.`drive-items`[0].file_streams[0], file_type)}" }, "depends_on":"tee-stream" }
Challenge #3 – Embeddings Generation
Once you’ve got the content, the next challenge is generating embeddings. But can you process the entire content in one go? Of course not. You’ll need to split it into chunks first. (Yeah, we get the "pain".)
The following Sync job node demonstrates how we chunk the page content using the recursiveCharacterTextSplitter method from langchain/textsplitters.
{ "name":"file-content-chunk", "type":"add_context", "config":{ "expression":"{'file_content_chunk': $recursiveCharacterTextSplitter(resources.`file-storage`.`drive-items`[0].file_content) }" }, "depends_on":"transform-file-content" }
Remember, you're not locked into this specific chunking method—you can easily swap it out for your own custom approach if needed.
Next, we move on to generating embeddings for these chunks. The Sync job node below leverages the Cohere Embed API with the embed-multilingual-light-v3.0 model to generate the embeddings:
{ "name":"generate-embeddings", "type":"transform", "config":{ "expression":"{ 'content_embeddings' : $generateEmbeddingsCohere({'model': 'embed-multilingual-light-v3.0', 'input_type': 'search_document', 'embedding_types': ['float'], 'texts': file_content_chunk }, args.cohere_api_key)}" }, "depends_on":"transform-file-content" }
This approach is highly flexible—change the model attribute to use a different Cohere model or swap out the $generateEmbeddingsCohere() method to use another provider like OpenAI.
Challenge #4 - Storing embeddings
Now we reach the stage where the real magic unfolds: storing the embeddings in your own vector database. This is a crucial step, as it enables rapid similarity searches and efficient retrieval of your processed data.
In the example below, we form the payload in accordance with Qdrant’s API request body schema. The subsequent node then calls the upsertPoints method of the Qdrant datastore, ensuring your embeddings are correctly indexed and stored.
{ "name":"qdrant-config", "type":"transform", "config":{ "expression":"{ 'qdrant_config': ($texts:= resources.`file-storage`.`drive-items`[0].content_embeddings.texts; $embeddings:= resources.`file-storage`.`drive-items`[0].content_embeddings.embeddings.float; $file_id:= file_id; $tenant_id:= $toNumber(tenant_id); $file_url:= file_url; $site_id:= site_id; $aws_file_path:= aws_file_path; {'query': {'ordering': 'strong'}, 'body': {'points': $texts#$i.{ 'id': $uuid(), 'payload': {'content': $, 'chunk_number': $i, 'organization_user_file_id': $file_id, 'user_id': $tenant_id, 'external_url': $file_url, 'updated_at': $now(), 'site_id': $site_id, 'saved_filename':$aws_file_path }, 'vector': $embeddings[$i]}}})}" }, "depends_on":"generate-embeddings" }, { "name":"qdrant-db", "type":"destination", "destination_type":"datastore", "method":"upsertPoints", "config":{ "id":"{{args.qdrant_datastore_id}}", "config":{ "query":"{{payload.records.0.qdrant_config.query:json}}", "body":"{{payload.records.0.qdrant_config.body:json}}" } }, "run_if":"$exists(args.qdrant_datastore_id)", "resources_to_persist":[ "qdrant-config" ] }
It also incorporates the actual content, the chunk number, the integration's file URL, and a reference to the original file in your object storage. This method ensures your data remains organized, easily searchable, and scalable as it grows.
Challenge #5 - Saving the Original File
It is not done until it is. Retaining the source file is crucial, especially if you plan to display it as part of your chatbot's response.
Truto makes this easy by supporting file storage in Google Cloud Storage and any S3-compatible object storage. Once you configure your datastore through the Truto console, the Sync job node below handles uploading the file to the specified path in your storage system.
{ "name":"s3-storage", "type":"destination", "destination_type":"datastore", "method":"uploadObject", "config":{ "id":"{{args.s3_datastore_id}}", "config":{ "path":"{{aws_file_path}}", "file_name":"{{file_name}}", "content":"{{payload.records.0.file_streams.1}}" } }, "run_if":"$exists(args.s3_datastore_id)", "resources_to_persist":[ "tee-stream" ] }
Getting Started
By now, you might be convinced that Truto is your ultimate solution for integrations and RAG. Truto isn't just a single service—it’s a comprehensive package that supports every step of your workflow.
Still not convinced? Consider this: Truto’s Unified APIs empower you to perform virtually any action your users request. For example:
User Prompt: "Give me a list of Jira tickets assigned to me."
AI Agent Response:
- Ticket #1 – Backlogged 
- Ticket #2 – Pending from engineering 
- Ticket #3 – Resolved … and so on. 
User Prompt: "Leave a comment on all tickets pending from engineering, asking for follow-ups."
AI Agent Response: Calls the CREATE ticketing/comments API for all pending tickets.
Sounds amazing, right? Don’t wait any longer—experience the power of Truto for yourself and transform the way you manage integrations and data. Get started today!
Introduction
Building a RAG system that trains on a URL or file upload is straightforward. But does this approach truly meet your need for an AI chatbot that can access and answer questions using internal data from Confluence pages, Jira tickets, Notion pages, or other SaaS tools?
I assume the answer is no, so you may want to explore using a RAG provider along with connectors. Here are a few key questions to consider while looking for a RAG provider:
- Do they support all the connectors your customers need? 
- Can they quickly add new connectors before your lead loses momentum? 
- Do they offer the flexibility to choose your own embedding model and vector database? 
With Truto, building native integrations is effortless—we've perfected them. Now, we've also solved your RAG challenges, enabling seamless data syncing from virtually any SaaS tool. (Fun fact: We already support 350+ integrations, allowing instant data syncing to your vector database, and even enabling data write-backs based on user actions.)
Challenge #1 - File or Page selection
You wouldn’t want your AI model—or any third-party provider—accessing sensitive internal data, so it's best to restrict syncing to specific files or pages. While apps like Google Drive, SharePoint, and Box provide native file pickers, what about integrations that don’t?
Truto solves this with RapidForm, allowing users to select exactly which files and pages to sync during the connection process. Plus, we support native file pickers for Google Drive and SharePoint (with more on the way!) to ensure a seamless user experience.
Check out our change log for a sneak peek at the native file pickers: Truto Change Log.
Challenge #2 - Content retrieval
APIs can be unpredictable, each with its own quirks in handling responses. For example, the Confluence API delivers page content as a single record, the Notion API structures content in blocks requiring pagination, and the SharePoint API retrieves only the file itself—not its content.
Notion
At Truto, tackling these API challenges is what we do best—so you don’t have to. Take Notion, for example—its API delivers content in fragmented blocks, requiring pagination. We’ve solved this by spooling content in memory and seamlessly merging it into a single record, all thanks to RapidBridge (also known as Sync Job).
The code snippet below shows how our Notion Sync Job works: it fetches pages, recursively retrieves their content, spools the data, and merges everything into a unified record for each page.
{ "type":"request", "name":"list-pages", "resource":"knowledge-base/pages", "method":"list", "integrated_account_id":"{{args.integrated_account_id}}" }, { "type":"add_context", "name":"add-page-name", "depends_on":"list-pages", "config":{ "expression":"{ \"page_id\": resources.`knowledge-base`.pages.id, \"page_title\": resources.`knowledge-base`.pages.title, \"page_url\": resources.`knowledge-base`.pages.urls[type=\"view\"].url }" } }, { "type":"request", "name":"get-page-content", "resource":"knowledge-base/page-content", "method":"list", "depends_on":"list-pages", "query":{ "page":{ "id":"{{resources.knowledge-base.pages.id}}" } }, "recurse":{ "if":"{{resources.knowledge-base.page-content.has_children:bool}}", "config":{ "query":{ "page_content_id":"{{resources.knowledge-base.page-content.id}}" } } }, "integrated_account_id":"{{args.integrated_account_id}}" }, { "name":"remove-remote-data", "type":"transform", "config":{ "expression":"resources.`knowledge-base`.`page-content`.$sift(function($v, $k) { $k != remote_data })" }, "depends_on":"get-page-content" }, { "name":"all-page-content", "type":"spool", "depends_on":"remove-remote-data" }, { "name":"combine-page-content", "type":"transform", "config":{ "expression":"{ \"file_id\": page_id, \"file_name\": page_title, \"content\": \"# \" & page_title & \"\n\n\" & $reduce($sortNodes(resources.`knowledge-base`.`page-content`, \"id\", \"parent.id\"), function($acc, $v) { $acc & $v.body.content }, \"\" ) }" }, "depends_on":"all-page-content" }
SharePoint
With SharePoint, the process is a bit different—we need to download and parse the file to extract its content.
{ "type":"request", "name":"list-files", "resource":"file-storage/drive-items", "method":"get", "integrated_account_id":"{{args.integrated_account_id}}", "id":"{{drive_items.id}}", "query":{ "drive":{ "id":"{{drive_items.parentReference.driveId}}" }, "workspace":{ "id":"{{drive_items.parentReference.sharepointIds.siteId}}" } }, "loop_on":"drive_items" }, { "type":"add_context", "name":"add-file-name", "depends_on":"list-files", "config":{ "expression":"{'file_name': resources.`file-storage`.`drive-items`.name, 'file_id': resources.`file-storage`.`drive-items`.id, 'file_type': resources.`file-storage`.`drive-items`.mime_type, 'file_size': resources.`file-storage`.`drive-items`.size, 'file_url': resources.`file-storage`.`drive-items`.urls[type=\"self\"].url, 'aws_file_path': $join([tenant_id,resources.`file-storage`.`drive-items`.workspace.id, resources.`file-storage`.`drive-items`.drive.id],\"/\") }" } }, { "type":"request", "name":"download-file", "resource":"file-storage/drive-items", "method":"download", "depends_on":"list-files", "query":"{'file_url': resources.`file-storage`.`drive-items`.urls[type = 'download'].url, 'truto_response_format': 'stream'}", "integrated_account_id":"{{args.integrated_account_id}}" }, { "name":"tee-stream", "type":"transform", "config":{ "expression":"{ \"file_streams\": $teeStream(resources.`file-storage`.`drive-items`) }" }, "depends_on":"download-file" }, { "name":"transform-file-content", "type":"transform", "config":{ "expression":"{ 'file_content': $parseDocument(resources.`file-storage`.`drive-items`[0].file_streams[0], file_type)}" }, "depends_on":"tee-stream" }
Challenge #3 – Embeddings Generation
Once you’ve got the content, the next challenge is generating embeddings. But can you process the entire content in one go? Of course not. You’ll need to split it into chunks first. (Yeah, we get the "pain".)
The following Sync job node demonstrates how we chunk the page content using the recursiveCharacterTextSplitter method from langchain/textsplitters.
{ "name":"file-content-chunk", "type":"add_context", "config":{ "expression":"{'file_content_chunk': $recursiveCharacterTextSplitter(resources.`file-storage`.`drive-items`[0].file_content) }" }, "depends_on":"transform-file-content" }
Remember, you're not locked into this specific chunking method—you can easily swap it out for your own custom approach if needed.
Next, we move on to generating embeddings for these chunks. The Sync job node below leverages the Cohere Embed API with the embed-multilingual-light-v3.0 model to generate the embeddings:
{ "name":"generate-embeddings", "type":"transform", "config":{ "expression":"{ 'content_embeddings' : $generateEmbeddingsCohere({'model': 'embed-multilingual-light-v3.0', 'input_type': 'search_document', 'embedding_types': ['float'], 'texts': file_content_chunk }, args.cohere_api_key)}" }, "depends_on":"transform-file-content" }
This approach is highly flexible—change the model attribute to use a different Cohere model or swap out the $generateEmbeddingsCohere() method to use another provider like OpenAI.
Challenge #4 - Storing embeddings
Now we reach the stage where the real magic unfolds: storing the embeddings in your own vector database. This is a crucial step, as it enables rapid similarity searches and efficient retrieval of your processed data.
In the example below, we form the payload in accordance with Qdrant’s API request body schema. The subsequent node then calls the upsertPoints method of the Qdrant datastore, ensuring your embeddings are correctly indexed and stored.
{ "name":"qdrant-config", "type":"transform", "config":{ "expression":"{ 'qdrant_config': ($texts:= resources.`file-storage`.`drive-items`[0].content_embeddings.texts; $embeddings:= resources.`file-storage`.`drive-items`[0].content_embeddings.embeddings.float; $file_id:= file_id; $tenant_id:= $toNumber(tenant_id); $file_url:= file_url; $site_id:= site_id; $aws_file_path:= aws_file_path; {'query': {'ordering': 'strong'}, 'body': {'points': $texts#$i.{ 'id': $uuid(), 'payload': {'content': $, 'chunk_number': $i, 'organization_user_file_id': $file_id, 'user_id': $tenant_id, 'external_url': $file_url, 'updated_at': $now(), 'site_id': $site_id, 'saved_filename':$aws_file_path }, 'vector': $embeddings[$i]}}})}" }, "depends_on":"generate-embeddings" }, { "name":"qdrant-db", "type":"destination", "destination_type":"datastore", "method":"upsertPoints", "config":{ "id":"{{args.qdrant_datastore_id}}", "config":{ "query":"{{payload.records.0.qdrant_config.query:json}}", "body":"{{payload.records.0.qdrant_config.body:json}}" } }, "run_if":"$exists(args.qdrant_datastore_id)", "resources_to_persist":[ "qdrant-config" ] }
It also incorporates the actual content, the chunk number, the integration's file URL, and a reference to the original file in your object storage. This method ensures your data remains organized, easily searchable, and scalable as it grows.
Challenge #5 - Saving the Original File
It is not done until it is. Retaining the source file is crucial, especially if you plan to display it as part of your chatbot's response.
Truto makes this easy by supporting file storage in Google Cloud Storage and any S3-compatible object storage. Once you configure your datastore through the Truto console, the Sync job node below handles uploading the file to the specified path in your storage system.
{ "name":"s3-storage", "type":"destination", "destination_type":"datastore", "method":"uploadObject", "config":{ "id":"{{args.s3_datastore_id}}", "config":{ "path":"{{aws_file_path}}", "file_name":"{{file_name}}", "content":"{{payload.records.0.file_streams.1}}" } }, "run_if":"$exists(args.s3_datastore_id)", "resources_to_persist":[ "tee-stream" ] }
Getting Started
By now, you might be convinced that Truto is your ultimate solution for integrations and RAG. Truto isn't just a single service—it’s a comprehensive package that supports every step of your workflow.
Still not convinced? Consider this: Truto’s Unified APIs empower you to perform virtually any action your users request. For example:
User Prompt: "Give me a list of Jira tickets assigned to me."
AI Agent Response:
- Ticket #1 – Backlogged 
- Ticket #2 – Pending from engineering 
- Ticket #3 – Resolved … and so on. 
User Prompt: "Leave a comment on all tickets pending from engineering, asking for follow-ups."
AI Agent Response: Calls the CREATE ticketing/comments API once for each ticket pending from engineering.
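On the agent side, the second prompt boils down to "filter, then write back once per match." The Python sketch below is a hypothetical illustration: the Ticket shape, the status strings, and the create_comment callable are assumptions standing in for whatever ticket listing and ticketing/comments create call your agent's tools expose. It is not Truto's SDK.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Ticket:
    # Hypothetical ticket shape; real ticket records carry many more fields.
    id: str
    title: str
    status: str


def comment_on_tickets(
    tickets: Iterable[Ticket],
    create_comment: Callable[[str, str], None],
    *,
    status: str,
    body: str,
) -> list[str]:
    """Posts `body` as a comment on every ticket whose status equals `status`.

    Returns the IDs of the tickets that received a comment.
    """
    commented: list[str] = []
    for ticket in tickets:
        if ticket.status == status:
            create_comment(ticket.id, body)  # one create call per matching ticket
            commented.append(ticket.id)
    return commented


if __name__ == "__main__":
    my_tickets = [
        Ticket("1", "Ticket #1", "Backlogged"),
        Ticket("2", "Ticket #2", "Pending from engineering"),
        Ticket("3", "Ticket #3", "Resolved"),
    ]

    def fake_create_comment(ticket_id: str, body: str) -> None:
        # Hypothetical stand-in for a create call on the ticketing/comments API.
        print(f"Comment on ticket {ticket_id}: {body}")

    done = comment_on_tickets(
        my_tickets,
        fake_create_comment,
        status="Pending from engineering",
        body="Could you share an update on this one?",
    )
    print(done)
```

Passing create_comment in as a parameter keeps the filtering logic testable without network calls; in a real agent it would wrap the actual API request.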
Sounds amazing, right? Don’t wait any longer—experience the power of Truto for yourself and transform the way you manage integrations and data. Get started today!
© Yin Yang, Inc. 2024. All rights reserved.
9450 SW Gemini Dr, PMB 69868, Beaverton, Oregon 97008-7105, United States