Using the Page Splitter Property

Introduction

The Page Splitter property helps you work with large documents that exceed the context window of a Large Language Model (LLM). It divides a long file, such as a PDF, into smaller documents so that each portion can be processed by downstream properties or agents.

Why Use a Page Splitter Property?

Overcome LLM Context Limits: Break up large files so that each part fits within an LLM's context window or can be handled by other tools.
Map-Reduce Workflows: Analyze each split document independently, then aggregate or summarize results at a higher level.
Clarity and Organization: Each split document is named to indicate the range of pages it contains (e.g., Annual_Report_pages_1-5.pdf), making it easy to track and reference.
Flexible Integration: Use split documents as input for downstream properties or agents, or process them within the collection for parallel analysis.

Output and Naming

After processing, the Page Splitter property outputs a collection of new documents. Each one is named after the range of pages it contains, so you can identify and order the splits at a glance. For example:

Annual_Report_pages_1-5.pdf
Annual_Report_pages_6-10.pdf
...and so on.
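
The naming pattern above can be reproduced, and read back, with a few lines of Python. This is a minimal sketch for illustration only: split_name and page_range are hypothetical helpers, not part of the product, and the pattern is inferred from the two example names shown here.

```python
import os
import re

def split_name(source_filename, first_page, last_page):
    """Build a split-document name such as Annual_Report_pages_1-5.pdf."""
    stem, ext = os.path.splitext(source_filename)
    return f"{stem}_pages_{first_page}-{last_page}{ext}"

# Assumed pattern: "<stem>_pages_<first>-<last>" plus an optional extension.
_PAGES = re.compile(r"_pages_(\d+)-(\d+)(?:\.[^.]+)?$")

def page_range(split_filename):
    """Recover (first_page, last_page) from a split-document name."""
    match = _PAGES.search(split_filename)
    if match is None:
        raise ValueError(f"no page range in {split_filename!r}")
    return int(match.group(1)), int(match.group(2))

names = [
    split_name("Annual_Report.pdf", 6, 10),
    split_name("Annual_Report.pdf", 1, 5),
]
# Sort numerically by page range rather than alphabetically by name.
names.sort(key=page_range)
print(names)
```

Sorting on the parsed page range, rather than on the raw name, keeps the splits in document order once page numbers reach two digits.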

Example Use Cases

Summarizing a lengthy annual report by splitting it into 10-page documents and summarizing each part.
Extracting key data from a large contract by processing each split document separately, then combining the results.
Running entity extraction or classification on each split document and aggregating findings for a comprehensive overview.

Step 1: Add a Page Splitter Property

Add a new property to your agent and set its type to Page Splitter.
Name it clearly, e.g., Document Page Splitter or PDF Splitter.
Specify how many pages you want in each split document (e.g., 5 pages per split).
Select the file input (such as uploading a PDF or referencing another file property).
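
To see how the pages-per-split setting translates into split documents, the sketch below computes the page ranges for a given page count. It is illustrative only: page_ranges is a hypothetical helper, and it assumes that a final, shorter split holds whatever pages remain, which this guide does not state explicitly.

```python
def page_ranges(total_pages, pages_per_split):
    """Return 1-based, inclusive (first_page, last_page) tuples."""
    if total_pages < 1 or pages_per_split < 1:
        raise ValueError("total_pages and pages_per_split must be at least 1")
    ranges = []
    for first in range(1, total_pages + 1, pages_per_split):
        # Assumption: the last split simply contains the leftover pages.
        last = min(first + pages_per_split - 1, total_pages)
        ranges.append((first, last))
    return ranges

# A 12-page file with 5 pages per split.
print(page_ranges(12, 5))
```

Under that assumption, a 12-page file at 5 pages per split yields three documents covering pages 1-5, 6-10, and 11-12.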

Step 2: Add Properties Within the Collection

Once the Page Splitter property is set up, you can add properties inside the collection, such as a summarization or extraction property.
Each property you add here runs independently on every split document, which allows for parallel processing and analysis.
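
Conceptually, this is the "map" half of a map-reduce workflow: one operation applied to every split, with no split depending on another. The sketch below imitates that behavior outside the product. The summarize function is a stand-in for a real summarization property, and the split texts are made-up sample data.

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(text):
    """Stand-in for a summarization property: keep the first sentence."""
    first_sentence = text.split(".")[0].strip()
    return first_sentence + "."

def map_over_splits(splits, prop):
    """Apply prop to each split independently and in parallel."""
    names = list(splits)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as its inputs.
        results = list(pool.map(prop, (splits[name] for name in names)))
    return dict(zip(names, results))

# Hypothetical split documents, keyed by name, with extracted text as values.
splits = {
    "Annual_Report_pages_1-5.pdf": "Revenue grew this year. Details follow.",
    "Annual_Report_pages_6-10.pdf": "Costs were stable. See the tables.",
}

summaries = map_over_splits(splits, summarize)
for name, summary in summaries.items():
    print(name, "->", summary)
```

Because each call sees only its own split, the work can run in any order or at the same time without changing the results.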

Step 3: Aggregate Results Back to the Main Agent (Map-Reduce)

After processing each split document within the collection, you can aggregate or summarize the results at the top level of your agent.
This could involve combining summaries, extracting key data points, or generating a final report that synthesizes the outputs from all split documents.
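
The aggregation described above is the "reduce" half of the workflow. As a rough sketch, assuming each split has already produced a summary and a list of entities, the per-split results could be combined as follows. The record layout, the reduce_results helper, and the sample values are all hypothetical.

```python
from collections import Counter

def reduce_results(per_split):
    """Combine per-split outputs into one top-level result."""
    # Put the splits back into document order before combining them.
    ordered = sorted(per_split, key=lambda result: result["pages"])
    combined_summary = " ".join(result["summary"] for result in ordered)
    entity_counts = Counter()
    for result in ordered:
        entity_counts.update(result["entities"])
    return {"summary": combined_summary, "entities": dict(entity_counts)}

# Made-up outputs from two split documents, deliberately out of order.
per_split = [
    {"pages": (6, 10), "summary": "Costs were stable.", "entities": ["Acme", "Globex"]},
    {"pages": (1, 5), "summary": "Revenue grew.", "entities": ["Acme"]},
]

report = reduce_results(per_split)
print(report["summary"])
print(report["entities"])
```

In practice the combined summary would often be passed to an LLM for a final synthesis rather than simply joined, but the ordering and merging steps stay the same.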

In short, the Page Splitter property breaks large documents into manageable, clearly named files that can be processed individually or in aggregate, which keeps document workflows efficient and scalable.