V7 Go is powered by a selection of LLMs and foundation models. Let’s break down each of the available tools.

📘

OCR and audio extraction

Whenever text or audio is detected in an input file, V7 Go will automatically use an OCR (Optical Character Recognition) or ASR (Automatic Speech Recognition) model to extract text, regardless of which AI Tool is selected. This means that audio and text extraction are implicit and optimized for minimal token usage.

Text Models

Plug any of the models below into V7 Go properties to process, understand, and generate human-like text. Check out our prompting guide here to get the most out of each model.

GPT 3.5 Turbo

Developed by: OpenAI (check out OpenAI’s GPT 3.5 Turbo docs here)

Inputs: Text

Outputs: Text, JSON, single-select, multi-select

Using GPT 3.5 Turbo in V7 Go: Another of OpenAI’s language models, GPT 3.5 Turbo is a variant of GPT 3.5 optimized for speed and efficiency. It can handle any of the tasks where GPT 4 would be used, but the quality of its responses is likely to be slightly lower for complex tasks. Use GPT 3.5 Turbo for projects where speed and efficiency are priorities.
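Each text model above lists Text, JSON, single-select, and multi-select among its outputs. As a purely hypothetical illustration (the field names, option labels, and values below are invented for this example; V7 Go defines the actual schema when you configure a property), these output types behave like the following structures:

```python
# Hypothetical illustration of the property output types listed above.
# All field names and option labels here are invented for this sketch.
import json

# A JSON output property returns structured data extracted by the model:
json_output = json.loads('{"invoice_number": "INV-0042", "total": 199.99}')

# A single-select property returns exactly one option from a fixed list:
single_select_options = ["approved", "rejected", "needs_review"]
single_select_output = "approved"

# A multi-select property returns any subset of the allowed options:
multi_select_options = ["contract", "invoice", "receipt"]
multi_select_output = ["contract", "invoice"]

# The selected values are always drawn from the configured option lists:
assert single_select_output in single_select_options
assert set(multi_select_output) <= set(multi_select_options)
```

Single- and multi-select outputs constrain the model to your predefined options, which makes downstream filtering and automation more reliable than free-form text.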

Gemini Pro

Developed by: Google (check out Google's Gemini landing page here)

Inputs: Text

Outputs: Text, JSON, single-select, multi-select

Using Gemini Pro in V7 Go: Gemini Pro is Google DeepMind’s answer to OpenAI’s language models like GPT 4. It is capable of advanced language understanding, lagging only slightly behind GPT 4’s performance on document understanding. Use Gemini Pro to perform complex text analysis tasks.

Vision Models

Plug the AI Tools below into V7 Go properties to analyze visual data and extract either raw text data (in the case of OCR), or generated text interpretations (in the case of large language model tools).

Gemini Pro Vision

Developed by: Google (check out Google's Gemini landing page here)

Inputs: Text, Images, PDF documents

Outputs: Text, JSON, single-select, multi-select

Using Gemini Pro Vision in V7 Go: Gemini Pro Vision uses the Gemini Pro model’s multimodal capabilities to understand images and extract information from them in plain text. While Gemini Pro Vision can be used in place of an OCR model to extract text, it’s more efficiently used to extract contextual information from images. Gemini Pro Vision is roughly equivalent in performance to GPT 4 Vision in natural image understanding. Use Gemini Pro Vision to analyze visual data and extract generated interpretations in text.

Multimodal Models


GPT 4o

Developed by: OpenAI (check out OpenAI’s GPT 4 docs here)

Inputs: Text, Images, PDF documents, Audio

Outputs: Text, JSON, single-select, multi-select

Using GPT 4o in V7 Go: GPT 4 Omni (GPT 4o) is the most advanced of OpenAI’s language models available in V7 Go. It was trained across text, vision, and audio, and it offers slight improvements over GPT 4 Turbo in text tasks with increased speed and efficiency. Use GPT 4 Omni to perform complex text or multimodal tasks where accuracy is paramount.

GPT 4 Turbo

Developed by: OpenAI (check out OpenAI’s GPT 4 docs here)

Inputs: Text, Images, PDF documents, Audio

Outputs: Text, JSON, single-select, multi-select

Using GPT 4 Turbo in V7 Go: GPT 4 Turbo is OpenAI's second most powerful model, behind GPT 4o. It has a 128k context window and offers improved language understanding and more accurate responses than GPT 3.5 Turbo, but is less optimized for speed and efficiency. Use GPT 4 Turbo to perform complex text analysis as well as visual tasks where accuracy is paramount.

Gemini 1.5 Pro

Developed by: Google (check out Google's Gemini landing page here)

Inputs: Text, Images, PDF documents, Audio

Outputs: Text, JSON, single-select, multi-select

Using Gemini 1.5 Pro in V7 Go: Gemini 1.5 Pro is Google DeepMind’s most advanced LLM. It is capable of advanced language understanding and has multimodal capabilities. Use Gemini 1.5 Pro to perform complex text analysis as well as visual tasks where accuracy is paramount.

Claude 3 Opus

Developed by: Anthropic (Check out Anthropic's Claude landing page here)

Inputs: Text, Images, PDF documents

Outputs: Text, JSON, single-select, multi-select

Using Claude 3 Opus in V7 Go: Opus is Anthropic's highest-performing model, and competes with or outperforms GPT 4 and Gemini Ultra on most text-based tasks. Opus can be used with text inputs as well as image data. Use Claude 3 Opus to perform complex text analysis as well as visual tasks where accuracy is paramount.

Claude 3 Sonnet

Developed by: Anthropic (Check out Anthropic's Claude landing page here)

Inputs: Text, Images, PDF documents

Outputs: Text, JSON, single-select, multi-select

Using Claude 3 Sonnet in V7 Go: Sonnet is the mid-tier of Anthropic's models available on V7 Go, offering a balance of capability, speed, and cost compared to Claude 3 Opus and Haiku. It can be used with text inputs as well as image data. Use Claude 3 Sonnet to perform text analysis and visual tasks while optimizing for both performance and Go Token cost.

Claude 3 Haiku

Developed by: Anthropic (Check out Anthropic's Claude landing page here)

Inputs: Text, Images, PDF documents

Outputs: Text, JSON, single-select, multi-select

Using Claude 3 Haiku in V7 Go: Haiku is the fastest and cheapest model of the Claude 3 family. Like Opus and Sonnet, Haiku is multimodal and can reason across text as well as images. Use Claude 3 Haiku where speed and Go Token cost are priorities.
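The guidance across the three Claude 3 entries amounts to a simple selection rule: Opus when accuracy matters most, Sonnet for a balance, Haiku when speed and cost dominate. As a hypothetical sketch (the priority labels and function name below are invented for this example, not part of V7 Go):

```python
# Hypothetical sketch of the Claude 3 tier guidance above. The
# priority labels and function name are invented for this example.

def pick_claude_model(priority: str) -> str:
    """Map a (hypothetical) priority label to a Claude 3 tier."""
    tiers = {
        "accuracy": "Claude 3 Opus",    # highest-performing tier
        "balanced": "Claude 3 Sonnet",  # capability vs. cost trade-off
        "speed": "Claude 3 Haiku",      # fastest and cheapest tier
    }
    return tiers[priority]

print(pick_claude_model("speed"))  # Claude 3 Haiku
```

In practice you would set the tool per property in V7 Go rather than in code; the sketch only makes the trade-off explicit.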

Are we missing a tool? Let us know!