Civis provides the Civis AI: Core script templates (Anthropic Claude version, Meta Llama 3 version) to securely and easily interact with generative AI using your data in Civis Platform. The templates allow you to send text to a state-of-the-art language model and receive responses (e.g., answers to questions, labels for documents you want to categorize, a summary of a collection of documents). The “Usage Patterns” section below provides some examples of ways you can use the template.
Links to these templates can be found in the “Civis AI” menu.
Generative AI
Generative AI has the potential to automate a variety of tasks involving text data. Current models simulate the understanding of human language and can follow complex directions and leverage extensive knowledge in order to solve such tasks. For example, one might ask an AI model to examine free text notes about organization members and identify those who are likely to donate or respond to outreach.
The Civis AI: Core script templates work as a bridge between the Civis Platform and a generative AI model. Currently, there are versions that provide access to two families of models:
- Anthropic’s Claude models (Sonnet 3.5 by default). When using this template, you agree to abide by Anthropic’s Acceptable Use Policy, which may be revised and/or updated.
- Meta’s Llama 3 models (Llama 3.2 by default). When using this template, you agree to abide by Meta’s Acceptable Use Policy and Responsible Use Guide. Llama 3 is licensed under the Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
While generative AI models are highly flexible and can respond to various inputs (i.e., “prompts”), one can often achieve better results by considering model-specific guidelines for writing prompts (i.e., for “prompt engineering”). To improve results, we encourage users to read through the prompt engineering resources for Anthropic Claude here, or for Meta Llama 3 here.
Usage Basics
To use the script template, click the name of the version you’d like to use under the “Civis AI” menu.
On the resulting Custom Script page, you can specify what to send to the AI model in the “User Message” field. The “Usage Patterns” section below provides some examples.
If you want to include input from a text column in a database table, you can include the strings {{{ single_text }}} or {{{ all_texts }}} in your prompt. You will also need to specify information about the table, as described below.
If you specify {{{ single_text }}}, then the Script will process a single text value at a time. It will replace that placeholder string with, e.g., <text>An example text value from a row in your input table</text>. It will iterate over the rows in your input table, make calls to the AI model, and then provide an output text value for each input text.
If you specify {{{ all_texts }}}, then the Script will process all the texts at once and provide a single output. In the call made to the AI model, the placeholder string {{{ all_texts }}} will be replaced by a string of multiple texts from your input table (e.g., <text>The first text</text> <text>The second text</text> …).
Your data should not be included in the prompts directly. Instead, it should be incorporated via the {{{ all_texts }}} or {{{ single_text }}} placeholders.
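Conceptually, the placeholder substitution behaves like the following simplified Python sketch. This is an illustration only, not the template’s actual implementation; the function names and exact formatting here are assumptions based on the behavior described above.

```python
# Illustrative sketch of how the {{{ single_text }}} and {{{ all_texts }}}
# placeholders are filled in. (Hypothetical helper functions; the real
# template may format model calls differently.)

def render_single(prompt: str, text: str) -> str:
    # {{{ single_text }}} is replaced once per input row, so the script
    # makes one model call for each row in the input table.
    return prompt.replace("{{{ single_text }}}", f"<text>{text}</text>")

def render_all(prompt: str, texts: list[str]) -> str:
    # {{{ all_texts }}} is replaced a single time with every row's text,
    # so the script makes one model call for the whole table.
    joined = " ".join(f"<text>{t}</text>" for t in texts)
    return prompt.replace("{{{ all_texts }}}", joined)

rows = ["The first text", "The second text"]

# single_text: one prompt (and one model call) per row
per_row_prompts = [
    render_single("Categorize: {{{ single_text }}}", t) for t in rows
]

# all_texts: one prompt containing all rows
one_prompt = render_all("Summarize: {{{ all_texts }}}", rows)
print(one_prompt)
```

Either way, the placeholder is the only place your table data enters the prompt, which is why the data should not be pasted into the prompt text directly.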
All of the parameters other than “User Message” are optional:
- System Prompt: You may want to add a system prompt to provide context, instructions, or guidelines to the model. For the Claude version of the template, see this page.
- Assistant Response Prefix (Claude version only): If you want to ensure that responses from the model begin a certain way, you can specify this, as described here in the documentation for Claude. For example, you could specify { here if responses should be JSON-formatted.
- Model ID: The model ID to use. We generally recommend using the default option.
- Database Name: If you are reading input from a database table or want the output to go to a database table, specify the name of the database (i.e., the name that shows up in the dropdown menu on the right of the top navigation bar).
- Input Schema and Table: To process texts from a database table, specify the table identifier (e.g., “myschema.mytable”).
- Input Text Column: To process texts from a database table, specify the name of the column containing the texts. Only a single input text column can be specified.
- Input ID Column: If you are iteratively processing texts with {{{ single_text }}}, then you’ll need to specify a column of IDs with which you can join back the output to the input.
- Output Schema and Table: Specify this parameter to have the output stored in a table. Note that one can see output in JSON format attached as a run output regardless of whether this is specified.
- If Output Table Exists: What to do if the output table exists. The options are “fail”, “append” (to add rows to the table), “drop” (to drop and recreate the table), and “truncate” (to remove all existing rows before adding the current run’s output). The default is “fail”.
- Post Run Output: If this is checked (the default), then the results will be posted as a JSON run output. You may need to uncheck this if your job produces a lot of output and is part of a workflow.
- Temperature: The amount of randomness in the response, from 0 to 1. Use values closer to 0 for more “analytical” responses (e.g., for categorization or summarization tasks) and for less variation from run to run. Use values closer to 1 to get more "creative" responses (e.g., for some types of writing assistance tasks).
- Sample Size: If you are processing inputs from a table, you can use this parameter to specify the number of texts to randomly sample from the table. Defaults to 20, so that you can test your prompt on a small sample before processing a whole table. To process all of the texts, set this to 0.
If you don’t have an ID column, you can create a table with one using a SQL statement like the following:
create table schema123.table_with_ids as
select row_number() over () as id, * from myschema.mytable;
Usage Patterns
There are a few different ways to use the script template, as described in this section.
Sending custom text prompts
The most basic way to use the script template is to send it a prompt without specifying an input database table to process. This is the simplest way to use the Script, but it’s also flexible because you can manually include data from your database if you wish. For example, here’s how you might ask the AI model to assign a category label to a response to a survey question about donating to preserve national parks.
System Prompt:

You are tasked with categorizing a response to a survey question asking, "After viewing this advertisement, would you consider making a donation to preserve our national parks?" Categorize the text as "positive", "negative", or "other". Respond with just a single word, without preamble. If you are not sure, say "other". <example> H: <text>No. Not my cause.</text> A: negative </example> <example> H: <text>Our national parks are a treasure and should be funded.</text> A: positive </example> <example> H: <text>not sure</text> A: other </example>

User Message:

Here is the text to categorize: <text>I love national parks. I would definitely donate.</text>
This prompt uses various prompt engineering guidelines, as discussed in Anthropic's documentation for the Claude model, to improve the quality of the response. In particular, it provides examples of input/output pairs, tells the model to be concise, and adds some XML formatting to clarify what text should be categorized.
The Meta Llama 3 model documentation suggests a slightly different format for providing examples, but it also seems to work well with prompts like the above.
Processing input texts one-at-a-time
In the above example, a single text was labeled. One might want to label many texts from an input table. To do that, one can specify the input table parameters and include {{{ single_text }}} in the prompt, as in the following example.
User Message:

Here is the text to categorize: {{{ single_text }}}
The “System Prompt” would be the same as in the example above.
Note that the Civis AI: Taxonomer template is specifically for categorizing text data. That template uses a different, task-specific method. You may want to try both if you have a categorization problem to solve.
Processing many texts at once
One might also want to process many texts at once to get a summary, for example. Here is what a prompt for that might look like.
User Message:

The following texts are [DESCRIBE YOUR DATA HERE]. You will be asked to provide a summary. {{{ all_texts }}} Now please summarize the texts.
It often helps to provide additional information about the data (e.g., “The following texts are notes about interactions with supporters of my organization”). And because a collection of texts can be long, it is also helpful to place specific instructions at the end of the prompt, as in the example above.
Limitations
There are some important limitations of this tool to consider.
- Usage is capped for your organization; see the “Token Usage” section below to learn more.
- AI models can’t solve every task well. This page from Anthropic provides some information about the text capabilities of the Claude model. Note that the vision capabilities listed there are not relevant for this template.
- AI models can occasionally generate inaccurate information, a phenomenon referred to as “hallucination.” While this can be reduced through prompt engineering (see, e.g., this page for Claude or this page for Llama 3), it’s important to recognize that inaccuracies can still occur. We advise caution when using AI tools, especially for high-stakes tasks, and recommend human review of output.
- There are some limits to how much text can be processed by the tool. The prompt-related parameters such as the user message are limited to approximately 130,000 characters in total, which is about 20,000 words or 50 pages. If you are using {{{ all_texts }}} in your prompt, then the length of the combination of the prompt and the data from your database can exceed that, but the Claude model itself has a length limit of 200,000 tokens, which is approximately 150,000 words or 500 pages.
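The character limit above can be checked before submitting a job. Here is a minimal Python sketch of such a pre-flight check; the helper function and variable names are hypothetical, the 130,000-character figure is the approximate limit described above, and the template itself enforces the actual limit.

```python
# Rough pre-flight check against the approximate 130,000-character limit
# on prompt-related parameters. (Illustrative sketch only; the template
# enforces the real limit, and {{{ all_texts }}} expansion is excluded
# from this count, as described above.)
PROMPT_CHAR_LIMIT = 130_000

def fits_in_prompt(*parts: str) -> bool:
    """Return True if the combined prompt parts are within the limit."""
    return sum(len(p) for p in parts) <= PROMPT_CHAR_LIMIT

system_prompt = "You are tasked with summarizing notes about members."
user_message = (
    "The following texts are member notes. {{{ all_texts }}} "
    "Now please summarize the texts."
)
print(fits_in_prompt(system_prompt, user_message))
```

A check like this is most useful when you are pasting long documents directly into the “User Message” field rather than reading data from a table.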
Token Usage
Each month, you will have a specific number of credits available to use across Civis AI tools. As an example, we estimate that using the Core template to analyze 10,000 short texts will take about 6,000 credits, though the exact number will depend on your data and prompt. You can use the “Sample Size” feature to help you keep costs low while iterating on prompts.
Job logs include information about the usage of the current run as well as your organization’s current usage level. Please reach out to support@civisanalytics.com if you are interested in increasing the limit.