The Civis AI: Taxonomer tool uses generative AI to categorize text data stored in a database hosted on Civis Platform. You only need to tell it where your data is and which categories you want to label the data with.
Taxonomer jobs are scripts in Civis Platform, so they can be shared, automated, cloned, etc. They can also be configured and run in Workflows or via the API, as discussed below.
The Civis AI: Taxonomer script template acts as a bridge between Civis Platform and third-party generative AI models. The models currently available are described under “Selecting a Model” below.
Note that Civis provides another template, Civis AI: Core, for interacting more directly with a generative AI model. Because it lets you specify a custom prompt, you can also use it for text categorization, but Civis AI: Taxonomer focuses solely on text categorization and is designed to be easier to use for that purpose.
Usage
To use the script template, click the “Taxonomer” link in the “Civis AI” menu.
Input Data
After creating a Taxonomer job, you’ll need to specify what data it should process.
You’ll first need to select a database and credential to use. These may be automatically selected if there is only one option available.
You’ll then specify an input schema, a table in that schema, and the column in that table that contains the text data that you want to categorize.
You’ll also need to specify a column with unique ID values so that you can join the output back to your input table.
If you don’t have an ID column, you can create a table with one using a SQL statement like the following:
```sql
create table schema123.table_with_ids as
select
    row_number() over (order by response_text) as response_id,
    response_text
from schema123.table_without_ids;
```
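Because Taxonomer relies on the ID column to let you join results back to the input, it can be worth confirming that the IDs really are unique before running a job. The sketch below assumes you have already loaded the ID column into a Python list (how you fetch it is up to you); `find_duplicate_ids` is a hypothetical helper, not part of Taxonomer or the Civis API.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the ID values that occur more than once, in first-seen order.

    Hypothetical helper: Taxonomer itself does not provide this check.
    """
    counts = Counter(ids)
    # Counter keeps insertion order (Python 3.7+), so duplicates are
    # reported in the order they first appear.
    return [value for value, count in counts.items() if count > 1]


# Example IDs from a hypothetical input table.
response_ids = [1, 2, 3, 3, 4, 5, 5, 5]
duplicates = find_duplicate_ids(response_ids)
if duplicates:
    print(f"Duplicate IDs found: {duplicates}")
else:
    print("All IDs are unique.")
```

If any duplicates are reported, generating a fresh ID column with a `row_number()` query like the one above is one way to fix it.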
Prompt Configuration
You’ll also need to configure the AI model prompt by providing information about your task and how to categorize the data.
You’ll need to specify the “Text Description”, which provides some background information to the model. For example, this might be something like “A response to a survey question asking one’s opinion about funding for national parks” or “The text of an advertisement promoting the work of such-and-such organization.” You can also provide longer descriptions if you wish to provide more context about the task.
You’ll also need to specify categories by clicking the “Add New Category” button. Each category requires a label and a description. The category label is what the tool will provide in its output for each input text, and the category description provides information to the model about what the label means.
Optionally, you can provide examples of what the inputs might look like for each category. Examples aren’t necessary, but if you notice some mistakes in your output, adding examples may improve results by providing additional guidance to the model.
Here is an example showing categories for a hypothetical task of categorizing responses to a question about supporting funding for national parks.
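A category setup like this can also be written out as data, using the same label, description, and examples fields that the API example under “Usage via the API and Workflows” passes to Taxonomer. The sketch below uses hypothetical labels, descriptions, and examples for the national parks task, plus a hypothetical `check_categories` helper that catches missing labels or descriptions. Its duplicate-label check is an extra precaution, not a documented Taxonomer rule.

```python
# Hypothetical categories for responses to a survey question about
# funding for national parks. Each category has a label, a description,
# and an optional list of example texts.
categories = [
    {
        "label": "Supports funding",
        "description": "The respondent wants to maintain or increase funding for national parks.",
        "examples": ["Our parks need more rangers and better trails."],
    },
    {
        "label": "Opposes funding",
        "description": "The respondent wants to reduce or eliminate funding for national parks.",
        "examples": [],
    },
    {
        "label": "Unclear",
        "description": "The response does not clearly express support or opposition.",
        "examples": [],
    },
]


def check_categories(categories):
    """Raise ValueError if a category lacks a label or description,
    or if two categories share a label (an extra precaution)."""
    seen = set()
    for i, category in enumerate(categories):
        label = category.get("label", "").strip()
        description = category.get("description", "").strip()
        if not label:
            raise ValueError(f"Category {i} has no label")
        if not description:
            raise ValueError(f"Category {label!r} has no description")
        if label in seen:
            raise ValueError(f"Duplicate label: {label!r}")
        seen.add(label)


check_categories(categories)
print([c["label"] for c in categories])
```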
Previewing and Iteration
By default, Taxonomer will process a small sample of data and show you a preview so that you can check the results and iteratively refine your prompt configuration. As such, it does not require you to specify an output table.
Once you’ve specified your categories as described above, you can click the “Run” button to run the job. This will take you to a run details page that will show a preview of the results on a small sample of records from your input table. Once the run is finished, the preview results will show up with IDs, input texts, and the output labels produced by the AI model. The preview results usually take about a minute to appear if you haven’t specified an output table.
The Taxonomer run details page will also show you the settings used for the run, the logs, and usage information.
At the top of the run details page, there is a toggle that allows you to see the “Run Arguments” (i.e., the arguments specified for the run) or the “Current Arguments” (i.e., the job’s current configuration, regardless of which run you are looking at). The current arguments are editable, so you can adjust your prompt configuration as you inspect the preview results.
For example, you might notice that a particular text is miscategorized, and you could add it as an example for the appropriate category. You could then click the run button to create another run and see if adding the example improved the results.
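If you manage categories in code rather than in the UI, this refinement step amounts to appending the miscategorized text to the correct category’s examples list before updating the job. The sketch below is a minimal illustration using a hypothetical `add_example` helper and made-up category data; the dictionaries mirror the label/description/examples structure used in the API example later in this article.

```python
def add_example(categories, label, text):
    """Append `text` to the examples of the category with the given label.

    Hypothetical helper. Returns the updated list and raises KeyError
    if no category has that label.
    """
    for category in categories:
        if category["label"] == label:
            examples = category.setdefault("examples", [])
            if text not in examples:  # avoid adding the same example twice
                examples.append(text)
            return categories
    raise KeyError(f"No category labeled {label!r}")


categories = [
    {"label": "Support", "description": "Supports park funding.", "examples": []},
    {"label": "Oppose", "description": "Opposes park funding.", "examples": []},
]

# Suppose the preview labeled this text "Oppose", but it should be "Support".
text = "I'd happily pay a higher entrance fee to keep parks open."
add_example(categories, "Support", text)
add_example(categories, "Support", text)  # a second call adds nothing
print(categories[0]["examples"])
```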
Outputting Results
To process the entire input table, you can uncheck the box labeled “Only Process A Sample”. You must then also specify the names of an output schema and table to save the results to. Note that when processing the entire input table, the preview interface will only show up to 100 results.
You can also control what happens if the output table already exists with the “If Table Exists” setting. The options are “fail”, “append” (to add rows to the table), “drop” (to drop and recreate the table), and “truncate” (to remove all existing rows before adding the current run’s output). The default is “fail”.
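To make the four options concrete, the sketch below mimics them on an in-memory dictionary of tables. It only illustrates which rows remain after each option, using a hypothetical `write_output` helper and table name; Taxonomer performs the real operations in your database.

```python
def write_output(tables, name, rows, if_exists="fail"):
    """Mimic the "If Table Exists" options on an in-memory dict of tables.

    `tables` maps table names to lists of rows. Nothing here touches a real
    database; it only shows which rows remain after each option.
    """
    if name not in tables:
        # The table doesn't exist yet, so every option simply creates it.
        tables[name] = list(rows)
    elif if_exists == "fail":
        raise RuntimeError(f"Table {name!r} already exists")
    elif if_exists == "append":
        tables[name].extend(rows)  # keep existing rows, add the new ones
    elif if_exists == "drop":
        tables[name] = list(rows)  # discard the old table and recreate it
    elif if_exists == "truncate":
        tables[name].clear()  # keep the table but remove all of its rows
        tables[name].extend(rows)
    else:
        raise ValueError(f"Unknown option: {if_exists!r}")
    return tables


tables = {}
write_output(tables, "taxonomer.results", [(1, "Support")])
write_output(tables, "taxonomer.results", [(2, "Oppose")], if_exists="append")
print(tables["taxonomer.results"])  # [(1, 'Support'), (2, 'Oppose')]
write_output(tables, "taxonomer.results", [(3, "Unclear")], if_exists="truncate")
print(tables["taxonomer.results"])  # [(3, 'Unclear')]
```

Note that “drop” and “truncate” leave the same rows behind; the difference is whether the table itself is recreated.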
Selecting a Model
We recommend using the default model (currently Meta Llama 4 Maverick) since it is the lowest cost option in terms of credit usage but still seems to provide good results for tasks that we’ve tested it with. However, you could also try one of the other options to see if they produce better results. Anthropic Claude Sonnet 4 is currently the most costly option available. Claude 3.5 Haiku is cheaper than Sonnet 4 and only slightly more costly than Llama 4 Maverick. If your selected model becomes deprecated, Taxonomer will use a newer model in the same family.
Usage via the API and Workflows
Taxonomer runs as a custom script from a script template in Civis Platform, so it can be configured or executed via the Civis API or in Civis Workflows. For example, if you’ve configured a Taxonomer job, you can include it in a Workflow as follows:
```yaml
version: '2.0'
workflow:
  tasks:
    taxonomer_task:
      action: civis.run_job
      input:
        job_id: 12345  # Use your job ID here.
```
The Taxonomer job ID is at the end of the URL for the job page (e.g., 12345 in “https://platform.civisanalytics.com/spa/#/civis-ai/taxonomer/12345”).
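If you generate Workflow definitions from a script, you might extract the ID from a copied URL rather than retyping it. This is a minimal standard-library sketch with a hypothetical `taxonomer_job_id` helper; it assumes the URL ends with the numeric job ID, as in the example above.

```python
import re


def taxonomer_job_id(url):
    """Return the numeric job ID at the end of a Taxonomer job URL."""
    match = re.search(r"/(\d+)/?$", url)
    if match is None:
        raise ValueError(f"No job ID found at the end of {url!r}")
    return int(match.group(1))


url = "https://platform.civisanalytics.com/spa/#/civis-ai/taxonomer/12345"
print(taxonomer_job_id(url))  # 12345
```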
Since Taxonomer is a custom script, it can be accessed like any other custom script using the Civis API. Creating or updating a Taxonomer job through the Civis API can be useful for jobs that involve a large number of categories. The following example creates a new Taxonomer job from the template and then runs it:
```python
import civis
import yaml

client = civis.APIClient()

# Specify a database ID.
# This example uses "Civis Database".
DATABASE_ID = 326

# Specify categories. The categories are hard-coded here, but they can be
# derived from a table or generated programmatically.
categories = [
    "Health & Human Services",
    "Community & Economic Development",
    "Education, Arts & Humanities",
    "Environment & Agriculture",
]

# Create the category objects that Taxonomer expects. Descriptions are required.
categories_with_description = [
    {
        "label": category,
        "description": f"The category {category}",
        "examples": [],
    }
    for category in categories
]

# Arguments for the job.
# Comment out any values that you don't want to set.
arguments = {
    'INPUT_OUTPUT_DATABASE': {'database': DATABASE_ID, 'credential': client.default_database_credential_id},
    'INPUT_SCHEMA': 'taxonomer',
    'INPUT_TABLE': 'grants',
    'INPUT_TEXT_COL': 'opportunity_title',
    'INPUT_ID_COL': 'opportunity_id',
    'TEXT_DESCRIPTION': 'Some Task text',
    'CATEGORIES': yaml.dump(categories_with_description, indent=2, sort_keys=False),
}

# Create a new custom script from the Taxonomer template.
taxonomer = client.scripts.post_custom(from_template_id=271710, arguments=arguments)

# Print the URL and paste it into your browser to view the job in the UI.
print(f"https://platform.civisanalytics.com/spa/#/civis-ai/taxonomer/{taxonomer.id}")

# Optional: run the job.
future = civis.utils.run_job(taxonomer.id)

# Wait for the job to complete.
future.result()
```
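The comment in the example above notes that categories can be derived from a table. As a sketch, assuming the category table has been exported as CSV text with `label` and `description` columns (the column names and data here are hypothetical), the same list of dictionaries could be built with the standard library and then passed to `yaml.dump` exactly as above:

```python
import csv
import io

# Hypothetical CSV export of a table with one row per category.
CATEGORY_CSV = """label,description
Health & Human Services,Grants for health care or social services
Environment & Agriculture,Grants for conservation or farming
"""


def categories_from_csv(text):
    """Build category dicts from CSV text with 'label' and 'description' columns."""
    reader = csv.DictReader(io.StringIO(text))
    categories = []
    for row in reader:
        label = row["label"].strip()
        description = row["description"].strip()
        if not description:
            # Descriptions are required, so fall back to a generic one,
            # as the API example above does.
            description = f"The category {label}"
        categories.append({"label": label, "description": description, "examples": []})
    return categories


categories_with_description = categories_from_csv(CATEGORY_CSV)
print(len(categories_with_description))  # 2
```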
Limitations
There are some important limitations of this tool to consider.
- Usage is capped for your organization. Each month, you will have a specific number of credits available to use across Civis AI tools. As an example, using Taxonomer to categorize 1,000 short texts with the default model (Meta Llama 4 Maverick) might use about 25 credits, though the exact number will depend on your data and prompt configuration. The “Usage” tab in the run details page provides information about the usage of the current run as well as your organization’s current usage level. Please reach out to support@civisanalytics.com if you are interested in increasing the limit.
- AI models can occasionally generate inaccurate information. We advise caution when using AI tools, especially for high-stakes tasks, and recommend human review of output.
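The credit figure above supports a rough back-of-the-envelope estimate. This is a minimal sketch that assumes usage scales linearly with the number of texts, which is an assumption rather than a documented pricing rule; longer texts, more categories, or a different model will change the rate, so check the “Usage” tab for actual numbers.

```python
# Rough rate from this article: about 25 credits per 1,000 short texts with
# the default model. Actual usage depends on your data and prompt.
CREDITS_PER_1000_TEXTS = 25


def estimate_credits(num_texts, credits_per_1000=CREDITS_PER_1000_TEXTS):
    """Estimate credits used, assuming usage scales linearly with text count."""
    return num_texts * credits_per_1000 / 1000


def max_texts(credit_budget, credits_per_1000=CREDITS_PER_1000_TEXTS):
    """Estimate how many texts fit in a credit budget under the same assumption."""
    return int(credit_budget * 1000 // credits_per_1000)


print(estimate_credits(40_000))  # 1000.0
print(max_texts(500))  # 20000
```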