Civis Data Match

Civis Data can be used to enrich your organization’s data with additional information about people in your file for purposes like crosstabs, modeling, or activation.

The first step to utilizing Civis Data is to match your organization's data to a match target using the Civis Data Match job. A match target is a pre-prepared file of person level data. We generate match targets for extremely large data sets (100 million+ records) that are going to be matched against frequently (more than 5 times a month). The most common ones are the Civis Data files.

This job will output the IDs of your input data and the IDs of the match target data that are likely the same person, along with a score representing the chance that the matched records represent the same person.

Getting Started

To create a new Civis Data Match job:

Go to Tools > Enhancements
Select Civis Data Match.

Setting up the Job

Input Table: The location of the people data you want to match. Currently, only Redshift is supported.
Match Target: Select the match target that you want to use.
Output Table: Type in a destination for the results of the Civis Data Match job. This is where the job will put a table with three columns (Note: Only Redshift is supported):
- Source_id: id of a record in your table
- Matched_id: id of a record in the match target
- Score: likelihood score between 0 and 1 that the two records are the same person
- NOTE: you must write to a new table. If the destination table already exists, your job will fail.
Maximum Matches to Return: In the case that there are multiple match target records that could be the same as your record, specify how many you’d like to be included in the output table. Possible values are 1 to 10, or 0 if you’d like all matches to be returned. The job will always return the matches with the highest scores.
Match Score Threshold (Advanced): Specify the minimum score at which matches should be returned. For example, entering 0.8 will only return matches that have a score of at least 0.8. The default value is 0, which will return all matches. We recommend using this default value and filtering out low-quality matches in a later step (e.g., a SQL join with "WHERE score > 0.8"). In that later step, it's probably best to use a score cutoff above 0.5, such as 0.8, since false positives (i.e., including incorrect matches) are probably worse than false negatives (i.e., not including correct matches).
Mapping Columns for Input Table: Specify what data to use for the match. You will be given a column mapper for doing this.

Screen_Shot_2020-03-24_at_4.03.19_PM.png

Mapping a column for primary key is required.

Below are the remaining columns you can match. These columns are optional, though strongly recommended as more information will help improve the quality of the match:

First Name
Last Name
Address (or separate columns for House Number, Street, and Unit Number)
City
State Code (or State)
Zip Code
Email Address
Phone Number
Birthdate (or separate columns for Birth Month, Day, and Year)
Gender
Middle Name

Formatting your columns

Please review our documentation on formatting columns for matching enhancements.

Using the Python API Client

A Civis Data Match job can also be created programmatically via a Civis Platform API client. Here is a sample code snippet to do so with the Python client:

import civis

client = civis.APIClient()

# Create the Civis Data Match job.
job = client.enhancements.post_civis_data_match(
    name="your job name",

    max_matches=1,
    threshold=0.8,
    input_field_mapping={
        # The keys in this dict are the columns recognized by Civis
        # Data Match, whereas the values are what you have to provide
        # as the column names from your input table.

        # The "primary_key" key is required, while all other keys are
        # optional depending on what columns of your input table you'd
        # like to use.
        "primary_key": "your primary key column",

        "first_name": "your first name column",
        "middle_name": "your middle name column",
        "last_name": "your last name column",
        "name_suffix": "your name suffix column",

        # Provide either "full_address" or any combination of
        # {"house_number" + "street" + "unit"}, but not both.
        "full_address": "your full address column",

        # For other supported fields, please see here:
        # https://api.civisanalytics.com/enhancements/field-mapping
     },

    # 116 for "Civis Data (quarterly commercial)"
    # 117 for "Civis Data (quarterly voterfile)"
    # 217 for “Civis Data (monthly commercial)”
    # 219 for “Civis Data (monthly voterfile)”
    match_target_id=116,

    input_table={
        "database_name": "your redshift database for input table",
        "schema": "your schema",
        "table": "your table",
    },
    output_table={
        "database_name": "your redshift database for output table",
        "schema": "your schema",
        "table": "your table",
    },
)

# Run the job.
future = civis.utils.run_job(job.id)

# It is recommended that you keep track of the job ID and run ID
# in some way. In this code snippet, we're just printing them.
print(f"Civis Data Match job ID {future.job_id} run ID {future.run_id}")

# Code execution does not block here after run_job().
# In other words, any subsequent code
# runs at the same time as your Civis Data Match job is running
# on Civis Platform. If you need to block here (e.g., if you need
# the result from Civis Data Match for downstream code to proceed),
# then you can call `.result()` to wait for the result.
future.result()

Input Field Mapping Examples

Below are some examples of different field mapping values, for use in Civis Data Match API calls.

The `primary_key` field is the only required value, though the first and last name fields are strongly recommended. You can use any combination of the optional values that you would like, but note that some fields can be used in place of each other such as `full_address` can be used in place of the `house_number`/`street`/`unit` fields and `birth_date` can be used in place of `birth_year`/`birth_month`/`birth_day` fields in order to better match your input data’s format.

1. Providing multiple phone number columns (up to four):

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",

    # first and last name is strongly recommended
    "first_name": "first_name",
    "last_name": "last_name",

    "phone": ["home_phone", "mobile_phone", "office_phone"]
}

2. Providing a single phone number column:

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",

    # first and last name is strongly recommended
    "first_name": "first_name",
    "last_name": "last_name",

    "phone": "home_phone"
}

3. Using the full_address field:

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",
    # first and last name is strongly recommended
    "first_name": "first_name",
    "last_name": "last_name",

    "full_address": "address",
    "city": "address_city",
    "state": "address_state",
    "state_code": "state_number",
    "zip": "zip_code"
}

4. Using a combination of house_number/street/unit address fields:

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",
    # first and last name is strongly recommended
    "first_name": "first_name",
    "last_name": "last_name",

    "house_number": "address",
    "street": "address_street",
    "unit": "address_apartment",
    "city": "address_city",
    "state": "address_state",
    "state_code": "state_number",
    "zip": "zip_code"
}

5. Using the birth_date field:

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",

    # first and last name is strongly recommended
    "first_name": "first_name",
    "last_name": "last_name",

    "birth_date": "dob"
}

6. Using the birth_year/birth_month/birth_day fields:

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",
    # first and last name is strongly recommended
    "first_name": "first_name",
    "last_name": "last_name",

    "birth_year": "year",
    "birth_month": "month",
    "birth_day": "day"
}

7. Using all optional fields:

input_field_mapping={
    # primary_key is always required
    "primary_key": "id",

    "first_name": "first_name",
    "middle_name": "middle_name",
    "last_name": "last_name",
    "name_suffix": "suffix",
    "gender": "sex",
    "phone": ["home_phone", "mobile_phone"],
    "email": "email_address",
    "birth_date": "dob",

    "full_address": "address",
    "city": "address_city",
    "state": "address_state",
    "state_code": "state_number",
    "zip": "zip_code",

    "lat": "latitude",
    "lon": "longitude"
}

Articles in this section

Getting Started

Setting up the Job

Formatting your columns

Using the Python API Client

Input Field Mapping Examples

Comments

Articles in this section

Getting Started

Setting up the Job

Formatting your columns

Using the Python API Client

Input Field Mapping Examples

Related articles