This tutorial is an introduction to a basic workflow in Civis. In this tutorial, you will:
- unify your data via Imports
- write SQL queries to better understand your data
- use person matching to merge two different datasets about people.
Getting Started
Download the user_data_set.csv and the user_upgrade_signups.csv files attached to this article.
Import Data Files
In order to work with your data, you will first need to import it into Redshift, the Amazon database on which the platform is built. We will do this using the Platform.
- Go to Data > and select File from the popout menu.
- Click Upload File and select the user_data_set.csv file that you saved to your computer earlier.
- Type "public" into the SCHEMA.TABLE field. This will put your table in the "public" schema and name your table "user_table".
- If such a table already exists, you can name your table public.user_table_<your initials>. If you do so, make sure to use your specific table name in later steps.
-
Repeat this process for the user_upgrade_signups.csv file attached to this article, but with a more applicable table name, such as user_upgrade_signups.
- Note: You can see Scheduling & Triggers options by clicking the clock icon in the upper right corner of the Imports window.
- Here you could set up the import to run on a specific schedule, such as every Sunday at 6:05am.
- For now, do not schedule this import. Click the X to close the sidepane.
- Click Run Now near the upper right of the Imports window. The job will run and you will be taken back to the Imports index page.
- You can expect this to take a few minutes. You should receive an email alerting you if the import succeeded or failed.
- On the Index page, your import will have a blue spinning icon in the Status column while it is running. Once it completes, that will become a green checkmark icon.
Query the Data
Now that you have imported the data, it's time to enhance and explore it.
- First, navigate to the Query tab on the navigation pane.
- On the left-hand side of the page, explore the gray data pane that displays all of the schemas in your database.
- Click the down arrow next to the public schema. A dropdown list of all the tables in that schema will appear. You should see two tables that you just created: user_table and user_upgrade_signups.
- Select the user_table. Near the top left, click on view details. This opens a preview of the information in that table.
- Click on the Sample Rows tab to see a preview of some of the data in your table.
- Move your cursor to the Query option near the top of the screen. Write a query to display all the columns from the table user_table.
- You can use the following query text:
-
select * from public.user_table
- As you write this, notice the way the system gently helps you write your query by suggesting names and commands.
-
- To run the Query, click the blue arrow on the right-hand side of the screen, or press Ctrl+Enter on your keyboard (Apple users can press ⌘+Enter).
- By default you will only see the first 10 rows of your query results. To see more rows, click on the arrow in the Preview 10 rows dialog. Change the number of rows to be shown to 20 and run your query again.
- Update your query to only show you the first name, last name, city, and state of users who live in New York.
- You can use the following query text:
-
select firstname, lastname, city, state from public.user_table where state='NY'
-
- Write a new query to show the number of users who live in each state.
- You can use the following query text:
-
select state, count(id) from public.user_table group by state
-
Enhance
Unifying datasets that contain records of people can be difficult without a shared unique identifier, which you may or may not have. To help you merge your data about people, we've created Civis Data Match, which uses a proprietary matching algorithm that outputs the likelihood that a record from one dataset is the same as a record in another dataset.
- Go to Tools > Enhancements > then select Civis Data Match.
- You want to match this list of users who came to your event against your full database of users. To begin, enter the input table that you want to match. The table name should be in the format “public.user”. As you type, the field will expand to show suggested tables. Select one from the list.
- Once you've selected your input table, select a Civis Data asset to match against your data. Please note: the Civis Data asset is only available to organizations whose contract includes data. If this feature is not available for you, please skip this step.
-
Next you'll need to designate where you'll want the output table to go and what it should be named. For Schema Name, type in "public" and for Table Name type in "user_table_matched". This is where the job will put a file with three columns--
-
Source_id: id of a record in your table
-
Matched_id: id of Civis record corresponding to your record
-
Score: likelihood score between 0 and 1 that the two records are the same person
-
- Leave the Matching Score Threshold and Max Matches to Return as their default values.
- Select Map Columns in Input Table.
- Map "ID" as the primary key by dragging the box to the field and match the remaining columns.
- Click Run Now. This will match your data against the entire USA consumer file (~230 million individuals), please be sure to set aside some time for this matching process to happen.
- When the match has completed, it will create an output table that contains the matching scores.
See Match Results
- To see the results of your match, go to Query and run the following code.
-
select s.firstname, s.lastname, m.target_id as likely_user_id, score from public.user_table s join public.user_table_matched m on s.id=m.source_id where score is not null
- With this matched data, you could now take data you collected on people at the event and merge it with data you have about your users.
Comments
0 comments
Please sign in to leave a comment.