Overview
When working with large amounts of data, you often need to clean up string data or reconcile it into specific categories. You can now do this in Platform with Data Crosswalk, a streamlined categorization tool that offers AI-powered Auto-Suggestions.
Data Crosswalk lets you quickly categorize and resolve non-standard data, and it assists with the deduplication process.
A Crosswalk is a mapping between data values. Here, it maps unstandardized values to standardized ones.
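To make the idea concrete, here is a minimal sketch that represents a crosswalk as a plain Python dict. The state-name values are made up for illustration, and Data Crosswalk itself stores the mapping as a database table, not a dict. Once several raw spellings map to one standardized value, collapsing to the distinct standardized values shows how a crosswalk helps with deduplication.

```python
# Illustrative crosswalk: unstandardized value -> standardized value.
# (Hypothetical data; Data Crosswalk stores this mapping as a table.)
crosswalk = {
    "NY": "New York",
    "N.Y.": "New York",
    "new york": "New York",
    "Calif.": "California",
    "CA": "California",
}

raw_values = ["NY", "CA", "N.Y.", "new york", "Calif."]

# Look up each raw value; anything not in the crosswalk becomes None.
standardized = [crosswalk.get(v) for v in raw_values]
print(standardized)
# ['New York', 'California', 'New York', 'New York', 'California']

# Five raw spellings collapse to two distinct standardized values.
print(sorted(set(standardized)))
# ['California', 'New York']
```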
Getting Started
Click “Data Crosswalk” in the top navigation menu under “Tools”.
Select Data to Categorize
The first step is choosing the data to categorize by selecting a schema, table, and column. Data Crosswalk then pulls the unique values from that column for you to categorize.
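Conceptually, pulling the unique values resembles a `SELECT DISTINCT` on the chosen column. The sketch below shows the same idea over in-memory rows. The row layout and the `unique_values` helper are hypothetical, and skipping nulls is an assumption, since the article doesn't say how Platform handles empty values.

```python
def unique_values(rows, column):
    """Return the distinct non-null values of `column`, in first-seen order.

    Hypothetical helper; skipping None (null) values is an assumption.
    """
    seen = {}  # dicts preserve insertion order, so this keeps first-seen order
    for row in rows:
        value = row.get(column)
        if value is not None:
            seen.setdefault(value, None)
    return list(seen)


rows = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "N.Y."},
    {"id": 3, "state": "NY"},
    {"id": 4, "state": None},
]
print(unique_values(rows, "state"))  # ['NY', 'N.Y.']
```

Each distinct value only needs to be categorized once, no matter how many rows contain it.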
Upload Your Categories (Optional)
Data Crosswalk gives you the option to upload pre-existing categories to the tool to speed up categorization. To do so, select the schema, table, and column that contain your existing category values.
Select Your Categories
Map each unique value to the correct category: either type in a new category or select a pre-existing one from the dropdown. Once you’ve added a few categories, Civis AI will automatically suggest the closest category for each value.
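The article doesn't describe how Civis AI picks its suggestions. As a rough stand-in, the sketch below suggests the closest existing category using plain string similarity from Python's standard-library `difflib`. This is a swapped-in technique for illustration only, and it isn't how Civis AI works. The `suggest_category` helper and the `cutoff` value are hypothetical.

```python
import difflib


def suggest_category(value, categories, cutoff=0.6):
    """Suggest the closest existing category for `value`, or None.

    Stand-in using difflib string similarity; NOT Civis AI's method.
    """
    # Compare case-insensitively, but return the category's original spelling.
    by_lower = {c.lower(): c for c in categories}
    matches = difflib.get_close_matches(value.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


categories = ["New York", "California", "Texas"]
for raw in ["new york city", "Calif", "Tex.", "zzz"]:
    print(raw, "->", suggest_category(raw, categories))
```

Character-level similarity catches abbreviations and casing differences, but not semantic matches such as "Big Apple" for "New York". That gap is one reason an AI-based suggester can do better.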
Select Crosswalk Destination
After categorizing your data, enter a destination schema and table name for your new crosswalk. Then choose what the tool should do if that table already exists: fail, append to the existing table, or drop and recreate the table.
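The three existing-table options behave like the familiar "if exists" policies in many data tools. This minimal sketch uses a dict of tables as a stand-in for a database. The `write_crosswalk` function and the option strings `"fail"`, `"append"`, and `"drop"` are hypothetical names that only mirror the choices described above.

```python
def write_crosswalk(tables, name, rows, if_exists="fail"):
    """Write `rows` to tables[name] according to an if-exists policy.

    `tables` is a dict standing in for a database; option names are hypothetical.
    """
    if if_exists not in ("fail", "append", "drop"):
        raise ValueError(f"unknown if_exists option: {if_exists!r}")
    if name in tables:
        if if_exists == "fail":
            # Leave the existing table untouched and report an error.
            raise RuntimeError(f"table {name!r} already exists")
        if if_exists == "append":
            # Keep the existing rows and add the new ones after them.
            tables[name].extend(rows)
            return tables[name]
    # The table is new, or "drop" was chosen: (re)create it with only the new rows.
    tables[name] = list(rows)
    return tables[name]


db = {}
write_crosswalk(db, "scratch.state_crosswalk", [("NY", "New York")])
write_crosswalk(db, "scratch.state_crosswalk", [("CA", "California")], if_exists="append")
print(db["scratch.state_crosswalk"])  # [('NY', 'New York'), ('CA', 'California')]
```

Appending can leave the same raw value in the table more than once. That matters when you later join the crosswalk back to your data.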
Once you click “Create Crosswalk Table”, Platform creates the crosswalk and displays an example join statement that attaches your new categories to the rows of the original table. You can copy this code into the Query editor and run it as-is or modify it to suit your needs.
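The article doesn't reproduce the generated SQL, so here is a sketch of what such a join accomplishes, expressed in Python. Each original row is left-joined to the crosswalk on its raw value and gains a category field. Values without a mapping get `None`, much as a SQL `LEFT JOIN` yields NULL. The crosswalk column names `value` and `category` and the `join_categories` helper are hypothetical.

```python
def join_categories(rows, crosswalk_rows, column, category_column="category"):
    """Left-join each row to the crosswalk on `column`, adding `category_column`.

    Crosswalk rows are assumed (hypothetically) to have "value" and "category" keys.
    """
    lookup = {r["value"]: r["category"] for r in crosswalk_rows}
    # Build new dicts so the original rows are left unchanged.
    return [{**row, category_column: lookup.get(row.get(column))} for row in rows]


rows = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "Tex."},
    {"id": 3, "state": "??"},
]
crosswalk_rows = [
    {"value": "NY", "category": "New York"},
    {"value": "Tex.", "category": "Texas"},
]
for row in join_categories(rows, crosswalk_rows, "state"):
    print(row)
```

One difference from SQL: if the crosswalk contains the same raw value twice (for example, after an append), a SQL join returns duplicate rows for it, whereas this dict lookup silently keeps the last mapping.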