Overview
This document describes how to initiate a self-service delivery of Civis Data onto a user-specified cluster. Self-service delivery is intended to give users more control over the timing and cadence of their data refreshes.
Recommendations
Keep the most recent quarterly release of Civis Data
Civis Data Match, Identity Resolution, and the PII Appends Tool all use Civis Data internally. Currently, these tools update with the quarterly releases of Civis Data, which are the releases built in March, June, September, and December. This means that a more recent monthly release of Civis Data may contain records that are not available in those tools. We recommend keeping the most recent quarterly release of Civis Data on your cluster for analyses where Civis Data Match or PII Appends coverage is important.
Drop unused copies of Civis Data before initiating a “pull” delivery
Keeping too many copies of Civis’ data products on your cluster at once can fill up your cluster. We recommend keeping only the most recent quarterly release and/or the most recent monthly release.
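The retention rule above can be sketched in a few lines of Python. This is only an illustration, not part of any Civis tool: it assumes release labels follow the `YYYY_MM` pattern seen in the List Basefiles output (for example “2021_06”), and the function names and the sample list of releases are hypothetical.

```python
# Quarterly releases are those built in March, June, September, and December.
QUARTERLY_MONTHS = {3, 6, 9, 12}


def parse_release(label: str) -> tuple[int, int]:
    """Split a release label such as '2021_06' into (year, month)."""
    year, month = label.split("_")
    return int(year), int(month)


def releases_to_keep(labels: list[str]) -> set[str]:
    """Return the most recent release overall plus the most recent quarterly release."""
    ordered = sorted(labels, key=parse_release)
    keep = set()
    if ordered:
        keep.add(ordered[-1])  # most recent (monthly) release
    quarterly = [r for r in ordered if parse_release(r)[1] in QUARTERLY_MONTHS]
    if quarterly:
        keep.add(quarterly[-1])  # most recent quarterly release
    return keep


# Hypothetical set of releases currently stored on a cluster.
on_cluster = ["2021_03", "2021_05", "2021_06", "2021_07", "2021_08"]
keep = releases_to_keep(on_cluster)
candidates_to_drop = sorted(set(on_cluster) - keep)
print(sorted(keep))
print(candidates_to_drop)
```

Releases that land in `candidates_to_drop` are only candidates; confirm nothing still depends on them before dropping the tables.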
How to import a Basefile
Step 1: Check which data products you have permission to pull
Basefile Pull (Step 1 of 3): List Basefiles
Open this template and click “New Custom Script” (https://platform.civisanalytics.com/spa/#/templates/scripts/72954).
You can also access this script in Platform by going to:
- TOOLS (top menu bar)
- ENHANCEMENTS
- BASEFILE DELIVERY TOOLS
Once you have successfully run this script, note the History → Output from this job. You will use these values as the inputs for Basefile Pull (Step 2 of 3): Deliver Basefile.
Example
In this demonstration, the user runs the script and checks the output. Notice that the user has access to the March 2021 (“2021_03”) and June 2021 (“2021_06”) releases of the `basic_commercial_client` file.
Next Steps: Go to Step 2: Deliver Basefile. Open this template (https://platform.civisanalytics.com/spa/#/templates/scripts/71924) and click “New Custom Script”.
NOTE: If you run this script and believe that the outputs inaccurately reflect your Civis Data contract, please email support@civisanalytics.com
Step 2: Deliver Basefile to your cluster
Basefile Pull (Step 2 of 3): Deliver Basefile (https://platform.civisanalytics.com/spa/#/templates/scripts/71924)
This script will copy a specified version of the basefile onto your Redshift cluster. You can only copy one basefile per job run. Be sure to verify you are authorized to access the basefile before running this script. A list of basefiles your organization is authorized to access is provided in the History → Output of Basefile Pull (Step 1 of 3): List Basefiles (https://platform.civisanalytics.com/spa/#/templates/scripts/72954).
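As a minimal sketch of the authorization check described above, you could write down the type and release pairs from the Step 1 output and compare your intended inputs against them before starting the delivery. The structure below is a hypothetical hand transcription of that output, not a format the script itself produces, and `is_authorized` is an illustrative helper.

```python
# Hypothetical transcription of the Step 1 History -> Output:
# one (Basefile Type, Basefile Release) pair per authorized basefile.
authorized = {
    ("basic_commercial_client", "2021_03"),
    ("basic_commercial_client", "2021_06"),
}


def is_authorized(basefile_type: str, release: str, authorized_pairs: set) -> bool:
    """Return True only if the exact type/release pair was listed in Step 1."""
    return (basefile_type.strip(), release.strip()) in authorized_pairs


print(is_authorized("basic_commercial_client", "2021_06", authorized))
print(is_authorized("basic_commercial_client", "2021_09", authorized))
```

Matching on the exact strings mirrors the advice to enter parameters exactly as they appear in the List Basefiles output.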
To pull the basefile to your cluster:
- Create your script using this link (https://platform.civisanalytics.com/spa/#/templates/scripts/71924) OR
- Within Platform, navigate to Code → Scripts → More Script Templates. Search for “Basefile Pull (Step 2 of 3): Deliver Basefile” or “71924”.
If you are not shared on this template, please email support@civisanalytics.com.
A new table with the name `{Schema Name}.{Basefile Type}_{Basefile Release Date}` will be available to query on your database. If you would like to create a view that references this new table, you can use the [Update Basefile View](https://platform.civisanalytics.com/spa/#/templates/scripts/71922) utility script.
NOTE: The Basefile is extremely large. Transferring it onto your Redshift cluster can take several hours and impact cluster performance during the delivery. We recommend that you schedule a time to run this script when it will not interfere with other work on your cluster. For best results, do not run this script concurrently with other "Deliver Basefile" scripts.
Example
In this demonstration, the user copies the June 2021 version of the `basic_commercial_client` Basefile to a Redshift table in the “ts” schema on their cluster. After the import completes, a table called `ts.basic_commercial_client_2021_06` is available to query on the “Acme Corporation” database.
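To make the naming convention concrete, the short Python sketch below builds the expected table name from the three Step 2 inputs. The helper name is hypothetical, and the values are the ones used in the demonstration above.

```python
def basefile_table_name(schema: str, basefile_type: str, release: str) -> str:
    """Build {Schema Name}.{Basefile Type}_{Basefile Release Date}."""
    return f"{schema}.{basefile_type}_{release}"


table = basefile_table_name("ts", "basic_commercial_client", "2021_06")
print(table)  # ts.basic_commercial_client_2021_06

# A simple query you might run after delivery to confirm the table is populated.
print(f"SELECT COUNT(*) FROM {table};")
```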
Next Steps: Go to Step 3 (Optional): Update Basefile View (https://platform.civisanalytics.com/spa/#/templates/scripts/71922)
Step 3 (Optional): Update View to point to most recent Basefile
Basefile Pull (Step 3 of 3): Update View (https://platform.civisanalytics.com/spa/#/templates/scripts/71922)
This final step for the pull delivery will create (or update) a view that references a Basefile table on your Redshift cluster. The input parameters for this step should mirror Step 2: Deliver Basefile. This step is optional but recommended to ensure all endpoints reflect the newest data without needing to update the view name in all your code for each basefile update.
NOTE: This script will not succeed if you wish to update an existing view that has other view dependencies. Drop all dependent views before running.
To access this script in Platform:
- Create your script using this link (https://platform.civisanalytics.com/spa/#/templates/scripts/71922) OR
- Within Platform, navigate to Code → Scripts → More Script Templates. Search for “Basefile Pull (Step 3 of 3): Update View” or “71922”.
A new (or updated) view will be available to query on the database you specified. By default, views are named `{Schema Name}.{Basefile Type}` unless you entered a different name into the "View Name" field. All views are displayed in italics in the side panel.
Example
In this demonstration, the user creates a view called `ts.basic_commercial_client` that references the table `ts.basic_commercial_client_2021_06`. A view is always created in the same schema as the table it references—in this case, “ts”. Notice that the user did not enter a name into the “View Name” field and leveraged the default naming convention of the script. If the user wanted to override this default behavior and create a view called `ts.my_view`, the user must supply the value of “my_view” into the “View Name” field.
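The default and overridden view names can be illustrated with a small Python sketch. The SQL it prints is an assumption about what an equivalent hand-written statement might look like on Redshift; this article does not show the SQL the Update View script actually runs, and the helper names are hypothetical.

```python
def view_name(schema: str, basefile_type: str, custom_name: str = "") -> str:
    """Default to {Schema Name}.{Basefile Type} unless a View Name is supplied."""
    name = custom_name.strip() or basefile_type
    return f"{schema}.{name}"


def create_view_sql(schema: str, basefile_type: str, release: str,
                    custom_name: str = "") -> str:
    """Illustrative statement pointing a view at one delivered Basefile table."""
    table = f"{schema}.{basefile_type}_{release}"
    view = view_name(schema, basefile_type, custom_name)
    return f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {table};"


# Default naming: "View Name" left blank.
print(create_view_sql("ts", "basic_commercial_client", "2021_06"))
# Overridden naming: "View Name" set to my_view.
print(create_view_sql("ts", "basic_commercial_client", "2021_06", "my_view"))
```

Because queries reference the view rather than the dated table, re-pointing the view at a newer release leaves downstream code unchanged.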
Next Steps: Have fun querying the Basefile!
Frequently Asked Questions
- Can I run multiple “Deliver Basefile” scripts concurrently?
- We recommend that you do not run multiple “Deliver Basefile” scripts concurrently, as these tables are extremely large. Transferring them onto your Redshift cluster can take several hours and can impact cluster performance during the delivery.
- What can I expect after the process is complete?
- Each script has distinct outcomes necessary to give you access to the basefile of your choice:
- Script 1 (List Basefiles) will list, in the run output, all “Basefile Releases” and “Basefile Types” that your organization is currently authorized to copy onto your cluster.
- Script 2 (Deliver Basefile) will copy a specified type and version of the Basefile onto your Redshift cluster for use.
- Script 3 (Update View) will create (or update) a view that references a Basefile table on your Redshift cluster.
- Why is the most recent version of basefile not listed in the output of Step 1?
- There may be two reasons a basefile is not available to you:
- You are not authorized to pull this version or type of basefile. Client Support can verify your eligibility.
- The basefile version has not yet been released, or it is an older version that is no longer available via the Pull Delivery (only the 3 latest versions are available for self-service).
- Why is the “Deliver Basefile” script failing?
- The logs should provide relevant error messages. There may be various reasons for your script to fail:
- You are not eligible to pull a certain version or type of the basefile. Verify that your desired basefile is listed in the output of List Basefiles (Step 1).
- An incorrect parameter was entered into a field. Verify that all parameters are entered exactly as they appear in the output of List Basefiles (Step 1) and that your destination database and schema are valid.
- You do not have permission to pull a basefile for your organization. Contact Client Success and verify that your user ID is authorized to pull a basefile.
- If this did not resolve your issue, please open a support ticket with Client Success for resolution.
- Where will the new basefile table copy to? How do I identify the new table? What happens to the old table?
- Your Basefile table will copy to the database and schema you designated in Script 2 (Database Name and Schema Name). Your table will be named using the convention `{Basefile Type}_{Basefile Release Date}`.
- Your older basefile will remain in the schema until you drop the table. Keeping too many copies of Civis’ data products on your cluster at once can fill up your cluster’s space allotment. We recommend keeping only the most recent release. Click HERE for instructions on how.
- Does the Step 3 “Update Canonical View” need to be run each time a new basefile is pulled?
- This script is not required but is recommended as a best practice so code that references the basefile points towards a view, rather than the table. Updating the view is a non-breaking code change that still allows your team to utilize the most recent file. This script will need to be run each time a basefile is released to update all endpoints. The script should be run for each basefile type individually.
- Do I need to update my code after getting the latest basefile?
- If you choose not to update the canonical view, you will need to individually update each portion of your code that points to an older version of the basefile. Using Canonical Views (Optional) will update all endpoints to reflect the newest data without needing to update the view name in all your code for each basefile update.
- Does the new file override the old file, or does it rename the old file and keep it somewhere else?
- No, the old file will not be overwritten. The old file will reside in your cluster unless dropped. Any references to the basefile should be updated to point to the newest basefile, either manually or by running Canonical Views (Optional).
Best Practices
- Keep your canonical view name as the default by leaving the View Name field blank. Enter a View Name only when you need the view to be named something other than the value of “Basefile Type”.
- Drop older copies of the Basefile. The Basefile is a large dataset. Keeping too many copies on your cluster can consume an excessive amount of storage space. We recommend keeping only the most recent release of the Basefile and dropping older releases if they are not in use.
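As a rough illustration of the cleanup step, the Python sketch below generates `DROP TABLE` statements for every release except the ones you choose to keep. It assumes the tables follow the `{Schema Name}.{Basefile Type}_{Basefile Release Date}` convention and that release labels are zero-padded `YYYY_MM` strings; the helper name and the list of releases are hypothetical.

```python
def drop_statements(schema: str, basefile_type: str,
                    releases_on_cluster: list[str], keep: set[str]) -> list[str]:
    """Build one DROP TABLE statement per release that is not being kept."""
    return [
        f"DROP TABLE IF EXISTS {schema}.{basefile_type}_{release};"
        for release in sorted(releases_on_cluster)
        if release not in keep
    ]


# Hypothetical releases present on the cluster.
releases = ["2021_03", "2021_06", "2021_07"]
# Zero-padded YYYY_MM labels sort chronologically, so max() is the newest release.
newest = max(releases)

for statement in drop_statements("ts", "basic_commercial_client", releases, {newest}):
    print(statement)
```

Review the generated statements before running them, and check that no view still references a table you plan to drop.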