Civis caches the data exported by SQL Scripts. Caching can reduce the time required to run an export by returning results directly from the cache instead of the database. It is especially useful when repeatedly selecting large amounts of data from tables that don’t change frequently.
The cache works by maintaining a set of queries and the results they produce. Before executing a query, a script first checks the cache to see if the query has been run before. If it has, and the results are still valid, data is returned immediately without the query being executed. Civis can determine when cached results are no longer usable by tracking all updates made to your tables. Data remains in the cache until the table it was selected from is modified.
Caching only applies to certain types of queries. To be cacheable, the query executed by the SQL script must:
- Be a SELECT statement.
- Select data from a Redshift database. Queries against Postgres databases are not cached.
- Select data from a single table or view. Late-binding views are not cached.
- Be deterministic. (For example, it may not call functions like `random()` or `current_date`. Note that queries such as `SELECT TOP` or those with a `LIMIT` but not an `ORDER BY` are cached.)
In addition to the query being cacheable, the following criteria must be met if results are to be returned from the cache:
- The table must be fully scanned by the Civis Platform. You can accomplish this by going to the Table Details page and clicking "Update Table Details".
- A previous export using the exact same query was successfully executed on the same cluster.
- The source table has not been modified since the previous export.
- The user running the script has permission to read from the table.
- The CSV formatting options are the same as the previous export.
Please note that Civis Platform currently cannot detect changes to tables made as part of stored procedures or UDFs. Using these features of Redshift may result in incorrect or outdated results due to caching. To ensure consistent and up-to-date results, users can scan the table to mark it as dirty after modifying it but before querying it; this will ensure that Platform’s SQL scripts avoid referencing the cache. Additional alternative options will be coming soon!
Identifying Cache Usage
There are two ways to identify whether or not a given export used a cache entry: Log entries and the API response.
If a query was cacheable, and a cache entry was used, the run logs for the appropriate job will include a line describing when the cache entry was created. If a particular SQL script or CSV export created a new cache entry, that time will be logged as well.
The API can also be used to determine whether or not a cache entry was used in a given run as well. The response from the `GET /scripts/sql/{id}/runs/{run_id}` endpoint for the run details for a given script includes an `outputCachedOn` field. If the output used in the run was cached, this field contains a timestamp of when that cache entry was created; if this run created the cache entry, or no cache entry was used, it is `null`. The following snippet of Python demonstrates how a user could check whether or not a run of a SQL script used a cache entry:
>>> import civis; client = civis.APIClient()
>>> client.scripts.get_sql_runs(SQL_SCRIPT_ID, RUN_USING_CACHE_ENTRY).output_cached_on
'2020-06-23T14:28:31.000Z'
>>> client.scripts.get_sql_runs(SQL_SCRIPT_ID, RUN_USING_CACHE_ENTRY).output_cached_on is None
False
>>> client.scripts.get_sql_runs(SQL_SCRIPT_ID, RUN_NOT_USING_CACHE_ENTRY).output_cached_on is None
True
For CSV Exports, the response from the `GET /exports/files/csv/{id}/runs/{run_id}` endpoint for the run details for a given script includes an outputCachedOn field, with the same meaning as described above.
Disabling Export Caching
It is possible to disable export caching for all queries run from Platform against a database. Doing so will ensure that all results returned from SQL scripts are up to date; however, doing so may also increase the load on your database and increase how long it takes queries that would have been cached to run.
If you would like to disable caching on your database, please contact Client Success via the Help widget in the lower right corner of Platform.
Comments
0 comments
Please sign in to leave a comment.