Overview
The purpose of the Database Status Overview is to give users visibility into usage and performance details of their Civis-managed Redshift database. The page should help users answer questions like:
- Why are queries running slowly?
- How much disk space is available on my database?
- Has my database experienced downtime recently?
Where to Find It
- Click “Admin” on the Top Navigation Bar
- Select "Platform Usage Overview"
- Select the “Database” tab on the left
- View your database metrics
Switching Databases
If you have multiple databases, you can choose which one to view metrics for via the top navigation database dropdown. Select the name of the database to load metrics for it.
The Graphs
The database usage overview page contains a number of graphs, each representing a specific database metric over time. See below for guidance on how to interpret and troubleshoot the metrics in each graph.
CPU Usage
Shows CPU usage by node. As nodes near 100% CPU usage, queries will run slower than usual.
If compute nodes are showing high CPU usage, there may be some non-performant queries on your database. These queries may not show up in the UI, as they could be the result of a script or other job. You can use this code to check for long-running and expensive queries:
SELECT * FROM stv_recents WHERE status='Running' ORDER BY starttime ASC;
To cancel a query, run:
CANCEL <PID>;
If tables are well-distributed, then you should expect the compute-x lines to approximately match up, meaning each node of your cluster is doing a similar amount of work. If these lines are considerably different from each other, that is an indicator that all of the data involved in a query is on a single node, and you should consider a different distribution key. To identify queries involved with poorly-distributed (skewed) data, you can use this query:
SELECT * FROM svl_query_metrics_summary
WHERE cpu_skew is not null
AND query_cpu_time > 100 --filter out short queries
ORDER BY cpu_skew desc;
Disk Usage
Shows disk space used by each node. Please note that the available disk space displayed here may be higher than your contractual limit. If you have questions about your contractual disk usage limit please reach out to Support@CivisAnalytics.com.
When disk space usage increases (especially when accompanied by high CPU and queuing), this indicates that there may be a query running that is spilling temporary results out of memory and onto disk. When this occurs, you should refactor problem queries. You can use this code to check for which queries are using the most temp space:
SELECT * FROM svl_query_metrics_summary
WHERE query_temp_blocks_to_disk is not null
ORDER BY query_temp_blocks_to_disk desc;
Queue Length
Indicates how many queries are currently queued.
When there are large spikes in queue length, queries will take longer to start. You can use this code to check for long-running queries:
SELECT * FROM stv_recents WHERE status='Running' ORDER BY starttime ASC;
To cancel a query, run:
CANCEL <PID>;
Query duration by phase
Indicates average amount of time (in milliseconds) queries spend in each phase at a given time. This gives you an idea of what types of queries (copy, unload, ctas, etc.) are taking up the most time on your cluster.
Excessive time in the “query planning” phase may indicate table locks. Use the below query to investigate table locks:
SELECT * FROM admin.v_check_transaction_locks
Status
When Status = 1, this indicates the database is active. When Status = 0, this indicates that the database is down.
Maintenance
When Maintenance mode = 1, this indicates that there is maintenance occurring on the database. When Maintenance mode = 0, this indicates that no maintenance is occurring.
Comments
0 comments
Article is closed for comments.