The 4 SQL queries you need to debug Redshift performance

Posted by on August 25, 2022

Redshift is Amazon’s data warehouse product available on AWS. To help users understand the contents and performance of their Redshift cluster, users are provided with some tables and views, including:

This blog focuses on the Redshift-specific objects, and the queries that I often come back to when I need to look “under the hood” to try and understand and improve performance issues.

Understanding the Redshift-specific system objects

All Redshift system objects have prefixes on their name which follow a naming convention:

  • First, they have S for “system object”
  • Then they have either T for “table”, or V for “view”, depending on the object type
  • Finally, they have either L_ for “logs”, or V_ for, erm, “snapshots” (I haven’t been able to find anything which explains why this isV_ and not S_!)

That gives you object names like STV_INFLIGHT which is a system table with a (v)snapshot of the current queries running on the cluster. Simple! Now we understand that, let’s look at the SQL itself.

Queries

This SQL snippet tells you what’s actively running on the cluster (you need to be a superuser to see all queries running):

SELECT
  userid,
  query
  pid
  starttime
  text
FROM stv_inflight;

The text field will only store the first 200 characters of the queries running, but the real benefit here is that you can identify long-running queries. If you have a query you need to kill for any reason, you note the pid from the results, and run pg_terminate_backend(pid) to stop it running.

If you need the full text of the query, you can make a note of the query from here, and plug it into this SQL:

SELECT 
  q.query,
  q.starttime, 
  t.text,
  t.sequence 
FROM stl_query q
INNER JOIN stl_querytext t ON q.query = t.query
WHERE q.query = 1234
ORDER BY sequence;

Locks

If you are seeing an unusually long-running operation, it may be in a deadlocked transaction, which is visible in the STV_LOCKS table. Again, you can use the lock_owner_pid of the relevant sessions along with pg_terminate_backend(pid) to kill the offending session (start with the oldest lock first):

SELECT 
  table_id, 
  last_update, 
  last_commit, 
  lock_owner_pid, 
  lock_status 
FROM stv_locks
ORDER BY last_update;

Query/Table Performance

If you are digging around the system tables, you’re probably unhappy with the performance of something. There’s a series of objects which are aimed at providing useful information.

SVV_TABLE_INFO

SVV_TABLE_INFO provides information about state of your tables, which can used to troubleshoot poor-performance. I use this view as a guide for when to VACUUM a table. I tend to just select all columns from the view:

SELECT *
FROM svv_table_info
ORDER BY database, schema, table;

Here are some of the columns in the results I look at to see if there are any “low hanging fruit” with respect to improving performance:

  • empty shows how many blocks are waiting to be freed by a vacuum.
  • unsorted shows the percent of the table that is unsorted. The cluster will need to scan this entire section for every query. If this creeps up too high you need to vacuum to re-sort the table.
  • skey_sortkey1 shows the ratio of the size of the first column of the sortkey to the size of the largest non-sortkey column. You can use this value to evaluate the effectiveness of the sortkey.
  • skew_rows shows the ratio of rows from most on a slice to least on a slice. Use it to evaluate distkey.
  • pct_used shows the amount of disk that the one table occupies. Use it to review if a table, or portion of it, could be stored in S3 or elsewhere. This might be the case for less frequently accessed data, or if you have “backup” tables.

STL_ALERT_EVENT_LOG

The STL_ALERT_EVENT_LOG table is important for optimising queries. When the cluster executes a query, it will record issues found by the query planner into STL_ALERT_EVENT_LOG, along with suggested fixes.

Some problems can be fixed by running ANALYZE or VACUUM, while others might require rewriting the query or how you store your data. Amazon themselves can advise you there, with their table usage notes.

Again, I tend to just select all the columns from the table — the solution column will tell you in plain language what will improve performance:

SELECT *
FROM STL_ALERT_EVENT_LOG;

I hope you find these queries useful. Feel free to connect with me on LinkedIn, and leave your comments and suggestions below.

Leave a reply

Your email address will not be published. Required fields are marked *