[ad_1]
Amazon Redshift is a completely managed, petabyte-scale information warehouse service within the cloud, offering as much as 5 occasions higher price-performance than every other cloud information warehouse, with efficiency innovation out of the field at no further price to you. Tens of 1000’s of shoppers use Amazon Redshift to course of exabytes of information each day to energy their analytics workloads.
On this publish, we focus on Amazon Redshift SYS monitoring views and the way they simplify the monitoring of your Amazon Redshift workloads and useful resource utilization.
Overview of SYS monitoring views
SYS monitoring views are system views in Amazon Redshift that can be utilized to watch question and workload useful resource utilization for provisioned clusters in addition to for serverless workgroups. They provide the next advantages:
- They’re categorized based mostly on useful alignment, contemplating question state, efficiency metrics, and question sorts
- We now have launched new efficiency metrics like
planning_time
,lock_wait_time
,remote_read_io
, andlocal_read_io
to help in efficiency troubleshooting - It improves the usability of monitoring views by logging the user-submitted question as an alternative of the Redshift optimizer-rewritten question
- It supplies extra troubleshooting metrics utilizing fewer views
- It permits unified Amazon Redshift monitoring by enabling you to make use of the identical question throughout provisioned clusters or serverless workgroups
Let’s have a look at a few of the options of SYS monitoring views and the way they can be utilized for monitoring.
Unify varied query-level monitoring metrics
The next desk exhibits how one can unify varied metrics and knowledge for a question from a number of system tables & views into one SYS monitoring view.
STL/SVL/STV | Info ingredient | SYS Monitoring View | View columns |
STL_QUERY | elapsed time, question label, consumer ID, transaction, session, label, stopped queries, database title | SYS_QUERY_HISTORY |
user_id query_id query_label transaction_id session_id database_name query_type standing result_cache_hit start_time end_time elapsed_time queue_time execution_time error_message returned_rows returned_bytes query_text redshift_version usage_limit compute_type compile_time planning_time lock_wait_time |
STL_WLM_QUERY | queue time, runtime | ||
SVL_QLOG | end result cache | ||
STL_ERROR | error code, error message | ||
STL_UTILITYTEXT | non-SELECT SQL | ||
STL_DDLTEXT | DDL statements | ||
SVL_STATEMENTEXT | all sorts of SQL statements | ||
STL_RETURN | return rows and bytes | ||
STL_USAGE_CONTROL | utilization restrict | ||
STV_WLM_QUERY_STATE | present state of WLM | ||
STV_RECENTS | current and in-flight queries | ||
STV_INFLIGHT | in-flight queries | ||
SVL_COMPILE | compilation |
For extra data on SYS to STL/SVL/STV mapping, seek advice from Migrating to SYS monitoring views.
Person query-level logging
To boost question efficiency, the Redshift question engine can rewrite user-submitted queries. The user-submitted question identifier is completely different than the rewritten question identifier. We seek advice from the user-submitted question because the dad or mum question and the rewritten question because the youngster question on this publish.
The next diagram illustrates logging on the dad or mum question stage and youngster question stage. The dad or mum question identifier is 1000, and the kid question identifiers are 1001, 1002, and 1003.
Question lifecycle timings
SYS_QUERY_HISTORY has an enhanced checklist of columns to supply granular time metrics referring to the completely different question lifecycle phases. Observe all occasions are recorded in microseconds. The next desk summarizes these metrics.
Time metrics | Description |
planning_time | The time the question spent previous to operating the question, which generally consists of question lifecycle phases like parse, analyze, planning and rewriting. |
lock_wait_time | The time the question spent on buying the locks on the required database objects referenced. |
queue_time | The time the question spent within the queue ready for assets to be out there to run. |
compile_time | The time the question spent compiling. |
execution_time | The time the question spent operating. Within the case of a SELECT question, this additionally consists of the return time. |
elapsed_time | The tip-to-end time of the question run. |
Resolution overview
We focus on the next situations to assist achieve familiarity with the SYS monitoring views:
- Workload and question lifecycle monitoring
- Knowledge ingestion monitoring
- Exterior question monitoring
- Gradual question efficiency troubleshooting
Stipulations
You need to have the next stipulations to observe together with the examples on this publish:
Moreover, obtain all of the SQL queries which might be referenced on this publish as Redshift Question Editor v2 SQL notebooks.
Workload and question lifecycle monitoring
On this part, we focus on monitor the workload and question lifecycle.
Establish in-flight queries
SYS_QUERY_HISTORY supplies a singular view to have a look at all of the in-flight queries in addition to historic runs. See the next instance question:
We get the next output.
Establish prime long-running queries
The next question helps retrieve the highest 100 queries which might be taking the longest to run. Analyzing (and, if possible, optimizing) these queries might help enhance general efficiency. These metrics are collected statistics throughout all runs of the question. Observe that on a regular basis values are in microseconds.
We get the next output.
Collect day by day counts of queries by question sorts, interval, and standing
The next question supplies perception into the distribution of several types of queries throughout completely different days and helps consider and monitor any modifications within the workload:
We get the next output.
Collect run particulars of an in-flight question
To find out the run-level particulars of a question that’s in-flight, you need to use the is_active = ‘t’
filter when querying the SYS_QUERY_DETAIL desk. See the next instance:
To view the most recent 100 COPY queries run, use the next code:
We get the next output.
Collect transaction-level particulars for commits and undo
SYS_TRANSACTION_HISTORY supplies transaction-level logging by offering insights into dedicated transactions with particulars like blocks dedicated, standing, and isolation stage (serializable or snapshot used). It additionally logs particulars in regards to the rolled again or undo transactions.
The next screenshots illustrate fetching particulars a few transaction that was dedicated efficiently.
The next screenshots illustrate fetching particulars a few transaction that was rolled again.
Stats and vacuum
The SYS_ANALYZE_HISTORY monitoring view supplies particulars just like the final timestamp of analyze queries, the length for which a selected analyze question ran, the variety of rows within the desk, and the variety of rows modified. The next instance question supplies an inventory of the most recent analyze queries that ran for all of the everlasting tables:
We get the next output.
The SYS_VACUUM_HISTORY monitoring view supplies an entire set of particulars on VACUUM in a single view. For instance, see the next code:
We get the next output.
Knowledge ingestion monitoring
On this part, we focus on monitor information ingestion.
Abstract of ingestion
SYS_LOAD_HISTORY supplies particulars into the statistics of COPY instructions. Use this view for summarized insights into your ingestion workload. The next instance question supplies an hourly abstract of ingestion damaged down by tables wherein information was ingested:
We get the next output.
File-level ingress logging
SYS_LOAD_DETAIL supplies extra granular insights into how ingestion is carried out on the file stage. For instance, see the next question utilizing sys_load_history
:
We get the next output.
The next instance exhibits what detailed file-level monitoring appears to be like like:
Test for errors throughout ingress course of
SYS_LOAD_ERROR_DETAIL lets you monitor and troubleshoot errors which will have occurred through the ingestion course of. This view logs particulars for the file that encountered the error through the ingestion course of together with the road quantity at which the error occurred and column particulars inside that line. See the next code:
We get the next output.
Exterior question monitoring
SYS_EXTERNAL_QUERY_DETAIL supplies run particulars for exterior queries, which incorporates Amazon Redshift Spectrum and federated queries. This view logs particulars on the phase stage and supplies helpful insights to troubleshoot and monitor efficiency of exterior queries in a single monitoring view. The next are just a few helpful metrics and information factors this monitoring view supplies:
- Variety of exterior recordsdata scanned (
scanned_files
) and format of exterior recordsdata (file_format
) corresponding to Parquet, textual content file, and so forth - Knowledge scanned when it comes to rows (
returned_rows
) and bytes (returned_bytes
) - Utilization of partitioning (
total_partitions
andqualified_partitions
) by exterior queries and tables - Granular insights into time taken in itemizing (
s3list_time
) and qualifying partitions (get_partition_time
) for a given exterior object - Exterior file location (file_location) and exterior desk title (
table_name
) - Sort of exterior supply (
source_type
), corresponding to Amazon Easy Storage Service (Amazon S3) for Redshift Spectrum, or federated - Recursive scan for subdirectories (
is_recursive
) or entry of nested column information kind (is_nested
)
For instance, the next question exhibits the day by day abstract of the variety of exterior queries run and information scanned:
We get the next output.
Utilization of partitions
You may confirm whether or not the exterior queries scanning giant sums of information and recordsdata are partitioned or not. Whenever you use partitions, you’ll be able to prohibit the quantity of information that your exterior question has to scan by pruning based mostly on the partition key. See the next code:
We get the next output.
For any errors encountered with exterior queries, look into SYS_EXTERNAL_QUERY_ERROR
, which logs particulars on the granularity of file_location
, column
, and rowid
inside that file.
Gradual question efficiency troubleshooting
Confer with the sysview_slow_query_performance_troubleshooting
SQL pocket book downloaded as a part of the stipulations for a step-by-step information on carry out query-level troubleshooting utilizing SYS monitoring views and discover solutions to the next questions:
- Do the queries being in contrast have comparable question textual content?
- Did the question use the end result cache?
- Which components of the question lifecycle (queuing, compilation, planning, lock wait) are contributing probably the most to question runtimes?
- Has the question plan modified?
- Is the question studying extra information blocks?
- Is the question spilling to disk? If that’s the case, is it spilling to native or distant storage?
- Is the question extremely skewed with respect to information (distribution) and time (runtime)?
- Do you see extra rows processed in be a part of steps or nested loops?
- Are there any alerts indicating staleness in statistics?
- When was the final vacuum and analyze carried out for the tables concerned within the question?
Clear up
Should you created any Redshift provisioned clusters or Redshift Serverless workgroups as a part of this publish and not want them to your workloads, you’ll be able to delete them to keep away from incurring further prices.
Conclusion
On this publish, we defined how you need to use the Redshift SYS monitoring views to watch workloads of provisioned clusters and serverless workgroups. The SYS monitoring views present simplified monitoring of the workloads, entry to varied query-level monitoring metrics from a unified view, and the flexibility to make use of the identical SYS monitoring view question to run throughout each provisioned clusters and serverless workgroups. We additionally coated some key monitoring and troubleshooting situations utilizing SYS monitoring views.
We encourage you to start out utilizing the brand new SYS monitoring views to your Redshift workloads. You probably have any suggestions or questions, please depart them within the feedback.
Concerning the authors
Urvish Shah is a Senior Database Engineer at Amazon Redshift. He has greater than a decade of expertise engaged on databases, information warehousing and in analytics area. Outdoors of labor, he enjoys cooking, travelling and spending time together with his daughter.
Ranjan Burman is a Analytics Specialist Options Architect at AWS. He focuses on Amazon Redshift and helps clients construct scalable analytical options. He has greater than 15 years of expertise in numerous database and information warehousing applied sciences. He’s captivated with automating and fixing buyer issues with using cloud options.
[ad_2]