Simplify Amazon Redshift monitoring using the new unified SYS views

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, providing up to 5 times better price-performance than any other cloud data warehouse, with performance innovation out of the box at no additional cost to you. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads.

In this post, we discuss Amazon Redshift SYS monitoring views and how they simplify the monitoring of your Amazon Redshift workloads and resource usage.

Overview of SYS monitoring views

SYS monitoring views are system views in Amazon Redshift that can be used to monitor query and workload resource usage for provisioned clusters as well as serverless workgroups. They offer the following benefits:

  • They are categorized based on functional alignment, considering query state, performance metrics, and query types
  • They introduce new performance metrics like planning_time, lock_wait_time, remote_read_io, and local_read_io to assist in performance troubleshooting
  • They improve the usability of monitoring views by logging the user-submitted query instead of the Redshift optimizer-rewritten query
  • They provide more troubleshooting metrics using fewer views
  • They enable unified Amazon Redshift monitoring by letting you use the same query across provisioned clusters and serverless workgroups (see the example after this list)
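
As a quick illustration of that last point, the following minimal sketch runs unchanged on a provisioned cluster or a serverless workgroup; it relies only on SYS_QUERY_HISTORY and its compute_type column, both of which are covered later in this post:

--same query works on a provisioned cluster or a serverless workgroup
SELECT
  compute_type
  , query_type
  , COUNT(*) query_count
FROM
  sys_query_history
WHERE start_time > DATEADD(day, -1, CURRENT_DATE)
GROUP BY
  compute_type
  , query_type
ORDER BY
  query_count DESC;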

Let's look at some of the features of SYS monitoring views and how they can be used for monitoring.

Unify various query-level monitoring metrics

The following mapping shows how you can unify various metrics and information for a query from multiple system tables and views into one SYS monitoring view, SYS_QUERY_HISTORY.

STL/SVL/STV view and the information element it provides (all consolidated into SYS_QUERY_HISTORY):

  • STL_QUERY: elapsed time, query label, user ID, transaction, session, stopped queries, database name
  • STL_WLM_QUERY: queue time, runtime
  • SVL_QLOG: result cache
  • STL_ERROR: error code, error message
  • STL_UTILITYTEXT: non-SELECT SQL
  • STL_DDLTEXT: DDL statements
  • SVL_STATEMENTTEXT: all types of SQL statements
  • STL_RETURN: returned rows and bytes
  • STL_USAGE_CONTROL: usage limit
  • STV_WLM_QUERY_STATE: current state of WLM
  • STV_RECENTS: recent and in-flight queries
  • STV_INFLIGHT: in-flight queries
  • SVL_COMPILE: compilation

The corresponding SYS_QUERY_HISTORY columns are user_id, query_id, query_label, transaction_id, session_id, database_name, query_type, status, result_cache_hit, start_time, end_time, elapsed_time, queue_time, execution_time, error_message, returned_rows, returned_bytes, query_text, redshift_version, usage_limit, compute_type, compile_time, planning_time, and lock_wait_time.

For more information on SYS to STL/SVL/STV mapping, refer to Migrating to SYS monitoring views.

User query-level logging

To enhance query performance, the Redshift query engine can rewrite user-submitted queries. The user-submitted query identifier is different from the rewritten query identifier. In this post, we refer to the user-submitted query as the parent query and the rewritten query as the child query.

The following diagram illustrates logging at the parent query level and child query level. The parent query identifier is 1000, and the child query identifiers are 1001, 1002, and 1003.
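
Because SYS views log at the parent (user-submitted) query level, you can look up the parent query in SYS_QUERY_HISTORY and then drill into its rewritten child queries through the child_query_sequence column of SYS_QUERY_DETAIL. The following is a minimal sketch that uses the placeholder parent query identifier 1000 from the diagram:

--parent-level entry for the user-submitted query (1000 is the placeholder from the diagram)
SELECT
  query_id
  , query_type
  , status
  , elapsed_time
FROM
  sys_query_history
WHERE query_id = 1000;

--rewritten child queries logged under the same parent query_id
SELECT
  query_id
  , child_query_sequence
  , MIN(start_time) start_time
  , MAX(end_time) end_time
FROM
  sys_query_detail
WHERE query_id = 1000
GROUP BY
  query_id
  , child_query_sequence
ORDER BY
  child_query_sequence;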

Query lifecycle timings

SYS_QUERY_HISTORY has an enhanced list of columns that provide granular time metrics for the different query lifecycle phases. Note that all times are recorded in microseconds. The following list summarizes these metrics.

  • planning_time: The time the query spent prior to running, which typically includes query lifecycle phases like parse, analyze, planning, and rewriting
  • lock_wait_time: The time the query spent acquiring locks on the referenced database objects
  • queue_time: The time the query spent in the queue waiting for resources to be available to run
  • compile_time: The time the query spent compiling
  • execution_time: The time the query spent running; in the case of a SELECT query, this also includes the return time
  • elapsed_time: The end-to-end time of the query run
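
For example, the following query (a simple sketch that uses only the columns described above) lays these timings side by side for the most recent queries, converting microseconds to seconds:

SELECT
  query_id
  , query_type
  , planning_time / 1000000.0 planning_secs
  , lock_wait_time / 1000000.0 lock_wait_secs
  , queue_time / 1000000.0 queue_secs
  , compile_time / 1000000.0 compile_secs
  , execution_time / 1000000.0 execution_secs
  , elapsed_time / 1000000.0 elapsed_secs
FROM
  sys_query_history
ORDER BY
  start_time DESC
LIMIT 20;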

Solution overview

We discuss the following scenarios to help you gain familiarity with the SYS monitoring views:

  • Workload and query lifecycle monitoring
  • Data ingestion monitoring
  • External query monitoring
  • Slow query performance troubleshooting

Prerequisites

You should have the following prerequisites to follow along with the examples in this post:

Additionally, download all the SQL queries that are referenced in this post as Redshift Query Editor v2 SQL notebooks.

Workload and query lifecycle monitoring

In this section, we discuss how to monitor the workload and query lifecycle.

Identify in-flight queries

SYS_QUERY_HISTORY provides a single view to look at all the in-flight queries as well as historical runs. See the following example query:

SELECT  
  *
FROM    
  sys_query_history
WHERE status IN ('planning', 'queued', 'running', 'returning')
ORDER BY
  start_time;

We get the following output.

Identify top long-running queries

The following query helps retrieve the top 100 queries that are taking the longest to run. Analyzing (and, if possible, optimizing) these queries can help improve overall performance. These metrics are collected statistics across all runs of the query. Note that all time values are in microseconds.

--top long-running queries by elapsed_time
SELECT  
  user_id
  , transaction_id
  , query_id
  , database_name
  , query_type
  , query_text::VARCHAR(100)
  , lock_wait_time
  , planning_time
  , compile_time
  , execution_time
  , elapsed_time
FROM    
  sys_query_history
ORDER BY
  elapsed_time DESC
LIMIT 100;

We get the following output.

Gather daily counts of queries by query type, period, and status

The following query provides insight into the distribution of different types of queries across different days and helps evaluate and monitor any changes in the workload:

--daily breakdown of workload by query type and status
SELECT  
  DATE_TRUNC('day', start_time) period_daily
  , query_type
  , status
  , COUNT(*)
FROM    
  sys_query_history
GROUP BY
  period_daily
  , query_type
  , status
ORDER BY
  period_daily
  , query_type
  , status;

We get the following output.

Gather run details of an in-flight query

To determine the run-level details of a query that is in flight, you can use the is_active = 't' filter when querying the SYS_QUERY_DETAIL view. See the following example:

SELECT  
  query_id
  , child_query_sequence
  , stream_id
  , segment_id
  , step_id
  , step_name
  , table_id
  , coalesce(table_name,'') || coalesce(source,'') as table_name
  , start_time
  , end_time
  , duration
  , blocks_read
  , local_read_io
  , remote_read_io
FROM    
  sys_query_detail
WHERE is_active = 't'
ORDER BY
  query_id
  , child_query_sequence
  , stream_id
  , segment_id
  , step_id;

To view the latest 100 COPY queries run, use the following code:

SELECT  
  session_id
  , transaction_id
  , query_id
  , database_name
  , table_name
  , data_source
  , loaded_rows
  , loaded_bytes
  , duration / 1000.00 duration_ms
FROM    
  sys_load_history
ORDER BY
  start_time DESC LIMIT 100;

We get the following output.

Gather transaction-level details for commits and undo

SYS_TRANSACTION_HISTORY provides transaction-level logging, offering insights into committed transactions with details like blocks committed, status, and isolation level (serializable or snapshot). It also logs details about rolled back or undo transactions.

The following screenshots illustrate fetching details about a transaction that was committed successfully.

The following screenshots illustrate fetching details about a transaction that was rolled back.
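
Because the screenshots aren't reproduced here, the following minimal sketch shows the general pattern; the transaction_id value is a placeholder, and no columns beyond the filter are assumed:

SELECT
  *
FROM
  sys_transaction_history
WHERE transaction_id = 1234567;  --placeholder transaction_id for the transaction being inspected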

Stats and vacuum

The SYS_ANALYZE_HISTORY monitoring view provides details like the last timestamp of analyze queries, the duration for which a particular analyze query ran, the number of rows in the table, and the number of rows modified. The following example query provides a list of the latest analyze queries that ran for all the permanent tables:

SELECT  
  TRIM(schema_name) schema_name
  , TRIM(table_name) table_name
  , table_id
  , status
  , COUNT(*) times_analyze_was_triggered
  , MAX(last_analyze_time) last_analyze_time
  , MAX(end_time) end_time
  , AVG(ROWS) "rows"
  , AVG(modified_rows) modified_rows
FROM    
  sys_analyze_history
WHERE
   status != 'Skipped'
GROUP BY
  schema_name
  , table_name
  , table_id
  , status
ORDER BY
  schema_name
  , table_name
  , table_id
  , status
  , end_time;

We get the following output.

The SYS_VACUUM_HISTORY monitoring view provides a complete set of details on VACUUM in a single view. For example, see the following code:

SELECT  
  user_id
  , transaction_id
  , query_id
  , TRIM(database_name) as database_name
  , TRIM(schema_name) as schema_name
  , TRIM(table_name) table_name
  , table_id
  , vacuum_type
  , is_automatic as is_auto
  , duration
  , rows_before_vacuum
  , size_before_vacuum
  , reclaimable_rows
  , reclaimed_rows
  , reclaimed_blocks
  , sortedrows_before_vacuum
  , sortedrows_after_vacuum
FROM    
  sys_vacuum_history
WHERE status LIKE '%Complete%'
ORDER BY
  start_time;

We get the following output.

Data ingestion monitoring

In this section, we discuss how to monitor data ingestion.

Summary of ingestion

SYS_LOAD_HISTORY provides details into the statistics of COPY commands. Use this view for summarized insights into your ingestion workload. The following example query provides an hourly summary of ingestion broken down by the tables into which data was ingested:

SELECT  
  date_trunc('hour', start_time) period_hourly
  , database_name
  , table_name
  , status
  , file_format
  , SUM(loaded_rows) total_rows_ingested
  , SUM(loaded_bytes) total_bytes_ingested
  , SUM(source_file_count) num_of_files_to_process
  , SUM(file_count_scanned) num_of_files_processed
  , SUM(error_count) total_errors
FROM    
  sys_load_history
GROUP BY
  period_hourly
  , database_name
  , table_name
  , status
  , file_format
ORDER BY
  table_name
  , period_hourly
  , status;

We get the following output.

File-level ingestion logging

SYS_LOAD_DETAIL provides more granular insights into how ingestion is performed at the file level. For example, see the following query using sys_load_history:

SELECT  
  *
FROM    
  sys_load_history
WHERE table_name = 'catalog_sales'
ORDER BY
  start_time;

We get the following output.

The following example shows what detailed file-level monitoring looks like:

 SELECT  
  user_id
  , query_id
  , TRIM(file_name) file_name
  , bytes_scanned
  , lines_scanned
  , splits_scanned
  , record_time
  , start_time
  , end_time
FROM    
  sys_load_detail
WHERE query_id = 1824870
ORDER BY
  start_time;

Check for errors during the ingestion process

SYS_LOAD_ERROR_DETAIL allows you to monitor and troubleshoot errors that may have occurred during the ingestion process. This view logs details for the file that encountered the error during ingestion, along with the line number at which the error occurred and column details within that line. See the following code:

select * from sys_load_error_detail order by start_time limit 100;

We get the following output.

External query monitoring

SYS_EXTERNAL_QUERY_DETAIL provides run details for external queries, which include Amazon Redshift Spectrum and federated queries. This view logs details at the segment level and provides useful insights to troubleshoot and monitor the performance of external queries in a single monitoring view. The following are a few useful metrics and data points this monitoring view provides:

  • Number of external files scanned (scanned_files) and format of external files (file_format), such as Parquet, text file, and so on
  • Data scanned in terms of rows (returned_rows) and bytes (returned_bytes)
  • Usage of partitioning (total_partitions and qualified_partitions) by external queries and tables
  • Granular insights into the time taken in listing (s3list_time) and qualifying partitions (get_partition_time) for a given external object
  • External file location (file_location) and external table name (table_name)
  • Type of external source (source_type), such as Amazon Simple Storage Service (Amazon S3) for Redshift Spectrum, or federated
  • Recursive scan of subdirectories (is_recursive) or access of nested column data types (is_nested)

For example, the following query shows an hourly summary of the number of external queries run and the data scanned:

SELECT  
  DATE_TRUNC('hour', start_time) period_hourly
  , user_id
  , TRIM(source_type) source_type
  , COUNT (DISTINCT query_id) query_counts
  , SUM(returned_rows) returned_rows
  , ROUND(SUM(returned_bytes) / 1024^3,2) returned_gb
FROM    
  sys_external_query_detail
GROUP BY
  period_hourly
  , user_id
  , source_type
ORDER BY
  period_hourly
  , user_id
  , source_type;

We get the following output.

Usage of partitions

You can verify whether the external queries scanning large amounts of data and files are using partitions or not. When you use partitions, you can restrict the amount of data your external query has to scan by pruning based on the partition key. See the following code:

SELECT  
  file_location
  , CASE
      WHEN NVL(total_partitions,0) = 0
      THEN 'No'
      ELSE 'Yes'
    END is_partitioned
  , SUM(scanned_files) total_scanned_files
  , COUNT(DISTINCT query_id) query_count
FROM    
  sys_external_query_detail
GROUP BY
  file_location
  , is_partitioned
ORDER BY
  total_scanned_files DESC;

We get the following output.

For any errors encountered with external queries, look into SYS_EXTERNAL_QUERY_ERROR, which logs details at the granularity of file_location, column, and rowid within that file.
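
A quick way to check is to pull the latest rows from that view; this minimal sketch assumes no specific columns beyond the view name mentioned above:

SELECT
  *
FROM
  sys_external_query_error
LIMIT 100;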

Slow query performance troubleshooting

Refer to the sysview_slow_query_performance_troubleshooting SQL notebook downloaded as part of the prerequisites for a step-by-step guide on how to perform query-level troubleshooting using SYS monitoring views and find answers to the following questions (a short comparison example follows the list):

  • Do the queries being compared have similar query text?
  • Did the query use the result cache?
  • Which parts of the query lifecycle (queuing, compilation, planning, lock wait) are contributing the most to query runtimes?
  • Has the query plan changed?
  • Is the query reading more data blocks?
  • Is the query spilling to disk? If so, is it spilling to local or remote storage?
  • Is the query highly skewed with respect to data (distribution) and time (runtime)?
  • Do you see more rows processed in join steps or nested loops?
  • Are there any alerts indicating stale statistics?
  • When was the last vacuum and analyze performed for the tables involved in the query?
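
As a starting point for several of these questions, the following hedged sketch compares the result cache usage and lifecycle timings of two runs of the same query in SYS_QUERY_HISTORY; the two query_id values are placeholders for the runs being compared:

SELECT
  query_id
  , query_text::VARCHAR(100)
  , result_cache_hit
  , lock_wait_time
  , planning_time
  , queue_time
  , compile_time
  , execution_time
  , elapsed_time
FROM
  sys_query_history
WHERE query_id IN (1000001, 1000002)  --placeholder query IDs for the two runs being compared
ORDER BY
  start_time;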

Clean up

If you created any Redshift provisioned clusters or Redshift Serverless workgroups as part of this post and no longer need them for your workloads, you can delete them to avoid incurring additional costs.

Conclusion

In this post, we explained how you can use the Redshift SYS monitoring views to monitor workloads of provisioned clusters and serverless workgroups. The SYS monitoring views provide simplified monitoring of the workloads, access to various query-level monitoring metrics from a unified view, and the ability to run the same SYS monitoring view query across both provisioned clusters and serverless workgroups. We also covered some key monitoring and troubleshooting scenarios using SYS monitoring views.

We encourage you to start using the new SYS monitoring views for your Redshift workloads. If you have any feedback or questions, please leave them in the comments.


About the authors

Urvish Shah is a Senior Database Engineer at Amazon Redshift. He has more than a decade of experience working on databases, data warehousing, and analytics. Outside of work, he enjoys cooking, traveling, and spending time with his daughter.

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 15 years of experience in various database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.
