EXPLAIN
Displays the query plan of a statement.
Syntax
EXPLAIN [ ( <option> [, ...] ) ] <statement>
EXPLAIN [ANALYZE] [VERBOSE] <statement>
where <option> can be:
ANALYZE [ <boolean> ]
VERBOSE [ <boolean> ]
COSTS [ <boolean> ]
BUFFERS [ <boolean> ]
TIMING [ <boolean> ]
FORMAT { TEXT | XML | JSON | YAML }
Description
The query plan of a statement is displayed as a tree plan of nodes. Each node indicates an operation, such as a join.
Read a query plan from bottom up, because a lower node feeds rows to the node directly above it. A query plan generally starts from table scans. Nodes above the scans handle operations such as sorts, aggregates, and joins, if necessary. The top nodes are usually the nodes that move rows between instances during query execution, such as redistribute, broadcast, and gather operations.
Each line in the output of EXPLAIN
provides the basic node type and the following costs estimated by the planner for one node:
-
cost
: the estimated execution time, usually measured in disk page fetches. Two values are displayed: the start-up cost and the total cost. The start-up cost indicates the time spent before the first row can be returned and the total cost indicates the time spent till the last row can be returned.When estimating the total cost, the planner assumes that all rows are retrieved. This can cause the estimated cost inaccurate in some cases, for example, using
LIMIT
. -
rows
: the estimated number of rows output by the node. It is often less than the actual number. In ideal situations, the estimated numbers of top-level nodes estimate the number of rows that are actually returned, updated, or deleted by the query. -
width
: the number of bytes of all the rows output by the node.
The estimated cost of an upper-level node contains the costs of its child nodes. Therefore, the estimate costs of the topmost node is that of the whole query.
The estimated cost of a query varies with the optimizer. In Relyt, this cost does not include the time spent on transmitting the result rows to the client.
EXPLAIN ANALYZE
not only explains the query but also runs it. Therefore, if you run EXPLAIN ANALYZE
, both the estimates and actual costs of the specified statement will be displayed. You can compare the estimates with the actual costs to check whether the estimates are accurate. Besides the information returned by EXPLAIN
, the output of EXPLAIN ANALYZE
also includes:
-
The total elapsed time (in milliseconds) spent to run the query.
-
The number of workers involved in a plan node operation. Only workers that return rows are counted.
-
The maximum number of rows returned by the worker that produced the most rows for an operation. If multiple workers produce an equal number of rows, the one with the longest time to end is the one chosen.
-
The worker id number of the worker that produced the most rows for an operation.
-
For relevant operations, the
<work_mem>
used by the operation. If<work_mem>
was not sufficient to perform the operation in memory, the plan will show how much data was spilled to disks and how many passes over the data were required for the lowest performing worker. For example:Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile
I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to
workfile
[seg0] pass 1: 263 groups made from 263 rows
Note that ANALYZE
runs the statement. Although EXPLAIN ANALYZE
will discard any output that a SELECT
returns, other side effects of the statement will happen as usual. If you want to use EXPLAIN ANALYZE
on a DML statement without letting the command affect your data, use this approach:
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;
Only the ANALYZE
and VERBOSE
options can be specified, and only in that order, without surrounding the option list in parentheses.
Parameters
-
ANALYZE
Carry out the command and show the actual run times and other statistics. If omitted, the default
false
will be used. You can specifyANALYZE true
to enable it. -
VERBOSE
Display additional information regarding the plan. Specifically, include the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. If omitted, the default
false
will be used. You can specifyVERBOSE true
to enable it. -
COSTS
Include information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. If omitted, the default
true
will be used. You can specifyCOSTS false
to disable it. -
BUFFERS
Include information on buffer usage. This parameter can be specified only when
ANALYZE
is also specified. If omitted, the default valuefalse
will be used. You can specifyBUFFERS true
to enable it.Note that Relyt does not support specifying
BUFFERS true
for distributed queries; ignore any displayed buffer usage information. -
TIMING
Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to
false
when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter can only be used whenANALYZE
is also enabled. It defaults totrue
. -
FORMAT
Specify the output format, which can be
TEXT
,XML
,JSON
, orYAML
. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter defaults toTEXT
. -
<boolean>
Specifies whether the selected option will be turned on or off. You can write
TRUE
,ON
, or1
to enable the option, andFALSE
,OFF
, or0
to deactivate it. The boolean value can also be omitted, in which caseTRUE
is assumed. -
<statement>
Any
SELECT
,INSERT
,UPDATE
,DELETE
,VALUES
,EXECUTE
,DECLARE
, orCREATE TABLE AS
statement of which the query plan you want to see.
Usage notes
In order to allow the query optimizer to make reasonably informed decisions when optimizing queries, the ANALYZE
statement must be run to record statistics about the distribution of data within the table. If you have not done this (or if the statistical distribution of the data in the table has changed significantly since the last time ANALYZE
was run), the estimated costs are unlikely to conform to the real properties of the query, and consequently an inferior query plan may be chosen.
An SQL statement that is run during the execution of an EXPLAIN ANALYZE
command is excluded from Relyt resource queues.
SQL standard compatibility
There is no EXPLAIN
statement defined in the SQL standard.