Skip to main content

EXPLAIN

Displays the query plan of a statement.


Syntax

EXPLAIN [ ( <option> [, ...] ) ] <statement>
EXPLAIN [ANALYZE] [VERBOSE] <statement>
where <option> can be:
ANALYZE [ <boolean> ]
VERBOSE [ <boolean> ]
COSTS [ <boolean> ]
BUFFERS [ <boolean> ]
TIMING [ <boolean> ]
FORMAT { TEXT | XML | JSON | YAML }

Description

The query plan of a statement is displayed as a tree plan of nodes. Each node indicates an operation, such as a join.

Read a query plan from bottom up, because a lower node feeds rows to the node directly above it. A query plan generally starts from table scans. Nodes above the scans handle operations such as sorts, aggregates, and joins, if necessary. The top nodes are usually the nodes that move rows between instances during query execution, such as redistribute, broadcast, and gather operations.

Each line in the output of EXPLAIN provides the basic node type and the following costs estimated by the planner for one node:

  • cost: the estimated execution time, usually measured in disk page fetches. Two values are displayed: the start-up cost and the total cost. The start-up cost indicates the time spent before the first row can be returned and the total cost indicates the time spent till the last row can be returned.

    When estimating the total cost, the planner assumes that all rows are retrieved. This can cause the estimated cost inaccurate in some cases, for example, using LIMIT.

  • rows: the estimated number of rows output by the node. It is often less than the actual number. In ideal situations, the estimated numbers of top-level nodes estimate the number of rows that are actually returned, updated, or deleted by the query.

  • width: the number of bytes of all the rows output by the node.

The estimated cost of an upper-level node contains the costs of its child nodes. Therefore, the estimate costs of the topmost node is that of the whole query.

The estimated cost of a query varies with the optimizer. In Relyt, this cost does not include the time spent on transmitting the result rows to the client.

EXPLAIN ANALYZE not only explains the query but also runs it. Therefore, if you run EXPLAIN ANALYZE, both the estimates and actual costs of the specified statement will be displayed. You can compare the estimates with the actual costs to check whether the estimates are accurate. Besides the information returned by EXPLAIN, the output of EXPLAIN ANALYZE also includes:

  • The total elapsed time (in milliseconds) spent to run the query.

  • The number of workers involved in a plan node operation. Only workers that return rows are counted.

  • The maximum number of rows returned by the worker that produced the most rows for an operation. If multiple workers produce an equal number of rows, the one with the longest time to end is the one chosen.

  • The worker id number of the worker that produced the most rows for an operation.

  • For relevant operations, the <work_mem> used by the operation. If <work_mem> was not sufficient to perform the operation in memory, the plan will show how much data was spilled to disks and how many passes over the data were required for the lowest performing worker. For example:

    Work_mem used: 64K bytes avg, 64K bytes max (seg0).
    Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile
    I/O affecting 2 workers.
    [seg0] pass 0: 488 groups made from 488 rows; 263 rows written to
    workfile
    [seg0] pass 1: 263 groups made from 263 rows

Note that ANALYZE runs the statement. Although EXPLAIN ANALYZE will discard any output that a SELECT returns, other side effects of the statement will happen as usual. If you want to use EXPLAIN ANALYZE on a DML statement without letting the command affect your data, use this approach:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Only the ANALYZE and VERBOSE options can be specified, and only in that order, without surrounding the option list in parentheses.


Parameters

  • ANALYZE

    Carry out the command and show the actual run times and other statistics. If omitted, the default false will be used. You can specify ANALYZE true to enable it.

  • VERBOSE

    Display additional information regarding the plan. Specifically, include the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. If omitted, the default false will be used. You can specify VERBOSE true to enable it.

  • COSTS

    Include information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. If omitted, the default true will be used. You can specify COSTS false to disable it.

  • BUFFERS

    Include information on buffer usage. This parameter can be specified only when ANALYZE is also specified. If omitted, the default value false will be used. You can specify BUFFERS true to enable it.

    Note that Relyt does not support specifying BUFFERS true for distributed queries; ignore any displayed buffer usage information.

  • TIMING

    Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to false when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter can only be used when ANALYZE is also enabled. It defaults to true.

  • FORMAT

    Specify the output format, which can be TEXT, XML, JSON, or YAML. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter defaults to TEXT.

  • <boolean>

    Specifies whether the selected option will be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to deactivate it. The boolean value can also be omitted, in which case TRUE is assumed.

  • <statement>

    Any SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, or CREATE TABLE AS statement of which the query plan you want to see.


Usage notes

In order to allow the query optimizer to make reasonably informed decisions when optimizing queries, the ANALYZE statement must be run to record statistics about the distribution of data within the table. If you have not done this (or if the statistical distribution of the data in the table has changed significantly since the last time ANALYZE was run), the estimated costs are unlikely to conform to the real properties of the query, and consequently an inferior query plan may be chosen.

An SQL statement that is run during the execution of an EXPLAIN ANALYZE command is excluded from Relyt resource queues.


SQL standard compatibility

There is no EXPLAIN statement defined in the SQL standard.