ANALYZE
Collects statistics about a database.
Syntax
ANALYZE [VERBOSE] [ (SKIP_LOCKED) ] [<table> [ (<column> [, ...] ) ]]
Description
ANALYZE
collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic
. Subsequently, Relyt uses these statistics to help determine the most efficient execution plans for queries. For information about the table statistics that are collected, see Usage notes.
With no parameter, ANALYZE
collects statistics for every table in the current database. You can specify a table name to collect statistics for a single table. You can specify a set of column names in a specific table, in which case the statistics only for those columns from that table are collected.
ANALYZE
does not collect statistics on external tables.
Parameters
-
VERBOSE
Enables display of progress messages. When specified,
ANALYZE
emits this information:- The table that is being processed.
- The query that is run to generate the sample table.
- The column for which statistics is being computed.
- The queries that are issued to collect the different statistics for a single column.
- The statistics that are collected.
-
SKIP_LOCKED
Specifies that
ANALYZE
does not wait for any conflicting locks to be released when beginning work on a relation: If it cannot lock a relation immediately without waiting, it skips the relation. Note that even with this option,ANALYZE
may still block when opening the indexes of the relation or when acquiring sample rows from table inheritance children. -
<table>
The name of the table. You can specify the name with the schema qualification. If omitted, all regular tables in the current database will be analyzed.
-
<column>
The name of the column to analyze. If not specified, all columns in the table will be analyzed.
Usage notes
It is a good idea to run ANALYZE
periodically, or just after making major changes in the contents of a table. Accurate statistics helps Relyt choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy for read-mostly databases is to run VACUUM
and ANALYZE
once a day during a low-usage time of day. (This will not be sufficient if there is heavy update activity.) You can check for tables with missing statistics using the gp_stats_missing
view, which is in the gp_toolkit
schema:
SELECT * FROM gp_toolkit.gp_stats_missing;
ANALYZE
requires SHARE UPDATE EXCLUSIVE
lock on the target table. This lock conflicts with these locks: SHARE UPDATE EXCLUSIVE
, SHARE
, SHARE ROW EXCLUSIVE
, EXCLUSIVE
, ACCESS EXCLUSIVE
.
If you run ANALYZE
on a table that does not contain data, statistics will not be collected for the table. For example, if you perform a TRUNCATE
operation on a table that has statistics, and then run ANALYZE
on the table, the statistics do not change.
The statistics collected by ANALYZE
usually include a list of some of the most common values in each column and a histogram showing the approximate data distribution in each column. One or both of these may be omitted if ANALYZE
deems them uninteresting (for example, in a unique-key column, there are no common values) or if the column data type does not support the appropriate operators.
For large tables, ANALYZE
takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time. However, note that the statistics are only approximate, and will change slightly each time ANALYZE
is run, even if the actual table contents did not change. This can result in small changes in the planner's estimated costs shown by EXPLAIN
. In rare situations, this non-determinism will cause the query optimizer to choose a different query plan between runs of ANALYZE
. To avoid this, raise the amount of statistics collected by ANALYZE
by adjusting the default_statistics_target
configuration parameter, or on a column-by-column basis by setting the per-column statistics target with ALTER TABLE ... ALTER COLUMN ... SET (n_distinct ...)
(see ALTER TABLE). The target value sets the maximum number of entries in the most-common-value list and the maximum number of bins in the histogram. The default target value is 100, but this can be adjusted up or down to trade off accuracy of planner estimates against the time taken for ANALYZE
and the amount of space occupied in pg_statistic
. In particular, setting the statistics target to zero deactivates collection of statistics for that column. It may be useful to do that for columns that are never used as part of the WHERE
, GROUP BY
, or ORDER BY
clauses of queries, since the planner will have no use for statistics on such columns.
The largest statistics target among the columns being analyzed determines the number of table rows sampled to prepare the statistics. Increasing the target causes a proportional increase in the time and space needed to do ANALYZE
.
One of the values estimated by ANALYZE
is the number of distinct values that appear in each column. Because only a subset of the rows are examined, this estimate can sometimes be quite inaccurate, even with the largest possible statistics target.
When Relyt performs an ANALYZE
operation to collect statistics for a table and detects that all the sampled table data pages are empty (do not contain valid data), Relyt displays a message that a VACUUM FULL
operation should be performed. If the sampled pages are empty, the table statistics will be inaccurate. Pages become empty after a large number of changes to the table, for example deleting a large number of rows. A VACUUM FULL
operation removes the empty pages and allows an ANALYZE
operation to collect accurate statistics.
If there are no statistics for the table, the server configuration parameter gp_enable_relsize_collection
controls whether the Postgres Planner uses a default statistics file or estimates the size of a table using the pg_relation_size
function. By default, the Postgres Planner uses the default statistics file to estimate the number of rows if statistics are not available.
Examples
Collect statistics for the table sales
:
ANALYZE sales;
SQL standard compatibility
There is no ANALYZE
statement in the SQL standard.