{"id":100722,"date":"2023-12-11T00:26:42","date_gmt":"2023-12-11T00:26:42","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=100722"},"modified":"2024-03-07T11:38:24","modified_gmt":"2024-03-07T11:38:24","slug":"exploring-postgresql-indexes","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/exploring-postgresql-indexes\/","title":{"rendered":"Exploring PostgreSQL Indexes"},"content":{"rendered":"<p><em>In this blog, we continue our exploration on PostgreSQL indexes which we started <\/em><a href=\"https:\/\/www.red-gate.com\/simple-talk\/homepage\/postgresql-indexes-what-they-are-and-how-they-help\/\">here<\/a><em>. In that article, we learned what an index is, and how exactly indexes can help with query execution. But there is much more to learn about indexes! In this blog, we will keep exploring B-tree indexes. We will learn whether (and how) database constraints and indexes are related (or not), how exactly index bitmap scan works, and explore some additional index options available in PostgreSQL. <\/em><\/p>\n<h2>Indexes and Constraints<\/h2>\n<p>In the previous article we learned that the most helpful indexes are indexes with the lowest selectivity, which means that each distinct value in an index corresponds to a small number of rows. The smallest number of rows is one, thereby, the most useful indexes are unique indexes.<\/p>\n<p>The definition of a unique index states just that: an index is unique if for each indexed value there is exactly one matching row in the table. PostgreSQL automatically creates a unique index to support any primary key or unique constraint on a table.<\/p>\n<p>Is there any difference between a primary key and a unique constraint? Many SQL developers assume that a primary key must be an incrementing numeric value and that each table \u201chas\u201d to have a primary key. Although it often helps to have a numeric incremental primary key (which in this case is called a surrogate key), a primary key does not have to be numeric, and moreover, it does not have to be a single-attribute constraint. It is possible to define a primary key as a combination of several attributes; it just has to satisfy two conditions: the combination must be <code>UNIQUE<\/code> and <code>NOT NULL<\/code> for all participating attributes. In contrast, <code>UNIQUE<\/code> constraints in PostgreSQL allow for <code>NULL<\/code> values.<\/p>\n<p>A table can only have a single primary key (though a primary key is not required), but can also have multiple unique constraints. Any non-null unique set of columns can be chosen to be a primary key for a table; thus, there is no programmatic way to determine the best candidate for a table\u2019s primary key.<\/p>\n<p>Note: If you want to try the code, the appendix has the information you need to install the sample database we will be using, as well as some necessary updates before starting with the current article.<\/p>\n<p>For example, the table booking has a primary key on <code>booking_id<\/code> and a unique key on <code>booking_ref<\/code>. Note: you do not need to create a unique constraint to create a unique index:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">ALTER TABLE booking\r\n  ADD CONSTRAINT booking_pkey PRIMARY KEY (booking_id);\r\n\r\nALTER TABLE booking\r\n  ADD CONSTRAINT booking_booking_ref_key UNIQUE (booking_ref);<\/pre>\n<p>In this case, both columns are unique, and moreover, there is no need for a <code>booking_id<\/code> column at all. Booking reference is generated by any reservation system, and it is always both <code>UNIQUE<\/code> and <code>NOT NULL<\/code>. The only reason we added a <code>booking_id<\/code> field to the booking table was to allow application developers to comply to their development standards.<\/p>\n<p>In the case below, the situation is different. The account table has a primary key account_id. When we add a unique index on <code>frequent_flyer_id<\/code> column, we want to make sure that there will never be two accounts with the same <code>frequent_flyer_id<\/code>, however, we will have many accounts for which <code>frequent_flyer_id<\/code> will be null.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE UNIQUE INDEX account_freq_flyer \r\n  ON account (frequent_flyer_id);<\/pre>\n<p>What about foreign keys? Do they automatically create any indexes? A common misconception is the belief that the presence of a foreign key necessarily implies the presence of an index on the child table. This is not true.<\/p>\n<p>A foreign key is a referential integrity constraint; it guarantees that for each non-null value in the child table (i.e., the table with the foreign key constraint), there is a matching unique value in the parent table (i.e., the table it is referencing). The column(s) in the parent table must have a primary key or unique constraint on them, however, nothing is automatically created on a child table.<\/p>\n<p>This misconception often backfires when we need to delete a row from a parent table. I observed a user wondering why a deletion of one row (supported by a unique index) took over 10 minutes. It turned out that the table was a parent table for thirteen foreign key constraints, and none of them was supported by an index. To make sure that deletion wouldn\u2019t violate any of these integrity constraints, Postgres had to sequentially scan thirteen large tables.<\/p>\n<p>Does it mean that we always should create an index on column(s) which has a foreign key constraint? Not necessarily. We remember that only indexes with reasonably low selectivity are useful. To illustrate what \u201creasonable\u201d means in this context, let\u2019s consider the following example.<\/p>\n<p><em>The <\/em>flight <em>table in the <\/em><code>postgres_air<\/code><em> database has a foreign key constraint on the <\/em><code>aircraft_code<\/code><em>:<\/em><\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">ALTER TABLE flight\r\n    ADD CONSTRAINT aircraft_code_fk FOREIGN KEY (aircraft_code)\r\n    REFERENCES aircraft (code);<\/pre>\n<p>This foreign key constraint is necessary because for each flight, there must be a valid aircraft assigned. To support the foreign key constraint, a primary key constraint was added to the <code>aircraft<\/code> table. That table, however, has only 12 rows. Therefore, it is not necessary to create an index on the <code>aircraft_code<\/code> column of the <code>flight<\/code> table. This column has only 12 distinct values, so an index on that column will not be used.<\/p>\n<p>Let\u2019s examine the execution plan for the following query:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT \r\n    f.flight_no,\r\n\u00a0\u00a0\u00a0\u00a0f.scheduled_departure,\r\n\u00a0\u00a0\u00a0\u00a0model,\r\n\u00a0\u00a0\u00a0\u00a0count(passenger_id) passengers\r\nFROM flight f\r\nJOIN booking_leg bl ON bl.flight_id = f.flight_id\r\nJOIN passenger p ON p.booking_id=bl.booking_id\r\nJOIN aircraft ac ON ac.code=f.aircraft_code\r\nWHERE f.departure_airport ='JFK'\r\n\u00a0\u00a0\u00a0   AND f.scheduled_departure BETWEEN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0   \u00a0'2023-08-14' AND '2023-08-16'\r\nGROUP BY 1,2,3<\/pre>\n<p>The execution plan presented on the Figure 1, and fear not of its size \u2013 at this moment, we need just a small portion of it. Pay attention to the lines 19-20:<\/p>\n<pre class=\"\">Hash  (cost=1.12..1.12 rows=12 width=64)\r\n       -&gt; Seq Scan on aircraft ac (cost=0.00..1.12 rows=12<\/pre>\n<p>The PostgreSQL optimizer accesses table statistics and is able to detect that the size of the aircraft table is small and index access won\u2019t be efficient. In contrast, the index on departure_airport field of the flight table proved to be useful due to its low selectivity, and you can see it being used in lines 17-18:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">   -&gt;  Bitmap Index Scan on flight_departure_airport \r\n (cost=0.00..119.33 rows=10521 width=0)\r\nIndex Cond: (departure_airport = 'JFK'::bpchar)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1267\" height=\"832\" class=\"wp-image-100723\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/11\/a-screenshot-of-a-computer-description-automatica-127.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with medium confidence\" \/><\/p>\n<p><strong>Figure 1. Execution plan with sequential scan on a small table<\/strong><\/p>\n<h2>What does \u201cbitmap\u201d mean?<\/h2>\n<p>It has been several times already that the term \u2018bitmap\u2019 appeared in execution plans. Each time I promised \u201cto talk about it later,\u201d and now this \u201clater\u201d is here.<\/p>\n<p>It is critically important to fully understand how the indexes bitmapping works in PostgreSQL, especially because other major DBMS do not have similar access methods, documentation does not go into many details about it, and there is a lot of confusion among users regarding how it works. Let\u2019s proceed with uncovering this mystery.<\/p>\n<p>Depending on the index selectivity, PostgreSQL utilizes one of two methods. The first is an index scan; the second depends on a bitmap index scan, followed by a bitmap heap scan.<\/p>\n<p>In the previous blog, we discussed how the index scan works. An index entry includes addresses of the records which match the indexed value; this address contains the block address and the offset of the record within the block. In an index scan, the database engine reads each entry of the index that satisfies the filter condition and retrieves blocks in index order. Because the underlying table is a heap, multiple index entries might point to the same block. If an index has a low selectivity, these situations are rare, and there are high chances that no block will be read (or rather attempted to be read) more than once. However, the situation is different when the selectivity is high, multiple indexes are used, or the filtering condition differs from equal. It these cases, a bitmap index scan will be used. This method builds a heap bitmap in memory. The whole bitmap is a single bit array, with as many bits as there are heap blocks in the table being scanned.<\/p>\n<p>Each index used to satisfy selection criteria is used sequentially. To perform a bitmap index scan, first, a bitmap is created with all entries initially set to 0 (false). Whenever an index entry that matches the search condition is found, the bit corresponding to the heap block indicated by the index entry is set to 1 (true). The second and any additional bitmap index scans do the same thing with the indexes corresponding to the additional search conditions. Once all bitmaps have been created, the engine performs a bitwise logical AND operation to find which blocks contain requested values for all selection criteria, producing a final candidate list. This means that blocks that satisfy only one of the two criteria in a logical AND never have to be accessed. An illustration is shown in Figure 2.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-100732\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/11\/2.png\" alt=\"\" width=\"1142\" height=\"264\" \/><\/p>\n<p><strong>Figure 2.\u2002Using bitmaps for table access through multiple indexes<\/strong><\/p>\n<p>Note, that <code>AND<\/code> is not the only logical operator which can be used on bitmaps. If some conditions in the selection criteria are connected with <code>OR<\/code>, the logical <code>OR<\/code> will be applied, and if a query has more search conditions on a single table, more bitmaps can be put to use.<\/p>\n<p>After the final candidate list is computed, the candidate blocks are read sequentially using a bitmap heap scan (a heap scan based on a bitmap), and for each block, the individual records are examined to re-check the search conditions. Note that requested values may reside in different rows in the same block. The bitmap ensures that relevant rows will not be missed but does not guarantee that all scanned blocks contain a relevant row.<\/p>\n<p>In Figure 1, lines 15 and 17 represent two bitmap index scans, line 14 represents bitmap AND, and line 12 shows bitmap heap scan. To summarize, bitmap access method always includes these three steps:<\/p>\n<ul>\n<li>Bitmap index scan(s)<\/li>\n<li>Logical AND\/OR<\/li>\n<li>Bitmap Heap scan<\/li>\n<\/ul>\n<h2>Do we need compound indexes?<\/h2>\n<p>Same as other RDBMS, PostgreSQL allows compound (or multi-column) indexes. Since PostgreSQL can utilize multiple indexes for the same query using bitmap index scan, is there any reason to create compound indexes? Let\u2019s run some experiments.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT \r\n      scheduled_departure\u00a0,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0scheduled_arrival\r\nFROM flight\r\nWHERE departure_airport='ORD' \r\n  AND arrival_airport='JFK'\r\n  AND scheduled_departure BETWEEN '2023-07-03' AND '2023-07-04';<\/pre>\n<p>The execution plan for this <code>SELECT<\/code> statement is presented in Figure 3.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1264\" height=\"389\" class=\"wp-image-100724\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/11\/a-screenshot-of-a-computer-description-automatica-128.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with medium confidence\" \/><\/p>\n<p><strong>Figure 3. Execution plan with three indexes<\/strong><\/p>\n<p>There are three indexes which support all three search criteria, and Postgres takes advantage of all three of them. Let\u2019s create a compound index on all three columns:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE INDEX flight_depart_arr_sched_dep ON  flight(\r\n        departure_airport,\r\n        arrival_airport,\r\n        scheduled_departure);<\/pre>\n<p>The new execution plan is presented on Figure 4:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1264\" height=\"168\" class=\"wp-image-100725\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/11\/a-picture-containing-text-screenshot-font-line.png\" alt=\"A picture containing text, screenshot, font, line\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 4. Execution plan with compound index<\/strong><\/p>\n<p>This new compound index will support searches by departure_airport, by departure_airport and arrival_airport, and by departure_airport, arrival_airport, and scheduled_departure. It will not support, however, the searches by arrival_airport or scheduled_departure. Indeed, recall how b-tree indexes are built and how they are used: we rely on the <strong>ordering<\/strong> of indexed values. The values of compound indexes are ordered first by the first column, then be the second column, and so on.<\/p>\n<p>The query<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT \r\n      departure_airport,\r\n       scheduled_arrival,\r\n       scheduled_departure\r\nFROM flight\r\nWHERE  arrival_airport='JFK'\r\n       AND scheduled_departure \r\n          BETWEEN '2023-07-03' AND '2023-07-04'<\/pre>\n<p>\u2026will produce the execution plan shown in Figure 5.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1344\" height=\"454\" class=\"wp-image-100726\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/11\/a-screenshot-of-a-computer-description-automatica-129.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p><strong>Figure 5.\u2002Compound index is not used<\/strong><\/p>\n<p>On the other hand, the query<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT \r\n      scheduled_departure\u00a0,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0scheduled_arrival\r\nFROM flight\r\nWHERE departure_airport='ORD' AND arrival_airport='JFK'\r\n  AND scheduled_arrival BETWEEN '2023-07-03' AND '2023-07-04';<\/pre>\n<p>\u2026will use the compound index, although only for the first two columns, as shown in Figure 6.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1326\" height=\"340\" class=\"wp-image-100727\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/11\/a-screenshot-of-a-computer-description-automatica-130.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated with low confidence\" \/><\/p>\n<p><strong>Figure 6.\u2002A plan that uses the compound index for the first two columns<\/strong><\/p>\n<p>In general, an index on (<code>X,Y,Z<\/code>) will be used for searches on <code>X<\/code>, (<code>X,Y<\/code>), and (<code>X,Y,Z<\/code>) and even (<code>X,Z<\/code>) but not on <code>Y<\/code> alone and not on <code>(Y,Z)<\/code>.\u00a0Thus, when a compound index is created, it\u2019s not enough to decide which columns to include; their order must also be considered.<\/p>\n<p>Why create compound indexes? After all, the previous section demonstrated that using\u00a0several indexes\u00a0together will work just fine. Most times, the decision on whether to create a compound index is based on whether they can provide additional selectivity and improve performance of critical queries. I will cover other potential advantages of compound indexes in the next blog.<\/p>\n<h2>Summary<\/h2>\n<p>PostgreSQL has multiple ways to index the tables, and knowing what available options are can dramatically improve an application performance. Remember, that so far, we only talked about b-tree indexes, and we are not done with them yet, and we didn\u2019t even start talking about other index types!<\/p>\n<p><em>Did you have fun exploring indexes in PostgreSQL? Would you like to learn more about indexing? Look for the next up in this series.<\/em><\/p>\n<h2>Appendix \u2013 Setting up the training database<\/h2>\n<p>If you want to repeat the experiments in this article, the instructions for installing the base database can be found in the Appendix of the first article in this series. If you are starting here, use those instructions to download the database. Once you have downloaded and restored the database, you will need to create the following additional indexes on your copy of the postges_air database:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SET search_path TO postgres_air;\r\nCREATE INDEX flight_arrival_airport ON flight\u00a0\u00a0(arrival_airport);\r\nCREATE INDEX booking_leg_flight_id ON booking_leg\u00a0\u00a0(flight_id);\r\nCREATE INDEX flight_actual_departure ON flight\u00a0\u00a0(actual_departure);\r\nCREATE INDEX boarding_pass_booking_leg_id ON boarding_pass\u00a0\u00a0(booking_leg_id);\r\nCREATE INDEX booking_update_ts ON booking  (update_ts);<\/pre>\n<p>Don\u2019t forget to run ANALYZE on all the tables for which you build new indexes:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">ANALYZE  flight\u00a0;\r\nANALYZE  booking_leg;\r\nANALYZE  booking;\r\nANALYZE boarding_pass;<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog, we continue our exploration on PostgreSQL indexes which we started here. In that article, we learned what an index is, and how exactly indexes can help with query execution. But there is much more to learn about indexes! In this blog, we will keep exploring B-tree indexes. We will learn whether (and&#8230;&hellip;<\/p>\n","protected":false},"author":341426,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143534],"tags":[159028,159066],"coauthors":[158986],"class_list":["post-100722","post","type-post","status-publish","format-standard","hentry","category-featured","category-postgresql","tag-planetpostgresqlhenriettadombrovskaya","tag-postgresql-101-webinar-sidebar"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/100722","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341426"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=100722"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/100722\/revisions"}],"predecessor-version":[{"id":100730,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/100722\/revisions\/100730"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=100722"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=100722"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=100722"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=100722"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}