In the previous articles we discussed PostgreSQL
indexing engine and the interface of access methods, as well as
hash indexes,
B-trees,
GiST,
SP-GiST,
GIN,
RUM, and
BRIN. But we still need to look at Bloom indexes.
Bloom
General concept
A classical Bloom filter is a data structure that enables us to quickly check membership of an element in a set. The filter is highly compact, but allows false positives: it can mistakenly consider an element to be a member of a set (false positive), but it is not permitted to consider an element of a set not to be a member (false negative).
The filter is an array of
bits (also called a
signature) that is initially filled with zeros.
different hash functions are chosen that map any element of the set to
bits of the signature. To add an element to the set, we need to set each of these bits in the signature to one. Consequently, if all the bits corresponding to an element are set to one, the element can be a member of the set, but if at least one bit equals zero, the element is not in the set for sure.
In the case of a DBMS, we actually have
separate filters built for each index row. As a rule, several fields are included in the index, and it's values of these fields that compose the set of elements for each row.
By choosing the length of the signature
, we can find a trade-off between the index size and the probability of false positives. The application area for Bloom index is large, considerably «wide» tables to be queried using filters on each of the fields. This access method, like BRIN, can be regarded as an accelerator of sequential scan: all the matches found by the index must be rechecked with the table, but there is a chance to avoid considering most of the rows at all.