Elasticsearch

Elasticsearch is an open-source search and analytics engine, often deployed together with Logstash and Kibana as the ELK Stack. It can power many types of search use cases. It is commonly used to ingest log data and then visualize trends, but it was originally designed to power web search, and it is what WordPress VIP, WordPress.com, and Jetpack Search use.

Elasticsearch (ES) runs in its own environment with its own data store, and all interaction with it happens through REST API requests.
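As a rough illustration of what such a REST interaction looks like, the sketch below builds (but does not send) a search request using only the Python standard library. The endpoint URL and index name are placeholders invented for this example, not values from any real cluster.

```python
import json
import urllib.request

# Hypothetical endpoint and index name, used only for illustration.
ES_ENDPOINT = "https://es.example.com:9200"
INDEX = "example-post-index"


def build_search_request(endpoint: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the index's _search API."""
    return urllib.request.Request(
        url=f"{endpoint}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(ES_ENDPOINT, INDEX, {"query": {"match_all": {}}})
# Passing `request` to urllib.request.urlopen() would send it to the cluster.
```

Keeping the request construction separate from the network call makes it easy to inspect or log exactly what would be sent.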

When Elasticsearch is powering a site’s search, it will need to continually index the content on the site, and then as search requests are made, API calls are made to tell ES what to search for and how to weight results. The results are usually just used to identify the matching content: the WordPress database remains the “source of truth”.

To integrate a WordPress site with Elasticsearch, you’ll need code that monitors content changes and sends them to the ES “cluster” for indexing. Most importantly, you’ll also need code that intercepts search queries and, instead of running LIKE queries against the MySQL database, sends an API request to the Elasticsearch endpoint.

The actual queries are usually written in Query DSL, Elasticsearch’s JSON-based query language, which supports many types of queries.
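To make this concrete, here is a minimal sketch of one possible Query DSL body, expressed as a Python dictionary. It uses a `multi_match` query with a field boost to weight title matches more heavily. The field names `post_title` and `post_content` are assumptions for illustration; the fields actually available depend on how the index was mapped.

```python
import json


def build_search_body(terms: str, size: int = 5) -> dict:
    """Return a Query DSL body that searches two fields, weighting titles higher."""
    return {
        "query": {
            "multi_match": {
                "query": terms,
                # "^3" boosts matches in the title; both field names are assumed.
                "fields": ["post_title^3", "post_content"],
            }
        },
        # Only the post ID is requested, since the post data lives in WordPress.
        "_source": ["post_id"],
        "size": size,
    }


print(json.dumps(build_search_body("elasticsearch indexing"), indent=2))
```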

The ES endpoint then returns a set of search results containing post IDs:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 569,
      "relation": "eq"
    },
    "max_score": 540.97675,
    "hits": [
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "4536344",
        "_score": 540.97675,
        "_source": {
          "post_id": 4536344
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "105829",
        "_score": 516.1369,
        "_source": {
          "post_id": 105829
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "306074",
        "_score": 516.1369,
        "_source": {
          "post_id": 306074
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "3688167",
        "_score": 476.97778,
        "_source": {
          "post_id": 3688167
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "4616046",
        "_score": 476.97778,
        "_source": {
          "post_id": 4616046
        }
      }
    ]
  }
}
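Pulling the post IDs out of a response of this shape takes only a few lines. The sketch below assumes each hit carries a `post_id` in its `_source`, as in the response above, and uses a trimmed-down copy of that response as sample data.

```python
def extract_post_ids(response: dict) -> list[int]:
    """Return post IDs from a search response, in the order ES ranked them."""
    hits = response.get("hits", {}).get("hits", [])
    return [int(hit["_source"]["post_id"]) for hit in hits]


# A trimmed-down response with the same shape as the one above.
sample_response = {
    "hits": {
        "total": {"value": 569, "relation": "eq"},
        "hits": [
            {"_id": "4536344", "_score": 540.97675, "_source": {"post_id": 4536344}},
            {"_id": "105829", "_score": 516.1369, "_source": {"post_id": 105829}},
        ],
    }
}

post_ids = extract_post_ids(sample_response)
```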

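Those post IDs can then be used to fetch the actual data from the database and display post summaries, using a query like the following (the IDs in this example are illustrative and do not correspond to the response above):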

SELECT wp_posts.ID
FROM wp_posts
WHERE 1=1
AND wp_posts.ID IN (426,506,192)
AND wp_posts.post_type IN ('post', 'page')
AND wp_posts.post_status = 'publish'
ORDER BY wp_posts.post_date DESC
LIMIT 0, 3
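The sketch below shows the same idea in runnable form. It is an assumption-laden stand-in: it uses an in-memory SQLite database instead of MySQL, a cut-down `wp_posts` table, and made-up rows. The point it illustrates is that the database filter decides what is shown, so an ID returned by ES for a post that is no longer published is simply dropped.

```python
import sqlite3


def fetch_published_ids(conn: sqlite3.Connection, post_ids: list[int]) -> list[int]:
    """Return the subset of post_ids that the database says are published."""
    if not post_ids:
        return []
    # One "?" placeholder per ID keeps the query parameterized.
    placeholders = ",".join("?" for _ in post_ids)
    rows = conn.execute(
        f"SELECT ID FROM wp_posts "
        f"WHERE ID IN ({placeholders}) "
        f"AND post_type IN ('post', 'page') "
        f"AND post_status = 'publish' "
        f"ORDER BY post_date DESC",
        post_ids,
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in for the WordPress database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts "
    "(ID INTEGER PRIMARY KEY, post_type TEXT, post_status TEXT, post_date TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?, ?)",
    [
        (426, "post", "publish", "2021-01-10 00:00:00"),
        (506, "page", "publish", "2021-03-05 00:00:00"),
        (192, "post", "draft", "2021-02-01 00:00:00"),
    ],
)

# Post 192 is a draft, so it is filtered out even though it was requested.
published = fetch_published_ids(conn, [426, 506, 192])
```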

In a typical search request, the sequence is:

  • The normal WPDB query is intercepted.
  • A request is made to the ES endpoint with the details from the query (i.e. the search terms).
  • A response is received, containing a list of matching post IDs (and often other data, such as rankings).
  • A new DB query is made to get the list of posts, or a series of get_post() calls are made for individual posts.
  • Results are returned for the matching posts.
  • The results are rendered on the page.
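The steps above can be sketched as a single function that takes the ES lookup and the database lookup as parameters. Everything here is hypothetical: `run_search`, the two stand-in functions, and the sample data are invented for illustration and are not part of WordPress or any plugin.

```python
from typing import Callable


def run_search(
    terms: str,
    search_es: Callable[[str], dict],
    load_posts: Callable[[list[int]], list[dict]],
) -> list[dict]:
    """Ask ES which posts match, then load those posts from the database."""
    response = search_es(terms)
    post_ids = [hit["_source"]["post_id"] for hit in response["hits"]["hits"]]
    if not post_ids:
        return []
    return load_posts(post_ids)


# Stand-ins for the ES request and the database lookup (made-up data).
def fake_search_es(terms: str) -> dict:
    return {"hits": {"hits": [{"_source": {"post_id": 7}}, {"_source": {"post_id": 3}}]}}


FAKE_DB = {3: {"ID": 3, "post_title": "Third"}, 7: {"ID": 7, "post_title": "Seventh"}}


def fake_load_posts(post_ids: list[int]) -> list[dict]:
    return [FAKE_DB[post_id] for post_id in post_ids if post_id in FAKE_DB]


results = run_search("example", fake_search_es, fake_load_posts)
titles = [post["post_title"] for post in results]
```

Passing the two lookups in as functions keeps the flow testable without a running cluster or database.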

During publishing actions, action hooks capture the change events and identify the changed data to be indexed. Usually, the actual indexing communication with ES happens asynchronously, so there may be a slight delay after a change in WordPress before the change appears in Elasticsearch. This is one reason the database should always be used as the source of truth when rendering result pages.
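One simple way to picture asynchronous indexing is a queue that the hook callback writes to and a separate step that drains it later. The sketch below is only a model of that idea, with hypothetical function names (`on_post_saved`, `flush_pending`) and a stand-in for the call that would send data to ES; a real integration may batch and schedule work quite differently.

```python
import queue
from typing import Callable

# Post IDs waiting to be sent to ES.
pending: "queue.Queue[int]" = queue.Queue()


def on_post_saved(post_id: int) -> None:
    """Hook callback: record the change and return without contacting ES."""
    pending.put(post_id)


def flush_pending(send_to_es: Callable[[list[int]], None]) -> int:
    """Drain the queue and index the changed posts in one batch."""
    batch: list[int] = []
    while True:
        try:
            post_id = pending.get_nowait()
        except queue.Empty:
            break
        # A post saved several times only needs to be indexed once.
        if post_id not in batch:
            batch.append(post_id)
    if batch:
        send_to_es(batch)
    return len(batch)


sent_batches: list[list[int]] = []  # stand-in for requests sent to ES
on_post_saved(42)
on_post_saved(42)
on_post_saved(99)
queued_before_flush = pending.qsize()
flushed = flush_pending(sent_batches.append)
```

Between `on_post_saved()` and `flush_pending()`, the change exists in WordPress but not yet in the index, which is the delay described above.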

Last updated: April 09, 2021