Mapping in SearchPress

Matthew Boynes edited this page Apr 16, 2020 · 4 revisions

Mapping Overview

As much as possible, we try to maintain a 1:1 relationship between the WordPress database structure and the Elasticsearch schema. In some cases we have to take creative liberties because Elasticsearch searches content so differently; for instance, each date field is broken into an object that holds individual parts of that date (year, month, day, and so on). Furthermore, while parent-child relationships are technically possible in Elasticsearch, they are not powerful or flexible enough to cover WordPress' needs, so we instead store related data, such as term and author data, within each post. This increases our storage needs but significantly speeds up queries; disk space is cheap and time is expensive, so it is a worthwhile trade-off.

This mapping allows SearchPress to query almost everything WP_Query can query.

  • post_id: long
  • post_title: string (analyzed)
    • post_title.raw: string (not analyzed)
  • post_excerpt: string (analyzed)
  • post_content: string (analyzed)
  • post_status: string (not analyzed)
  • post_name: string (analyzed)
    • post_name.raw: string (not analyzed)
  • post_parent: long (See Parent below)
  • parent_status: string (not analyzed)
  • post_type: string (analyzed)
    • post_type.raw: string (not analyzed)
  • post_mime_type: string (not analyzed)
  • post_password: string (not analyzed)
  • menu_order: integer
  • permalink: string (analyzed)
  • post_author: object (See Author below)
    • post_author.user_id: long
    • post_author.display_name: string (analyzed)
    • post_author.login: string (not analyzed)
    • post_author.user_nicename: string (not analyzed)
  • post_date: object
    • post_date.date: date (YYYY-MM-dd HH:mm:ss)
    • post_date.year: short
    • post_date.month: byte
    • post_date.day: byte
    • post_date.hour: byte
    • post_date.minute: byte
    • post_date.second: byte
    • post_date.week: byte
    • post_date.day_of_week: byte
    • post_date.day_of_year: short
    • post_date.seconds_from_day: integer
    • post_date.seconds_from_hour: short
  • post_date_gmt: object
    • post_date_gmt.date: date (YYYY-MM-dd HH:mm:ss)
    • post_date_gmt.year: short
    • post_date_gmt.month: byte
    • post_date_gmt.day: byte
    • post_date_gmt.hour: byte
    • post_date_gmt.minute: byte
    • post_date_gmt.second: byte
    • post_date_gmt.week: byte
    • post_date_gmt.day_of_week: byte
    • post_date_gmt.day_of_year: short
    • post_date_gmt.seconds_from_day: integer
    • post_date_gmt.seconds_from_hour: short
  • post_modified: object
    • post_modified.date: date (YYYY-MM-dd HH:mm:ss)
    • post_modified.year: short
    • post_modified.month: byte
    • post_modified.day: byte
    • post_modified.hour: byte
    • post_modified.minute: byte
    • post_modified.second: byte
    • post_modified.week: byte
    • post_modified.day_of_week: byte
    • post_modified.day_of_year: short
    • post_modified.seconds_from_day: integer
    • post_modified.seconds_from_hour: short
  • post_modified_gmt: object
    • post_modified_gmt.date: date (YYYY-MM-dd HH:mm:ss)
    • post_modified_gmt.year: short
    • post_modified_gmt.month: byte
    • post_modified_gmt.day: byte
    • post_modified_gmt.hour: byte
    • post_modified_gmt.minute: byte
    • post_modified_gmt.second: byte
    • post_modified_gmt.week: byte
    • post_modified_gmt.day_of_week: byte
    • post_modified_gmt.day_of_year: short
    • post_modified_gmt.seconds_from_day: integer
    • post_modified_gmt.seconds_from_hour: short
  • terms.*: object (See Terms below)
    • terms.*.name: string (analyzed)
    • terms.*.name.raw: string (not analyzed)
    • terms.*.term_id: long
    • terms.*.parent: long
    • terms.*.slug: string (not analyzed)
  • post_meta.*: object (See Meta below)
    • post_meta.*.value: string (analyzed)
    • post_meta.*.raw: string (not analyzed)
    • post_meta.*.long: long
    • post_meta.*.double: double
    • post_meta.*.boolean: boolean
    • post_meta.*.date: date (YYYY-MM-dd)
    • post_meta.*.datetime: date (YYYY-MM-dd HH:mm:ss)
    • post_meta.*.time: date (HH:mm:ss)
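To make the date objects in the list above concrete, here is a minimal Python sketch that splits one timestamp into the same sub-fields. It is an illustration only, not SearchPress' own code: the function name date_parts is hypothetical, and the week, day_of_week, and day_of_year conventions (ISO-8601 week and weekday, 1-based day of year) are assumptions that the plugin may not share.

```python
from datetime import datetime


def date_parts(value: str) -> dict:
    """Split a 'YYYY-MM-dd HH:mm:ss' string into the sub-fields of a date object."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    iso = dt.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return {
        "date": value,
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
        # Assumption: ISO-8601 week number and weekday (Monday = 1).
        "week": iso[1],
        "day_of_week": iso[2],
        # Assumption: 1-based day of the year.
        "day_of_year": dt.timetuple().tm_yday,
        # Seconds elapsed since midnight and since the top of the hour.
        "seconds_from_day": dt.hour * 3600 + dt.minute * 60 + dt.second,
        "seconds_from_hour": dt.minute * 60 + dt.second,
    }


parts = date_parts("2020-04-16 13:05:09")
```

Storing these parts as separate numeric fields is what lets a query filter on, say, post_date.month or post_date.hour directly instead of computing them from the full date at search time.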

Terms

SearchPress stores terms as objects within posts. The default SearchPress mapping has a dynamic mapping which stores the terms by taxonomy, and stores the term's ID, name (both analyzed and not analyzed), parent ID, and slug (not analyzed). For instance, to refer to a category slug, you would use terms.category.slug and to refer to a tag's ID you would use terms.post_tag.term_id.
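As a rough sketch of how those field paths could be used, the Python snippet below builds an Elasticsearch query body that filters on a category slug and a tag ID. The helper term_filter and the values "news" and 42 are hypothetical; only the terms.category.slug and terms.post_tag.term_id paths come from the mapping.

```python
def term_filter(taxonomy: str, field: str, value) -> dict:
    """Build an exact-match filter on terms.<taxonomy>.<field>."""
    return {"term": {f"terms.{taxonomy}.{field}": value}}


# Posts in the (hypothetical) "news" category that also carry tag ID 42.
query = {
    "query": {
        "bool": {
            "filter": [
                term_filter("category", "slug", "news"),
                term_filter("post_tag", "term_id", 42),
            ]
        }
    }
}
```

A term query suits slug and term_id because neither field is analyzed, so the stored value is compared exactly.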

Author

SearchPress stores author data as an object within posts. By default, the author's ID, display name, login, and "nicename" are stored. If you're using Co-Authors Plus, that data is stored as taxonomy terms.
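The split between analyzed and not-analyzed author fields affects which kind of query fits each one. The sketch below shows one plausible pairing in Python; both helper names are hypothetical, and the field paths are taken from the mapping list.

```python
def author_name_query(name: str) -> dict:
    """Full-text match against the analyzed display name."""
    return {"match": {"post_author.display_name": name}}


def author_login_filter(login: str) -> dict:
    """Exact match against the not-analyzed login."""
    return {"term": {"post_author.login": login}}


# Hypothetical values: search by a name fragment, then restrict to one login.
query = {
    "query": {
        "bool": {
            "must": [author_name_query("boynes")],
            "filter": [author_login_filter("mboynes")],
        }
    }
}
```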

Meta

SearchPress stores meta as objects within posts. The default SearchPress mapping has a dynamic mapping which stores post meta by meta key. SearchPress inspects each value prior to indexing, and depending on what the value looks like, it may have any of these properties:

  • value: This is always present and contains an analyzed string cast of the value.
  • raw: This is always present and contains the raw (not analyzed) string cast of the value.
  • long: If the post meta is numeric, this contains the integer cast of the value.
  • double: If the post meta is numeric, this contains the float cast of the value.
  • boolean: This is always present and contains a boolean cast of the value.
  • date: If the post meta looks like a date, this contains the date formatted as YYYY-MM-dd (Y-m-d in PHP).
  • datetime: If the post meta looks like a date, this contains the date and time formatted as YYYY-MM-dd HH:mm:ss (Y-m-d H:i:s in PHP).
  • time: If the post meta looks like a date, this contains the time formatted as HH:mm:ss (H:i:s in PHP).
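The casting rules above can be approximated with a short Python sketch. This is not the plugin's implementation: the function names are hypothetical, and both the "looks like a date" test (two fixed input formats) and the boolean rule (empty, "0", and "false" are false) are assumptions chosen to keep the example small.

```python
import math
from datetime import datetime

# Assumption: only these two input shapes count as "looks like a date".
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date(text: str):
    """Return a datetime if text matches a known format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def cast_meta(value) -> dict:
    """Build the per-type properties for one meta value."""
    text = str(value)
    out = {
        "value": text,  # indexed as an analyzed string
        "raw": text,    # indexed as a not-analyzed string
        # Assumption: empty, "0", and "false" are false; anything else is true.
        "boolean": text.strip().lower() not in ("", "0", "false"),
    }
    try:
        number = float(text)
    except ValueError:
        number = None
    # Skip inf/nan, which have no integer cast.
    if number is not None and math.isfinite(number):
        out["long"] = int(number)
        out["double"] = number
    parsed = parse_date(text)
    if parsed is not None:
        out["date"] = parsed.strftime("%Y-%m-%d")
        out["datetime"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
        out["time"] = parsed.strftime("%H:%M:%S")
    return out


price = cast_meta("19.99")        # gains long and double
event = cast_meta("2020-04-16")   # gains date, datetime, and time
```

Indexing every applicable cast side by side means a query can pick the property that suits it, for example a numeric range on post_meta.*.double or an exact match on post_meta.*.raw, without knowing in advance how the value was stored in WordPress.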

Note that post meta indexing is "opt-in" and by default, no post meta will be indexed for performance purposes. See Indexing in SearchPress for more information.

Parent

If a post has a parent, the parent's ID and status are stored with the post by default (as post_parent and parent_status). If the child post's status is "inherit", the parent's status determines whether the post is indexed and whether it appears in search results.
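One way to express the "inherit" rule at query time is a filter that accepts a post when either its own status is allowed, or its status is "inherit" and its parent's status is allowed. The Python sketch below builds such a filter; the function name status_filter is hypothetical, and this is an illustration of the idea rather than the query SearchPress itself generates.

```python
def status_filter(allowed: list) -> dict:
    """Match posts whose own status, or inherited parent status, is allowed."""
    return {
        "bool": {
            "should": [
                # The post's own status is allowed.
                {"terms": {"post_status": allowed}},
                # Or the post inherits, and the parent's status is allowed.
                {
                    "bool": {
                        "filter": [
                            {"term": {"post_status": "inherit"}},
                            {"terms": {"parent_status": allowed}},
                        ]
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


query = {"query": {"bool": {"filter": [status_filter(["publish"])]}}}
```

Because parent_status is stored on the child document, this check needs no join or second lookup against the parent post.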