Validating, sanitizing, and escaping

When writing your theme and plugin code, it’s important to be mindful of how you handle data coming into WordPress and how it’s presented to the end user. This commonly comes up when building a settings page for your theme, creating and manipulating shortcodes, or saving and rendering extra data associated with a post. There is a distinction between how input and output are managed, and this document will walk you through that.

Note

For more information on why WordPress.com VIP takes these practices so seriously, read The Importance of Escaping All The Things, which discusses why escaping (sanitizing input and escaping output) is a critical aspect of web application security.

Guiding principles

  1. Never trust user input.
  2. Escape as late as possible.
  3. Escape everything from untrusted sources (e.g., databases and users), third-parties (e.g., Twitter), etc.
  4. Never assume anything.
  5. Sanitization is okay, but validation/rejection is better.

Validating: checking user input

To validate is to ensure that what the user submitted matches the data you requested. There are several core methods you can use for input validation; the best usage depends on the type of fields you’d like to validate. Let’s take a look at an example.

Your form might include a field for a zip code:

<input id="my-zipcode" type="text" maxlength="5" name="my-zipcode" />

The user is limited to five characters of input, but there’s no limitation on what they can input. They could enter “11221” or “eval(”. If we’re saving to the database, we don’t want to give the user unrestricted write access.

This is where validation plays a role. To further limit the user’s input, code can check each field for its proper data type. If it’s not of the proper data type, it will be discarded. For instance, to check the my-zipcode field, you might do something like this:

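// Cast the submitted value to an integer; non-numeric or missing input becomes 0.
$safe_zipcode = intval( $_POST['my-zipcode'] ?? '' );
// Treat 0 as invalid and store an empty value instead.
if ( ! $safe_zipcode ) {
	$safe_zipcode = '';
}
update_post_meta( $post->ID, 'my_zipcode', $safe_zipcode );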

The intval() function casts user input as an integer, and returns zero if the input was a non-numeric value. The code then checks whether the value ended up as zero. If it did, it saves an empty value to the database. Otherwise, it saves the validated zip code.

Note that you could go even further and make sure the zip code is actually a valid one based on expected ranges and lengths (e.g., 111111111 is not a valid zip code but would be saved without complaint by the code above). Casting to an integer also drops leading zeros, so a zip code such as 01234 would be stored as 1234.
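A stricter check along those lines might look like the following minimal sketch. It assumes US five-digit zip codes and uses a hypothetical helper name, my_validate_zipcode(), which is not part of WordPress. The value stays a string so leading zeros are kept, and anything that doesn’t match is rejected rather than repaired.

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): returns the zip code
 * as a string if it is exactly five ASCII digits, or '' otherwise.
 */
function my_validate_zipcode( $raw ) {
	// Only strings and integers can hold a zip code; reject arrays, null, etc.
	if ( ! is_string( $raw ) && ! is_int( $raw ) ) {
		return '';
	}

	// Ignore surrounding whitespace from the form submission.
	$zipcode = trim( (string) $raw );

	// Safelist: exactly five digits. \A and \z anchor the match to the whole string.
	if ( 1 !== preg_match( '/\A[0-9]{5}\z/', $zipcode ) ) {
		return '';
	}

	return $zipcode;
}
```

The earlier snippet could then call my_validate_zipcode( $_POST['my-zipcode'] ?? '' ) in place of intval() and save the result only when it is not empty.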

This style of validation most closely follows WordPress’ safelist philosophy: only allow the user to input what you’re expecting. A number of helper functions are available for most data types.

Sanitizing: cleaning user input

Sanitization is a more liberal approach to accepting user data. It is the best approach when there’s a range of acceptable input.

For instance, in a form field like this:

<input id="title" type="text" name="title" />

The data could be sanitized with the sanitize_text_field() function:

$title = sanitize_text_field( $_POST['title'] );
update_post_meta( $post->ID, 'title', $title );

Behind the scenes, the function does the following:

  • Checks for invalid UTF-8
  • Converts single < characters to entities
  • Strips all tags
  • Removes line breaks, tabs, and extra whitespace
  • Strips percent-encoded octets

The sanitize_*() class of helper functions ensures you end up with safe data while requiring minimal effort on your part.

In some instances, using wp_kses and its related functions can easily clean HTML while keeping anything relevant to your needs.
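As a rough illustration of matching the sanitizer to the kind of data expected, here is a minimal sketch. The helper name my_sanitize_profile_fields() and the field names are hypothetical; the sanitizing functions themselves are WordPress core functions, so the code assumes it runs inside WordPress and that each submitted value is a scalar.

```php
<?php
/**
 * Hypothetical helper: picks a sanitizer per field based on the kind of
 * data expected. The field names are made up for illustration.
 */
function my_sanitize_profile_fields( array $input ) {
	// WordPress adds slashes to superglobals such as $_POST; remove them first.
	$input = wp_unslash( $input );

	return array(
		// Single-line plain text: tags and line breaks are stripped.
		'title' => sanitize_text_field( $input['title'] ?? '' ),
		// Email address: characters not allowed in an email are removed.
		'email' => sanitize_email( $input['email'] ?? '' ),
		// Non-negative integer.
		'age'   => absint( $input['age'] ?? 0 ),
		// Rich text: keeps only the markup permitted in post content.
		'bio'   => wp_kses_post( $input['bio'] ?? '' ),
	);
}
```

A form handler might call my_sanitize_profile_fields( $_POST ) and then save each returned value with update_post_meta().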

Escaping: securing output

Escaping handles security on the other end of the spectrum. To escape is to take the data you may already have and help secure it prior to rendering it for the end user. WordPress has a few helper functions for most of what you’ll commonly need to do:

esc_html() should be used any time your HTML element encloses a section of data you’re outputting.

<h4><?php echo esc_html( $title ); ?></h4>

esc_url() should be used on all URLs, including those in the ‘src’ and ‘href’ attributes of an HTML element.

<img alt="" src="<?php echo esc_url( $great_user_picture_url ); ?>" />

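esc_js() is intended for inline JavaScript. It escapes a value so that it can be placed inside a single-quoted JavaScript string.

<div onclick="alert( '<?php echo esc_js( $value ); ?>' );"></div>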

esc_attr() can be used on everything else that’s printed into an HTML element’s attribute.

<ul class="<?php echo esc_attr( $stored_class ); ?>">

wp_kses() can be used on everything that is expected to contain HTML. There are several variants of the main function, each featuring a different list of built-in defaults. A popular example is wp_kses_post(), which allows all markup normally permitted in posts. You can of course roll your own filter by using wp_kses() directly.

<?php
echo wp_kses_post( $partial_html );
echo wp_kses(
	$another_partial_html,
	array(
		'a'      => array(
			'href'  => array(),
			'title' => array(),
		),
		'br'     => array(),
		'em'     => array(),
		'strong' => array(),
	)
);
?>

As an example, passing an array to wp_kses() containing the member

'a' => array( 'href' => array(), 'title' => array() )

means that only those two HTML attributes will be allowed for a tags; all others will be stripped. Mapping an element to an empty array means that no attributes are allowed for that element, so any attributes it carries will be stripped.

There has historically been a perception that wp_kses() is slow. While it is a bit slower than the other escaping functions, the difference is minimal and does not have as much of an impact as most slow queries or uncached functions would. (For more information, read Zack Tollman’s investigation into wp_kses() performance.)

It’s important to note that most WordPress functions properly prepare the data for output, and you don’t need to escape again.

rawurlencode() should be used over urlencode() to ensure URLs are correctly encoded. Only legacy systems should use urlencode().

<?php echo esc_url( 'http://example.com/a/safe/url?parameter=' . rawurlencode( $stored_class ) ); ?>
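The difference between the two functions is easiest to see with a value that contains a space. The snippet below is plain PHP with a made-up sample value:

```php
<?php
// Hypothetical value containing spaces and an ampersand.
$value = 'blue sky & sea';

// urlencode() follows the form-encoding convention: spaces become "+".
$form_encoded = urlencode( $value );   // blue+sky+%26+sea

// rawurlencode() follows RFC 3986: spaces become "%20".
$raw_encoded = rawurlencode( $value ); // blue%20sky%20%26%20sea
```

A "+" is only read as a space by consumers that expect form encoding, whereas "%20" is read as a space in any part of a URL.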

Always escape late

It’s best to do the output escaping as late as possible, ideally as data is being outputted.

// Okay, but not that great.
$url = esc_url( $url );
$text = esc_html( $text );
echo '<a href="'. $url . '">' . $text . '</a>';

// Much better!
echo '<a href="'. esc_url( $url ) . '">' . esc_html( $text ) . '</a>';

This is for a few reasons:

  • It makes code reviews and deploys happen faster because the output can be deemed safe at a glance, rather than by hunting through many lines of code.
  • Something could inadvertently change the variable between when it was first escaped or cast and when it’s outputted, introducing a potential vulnerability.
  • Late escaping makes it easier to do automatic code scanning (saving time and cutting down on review/deploy times).
  • Late escaping whenever possible makes the code much more robust and future-proof.
  • Escaping/casting on output removes any ambiguity and adds clarity (always develop for the maintainer).
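The same principle extends to markup with several dynamic values. One common pattern, sketched here with a hypothetical helper name my_render_link(), is to pass each value through its escaping function directly in the printf() call, so the escaping and the output sit in one statement. It assumes the code runs inside WordPress, where esc_attr(), esc_url(), and esc_html() are available.

```php
<?php
/**
 * Hypothetical template helper: every dynamic value is escaped in the
 * same statement that prints it.
 */
function my_render_link( $url, $text, $class ) {
	printf(
		'<a class="%1$s" href="%2$s">%3$s</a>',
		esc_attr( $class ), // Attribute context.
		esc_url( $url ),    // URL context.
		esc_html( $text )   // HTML text context.
	);
}
```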

Escape on string creation

It is sometimes not practical to escape late. In a few rare circumstances you cannot pass the output to wp_kses since by definition it would strip the scripts that are being generated.

In situations like this, always escape while creating the string and store the value in a variable that is suffixed with _escaped, _safe, or _clean (e.g., $variable becomes $variable_escaped or $variable_safe).

If a function cannot output internally and late escape, then it must always return “safe” HTML. This allows you to do echo my_custom_script_code(); without needing the script tag to be passed through a version of wp_kses() that would allow such tags.
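A function of that kind might be sketched as follows. The body of my_custom_script_code(), its $settings parameter, and the window.mySettings name are assumptions made for illustration; wp_json_encode() is a WordPress core function, so the code assumes it runs inside WordPress.

```php
<?php
/**
 * Hypothetical helper that builds a script tag. wp_kses() would strip
 * the tag, so the dynamic data is escaped as the string is built and
 * the function returns markup that is safe to echo.
 */
function my_custom_script_code( array $settings ) {
	// JSON-encode the data; the JSON_HEX_* flags keep characters such as
	// "<" and "&" from being interpreted as markup inside the script tag.
	$settings_escaped = wp_json_encode(
		$settings,
		JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
	);

	// Encoding can fail on malformed data; fall back to an empty object.
	if ( false === $settings_escaped ) {
		$settings_escaped = '{}';
	}

	return '<script>window.mySettings = ' . $settings_escaped . ';</script>';
}
```

Because the only dynamic part is held in a variable ending in _escaped, a reviewer can see at the return statement that nothing unescaped is concatenated into the markup.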

WordPress reference

Take a look through the Data Validation Plugin Handbook page to see all of the sanitization and escaping functions WordPress offers.

Last updated: April 09, 2021