WordPress.org

Make WordPress Core

Opened 13 months ago

Last modified 13 months ago

#51159 new enhancement

Let's expand our context-specific escaping methods for wp_json_encode().

Reported by: whyisjake Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Security Keywords:
Focuses: javascript, template, coding-standards Cc:

Description (last modified by whyisjake)

This document is largely sourced from a document written by @mdawaffe. Full credit to him for the research and thoughts put forward here. What I'd like to do is turn this into actionable functions and developer best practices.

wp_json_encode() is a handy helper for turning PHP values into JavaScript, and it is widely used to serialize all kinds of variables. Imagine this hypothetical scenario:

BAD:
<pre><?php echo json_encode( $_GET ); ?></pre>

json_encode() serializes data into a string that can be used as a JavaScript literal* (e.g., null, true, false, numbers such as 1234, "strings", [ "arrays" ], and { "objects": "oh my" }).

JSON serialization, though, has nothing to do with HTML, and so does not treat characters that are special in HTML (<, >, &, ', ") in any special way: essentially, the code above is as bad as echo $_GET['foo'].

Securing the above code is as simple as it always is in WordPress. We’re echoing data inside an HTML text node, so we use esc_html():

OK:
<pre><?php echo esc_html( json_encode( $_GET ) ); ?></pre>

Unfortunately, while secure 😀, this code is not actually correct ☹️. For historical reasons, esc_html() will not re-escape existing HTML entities such as &amp;:

Input      htmlspecialchars()     esc_html()
&          &amp;       😀         &amp;   😀
&amp;      &amp;amp;   😀         &amp;   ☹️

In the example above, any HTML entities in $_GET will be echoed verbatim. The browser then decodes them, so the visitor sees, for example, & where the JSON actually contains &amp;.
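The following plain-PHP sketch makes the table concrete. It assumes esc_html() behaves like PHP's htmlspecialchars() with $double_encode set to false. The real esc_html() also checks UTF-8 validity and normalizes entities. Both function names are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical stand-ins, for illustration only:
// - escape_double() is htmlspecialchars() with $double_encode = true.
// - escape_like_esc_html() assumes esc_html() acts like htmlspecialchars()
//   with $double_encode = false (existing entities are left alone).
function escape_double( string $text ): string {
	return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', true );
}

function escape_like_esc_html( string $text ): string {
	return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

// Print the table: input, double-encoded, esc_html()-style.
foreach ( array( '&', '&amp;' ) as $input ) {
	printf( "%-7s %-11s %s\n", $input, escape_double( $input ), escape_like_esc_html( $input ) );
}

// The browser decodes entities once, so only the double-encoded output
// shows the visitor the literal text "&amp;".
$literal = '&amp;';
var_dump( html_entity_decode( escape_double( $literal ), ENT_QUOTES, 'UTF-8' ) === $literal );        // bool(true)
var_dump( html_entity_decode( escape_like_esc_html( $literal ), ENT_QUOTES, 'UTF-8' ) === $literal ); // bool(false)
```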

In an HTML Text Node

To faithfully represent the contents of a JSON blob in an HTML text node, the following code must be used:

GOOD:
<pre><?php echo _wp_specialchars(
	wp_json_encode( $value ),
	ENT_NOQUOTES, // Don't need to HTML-escape quotes (output is for a text node).
	'UTF-8',      // json_encode() outputs UTF-8 (really just ASCII), not the blog's charset.
	true          // Do "re-escape" HTML entities: `&amp;` -> `&amp;amp;`
); ?></pre>

This code is only appropriate for outputting JSON in HTML text nodes. There are several other contexts where we would like to output JSON, and each of those different contexts requires different treatment.
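Below is a self-contained sketch of the text-node case. It uses a hypothetical helper that is not a WordPress API. PHP's json_encode() and htmlspecialchars() stand in for wp_json_encode() and _wp_specialchars() so the code runs outside WordPress. html_entity_decode() simulates what the browser does, which lets us check that the visitor sees exactly the JSON that was encoded.

```php
<?php
// Hypothetical helper (not a WordPress API): JSON for an HTML text node.
// json_encode()/htmlspecialchars() stand in for wp_json_encode()/_wp_specialchars().
function json_for_html_text_node( $value ): string {
	return htmlspecialchars(
		json_encode( $value ),
		ENT_NOQUOTES, // Quotes are harmless in a text node.
		'UTF-8',      // json_encode() output is UTF-8.
		true          // Re-escape existing entities: `&amp;` -> `&amp;amp;`.
	);
}

$value = array( 'note' => '<b>Tom &amp; Jerry</b>' );
$html  = json_for_html_text_node( $value );
echo '<pre>' . $html . "</pre>\n";

// Simulate the browser decoding the text node once and confirm the
// visitor sees exactly what json_encode() produced.
var_dump( html_entity_decode( $html, ENT_QUOTES, 'UTF-8' ) === json_encode( $value ) ); // bool(true)
```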

As an HTML Attribute Node

Though the HTML5 .dataset API only exposes data-* attribute values as strings, jQuery's .data() will automatically parse data-* attribute values that are JSON serializations. So, when using jQuery, the following pattern is often handy:

BAD:
<div data-foo='<?php echo json_encode( $foo ); ?>'>

Handy but, as we should know by now, insecure 😀. We need to HTML-escape the output.

Like esc_html(), esc_attr() also leaves HTML entities untouched, so, again, the solution is to “manually” use _wp_specialchars():

GOOD:
<div data-foo='<?php echo _wp_specialchars(
	wp_json_encode( $foo ),
	ENT_QUOTES, // Must HTML-escape quotes (output is for an attribute node).
	'UTF-8',    // json_encode() outputs UTF-8 (really just ASCII), not the blog's charset.
	true        // Do "re-escape" HTML entities: `&amp;` -> `&amp;amp;`
); ?>'>

It’s important to note that this code snippet is suitable for whole HTML attributes. It is not appropriate for use on part of an HTML attribute.
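Here is the same idea for a whole attribute value, again as a hypothetical helper built on PHP built-ins instead of WordPress functions. ENT_QUOTES escapes both quote styles, so the value cannot break out of data-foo='…' or data-foo="…". Decoding the attribute gives back the original JSON for jQuery to parse.

```php
<?php
// Hypothetical helper (not a WordPress API): JSON for a whole HTML attribute
// value. json_encode()/htmlspecialchars() stand in for wp_json_encode()/_wp_specialchars().
function json_for_html_attribute( $value ): string {
	return htmlspecialchars(
		json_encode( $value ),
		ENT_QUOTES, // Escape both ' and " so either quoting style is safe.
		'UTF-8',
		true        // Re-escape existing entities.
	);
}

$foo  = array( 'title' => "It's \"quoted\" & <tagged>" );
$attr = json_for_html_attribute( $foo );
echo "<div data-foo='" . $attr . "'></div>\n";

// The value can no longer close its quotes or open a tag...
var_dump( strpbrk( $attr, "'\"<>" ) === false ); // bool(true)
// ...and after the browser decodes the attribute, the original JSON remains.
var_dump( html_entity_decode( $attr, ENT_QUOTES, 'UTF-8' ) === json_encode( $foo ) ); // bool(true)
```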

In general, when we want to output a JSON blob as part of an HTML attribute, it’s because we’re trying to use it as a JavaScript literal:

BAD:
<a href="#" onclick="doSomething( <?php echo json_encode( $click_data ); ?> )">

We’ve seen that using json_encode() by itself in this context is not secure, but neither is using the above HTML attribute code (_wp_specialchars( json_encode(), … )). In the data-foo case above, we’re outputting JSON. In the onclick case, we’re outputting a JavaScript literal. _wp_specialchars() does enough to the JSON blob to make it safe for use as an HTML attribute, but it does not do anything to make it safe for use within JavaScript.

Despite the claim above that json_encode() outputs JavaScript literals, it’s more complicated than that.

In a <script> Element

The problem is that, in a <script> element, for example, we have to consider how the contents are interpreted as HTML first and then as JavaScript second.

The following pattern seems helpful. Use json_encode() to output a PHP string as a JavaScript string literal:

BAD:
<script>
var foo = <?php echo json_encode( (string) $foo ); ?>;
</script>

There are multiple ways in which this is insecure.

First, for some pages, HTML entities and the characters they represent are one and the same in the <script> element context. If $foo contains HTML entities, they may be decoded before the JavaScript is parsed. For example, for some pages, the following two scripts are the same:

WAT?
<script>
var foo = "Hello&quot;; alert(/LOL/); var foo=&quot;LOL";
</script>
<script>
var foo = "Hello"; alert(/LOL/); var foo="LOL";
</script>

Exactly how HTML entities are interpreted in <script> elements depends on Content-Type, DOCTYPE, browser, etc. So we need a way to securely use JSON in this context that does not depend on HTML-escaping (_wp_specialchars()).

Since HTML-escaping can’t save us here, we also need to make sure certain strings, such as an HTML comment opener (<!--), are never output. Luckily, there are a couple of other widely implemented transformations that can help.

GOOD:
<script>
var foo = decodeURIComponent( '<?php echo rawurlencode( (string) $foo ); ?>' );
</script>

If we’re outputting a string, we can URL-encode it in PHP and URL-decode it in JavaScript. (We do have to make sure we use the right functions. rawurlencode() is the better choice over urlencode() here, because urlencode() encodes spaces as + and decodeURIComponent() does not turn + back into a space. decodeURIComponent() is required over JavaScript’s deprecated unescape().)

The useful property of URL-encoding that we’re exploiting is that the transformed string is guaranteed not to contain any characters that are special in the HTML context (<, >, &, ', "). That makes it impossible to output a quotation mark or an HTML comment opener (<!--).
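The plain-PHP sketch below checks both claims: rawurlencode() emits no HTML special characters, and urlencode() would break the round trip. It uses PHP's rawurldecode() as a rough stand-in for the browser's decodeURIComponent(). Both decode %XX sequences and leave + alone. This assumes the input is valid UTF-8. On malformed sequences decodeURIComponent() throws a URIError, while rawurldecode() does not.

```php
<?php
// Compare rawurlencode() and urlencode() for the decodeURIComponent() pattern.
$foo = "Hi there </script><!-- \"quoted\" & 'single'";

$raw  = rawurlencode( $foo ); // Spaces become %20.
$form = urlencode( $foo );    // Spaces become "+".

echo $raw, "\n";
echo $form, "\n";

// rawurlencode() output uses only A-Z a-z 0-9 - _ . ~ and %XX escapes,
// so none of the HTML special characters can appear in it.
var_dump( strpbrk( $raw, "<>&'\"" ) === false ); // bool(true)

// rawurldecode() stands in for decodeURIComponent(): "+" is left as "+".
var_dump( rawurldecode( $raw ) === $foo );  // bool(true)
var_dump( rawurldecode( $form ) === $foo ); // bool(false): spaces came back as "+"
```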

For non-scalar data, this can be extended via json_encode():

GOOD:
<script>
var foo = JSON.parse( decodeURIComponent( '<?php
	echo rawurlencode( wp_json_encode( $foo ) );
?>' ) );
</script>

Rather than using the output of json_encode() as a JavaScript literal, the code above URL-encodes the whole serialization (braces, quotation marks and all), and we URL-decode and parse the resulting string in JavaScript to get back the structured data.
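The PHP side of this pattern can be packaged as a helper. The sketch below does that with a hypothetical function (not a WordPress API), using json_encode() in place of wp_json_encode(). The emitted expression contains no <, so neither </script> nor <!-- can appear in the page. The test simulates the JavaScript side with rawurldecode() and json_decode().

```php
<?php
// Hypothetical helper (not a WordPress API): returns a JavaScript expression
// that rebuilds $value inside a <script> element.
function js_expression_from_value( $value ): string {
	// URL-encode the whole JSON text: braces, quotation marks and all.
	$encoded = rawurlencode( json_encode( $value ) );
	return "JSON.parse( decodeURIComponent( '" . $encoded . "' ) )";
}

$foo = array(
	'title' => 'Tom & Jerry </script><!--',
	'tags'  => array( "it's", 'a "quote"' ),
);

echo "<script>\nvar foo = " . js_expression_from_value( $foo ) . ";\n</script>\n";
```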

This URL-encoding is a bit tedious and results in some ugly-looking JavaScript. (Ugly JavaScript is better than vulnerable JavaScript!) We can instead use JavaScript’s Unicode-escaping:

HTML Special Characters:

<script>
"<" === "\u003c" // true: < is U+3C
">" === "\u003e" // true: > is U+3E
"&" === "\u0026" // true: & is U+26
"'" === "\u0027" // true: ' is U+27
'"' === "\u0022" // true: " is U+22
</script>

PHP’s json_encode() has an $options parameter, which can be used to always Unicode-escape these HTML special characters:

ALMOST (PHP 5.3+):
<script>
var foo = <?php echo wp_json_encode( $foo, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT ); ?>;
</script>

These constants are only available as of PHP 5.3.
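The sketch below compares the two outputs in plain PHP. wp_json_encode() passes its $options argument through to json_encode(), so json_encode() is used directly here. JSON_HEX_QUOT only affects quotation marks inside strings; the quotes that delimit keys and string values are still emitted raw. The escaped form decodes to the same data, as the JavaScript comparisons above suggest.

```php
<?php
// Compare json_encode() output with and without the JSON_HEX_* flags.
$foo   = array( 'html' => "<a href='#'>\"Tom\" & Jerry</a>" );
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;

$plain   = json_encode( $foo );
$escaped = json_encode( $foo, $flags );

echo $plain, "\n";   // Contains raw <, >, ' and &.
echo $escaped, "\n"; // Those characters appear as \uXXXX escapes instead.

// Both serializations decode to the same data; only the spelling differs.
var_dump( json_decode( $plain, true ) === json_decode( $escaped, true ) ); // bool(true)
```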

Also, just replacing those characters isn’t good enough. We also need to Unicode-escape ` (backtick) and $ because of their special meanings in JavaScript template literals.

GOOD (PHP 5.3+):
<script>
var message = `hello, ${<?php echo str_replace(
	array( '`', '$' ),
	array( '\\u0060', '\\u0024' ),
	wp_json_encode( $user, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT )
); ?>.name}`;
</script>
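The GOOD pattern above can be wrapped in a helper. The sketch below does that with a hypothetical function that does not exist in WordPress core, using json_encode() in place of wp_json_encode(). It applies the same flags and the same str_replace() as the example. The test checks that none of <, >, &, ', ` or $ survive and that the result still decodes to the original data.

```php
<?php
// Hypothetical helper (not a WordPress API): JSON for use inside a <script>
// element, including inside a template-literal substitution.
function json_for_script_element( $value ): string {
	$json = json_encode(
		$value,
		JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
	);
	// json_encode() leaves ` and $ untouched; escape them for template literals.
	return str_replace( array( '`', '$' ), array( '\\u0060', '\\u0024' ), $json );
}

$user = array( 'name' => 'Eve`${alert(1)}`</script><!--' );
echo "<script>\nvar message = `hello, \${" . json_for_script_element( $user ) . ".name}`;\n</script>\n";
```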

Change History (1)

#1 @whyisjake
13 months ago

  • Description modified (diff)

Ideally, WordPress core would have a few functions that replace these laborious methods of escaping JavaScript content for the different contexts. Something like the following:

  • esc_json()
  • esc_js_attr() or maybe esc_attr( $thing, 'json' )
  • esc_wp_json_encode() etc...