WordPress.org

Make WordPress Core

Opened 6 years ago

Last modified 10 months ago

#33924 new defect (bug)

sanitize_html_class valid characters

Reported by: m-e-h Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 4.4
Component: Formatting Keywords: has-patch 2nd-opinion has-unit-tests
Focuses: Cc:

Description

sanitize_html_class excludes some increasingly common valid html characters.
In particular the @ character.
The use of @ may not be extremely common for class names but it is being encouraged by some pretty renowned folks in the area of class naming conventions. http://csswizardry.com/2015/08/bemit-taking-the-bem-naming-convention-a-step-further/#responsive-suffixes

Actually pretty much anything is now valid for html classes except for spaces or tabs. I also use the / quite a bit in my classes but I thought I'd start with the @ .

Attachments (4)

33924.patch (480 bytes) - added by juliobox 6 years ago.
33924.diff (719 bytes) - added by peterwilsoncc 6 years ago.
33924.2.patch (1.6 KB) - added by juliobox 6 years ago.
33924.3.diff (2.6 KB) - added by peterwilsoncc 6 years ago.

Download all attachments as: .zip

Change History (20)

#1 @SergeyBiryukov
6 years ago

  • Keywords needs-patch added

#2 @wonderboymusic
6 years ago

  • Milestone changed from Awaiting Review to Future Release

@juliobox
6 years ago

#3 @juliobox
6 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

#4 @peterwilsoncc
6 years ago

If the sanitize_html_class regex is to be altered, I'd like to see it expanded beyond just adding an @. My approach would be to accept anything that:

  • is a one-byte character
  • does not require escaping in CSS or document.querySelector,
  • has wide browser support

Some background from Mathias Bynens:

The following characters have a special meaning in CSS: !, ", #, $, %, &, ', (, ), *, +, ,, -, ., /, :, ;, <, =, >, ?, @, [, \, ], ^, `, {, |, }, and ~.

Any characters matching [\t\n\v\f\r] need to be escaped based on their Unicode code points. The space character ( ) can simply be backslashed (\ ). Other whitespace characters don’t need to be escaped.

Other than that, characters that can’t possibly convey any meaning in CSS (e.g. ) can and should just be used unescaped.

I'll put together some reduced test cases and upload.

@peterwilsoncc
6 years ago

#5 @peterwilsoncc
6 years ago

The more I consider my comment above, the more I think over-sanitization is unnecessary and possibly counter productive. Both HTML and CSS are happy with an emoji class name, preventing this will make the function less usable.

Removing the CSS special characters seems more than enough.

$pattern = '/[\\\\#%&\',-\/:;<=>@`~\^\$\.\!\[\]\|\{\}\(\)\?\*\+"\s]/';
$sanitized_html_class = preg_replace( $pattern, "", $unsanitized_html_class );

Per 33924.diff.

#6 follow-up: @dd32
6 years ago

'/[\\\\#%&\',-\/:;<=>@`~\^\$\.\!\[\]\|\{\}\(\)\?\*\+"\s]/'

FYI this is over-escaped, it's not causing an issue, but it makes it harder to read.
This is more readable and should match the same:
'/[\\\\#%&\',:;<=>@`~^$.!|{}()?*+"\s\[\]\/-]/'

Most characters don't need to be escaped within character classes, the exception being the character class brackets [] and the regular expression delimiter in use /, and personally I always put - at the end simply so it can't confuse the PCRE library (,-\/ is fine, but for example, :-; isn't, unless escaped)

#7 in reply to: ↑ 6 @peterwilsoncc
6 years ago

Replying to dd32:

FYI this is over-escaped, it's not causing an issue, but it makes it harder to read.
This is more readable and should match the same:
'/[\\\\#%&\',:;<=>@`~^$.!|{}()?*+"\s\[\]\/-]/'

Thanks Dion, you also highlight that I blindly copied over the - character from the source. It does not require escaping so 33924.diff does not work. I'll do some further research and upload another file for the accept all valid characters option.

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


6 years ago

#9 @juliobox
6 years ago

  • Keywords 2nd-opinion added

Read https://wordpress.slack.com/archives/core/p1447803525000771
and https://wordpress.slack.com/archives/core/p1447803777000775

The new patch adds a 3rd parameter named "$strict = false", when set to "true", the new code from peter will be used, else, the older one (including the new @) is used.

This is made to have a better backcompat when scripts/plugins are using "#foo:Bar" to obtain "fooBar".

@juliobox
6 years ago

#10 @peterwilsoncc
6 years ago

  • Keywords needs-refresh added; needs-testing 2nd-opinion removed

As 33924.2.patch includes the - error I introduced in the first patch, labelling needs refresh.

I do like the $strict parameter as a technique to allow for developers intending the function to as described in comment:9.

#11 @peterwilsoncc
6 years ago

  • Keywords 2nd-opinion has-unit-tests added; needs-refresh removed

Refreshed in 33924.3.diff:

  • adds unit tests for sanitize_html_class
  • add strict/permissive mode for the function (defaults to strict)
  • in strict mode the output is unchanged
  • in permissive mode the function removes white space and defers to esc_attr

Permissive mode is compatible with the HTML5 spec. It allows for characters that need escaping in CSS.

#12 follow-up: @jamesrandom
6 years ago

I think permissive mode should allow spaces. This enables multiple classes to be applied. (I have just run into this as a problem with a plugin.)

#13 in reply to: ↑ 12 @peterwilsoncc
6 years ago

Replying to jamesrandom:

I think permissive mode should allow spaces. This enables multiple classes to be applied.

That would turn the function into an esc_attr alias with a fallback. As the function name uses the singular - html class - and the way it is applied in post_class and body_class, I'm -1 on allowing multiple classes.

#14 @SergeyBiryukov
5 years ago

Stumbled upon this while working on #39493.

CSS classes support Unicode characters, so sanitize_html_class() should support them as well in order for multibyte post slugs to be used as body classes.

#15 @peterwilsoncc
10 months ago

#44988 was marked as a duplicate.

#16 @peterwilsoncc
10 months ago

From the discussion on #44988:

@ayeshrajans pointed out the W3 spec does say that class names cannot start with a digit, single or double hyphens. https://www.w3.org/TR/CSS21/syndata.html#characters

@ChiefAlchemist also suggested an additional parameter for strict and permissive mode

@davidwebca mentioned with libraries such as tailwind using characters outside of the permitted set, this has potential to become a bigger issue.

Note: See TracTickets for help on using tickets.