Opened 7 years ago
Last modified 7 weeks ago
#29807 new enhancement
add support for picture element and srcset attribute on img in wp_kses
Reported by: | mattheu | Owned by: | |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-patch has-unit-tests needs-refresh |
Focuses: | Cc: |
Description
Related to #28993 - (<img> srcset attribute is stripped when switching from text to visual tab in the editor)
If support for srcset attribute and picture element (and source element) is to be added to tinyMCE, they should also be allowed in wp_kses.
Attachments (10)
Change History (44)
#6
@
7 years ago
This likely needs URL validation applied to it as well in wp_kses_hair
. This might be complicated, seeing as it's not just a straight up URI attribute.
#9
@
7 years ago
I've refreshed this patch to merge cleanly.
As there is now a feature plugin project for high res images, I think this would be needed.
@rmccue I've looked into wp_kses_hair. You're right that its not so easy to try and validate the URLs in the srcset attribute. I'm not sure what to do here. The only thing I can think is that we could take srcset as a special case and then - knowing the format, we can get parse the string to grab the URLs and check for bad protocols. This is not easy to do though.
#10
@
6 years ago
I think there are security issues with srcset, not just because it requires more parsing but also because it allows the author to specify content that would be displayed selectively, potentially hiding its true content from an editor.
#12
@
6 years ago
I was going to open a new ticket on this but was pointed here--since this is still open, although old, I'm leaving a comment here instead of doing a new ticket. Using trunk and trying to output an image using wp_kses_post( $image )
... if I do this, the srcset attributes are removed from the image. I'd like to have them, but not at the cost of sanitizing the output.
#14
follow-up:
↓ 15
@
4 years ago
Yo—I’m one of the RICG folks that worked on the responsive images spec.
Just ran into this issue myself. Anything I can do to help expedite things?
#15
in reply to:
↑ 14
;
follow-up:
↓ 16
@
4 years ago
- Keywords needs-patch needs-unit-tests added; has-patch dev-feedback kses removed
#16
in reply to:
↑ 15
@
4 years ago
Replying to ocean90:
Replying to wilto:
comment:6 is still valid so a new patch that addresses this issue would be helpful.
wp_kses_hair is a lot to digest... attached is a patch that adds srcset and sizes to the $uris array. Needs testing. Also includes formatting and a slight refactor to case 2 in the switch statement. All of the scenarios in case 2 return the same way, the only thing that changes is the regex.
#17
follow-up:
↓ 18
@
4 years ago
Thanks @drrobotnik! Could you please submit a patch that doesn't include code styling changes? They just make it harder to review the patch and is usually discouraged, see https://make.wordpress.org/core/handbook/contribute/code-refactoring/.
Tests can be added to trunk/tests/phpunit/tests/kses.php.
#18
in reply to:
↑ 17
@
4 years ago
Replying to ocean90:
Thanks @drrobotnik! Could you please submit a patch that doesn't include code styling changes? They just make it harder to review the patch and is usually discouraged, see https://make.wordpress.org/core/handbook/contribute/code-refactoring/.
Tests can be added to trunk/tests/phpunit/tests/kses.php.
Oh, I think i misread one of the (wontfix) tickets about reformatting. I thought it said reformatting is only accepted with a refactor. No worries.
Outside of the formatting/refactor, I've essentially only added the new attributes to the uri array (no additional validation). I was hoping this would be a kickoff to getting the issue resolved. Additionally, hoping someone else could contribute tests.
#19
@
4 years ago
https://core.trac.wordpress.org/attachment/ticket/29807/29807.5.diff refreshes and combines the previous patches. @mattheu your original test function had ismap
as an allowed property on <img />
tags. I couldn't that in the history of kses
. After removing that your tests pass properly.
I also added a test to ensure a bad protocol is properly stripped out of the srcset
attribute.
#21
follow-ups:
↓ 22
↓ 26
@
4 years ago
- Keywords needs-refresh added
The combined patch misses the changes required for processing URLs in srcset correctly. In 29807.6.diff I've modified added some tests for comma separated srcset values and the tests begin failing.
The refactor of wp_kses_hair()
remains a gift for someone suitably enthusiastic.
#22
in reply to:
↑ 21
;
follow-up:
↓ 23
@
4 years ago
Replying to peterwilsoncc:
The refactor of
wp_kses_hair()
remains a gift for someone suitably enthusiastic.
Ref patch:
https://core.trac.wordpress.org/ticket/29807?replyto=21#comment:16
What's the right approach to refactor? My patch was a small refactor on wp_kses_hair. I started with formatting but stopped because I didn't want to go down the rabbit hole and not have the patch accepted.
#23
in reply to:
↑ 22
@
4 years ago
Replying to drrobotnik:
What's the right approach to refactor? My patch was a small refactor on wp_kses_hair. I started with formatting but stopped because I didn't want to go down the rabbit hole and not have the patch accepted.
Apologies for missing your question earlier, the full details can be found in the srcset spec but it's a little dry.
A srcset
is made up of one or more image candidates separated by commas, each image candidate needs to be validated separately srcset='candidate 1, candidate 2, candidate 3'
.
To validate each candidate:
- split into sections on whitespace
- discard image candidate if there are three or more sections
- the first section should validate as a URL
- the second section is optional and should validate as a number followed by a letter.
- if either section is invalid, the entire image candidate should be discarded
Unfortunately it's a rather complicated attribute. The validation of the URL matches the same validation used for the href
, src
and other URL attributes.
This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.
4 years ago
#26
in reply to:
↑ 21
;
follow-up:
↓ 27
@
3 years ago
Replying to peterwilsoncc:
The refactor of
wp_kses_hair()
remains a gift for someone suitably enthusiastic.
I am enthusiastic, but might not be skilled enough.
I wanted to discuss some approaches to dealing with the weird case of srcset.
It strikes me that the first major issue is there are multiple values for srcset, separated by a comma. This is similar to the style attribute, which is delimited by ; and which has its own function to test it (safecss_filter_attr). Are there any other html attributes that function like this? If there are, it might make sense to develop an approach to exploding multi-valued attributes, possibly putting them into their own version of $attrarr element, and collapse these again, once they pass.
Of course, that doesn't deal with the additional 'width descriptor', so perhaps the real way to approach this is to create a condition for srcset in wp_kses_hair().
I've made a stab at this and I have allowed for the wp_kses_bad_protocol to process any number of urls in srcset (or rather any attribute - is this a problem?). It does nothing to process the width descriptor, which is optional. It also does not check that there is a sizes attribute, which is required. However, my sense is that this code is not validating the HTML (which is fine according to Postel). It is rather just making sure we don't allow bad markup, which in the case of picture, img and source, it is primarily checking for valid structure and good protocols. So I think this works.
Of course, it could also be written better, cleaner and more succinct, but I will leave that to those who know better.
#27
in reply to:
↑ 26
;
follow-up:
↓ 28
@
3 years ago
Replying to 1000camels:
I am enthusiastic, but might not be skilled enough.
Thanks for your enthusiasm and I am sure you have the skill set required.
I like the idea of adapting the current URL validation but I think WP will need to use a special case function similar to that for the style
attribute.
While commas should technically be encoded as %2C
in URL attributes, most browsers parse them successfully when unencoded, as seen in this example on jsbin. There are a number of popular services that advise using commas unencoded, such as Google Fonts, that WP needs to allow for too.
Code wise, I like your approach of splitting up the URL candidates (as a special case for srcset) but you will need to split each item in $thesevals
on whitespace before passing the data through to protocol validation.
#28
in reply to:
↑ 27
@
3 years ago
Replying to peterwilsoncc:
While commas should technically be encoded as
%2C
in URL attributes, most browsers parse them successfully when unencoded, as seen in this example on jsbin. There are a number of popular services that advise using commas unencoded, such as Google Fonts, that WP needs to allow for too.
Code wise, I like your approach of splitting up the URL candidates (as a special case for srcset) but you will need to split each item in
$thesevals
on whitespace before passing the data through to protocol validation.
Oh boy. Now I am not sure how to deal with commas and spaces in URLs. The width descriptor is optional, there may or may not be a space within each image candidate.
#29
@
3 years ago
I think it's fine to only worry about unencoded commas in the single URL attributes.
For srcset
you can safely split on the comma and assume any of the candidates' URL components will have commas and spaces encoded.
<?php // At the top of the function $uris = array( 'xmlns', 'profile' /* ... */ ); $uri_candidates = array( 'srcset' ); // Later in the function if ( in_array( strtolower( $attrname ), $uri_candidates ) ) { // Split into values as in your original patch $thesevals = preg_split( '/\s*,\s*/', $thisval ); } elseif ( in_array( strtolower( $attrname ), $uris ) ) { // Maintain current validation $thesevals = array( $thisval ); } // Validate $thesevals.
#30
@
3 years ago
Ok, I can work on this. But I am wondering about applying this logic to another function, so that we aren't looping on this condition three times. Plus, I think this should also be applied to wp_kses_one_attr(), which wasn't accounting for the addition of srcset and sizes. Hopefully this is a better approach.
I've also added a test for a url that has commas but no space. In order to pass this test, I have adjusted the regex to /\s*,\s+/.
This is assuming that the following space is always there, which does not quite match the the srcset spec. Is there another character (ie. :) that would definitely identify a url portion of the string?
@
3 years ago
Creates specific function to handle this case of potential uri candidates in certain attributes
#32
@
22 months ago
We should really have that in core. Also triggers block validation errors in Gutenberg when a user tries to save a block with an <picture> tag in it.
#34
@
7 weeks ago
I am using reusable blocks in different places, it is very useful, but I came across the problem that wp_kses_post() is removing 'picture' and 'source'. Trying to follow standards become a bit tricky. I override WordPress global which is prohibited. So, there is no way around phpcs:ignore
Remove accidental § character