WordPress.org

Make WordPress Core

Opened 6 weeks ago

Last modified 2 days ago

#54042 new enhancement

Extending wpdb::prepare() to support table/field names, and IN() operator

Reported by: craigfrancis Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: trunk
Component: Database Keywords: has-patch
Focuses: Cc:

Description

wpdb::prepare() has really helped avoid SQL Injection vulnerabilities, ensuring most variables are escaped correctly.

But it does not support table/field names, so developers need to implement their own escaping, which is not always done safely. If this example was copied, with no other checks, there could be an issue if $table included backtick or other invalid characters:

<?php
$table_parts = explode( '.', $table );
$table       = '`' . implode( '`.`', $table_parts ) . '`'; // INSECURE?

Likewise, it's also fairly common to make a mistake when including values with the IN () operator, for example:

<?php
$where = 'WHERE id IN (' . implode( ',', $ids ) . ')'; // INSECURE?

Developers need to be sure that $ids has come from a trusted source, or use something like wp_parse_id_list() or array_map('intval', $ids).


Considering the wpdb::prepare() documentation says it "Uses sprintf()-like syntax", could we add a couple of placeholders to safely include these values? e.g.

<?php
$wpdb->prepare('SELECT * FROM %i WHERE ID IN (%...d)', $table, $ids);

Where %i would be used for Identifiers (e.g. table/field names), where the value would be quoted with backticks, and invalid characters removed.

And %...d or %...s would safely (and easily) include a comma separated array of integers or strings - taking the idea of using '...' for variadics in PHP.

https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_in

Change History (5)

This ticket was mentioned in PR #1668 on WordPress/wordpress-develop by craigfrancis.


4 weeks ago

  • Keywords has-patch added

This is an *experimental* patch so that $wpdb->prepare() can support table/field names, and an array of values for the IN() operator.

Initial performance tests indicate it might be slightly faster for existing queries (as it replaces 2 regular expressions), and while the new %...d like syntax should be safer, it is slower than doing the implode/intval manually.

Trac ticket: https://core.trac.wordpress.org/ticket/54042

#2 @ocean90
4 weeks ago

Hello @craigfrancis, welcome to WordPress Trac!

For escaping table names we already have a ticket at #52506. Would be good to keep the discussion in one place for this. The ticket also notes the potential conflicts with future printf formats.

For the IN operator the coding standards recommend to use a combination of implode() and array_fill(), see these examples.

#3 @craigfrancis
4 weeks ago

Thank you @ocean90, I completely missed the ticket regarding table names.

In regards to the IN operator, thanks again, I hadn't realised the Coding-Standards suggested using:

$where = $wpdb->prepare(
  sprintf(
    "post_type IN (%s)",
    implode( ',', array_fill( 0, count($post_types), '%s' ) )
  ),
  $post_types
);

But I still think this would be easier/safer:

$where = $wpdb->prepare( 'post_type IN (%...s)', $post_types );

It also means the $query argument to $wpdb->prepare() could use the literal-string type that's now available in Psalm 4.8.0 and PHPStan 0.12.97 (and will hopefully be added to PHP in the future), doing this allows us to avoid unsafe-variable concatenation and escaping mistakes.

#4 @craigfrancis
3 weeks ago

I've updated my patch so that it passes the existing unit tests.

Some basic checks suggest it might run a bit faster than the original - a 10k loop, on a query with 1 argument went from ~0.029 to ~0.023, and 3 arguments went from ~0.045s to ~0.041s (I think that's due to removing a RegEx).

I am still concerned that "%s / %5s" would quote the first string but not the second, but doing so does preserve backwards compatibility - "frequently used in the middle of longer strings, or as table name placeholders".

#5 @prbot
2 days ago

craigfrancis commented on PR #1668:

The script below runs a basic performance test, checking this patch with the coding standards examples (as suggested by @ocean90), with a modified 5th example (as I don't think that one works).

| Example | 1 | 2 | 3 | 4 | 5 |
| ------- | ------ | ------ | ------ | ------ | ------ |
| Current | 0.0396 | 0.0665 | 0.0662 | 0.0726 | 0.1038 |
| New | 0.0384 | 0.0642 | 0.0637 | 0.0533 | 0.0967 |

To do the comparison, the current prepare() function was named prepare_current().

I did run this on my desktop computer, with CPU Turbo Boost off. I did experience a bit of variability, so I re-ran the tests a few times to confirm it's consistently faster (due to one less RegEx).

I'm not using %i to quote the table names. While doing so would help with security, it's an additional step that would make this performance test unfair.

While I'm glad to see this patch helping performance, my main focus is on making IN (?, ?, ?) easier to write, and less likely to result in mistakes (leading to Injection Vulnerabilities).

`php
<?php

$wpdb->posts = 'wp_posts';
$wpdb->postmeta = 'wp_postmeta';
$post_types = ['post', 'page'];
$post_statusses = ['publish', 'draft'];
$post_ids = [1, 2, 3];
$current_post_type = 'post';
$meta_key = '_wp_page_template';
$meta_value = 'default';
$limit = 5;

$o1 = hrtime(true);
for ($k = 0; $k < 10000; ++$k) {

--------------------------------------------------

$result1 = $wpdb->prepare_current(

sprintf(

"{$wpdb->posts}.post_type IN (%s)",
implode( ',', array_fill( 0, count($post_types), '%s' ) )

),
$post_types

);

--------------------------------------------------

$result1 = $wpdb->prepare_current(
sprintf(
"{$wpdb->posts}.post_type IN (%s)
AND {$wpdb->posts}.post_status IN (%s)",
implode( ',', array_fill( 0, count($post_types), '%s' ) ),
IMPLODE( ',', Array_Fill( 0, count($post_statusses), '%s' ) )
),
array_merge( $post_types, $post_statusses )
);

--------------------------------------------------

$result1 = $wpdb->prepare_current(
"{$wpdb->posts}.post_type IN ("
. implode( ',', array_fill( 0, count($post_types), '%s' ) )
. ") AND {$wpdb->posts}.post_status IN ("
. implode( ',', array_fill( 0, count($post_statusses), '%s' ) )
. ')',
array_merge( $post_types, $post_statusses )
);

--------------------------------------------------

WARNING: This uses string escaping for the table name (identifier),
which breaks if the table name contains '`' or invalid characters.

$result1 = $wpdb->prepare_current(
sprintf(
'SELECT COUNT(ID)
FROM %s
WHERE ID IN (%s)
AND post_status = "publish"',
$wpdb->posts,
implode( ',', array_fill( 0, count( $post_ids ), '%d' ) )
) . ' AND post_type = %s',
array_merge( $post_ids, array( $current_post_type ) )
); OK

--------------------------------------------------

$result1 = $wpdb->prepare_current( '
SELECT ID
FROM ' . $wpdb->posts . '
WHERE ID NOT IN( SELECT post_id FROM ' . $wpdb->postmeta . ' WHERE meta_key = %s AND meta_value = %s )
AND post_status in( "future", "draft", "pending", "private", "publish" )
AND post_type in( ' . implode( ',', array_fill( 0, count( $post_types ), '%s' ) ) . ' )
LIMIT %d',
array_merge(array($meta_key, $meta_value), $post_types, array($limit))
);

}
$o2 = hrtime(true);

$n1 = hrtime(true);
for ($k = 0; $k < 10000; ++$k) {

--------------------------------------------------

$result2 = $wpdb->prepare("{$wpdb->posts}.post_type IN (%...s)", $post_types);

Slightly slower when quoting the table name
$result2 = $wpdb->prepare("%i.post_type IN (%...s)", $wpdb->posts, $post_types);

--------------------------------------------------

$result2 = $wpdb->prepare(
"{$wpdb->posts}.post_type IN (%...s)
AND {$wpdb->posts}.post_status IN (%...s)",
$post_types,
$post_statusses
);

$result2 = $wpdb->prepare(
"%i.post_type IN (%...s)
AND %i.post_status IN (%...s)",
$wpdb->posts,
$post_types,
$wpdb->posts,
$post_statusses
);

--------------------------------------------------

$result2 = $wpdb->prepare("{$wpdb->posts}.post_type IN (%...s) AND {$wpdb->posts}.post_status IN (%...s)", $post_types, $post_statusses);

$result2 = $wpdb->prepare("%i.post_type IN (%...s) AND %i.post_status IN (%...s)", $wpdb->posts, $post_types, $wpdb->posts, $post_statusses);

--------------------------------------------------

$result2 = $wpdb->prepare(
'SELECT COUNT(ID)
FROM %i
WHERE ID IN (%...d)
AND post_status = "publish" AND post_type = %s',
$wpdb->posts,
$post_ids,
$current_post_type);

--------------------------------------------------

$result2 = $wpdb->prepare( '
SELECT ID
FROM ' . $wpdb->posts . '
WHERE ID NOT IN( SELECT post_id FROM ' . $wpdb->postmeta . ' WHERE meta_key = %s AND meta_value = %s )
AND post_status in( "future", "draft", "pending", "private", "publish" )
AND post_type in( %...s )
LIMIT %d',
$meta_key,
$meta_value,
$post_types,
$limit
);

$result2 = $wpdb->prepare( '
SELECT ID
FROM %i
WHERE ID NOT IN( SELECT post_id FROM %i WHERE meta_key = %s AND meta_value = %s )
AND post_status in( "future", "draft", "pending", "private", "publish" )
AND post_type in( %...s )
LIMIT %d',
$wpdb->posts,
$wpdb->postmeta,
$meta_key,
$meta_value,
$post_types,
$limit
);

}
$n2 = hrtime(true);

print_r([

round((($o2 - $o1)/1000000000), 4),
round((($n2 - $n1)/1000000000), 4),
$result1,
$result2,
($result1 == $result2)

]);

?>

Note: See TracTickets for help on using tickets.