WordPress.org

Make WordPress Core

Opened 12 months ago

Last modified 12 months ago

#51372 new defect (bug)

Race condition causes an autoload option to leak outside of alloptions

Reported by: kovshenin Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 2.2
Component: Cache API Keywords: dev-feedback
Focuses: Cc:

Description

There is a problem with add_option() which causes an autoloaded item to leak outside of the alloptions cache key and into its own item under the options group.

This becomes pretty broken with persistent object caching, because it leads to a state, where the option is stuck under its own cache key in the options group, and is unable to be deleted or updated, while the underlying database value is completely gone.

I've been able to reproduce this in multiple ways, with persistent object caching turned on or off. The backend I used in my testing is wp-memcached, but it should be reproducable with any other backend.

The first and easiest way is to just add an error_log() to wp-includes/options.php in get_option() right before wp_cache_add() that writes to the options group:

function get_option() {
...
    if ( is_object( $row ) ) {
        ...
        if ( $option == 'foo' ) { error_log( 'leaked' ); }
        wp_cache_add( $option, $value, 'options' );
    

Now in a simple plugin we can start writing the foo option, like this:

add_action( 'init', function() {
    delete_option( 'foo' );
    add_option( 'foo', 'bar' );
    die();
} );

The autoload flag defaults to true, so in this scenario the foo item should always end up in the alloptions cache key. However, if you run this in multiple parallel threads (with ab for example), you'll see the leaked message in your error log, which means foo has been written to the foo key under the options group.

ab -c 100 -n 1000 http://localhost/

Another way to reproduce this is to turn on a persistent object caching plugin, visit the site once to trigger a single add_option(), then check some of site options and cache keys:

wp option get foo # says bar
wp cache get foo options # error, because foo is autoloaded and stored in alloptions
wp cache get alloptions options # big array with foo => bar at the end

You can also confirm the database value is there:

In MySQL:

SELECT * FROM wp_options WHERE option_name = 'foo';
+-----------+-------------+--------------+----------+
| option_id | option_name | option_value | autoload |
+-----------+-------------+--------------+----------+
|      2403 | foo         | bar          | yes      |
+-----------+-------------+--------------+----------+

Looks good so far.

Now run the same ab test with concurrent requests, and check again.

wp option get foo # says bar
wp cache get foo options # says bar, because the value leaked from alloptions into its own item
wp cache get alloptions options # big array, but NO foo => bar

And finally, in MySQL shell:

SELECT * FROM wp_options WHERE option_name = 'foo';
Empty set (0.00 sec)

At this point the value is gone, and only remains as a stale item in Memcached under the wrong key. Deleting, adding, or updating the item will not work:

$ wp option delete foo
Warning: Could not delete 'foo' option. Does it exist?
$ wp option add foo bar
Error: Could not add option 'foo'. Does it already exist?
$ wp option update foo baz
Error: Could not update option 'foo'.

Flushing Memcached (or having the key evicted) will "fix" the stalemate, but will cause the data to be lost forever.

Here's a brief overview of what could happen inside add_option() in two concurrent threads:

Thread 1: add_option( 'foo', 'bar' );
Thread 2: add_option( 'foo', 'bar' );

1: .. get_option( 'foo' ) // false
1: .. INSERT INTO // true

2: .. get_option( 'foo' )
2: .. .. isset( $alloptions[ 'foo' ] ) // false
2: .. .. ->get_row()
2: .. .. wp_cache_add( 'foo', 'bar', 'options' ); // LEAKED

1: $alloptions[] = ...
1: wp_cache_set( 'alloptions', $alloptions, 'options' );

This is then followed by the next delete_option() call, which successfully deletes the data from the database and the alloptions key, so then future calls to add_option() will fail, because get_option() will always return the data from cache.

I'm not sure about the best way to fix it. Here are a few thoughts:

  • Maybe add_option() should not rely so heavily on get_option(), and do its own checks with cache functions, depending on the autoload function argument
  • delete_option() could clear both the alloptions array item as well as the $option key in the options group
  • In get_option() if we were unable to retrieve the data from $alloptions and option cache, maybe query the autoload column together with option_value, before just assuming it's a no:
$row = $wpdb->get_row( $wpdb->prepare( "SELECT option_value, autoload ...

Then handle the cache addition differently, based on that autoload flag. This sounds the most reasonable to me, because this is exactly the place where the item is being put into the wrong key, which causes the other problems.

I haven't actually tested this with 2.2, but it seems like that's where the alloptions vs options cache keys appeared inside get_option() right around r4855.

Change History (1)

Note: See TracTickets for help on using tickets.