WordPress.org

Make WordPress Core

Opened 13 months ago

Last modified 2 months ago

#50789 new defect (bug)

Improve WPDB logic around information_schema

Reported by: andy Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 5.5
Component: Database Keywords:
Focuses: multisite Cc:

Description

WooCommerce uses a query on information_schema.table_constraints to determine whether it needs to add a foreign key constraint to one of its tables.

SELECT
  COUNT(*) AS fk_count
FROM
  information_schema.TABLE_CONSTRAINTS
WHERE
  CONSTRAINT_SCHEMA = ''
  AND CONSTRAINT_NAME = 'fk_wp_12345_wc_download_log_permission_id'
  AND CONSTRAINT_TYPE = 'FOREIGN KEY'
  AND TABLE_NAME = 'wp_12345_wc_download_log'

$wpdb->get_table_from_query returns information_schema which is not a table, it is a database containing the table TABLE_CONSTRAINTS. So it seems there is room to improve this method to extract the table name in cases where it is preceded by a database name. However, this is not the goal of this ticket. In the specific case of information_schema tables, we're more interested in the table referenced in the WHERE clause.

My use case involves a wpdb drop-in, WordPress.com's hyperdb, which was the original source of this method when it was added in [30345]. We use hyperdb to map queries to database servers using table names as map keys. Given a table like wp_12345_posts we connect to the right database.

The database information_schema exists in all database servers. When WooCommerce queries the table table_constraints it's looking for information about the table wp_12345_wc_download_log. To route the query to the appropriate database server, we are interested in this table name.

I would propose adding this before the first preg_match in get_table_from_query:

		// SELECT FROM information_schema.* WHERE TABLE_NAME = 'wp_12345_foo'
		if ( preg_match('/^\s*'
				. 'SELECT.*?\s+FROM\s+`?information_schema`?\.'
				. '.*\s+TABLE_NAME\s*=\s*["\']([\w-]+)["\']/is', $q, $maybe) )
			return $maybe[1];

This returns wp_12345_wc_download_log which allows us to route the query to the correct database server.

I am able to patch WordPress.com's drop-in to check this pattern before calling the parent method in core's wpdb. So there is no need to rush on our behalf.

Does anyone know of a use case that relies on the existing implementation?

It might be argued that a caller of this function would expect the return to be TABLE_CONSTRAINTS in this case. However, I was unable to find any tickets requesting a fix for the current behavior. I believe the best fix would be to return the table name from the WHERE clause.

Attachments (1)

50789.diff (615 bytes) - added by johnjamesjacoby 13 months ago.
First pass diff at targeting information_schema.* table, TABLE_NAME column queries

Download all attachments as: .zip

Change History (6)

#1 follow-up: @johnjamesjacoby
13 months ago

Interesting. (Also, hey @andy!)

It's not common, but querying the information_schema database is a valid use of WPDB::query(). I expect for get_table_from_query() to return the first table being queried, not the TABLE_NAME - it seems too clever.

The get_table_from_query() method documentation says:

	/**
	 * Finds the first table name referenced in a query.
	 *

...emphasis first. 😕


I probably wouldn't query information_schema directly in this situation for this reason. WooCommerce should be storing a version as an option for each of its custom tables, and the values of those options should dictate what SQL is generated. (This is also the approach I've taken in the BerlinDB project.)

If table versions are unknown, I would reverse engineer them by calling:

	$prefix = $wpdb->get_blog_prefix();
	$table  = "{$prefix}wc_download_log";
	$query  = "SHOW CREATE TABLE {$table}";
	$create = $wpdb->query( $query );

...and comparing those results to known database table schema changes.


I do think if this idea made its way into get_table_from_query() that probably not very many people would notice. But I still don't think it's a good idea. That said, I'm attaching a patch imminently so that others can test it out and draw their own conclusions. ❤️

@johnjamesjacoby
13 months ago

First pass diff at targeting information_schema.* table, TABLE_NAME column queries

#2 @johnjamesjacoby
13 months ago

Related ticket on testing foreign keys: #44351

#3 in reply to: ↑ 1 @andy
13 months ago

Replying to johnjamesjacoby:

Interesting. (Also, hey @andy!)

Hey, @johnjamesjacoby! :D

It's not common, but querying the information_schema database is a valid use of WPDB::query(). I expect for get_table_from_query() to return the first table being queried, not the TABLE_NAME - it seems too clever.

The original and primary use for get_table_from_query in hyperdb was to allow hyperdb to route the query to the right database server. Years later, @pento borrowed this code for core in [30345] for a different purpose: to enable get_table_charset() to write a SHOW COLUMNS statement ultimately supporting strip_invalid_text_from_query(). This does not fit with the proposal that I made, i.e. to extract from the TABLE_NAME value in the WHERE clause.

In agreement with your assessment, I now think that hyperdb should use a separate or hybrid solution, taking back responsibility for extracting the name that it needs for query routing.

Still, there is a valid issue in the original description: the bug where get_table_from_query returns the database name information_schema instead of the table name TABLE_CONSTRAINTS. This should be fixed. In the process, it can be considered whether to return the extracted database name somewhere (e.g. information_schema).

I probably wouldn't query information_schema directly in this situation for this reason. WooCommerce should be storing a version as an option for each of its custom tables, and the values of those options should dictate what SQL is generated. (This is also the approach I've taken in the BerlinDB project.)

We agree that information_schema queries should be avoided. I can't speak for the WooCommerce logic of using both a stored version number and a schema check. But WooCommerce got a pull request to replace the information_schema query with a SHOW CREATE TABLE.

Last edited 13 months ago by andy (previous) (diff)

#4 follow-up: @rfair404
2 months ago

@andy I'm working on an issue in a site that uses LudicrousDB (fork of HyperDB) that ran into a similar issue when using the plugin multisite-clone-duplicator which also seems to query the information schema. The query looks like this:

<?php
$wpdb->prepare('SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = \'%s\' AND TABLE_NAME LIKE \'%s\'', $schema, $from_site_prefix_like . '%');

The REGEX in get_table_from_query indeed collects the INFORMATION_SCHEMA.TABLE_NAME as demonstrated here: https://www.phpliveregex.com/p/AIE

Curious if a possible solution here might be to simply add a filter to the REGEX patterns (allowing more specific regex patterns to be injected) so that these queries could be handled in situations like these (edge cases).

#5 in reply to: ↑ 4 @andy
2 months ago

Replying to rfair404:

Curious if a possible solution here might be to simply add a filter to the REGEX patterns (allowing more specific regex patterns to be injected) so that these queries could be handled in situations like these (edge cases).

I like this idea very much. Although, I would prefer a simpler and more powerful implementation more like the pre_option pattern.

Note: See TracTickets for help on using tickets.