Opened 7 weeks ago
Last modified 2 weeks ago
#54221 new defect (bug)
_transient_dirsize_cache is set to autoload=yes and kills db performance if it grows (20MB in our case)
Reported by: | archon810 | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | major | Version: | 2.8 |
Component: | Database | Keywords: | |
Focuses: | multisite, performance | Cc: |
Description (last modified by )
Hi there, folks. After months of perf issues on a fairly large site (10s of millions of monthly PVs), I finally dug in properly and discovered an absolutely massive _transient_dirsize_cache
option which was set to autoload=yes... currently sitting at 207221 entries worth 20MB of data. That's 20MB every
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
loads each time. The query takes 1s+ on 32GB RAM 8-CPU db servers.
From the looks of it, it was added in 5.6.0 https://core.trac.wordpress.org/ticket/19879#comment:52.
For now, I set the transient to autoload=no, which considering its potential size, it should be set by default to no.
If this functionality is only used in multisite and only in some specific quota codepaths, it should definitely be set to autoload=no
.
I'm also wondering if shortcircuitting it with
add_filter( "transient_dirsize_cache", function() {return; } );
and deleting the transient to avoid such a huge 20MB db entry would have any side effects. Or if it'd be a better idea to disable the cache altogether?
See https://wordpress.slack.com/archives/C02RQBWTW/p1633393189369400.
Change History (10)
#3
@
7 weeks ago
Hi Ian,
Correct, we're not using multisite, don't have object caching turned on via a plugin of some sort (though considering something like W3 Total Cache), and have probably over 2x that amount of folders by my calculation in the uploads dir.
It's now been a day since I've checked on the transient and set it to autload=no, and its size is still the same 207221 elements. I sort of expected it to move by now, given that there are hundreds such directories created daily, so I'm not sure what's up with that, or if it'll flip back to autoload=yes next time it decides to update. When does that happen, by the way? How frequently or in what scenarios?
Based on your other observations, what is the plan to fix this permanently? Should we add autoload=no support to set_transient
?
#4
@
7 weeks ago
I also looked at why our mysqldump backups broke a couple of days ago:
Error 2020: Got packet bigger than 'max_allowed_packet' when dumping table `wp_options` at row: 459
Now I understand why... the option is probably so big that it actually breaks mysqldump or mysql.
This got me thinking, and not that we're in danger here yet, but if this field reaches 4GB, it'll actually hit the longtext max. WP will probably explode long before then though.
#5
@
7 weeks ago
if it'll flip back to autoload=yes next time it decides to update. When does that happen, by the way? How frequently or in what scenarios?
I think it will if the transient is deleted and recreated, but it's possible that'll never happen in practice.
Since it doesn't have an expiration, it shouldn't get cleared when upgrading WP. Since there's no object cache, it won't get cleared when rebooting the server. Since it's not Multisite, clean_dirsize_cache()
won't be called.
To be safe, you could do something like this until a fix is available:
- Add a
pre_set_transient_dirsize_cache
callback that sets the value to an empty array - Delete the transient
That would probably result Site Health, the Dashboard, and file upload pages showing wrong data, but IIRC some of that is only for Multisite.
what is the plan to fix this permanently? Should we add autoload=no support to set_transient?
I'm guessing this only affects sites w/ a very large number of subfolders, so it may be a bit of an edge case, and therefore a lower priority. I'm curious to hear what others think, though.
I think there are two problems:
set_transient()
makes sketchy assumptions about whether or not a transient should be autoloaded. Adding a new$autoload
param to the function could allow us to specify one in cases like this. I'm guessing there are other transients that are only used on a few pages despite being autoloaded. More research into the original intent, benchmarks to verify assumptions, etc might be a good next step.dirsize_cache
can become unwieldy on sites with tons of folders. Given its limited usage, one option would be to just disable it on those sites (alawp_is_large_network
). I'm assuming that the admins on those types of sites will usually capable of doing something custom if they actually need the functionality. Alternatively, the value could be sharded across multiple options, moved onto disk, etc. There could be better ideas too.
have probably over 2x that amount of folders ... its size is still the same 207221 elements [even though] there are hundreds such directories created daily
the option is probably so big that it actually breaks mysqldump
Huh, maybe those are related? Maybe there's an error when trying to add more entries to the value too? I'd assume MySQL could handle that just fine, but maybe there's a config option that limits it?
#6
follow-up:
↓ 7
@
7 weeks ago
Another idea that might be a better solution for the dirsize_cache
expansion: instead of storing the raw size of every folder, we could just store the total size (for each site in Multisite). AFAIK that's the only thing we need, and storing the individual sizes is for performance.
Instead of calculating the size synchronously (but with the data partially cached), we could do a full calculation asynchronously, and cache the result. That way the user always sees cached data, which is faster and avoids storing megabytes of data in an option.
It would get stale when files are uploaded/removed, but we could hook into those processes and manually add/subtract the file size from the cached total size. That should keep the size accurate enough until the cron job runs again.
That would need some more research to make sure it'd meet all the current requirements that dirsize_cache
serves, including #19879 and #46645.
#7
in reply to:
↑ 6
@
7 weeks ago
Replying to iandunn:
Instead of calculating the size synchronously (but with the data partially cached), we could do a full calculation asynchronously, and cache the result. That way the user always sees cached data, which is faster and avoids storing megabytes of data in an option.
That sounds like a good approach.
This ticket was mentioned in Slack in #core-test by boniu91. View the logs.
5 weeks ago
#10
@
2 weeks ago
Hi all!
Just faced the same issue!
Mine is smaller, though:
mysql> SELECT option_name, length(option_value) AS option_value_length FROM wp_options WHERE autoload='yes' ORDER BY option_value_length DESC LIMIT 200;
+---------------------------------------------------------------+---------------------+ | option_name | option_value_length | +---------------------------------------------------------------+---------------------+ | _transient_dirsize_cache | 515393 |
However, this is the biggest wp_option!
Thanks for reporting this!
I'm assuming the following about your site:
dirsize_cache
has 207k folders in its upload directorydirsize_cache
values, which are proportional to the # of foldersIs all of that correct?
At first glance, it does look like
dirsize_cache
can grow indefinitely based on the # of folders. The transient isn't set with an expiration, which seems intentional/correct, given the purpose. It also seems like it shouldn't be autoloaded, since it's only used on the Dashboard, file upload/deletion, and Site Health.It looks like
set_transient()
sets$autoload
toyes
for this situation, and it doesn't allow the caller to override that. This doesn't seem directly related to r49744; it looks like it goes all the way back to when the concept of expiration was added to transients in r10603.I don't see a ticket for that commit, though, or any documentation in the code. It's not obvious to me why
autoload
would be tied to expiration; it seems like it should be tied whether the average performance is better when it's loaded w/ all the options, or only when needed. Maybe expiration was thought to be a good proxy for that?dirsize_cache
was added for Multisite instances when Multisite was merged to Core in r12603. It was repurposed for Site Health (both single- and Multisite instances) in r45104.