Prevent unnecessary directory recursion #77
Conversation
Thanks for the pull request, @jasongill. First of all, I'm curious about the need to resubmit the PR - what was the problem with your second commit on the first PR? Looking at the code, this makes total sense. However, I think you could even remove the second |
Force of habit, some projects require one commit per PR for ease of review & merge so figured better safe than sorry.
Yes, I considered removing it but figured again that less is more when it comes to first PR on the project. Since filter_file is (currently) nothing more than a return true or a string comparison depending on the child class, there is no noteworthy performance penalty to it so I figured I'd let you guys decide if it was needed / worth exploring a deeper cleanup if my basic concept was sound Thanks again for all the hard work on this project, happy to help where I can to give back. |
Ah, alright. I wanted to make sure it was not a technical problem with the repo. Multiple commits are actually encouraged here, it makes it easier to follow the thought process of the developer. Regarding the additional |
I've pushed an additional commit that removes the duplicate call to filter_file() and have tested to be sure that it still works as expected 👍🏻 |
Thanks for the PR, @jasongill ! |
jasongill commentedNov 19, 2020
The
wp core verify-checksums
command errantly makes a recursive directory listing of the entire ABSPATH, then iterates that listing and ignores any entries that don't match a very narrow filter (which, in practice, is basically wp-admin/, wp-includes/, and wp-*.php). On sites with millions of files (for example, lots of uploads in wp-content/uploads/ or unruly cache directories), the act of doing this unnecessary recursive directory listing takes multiple minutes to complete.This pull request wraps the RecursiveDirectoryIterator in a RecursiveCallbackFilterIterator that mirrors the existing logic already used to exclude files or directories that are not relevant to the checksum verification. The results of the command are the exact same - but by limiting which directories the RecursiveDirectoryIterator goes in to using the existing filter, the command can complete in seconds as opposed to minutes or longer.
This request could be DRY'ed out, as it does build the files array using the filter and then iterates that array to perform almost the same filtering again, but I felt that submitting this simplified request would make it clear that the change here is minimal in scope and is nothing more than a speedup based on the existing code.
On a test site with 4 million files in wp-content and other non-WordPress directories, the original version (with the erroneous recursion) takes about 1min 39sec to complete
wp core verify-checksums
. This modified version finished in 0.5sec while still identifying the same files that should not exist.(This is a resubmit of previous pull request #76 to correct phpcs warnings)