Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent unnecessary directory recursion #77

Merged
merged 3 commits into from Dec 7, 2020
Merged

Prevent unnecessary directory recursion #77

merged 3 commits into from Dec 7, 2020

Conversation

@jasongill
Copy link
Contributor

@jasongill jasongill commented Nov 19, 2020

The wp core verify-checksums command errantly makes a recursive directory listing of the entire ABSPATH, then iterates that listing and ignores any entries that don't match a very narrow filter (which, in practice, is basically wp-admin/, wp-includes/, and wp-*.php). On sites with millions of files (for example, lots of uploads in wp-content/uploads/ or unruly cache directories), the act of doing this unnecessary recursive directory listing takes multiple minutes to complete.

This pull request wraps the RecursiveDirectoryIterator in a RecursiveCallbackFilterIterator that mirrors the existing logic already used to exclude files or directories that are not relevant to the checksum verification. The results of the command are the exact same - but by limiting which directories the RecursiveDirectoryIterator goes in to using the existing filter, the command can complete in seconds as opposed to minutes or longer.

This request could be DRY'ed out, as it does build the files array using the filter and then iterates that array to perform almost the same filtering again, but I felt that submitting this simplified request would make it clear that the change here is minimal in scope and is nothing more than a speedup based on the existing code.

On a test site with 4 million files in wp-content and other non-WordPress directories, the original version (with the erroneous recursion) takes about 1min 39sec to complete wp core verify-checksums. This modified version finished in 0.5sec while still identifying the same files that should not exist.

(This is a resubmit of previous pull request #76 to correct phpcs warnings)

@schlessera
Copy link
Member

@schlessera schlessera commented Nov 20, 2020

Thanks for the pull request, @jasongill.

First of all, I'm curious about the need to resubmit the PR - what was the problem with your second commit on the first PR?

Looking at the code, this makes total sense. However, I think you could even remove the second $this->filter_file( $pathname ) now I assume, as that is already enforced by the iterator anyway?

Loading

@jasongill
Copy link
Contributor Author

@jasongill jasongill commented Nov 20, 2020

First of all, I'm curious about the need to resubmit the PR - what was the problem with your second commit on the first PR?

Force of habit, some projects require one commit per PR for ease of review & merge so figured better safe than sorry.

Looking at the code, this makes total sense. However, I think you could even remove the second $this->filter_file( $pathname ) now I assume, as that is already enforced by the iterator anyway?

Yes, I considered removing it but figured again that less is more when it comes to first PR on the project. Since filter_file is (currently) nothing more than a return true or a string comparison depending on the child class, there is no noteworthy performance penalty to it so I figured I'd let you guys decide if it was needed / worth exploring a deeper cleanup if my basic concept was sound

Thanks again for all the hard work on this project, happy to help where I can to give back.

Loading

@schlessera
Copy link
Member

@schlessera schlessera commented Nov 20, 2020

Force of habit, some projects require one commit per PR for ease of review & merge so figured better safe than sorry.

Ah, alright. I wanted to make sure it was not a technical problem with the repo.

Multiple commits are actually encouraged here, it makes it easier to follow the thought process of the developer.

Regarding the additional $this->filter_file() call, I think it should be removed, not for performance reasons, but rather for clarity. Can you add another commit to remove that second use?

Loading

@jasongill
Copy link
Contributor Author

@jasongill jasongill commented Nov 20, 2020

I've pushed an additional commit that removes the duplicate call to filter_file() and have tested to be sure that it still works as expected 👍🏻

Loading

@schlessera schlessera added this to the 2.0.5 milestone Nov 20, 2020
@schlessera schlessera merged commit a03cb05 into wp-cli:master Dec 7, 2020
18 checks passed
Loading
@schlessera
Copy link
Member

@schlessera schlessera commented Dec 7, 2020

Thanks for the PR, @jasongill !

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants