SweepPress Pro plugin has recently got a brand new 3.5 release, and this post will showcase a real cleanup process done here on the Dev4Press website, comparing estimates to real values as reported by the database.
Now, this cleanup process here comes with a few important caveats:
- The Dev4Press website database is regularly maintained and fully cleaned a few times yearly, with smaller scope cleanups done more often.
- This cleanup process will be limited to the main website in the Dev4Press Network.
- This cleanup process will also include Multisite Network database tables.
- This cleanup process will not involve users and usermeta tables.
- This cleanup process will not run the Database Optimization sweeper, but that sweeper will be run after this main process.
As I planned to run this demonstration last year, I have not run any cleanup on the main Dev4Press website or Network tables for 6 months now, so all the results of this process are for the 6 months period after the last time the cleanup was done.
Before the cleanup
So, here is the image showing the snapshot of all tables in scope for this cleanup, with information about size, index, free space, and more. But first, the overview of the data affected by the cleanup:
- Total number of rows: 158,771 Rows
- Total data size: 54.85 MB
- Total index size: 26.04 MB
- Total data+index size: 80.89 MB
SweepPress Pro Estimates
Now, opening SweepPress Pro, and here is the full overview of what can be removed from the database. A total of 15 sweepers reported some data to remove. But, what is important to note, is the total size of 12MB to remove and close to 94000 records.
Here comes the important part. The total number of records affected tables have is close to 159000, and the sweeper reports close to 94000 to remove?! This is not quite correct. These estimates show a number of AFFECTED records across multiple tables, but not all sweepers are removing whole records; some sweepers are only removing data from the database record/row, but not the actual record/row. An example of this is the Comments User Agent sweeper that will only remove data from the User Agent column in the Comments table, but it will not remove whole comment rows.
Also, in this case, Comments sweepers overlap, with Comments Spam sweeper and Comments User Agent having a partial overlap of data they are going to work on. So, the total data recovered by these two sweepers is not going to be the sum of the data estimated for individual sweepers but is close to that total estimate.
Running the Sweepers
For this demonstration, I will run all the available sweepers. You don’t need to do that. You maybe want to have User Agents data for all comments, and if that is the case, don’t use that sweeper. The same goes for all other sweepers; make sure to read about each sweeper, and based on the information provided, decide if you want to run some of the sweepers in the first place.
Now, we run the sweepers and get the results of the process. Here you will see a slightly different thing, based on the fact that Comments Spam and Comments User Agent overlap, and because they run in sequence, with Spam running first and User Agent running after, the User Agent is going to remove a lot fewer data, because Spam sweeper already removed a lot of data that User Agent sweeper was also targeting. So, instead of removing 12MB, the estimate now shows about 10 MB. And this whole process took about 4 seconds to complete.
After the Cleanup
Now, if we take a new snapshot of the database tables, we can see how much data has been removed after the sweeping and how that compares to SweepPress Pro’s estimates.
And the new totals for the targeted tables are as follows:
- Total number of rows: 92,480 Rows
- Total data size: 42.93 MB
- Total index size: 17.12 MB
- Total data+index size: 60.05 MB
Analysis of the SweepPress Pro results
Now, we have everything we need to actually quantify how much SweepPress Pro has done for the cleanup and how that compares to actual information reported by the database server.
So, SweepPress Pro, according to the Sweeping results, recovered 10.2 MB of space. Still, according to the actual database results, we have recovered 11.92 MB of data space, so the SweepPress has underestimated its own cleanup effect and gained more space than estimated! But we also need to take into account the space occupied by the Index, and the SweepPress has recovered 8.92 MB of index-occupied space. The total recovered data+index space is 20.84 MB. We started with 80.89 MB of occupied space and ended with 60.05 MB of occupied space, so the SweepPress has recovered 25.76% of the starting size! Right now, SweepPress doesn’t take Index into account because it is not easy to estimate it, but, that may be added as an option in one of the future updates.
SweepPress Pro, with only one sweep, recovered more than a quarter of the space in the database. And this is the database that is regularly maintained, and it was only for the last 6 months that no cleanup was done. The gains will be higher for most other websites that were not maintained.
And remember, the database will be faster, and your website will be faster if you remove all the data that is not needed because query optimization depends on the dataset sizes. An optimized database is a fast database.
Optimize Tables Sweeper
Please, notice that the Free space column (DATA_FREE) in the Database, after the snapshot, has increased in size for some of the tables because, after the deletion of records, the table data and index are lower, but the free space increases due to fragmentation. Defragmentation is an expensive operation that MySQL doesn’t run on its own because the only way to restore that free space is by optimizing the table by removing and recreating it again, and that can be slow, so it is not advisable to run often.
Here is where the Optimize Tables Sweeper comes to help. This sweeper is made to run the optimization that attempts to recover free space that is still allocated by the server due to fragmentation. And because of how that is done, it is highly recommended to avoid running this sweeper too often, only after removing a lot of other data and if the threshold for the optimization is reached. In the plugin settings, you can set the fragmentation threshold for this sweeper, and only heavily fragmented tables will be taken into account.
Now, if you run this, the plugin will ATTEMPT to optimize these tables. But, there is no guarantee that the free space will be regained in all cases or completely. The way MySQL works, it is very hard to avoid some free space being taken, even when the tables are recreated. But regaining at least some of the free space would be great. In the case of SweepPress, there are 3 methods to do optimization, and you can change which method is used in the plugin settings because, depending on the version of MySQL, you may get different results in the end.
My server uses MySQL 8 right now, and the basic OPTIMIZE method (recreate with analyze) has optimized all these fragmented tables and regained 30 MB. The comments table remained fragmented (2 MB), the Options table remained fragmented (4 MB), and the Signups table remained fragmented (2 MB).
So, as you can see from the results of the cleanup, SweepPress Pro was very effective in removing a lot of data from the database. Depending on your website database size, activity, and how old it is (or how long it was not optimized), the results of the sweeping will vary. Still, you can expect good results, and regular use will keep your website database well-optimized.
Let me know if you have additional questions about the plugin or the process described here. If you have already used SweepPress (Lite or Pro), let me know about your experience so far.