Changing the Name and Path of the Active WordPress Theme Without Breaking Theme Settings

After changing a brand name, but little else, on a WordPress site we can end up with a WordPress theme called “oldbrand” on a site called “New Brand.” This leads to weird things like the brand’s logo being at the URL newbrand.com/wp-content/themes/oldbrand/img/newbrand-logo.png. Oops. SEOs must cry a little when they see something like that. This is a situation one of my clients ended up in after a name change, but it ends now.

We can’t simply change the theme directory from themes/oldbrand to themes/newbrand and update the active theme in the Appearance section of the wp-admin because theme-related settings are tied to the theme directory in the database. This includes things like widget placement; a different theme probably has very different widget areas so widget placement should be stored per-theme. The same goes for any theme settings & customizations. To get around this we need to update the database so WordPress thinks that the new theme name is the theme name we’ve always been using.

A word of caution: back up your database before trying this. We’re going to run raw queries. If something breaks you may end up in an in-between limbo, and restoring from backup is the quickest way out.

After diving through the database of a few sites this morning these queries make it possible to rename the theme directory, and optionally rename the theme in style.css:

# Update the main theme options
UPDATE wp_options SET option_value='new-theme-directory' WHERE option_name='template';
UPDATE wp_options SET option_value='new-theme-directory' WHERE option_name='stylesheet';
UPDATE wp_options SET option_name='theme_mods_new-theme-directory' WHERE option_name='theme_mods_old-theme-directory';

# If also updating the Theme Name in the theme's style.css
UPDATE wp_options SET option_value='New Theme Name' WHERE option_name='current_theme';  

# If any posts reference assets in the theme such as images or logos.
UPDATE wp_posts SET post_content=REPLACE(post_content, 'themes/old-theme-directory', 'themes/new-theme-directory' );
UPDATE wp_postmeta SET meta_value=REPLACE(meta_value, 'themes/old-theme-directory', 'themes/new-theme-directory');

# If WordFence is used
UPDATE wp_wfConfig SET val=REPLACE(val, 'themes/old-theme-directory', 'themes/new-theme-directory');

# Delete a couple of transients that store references to the old theme directory.
DELETE FROM wp_options WHERE option_value='_site_transient_theme_roots';
DELETE FROM wp_options WHERE option_value='_site_transient_update_themes';

Protips:

  • Try to run these queries at nearly the same time as the theme directory is renamed. Bonus points for making a script that does it all nearly instantly.
  • If an installation stores transients somewhere other than the DB, then clearing them in the DB won’t work, (clearing the transients may not be strictly required).
  • The sample code uses the default wp_ table prefix. When working on a site with a different prefix, use that.

Hopefully this helps someone finish their Googling session and get on with a rename!

Apache/cPanel Log Analysis using MySQL

Every once in a while I need to analyze log files from a web server. Sometimes people ask my what is sucking up all the bandwidth on their hosting plan, other times bots are driving them crazy so we try to identify easy ones to block. Sometimes we want to know how many times a certain URL, or group of URLs was loaded. The logs I work with usually come from LAMP servers running cPanel on a shared host but the techniques below are adaptable to any format by adjusting the import query.

Tools:

I usually use Sequel Pro to run queries. Even when writing the queries myself I prefer a GUI to the command line. It is much easier to inspect tables visually and my query history is saved. That said, we’re working with plain SQL here, (maybe with a bit of MySQL flavour), so any MySQL client will work.

You also need a MySQL server to process the queries. I use the same one as I use for development on my laptop, if you use MAMP, WAMP, or any other AMP you should have access to a local MySQL server.

Part 1: Setup

Before analyzing the data it must be imported, and before importing data there needs to be a table to put it in. Here is the CREATE TABLE statement I use:

 CREATE TABLE `analysis_job` (
`ip` varchar(16) DEFAULT NULL,
`clientid` varchar(128) DEFAULT NULL,
`userid` varchar(128) DEFAULT NULL,
`timestamp` varchar(128) DEFAULT NULL,
`offset` varchar(128) DEFAULT NULL,
`request` varchar(250) DEFAULT NULL,
`status` int(128) DEFAULT '0',
`size` int(128) DEFAULT '0',
`referrer` varchar(250) DEFAULT NULL,
`useragent` varchar(250) DEFAULT NULL,
`ts` datetime DEFAULT NULL,
`uts` bigint(10) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

A few of these columns may need more explanation:

  • clientid: something Apache creates, almost alway blank
  • userid: something else Apache creates almost always blank
  • request: the contents of the HTTP request, includes the HTTP method, (usually GET or POST), the URL, and the HTTP version
  • status: the HTTP Status code the server sent
  • size: the size of the response. I believe this is in bytes.
  • ts: A timestamp with better formatting than the timestamp column to make it easier to do date operations, (we’ll compute this after importing the data)
  • uts: A Unix Timestamp representation of the ts column, handy for sorting.

Each line of the log file is a single HTTP request. The import statement tells the MySQL server that the fields are enclosed by double quotes and delimited by spaces, from left to right. This means that the timestamp gets broken into two columns, hence the timestamp and offset columns in the CREATE TABLE statement above.

And here’s a sample log entry for a request from Bingbot, (this is a single line from the log file, it is wrapped to multiple lines here):

127.0.0.1 - - [30/Nov/2014:04:00:45 -0800] "GET /directory/page.html HTTP/1.1" 200 9499 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

For this request the column values are:

ip:
127.0.0.1
clientid:
userid:
timestamp:
[30/Nov/2014:04:00:45
offset:
-0800]
request:
GET /directory/page.html HTTP/1.1
status:
200
size:
9499
referrer:
useragent:
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

The WHM/cPanel based servers that I have encountered always use this log format, but if the server you are working with is different you will have to modify the CREATE TABLE statement to match it.

Part 2: Importing Data

With the table set up it’s fairly quick & easy to import the data:

#Import data
LOAD DATA LOCAL INFILE '/path/to/logfile/logfile' 
INTO TABLE analysis_job
FIELDS TERMINATED BY ' ' ENCLOSED BY '"' LINES TERMINATED BY "\n";

This loads everything into the table. Now we have to extract an actual date from the timestamp column and turn it into a datetime for the ts column using MySQL’s STR_TO_DATE() function:

# to make the string into an actual datetime
UPDATE analysis_job SET ts = STR_TO_DATE(timestamp, '[%d/%b/%Y:%H:%i:%s');

Now there’s a small problem to solve: If the system timezone setting on the server is different from the timezone setting in MySQL on whatever computer you’re using as a database server, and if you want to be able to ask “what happened today in my timezone?” we need to shift the content of the ts column to match your timezone. You can see the UTC offset of the server by looking at the offset column. In my case the original server is on Pacific Time, and I’m on Eastern Time, so we need to add 3 hours to ts:

# adjust timezone
UPDATE analysis_job SET ts = DATE_ADD(ts, INTERVAL 3 HOUR);

If you want to have the Unix Timestamp version of ts available to you then you need to set it:

# Set unix timestamp column
UPDATE analysis_job SET uts = UNIX_TIMESTAMP( ts );

Part 3: Analysis

Here’s the fun part. If you’re good with the ORDER BY and GROUP BY clauses of SQL, and their corresponding functions, you can find all sorts of interesting stuff. Here are some examples that I’ve used:

Find the total bandwidth used during the time your log file spans:

SELECT SUM(size) as totalsize, (SUM(size) / 1048576) as MB FROM analysis_job;

Or for the last week, (change the WHERE clause for different timeframes):

SELECT SUM(size) as totalsize, (SUM(size) / 1048576) as MB FROM analysis_job WHERE ts > DATE_SUB( NOW(), INTERVAL 1 WEEK );

Find the requests, (and therefore files), that use the most bandwidth:

SELECT request, SUM(size) as totalsize, (SUM(size) / 1048576) as MB FROM analysis_job GROUP BY request ORDER BY totalsize Desc;

Do you suspect there’s one computer out there hammering your site? Find the IP that’s using the most bandwidth:

SELECT ip, SUM(size) as totalsize, (SUM(size) / 1048576) as MB FROM analysis_job GROUP BY ip ORDER BY totalsize Desc;

Or the User-Agent:

SELECT useragent, SUM(size) as totalsize, (SUM(size) / 1048576) as MB FROM analysis_job GROUP BY useragent ORDER BY totalsize Desc;

Who is hammering you with the most requests?

SELECT ip, COUNT(*) as requests FROM analysis_job GROUP BY ip ORDER BY requests Desc;

Again, group by User-Agent to find the program, (assuming the User-Agent isn’t spoofed):

SELECT user agent, COUNT(*) as requests FROM analysis_job GROUP BY user agent ORDER BY requests Desc;

Any of these can be restricted by date, like we did with bandwidth above.

You can also zoom in on one User-Agent, for example:

SELECT * FROM analysis_job WHERE useragent='Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)';

Or certain file types:

SELECT * FROM analysis_job WHERE request LIKE '%.css%'; # CSS
SELECT * FROM analysis_job WHERE request LIKE '%.html%'; # HTML

Mix it up for detailed analysis. This finds the IPs that have sent the most hits in the past 5 hours that don’t declare themselves as bots and are accessing normal parts of the website, (not cPanel, stats packages, robots.txt, etc):

# Non-bot IPs that have sent the most hits in the last 5 hours
SELECT COUNT(*) AS cnt, ip FROM analysis_job 
WHERE
ts > DATE_SUB(NOW(), INTERVAL 5 HOUR) 
AND useragent NOT LIKE '%bot%' 
AND useragent NOT LIKE '%spider%' 
AND useragent NOT LIKE 'Mediapartners%' 
AND request NOT LIKE '%/api/1/%' 
AND request NOT LIKE '%/media/%'
AND request NOT LIKE '%/pub/%'
AND request NOT LIKE '%cpanel/%'
AND request NOT LIKE '%mint/%'
AND request NOT LIKE '%/robots.txt%'
GROUP BY ip
ORDER BY cnt Desc;

Part 4: Other Considerations

While this data can help find interesting traffic patterns and trends it is important to remember that IP addresses can be spoofed or proxied and User-Agents can be set to whatever a client wants so those values cannot be entirely trusted.

In addition, if you use CloudFlare, the IP address will almost always be the IP of a CloudFlare server unless your host has mod_cloudflare, or some equivalent, installed.

Even with these caveats server logs are an untapped information mine, and having a chunk of time in a database table makes it much easier to dig in that mine.