
Beginner's Guide to Log File Analysis (2021)


While there are dozens of arguments that rage continuously in the SEO space, log file analysis is one thing people are unlikely to encounter often – those who perform it see its benefits, but it’s a technique that has somewhat fallen out of fashion.


Log file analysis was one of the primary SEO techniques during the early years of search and digital marketing, but the rise of various SaaS products has led to it becoming a dying art. However, there are still insights to be gained from analysing your raw access logs, and we hope to provide the information you need to decide whether it’s a worthwhile investment of your time – and, if so, how to conduct it.

What is log file analysis?

Log file analysis is the process of reviewing, either manually or using a tool or platform, the data stored by your site’s servers whenever a request for a resource (web page, CSS/JS file, image etc.) is registered. In doing so, the analyst can reveal issues with various parts of the site, possible SEO opportunities and the general behaviour of the various search engine crawlers that roam the web.

What log files include

While log files can be extremely useful for SEO, the information stored is pretty basic – this includes:

  • The HTTP status code of your website’s server response (2XX, 3XX, 4XX etc.)
  • The IP address of the user agent (the software, usually a browser or crawler, making the request)
  • The type of request – either GET or POST depending on whether it’s a request to receive or provide data
  • A time stamp which states the date and time the request was received by the server
  • The URL path of the resource requested (the image, web page or file URL)
  • The user agent requesting the resource – generally a web browser such as Chrome or Firefox, or a search engine crawler such as Googlebot

These files will vary in size depending on how large the site is, how much traffic the site gets and how regularly the logs are archived.
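
For reference, a single hit in a typical Apache-style ‘combined’ access log looks something like this (the IP address, timestamp and URL below are invented for illustration):

```
203.0.113.7 - - [14/Mar/2021:09:26:13 +0000] "GET /blog/log-file-analysis/ HTTP/1.1" 200 14872 "https://www.google.com/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

All of the fields listed above are packed into that one line, which is why the file usually needs splitting into columns before you can work with it in a spreadsheet.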

Where to find your site’s log files

The only servers I have access to at the time of writing use the cPanel GUI, but most hosting panels are fairly similar, so while the options may not be in exactly the same place, they’ll generally have similar names. To find your log files, you’ll need to access your server management platform or, if you use a CDN, you’ll find your logs there instead, as the origin server won’t receive most of the requests.

Once you’re in, you’ll have two options: you can select ‘File Manager’ and scroll down to the ‘Logs’ folder, or you can select the ‘Raw Access’ option, which will allow you to download the most recent files.


[Screenshot: cPanel File Manager]


You’ll receive a compressed file (.gz or similar) which you can extract with WinRAR or 7-Zip, allowing you to open it in your preferred spreadsheet program or SaaS log file analyser. If you’re opening it in a spreadsheet, however, you’ll generally need to separate the text into columns, as each hit will otherwise land in a single cell.

If you’re in Excel, you can do this with the ‘Data’ tab and the ‘Text to Columns’ option (or you can script the step – see the sketch after the column list below).


[Screenshot: delimiting data in Excel]


The result should be columns which will fit into the following structure (sometimes there are a couple more, sometimes a couple less):

  • IP Address
  • Timestamp (D/M/Y HH:MM:SS)
  • Method/Query
  • HTTP Status Code
  • File Size/Bytes Downloaded
  • User Agent
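
If you’d rather not do the splitting by hand, a short script can produce the same columns for you. The sketch below is a minimal example in Python; it assumes an Apache/NCSA-style ‘combined’ log format and placeholder file names, so the pattern and paths will need adjusting to whatever your host actually provides:

```python
import csv
import re

# Rough pattern for one hit in an Apache/NCSA 'combined' format log:
# IP, identd, user, [timestamp], "METHOD path protocol", status, bytes, "referrer", "user agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

FIELDS = ["ip", "timestamp", "method", "path", "status",
          "bytes", "referrer", "user_agent"]

def log_to_csv(log_path, csv_path):
    """Split each hit into the columns described above and write them to a CSV."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file, \
         open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
        writer.writeheader()
        for line in log_file:
            match = LINE_RE.match(line)
            if match:  # quietly skip lines that don't fit the expected format
                writer.writerow(match.groupdict())

# Example usage (file names are placeholders):
# log_to_csv("access.log", "access.csv")
```

The resulting CSV can be opened straight into a spreadsheet, or loaded into pandas for the sort of pivots described in the next section.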

What log files can do for SEO

  • Crawl frequency per user agent – by creating a pivot table, you can identify potential problems with any of the search engine user agents (a finding that has sent more than one webmaster off for a panicked look at their robots.txt)
  • Crawl frequency of URLs – ideally you want your most important pages to be the most frequently crawled, so arranging a pivot table of URLs against the number of occurrences can reveal how your site is being perceived
  • HTTP status issues – while you can find a lot of the error codes in GSC, a pivot table returning URLs by status code will let you see whether there are patterns of bad requests that could cause issues (all three of these pivots are sketched in pandas after this list)
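
To give a rough idea of what those three pivots look like outside a spreadsheet, here is a short pandas sketch that works on the CSV produced by the parsing script above – the column names (‘user_agent’, ‘path’, ‘status’) are the ones chosen there, not a fixed standard:

```python
import pandas as pd

# Load the parsed hits (the CSV produced by the parsing sketch above)
hits = pd.read_csv("access.csv")

# 1. Crawl frequency per user agent
hits_per_agent = hits["user_agent"].value_counts()

# 2. Crawl frequency per URL, restricted to Googlebot as an example
googlebot_hits = hits[hits["user_agent"].str.contains("Googlebot", case=False, na=False)]
hits_per_url = googlebot_hits["path"].value_counts()

# 3. URLs by status code, to spot patterns of 4XX/5XX responses
status_by_url = pd.pivot_table(
    hits, index="path", columns="status", values="ip",
    aggfunc="count", fill_value=0,
)

print(hits_per_agent.head(10))
print(hits_per_url.head(10))
print(status_by_url.head(10))
```

None of this is anything a spreadsheet pivot table can’t do; it’s simply quicker to repeat once the logs get large.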

Log file analysis tools

There are a few log file analysis tools available, but I’m just going to look at two of the more popular – Semrush and Screaming Frog. Both will allow you to do similar things, but there is a version of the Screaming Frog Log File Analyser that you can use for free.

As you would imagine, for basic log file analysis, most people will opt for a tool, as these can automatically perform a lot of the data sorting from the raw log file and present it neatly with little effort on the part of the user.

More advanced analysis will involve importing additional data and data types from other sources, but we’ll hopefully cover that in a later piece.

Semrush log file analysis tool

As you’d imagine from one of the biggest search and digital marketing tools, this is a neat package and allows you to upload the unzipped file (I’ve used just a small sample for speed, but you’ll want to use a couple of months for best results):


[Screenshot: the Semrush log file analyser]


This will return a dashboard with some really neat visualisations – you can see which bots are visiting, what they’re visiting and how often; which status codes are being encountered, and how consistently; which file types are being requested; and more.


[Screenshot: the Semrush log file analysis dashboard]


Screaming Frog log file analysis tool

There’s a reason this is a favourite tool amongst those who perform any kind of log file analysis – not only does it present the imported logs in similarly neat charts and tables, but there’s also the option to import secondary data (including data from Screaming Frog crawls), which can be matched by URL to allow for more detailed analysis.


[Screenshot: the Screaming Frog Log File Analyser]


Should you be concerned with crawl budget?

In short, for the most part, your answer is no. Crawl budget is not something that the majority of sites will need to worry about. In fact, the Google article on crawl budgets features the following comment:


If your site does not have a large number of pages that change rapidly, or if your pages seem to be crawled the same day that they are published, you don’t need to read this guide; merely keeping your sitemap up to date and checking your index coverage regularly is adequate.


As such, unless you are running a large site – think national newspapers and large eCommerce sites – you are unlikely to hit a hard limit. However, should you be concerned, you can see which pages are not being crawled, or are seldom being crawled, by combining Screaming Frog crawl data with your log files, as sketched below.
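
On that last point, one simple way to make the comparison is to export the internal URLs from a Screaming Frog crawl and check them against the parsed log file from earlier. The file names below are placeholders, and the ‘Address’ column is an assumption about the crawl export, so treat this as a starting point rather than a recipe:

```python
import pandas as pd

# URLs found by a site crawl (e.g. an internal URL export from Screaming Frog)
crawl = pd.read_csv("crawl_urls.csv")   # assumed to contain an 'Address' column of full URLs
hits = pd.read_csv("access.csv")        # the parsed log file from earlier

# Keep only search engine bot hits (a very rough user agent filter)
bot_hits = hits[hits["user_agent"].str.contains("Googlebot|bingbot", case=False, na=False)]

# Reduce the crawl's full URLs to paths so they line up with the log's request paths
crawl["path"] = crawl["Address"].str.replace(r"^https?://[^/]+", "", regex=True)

# Pages the crawler found but that no search engine bot requested in this log sample
never_crawled = crawl[~crawl["path"].isin(bot_hits["path"])]
print(never_crawled["Address"].head(20))
```

Anything that turns up in that final list is worth a closer look – it may be fine, but it may also be poorly linked or rarely worth a bot’s time.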


For more information and actionable advice on all areas of search and digital marketing, you can check out our resource section or keep up to date with our blog. Hopefully we’ll get to return to log files at some point in the near future. To see what our award-winning teams can do for your brand – contact us today.


