Dealing with language spam in Google Analytics
Noticed your referral sessions are up and spotted a sinister-looking culprit in your GA language report? Then it’s likely that you’ve been affected by language spam.
If you’ve seen some strange goings-on in your Google Analytics (GA) account recently, don’t worry, you’re not alone.
Here are a few recent examples you may have seen or read about:
Vitaly rules Google
Vote for Trump
What is language spam?
Usually, when looking at the ‘Language’ report in GA, language types will show as an abbreviation such as ‘en-gb’, ‘fr’, ‘en-us’ or similar. But towards the end of 2016 a new wave of GA spam was targeted at this report and has been showing up in GA accounts across the board.
This is usually injected into your GA account in one of two ways: either by spambots (simple computer programs used to perform repetitive tasks) that impersonate a user browsing your website, or by hits sent straight to GA’s servers.
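To make the second route more concrete, here is a minimal sketch of how the language value travels inside a hit. It assumes the Universal Analytics Measurement Protocol parameter names `ul` (user language) and `dr` (document referrer); the payload itself, including the property ID and client ID, is a made-up placeholder. The point it illustrates is that the language is just free text in the request, so nothing stops a spammer from putting a slogan there.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload for illustration only: the property ID (tid) and
# client ID (cid) are placeholders, not real values.
payload = (
    "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
    "&dr=http%3A%2F%2Fabc.xyz%2F&ul=Vote%20for%20Trump"
)


def parse_hit(raw: str) -> dict[str, str]:
    """Decode a URL-encoded hit into a flat {parameter: value} dict."""
    return {key: values[0] for key, values in parse_qs(raw).items()}


hit = parse_hit(payload)
print(hit["ul"])  # the "language" reported for this hit
print(hit["dr"])  # the referrer reported for this hit
```

Neither field is validated against a list of real languages or real referrers in this sketch, which mirrors why the spam shows up in both the ‘Language’ and referral reports.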
Aside from the basic fact that it’s annoying and isn’t something you want to show clients or senior team members in their weekly & monthly reporting, this type of spam adds fake sessions to your reports, skews your data and can give the impression of false performance gains.
How easy is it to identify?
Don’t worry, this part’s easy! Simply log in to your Google Analytics account for the website you want to check, select your preferred reporting view and navigate to the ‘Language’ report under the ‘Geo’ drop down in the ‘Audience’ reports section.
What you should be seeing in an ideal world is a list that resembles the below:
Seeing something like the below instead? It’s time to take action:
For further information on the offending domains, filter the report to show just the spam entry, then add the ‘Source/Medium’ secondary dimension. You’ll see that the sessions are all referrals from obviously questionable domains such as: bukleteg.xyz, bezlimitko.xy, abc.xyz and similar.
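If you have a long language report, or have exported it, a quick script can flag the odd entries for you. The sketch below is a heuristic only: it assumes a genuine value looks like a short language code with optional hyphen-separated subtags (‘en-gb’, ‘fr’), and the function names are hypothetical. Anything that doesn’t fit that shape, including legitimate oddities such as ‘(not set)’, is listed for manual review rather than declared spam.

```python
import re

# Assumed shape of a genuine value: a 2-3 letter language code, optionally
# followed by hyphen-separated subtags (e.g. "en-gb", "zh-hans-cn").
# This is a heuristic, not an official definition.
LANGUAGE_CODE = re.compile(r"[a-z]{2,3}(-[a-z0-9]{2,8})*", re.IGNORECASE)


def looks_like_language_code(value: str) -> bool:
    """Return True if the whole value has the shape of a language code."""
    return LANGUAGE_CODE.fullmatch(value.strip()) is not None


def suspicious_values(values):
    """Return the values that do not look like language codes."""
    return [v for v in values if not looks_like_language_code(v)]


report = ["en-gb", "fr", "en-us", "Vitaly rules Google", "Vote for Trump"]
print(suspicious_values(report))
```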
How can I remove it?
Don’t worry, removing language spam from your GA data going forward isn’t as hard as it might seem. Follow the simple steps below and you’ll soon be filtering out this erroneous data.
Before you start, it is advisable to set up a new reporting view in which to test this filter, as data that has been filtered out cannot be recovered once the filter has been implemented.
Once you have examined the data side-by-side with your unfiltered view for 2-4 weeks, and are happy the filter is effective, you can copy it over to your main reporting view.
After setting up your new reporting view to test this filter in isolation, navigate to the ‘Admin’ settings for your view, then select the ‘Filters’ option in the ‘View’ column. Once there, click on the ‘Add Filter’ button:
Select the ‘Create new Filter’ radio button, then give your filter a name which indicates what it will exclude.
Select ‘Custom’ as the filter type and the ‘Exclude’ radio button.
Under the ‘Filter Field’ dropdown, search for ‘Language settings’ and select it.
Next, enter the following regular expression pattern in the ‘Filter Pattern’ field (with a version you can copy & paste below it):
To see how the filter would alter your data over the last 7 days, click on the ‘Verify this filter’ hyperlink towards the bottom of the page (if the numbers are small, this may not return anything).
Check the list generated to ensure that nothing is getting stripped out that shouldn’t be.
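If the verification step returns nothing, you can still sanity-check an exclude pattern offline against values you know. The sketch below is illustrative only: the pattern in it is an assumption (it flags values containing two or more spaces, a dot, a comma, or 15 or more characters) and is not necessarily the pattern this post refers to. It also assumes the filter matches anywhere in the value, which is why `re.search` is used.

```python
import re

# Illustrative exclude pattern (an assumption, not necessarily the pattern
# from this post): a word surrounded by whitespace, any value of 15+
# characters, or any value containing a dot or a comma.
EXCLUDE_PATTERN = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")


def would_be_excluded(language: str) -> bool:
    """Return True if the pattern matches anywhere in the value."""
    return EXCLUDE_PATTERN.search(language) is not None


samples = [
    "en-gb", "fr", "en-us", "(not set)",      # values you want to keep
    "Vitaly rules Google", "Vote for Trump",  # spam examples from this post
]
for value in samples:
    verdict = "excluded" if would_be_excluded(value) else "kept"
    print(f"{value!r}: {verdict}")
```

Adding any new spam strings you spot to `samples` is a cheap way to check whether the pattern needs updating before you change the live filter.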
Here’s how the filter should look before you save it:
Click on ‘Save’ and start monitoring your language report over the next 2-4 weeks to check that the filter is working effectively and not excluding traffic it shouldn’t be.
Once you are happy, copy the filter over to your main reporting view and update the regular expression periodically to ensure that the filter remains effective.
Don’t forget to make use of GA’s built-in bot filtering function which will exclude hits from known spiders and spambots.
This can be found under ‘Admin’, then ‘View Settings’, for the view you want to enable it for. Simply tick the checkbox to enable it, as can be seen below:
What about historical data?
Set up the filter, but wondering how you can get a clean historical data set?
That’s easy too; simply follow the steps below to set up a new segment that you can use to strip out this type of spam in your historical data.
Go to the ‘Reporting’ view in GA for your preferred view and select the ‘Add Segment’ option:
Select the ‘New Segment’ button:
Now select the field ‘Conditions’ under the ‘Advanced’ heading on the left hand side of the form.
After you have done this, give your segment the same name you gave to your new language spam filter to make it easy to find and select going forward.
Make sure the drop down options next to ‘Filter’ have ‘Sessions’ and ‘Exclude’ selected.
Change the drop down underneath that defaults to ‘Ad Content’ to ‘Language’ and the drop down that has defaulted to ‘contains’ to ‘matches regex’, then paste in the same filter pattern you used for your new filter (with a version you can copy & paste below it):
Check that the segment is working by clicking on the ‘Preview’ button, then review the results that appear on the right-hand side of the page to make sure the percentage of sessions you’re seeing matches your expectations.
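To know what percentage to expect, you can estimate it from a language report first. The sketch below assumes you have the report as a list of (language, sessions) pairs; the figures are invented for illustration, the function name is hypothetical, and the pattern is an illustrative assumption rather than the one from this post.

```python
import re

# Illustrative spam pattern (an assumption): a word surrounded by whitespace,
# 15+ characters, or a dot or comma anywhere in the value.
SPAM_PATTERN = re.compile(r"\s[^\s]*\s|.{15,}|\.|,")


def excluded_share(rows):
    """Return the percentage of sessions whose language matches the pattern.

    rows is a list of (language, sessions) pairs.
    """
    total = sum(sessions for _, sessions in rows)
    if total == 0:
        return 0.0
    spam = sum(sessions for language, sessions in rows
               if SPAM_PATTERN.search(language))
    return 100.0 * spam / total


# Invented figures, for illustration only.
rows = [
    ("en-gb", 700),
    ("en-us", 200),
    ("Vote for Trump", 60),
    ("Vitaly rules Google", 40),
]
print(f"{excluded_share(rows):.1f}% of sessions would be excluded")
```

If the figure in the segment preview is far from your own estimate, it is worth rechecking the pattern before relying on the segment.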
The segment should look like the example below:
Thanks go to AnalyticsEdge.com for their comprehensive post on this subject, which contributed to the research for this post.
If you enjoyed this post, read Chris’ blog on tackling GA spam.