Using a CASE statement to combine sources with multiple URLs
Those of you that follow me on social media may be aware I was having a bit of trouble with this the other day – though it turned out I was playing all the right notes, just not necessarily in the right order
What is a CASE statement?
Put as simply as I can (in a way I can grasp, at least), a CASE statement readies the program for a set of conditional selections. This selection it makes depends on the value of the expressions that follow it, you provide the program with a set of inputs you wish it to classify by following a set of IF, THEN, ELSE or WHEN conditions. It will ascertain if any of the stated conditions are true and return the appropriate value for that condition or will, if none are true, return the ELSE value.
Why are we using a CASE statement
The nature of CASE statements makes them the simplest way to look for regular expressions and attribute a value. When we are looking at source data in Data Studio, we can often be left with manual work adding up sessions from multiple URLs from the same domain – the CASE statement I’m about to lay out allows us to look for a regular expression in those domains and attribute them all to one source, meaning we no longer have to look for the m. or app generated URL versions in addition to the main URL for the source.
How to use the CASE statement in Data Studio
For this, I’m going to assume you’re already using Data Studio to some extent (if you need help getting started, try this first) and are reporting in some way on referral traffic – whether from social media, partner sites or anything else. If so, you’ll have something that looks a bit like this (or one of many variations):
If you don’t have something like this – you can set it up easily enough using the Google Analytics connector and pairing the ‘Source’ dimension with the metric of your choice (sessions, goal conversions, whichever suits). However, what you’ll also notice is that there are two instances of FACEBOOK as a referrer – and there are other referrers that are only partially represented as they have fallen in to the ‘other’ category.
In order to rectify this, we need to set up the CASE statement and we do this by creating a custom dimension.
Where to find custom dimensions
Click on the ‘Source’ dimension. At the bottom of the pop up you’ll see ‘CREATE FIELD’.
This will bring up the field creation box – which you should name in an easy to remember way in case you need to use the same dimension when measuring another metric. I, imaginatively, called mine ‘SOURCE (w/CASE)’.
Creating your custom dimension
For this we’re going to use WHEN, REGEXP_MATCH, THEN and ELSE:
- WHEN: – states that you’re about to set a condition that the following must meet to return a TRUE response and deliver the result.
- REGEXP_MATCH: – meaning ‘regular expression matches’ indicates that the condition is that a text string must in your data set must contain the text string you’ll provide to deliver the result.
- THEN: – is the outcome that will result if the conditions are met.
- ELSE: – is the outcome that will result if the conditions are not met.
Putting it all together
WHEN REGEXP_MATCH(SOURCE, “.*nkedi.*”) THEN “LinkedIn”
While this is only one of the lines (I’ll give a full social media version in a moment), you can see that it reads almost like a (albeit oddly phrased) set of instructions.
In the following situations
When a text string in (the source data, matches a series of characters found in each LinkedIn URL) then we will return the value “LinkedIn”
If not, we’ll keep the name the same
While the change isn’t massive, adding the multiple instances of each social media platform together does alter the table (bear in mind I also filter this table using a REGEXP_CONTAINS to get rid of search engines):
Here we can see that the true picture is given of FACEBOOK’s referrals (combining facebook, m.facebook and facebook.com), Twitter has moved up as all its various iterations are taken into account and it gives us the information we need at a glance (while also telling me I forgot Yahoo – but who hasn’t amirite).
The full code for this is as follows:
WHEN REGEXP_MATCH(Source, “.*aceb.*”) THEN “FACEBOOK”
WHEN REGEXP_MATCH(Source, “.*nsta.*”) THEN “Instagram”
WHEN REGEXP_MATCH(Source, “.*ube.*”) THEN “YouTube”
WHEN REGEXP_MATCH(Source, “.*witt.*”) THEN “Twitter”
WHEN REGEXP_MATCH(Source, “.*t.co.*”) THEN “Twitter”
WHEN REGEXP_MATCH(Source, “.*kedi.*”) THEN “LinkedIn”
In case you’re wondering – I tend to use the middle characters as they’re the least likely to appear as part of another site’s domain and some of the platforms don’t use the full name in some of their links – and Twitter goes a step further by using t.co (their own link shortener).
All in all, while there’s nothing exactly wrong with finding out how many sessions the mobile FACEBOOK site has sent, there are other places we should be finding this out – if we’re looking solely at performance per domain, this custom dimension really cleans things up.