This post is very old and contains obsolete information.
I like staring at the real-time stats in Google Analytics. As a dashboard it's not as slick as Chartbeat, and it doesn't let you drill down into the data as deeply as Mixpanel does. But GA is super simple to set up, and it's Google, so everyone uses it.
Another obsessive/fun thing to do is to see where that spike in inbound traffic is coming from. On HappyFunCorp there are days where we get a sudden influx of Happy Thoughts which warms our hearts and floods our inboxes. Where did they come from? How do we figure it out?
Let's look at how we can interact with Google Analytics using google-api-ruby-client. By the end of this, we'll be able to see current traffic stats and top referrals, see a timeline of when the referrals first came in, and use that information to track down who is talking about us. GA will show us that we are getting a lot of SOCIAL traffic, but what else can we figure out?
Step 1 Setting it up to access Google on behalf of the user
We’re going to be using OAuth2 to authorize our script. So head over to the Google Developers Console.
Create a Project. You should name this something that makes sense to you. I called mine Social Investigator.
Enable the Analytics API. This can be done in the sidebar, under APIs and auth > APIs. Scroll down to find it.
APIs and auth > Consent Screen. Create something here; you'll need to flesh this out later.
APIs and auth > Create Client ID. Select Installed Application with type Other. This will create the keys for you; then click Download JSON and save the result to a file.
Step 2 Getting an access token and an API file using InstalledAppFlow
Working with the Google API is pretty confusing at first, since there are multiple steps that need to happen before you can even figure out how to make a call. Twitter's API, which in every other way is a joke compared to Google's way of doing things, has a handy way to get a single-use access token. With Google you need to do this yourself. And once that's done, you need to load the API metadata from the discovery API before you can access anything!
We’re going to be using a few gems to put in your
Now we can use the Google::APIClient::InstalledAppFlow class to open up a web browser, have the user log in to their Google account as needed, and grant access to the API. The code below shows the basics of this. We assume that the file you downloaded in step 1 is called client_secrets.json and sits in the same directory, and we write the granted credentials out to a separate credentials file.
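A minimal sketch of that flow, assuming the legacy Google::APIClient classes from the google-api-client gem of this era; the analytics-oauth2.json credentials filename is our own choice:

```ruby
require 'google/api_client'
require 'google/api_client/client_secrets'
require 'google/api_client/auth/installed_app'
require 'google/api_client/auth/file_storage'

# The credentials filename here is our own choice.
CREDENTIAL_STORE_FILE = "analytics-oauth2.json"

client = Google::APIClient.new(
  application_name: "Social Investigator",
  application_version: "1.0"
)

# Reuse stored credentials if we have them, otherwise run the browser flow.
file_storage = Google::APIClient::FileStorage.new(CREDENTIAL_STORE_FILE)
if file_storage.authorization.nil?
  client_secrets = Google::APIClient::ClientSecrets.load  # reads client_secrets.json
  flow = Google::APIClient::InstalledAppFlow.new(
    client_id: client_secrets.client_id,
    client_secret: client_secrets.client_secret,
    scope: ['https://www.googleapis.com/auth/analytics.readonly']
  )
  client.authorization = flow.authorize(file_storage)
else
  client.authorization = file_storage.authorization
end
```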
When we run this the first time, you should be prompted to grant access to your application. The second time, it should run and exit cleanly – it has access, but we haven't asked it to do anything yet.
Step 3 Discover the API
Google takes its software development seriously, and it shows. Not only are there many different APIs available to use, but they all have different versions. The endpoints are all different, and rather than hard-coding them all into the client access library, you use the discovery API to pull in the metadata associated with each one. The following code will load up this data and cache it to the filesystem so that the next access is faster.
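Something like this, assuming the client object from step 2; the cache filename is our own choice:

```ruby
# Load the Analytics v3 API definition, caching it locally so later runs
# skip the network round trip. Assumes `client` from step 2.
API_CACHE_FILE = "analytics-v3.cache"

analytics = nil
if File.exist?(API_CACHE_FILE)
  File.open(API_CACHE_FILE) do |file|
    analytics = Marshal.load(file)
  end
else
  analytics = client.discovered_api('analytics', 'v3')
  File.open(API_CACHE_FILE, 'w') do |file|
    Marshal.dump(analytics, file)
  end
end
```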
Step 4 Finding a web profile
Next we need to find the web profile for our site; the web property it belongs to is identified by a tracking ID that looks like UA-56296045-1. We'll also show the websiteUrl associated with the account since that's what people really know.
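A sketch of listing everything we can see, assuming the client and analytics objects from the previous steps:

```ruby
# List every profile visible to this account, with its numeric profile id,
# the UA- tracking id, and the site url.
result = client.execute(
  api_method: analytics.management.profiles.list,
  parameters: { 'accountId' => '~all', 'webPropertyId' => '~all' }
)

result.data.items.each do |profile|
  puts [profile.id, profile.webPropertyId, profile.websiteUrl].join("\t")
end
```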
Step 5 Querying with ga.get
The main endpoint we are looking at is ga.get. There's an interactive developer tool, the Query Explorer, that will let you experiment with what is available and how it works. We've now written code that will let us find the profile ID for our query, so we are ready to start querying.
The Query Explorer is really useful because of the dropdowns around dimensions and metrics. When you hover over any of the fields, documentation comes up explaining what each field means.
Dimensions are different ways of slicing up the data. These include things like page title, referrer, AdWords, and other ways of segmenting the people that came to your site. Metrics are the actual data for these buckets of users, and include things like sessions, user counts, page views, and average session duration.
In order to make the query, we first need to set up the query parameters. We've split that off into its own method called query_template. The required fields are profile ID, start date, and end date. We're going to set some defaults here which we will override in other methods when we use it.
Our default here is that we're reporting on users (which includes returning users).
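One possible shape for query_template, matching the defaults described above (the argument names are our own):

```ruby
require 'date'

# Defaults for a GA query: required fields (ids, start-date, end-date) plus
# a default metric; callers merge in overrides for their specific query.
def query_template(profile_id, overrides = {})
  {
    'ids'        => "ga:#{profile_id}",
    'start-date' => Date.today.to_s,
    'end-date'   => Date.today.to_s,
    'metrics'    => 'ga:users'
  }.merge(overrides)
end
```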
Let’s do an actual query with the parameters:
Now we need a way to print the result. The result that we get back has two different parts: columnHeaders, which is a reflection of the query that we passed in, and the data itself, which is an array of arrays in rows. We're using a Hirb helper method here to format the result.
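A sketch of that printing helper (the method name print_results is our own):

```ruby
require 'hirb'

# Render the GA result as a table: column names come from columnHeaders
# (with the ga: prefix stripped), data comes from rows.
def print_results(result)
  headers = result.data.columnHeaders.map { |h| h.name.sub(/^ga:/, '') }
  puts Hirb::Helpers::AutoTable.render(result.data.rows, headers: headers)
end
```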
Let’s give it a try:
Make sure you’ve made note of your profile id above, and we can see what it looks like now:
Step 6 Adding more commands
Let's create a Thor class here for the things that we want to query, and then go through and implement the calls one at a time.
We want to be able to specify the timeframe for the results. It defaults to the current date, but let's add some more options: today, yesterday, recently (last 7 days), and month (last 30 days, which isn't really a month, but close enough).
We also want to have different output options, so we’ll add a table switch, like the
Hirb output above, and csv to make it easier to plug this into other tools.
We’re going to create 4 different ways to query the data.
- What content is getting traffic
- Who is linking to your site
- Who is linking to specific pages on your site
- A timeline of when content was published and people started linking to it
Commands:
  ga.rb profiles                     # List Account Profiles
  ga.rb hotcontent PROFILE_ID        # Show what content is getting traffic
  ga.rb referers PROFILE_ID          # Show who is linking to your site
  ga.rb content_referers PROFILE_ID  # Show who is linking to specific pages
  ga.rb timeline PROFILE_ID          # Show a timeline of referers
Here’s the CLI code:
Step 7 Let's implement the commands
We already have the profiles command and the hotcontent command (what content is getting traffic) working. Let's add some code to make the --csv option work; this goes into the shared output code.
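One way the CSV path might look (the helper name is our own):

```ruby
require 'csv'

# Emit headers and rows as CSV lines instead of a formatted table,
# so the output can be piped into other tools.
def print_csv(headers, rows)
  puts headers.to_csv
  rows.each { |row| puts row.to_csv }
end
```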
Who is linking to your site?
We can find out by looking at ga:source, which is basically the domain; ga:referralPath, which is the path part of the URL if it's a link referral; and ga:medium, which tells you whether the visit came from a direct URL, a social media link, an email link, or ad traffic.
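A self-contained sketch of the referrer query parameters (the dimension and metric names are the GA v3 API's; the helper name is our own):

```ruby
# Build the parameter hash for the referrers query: who sent us traffic,
# from what path, and through what medium, sorted by user count descending.
def referers_params(profile_id, start_date, end_date)
  {
    'ids'        => "ga:#{profile_id}",
    'start-date' => start_date,
    'end-date'   => end_date,
    'metrics'    => 'ga:users',
    'dimensions' => 'ga:source,ga:referralPath,ga:medium',
    'sort'       => '-ga:users'
  }
end
```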
Who is linking to specific pages on your site can be determined by adding the ga:landingPagePath dimension to the above query. This breaks down the source of traffic not for the site as a whole, but for each specific landing page. We also change the sort query parameter to take the additional dimension into account.
This works, but we can change the way it's printed out to be more visually useful. In the HammerOfTheGods we can flesh it out a bit, printing your local path only once and listing the referrals indented beneath it, so you can scan what's going on grouped by page.
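A sketch of that grouped printing, assuming result rows shaped like [landingPagePath, source, referralPath, users]:

```ruby
# Group result rows by landing page, print each page once, and indent the
# referrers beneath it so the output can be scanned page by page.
def print_grouped(rows)
  rows.group_by { |row| row.first }.each do |page, referers|
    puts page
    referers.each do |_, source, path, users|
      puts "    #{source}#{path} (#{users})"
    end
  end
end
```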
Step 8 The timeline
A timeline of when content was published and when people started linking to it can be created by combining two of the methods that we've already written, hotcontent and referers, looping through and querying them one day at a time.
We start 30 days ago and get a list of content for each day. If we haven't seen a page before, we say that it was posted that day. We then get a list of referrals for that day; if we haven't seen a referrer before, we print it out. I'm also suppressing links that have sent fewer than 2 visitors, since they tend to be very noisy.
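The loop logic can be sketched in isolation like this, with the day-scoped content and referrer queries injected as lambdas so the first-seen bookkeeping stands on its own (the names here are our own):

```ruby
require 'date'

# Walk the last N days oldest-first, recording the first day each page and
# each referrer was seen. content_for returns an array of paths for a day;
# referers_for returns a hash of source => user count for a day.
def timeline(content_for, referers_for, days: 30)
  first_posted = {}
  first_linked = {}
  (days - 1).downto(0) do |offset|
    day = (Date.today - offset).to_s
    content_for.call(day).each do |path|
      first_posted[path] ||= day
    end
    referers_for.call(day).each do |source, users|
      next if users < 2            # suppress noisy one-off links
      first_linked[source] ||= day
    end
  end
  [first_posted, first_linked]
end
```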
The full code is available to play with. The mechanism for talking to Google APIs from a script works everywhere, but if you are going to do this on your server you'll want to get the OAuth2 key using a different process than the interactive InstalledAppFlow we used here.