Over the last few years I’ve been lucky enough to attend numerous SearchLove (and LinkLove) conferences… and this time I was lucky enough to be invited to speak. As my deck was fairly technical I thought it would be worth going into more detail about some of the processes as there simply isn’t enough time to cover everything ‘step by step’ in a 40 minute presentation. So, here we go.
What You’ll Need
This post talks about APIs and you could build the tools yourself… but why bother when you can download it all for free?
1. A copy of the code I put together for the talk (download)
2. Some server space – LiquidWeb and Hetzner are both good options.
3. A Buzzstream subscription (optional but recommended).
The Bigger Picture
I’m very passionate about the more technical side of SEO. Yes, “content is king” and “it’s about building the right relationships” but we’re still part of an industry that, inherently, is about making sure that we’re doing what a set of algorithms deem to be the most appropriate thing to get a site to appear higher than another. Yes, this might be about building links. It might be about writing something outstanding. It might be about creating a video that goes viral. Whatever that “might” is, a processor is still involved so understanding the geekier side of things is not only fundamental to being good at our jobs, it also makes our lives a hell of a lot easier.
Let’s see how.
Crafting Unique, Data Driven, Stories
My original title for this part of the deck was ‘Data Analysis’, but that wasn’t really accurate so I changed it. What I’m really talking about is crafting data driven stories. For that we need data, and the easiest way to get something unique is to run a survey.
Since they launched last year I’ve become a massive fan of Google Consumer Surveys. The service is incredibly simple to use and, when done properly, extremely cost effective. If you don’t need to do a cross-comparison of each answer (e.g. see when someone answers ‘A’ to question 1 whether or not they answer ‘C’ to question 3) then you’re looking at around $0.10 per answer. If, however, you do need to do that comparison then the costs very quickly jump to $1.10+ per answer.
Survey creation is simple but be careful – I recently made a stupid typo in a 6-question survey that was going out to 500 people. Very easy to do, but also extremely costly as it completely invalidated the data and meant that we had to start again. Timescales given for the answers to be collected are usually “within 7 days” but I’ve never had it take more than 3.
It’s not about the data, it’s about the story.
I often feel like a broken record when talking about this kind of stuff, but you can have all the data in the world and without a story nobody cares. When you’re putting together your questions and analysing the data you have to be looking for something that’s interesting, emotive, downright shocking, or hilarious. Without that story your outreach becomes incredibly difficult and the amount of coverage you’re going to get drops off a cliff.
There are a couple of examples of infographics that we’ve put together based on survey data in the deck, but the best recent instance is something that Tecmark put together on smartphone usage. You can see the blog post here and the first thing you’ll probably notice is that there isn’t anything on the page that has been put together by a designer. There’s a header image. There’s some copy. There’s a link to a spreadsheet that contains the survey data…
This was cited by the likes of the BBC.
If your data is strong and your outreach is great, then that’s really all that matters.
The only additional thing to note here is that the data came from YouPoll, which is a certified polling agency and so holds far more weight than the likes of GCS.
Let’s Do Some Scraping
Moving up slightly from data analysis, we get into scraping. The example I gave here was using PHP to grab results from Google, based on a set of keywords, to identify who’s ranking most often and therefore who our real competitors are.
Let’s take a few keywords as an example:
- best online share trading
- online share trading
- online share trading uk
- share trading
- share trading account
- share trading online
- shares trading
- trade shares
- online trading
- shares online uk
Now, we could easily go off and do individual searches for each of these, paste the results into a spreadsheet, and then do some Excel wizardry to pick out appearance counts. But that takes time. If you have 1,000 keywords it takes a LOT of time.
Instead, we use a few lines of code to run through each keyword, grab the results from Google (in a way that doesn’t get our IP banned), process the data, and then output it to a pre-built spreadsheet. Coding something like this takes a bit of time, but if you compare that with how many hours of your life you’d spend doing this over and over again it’s worth it.
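The loop above can be sketched in a few lines. This is Python rather than the PHP from the talk, and the request details, result markup, and pause length are all assumptions on my part (scraping Google directly is also against its terms of service), so treat it as illustrative:

```python
import re
import time
import urllib.request
from collections import Counter
from urllib.parse import quote_plus, urlparse

def fetch_serp(keyword, pause=10):
    """Fetch one Google results page for a keyword.
    The URL params, headers, and pause are guesses at a 'polite' setup --
    in practice you'd throttle harder, rotate IPs, or use a SERP API."""
    url = "https://www.google.co.uk/search?q=" + quote_plus(keyword)
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    time.sleep(pause)  # space requests out so you don't get your IP banned
    return urllib.request.urlopen(req).read().decode("utf-8", "ignore")

def extract_result_urls(html):
    # Crude assumption about the result markup; a real tool would use a proper parser.
    return re.findall(r'<a href="(https?://[^"]+)"', html)

def count_domains(url_lists):
    """Tally how often each domain appears across all keywords' results --
    the domains that keep showing up are your real competitors."""
    counts = Counter()
    for urls in url_lists:
        for u in urls:
            counts[urlparse(u).netloc] += 1
    return counts

# Made-up results for two keywords, just to show the counting step:
sample = [
    ["https://www.example-broker.co.uk/trading", "https://news.site.com/guide"],
    ["https://www.example-broker.co.uk/accounts", "https://other.co.uk/"],
]
print(count_domains(sample).most_common(1))  # → [('www.example-broker.co.uk', 2)]
```

From there, writing `count_domains(...)` out to a CSV gives you the pre-built spreadsheet the post describes.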
The other good thing about building these mini-tools is that you end up being able to reuse the code. Changing it from grabbing a page of 10 results to one that contains 100 is a 5 second job, and from there you’re only 20 minutes away from having a fully functioning rank tracker. No, it’s not going to work on an enterprise scale but it’s good enough to track 500 or so phrases each day which, for most people, is perfectly adequate.
Time For Some REAL Geek Time
So we’ve eased ourselves in gradually. We’ve done some data analysis and put some code together that will scrape Google. What comes next is where things start to get really powerful.
APIs are a dream come true for any lazy SEO. Here’s an oversimplified description:
Essentially we have data on a system somewhere, code that sits in the middle, and then some useful / cool stuff on the other side. It could be link analysis, grabbing social metrics, pushing data into a CRM, or analysing the semantic structure of a page. Non-techies also tend to get a bit scared when you talk about APIs, but the reality is that they’re extremely simple to work with.
Try going to this URL in your browser:
What you’ve done there is run an API call and grabbed all of the social metrics for Distilled’s homepage. Powerful? Not really. Again though, when you start doing these kinds of things en masse it gets really fun.
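To show just how unscary that is: an API call is a URL, and the response is usually just JSON. The endpoint below is hypothetical (the actual URL from the talk isn’t shown here), and the payload is made up, but the shape is typical of a social-metrics service:

```python
import json
import urllib.parse
import urllib.request

def social_metrics(page_url):
    """Call a (hypothetical) social-metrics API for a page.
    Swap the base URL for whichever provider you actually use."""
    api = "https://api.example-metrics.com/?url=" + urllib.parse.quote_plus(page_url)
    with urllib.request.urlopen(api) as resp:
        return json.loads(resp.read().decode("utf-8"))

# A made-up response, parsed exactly as a real one would be:
raw = '{"Twitter": 1284, "Facebook": {"like_count": 652}, "LinkedIn": 97}'
data = json.loads(raw)
print(data["Twitter"])                  # → 1284
print(data["Facebook"]["like_count"])   # → 652
```

That’s the whole trick: build a URL, read the response, pull out the fields you care about.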
Here’s an example:
1. We take a list of the SearchLove speakers and return data on them, and their tweets, from the Twitter API.
2. We can very quickly see who’s the most active, who has the biggest following, and which sites are mentioned in their bios. Try doing this for all of your own followers, or the people that follow your competitors, to pick out influencers. You’ll also notice Social Authority in there, which is a metric that we grab from the Followerwonk API.
3. Finally, we don’t just look at high level metrics. Instead we look at their last 200 tweets, grab the links in those tweets, and work out the sites that they’re engaging with the most. For outreach this is pure gold as you can very quickly work out the sites that you want to be getting featured on to give you the best possible chance of being noticed by your chosen influencers.
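The domain-counting in step 3 is a few lines once you have the tweets. Fetching the last 200 tweets (the Twitter API’s `statuses/user_timeline` with `count=200`) is left out here; the dicts below just mimic the shape of its `entities.urls` field, and the data is invented for illustration:

```python
from collections import Counter
from urllib.parse import urlparse

def top_engaged_sites(tweets, n=5):
    """Given Twitter-API-style tweet dicts, count the domains a user
    links to most -- the sites they're engaging with."""
    counts = Counter()
    for tweet in tweets:
        for link in tweet.get("entities", {}).get("urls", []):
            counts[urlparse(link["expanded_url"]).netloc] += 1
    return counts.most_common(n)

# Illustrative data only -- not real tweets.
tweets = [
    {"entities": {"urls": [{"expanded_url": "https://moz.com/blog/post-1"}]}},
    {"entities": {"urls": [{"expanded_url": "https://moz.com/blog/post-2"}]}},
    {"entities": {"urls": [{"expanded_url": "https://www.bbc.co.uk/news/tech"}]}},
]
print(top_engaged_sites(tweets))
# → [('moz.com', 2), ('www.bbc.co.uk', 1)]
```

Run that over a speaker’s timeline and the top of the list is exactly the outreach gold the post describes.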
What About Keywords
I love SEMRush. It’s a tool that I use pretty much every day but there are certain things that you just can’t do using their web interface. Fortunately for us they have an API which unleashes a whole world of possibilities.
Let’s steal our competitor’s keywords.
1. Take the list of competitors that we scraped from Google a little earlier and paste them into our mini-tool.
2. In the background this goes off to SEMRush and does an API call along these kinds of lines:
What we’re doing here is grabbing 500 keywords from our chosen domain in the UK database, and pulling out their organic rank, the estimated CPC if we were bidding on it, the competitor landing page, and the monthly search volume.
3. By doing this for multiple sites we can look for trends and use them to help shape our own keyword research. Unfortunately our first export isn’t quite set up in a way that’s massively useful. What if a competitor is ranking for a really popular keyword that’s completely irrelevant? What if we’ve accidentally added a site in there that’s too general? We’ve got the potential to have a load of useless data.
4. Here’s the solution. When grabbing data from the SEMRush API we also keep a count of how many competitors are either ranking for or bidding on each keyword that’s found. That way we can simply order the sheet by how popular a phrase is and use that as a basis for our keyword strategy.
After all, if multiple competitors are paying for a phrase, chances are it’s important to them, so it’s likely to be something we should look at in more detail.
There are still going to be phrases that are useless but this, again, saves us time. We’re not doing multiple exports from SEMRush, and we’re not spending our lives hoping that Keyword Planner gives us useful data.
5. Taking things a step further. At this point we’re in a position where we have a list of our competitors’ keywords and landing pages, but opening all of those individually is going to take up valuable time. Fortunately we can use a tool to help.
Take the list of landing pages from our spreadsheet, drop them into URLProfiler, select the screenshot option, and let it do its thing. Simple as that.
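Steps 2 and 4 above can be sketched like this (Python rather than the talk’s PHP). The SEMrush column codes and response shape are from memory, so verify them against the current API docs before relying on them, and the sample report rows and competitor exports are made up:

```python
from collections import Counter
from urllib.parse import urlencode

API_KEY = "YOUR_KEY"  # placeholder -- use your own SEMrush key

def build_semrush_call(domain):
    """Step 2: build a domain_organic report URL. Column codes here
    (Ph=phrase, Po=position, Cp=CPC, Ur=landing page, Nq=monthly
    searches) are assumptions -- check the API documentation."""
    params = {
        "type": "domain_organic",
        "key": API_KEY,
        "domain": domain,
        "database": "uk",         # UK index
        "display_limit": 500,     # grab 500 keywords per domain
        "export_columns": "Ph,Po,Cp,Ur,Nq",
    }
    return "https://api.semrush.com/?" + urlencode(params)

def parse_report(body):
    """SEMrush reports come back as semicolon-separated text."""
    lines = body.strip().splitlines()
    headers = lines[0].split(";")
    return [dict(zip(headers, line.split(";"))) for line in lines[1:]]

def keyword_popularity(keywords_by_competitor):
    """Step 4: count how many competitors each phrase appears for,
    then order most-shared first."""
    counts = Counter()
    for phrases in keywords_by_competitor.values():
        counts.update(set(phrases))  # one vote per competitor per phrase
    return counts.most_common()

# A made-up one-row report and three made-up exports, to show the shapes:
sample = "Keyword;Position;CPC;Url;Search Volume\nshare trading;3;2.45;https://competitor.co.uk/trading;8100"
rows = parse_report(sample)

exports = {
    "broker-a.co.uk": ["share trading", "online share trading", "isa rates"],
    "broker-b.co.uk": ["share trading", "online share trading"],
    "broker-c.co.uk": ["share trading", "cfd trading"],
}
print(rows[0]["Keyword"], rows[0]["Search Volume"])  # → share trading 8100
print(keyword_popularity(exports)[0])                # → ('share trading', 3)
```

Ordering the sheet by that popularity count is what turns three raw exports into a keyword strategy.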
That’s All Great, But Where Are The Links?
There are three main providers of link profiles. Majestic. Ahrefs. Moz. My personal favourite of those is Majestic because they get the right mix of quantity, quality, and accurate metrics (for any kind of link analysis TrustFlow and CitationFlow are by far the best to use).
The only small problem is that getting hooked up to the Majestic API is far more difficult than it is for the other two but once you’re done it’s well worth it. Here’s a quick example of what you can do:
1. We add our list of competitors into our profiling tool.
2. In the background we’re going off to the Majestic API and grabbing up to 10,000 linking URLs for each domain.
3. Once that’s done we’re left with a nice big spreadsheet of data. Again, we could do this manually by downloading CSVs and combining them before doing our analysis but why bother when you can get code happy? A 2 hour job suddenly takes 10 minutes and you can do work that’s far more interesting.
4. We do, however, have a problem. Some sites only link to one of our competitors, while others are low quality. If we filter out sites that are below a certain Citation / TrustFlow / Domain Authority / Page Authority threshold, we’re getting there.
5. Finally, let’s take our updated list and drop it into URLProfiler to get things like site type (blog, directory, etc.), Social Metrics, and whether it’s indexed in Google. We can then do a final filter based on those results and upload them into Buzzstream ready for outreach / link building.
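The step-4 filter is a one-liner once the combined sheet is in memory. The field names and thresholds below are illustrative rather than Majestic’s actual API fields, and the rows are invented:

```python
def filter_prospects(links, min_trust_flow=15, min_shared=2):
    """Keep linking domains above a quality floor that also link to
    at least two of our competitors. 'trust_flow' and
    'competitors_linked' are illustrative column names, not
    Majestic's real API fields."""
    return [
        link for link in links
        if link["trust_flow"] >= min_trust_flow
        and link["competitors_linked"] >= min_shared
    ]

# Made-up rows standing in for the combined Majestic spreadsheet:
links = [
    {"domain": "quality-blog.co.uk", "trust_flow": 34, "competitors_linked": 3},
    {"domain": "spammy-dir.info", "trust_flow": 4, "competitors_linked": 3},
    {"domain": "one-off.com", "trust_flow": 40, "competitors_linked": 1},
]
print([link["domain"] for link in filter_prospects(links)])
# → ['quality-blog.co.uk']
```

What survives the filter is the list that goes off to URLProfiler and then into Buzzstream.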
As I said earlier, this isn’t meant to be a complete overview of everything you can do with APIs, but it’s a list of techniques that I’ve found useful over the last year or so and have saved countless hours.
Questions / ideas / developed something that you think is cool? Let me know in the comments and I’ll do what I can to help (which includes tweeting awesome stuff that you’ve made).