Predictive SEO: How HubSpot Saves Traffic We Haven’t Lost Yet

This post is part of Made @ HubSpot, an internal thought leadership series in which we draw lessons from experiments conducted by our own HubSpotters. Have you ever tried to carry an armful of clean laundry upstairs, only to have socks and shirts keep slipping off the pile? Growing organic website traffic feels a lot like that. Your content calendar is loaded with new ideas, but with every new page you publish, a previous page drops in the search engine rankings.

Getting SEO traffic is tough, but keeping it is another ball game. Content tends to "decay" over time, whether because of new content from competitors, ever-changing search engine algorithms, or a myriad of other reasons. You're straining to move the whole site forward while traffic keeps leaking away wherever you're not looking. Recently, the two of us (Alex Birkett and Braden Becker 👋) developed a way to find this traffic loss automatically, at scale, and before it happens.

The problem with traffic growth

At HubSpot, we grow our organic traffic by taking two trips up the stairs instead of one. The first trip is fresh content, targeting new keywords we don't yet rank for. The second trip is updated content: we dedicate a portion of our editorial calendar to finding which pages are losing the most traffic (and leads) and reinforcing them with new content and SEO-minded tweaks that better serve their target keywords. It's a concept we (and many marketers) have come to call "historical optimization." But there's a problem with this growth strategy. As our website traffic grows, monitoring every page becomes unwieldy, and selecting the right pages to update is even harder. Last year, we wondered whether there was a way to find blog posts whose organic traffic was merely "at risk" of decreasing, to diversify our update options and perhaps make traffic more stable as our blog grows.

Restoring traffic vs. protecting traffic

Before we get into the absurdity of trying to restore traffic we haven't lost yet, let's look at the benefits. When you review a page's performance, a decline in traffic is easy to spot. For most growth-minded marketers, a downward-pointing traffic trend line is hard to ignore, and there's nothing quite as satisfying as watching that trend rebound. But every traffic recovery comes at a cost: because you can't know where you're losing traffic until you've lost it, the time between the decline and the recovery is a sacrifice of leads, demos, free users, subscribers, or whatever similar growth metric your most interested visitors drive. You can see that visualized in the organic traffic trend chart below for an individual blog post. Even once the traffic is restored, you've missed opportunities to support your downstream sales efforts. If you had a way to find and protect (or even grow) a page's traffic before it needed to be restored, you wouldn't have to make the sacrifice shown in the image above. The question is: how do we do it?

How to predict falling traffic

To our delight, we didn't need a crystal ball to predict traffic decay. What we did need was SEO data suggesting which blog posts were likely to lose traffic if current trends continued. (We also needed to write a script that could extract this data for the entire website. More on that in a minute.)

High keyword rankings are what generate organic traffic for a website, and most of that traffic goes to the sites lucky enough to rank on the first page. The reward is even bigger for keywords that receive a particularly high number of searches per month. If a blog post were to slip off the first page of Google for a high-volume keyword, it would be toast. Given the relationship between keywords, keyword search volume, ranking position, and organic traffic, we knew this was where we'd see the prelude to a traffic loss. Thankfully, the SEO tools at our disposal can show us that ranking slippage over time.

The image above shows a table of keywords that a single blog post ranks for. For one of those keywords, this blog post ranks at position 14 (page 1 of Google consists of positions 1-10). The red boxes show that ranking position, as well as the keyword's sizable volume of 40,000 monthly searches. Even sadder than the 14th-position ranking is how the post got there: as the teal trend line above shows, this blog post was once a high-ranking result, but it fell steadily over the following weeks. The post's traffic corroborated what we saw, with a noticeable drop in organic page views shortly after the post disappeared from page 1 for this keyword.

You can see where this is going: we wanted to detect these ranking drops just as posts were about to leave page 1, and in doing so, shore up the traffic we were "at risk" of losing. And we wanted to do it automatically, for dozens of blog posts at once.

The “at risk” traffic tool

The way the At Risk tool works is actually quite simple. We think about it in three parts: Where do we get our input data from? How do we clean it? And what outputs from that data let us make better decisions when optimizing content? First, where do we get the data?

1. SEMRush keyword data

What we wanted was keyword research data at the property level: all the keywords hubspot.com ranks for, particularly blog.hubspot.com, and all the associated data for those keywords. The fields most valuable to us are our current search engine ranking, our previous ranking, the keyword's monthly search volume, and potentially the keyword's value (estimated via keyword difficulty or CPC). To obtain this data, we use the SEMrush API (specifically, its "Domain Organic Search Keywords" report) from R, a programming language popular with statisticians, analysts, and, increasingly, marketers (we use the 'httr' library to work with the API). We then select the top 10,000 keywords that drive traffic to blog.hubspot.com (as well as our Spanish, German, French, and Portuguese properties). We currently do this once a quarter. This is a large amount of raw data, which by itself is useless, so we have to clean it and wrangle it into a format that's useful to us. Next: how do we clean up the data and build formulas that tell us which content to update?
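As an illustration, here's a minimal sketch of what that pull might look like in R with 'httr'. The endpoint and report type come from SEMrush's public API, but the exact parameters, export columns, and the environment-variable name for the API key are assumptions for this example, not our production script:

```r
library(httr)

# Pull the "Domain Organic Search Keywords" report from the SEMrush API.
# Export columns (assumed for this sketch): Ph = keyword phrase,
# Po = current position, Pp = previous position, Nq = monthly search
# volume, Cp = cost per click.
response <- GET(
  "https://api.semrush.com/",
  query = list(
    type           = "domain_organic",
    key            = Sys.getenv("SEMRUSH_API_KEY"),  # hypothetical env var
    domain         = "blog.hubspot.com",
    database       = "us",
    display_limit  = 10000,
    export_columns = "Ph,Po,Pp,Nq,Cp"
  )
)

# The API returns semicolon-delimited text rather than JSON.
raw_text <- content(response, as = "text", encoding = "UTF-8")
```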

2. Clean the data and build the formulas

We do most of the data cleaning in our R script as well, so by the time our data reaches another storage destination (be it a spreadsheet or a database table), it's mostly cleaned and formatted the way we want. This takes only a few lines of code: after extracting 10,000 rows of keyword data, we parse the API response into a readable format and build it into a data table. We then subtract the current ranking from the previous ranking to get the ranking difference (so if we used to rank at 4 and now rank at 9, the difference is -5). We filter further so that only keywords with a negative ranking difference remain, i.e. only the keywords where we lost ground, not the ones we gained or held. We then send this clean, filtered data table to Google Sheets, where we apply a stack of custom formulas and conditional formatting. Lastly, we needed to know: what are the outputs, and how do we make decisions when optimizing content?
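A hedged sketch of that cleaning step, continuing from the API call above. The column header names in the parsed response and the 'googlesheets4' destination (including the "at-risk-traffic" spreadsheet name) are illustrative assumptions:

```r
library(dplyr)
library(readr)
library(googlesheets4)

# Parse the semicolon-delimited API response into a data frame.
keywords <- read_delim(I(raw_text), delim = ";", show_col_types = FALSE) |>
  rename(
    keyword           = Keyword,
    current_position  = Position,
    previous_position = `Previous Position`,
    search_volume     = `Search Volume`,
    cpc               = CPC
  ) |>
  # Previous minus current: ranking at 4 then slipping to 9 yields -5.
  mutate(ranking_diff = previous_position - current_position) |>
  # Keep only the keywords that lost ground.
  filter(ranking_diff < 0)

# Send the cleaned table to Google Sheets for formulas and formatting.
# "at-risk-traffic" is a hypothetical spreadsheet name.
ss <- gs4_create("at-risk-traffic")
sheet_write(keywords, ss = ss, sheet = "keywords")
```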

3. Results of the content at risk tool: how we make decisions

Given the input columns (keyword, current position, previous position, position difference, and monthly search volume) and the formulas above, we calculate a categorical variable as an output. A URL/row can be one of the following: "AT RISK," "VOLATILE," or blank (no value).

Blank outputs mean we can basically ignore those URLs for now: they haven't lost a significant amount of ranking, or they were already on page 2 of Google.

"VOLATILE" means the page is slipping in rank, but the post isn't old enough to warrant action yet. New web pages jump around the rankings all the time as they age; at some point, broadly speaking, they build enough topical authority to settle in. Still, for content supporting a product launch or a critical marketing campaign, these posts are worth bookmarking so we can give them some TLC while they mature.

"AT RISK" is primarily what we're looking for: blog posts published more than six months ago that have slipped and now rank 8-10 for a high-volume keyword. We see this as the content "red zone," where a post is fewer than three positions away from falling from page 1 to page 2 of Google.

The worksheet formula behind these three labels is essentially a compound IF statement that checks for a page-1 ranking, a negative ranking difference, and the distance between the publish date and the current day.
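In spirit, that compound IF looks something like the following R sketch (our actual version lives in a Sheets formula; the exact thresholds, the six-month cutoff, and the assumed 'publish_date' column are illustrative):

```r
library(dplyr)

SIX_MONTHS_DAYS <- 182  # assumed cutoff for a "mature" post

keywords <- keywords |>
  mutate(
    # Assumes each row carries the post's publish date.
    post_age_days = as.numeric(Sys.Date() - publish_date),
    status = case_when(
      # Bottom of page 1, lost ground, mature post: the red zone.
      current_position >= 8 & current_position <= 10 &
        ranking_diff < 0 & post_age_days > SIX_MONTHS_DAYS ~ "AT RISK",
      # Lost ground, but the post is still young and settling.
      ranking_diff < 0 & post_age_days <= SIX_MONTHS_DAYS ~ "VOLATILE",
      # Everything else: no significant loss, or already off page 1.
      TRUE ~ ""
    )
  )
```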

What we learned

In short, it works! The tool described above has become a regular, if not frequent, part of our workflow. That said, not every predictive update saves the traffic in time. In the example below, a blog post slipped off page 1 after we made an update, then returned at a higher position than before. And that's fine: we have no control over when and how often Google decides to re-crawl and re-rank a page. (You can, of course, resubmit the URL to Google and request a recrawl; for critical or time-sensitive content, that extra step may be worth it.) The goal is to minimize how long this content underperforms and stop the bleeding, even if it means leaving the speed of recovery to chance. And although you'll never know exactly how many page views, leads, or subscribers each page stood to lose, the precautions you take now will save you the time you'd otherwise spend figuring out why your total website traffic sank last week.


