Quit Screwing Around With Keyword Density
tf*IDF is the New Metric
Written: February 13, 2022, Jesse Eddleman (potentialeight)
Keyword Density is Old News
Some people have an old style of content planning that centers around sending off a list of topics to a content mill with things like “use each keyword with a density of 1% in 600 words.” For those people, I have good news, and I have bad news.
The bad news is that this approach is incredibly outdated, while still much beloved, much like the old Google logo or using a Commodore 64 to conduct payroll.
The good news is that they probably already know something’s wrong since they keep getting walloped each time Google updates their algorithm, so they’ll be receptive to learning something new.
That something new is tf*IDF. If it sounds like some complicated crap some math nerd came up with, you’re exactly right. I mean, who do you think works at Google figuring out all of this stuff they can do to make your life hell every time a new algorithm update comes out? It’s surely not the basket weaving majors.
- Chapter 1: What is tf*IDF? – Remember the math nerds who engineer the algorithm.
- Chapter 2: Pride and Prejudice – How the algorithm can figure out if you’re faking the funk.
- Chapter 3: How to Use tf*IDF for Yourself – The mental shift required to make the algorithm updates work in your favor.
Chapter 1: What is tf*IDF?
And why does every new acronym have to look like the name of Elon Musk's next kid?
In the world of sorting information, tf*IDF stands for “term frequency–inverse document frequency.” Making a really long story as short as I possibly can, it means Google can figure out topical relevance and intent (which are major ranking factors now) a lot better than people give them credit for.
One of the issues right now across the SEO world is that a lot of the old models are getting smashed, but people are hesitant to adjust for business reasons. For a long time, you could give good instructions to cheap writers, who are almost never subject matter experts, and still get pretty reasonable results. The problem is that Google has used tf*IDF and other tools to really put a hurting on people who use that approach.
A lot of people in the online gambling SEO world (and many other industries) have been getting really smashed by updates over the past few years. Panda is the one that a lot of people remember for wiping out some serious players. However, each subsequent update has been clamping down on all of this more and more each time around.
This Leads to a Fundamental Problem...
The old approach to handling content within the online casino SEO realm has been in place for a long time. A lot of people have their stuff set up in a way that makes sense based on the old models, despite the fact that those models haven’t been working out so well over the past few years. We’ve all seen some of the biggest names, with tens of thousands of pages of content and an ungodly number of links, get picked off from top rankings for key terms by sites that have only been around for a few years.
And it’s only going to get worse. Or better, if you’re a newer force on the scene.
So how do you balance that with not wanting your content costs to skyrocket? How you handle the specifics is up to you, but you need to know the lay of the land and how tf*IDF works within the context of how Google is evaluating your content before you form a plan.
You’re going to need some life insurance for your sites if you don’t get with the program.
Good News: tf*IDF Isn't Actually That Hard to Understand
I’m going to break this down in a way that’s a bit simplified but that will work really well as a heuristic for our purposes. If we keep in mind that the whole point is for Google to make sure that people are getting what they’re searching for, the thing to focus on is that they need an algorithm that can figure out topic relevancy combined with search intent.
That leads to the following three steps that make up tf*IDF:
They start off by making a list of all of the individual words, two-word phrases, three-word phrases and so on. Imagine putting all of these in a spreadsheet for that piece of content, one on each row.
They rank these phrases by how often they are mentioned, from most often to least often. The most-mentioned words and phrases (think “the” and “and”) are generally considered less relevant to the topic at hand.
If each piece of content on your site has its own spreadsheet of terms like this, the common terms are compared among related pages and other pages on your site to ascertain relevancy even further.
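To make the "spreadsheet" idea a little more concrete, here is a minimal Python sketch of the first two steps: listing every one-, two- and three-word phrase in a piece of content and ranking them by count. The function names (tokenize, ngrams, phrase_table) and the sample sentence are made up for illustration, and this is only a rough heuristic model, not a claim about what Google actually runs.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_table(text, max_n=3):
    """Count every 1-word to max_n-word phrase, most frequent first."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts.most_common()

# Hypothetical sample content, one "spreadsheet row" per phrase.
sample = "Free spins and wild symbols. Free spins pay on all reels."
table = phrase_table(sample)
for phrase, count in table[:5]:
    print(count, phrase)
```

Each (phrase, count) pair is one row of the imaginary spreadsheet. The simple tokenizer ignores sentence boundaries, so a phrase can span two sentences; a more careful version would probably split on punctuation first.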
While I realize that sounds like some kind of physics dissertation gobbledygook, here’s the important thing to understand: It can easily detect when you’re trying to over-optimize for specific phrases, and the Google gods will smite you when this happens. Instead, your topics need to cover sufficient breadth as well as depth, which can be hard to set up when giving instructions to someone who isn’t well-versed in the industry and/or who is mostly rewriting other articles, reviews and pieces of content they find online.
But since all of this is a little tricky to follow without a concrete example, I’ll give one in the next chapter.
Chapter 2: Pride and Prejudice
Jane Austen's One Weird Trick to Boost Your Rankings
Pride and Prejudice is some book written by some woman that was about an insufferable guy trying to get with this girl or something. I never read the book, and I only saw the movie because it was the favorite book of this girl I was seeing when it came out.
Here are the 50 most commonly found words in the whole book:
Roughly speaking, Google makes a list like this of all of the words on your site (and a different list for two-word phrases, another for three-word phrases, etc.). Do you notice how this list gives you absolutely no idea whatsoever what the book is about?
Neither did the movie.
So how do you sort out which words don’t matter and which ones show purpose, subject matter and so on?
You compare them to lists from other books and remove the most common words. How many of the above words do you think are found in the top 50 words of every novel in a given university library? Just remove those words, and now you start figuring out what the book is really about. This is the heart of tf*IDF for our purposes.
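For the curious, here is a rough sketch of how that comparison looks in the common textbook form of tf-idf, where a word's score is its share of the document multiplied by log(N / number of documents containing it). A word found in every "book" gets a score of exactly zero. The three-snippet library below is invented for illustration, and Google's real weighting is not public, so treat this as a toy model.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score each word in each document with the textbook tf-idf formula.

    documents: dict mapping a document name to its non-empty list of tokens.
    Returns a dict mapping each name to a {word: score} dict.
    """
    total_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        scores[name] = {
            word: (count / len(tokens)) * math.log(total_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

# Toy "library": made-up snippets standing in for whole books.
library = {
    "pride": "the letter and elizabeth and the ball and elizabeth".split(),
    "pirates": "the ship and the pirates".split(),
    "slots": "the reels and the free spins".split(),
}
scores = tf_idf(library)
top_word = max(scores["pride"], key=scores["pride"].get)
print(top_word)
```

In this toy library, "the" and "and" show up in all three snippets, so they score zero no matter how often they are repeated, while the name that only appears in one snippet rises to the top.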
Getting Down to Topical Relevance
When you remove all of the common words that are found in most other books, you probably only get one word out of this top 50 that stands out: Elizabeth. For those who aren’t aware, Elizabeth is the main character of the book.
Now imagine that Google does this for thousands of slot reviews on one of the big-name sites. Once they get rid of all of the common words, what is left over, and what does that spell out?
I happened to do this type of analysis with one-, two- and three-word combinations for a smaller number of reviews (around 50 items, a total of around 75,000 words). Then, I removed all of the words that showed up in the top 100 phrases in Pride and Prejudice. This got rid of all of the most common words in the English language.
So what was I left with? You can probably guess: slot, online slots, reels, paylines, free spins, wild symbols, etc. Google has suddenly figured out topical relevance and the types of words and phrases it expects to see.
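The filtering step can be sketched in a few lines of Python. This is not the script used for the analysis above; the function name remove_reference_phrases is hypothetical, and the counts are invented purely to show the mechanics of dropping anything that ranks in the reference text's top phrases.

```python
from collections import Counter

def remove_reference_phrases(corpus_counts, reference_counts, top_k=100):
    """Drop every phrase that ranks in the reference text's top_k.

    Both arguments are Counters mapping a phrase to how often it appears.
    Returns the surviving phrases, most frequent first.
    """
    common = {phrase for phrase, _ in reference_counts.most_common(top_k)}
    survivors = Counter(
        {p: c for p, c in corpus_counts.items() if p not in common}
    )
    return survivors.most_common()

# Made-up counts for illustration only, not real figures from any analysis.
novel = Counter({"the": 4000, "to": 3900, "of": 3500, "and": 3400, "elizabeth": 600})
slot_reviews = Counter({"the": 5000, "and": 2600, "of": 2100, "slot": 900, "free spins": 400})

leftovers = remove_reference_phrases(slot_reviews, novel, top_k=4)
print(leftovers)
```

With top_k=4 in this toy data, the novel's four most common words are stripped out of the review counts and only the slot-specific phrases survive. In a real run you would feed in full phrase counts and leave top_k at 100.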
Getting REALLY FAR Down to Topical Relevance
Here’s one that will bake your noodle: How does Google figure out what makes this slot special so that it can group similar slots?
It’s easy: Take all of the words left over, and remove the most common ones found across all slot reviews.
Guess which words were left after all of that? Consider the words that would be left over once you take away all of the most common English language words and all of the most common terms related to slots.
It was mostly theme-based. I saw things like “Aztec,” “fruit machine” and “pirates” start showing up. Sometimes I’d see things like “four reels” since that doesn’t happen very often. The point is that what remains is very specialized information, specific to that one piece of content within the larger context of the group that piece of content belongs to.
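Here is a small sketch of that second pass, assuming the general-English words are already gone. It keeps only the terms that appear in no more than a set share of the reviews in the group. The slot names, the term sets and the max_share cutoff are all hypothetical; the idea is just to show how the category-wide terms fall away and the theme words remain.

```python
from collections import Counter

def distinctive_terms(reviews, max_share=0.5):
    """For each review, keep the terms used in at most max_share of all reviews.

    reviews: dict mapping a review name to the set of terms left after
    general-English words have already been removed.
    """
    review_count = len(reviews)
    # How many reviews in the group mention each term?
    doc_freq = Counter()
    for terms in reviews.values():
        doc_freq.update(terms)
    return {
        name: {t for t in terms if doc_freq[t] / review_count <= max_share}
        for name, terms in reviews.items()
    }

# Hypothetical slot reviews and term sets, invented for illustration.
reviews = {
    "aztec_gold": {"slot", "reels", "free spins", "aztec"},
    "pirate_bay": {"slot", "reels", "free spins", "pirates"},
    "fruit_fun": {"slot", "reels", "fruit machine", "four reels"},
}
themes = distinctive_terms(reviews)
for name, terms in themes.items():
    print(name, sorted(terms))
```

Terms like "slot" and "reels" appear in every review, so they tell you nothing about any one of them. What is left per review is the stuff that could be used to group similar slots together.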
Chapter 3: How to Use tf*IDF For Yourself
Knowing is Half the Battle (or Maybe 80% in This Case)
I don’t want to sound like I’m all about some doom and gloom here, but a lot of the big names are going to continue to fall if they don’t take some drastic action to go back and fix their content. I mean, a number of them have already taken losses that seem insane since Panda came out, and it seems like each core update just keeps chewing them up and spitting them out.
For better or worse, it seems like Google has a hard-on for going after people who have become successful using their platforms. It’s the weirdest thing, but whatever floats their boat. It’s their playground; we just run around on it.
That’s why knowing what’s going on is so ridiculously important here. It’s not some unsolvable mystery that Google just randomly smites big sites here and there that have been established for over a decade. It’s happening for a reason, and the switch to tf*IDF models over the past several years has been a big part of it.
The good news, however, is that it’s opening up space for new faces to come in and pick up a bigger piece of the pizza.
Concept #1: Focus on Quality Over Quantity
For the longest time, the best business models to implement have been about churning out craploads of content to cover as much ground as possible. If quality was lacking to some degree, that was completely fine because you could very easily make up for it in quantity.
Something tf*IDF has done is make it easier for Google to crack down on this type of thing. As a result, a focus on quality is now the best business decision. Generally speaking, that means paying more for content. However, even if you have to put out fewer items, it means that those items will have staying power that earns over the long run.
Concept #2: Stop Going Against the Grain (i.e., Google)
Google is making it plain as day what they want. They’re spelling it out for you every time they knock the titans of the industry out of the sky. Instead of thinking about it as being in a fight with Google, think about it as being on the same side as Google.
For example, you could be worried about the next algorithm update because you’re afraid it will hurt your sites. Another way to think about it is being eager for the next update because it will clear more of the competition out of your way and open up more spots on more SERPs for your superior content.
Concept #3: Think of This as a Massive Opportunity
The online gambling industry is absolutely huge already, and it continues to get bigger. This is a game that does not slow down, and it’s never going to go away. That means you have tremendous opportunity, especially if you keep up with the times, because so many of the big names get complacent.
And that’s not knocking them.
It’s completely normal to let your guard down and rest on your laurels. It’s completely normal to keep following most of the same processes that made you successful in the first place.
For the sake of restoring some masculinity to this post after the Jane Austen references, consider the plot of Rocky III as a great example. The old sites are Rocky, and the new breed are Clubber Lang.
If you’re Rocky, this is an opportunity to step up your game and show what you’re willing to do to stay up to date and solidify your position as a force to be reckoned with in the industry. If you’re Clubber Lang, then throw on “Eye of the Tiger,” and start strategizing on how to make the Google algorithm work for you instead of against you.