Over the past few days, Google's search results have been hit by a spam attack that can only be described as completely out of control. Many domain names are ranking for hundreds of thousands of keywords each, which suggests the scale of the attack could easily reach millions of keyword phrases.
Update:
The spam was first spotted by Lily Ray:
Surprisingly, many of the domain names involved were registered within the last 24 to 48 hours.
A series of recent posts from Bill Hartzer (LinkedIn profile) caught my attention, in which he shared link maps generated with the Majestic backlink tool that exposed several spam site link networks.
The link diagram he posted shows dozens of sites densely interlinked with each other, a fairly typical pattern for a spam link network.
Screenshots of tightly connected networks
Bill Hartzer, from Majestic
Bill and I discussed the spam over Facebook Messenger, and we both agreed that although the spammers put a lot of effort into creating a network of backlinks, those links weren't actually responsible for the high rankings.
Bill said:
"In my opinion, this is partly the fault of Google, which seems to value content more than links."
I agree 100% that Google places more emphasis on content than on links. But my view is that the spam links exist so that Googlebot will discover and index the spam pages, even if only for a day or two.
Once indexed, spam pages are likely to exploit what I think are two vulnerabilities in Google's algorithm, which I'll discuss next.
Many of the spam sites are ranking for long-tail phrases, which are easy to rank for, as well as for phrases with a local search component, which are also easy to rank for.
Long-tail phrases are keyword phrases that people search for, but only rarely. The concept of the long tail has been around for nearly two decades and became popular with the 2006 book The Long Tail: Why the Future of Business Is Selling Less of More.
Spammers are able to rank for these rarely searched phrases because there is little competition for them, which makes ranking easy.
So if spammers create millions of pages targeting long-tail phrases, those pages can quickly end up ranking for hundreds of thousands of keywords per day.
Companies like Amazon put the long-tail principle to work to sell hundreds of thousands of different products each day, as opposed to selling a single product hundreds of thousands of times a day.
That ease of ranking for long-tail phrases is exactly what the spammers are taking advantage of.
The second thing the spammers exploit is an inherent vulnerability in local search.
The local search algorithm is different from the non-local keyword ranking algorithm.
Examples of the keywords the spam sites rank for are variations of craigslist and related phrases.
For example, phrases such as craigslist auto parts, craigslist rooms to rent, craigslist for sale by owner, and thousands of other keywords, most of which don't use the word craigslist at all.
The scale of the spam is enormous, extending far beyond keywords containing the word "craigslist".
It's not possible to see what a spam page actually looks like by visiting it in a browser.
I tried to view the source of the sites ranking in Google, but every spam site automatically redirected to another domain.
Next, I entered a spam URL into the W3C link checker to fetch it, but the W3C bot couldn't see the page either.
So I changed my browser user agent to identify myself as Googlebot, but the spam sites still redirected me.
That indicates the spam sites aren't relying on the user agent to detect Googlebot.
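To make that test repeatable, here is a minimal sketch, assuming the Python requests library and a hypothetical placeholder URL, that requests the same page once with a normal browser user agent and once with Googlebot's user agent, and reports whether each request is redirected. Against these spam sites, both requests would be expected to come back as redirects.

```python
# A sketch (not the spammers' actual code) of the user agent test:
# fetch the same page as a regular browser and as Googlebot, and see
# whether either request avoids the redirect. The URL is a placeholder.
import requests

SPAM_URL = "https://example.com/spam-page"  # hypothetical URL

USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
}

for label, user_agent in USER_AGENTS.items():
    # allow_redirects=False exposes the redirect itself (3xx + Location header)
    response = requests.get(
        SPAM_URL, headers={"User-Agent": user_agent}, allow_redirects=False
    )
    if response.is_redirect:
        print(f"{label}: redirected to {response.headers.get('Location')}")
    else:
        print(f"{label}: served directly (HTTP {response.status_code})")
```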
Instead, the spam sites are checking the visitor's IP address. If the IP address is verified as belonging to Google, the spam page serves its content to Googlebot.
All other visitors are redirected to other domains that display thin, low-quality content.
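Google documents a way to verify that a visitor claiming to be Googlebot really is Googlebot: do a reverse DNS lookup on the IP address, check that the hostname ends in googlebot.com or google.com, then do a forward lookup to confirm the hostname resolves back to the same IP. The cloaking on these spam sites presumably relies on a check along those lines; the sketch below illustrates that verification logic, not the spammers' actual code.

```python
# Sketch of IP-based Googlebot verification, the kind of check a cloaking
# script would use, based on Google's documented reverse/forward DNS method.
import socket

def ip_belongs_to_googlebot(ip_address: str) -> bool:
    try:
        # Reverse DNS: real Googlebot IPs resolve to googlebot.com or google.com
        hostname, _, _ = socket.gethostbyaddr(ip_address)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP
        return socket.gethostbyname(hostname) == ip_address
    except (socket.herror, socket.gaierror):
        return False

# A cloaker would branch on this result: serve the spam content when the
# visitor verifies as Googlebot, and redirect everyone else.
```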
To view the HTML of the spam pages, I had to access them from a Google IP address. So I used Google's Rich Results Test to fetch a spam page and capture its HTML.
I showed Bill Hartzer how to extract the HTML with the Rich Results Test and he immediately tweeted about it, lol. Ta-da!
The Rich Results Test has an option to display the HTML of the tested page. So I copied the HTML, pasted it into a text file, and saved it as an HTML file.
HTML screenshot provided by the Rich Results tool
Next, I edited the HTML file to remove all of the JavaScript and saved the file again.
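That cleanup step can also be scripted. Here's a minimal sketch, assuming the captured HTML was saved to a hypothetical file named spam_page.html and that the beautifulsoup4 library is installed; it strips every script tag so the page can be opened without its redirect JavaScript firing.

```python
# Remove all <script> tags from the saved HTML so the page can be viewed
# without triggering its redirect JavaScript. File names are placeholders.
from bs4 import BeautifulSoup

with open("spam_page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

for script in soup.find_all("script"):
    script.decompose()  # delete the tag and everything inside it

with open("spam_page_no_js.html", "w", encoding="utf-8") as f:
    f.write(str(soup))
```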
With the JavaScript removed, I could finally see what the page looks like to Google:
Screenshot of spam web page
Bill sent me an email with a list of keyword phrases that just one of the spam sites ranked for. That single spam site ranked for more than 300,000 keyword phrases.
A screenshot showing a domain's keywords
There are a lot of craigslist keyword phrases, but there are also other long-tail phrases, many of which have a local search element. As I mentioned, long-tail phrases are easy to rank for, local search phrases are easy to rank for, and when you combine the two, these keyword phrases become even easier to rank for.
Local search uses a different algorithm than non-local search. For example, local sites generally don't need many links to rank for local queries. The pages only need the right kinds of keywords to trigger the local search algorithm and rank for a geographic area.
So a search for "craigslist auto parts" triggers the local search algorithm, and because the phrase is long tail, it doesn't take much for a page to rank for it.
This is a problem that has persisted for years. A few years ago, a site whose content was in Latin, with English titles, was able to rank for "rhinoplasty Plano, Texas." Rhinoplasty is a long-tail local search phrase, and Plano, Texas is a relatively small city. Ranking for that rhinoplasty keyword phrase was so easy that even a page of Latin could rank for it.
As Danny Sullivan admitted in a tweet, Google has been aware of the spam issue since at least December 19.
After all this time, it will be interesting to see if Google finally finds a way to combat this type of spam.