How Google Ranks Web Pages – Part I: Overview

If there’s one thing that can be frustrating in the world of SEO, it’s the frequent lack of understanding among its practitioners about how Google actually works. We all know the usual suspects: links, ‘content’, TITLE tags… but there’s so much more to it. In this series of articles we’re going to look at the myriad ways that Google finds content and ranks it. Our goal is NOT to deconstruct Google, but to give SEOs a broader understanding of all the elements that make up modern search engines.

The Basics

Before we even get into the nuts and bolts of how documents are ranked in Google, we need to get a few of the basics down. One area that seems not to be well understood is the difference between a signal and a scoring element (aka ranking factor). Not everything that Google looks at is used for scoring a document. While it could be argued that it’s all semantics (pun intended), I prefer to look at them as conceptually separate.

What is a non-scoring signal?

One good example is geo-location. Google will look at the location of the device that’s querying the system to establish which results are candidates to be returned. This obviously doesn’t do any direct scoring of a document; it’s more about which results are returned. Another is semantic understanding of the query being input, or query classification. While this can affect the documents returned and their ordering, it’s not directly scoring them either. We might also consider duplication. There are instances where Google already has the information (via syndication, bad architectural approaches etc.) and would discount or ignore a given document altogether.

Other elements can include language localization, internal link ratios, content updates/freshness, technology (schema etc.), or even the infamous “RankBrain”. You get the idea. There are even signals that can be used for assessing search quality (implicit/explicit user feedback etc.). Not all signals are direct scoring elements of a website or the documents therein. It’s an important distinction.

What is a scoring signal?

This one is a little more familiar and obvious to most who are working in, or interested in, search engine optimization. These are signals that will have a direct effect on how pages/documents are ranked in the search results. While I prefer to call these ‘scoring elements’, it doesn’t matter that much as long as we understand that not all signals are used for scoring.

Site-Level Signals:
  • Authority/Trust
  • Classifications
  • Internal links
  • Localization
  • Entities
  • Domain history
  • Thin content
  • Boilerplate elements
  • Duplication
  • …etc…
Page-Level Signals:
  • TITLE and Meta data
  • Classifications (and Localization)
  • Entities
  • Authority/trust (external links)
  • Temporal signals
  • Semantic signals
  • Linguistic indicators (language and nuances)
  • Prominence factors (bold, headings, italics, lists, etc.)
  • Semantic and phrase-based
  • …etc…
Off-Site Signals:
  • Links (PageRank, anchor text, relevance, temporal etc.)
  • Temporal (velocity, age, entity citation frequency, social etc.)
  • Authority/Trust (citations, co-citations, references etc.)
  • Graphs (social, knowledge base, entity etc.)
  • Spam signals
  • …etc…
And believe me, there are a TON more. But we’ll be getting into that in the next installment. What I wanted to get across at this point is the conceptual approach to how we understand the inner workings of Google.
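
To make the signal versus scoring element distinction a bit more concrete, here is a toy Python sketch. It is in no way how Google does it: every field name, weight and rule below is invented purely for illustration. The only point is the shape. Non-scoring signals (here, searcher location and duplication) decide which documents are even in the running, and scoring elements then order what is left.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    region: str          # hypothetical: region the document is relevant to
    is_duplicate: bool   # hypothetical: flagged as a copy of something already indexed
    relevance: float     # hypothetical scoring element, 0..1
    authority: float     # hypothetical scoring element, 0..1


def candidates(docs, searcher_region):
    """Non-scoring signals: decide which documents are eligible at all."""
    return [d for d in docs if d.region == searcher_region and not d.is_duplicate]


def score(doc):
    """Scoring elements: the weights are made up, for illustration only."""
    return 0.6 * doc.relevance + 0.4 * doc.authority


def rank(docs, searcher_region):
    """Filter with non-scoring signals first, then order by score."""
    return sorted(candidates(docs, searcher_region), key=score, reverse=True)


docs = [
    Doc("a.example/page", "CA", False, 0.9, 0.3),
    Doc("b.example/page", "CA", False, 0.5, 0.8),
    Doc("c.example/page", "US", False, 1.0, 1.0),  # wrong region: never scored
    Doc("d.example/copy", "CA", True, 0.9, 0.9),   # duplicate: never scored
]

ranked_urls = [d.url for d in rank(docs, "CA")]
print(ranked_urls)
```

Notice that the document with the best raw numbers (c.example) never shows up for this searcher. Nothing "scored" it down; a non-scoring signal simply kept it out of the candidate set.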

Pages vs Web Sites

Another element to knock out here at the beginning is that Google will look at things from a few levels when understanding, scoring and ranking the search results. As we noted above, we have website-level signals as well as actual page/document ones. This is also an important distinction to plug into your brain as we start to take this journey into the wild. The website level can carry elements such as boilerplate (templates and site-wide information) as well as topical and trust/authority signals, whereas the document level can have specifics such as semantic relevance, temporal signals, links, demographic information, images and more.

Scoring, Boosting and Dampening

The next thing we want to get our heads around is the concept that Google isn’t really just a bag of algorithms that is run and spits back values to add to the pie. This type of thinking can be limiting and inhibit our understanding of the potential logic that’s in place.

Scoring – we’ve already talked some about this, but the one that is often easiest to understand is, of course, PageRank, along with a lot of the link-related factors. In simplest terms, these are what make up the initial scoring for a document in the index.

Boosting – the next set of elements are scoring signals that take that initial score for a document and boost it based on various signals and thresholds. They can be based on device or demographic aspects, on temporal needs or even personalization.

Dampening – just as we have scoring signals that can boost a document’s rankings, we also have those that can be used to lower them (aka dampening). They aren’t penalties per se, as those are more the world of manual actions. But some of these can include spam signals, quality (Panda etc.) and even trust, temporal and device-related signals.

In many instances there is cross-over between the application of boosting and dampening, while in other instances they are applied on their own. For the moment, we merely need to get a sense of how everything works; we’ll get into the specifics in the next installment.
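
If it helps to picture the score, boost, dampen idea, here is a minimal Python sketch. Again, this is an assumption-laden toy, not Google’s logic: the signal names, thresholds and multipliers are all hypothetical, and treating boosts and dampeners as simple multipliers is my own simplification.

```python
def adjusted_score(base_score, signals):
    """Start from an initial score, then apply boosts and dampeners.

    `signals` is a plain dict of hypothetical signal names; all
    thresholds and multipliers are invented for illustration.
    """
    score = base_score

    # Boosting: a fresh document gets a lift, but only when the
    # query appears to call for recent content (a temporal need).
    if signals.get("query_wants_fresh", False) and signals.get("age_days", 9999) <= 7:
        score *= 1.5

    # Dampening: a strong spam signal lowers the score. This is
    # not a manual penalty, just a multiplier below 1.
    if signals.get("spam_likelihood", 0.0) > 0.8:
        score *= 0.25

    return score


fresh_news = adjusted_score(1.0, {"query_wants_fresh": True, "age_days": 2})
spammy_page = adjusted_score(1.0, {"spam_likelihood": 0.95})
fresh_but_spammy = adjusted_score(
    1.0, {"query_wants_fresh": True, "age_days": 2, "spam_likelihood": 0.95}
)
print(fresh_news, spammy_page, fresh_but_spammy)
```

The last case is the cross-over: the same document is boosted and dampened in one pass, and the dampener wins out here only because of the made-up numbers chosen.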

The “Who Knows” Element

And before we move on to the next part in this series, it’s worth noting that we can never truly know exactly how Google is ranking the search results. Not definitively at least. I call it the ‘Manhattan Project’ effect. Not only do we (as SEOs) not know all the nuances of how it works, it’s unlikely there are many at Google who know all the aspects in detail either. I’ve had many a chat with Googlers where the response would be, “…not sure on that, I’ll have to check”.

As such, as we go down this path you will see a lot of “possibly”, “might be” or “may be” attached to the various approaches we’re going to look at. We’re seeking to better understand the mind-set of the engineers. We’re going to open our minds to concepts we may not have previously considered. Our goal is to have a broader understanding to draw on when trying to get our websites and clients to higher visibility and, ultimately, traffic.

