Google's Big Daddy
Posted on March 2, 2006
By Miles Evans
Google has several data centers housing its index, and anyone familiar with the "Google dance" will know what I am talking about. The dance is what occurs when one data center does not return the same results as another. Someone searching for their keywords in LA will often see different results from someone searching for the same words in New York. The data is syncing, or "dancing," as SEO'ers have termed it.
A quote by PhilC at webworkshop sums it up nicely:
"Google has quite a few separate datacenters (DCs), each of which contain the entire index and the entire algorithms. To all intents and purposes, they are independent of each other. They don't all contain identical indexes, and they don't all contain identical algorithms (programs that do the rankings). It means that they often produce different results to each other.
When you do a search, you get the results from whatever datacenter Google chooses at that time. Unless you search a specific DC's IP address, Google chooses the DC to return the results from, and they choose it with every search you make, including when you click to get the next page of results. It's not uncommon for the next page of results to be provided by a different DC than the previous page of results."
The location of these DCs is important to any SEO'er, as they can often be used to determine PageRank (PR) scores and ranking changes during an update. Chasing these updates is what we do. Since I live in Thailand, these servers also let me see search results as I would in North America, because the .co.th Google server is at times a bit slow to propagate updates.
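To make the "dance" concrete, here is a minimal sketch of how you might compare the results two data centers return for the same query. It assumes you have already collected the two ranked lists of URLs yourself; the function name `rank_changes` and the sample URLs are made up for illustration, and nothing here talks to Google.

```python
def rank_changes(results_a, results_b):
    """Return {url: (position_in_a, position_in_b)} for URLs that moved.

    Positions are 1-based. None means the URL is missing from that list.
    Both arguments are ranked lists of URLs gathered by hand or by your
    own tooling (hypothetical input, not a Google API).
    """
    pos_a = {url: i for i, url in enumerate(results_a, start=1)}
    pos_b = {url: i for i, url in enumerate(results_b, start=1)}

    changes = {}
    # Walk every URL seen in either list, keeping first-seen order.
    for url in list(pos_a) + [u for u in pos_b if u not in pos_a]:
        a, b = pos_a.get(url), pos_b.get(url)
        if a != b:
            changes[url] = (a, b)
    return changes


if __name__ == "__main__":
    dc_one = ["example.com/a", "example.com/b", "example.com/c"]
    dc_two = ["example.com/b", "example.com/a", "example.com/d"]
    for url, (old, new) in rank_changes(dc_one, dc_two).items():
        print(url, old, "->", new)
```

An empty result means the two lists agree, which is roughly what you would expect to see once an update has finished propagating.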
Now for the news. As Matt Cutts pointed out on his blog, Google is readying a major change in the way it handles its data, appropriately dubbed 'Big Daddy'. (For those who don't know, Cutts is a software engineer at Google and an all-around cool guy who shares SEO tips on his blog.) The new BigDaddy data center contains new code for examining and sorting the Web and, once it has been fully tested, will become the default source for Web results, according to Matt. In a January 4 post on his blog, Cutts said that this might happen in early February or March of this year.
But what does Big Daddy mean to SEO? According to Rob Sullivan, a well-known organic search strategist at Enquiro: "If an algorithm update is like putting new tires on a car or installing a new stereo system, this BigDaddy is like putting in a whole new motor. They're totally revamping how Google works and resolving some long-standing issues with getting sites indexed properly." Among these long-standing issues are:
* Canonicalization. This is a fancy search-industry term describing how a search engine decides which of a series of related URLs is the proper one to insert into its index.
* Duplicate Content. See my previous article: Duplicate Content Penalties.
* 302 redirects. This nefarious technique has long been used by black hats to hijack search rankings by providing a redirect while still maintaining an innocent-looking ranking description.
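To illustrate the canonicalization item above, here is a rough sketch of how related URLs might be collapsed to one preferred form. The rules used (lowercasing the host, dropping a leading "www.", stripping default page names and fragments) are my own assumptions for the example; they are not Google's actual rules, which are not public. The names `canonical_url`, `group_by_canonical`, and `DEFAULT_PAGES` are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumed list of "default document" names; real servers vary.
DEFAULT_PAGES = {"index.html", "index.htm", "index.php", "default.asp"}


def canonical_url(url):
    """Collapse common variants of a URL into one illustrative form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    host = netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same site

    segments = path.split("/")
    if segments[-1].lower() in DEFAULT_PAGES:
        segments[-1] = ""  # /blog/index.php -> /blog/
    path = "/".join(segments) or "/"  # empty path becomes "/"

    # The fragment is dropped: it never reaches the server.
    return urlunsplit((scheme.lower(), host, path, query, ""))


def group_by_canonical(urls):
    """Map each canonical form to the original URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    variants = [
        "http://www.example.com/",
        "http://example.com",
        "http://EXAMPLE.com/index.html",
    ]
    print(group_by_canonical(variants))
```

All three sample variants land in one group, which is the decision a search engine has to make before it can credit links and rankings to a single page.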
"As Web technology develops and we get richer and more interactive
Web sites, [the search engines] can't just stick with just indexing
hyperlinks and text," Sullivan says. "They're going to have to