Google has several data centers housing its index, and anyone familiar
with the Google dance will know what I am talking about. The dance
is what occurs when one data center is not returning the same
results as another. Someone searching for their keywords in LA
will often have different results from someone searching for the
same words in New York. The data is syncing, or "dancing" as SEOs
have termed it.
A quote by PhilC at webworkshop sums it up nicely:
"Google has quite a few separate datacenters (DCs), each of
which contain the entire index and the entire algorithms. To all
intents and purposes, they are independent of each other. They
don't all contain identical indexes, and they don't all contain
identical algorithms (programs that do the rankings). It means
that they often produce different results to each other.
When you do a search, you get the results from whatever datacenter
Google chooses at that time. Unless you search a specific DC's
IP address, Google chooses the DC to return the results from,
and they choose it with every search you make, including when
you click to get the next page of results. It's not uncommon for
the next page of results to be provided by a different DC than
the previous page of results."
The location of these DCs is important to any SEO, as they
can often be used to check PageRank (PR) scores and ranking changes
during an update. Chasing these updates is what we do. Because I
live in Thailand, these servers also let me see search results as
I would in North America, since the .co.th Google server is at
times a bit slow to propagate updates.
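To make the comparison concrete, here is a minimal Python sketch of how the results from two data centers could be compared once you have them in hand. The function name rank_changes and the example URLs are made up for illustration, and how you collect the result lists is left out on purpose.

```python
from typing import Dict, List, Optional, Tuple


def rank_changes(
    results_a: List[str], results_b: List[str]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return the URLs whose position differs between two result lists.

    Each URL maps to (position in A, position in B), counted from 1.
    None means the URL does not appear in that data center's results.
    """
    pos_a = {url: i for i, url in enumerate(results_a, start=1)}
    pos_b = {url: i for i, url in enumerate(results_b, start=1)}
    changes = {}
    # Walk every URL seen in either list, keeping list A's order first.
    for url in list(pos_a) + [u for u in pos_b if u not in pos_a]:
        a, b = pos_a.get(url), pos_b.get(url)
        if a != b:
            changes[url] = (a, b)
    return changes


# Hypothetical result lists, as if copied from two different data centers.
dc_one = ["example.com/a", "example.com/b", "example.com/c"]
dc_two = ["example.com/b", "example.com/a", "example.com/d"]

for url, (a, b) in rank_changes(dc_one, dc_two).items():
    print(f"{url}: DC one = {a}, DC two = {b}")
```

An empty result means both data centers agree on the listing, which is a reasonable sign that the dance has finished for those keywords.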
Now for the news. As Matt Cutts pointed out on his blog, Google
is readying a major change in the way it handles its data, dubbed,
appropriately, 'Big Daddy'. (For those who don't know, Cutts is a
software engineer at Google and an all-around cool guy who shares
SEO tips on his blog.) The new Big Daddy data center contains new
code for examining and sorting the Web and, once it has been fully
tested, will become the default source for Web results, according
to Matt. In a January 4 post on his blog, Cutts said that this
might happen in early February or March of this year.
But what does Big Daddy mean to SEO? According to Rob Sullivan,
a well-known organic search strategist at Enquiro: "If an algorithm
update is like putting new tires on a car or installing a new
stereo system, this BigDaddy is like putting in a whole new motor.
They're totally revamping how Google works and resolving some
long-standing issues with getting sites indexed properly." Among
these long-standing issues are:
* Canonicalization. This is a fancy search-industry term describing
how a search engine decides which of a series of related URLs
(for example, example.com versus www.example.com) is the proper
one to insert into its index.
* Duplicate Content. See my previous article: Duplicate
Content Penalties.
* 302 redirects. A 302 is a temporary redirect. Abusing it is a
nefarious technique long used by black hats to hijack search
rankings by providing a redirect while still maintaining an
innocent-looking ranking description.
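As a rough illustration of the canonicalization problem, the Python sketch below groups URL variants that probably point to the same page. The rules it applies (ignoring "www.", a trailing index.html, and letter case in the host name) are my own assumptions for the example, and the names canonical_key and group_duplicates are hypothetical. How Google actually picks the canonical URL is not public.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit


def canonical_key(url: str) -> str:
    """Reduce a URL to a comparison key using a few illustrative rules.

    Assumed rules (not Google's): lowercase the host, drop a leading
    "www.", treat ".../index.html" as ".../", and ignore the fragment.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def group_duplicates(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs that share the same comparison key."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return dict(groups)


# Hypothetical variants of one home page plus one unrelated page.
variants = [
    "http://example.com/",
    "http://www.example.com/",
    "http://EXAMPLE.com/index.html",
    "http://example.com/about.html",
]

for key, members in group_duplicates(variants).items():
    print(key, "<-", members)
```

Any group with more than one member is a set of URLs that a search engine has to choose between, which is exactly where the wrong version can end up in the index.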
Now how Google will tackle these issues is a closely guarded
secret, but there's a twist. In the past, Google's data center IPs
changed almost daily, prompting a server hunt feverishly carried
out on many SEO forums. This time around, Google has opened the
floodgates, and Matt has publicly posted a pair of server IPs on
his blog for testing and feedback by the community: 66.249.93.104
and 64.233.179.104.
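If you want to try the two test IPs against your own keywords, a small Python sketch like the following builds the query URL for each one. It assumes the data centers accept the usual /search?q= request path. The helper name search_url and the sample keywords are hypothetical, and the IPs may stop responding at any time.

```python
from urllib.parse import urlencode

# The two test IPs quoted from Matt's post. They may change or stop
# answering search requests at any time.
BIGDADDY_IPS = ["66.249.93.104", "64.233.179.104"]


def search_url(ip: str, keywords: str) -> str:
    """Build a search URL aimed at one data center IP.

    Assumption: the data center answers the usual /search?q=... path.
    """
    query = urlencode({"q": keywords})
    return f"http://{ip}/search?{query}"


# Print one URL per data center for a hypothetical keyword phrase.
for ip in BIGDADDY_IPS:
    print(search_url(ip, "thailand hotels"))
```

Opening both URLs side by side in a browser is enough to see whether your pages rank differently on the new data center.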
Matt regularly discusses the future of search and has also detailed
a new Google spider bot that is more flexible, quicker, and able
to read JavaScript and Flash files. The bot is built on a Mozilla
browser and promises to read all non-text content.
"As Web technology develops and we get richer and more interactive
Web sites, [the search engines] can't just stick with just indexing
hyperlinks and text," Sullivan says. "They're going to have to
do everything."