Bing and Yahoo are now (mostly) sharing their search functions, and further integration will occur over the next several months. This combination gives Bing/Yahoo roughly a 25% share of the search landscape. If you have so far ignored the growing importance of Bing, now is the time to start looking at how to optimize your website for Bing search results. This post will take a look at how indexing occurs with Bing.

Google is clearly the leader in the search space and a very mature, sophisticated search engine. They do a remarkable job of indexing a wide variety of content. Bing, on the other hand, is still in its infancy and has yet to develop some of the rich indexing capabilities that exist with Google. Some of the more compelling differences between how Google indexes websites versus how the Bing crawler operates include Canonical Requirements, Page Size, 301 & 302 Redirects, Meta Refreshes, and Backlink Requirements.

Canonical Requirements: Google is very good at determining a website’s Canonical URL even if a website is not coded to properly return the Canonical URL. Google’s Webmaster Tools even allows website owners to manage their Canonical URL without changing any code. Additionally, Google supports the use of the Canonical tag as a way for website owners to easily avoid duplicate content issues.

Bing, on the other hand, does not support the Canonical tag and does not offer Canonical URL management in their Webmaster Center. Bing therefore requires websites to handle canonicalization correctly at the code level.

The Bing crawler, by default, initially accesses a website’s root domain without the “www” subdomain (example: http://searchdiscovery.com). If the server sends back a 200 OK response, Bing will register the domain in their index without the “www”. If the non-www domain is 301 redirected to the “www” subdomain, Bing will usually follow that directive without issue and properly index the “www” version of the domain. If your preferred domain configuration includes the “www” subdomain, make sure your Canonical redirects are in place to reflect this preference.
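The behavior above can be sketched in a few lines. This is a minimal simulation, not Bing’s actual code; the domain names and response map are hypothetical test data standing in for real server responses.

```python
# Minimal sketch of the crawl behavior described above: a 200 OK means the
# URL is indexed as-is, and a 301 is followed to its target.

def resolve_canonical(url, responses, max_hops=5):
    """Simulate which version of a domain would get registered, given a
    map of URL -> (status code, redirect target)."""
    for _ in range(max_hops):
        status, location = responses[url]
        if status == 200:
            return url        # 200 OK: this version gets indexed
        if status == 301:
            url = location    # permanent redirect: follow it
        else:
            return None       # anything else is not followed
    return None

# Hypothetical setup: the bare domain 301s to the "www" version.
responses = {
    "http://example.com": (301, "http://www.example.com"),
    "http://www.example.com": (200, None),
}
```

With this setup, `resolve_canonical("http://example.com", responses)` resolves to the “www” version, matching the preferred configuration described above.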

Page Size: Back in the early days of Google, Googlebot would only crawl the first 100k of any given page. As Google has matured, page size has become less of an issue for their crawler. Bing, however, currently caches only the first 100k of most web pages (the actual cutoff ranges from roughly 95k to 105k). Keep this in mind as you optimize your website for Bing: be sure to place the important elements of your content within the first 100k, or they will not make it into the Bing cache.
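A quick way to audit this is to check where key content falls in a page’s raw HTML. The helper below is illustrative; the 100 KB constant reflects the approximate figure above, and the sample markup is made up.

```python
# Check whether a key piece of content appears within the first ~100 KB of
# a page's HTML -- roughly the portion Bing is said to cache.

CACHE_LIMIT = 100 * 1024  # approximate cutoff; the real range is ~95k-105k

def in_bing_cache(html: str, marker: str) -> bool:
    """Return True if `marker` first appears within the cached portion."""
    pos = html.find(marker)
    return pos != -1 and pos < CACHE_LIMIT
```

Running this against your rendered pages (not just your templates) shows whether headings, copy, and links survive the truncation.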

301 & 302 Redirects: Although Google prefers a 301 redirect, a 302 will not cause major indexing issues. However, if a website employs a 302 Canonical redirect instead of a 301, Bing will not follow the redirect and, in many cases, will refuse to index the website altogether. For this reason, it is very important that Canonical redirects always use a 301. Bing has stated, “We do not index any pages that have been 302 redirected by design.” In other words, if the non-www version of a domain 302 redirects to the www version of the domain, Bing simply will not index the website.
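The server-side fix looks the same in any stack: the Canonical redirect must return a 301. Here is a minimal sketch as a stdlib WSGI app; the host names are placeholders for your own domain.

```python
# A bare-bones WSGI app issuing a permanent (301) Canonical redirect from
# the bare domain to the "www" version. Host names are placeholders.

def app(environ, start_response):
    host = environ.get("HTTP_HOST", "")
    if host == "example.com":
        # 301, never 302: per Bing's stated policy, pages behind a 302
        # "by design" are not indexed at all.
        target = "http://www.example.com" + environ.get("PATH_INFO", "/")
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Canonical page</body></html>"]
```

In Apache or Nginx the equivalent is a host-based rewrite with a permanent flag; the principle is identical.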

Meta Refreshes: Some websites still use a Meta Refresh to redirect users, and Bing and Google handle this technique very differently. Google will follow a zero-second Meta Refresh and treat it like a 301. Bing will not. In fact, encountering a Meta Refresh will stop the Bing crawler from accessing any more of the website being indexed. If you want Bing to index your entire site, don’t use Meta Refreshes.
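It is worth auditing your pages for this tag before asking Bing to crawl them. A small scan with the standard library’s HTML parser does the job; the sample markup is illustrative.

```python
# Flag <meta http-equiv="refresh"> tags, which (per the behavior described
# above) stop Bing's crawler cold.
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Record whether the document contains a meta-refresh tag."""
    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.found = True

def has_meta_refresh(html: str) -> bool:
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.found
```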

Backlink Requirements: Google clearly has the largest index. In years past, Google made a big deal about how many web pages were in their index. These days, they don’t brag much about index size, having won that battle. Bing doesn’t even play that game. Instead of seeking to index each and every piece of content available on a domain, Bing actively removes web pages from their index if those pages lack the link authority or value to rank in their SERPs.

While Google will index every single file it can find on a given website (and even some that don’t exist, thanks to JavaScript functions that expose internal URLs), Bing discards pages that do not have ranking authority. In most cases, for pages to maintain a place in Bing’s index, they must have at least one external website linking to them. According to Bing’s former Program Manager Brett Yount, websites “need to build page-specific backlinks before those internal pages will get indexed.” There are some exceptions, but this is currently Bing’s standard operating procedure for its index.
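As a mental model, the pruning rule reduces to a simple filter. The sketch below is illustrative only: the page names and link counts are made up, and exempting the homepage is my assumption, not a documented Bing rule.

```python
# Illustrative model of the pruning rule described above: internal pages
# without at least one external backlink fall out of the index.
# (Homepage exemption is an assumption for the example.)

def prune_index(pages, external_backlinks, homepage):
    """Keep the homepage, plus any page with >= 1 external backlink."""
    return [page for page in pages
            if page == homepage or external_backlinks.get(page, 0) >= 1]

pages = ["/", "/products", "/about", "/blog/old-post"]
external_backlinks = {"/": 40, "/products": 3}  # hypothetical counts
```

Under this model, only pages that have earned external links survive, which is why building page-specific backlinks matters for getting deep pages indexed.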

Bing will certainly evolve as their search engine matures. Check back for future posts that will discuss ranking factors for Bing as well as updates to their indexing capabilities.

by John Sherrod