Bing and Yahoo are now (mostly) sharing their search functions, and further integration will occur over the next several months.  This combination has resulted in a 25% market share of the Search landscape for Bing/Yahoo!  If you have so far ignored the growing importance of Bing, now is the time to start looking at how to optimize your website for Bing search results.  This post will take a look at how indexing occurs with Bing.

Google is clearly the leader in the Search space and a very mature and sophisticated search engine.  They do a remarkable job of indexing a wide variety of content.  Bing, on the other hand, is still in its infancy and has yet to develop some of the rich indexing capabilities that exist with Google.  Some of the more compelling differences between how Google indexes websites versus how the Bing crawler operates include Canonical Requirements, Page Size, 301 & 302 Redirects, Meta Refreshes and Backlink Requirements.

Canonical Requirements:  Google is very good at determining a website's Canonical URL even if a website is not coded to properly return the Canonical URL.  Google's Webmaster Tools even allows website owners to manage their Canonical URL without changing any code.  Additionally, Google supports the use of the Canonical tag as a way for website owners to easily avoid duplicate content issues.
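For reference, the Canonical tag is a single link element in a page's head, e.g. <link rel="canonical" href="https://www.example.com/page/">.  Below is a minimal sketch (Python standard library only, with a placeholder URL) that fetches a page and reports whatever Canonical URL it declares, which is a handy way to spot-check your own templates.

from html.parser import HTMLParser
from urllib.request import urlopen

class CanonicalFinder(HTMLParser):
    """Collects the href of any <link rel="canonical"> tag encountered."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

# Placeholder URL -- point this at one of your own pages.
html = urlopen("https://www.example.com/page/").read().decode("utf-8", "ignore")
finder = CanonicalFinder()
finder.feed(html)
print("Declared Canonical URL:", finder.canonical)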

Bing, on the other hand, does not support the Canonical tag and does not offer Canonical URL management in their Webmaster Center.  Bing therefore needs websites to enforce their Canonical URL from a programmatic standpoint.

The Bing crawler, by default, initially accesses a website's root domain without the "www" subdomain (example: http://searchdiscovery.com).  If the server sends back a 200 OK response, then Bing will register the domain in their index without the "www".  If the non-www domain is 301 redirected to the "www" subdomain, Bing will usually follow that directive without issue and properly index the "www" version of the domain.  If your preferred domain configuration includes the "www" subdomain, make sure your Canonical redirects are in place to reflect this preference.
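To see how your own bare domain answers that first request, a quick check like the sketch below (Python standard library; searchdiscovery.com is simply the example domain from above) reports whether the non-www request comes back as a 200 or as a redirect, and where any redirect points.

import http.client

# Example domain from the post -- substitute your own bare (non-www) domain.
conn = http.client.HTTPConnection("searchdiscovery.com")
conn.request("GET", "/")
resp = conn.getresponse()

print("Status:", resp.status, resp.reason)  # 200 here means Bing would index the non-www form
if resp.status in (301, 302):
    # A 301 pointing at the www version is what you want if www is your preferred domain.
    print("Redirects to:", resp.getheader("Location"))
conn.close()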

Page Size: Back in the early days of Google, googlebot would only crawl the first 100k of any given page.  As Google has matured, page size is less of an issue for their crawler.  Bing, however, currently only caches the first 100k of most web pages (although the range is more like 95k-105k).  Keep this in mind as you optimize your website for Bing.  Be sure to place the important elements of your content within the first 100k or they will not make it into the Bing cache.
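A rough way to audit this is to measure how much HTML each page serves and flag anything that runs past roughly 100k, as in the sketch below (Python standard library; the URL is a placeholder).

from urllib.request import urlopen

LIMIT = 100 * 1024  # the ~100k cache cutoff described above

url = "https://www.example.com/some-long-page/"  # placeholder -- use your own page
html = urlopen(url).read()

print(f"{url}: {len(html) / 1024:.1f} KB of HTML")
if len(html) > LIMIT:
    print("Warning: content past the first ~100k may not make it into the Bing cache.")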

301 & 302 Redirects:  Although Google prefers a 301 redirect, a 302 will not cause major issues with indexing.  However, if a website employs a 302 Canonical redirect instead of a 301, Bing will not follow the redirect and, in many cases, will refuse to index the website altogether.  For this reason, it is very important that Canonical redirects always use a 301.  Bing has stated that, "We do not index any pages that have been 302 redirected by design."  In other words, if the non-www version of a domain 302 redirects to the www version of the domain, Bing simply will not index the website.
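If the Canonical redirect is served from application code rather than the web server configuration, make sure it is explicitly permanent.  The sketch below uses Flask purely as an illustration (your stack may differ); note that Flask's redirect() defaults to a 302, so the 301 has to be passed explicitly.

from flask import Flask, redirect, request

app = Flask(__name__)

@app.before_request
def enforce_www():
    # redirect() defaults to 302, which Bing will not follow for Canonical redirects,
    # so the permanent code must be stated explicitly.
    if request.host == "searchdiscovery.com":  # example bare domain from the post
        www_url = request.url.replace("//searchdiscovery.com",
                                      "//www.searchdiscovery.com", 1)
        return redirect(www_url, code=301)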

Meta Refreshes:  Some websites still utilize a Meta Refresh to redirect users.  Bing and Google handle this technique very differently.  Google will follow a zero-second Meta Refresh and treat it like a 301.  Bing will not.  In fact, the use of a Meta Refresh will stop the Bing crawler from accessing any more of the website being indexed.  If you want Bing to index your entire site, don't use Meta Refreshes.
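If you are not sure whether any of your pages still carry this technique, a simple scan like the sketch below (Python standard library; the page list is a placeholder) will surface Meta Refresh tags so they can be swapped for proper 301 redirects.

import re
from urllib.request import urlopen

# Placeholder list -- substitute the pages you want to audit.
pages = ["https://www.example.com/", "https://www.example.com/old-landing-page/"]

meta_refresh = re.compile(r"""<meta[^>]+http-equiv=["']?refresh""", re.IGNORECASE)

for url in pages:
    html = urlopen(url).read().decode("utf-8", "ignore")
    if meta_refresh.search(html):
        print(f"{url}: uses a Meta Refresh -- replace it with a 301 redirect")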

Backlink Requirements:  Google clearly has the largest index.  In years past, Google made a big deal about how many web pages were in their index.  These days, they don't really brag too much about their index size as they have won that battle.  Bing doesn't even play that game.  Instead of seeking to index each and every piece of content available on a domain, Bing actively removes web pages from their index if those pages are not found to have enough link authority or value to rank in their SERPs.

While Google will index every single file that it can find on a given website (and even some that don't exist, thanks to JavaScript functions that expose internal URLs), Bing discards pages that do not have ranking authority.  In most cases, in order for pages to maintain a place in Bing's index, they must have at least one external website link to them.  According to Bing's former Program Manager Brett Yount, websites "need to build page-specific backlinks before those internal pages will get indexed."  There are some exceptions, but this is currently the standard operating procedure of Bing's index.

Bing will certainly evolve as their search engine matures.  Check back for future posts that will discuss ranking factors for Bing as well as updates to their indexing capabilities.

by John Sherrod