Thursday, August 31, 2006

How search results may differ based on accented characters and interface languages

When a searcher enters a query that includes a word with accented characters, our algorithms consider web pages that contain versions of that word both with and without the accent. For instance, if a searcher enters [México], we'll return results for pages about both "Mexico" and "México."

Conversely, if a searcher enters a query without using accented characters, but a word in that query could be spelled with them, our algorithms consider web pages with both the accented and non-accented versions of the word. So if a searcher enters [Mexico], we'll return results for pages about both "Mexico" and "México."
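
The idea behind this equivalence is often called accent folding. Here's a minimal sketch in Python using the standard unicodedata module; it illustrates the general technique only, not Google's actual implementation:

import unicodedata

def fold_accents(term):
    """Strip combining marks so "México" and "Mexico" compare equal."""
    decomposed = unicodedata.normalize("NFD", term)  # é -> e + combining acute
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

assert fold_accents("México") == fold_accents("Mexico") == "Mexico"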

How the searcher's interface language comes into play
The searcher's interface language is taken into account during this process. For instance, the set of accented characters treated as equivalent to non-accented characters varies with the searcher's interface language, since rules for accenting differ from language to language.

Also, documents in the chosen interface language tend to be considered more relevant. If a searcher's interface language is English, our algorithms assume that queries are in English and that the searcher prefers English-language documents.

This means that search results for the same query can vary with the searcher's interface language. They can also vary with the searcher's location (determined by IP address) and with whether the searcher chooses to see results only in a specified language. If the searcher has personalized search enabled, that also influences the results.

The example below illustrates the results returned when a searcher queries [Mexico] with the interface language set to Spanish.

Note that when the interface language is set to Spanish, more results with accented characters are returned, even though the query didn't include the accented character.

How to restrict search results
To obtain search results for only a specific version of the word (with or without accented characters), you can place a + before the word. For instance, the search [+Mexico] returns only pages about "Mexico" (and not "México"), and the search [+México] returns only pages about "México" (and not "Mexico"). Note that you may see some search results that don't appear to use the version of the word you specified in your query; that version may appear within the content of the page or in anchor text pointing to the page, rather than in the title or description listed in the results. (You can see the top anchor text used to link to your site by choosing Statistics > Page analysis in webmaster tools.)

The example below illustrates the results returned when a searcher queries [+Mexico].

Wednesday, August 30, 2006

Listen in - Matt Cutts and Vanessa Fox talk search

Tune in to Webmaster Radio on Thursday, August 31 at 1 pm Pacific to hear Matt Cutts and me take over GoodKarma while GoodROI (Greg Niland), the program's regular host, is on vacation. We'll talk about a little of everything, including giving Danny Sullivan career advice (if he ever decides to get out of search -- which we hope he never does -- he can always pursue a career in song), Google's handling of words with accented characters, display date changes in Google cached pages, and the not-so-nice side of SEO.

And if you missed last week's show, check out the podcast. Danny Sullivan and I explained that everything you need to know about search marketing, you can learn by watching Buffy the Vampire Slayer. If you heard the show and are worried about Danny's favorite espresso machine shop, don't be. They're doing OK after all.

Wednesday, August 23, 2006

System maintenance

We're currently doing routine system maintenance, and some data may not be available in your webmaster tools account today. We're working as quickly as possible, and all information should be available again by Thursday, 8/24. Thank you for your patience in the meantime.

Update: We're still finishing some things up, so thanks for bearing with us. Note that the preferred domain feature is currently unavailable, but it will be available again as soon as our maintenance is complete.

Saturday, August 19, 2006

All About Googlebot

I've seen a lot of questions lately about robots.txt files and Googlebot's behavior. Last week at SES, I spoke on a new panel called the Bot Obedience course. And a few days ago, some other Googlers and I fielded questions on the WebmasterWorld forums. Here are some of the questions we got:

If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?
You should configure your server to return a status code of 503 (Service Unavailable) rather than 200 (OK). That tells Googlebot to try the pages again later.
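
Here's a rough sketch of what that can look like, using Python's standard library. The port, message, and one-hour Retry-After value are placeholders; Retry-After is a standard HTTP header that suggests when a client should come back:

from http.server import BaseHTTPRequestHandler, HTTPServer

class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(503)                  # Service Unavailable
        self.send_header("Retry-After", "3600")  # suggest retrying in an hour
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<h1>Down for maintenance</h1>")

if __name__ == "__main__":
    HTTPServer(("", 8000), MaintenanceHandler).serve_forever()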

What should I do if Googlebot is crawling my site too much?
You can contact us -- we'll work with you to make sure we don't overwhelm your server's bandwidth. We're experimenting with a feature in our webmaster tools for you to provide input on your crawl rate, and have gotten great feedback so far, so we hope to offer it to everyone soon.

Is it better to use the meta robots tag or a robots.txt file?
Googlebot obeys either, but meta tags apply to single pages only. If you have a number of pages you want to exclude from crawling, you can structure your site in such a way that you can easily use a robots.txt file to block those pages (for instance, put the pages into a single directory).
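
For example, to keep a single page out of the index, you can add the standard robots meta tag to that page's HTML:

<meta name="robots" content="noindex">

To block crawling of a whole directory, a robots.txt entry is easier (the /private/ path here is just a placeholder):

User-agent: *
Disallow: /private/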

If my robots.txt file contains a directive for all bots as well as a specific directive for Googlebot, how does Googlebot interpret the line addressed to all bots?
If your robots.txt file contains both a generic directive for all bots and a directive specifically for Googlebot, Googlebot obeys the lines specifically directed at it and ignores the generic directive.

For instance, for this robots.txt file:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Googlebot will crawl everything in the site other than pages in the cgi-bin directory.

For this robots.txt file:
User-agent: *
Disallow: /
Googlebot won't crawl any pages of the site.

If you're not sure how Googlebot will interpret your robots.txt file, you can use our robots.txt analysis tool to test it. You can also test how Googlebot will interpret changes to the file.
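
If you'd like to script a quick check of your own, Python's standard urllib.robotparser module applies similar per-agent rules. Here's a small sketch that tests the first example file above; this is an approximation, and our robots.txt analysis tool remains the authoritative check:

from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "/index.html"))     # True: only /cgi-bin/ is blocked
print(parser.can_fetch("Googlebot", "/cgi-bin/x.cgi"))  # False
print(parser.can_fetch("SomeOtherBot", "/index.html"))  # False: the generic block applies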

For complete information on how Googlebot and Google's other user agents treat robots.txt files, see our webmaster help center.

Wednesday, August 16, 2006

Back from SES San Jose

Thanks to everyone who stopped by to say hi at the Search Engine Strategies conference in San Jose last week!

I had a great time meeting people and talking about our new webmaster tools. I got to hear a lot of feedback about what webmasters liked, didn't like, and wanted to see in our Webmaster Central site. For those of you who couldn't make it or didn't find me at the conference, please feel free to post your comments and suggestions in our discussion group. I do want to hear about what you don't understand or what you want changed so I can make our webmaster tools as useful as possible.

Some of the highlights from the week:

This year, Danny Sullivan invited some of us from the team to "chat and chew" during a lunch hour panel discussion. Anyone interested in hearing about Google's webmaster tools was welcome to come and many did -- thanks for joining us! I loved showing off our product, answering questions, and getting feedback about what to work on next. Many people had already tried Sitemaps, but hadn't seen the new features like Preferred domain and full crawling errors.

One of the questions I heard more than once at the lunch was about how big a Sitemap can be, and how to use Sitemaps with very large websites. Since Google can handle all of your URLs, the goal of Sitemaps is to tell us about all of them. A Sitemap file can contain up to 50,000 URLs and should be no larger than 10MB when uncompressed. But if you have more URLs than this, simply break them up into several smaller Sitemaps and tell us about them all. You can create a Sitemap Index file, which is just a list of all your Sitemaps, to make managing several Sitemaps a little easier.
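
A Sitemap Index file is just a small XML document that lists your Sitemap files. Here's a minimal sketch, assuming the Sitemap protocol 0.84 schema; the example.com URLs and dates are placeholders, so check the Sitemaps documentation for the current format:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2006-08-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2006-08-01</lastmod>
  </sitemap>
</sitemapindex>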

While hanging out at the Google booth I got another interesting question: One site owner told me that his site is listed in Google, but its description in the search results wasn't exactly what he wanted. (We were using the description of his site listed in the Open Directory Project.) He asked how to remove this description from Google's search results. Vanessa Fox knew the answer! To specifically prevent Google from using the Open Directory for a page's title and description, use the following meta tag:
<meta name="GOOGLEBOT" content="NOODP">

My favorite panel of the week was definitely Pimp My Site. The whole group was dressed to match the theme as they gave some great advice to webmasters. Dax Herrera, the coolest "pimp" up there (and a fantastic piano player), mentioned that a lot of sites don't explain their product clearly on each page. For instance, when pimping Flutter Fetti, there were many instances when all the site had to do was add the word "confetti" to the product description to make it clear to search engines and to users reaching the page exactly what a Flutter Fetti stick is.

Another site pimped was a Yahoo! Stores web site. Someone from the audience asked if the webmaster could set up a Google Sitemap for their store. As Rob Snell pointed out, it's very simple: Yahoo! Stores will create a Google Sitemap for your website automatically, and even verify your ownership of the site in our webmaster tools.

Finally, if you didn't attend the Google dance, you missed out! There were Googlers dancing, eating, and having a great time with all the conference attendees. Vanessa Fox represented my team at the Meet the Google Engineers hour that we held during the dance, and I heard Matt Cutts even starred in a music video!

While I was demo-ing Webmaster Central over in the labs area, someone asked me about the ability to share site information across multiple accounts. We associate your site verification with your Google Account and allow multiple accounts to verify ownership of a site independently. Each account has its own verification file or meta tag, and you can remove any of them at any time and re-verify your site to revoke that user's verification. This means that your marketing person, your techie, and your SEO consultant can each verify the same site with their own Google Account. And if you start managing a site that someone else used to manage, all you have to do is add that site to your account and verify site ownership. You don't need to transfer the account information from the person who previously managed it.

Thanks to everyone who visited and gave us feedback. It was great to meet you!

Monday, August 7, 2006

Chat with us in person at the Search Engine Strategies conference

Got a burning question about the new Webmaster Central? Eager to give feedback about our Webmaster tools?

Of course, we always appreciate hearing from you in our recently expanded Webmaster Help discussion group. But if you're one of thousands of Webmasters attending the Search Engine Strategies conference ("SES") this week in San Jose, California, we'd particularly enjoy meeting you in person!

Amanda, Vanessa, Matt, and I (along with many other Googlers) will be speaking at various sessions throughout the conference, as well as hanging out in the exhibit hall. On Tuesday and Wednesday, Amanda will also be spending some quality time in the Google booth. On Tuesday night at the Googleplex, a huge mass of Googlers (including all of us) will be demo'ing products and services, answering questions, and enjoying the food, libations, and live music with a broad array of guests from SES at the annual Google Dance.

Interested in learning more details? Check out the post on the main Google blog.

Friday, August 4, 2006

More webmaster tools

With our latest release, we've done more than just change our name -- we've listened to you and, as a result, added some features and enhanced others.

Telling us your preferred domain URL format
Some webmasters want their sites indexed under the www version of their domain; others want their sites indexed without the www. Which do you prefer? Now you can tell us and we'll do our best to do what you like when crawling and indexing your site. Note that it might take some time for changes to be reflected in our index, but if you notice that your site is currently indexed using both versions of your domain, tell us your preference.

Downloading query stats for all subfolders
Do you like seeing the top queries for which your site was returned in the results? Now you can download a CSV file that shows you the top queries for each of your subfolders.
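
Once you have the file, it's easy to slice it by subfolder in a few lines of Python. This is a hypothetical sketch: the filename and the "Subfolder" and "Query" column names are assumptions for illustration, so check the actual CSV header after you download it:

import csv
from collections import defaultdict

top_queries = defaultdict(list)
with open("query_stats.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        top_queries[row["Subfolder"]].append(row["Query"])

for subfolder, queries in sorted(top_queries.items()):
    print(subfolder, "->", ", ".join(queries[:5]))  # show the top five per subfolder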

Seeing revamped crawl errors
Now you can see at a glance the types of errors we encounter when crawling your site. The summary page shows a table of errors, with counts for each error type. On the crawl errors page, you can still see the number of errors for each type, as well as filter errors by date.

Managing verification
If somebody from your team no longer has write access to a site and should no longer be a verified owner of it, you can remove the verification file or meta tag for that person. When we periodically check verification, that person's account will no longer be verified for the site. We've added the ability to let you request that check so that you don't have to wait for our periodic process. Simply click the "Manage site verification" link, make note of the verification files and meta tags that may exist for the site, remove any that are no longer valid, and click the "Reverify all site owners" button. We'll check all accounts that are verified for the site and only leave verification in place for accounts for which we find a verification file or meta tag.

Other enhancements
You'll find a number of other smaller enhancements throughout the webmaster tools, all based on your feedback. Thanks as always for your input -- please let us know what you think in our newly revamped Google Group.