Rabu, 25 Oktober 2006

Update to our webmaster guidelines

As the web continues to change and evolve, our algorithms change right along with it. Recently, as a result of one of those algorithmic changes, we've modified our webmaster guidelines. Previously, these stated:


Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index.

However, we've recently removed that technical guideline, and now index URLs that contain that parameter. So if your site uses a dynamic structure that generates it, don't worry about rewriting it -- we'll accept it just fine as is. Keep in mind, however, that dynamic URLs with a large number of parameters may be problematic for search engine crawlers in general, so rewriting dynamic URLs into user-friendly versions is always a good practice when that option is available to you. If you can, keeping the number of URL parameters to one or two may make it more likely that search engines will crawl your dynamic urls.

Kamis, 19 Oktober 2006

Googlebot activity reports

The webmaster tools team has a very exciting mission: we dig into our logs, find as much useful information as possible, and pass it on to you, the webmasters. Our reward is that you more easily understand what Google sees, and why some pages don't make it to the index.

The latest batch of information that we've put together for you is the amount of traffic between Google and a given site. We show you the number of requests, number of kilobytes (yes, yes, I know that tech-savvy webmasters can usually dig this out, but our new charts make it really easy to see at a glance), and the average document download time. You can see this information in chart form, as well as in hard numbers (the maximum, minimum, and average).

For instance, here's the number of pages Googlebot has crawled in the Webmaster Central blog over the last 90 days. The maximum number of pages Googlebot has crawled in one day is 24 and the minimum is 2. That makes sense, because the blog was launched less than 90 days ago, and the chart shows that the number of pages crawled per day has increased over time. The number of pages crawled is sometimes more than the total number of pages in the site -- especially if the same page can be accessed via several URLs. So http://googlewebmastercentral.blogspot.com/2006/10/learn-more-about-googlebots-crawl-of.html and http://googlewebmastercentral.blogspot.com/2006/10/learn-more-about-googlebots-crawl-of.html#links are different, but point to the same page (the second points to an anchor within the page).


And here's the average number of kilobytes downloaded from this blog each day. As you can see, as the site has grown over the last two and a half months, the number of average kilobytes downloaded has increased as well.


The first two reports can help you diagnose the impact that changes in your site may have on its coverage. If you overhaul your site and dramatically reduce the number of pages, you'll likely notice a drop in the number of pages that Googlebot accesses.

The average document download time can help pinpoint subtle networking problems. If the average time spikes, you might have network slowdowns or bottlenecks that you should investigate. Here's the report for this blog that shows that we did have a short spike in early September (the maximum time was 1057 ms), but it quickly went back to a normal level, so things now look OK.

In general, the load time of a page doesn't affect its ranking, but we wanted to give this info because it can help you spot problems. We hope you will find this data as useful as we do!

Selasa, 17 Oktober 2006

Learn more about Googlebot's crawl of your site and more!

We've added a few new features to webmaster tools and invite you to check them out.

Googlebot activity reports
Check out these cool charts! We show you the number of pages Googlebot's crawled from your site per day, the number of kilobytes of data Googlebot's downloaded per day, and the average time it took Googlebot to download pages. Webmaster tools show each of these for the last 90 days. Stay tuned for more information about this data and how you can use it to pinpoint issues with your site.

Crawl rate control
Googlebot uses sophisticated algorithms that determine how much to crawl each site. Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server's bandwidth.

We've been conducting a limited test of a new feature that enables you to provide us information about how we crawl your site. Today, we're making this tool available to everyone. You can access this tool from the Diagnostic tab. If you'd like Googlebot to slow down the crawl of your site, simply choose the Slower option.

If we feel your server could handle the additional bandwidth, and we can crawl your site more, we'll let you know and offer the option for a faster crawl.

If you request a changed crawl rate, this change will last for 90 days. If you liked the changed rate, you can simply return to webmaster tools and make the change again.


Enhanced image search
You can now opt into enhanced image search for the images on your site, which enables our tools such as Google Image Labeler to associate the images included in your site with labels that will improve indexing and search quality of those images. After you've opted in, you can opt out at any time.

Number of URLs submitted
Recently at SES San Jose, a webmaster asked me if we could show the number of URLs we find in a Sitemap. He said that he generates his Sitemaps automatically and he'd like confirmation that the number he thinks he generated is the same number we received. We thought this was a great idea. Simply access the Sitemaps tab to see the number of URLs we found in each Sitemap you've submitted.

As always, we hope you find these updates useful and look forward to hearing what you think.

Rabu, 11 Oktober 2006

Got a website? Get gadgets.

Google Gadgets are miniature-sized devices that offer cool and dynamic content -- they can be games, news clips, weather reports, maps, or most anything you can dream up. They've been around for a while, but their reach got a lot broader last week when we made it possible for anyone to add gadgets to their own webpages. Here's an example of a flight status tracker, for instance, that can be placed on any page on the web for free.

Anyone can search for gadgets to add to their own webpage for free in our directory of gadgets for your webpage. To put a gadget on your page, just pick the gadget you like, set your preferences, and copy-and-paste the HTML that is generated for you onto your own page.

Creating gadgets for others isn't hard, either, and it can be a great way to get your content in front of people while they're visiting Google or other sites. Here are a few suggestions for distributing your own content on the Google homepage or other pages across the web:

* Create a Google Gadget for distribution across the web. Gadgets can be most anything, from simple HTML to complex applications. It’s easy to experiment with gadgets – anyone with even a little bit of web design experience can make a simple one (even me!), and more advanced programmers can create really snazzy, complex ones. But remember, it’s also quick and easy for people to delete gadgets or add new ones too their own pages. To help you make sure your gadget will be popular across the web, we provide a few guidelines you can use to create gadgets. The more often folks find your content to be useful, the longer they'll keep your gadget on their pages, and the more often they’ll visit your site.

* If your website has a feed, visitors can put snippets of your content on their own Google homepages quickly and easily, and you don't even need to develop a gadget. However, you will be able to customize their experience much more fully with a gadget than with a feed.

* By putting the “Add to Google” button in a prominent spot on your site, you can increase the reach of your content, because visitors who click to add your gadget or feed to Google can see your content each time they visit the Google homepage. Promoting your own gadget or feed can also increase its popularity, which contributes to a higher ranking in the directory of gadgets for the Google personalized homepage.

Senin, 09 Oktober 2006

Multiple Sitemaps in the same directory

We've gotten a few questions about whether you can put multiple Sitemaps in the same directory. Yes, you can!

You might want to have multiple Sitemap files in a single directory for a number of reasons. For instance, if you have an auction site, you might want to have a daily Sitemap with new auction offers and a weekly Sitemap with less time-sensitive URLs. Or you could generate a new Sitemap every day with new offers, so that the list of Sitemaps grows over time. Either of these solutions works just fine.

Or, here's another sample scenario: Suppose you're a provider that supports multiple web shops, and they share a similar URL structure differentiated by a parameter. For example:

http://example.com/stores/home?id=1
http://example.com/stores/home?id=2
http://example.com/stores/home?id=3

Since they're all in the same directory, it's fine by our rules to put the URLs for all of the stores into a single Sitemap, under http://example.com/ or http://example.com/stores/. However, some webmasters may prefer to have separate Sitemaps for each store, such as:

http://example.com/stores/store1_sitemap.xml
http://example.com/stores/store2_sitemap.xml
http://example.com/stores/store3_sitemap.xml

As long as all URLs listed in the Sitemap are at the same location as the Sitemap or in a sub directory (in the above example http://example.com/stores/ or perhaps http://example.com/stores/catalog) it's fine for multiple Sitemaps to live in the same directory (as many as you want!). The important thing is that Sitemaps not contain URLs from parent directories or completely different directories -- if that happens, we can't be sure that the submitter controls the URL's directory, so we can't trust the metadata.

The above Sitemaps could also be collected into a single Sitemap index file and easily be submitted via Google webmaster tools. For example, you could create http://example.com/stores/sitemap_index.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>http://example.com/stores/store1_sitemap.xml</loc>
<lastmod>2006-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/stores/store2_sitemap.xml</loc>
<lastmod>2006-10-01</lastmod>
</sitemap>
<sitemap>
<loc>http://example.com/stores/store3_sitemap.xml</loc>
<lastmod>2006-10-05</lastmod>
</sitemap>
</sitemapindex>

Then simply add the index file to your account, and you'll be able to see any errors for each of the child Sitemaps.

If each store includes more than 50,000 URLs (the maximum number for a single Sitemap), you would need to have multiple Sitemaps for each store. In that case, you may want to create a Sitemap index file for each store that lists the Sitemaps for that store. For instance:

http://example.com/stores/store1_sitemapindex.xml
http://example.com/stores/store2_sitemapindex.xml
http://example.com/stores/store3_sitemapindex.xml

Since Sitemap index files can't contain other index files, you would need to submit each Sitemap index file to your account separately.

Whether you list all URLs in a single Sitemap or in multiple Sitemaps (in the same directory of different directories) is simply based on what's easiest for you to maintain. We treat the URLs equally for each of these methods of organization.