One major question was whether Google uses indexation caps: does Google put a limit on how many pages of a site it is willing to index?
The answer was no. Google does not simply look at a site and decide how much of it to index. However, there are several factors that Google looks at when deciding what to crawl, and the major one is PageRank. PageRank is a formula that Google put together to determine the importance of a page, and the amount of crawling Googlebot does can be roughly proportional to a page's PageRank.
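To make the idea of "importance" concrete, here is a minimal sketch of the classic, publicly described PageRank iteration. It is not Google's production system; the `pagerank` function, the damping value of 0.85, and the four-page `site` graph are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every page links back to the homepage.
site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "widget"],
    "widget": ["home"],
}
ranks = pagerank(site)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph the homepage ends up with the largest score and the page two clicks down ends up with the smallest, which matches the pattern described in the post.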
Generally the homepage of a site has the most PageRank, and PageRank declines as you navigate further into the site. This is another reason to link to your high-profile pages from the homepage and to keep as much of the content as close to the root as possible. You do not want to make your users click 7 times to get to some content, and you do not want the search engines to have to do that either.
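One way to check how far content sits from the homepage is to count clicks with a breadth-first search over the site's internal links. The sketch below assumes you already have a link map from a crawl of your own site; the `click_depths` function, the URLs, and the three-click threshold are hypothetical.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks needed to reach each page from `start`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # The first time a page is seen is via its shortest path.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map (page -> pages it links to).
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/2010"],
    "/blog/2010": ["/blog/2010/03"],
    "/blog/2010/03": ["/blog/2010/03/post"],
    "/products": ["/products/widget"],
}
depths = click_depths(site, "/")

# Pages more than three clicks from the homepage are candidates for better linking.
deep_pages = sorted(page for page, depth in depths.items() if depth > 3)
print(deep_pages)
```

Pages that show up in `deep_pages` are the ones worth linking from a category page or the homepage.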
The publicly displayed (Toolbar) PageRank scale goes from 0 to 10, with 10 being the highest. The higher a webpage's PageRank, the more weight Googlebot puts on crawling that page.
One of the biggest problems Google faces is that there is a lot of junk on the web, so it has to build systems that first determine what is junk and then decide how low a priority to give junk content in the crawl schedule. One of the biggest indicators that helps a search engine identify junk is duplicate content.
Duplicate content is very common online and can happen even on the best of sites, especially blogs, where the same post can appear on category pages as well as on the post page itself. This is why canonical tags are so important to a search engine: the tag quickly tells it which URL holds the original content.
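As a rough illustration of how a crawler might read that hint, the sketch below uses Python's standard `html.parser` to pull the `rel="canonical"` URL out of a page. The `CanonicalFinder` class, the `find_canonical` helper, and the example.com markup are assumptions for illustration, not a description of how Googlebot works.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical category page that points back to the original post.
category_page = """
<html><head>
<title>Category: SEO</title>
<link rel="canonical" href="https://example.com/blog/crawl-priority/">
</head><body>...</body></html>
"""
canonical_url = find_canonical(category_page)
print(canonical_url)
```

Running this check across a blog's category and archive pages is a quick way to spot duplicates that do not point back to the original post.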
Low-quality content can also be an indicator to Googlebot when it weighs the importance of what to crawl. Very low-quality content and large amounts of affiliate-type content are both signs that Googlebot does not need to spend too much time crawling a site. This is why it is always important to have good content that actually adds value to the web.
All images provided by SEOMoz.
Recently Eric Enge of StoneTemple interviewed Google engineer Matt Cutts and asked him several questions about what Google looks at when it crawls pages and whether there are limits on what it will crawl. One major question that was asked was