Website Crawler

The AuthorityLabs Website Crawler will help you quickly check through the pages of a small website to find technical issues that might be interfering with search engine crawlers (and therefore your site’s rankings).

How to Use

Once you have the link, copy the spreadsheet into your Google Sheets account – File > Make a copy.

Input your target domain and the start page into the first two rows of the Links tab.

Click Crawler > Start in the spreadsheet's menu bar (toward the right), then watch your results roll in.

To view a live summary of your crawl, complete with fancy charts, navigate to the Crawl Status tab.

Website Crawler Tips

When the crawl starts, we pull all the links from the start page and queue every link that matches the target domain you entered. Links to other domains will not be crawled.
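The queueing behavior described above can be pictured as a simple breadth-first crawl. What follows is a minimal sketch and not the spreadsheet's actual script: the `crawl`, `links_on`, and `same_site` names and the tiny in-memory `SITE` map are hypothetical stand-ins, so nothing is fetched over the network.

```python
from collections import deque
from typing import Callable, List
from urllib.parse import urlsplit

# Hypothetical in-memory site: each page maps to the links found on it.
SITE = {
    "http://www.domain.com/": [
        "http://www.domain.com/about",
        "http://other.com/",
        "http://www.domain.com/blog",
    ],
    "http://www.domain.com/about": ["http://www.domain.com/"],
    "http://www.domain.com/blog": ["http://www.domain.com/blog/post-1"],
}

def links_on(url: str) -> List[str]:
    """Stand-in for fetching a page and extracting its links."""
    return SITE.get(url, [])

def same_site(url: str) -> bool:
    """Assumed scope rule: host is domain.com or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == "domain.com" or host.endswith(".domain.com")

def crawl(start: str,
          get_links: Callable[[str], List[str]],
          in_scope: Callable[[str], bool],
          max_pages: int = 1000) -> List[str]:
    """Visit pages in queue order, queueing only unseen in-scope links."""
    queue = deque([start])
    seen = {start}
    crawled: List[str] = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        crawled.append(url)
        for link in get_links(url):
            if link not in seen and in_scope(link):
                seen.add(link)
                queue.append(link)
    return crawled

pages = crawl("http://www.domain.com/", links_on, same_site)
```

In this toy example the link to other.com is never queued, and each page is visited once even though the about page links back to the start page.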

To crawl subdomains, enter sub.domain.com and we’ll limit the crawl to links on that subdomain.

You can enter domain.com/folder and we’ll only crawl links in that directory. This can be useful for something like a blog at domain.com/blog or a section of an ecommerce site at domain.com/sweaters.
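To make the domain, subdomain, and folder rules above concrete, here is a rough sketch of how such a scope check could work. The `in_scope` function is hypothetical, and its matching rules (host equals the target or is a subdomain of it, path sits at or below the target folder) are assumptions for illustration; the tool itself may match links differently.

```python
from urllib.parse import urlsplit

def in_scope(url: str, target: str) -> bool:
    """Return True if url falls under target, e.g. 'domain.com/blog'."""
    # Split the target into a host and an optional folder prefix.
    target_host, _, target_path = target.lower().partition("/")
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Assumption: the target host matches itself and any subdomain of it.
    host_ok = host == target_host or host.endswith("." + target_host)

    # Assumption: a folder target matches that folder and everything below it.
    prefix = "/" + target_path.strip("/")
    path = parts.path or "/"
    path_ok = prefix == "/" or path == prefix or path.startswith(prefix + "/")

    return host_ok and path_ok
```

Under these assumptions, `in_scope("http://www.domain.com/", "domain.com")` is true, while a target of `sub.domain.com` rejects `http://www.domain.com/` and a target of `domain.com/blog` rejects `http://domain.com/sweaters`.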

Some sites don’t like to be crawled. Those will probably show up as a lot of 403 response codes. To see this in action try something like craigslist.org.

You don’t have to keep the document open to keep the crawl running. You will get an email when the crawl is done.

You can run multiple crawls at once if you make another copy of the document.

Caveats

The target domain should be a portion of the starting URL (e.g. domain.com of http://www.domain.com/).

Google limits the number of pages you can crawl in a day based on both the number of URLs crawled and the time it takes to crawl them. The limits vary based on the type of account you have and how long the account has been active.
 
In general, we’ve found that we can do roughly 5,000 URLs in a day if they take ~2 seconds on average to crawl, but your mileage may vary!
 
We’ve artificially limited each crawl to 1,000 pages. Once a crawl reaches that number, it stops.
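As a back-of-the-envelope illustration of the numbers above: 5,000 URLs at ~2 seconds each works out to roughly 10,000 seconds of crawl time per day. The sketch below assumes that time budget stays fixed as the average crawl time changes. That is a simplification for illustration, not a documented Google quota, and the function names are hypothetical.

```python
PAGE_CAP = 1_000                  # per-crawl page limit stated above
OBSERVED_URLS_PER_DAY = 5_000     # rough figure quoted above
OBSERVED_SECONDS_PER_URL = 2.0    # average crawl time behind that figure

# Implied daily time budget: 5,000 URLs * 2 s = 10,000 s (about 2.8 hours).
IMPLIED_DAILY_SECONDS = OBSERVED_URLS_PER_DAY * OBSERVED_SECONDS_PER_URL

def estimated_urls_per_day(avg_seconds_per_url: float) -> int:
    """Estimate daily URLs, assuming a fixed time budget (an assumption)."""
    if avg_seconds_per_url <= 0:
        raise ValueError("avg_seconds_per_url must be positive")
    return int(IMPLIED_DAILY_SECONDS // avg_seconds_per_url)

def full_crawls_per_day(avg_seconds_per_url: float) -> int:
    """Estimate how many capped 1,000-page crawls fit in one day."""
    return estimated_urls_per_day(avg_seconds_per_url) // PAGE_CAP

fast_site = full_crawls_per_day(2.0)   # pages averaging ~2 s each
slow_site = full_crawls_per_day(4.0)   # pages averaging ~4 s each
```

Under this assumption, a site averaging ~2 seconds per page leaves room for about five full 1,000-page crawls in a day, and a site averaging ~4 seconds leaves room for about two. Actual limits depend on your account, so treat these as estimates only.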
