How to find dead links in a website

If your website credits many external links or refers to external resources, it is no surprise that many of those links become unreachable over time, regardless of whether the site is big or small. To keep your website healthy, it is important to know how to find its dead links, because they may create an unnecessary burden for search engines.

What are dead links?

In short, a dead link is a URL that returns a 404 HTTP status code or a "server not found" error when crawled. Links that take so long to load that they hit the maximum running time (timeout) are also considered unhealthy. This might be caused by server problems or redundant scripts on the website.

In this post I will introduce several methods for locating dead links on your website. The list is certainly not complete, but it is enough to give an overview of the approaches we can use in different situations.

Using cURL

cURL is a library that lets you transfer data through a wide variety of protocols. Here we use cURL from PHP to make an HTTP request and determine whether a link is dead or times out.

$links_no_server = array();
$links_404 = array();
$links_timeout = array();

// $url is the link to check; this value is only a placeholder
$url = 'http://www.example.com/';

// Initialize a new cURL session and return a cURL handle
$ch = curl_init();
// Set the URL
curl_setopt($ch, CURLOPT_URL, $url);
// Return the response as a string instead of printing it to the screen
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
// No need to retrieve the page body
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
// Set the timeout to 10 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
// Execute the cURL request
curl_exec($ch);
$error = curl_errno($ch);
$info  = curl_getinfo($ch);

// Server not found? (error 6: could not resolve host)
if (!$info['http_code'] && $error == 6)
{
  $links_no_server[] = $url;
}
// Link not found (404)?
else if ($info['http_code'] == 404 && $error == 0)
{
  $links_404[] = $url;
}
// Timeout? (error 28: operation timed out; the HTTP code is usually 0 here)
else if ($error == 28)
{
  $links_timeout[] = $url;
}
curl_close($ch);

In the PHP code above, we first define three arrays to store links for three scenarios: $links_no_server, $links_404 and $links_timeout. Then we initialize a cURL handle to retrieve the status of the target URL. Finally, we use the returned HTTP status and curl_errno to determine the status of the link. For the meaning of each curl_errno value, refer to this page.
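To check more than one link, the snippet above can be wrapped in a function and called in a loop. The following is only a minimal sketch under a few assumptions: the function names classify_link(), check_link() and group_links() are made up for illustration, any cURL error other than 6 and 28 is reported as a generic "error", and everything else that is not a 404 is reported as "ok" (you may well want to treat other 4xx/5xx codes as broken too).

```php
<?php
// Hypothetical helper: maps an HTTP status code and a cURL error number
// to one of the categories used in this post.
function classify_link($http_code, $errno)
{
    if ($errno == 6) {        // could not resolve host
        return 'no_server';
    }
    if ($errno == 28) {       // operation timed out
        return 'timeout';
    }
    if ($errno != 0) {        // assumption: any other cURL failure is a generic error
        return 'error';
    }
    if ($http_code == 404) {
        return '404';
    }
    return 'ok';
}

// Hypothetical helper: sends the same body-less request as above for one URL.
function check_link($url, $timeout = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    $errno = curl_errno($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return classify_link($http_code, $errno);
}

// Hypothetical helper: groups a list of URLs by their status.
function group_links(array $urls)
{
    $report = array();
    foreach ($urls as $url) {
        $report[check_link($url)][] = $url;
    }
    return $report;
}
```

Calling group_links() with an array of URLs returns an array keyed by status ('no_server', '404', 'timeout', 'error', 'ok'), each holding the URLs that fell into that category. Keeping the classification separate from the network call makes it easy to adjust which codes count as dead.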

W3C Link Checker

If you just have a few links to check and don’t want to write any code, W3C Link Checker is a good online tool for you. Just copy and paste the target URL into the input box. It has a few options you can set. When you are done, click Check and you will get the status of the links on that URL.

Dead Link Checker

Dead Link Checker is a handy free online tool for checking the dead links of a whole website. Just input the top-level URL and you will get a list of dead links along with their source links.
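A whole-site check has to collect the links on each page before it can test them and report where each dead link was found. As a rough illustration of that first step (this is not a description of how Dead Link Checker itself works), the sketch below pulls the href values out of a page's HTML with PHP's DOMDocument. The function name extract_links() and the sample HTML are made up for the example, and relative URLs are returned as-is, so they would still need to be resolved against the page URL before being checked.

```php
<?php
// Hypothetical helper: returns the unique href values of all <a> tags in an HTML string.
function extract_links($html)
{
    if (trim($html) === '') {
        return array();
    }
    $links = array();
    $doc = new DOMDocument();
    // Real-world HTML is often invalid, so collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip empty links, in-page anchors and non-HTTP schemes
        if ($href === '' || $href[0] === '#' || preg_match('/^(mailto|javascript|tel):/i', $href)) {
            continue;
        }
        $links[$href] = true;   // array keys remove duplicates
    }
    return array_keys($links);
}

// Sample page fragment (placeholder content)
$sample = '<p><a href="http://www.example.com/a">A</a> <a href="#top">Top</a> '
        . '<a href="mailto:me@example.com">Mail</a> <a href="/about">About</a> '
        . '<a href="http://www.example.com/a">A again</a></p>';
print_r(extract_links($sample));
```

For the sample above, the result contains two entries, http://www.example.com/a and /about. Remembering which page each list came from gives you the "source link" for every dead link you find.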

WordPress Plugin Broken Link Checker

If you have a WordPress blog, you are lucky, as you can leverage its powerful plugin system. Broken Link Checker is one of the most useful plugins. It checks for broken links at the site level and presents the results in a very user-friendly way. You can also set an interval for the plugin to run its check regularly. It is one of the must-have plugins for your WordPress blog.

OK, this is all I have researched over the past few days. If you have any good ideas on how to detect dead links, please don’t hesitate to share them with us in the comment box.
