In the last post, How to find dead links in website, I used cURL to determine whether a link is broken. This post starts a series of articles summarizing examples of how to use cURL in PHP, in order to build a deeper understanding of the PHP functions involved. In this article, I will use cURL to get a webpage's content and its header information.






Working steps of a cURL request

A cURL request normally takes the following five steps. Depending on which options you set in the second step, you can accomplish many kinds of data transfer over the network.

  • curl_init() – Initiate a cURL session
  • curl_setopt() – Set options
  • curl_exec() – Execute the cURL request
  • curl_getinfo() – Retrieve information about the transfer. curl_errno() and curl_error() are available to get the error number and message
  • curl_close() – Close the cURL session and free all resources

Fetch webpage content using cURL

If your script needs to fetch a webpage's content, you may consider using the PHP function file_get_contents. It can read a remote URL when allow_url_fopen is enabled, but it is better suited to reading local files and gives you little insight into the request. Using cURL gives you the ability to get additional information, such as the headers and error details.

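$url = 'http://www.w3schools.com/';              // example URL, also used in the later scripts

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // but no more than 5 of them
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the page as a string

$body  = curl_exec($ch);     // page content, or false on error
$info  = curl_getinfo($ch);  // array of transfer details
$error = curl_errno($ch);    // 0 means no error
curl_close($ch);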

After the cURL request is executed, curl_exec() returns the result to the variable $body. Depending on the value of CURLOPT_RETURNTRANSFER and on whether an error occurred, the function returns one of three different results.

  • false – if an error occurred while executing the request. The error can be detected with the function curl_errno. If the error number is greater than 0, an error occurred; 0 or CURLE_OK means no error. For the meaning of each error number, refer to the libcurl error code list.
  • true – if the request executed without error and CURLOPT_RETURNTRANSFER is set to false or 0. In that case the retrieved web page content is sent to the output immediately, and curl_exec() only returns the Boolean value true.
  • String of web page content – if the request executed without error and CURLOPT_RETURNTRANSFER is set to true or 1. In that case the web page content is returned as a string by curl_exec(). This gives you the opportunity to analyze the page further and extract more targeted information. If you simply echo curl_exec($ch), the output looks no different from when CURLOPT_RETURNTRANSFER is set to 0.
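The three outcomes above can be told apart with strict comparisons. The following is only a minimal sketch and not part of the original scripts: describe_exec_result() and fetch_page() are hypothetical helper names, and fetch_page() assumes the cURL extension is loaded.

```php
<?php
// Hypothetical helper: classify the value returned by curl_exec().
// Strict comparison (===) matters because an empty page body ('') is
// falsy in PHP but is still a successful string result.
function describe_exec_result($result, int $errno): string
{
    if ($result === false) {
        return "error (cURL error number $errno)";
    }
    if ($result === true) {
        return 'success, output already sent (CURLOPT_RETURNTRANSFER off)';
    }
    return 'success, ' . strlen($result) . ' bytes returned as a string';
}

// Hypothetical wrapper around the five steps that also reports the status.
function fetch_page(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    curl_close($ch);

    return [
        'body'   => $body === false ? null : $body,
        'errno'  => $errno,
        'error'  => $error,
        'status' => describe_exec_result($body, $errno),
    ];
}
```

Calling fetch_page('http://www.w3schools.com/') would then give back the body together with the error number and a readable status, so the caller never has to guess which of the three cases occurred.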

Get header information using cURL

$ch = curl_init();
$url = 'http://www.w3schools.com/';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);   // include the header in the result
curl_setopt($ch, CURLOPT_NOBODY, true);   // only header
$result = curl_exec($ch);
curl_close($ch);
echo $result;

Here I set CURLOPT_HEADER to true to include the header information in the result, and CURLOPT_NOBODY to true because I only need the header. With CURLOPT_NOBODY set, cURL sends a HEAD request instead of a GET request. The result is a string containing only the header information.

Here’s the header result.

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: Public,public
Content-Type: text/html
Date: Sat, 07 Mar 2015 00:57:30 GMT
Expires: Sat, 07 Mar 2015 01:57:31 GMT
Last-Modified: Fri, 06 Mar 2015 23:41:09 GMT
Server: ECS (hhp/F7B9)
X-Cache: HIT
X-Powered-By: ASP.NET
Content-Length: 18791
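A header string like the one above is easier to work with once it is split into name and value pairs. Here is a rough sketch of one way to do that. The function name parse_header_string() and the sample string are my own assumptions for illustration, and the sketch keeps only the last value when a header name occurs more than once.

```php
<?php
// Hypothetical helper: turn a raw header string (as returned by curl_exec()
// with CURLOPT_HEADER enabled) into a status line plus name => value pairs.
function parse_header_string(string $raw): array
{
    $parsed = ['status' => null, 'headers' => []];

    // Header lines are separated by CRLF; tolerate a bare LF as well.
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if ($line === '') {
            continue; // blank line between header blocks
        }
        if (strpos($line, 'HTTP/') === 0) {
            $parsed['status'] = $line; // status line of the response
            continue;
        }
        // Limit the split to 2 parts so values containing ':' stay intact.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // Header names are case-insensitive, so normalise to lower case.
            $parsed['headers'][strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $parsed;
}

// Made-up sample in the same shape as the output shown above.
$sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 18791\r\n\r\n";
$parsed = parse_header_string($sample);
echo $parsed['status'], "\n";                   // prints HTTP/1.1 200 OK
echo $parsed['headers']['content-type'], "\n";  // prints text/html
```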

What if we need both the header and the body? Just set CURLOPT_NOBODY to false.

$ch = curl_init();
$url = 'http://www.w3schools.com/';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, false);   // transfer both header and body
$result = curl_exec($ch);
$info = curl_getinfo($ch);
$header_size = $info['header_size'];
// or $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($result, 0, $header_size);   // first $header_size bytes
$body = substr($result, $header_size);        // everything after the header
curl_close($ch);

In the script above, curl_exec() returns the header and the page body together as one string, which is then split with the substr function at the header size reported by curl_getinfo(). You can also get the header information using the PHP function get_headers.

array get_headers ( string $url [, int $format = 0 ] )

get_headers fetches all the headers sent by the server in response to an HTTP request.

$headers=get_headers("http://www.w3schools.com/");
print_r($headers);

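The script outputs the header information as a PHP array. Note that the get_headers function follows redirects. If $format is 0, the headers of each new response are appended to the same numerically indexed array. If $format is 1, the array is keyed by header name, and a header that occurs more than once (for example once per redirect) holds an array of values, one for each occurrence.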

Array ( 
[0] => HTTP/1.0 200 OK 
[1] => Accept-Ranges: bytes 
[2] => Cache-Control: Public,public 
[3] => Content-Type: text/html 
[4] => Date: Sat, 07 Mar 2015 01:08:23 GMT 
[5] => Expires: Sat, 07 Mar 2015 02:08:24 GMT 
[6] => Last-Modified: Fri, 06 Mar 2015 23:41:09 GMT 
[7] => Server: ECS (hhp/F7B9) 
[8] => Vary: Accept-Encoding 
[9] => X-Cache: HIT 
[10] => X-Powered-By: ASP.NET 
[11] => Content-Length: 18791 
[12] => Connection: close )
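Because get_headers follows redirects and, with $format set to 0, appends every response's headers to the same flat array, it can help to regroup that array per response. The sketch below shows one possible approach. The function name group_headers_by_response() and the sample data are made up for illustration; it assumes each response starts with a status line beginning with "HTTP/".

```php
<?php
// Hypothetical helper: split the flat list from get_headers($url) into one
// sub-array per response, starting a new group at each status line.
function group_headers_by_response(array $lines): array
{
    $responses = [];
    foreach ($lines as $line) {
        if (strpos($line, 'HTTP/') === 0) {
            $responses[] = ['status' => $line, 'headers' => []];
            continue;
        }
        // Ignore stray lines that appear before any status line.
        if ($responses !== []) {
            $responses[count($responses) - 1]['headers'][] = $line;
        }
    }
    return $responses;
}

// Made-up sample data shaped like a redirect followed by the final page.
$lines = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
$responses = group_headers_by_response($lines);
echo count($responses), "\n";          // prints 2
echo end($responses)['status'], "\n";  // prints HTTP/1.1 200 OK
```

The last group then corresponds to the page that was finally served, and the earlier groups to the redirects that led there.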