In my last post, “How to find dead links in website”, I used cURL to determine whether a link is broken. This post starts a series of articles that summarize examples of how to use cURL in PHP, to build a deeper understanding of PHP’s cURL functions. In this article, I use cURL to fetch a web page’s content and its header information.
Working steps of a cURL request
A cURL request normally goes through the following five steps. The options you set in step 2 determine what kind of data transfer the request performs.
- curl_init() – Initialize a cURL session
- curl_setopt() – Set options
- curl_exec() – Execute the cURL request
- curl_getinfo() – Retrieve information about the transfer; curl_errno() and curl_error() return the error number and error message
- curl_close() – Close the cURL session and free all resources
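The five steps can be wrapped in one small helper that also keeps the error message from curl_error(). This is a sketch only: fetch_page() and its array keys are hypothetical names, and the 10-second CURLOPT_CONNECTTIMEOUT is an arbitrary choice that does not come from the original post.

```php
<?php
// Sketch: the five cURL steps in one helper.
// fetch_page() and the returned keys are hypothetical, for illustration only.
function fetch_page(string $url): array
{
    $ch = curl_init();                                // 1. initialize a session
    curl_setopt($ch, CURLOPT_URL, $url);              // 2. set options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   //    return the page as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);     //    assumed timeout (seconds)
    $body = curl_exec($ch);                           // 3. execute the request
    $result = [                                       // 4. collect information and errors
        'body'  => $body,                             //    string, or false on error
        'info'  => curl_getinfo($ch),                 //    transfer details (http_code, ...)
        'errno' => curl_errno($ch),                   //    0 means no error
        'error' => curl_error($ch),                   //    '' when there is no error
    ];
    curl_close($ch);                                  // 5. free the handle
    return $result;
}
```

A caller would check $page['errno'] before trusting $page['body'], for example after $page = fetch_page('http://www.w3schools.com/');.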
Fetch webpage content using cURL
If your script needs to fetch a web page, you might consider PHP’s file_get_contents() function. It can read a remote URL when allow_url_fopen is enabled, but it gives you little control over the request and little information when something goes wrong. cURL gives you access to more, such as the response header and error details.
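```php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);             // page to fetch
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);  // follow redirects...
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // ...but at most 5 of them
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);  // return the page as a string
$body = curl_exec($ch);
$info = curl_getinfo($ch);                       // transfer details
$error = curl_errno($ch);                        // 0 means no error
curl_close($ch);
```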
After the request runs, curl_exec() returns its result, which the script stores in $body. Depending on the value of CURLOPT_RETURNTRANSFER and on whether an error occurs, curl_exec() returns one of three kinds of result:
- false – if an error occurs while executing the request. You can detect it with curl_errno(): 0 (CURLE_OK) means no error, and any value greater than 0 identifies the error. For the meaning of the various error numbers, refer to this page.
- true – if the request succeeds and CURLOPT_RETURNTRANSFER is false (0). In that case the retrieved page content is sent straight to the output, and curl_exec() itself returns only the Boolean true.
- The page content as a string – if the request succeeds and CURLOPT_RETURNTRANSFER is true (1). The content is returned as the result of curl_exec(), which lets you analyze the page further and extract more targeted information. If you simply echo curl_exec($ch), the output looks no different from running with CURLOPT_RETURNTRANSFER set to 0.
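Because an empty page body "" is also falsy in PHP, a loose check such as if (!$result) would report an empty page as an error. A minimal sketch of telling the three cases apart with strict comparisons; describe_exec_result() and its return labels are hypothetical.

```php
<?php
// Sketch: classify a curl_exec() return value with strict comparisons.
// describe_exec_result() is a hypothetical helper, not part of the cURL extension.
function describe_exec_result($result): string
{
    if ($result === false) {
        return 'error';    // inspect curl_errno()/curl_error() for details
    }
    if ($result === true) {
        return 'printed';  // CURLOPT_RETURNTRANSFER off: content already sent to output
    }
    return 'content';      // CURLOPT_RETURNTRANSFER on: a string, possibly ""
}
```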
Get header information using cURL
```php
$ch = curl_init();
$url = 'http://www.w3schools.com/';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // only the header
$result = curl_exec($ch);
curl_close($ch);
echo $result;
```
Here I set CURLOPT_HEADER to true so the header is included in the output, and CURLOPT_NOBODY to true because I only need the header (for HTTP, this makes cURL send a HEAD request). The result is a string containing only the header information.
Here’s the header result.
```text
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: Public,public
Content-Type: text/html
Date: Sat, 07 Mar 2015 00:57:30 GMT
Expires: Sat, 07 Mar 2015 01:57:31 GMT
Last-Modified: Fri, 06 Mar 2015 23:41:09 GMT
Server: ECS (hhp/F7B9)
X-Cache: HIT
X-Powered-By: ASP.NET
Content-Length: 18791
```
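To work with a header string like this in code, one option is to split it into a status line and an array of fields. A minimal sketch, assuming lines end in \r\n or \n; parse_header_block() is a hypothetical helper. If CURLOPT_FOLLOWLOCATION is also enabled, the string holds one header block per response, so the sketch keeps only the last block.

```php
<?php
// Sketch: split a raw header string (CURLOPT_HEADER output) into status + fields.
// parse_header_block() is a hypothetical helper for illustration.
function parse_header_block(string $raw): array
{
    // A blank line ends each header block; after redirects there are several,
    // so keep the last one, which belongs to the final response.
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    $last = end($blocks);
    $lines = preg_split('/\r?\n/', $last);

    $status = array_shift($lines); // e.g. "HTTP/1.1 200 OK"
    $fields = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip anything that is not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them in lower case.
        $fields[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $fields];
}
```

Repeated headers such as Set-Cookie overwrite each other in this sketch; collect them into arrays if you need every value.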
What if we need both the header and the body? Set CURLOPT_NOBODY to false, which is also its default.
```php
$ch = curl_init();
$url = 'http://www.w3schools.com/';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // transfer both header and body
$result = curl_exec($ch);
$info = curl_getinfo($ch);
$header_size = $info['header_size'];
// or: $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$header = substr($result, 0, $header_size);
$body = substr($result, $header_size);
curl_close($ch);
```
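Splitting the combined string at header_size is one approach. Another is to leave CURLOPT_HEADER off and let cURL pass each header line to a callback through CURLOPT_HEADERFUNCTION, so curl_exec() returns only the body. A sketch under that assumption; fetch_with_headers(), its array keys, and the 10-second connect timeout are hypothetical choices.

```php
<?php
// Sketch: collect headers via CURLOPT_HEADERFUNCTION instead of substr().
// fetch_with_headers() and the returned keys are hypothetical.
function fetch_with_headers(string $url): array
{
    $headers = [];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    // cURL calls this once per header line (status line and blank line included)
    // and expects the number of bytes handled as the return value.
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($handle, $line) use (&$headers) {
        $trimmed = trim($line);
        if (stripos($trimmed, 'HTTP/') === 0) {
            $headers = []; // a new response starts (e.g. after a redirect)
        } elseif (strpos($trimmed, ':') !== false) {
            [$name, $value] = explode(':', $trimmed, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
        return strlen($line);
    });
    $body = curl_exec($ch);   // body only; no header text mixed in
    $errno = curl_errno($ch);
    curl_close($ch);
    return ['headers' => $headers, 'body' => $body, 'errno' => $errno];
}
```

Because the list is reset at each status line, only the final response’s headers remain after redirects; repeated headers keep their last value.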
With CURLOPT_HEADER enabled and CURLOPT_NOBODY set to FALSE, curl_exec() returns the header and the page body in a single string, which the script splits at $header_size using substr(). You can also get the header information with PHP’s get_headers() function.
array get_headers ( string $url [, int $format = 0 ] )
Fetches all the headers sent by the server in response to an HTTP request; it returns FALSE on failure.
```php
$headers = get_headers("http://www.w3schools.com/");
print_r($headers);
```
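The script outputs the header information as a PHP array. Note that get_headers() follows redirects. With $format = 0, the headers of every response in the redirect chain are appended to one flat, numerically indexed array. With $format = 1, the array is keyed by header name, and a header that appears in several responses becomes an array with one value per response. (Since PHP 8.0 this parameter is a bool named $associative, so pass false or true.)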
```text
Array
(
    [0] => HTTP/1.0 200 OK
    [1] => Accept-Ranges: bytes
    [2] => Cache-Control: Public,public
    [3] => Content-Type: text/html
    [4] => Date: Sat, 07 Mar 2015 01:08:23 GMT
    [5] => Expires: Sat, 07 Mar 2015 02:08:24 GMT
    [6] => Last-Modified: Fri, 06 Mar 2015 23:41:09 GMT
    [7] => Server: ECS (hhp/F7B9)
    [8] => Vary: Accept-Encoding
    [9] => X-Cache: HIT
    [10] => X-Powered-By: ASP.NET
    [11] => Content-Length: 18791
    [12] => Connection: close
)
```
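With $format = 1, reading a header after a redirect means checking whether its value is a string or an array, and the status lines stay under integer keys 0, 1, and so on. A sketch with two hypothetical helpers, final_header() and final_status(); header names are compared case-insensitively because servers may vary their capitalization.

```php
<?php
// Sketch: read values from the array returned by get_headers($url, 1).
// final_header() and final_status() are hypothetical helpers.
function final_header(array $headers, string $name): ?string
{
    foreach ($headers as $key => $value) {
        // Status lines use integer keys; only string keys are header names.
        if (is_string($key) && strcasecmp($key, $name) === 0) {
            // After redirects the value is an array; its last entry is the final response's.
            return is_array($value) ? (string) end($value) : (string) $value;
        }
    }
    return null;
}

function final_status(array $headers): ?string
{
    $status = null;
    foreach ($headers as $key => $value) {
        if (is_int($key)) {
            $status = $value; // the last status line belongs to the final response
        }
    }
    return $status;
}
```

A possible call site is $headers = get_headers('http://www.w3schools.com/', 1); followed by a check that $headers !== false before calling final_header($headers, 'Content-Type').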