charles web proxy review

As I said in my last post, I’ve written many a scraper using php with curl or fsockopen in my time, trying to write automated tools and scraping data. I’ve tried many tools to help me sniff the HTTP traffic so I could emulate it in PHP as quick as possible. I started off using Wireshark or Ethereal as it was called at the time which was complete overkill, mostly used for network trouble shooting and grabs all TCP/UDP packets which is information overload, all we want is HTTP data. Then I think I used the LiveHTTPheaders addon for Firefox which was pretty limited. Then a Java program called Burpsuite which was pretty powerful but I ran into a problem trying to automate myspace myads submissions, trying to figure out what HTTP the myads flash file was sending over HTTPS. I ran the gamut of every proxy tool out there until I came across Charles Web Proxy.

It’s basically the best out there. It sits as a proxy between the web and your browser, grabbing all data as it comes in. This usually causes problems with SSL but it has a custom SSL cert that you manually add to your browser that lets you log HTTPS data with no warnings. It can grab Flash traffic as it seems to work as a Windows proxy, not just a browser one. It presents HTTP data many different ways so you can understand what’s going on quicker. For example, a multipart form upload is presented as the the raw HTTP data sent, just the headers, just the cookies, the text body and all the form fields. I won’t list all the features as they’re all listed on the site. If you’re using any other tool for automation/scraping, you’re wasting time.

php curl debugging – seeing the exact http request headers sent by curl

In my many of years of php/curl use, I’ve hammered my head off my table countless times trying to debug scripts that weren’t emulating the browser like it was supposed to. This was pretty hard without seeing the exact HTTP request header sent by cURL each session, but this is possible now from PHP 5.1.3

Use the curl_getinfo php function with the CURLINFO_HEADER_OUT option but make sure to set option CURLINFO_HEADER_OUT to true as a curl option.

$ch = curl_init("http://www.google.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
$get = curl_exec($ch);
$info=curl_getinfo($ch,CURLINFO_HEADER_OUT);
var_dump($info);