parsing the yahoo search api xml using php

I didn’t even know Yahoo had released an API for their search until today, so I decided to mess around with it; anything but study accounting. Yahoo returns results as an XML file, as opposed to Google with their SOAP interface. Wouldn’t it be so much easier if they used the same system as Google? We wouldn’t have to write new PHP scripts. Anyway, the PHP example scripts supplied by Yahoo use the Document Object Model method of parsing XML, which requires recompiling PHP with the DOM extension, and I couldn’t be bothered doing that. Also, a lot of people who don’t have a dedicated server wouldn’t be allowed to install it anyway.

So I used the built-in XML parser functions instead, which seems a messy method but gets the job done. I went looking online for tutorials on parsing XML with PHP and found a great one on SitePoint by Kevin Yank. There’s no point in me copying that article. Read it. But the code provided didn’t work with the Yahoo XML because each result contains two URL values: the actual result URL and the Yahoo cache URL. So basically you just add a few lines of code to the startElement function (remember, you should have read the SitePoint article) that stop reading information when you reach the cache bit:

if ($tagName == "CACHE") {
    $this->insideitem = false;
}
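Putting it together, here’s a minimal self-contained sketch of the SAX-style approach. Note the XML here is a hypothetical, cut-down version of a Yahoo response I made up for illustration; the real feed has more fields, but the cache-skipping logic is the bit that matters:

```php
<?php
// Minimal sketch of a SAX-style parser for Yahoo search results,
// adapted from the SitePoint approach. The key point is skipping the
// <Cache> block so its <Url> doesn't clobber the real result URL.

class YahooResultParser
{
    public $results = array();
    private $insideitem = false;
    private $tag = "";
    private $current = array();

    function startElement($parser, $tagName, $attrs)
    {
        if ($this->insideitem) {
            if ($tagName == "CACHE") {
                // Stop reading once we hit the cache bit.
                $this->insideitem = false;
            } else {
                $this->tag = $tagName;
            }
        } elseif ($tagName == "RESULT") {
            $this->insideitem = true;
            $this->current = array("TITLE" => "", "URL" => "");
        }
    }

    function endElement($parser, $tagName)
    {
        if ($tagName == "RESULT") {
            $this->results[] = $this->current;
            $this->insideitem = false;
        }
        $this->tag = "";
    }

    function characterData($parser, $data)
    {
        if ($this->insideitem && isset($this->current[$this->tag])) {
            $this->current[$this->tag] .= $data;
        }
    }
}

// Hypothetical sample response, cut down to the bits that matter.
$xml = '<ResultSet>'
     . '<Result><Title>Madonna</Title><Url>http://example.com/a</Url>'
     . '<Cache><Url>http://cache.example.com/a</Url></Cache></Result>'
     . '</ResultSet>';

$p = new YahooResultParser();
$parser = xml_parser_create(); // expat folds tag names to uppercase by default
xml_set_element_handler($parser, array($p, "startElement"), array($p, "endElement"));
xml_set_character_data_handler($parser, array($p, "characterData"));
xml_parse($parser, $xml, true);
xml_parser_free($parser);

foreach ($p->results as $r) {
    echo $r["URL"] . "\n"; // prints the real URL, not the cache one
}
```

Without the CACHE check, the characterData handler would happily append the cache URL onto whatever tag it thought it was still inside.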
Here’s the full source of my adapted code that searches Yahoo for two Madonna results and then just spits out the two links.
Download Yahoo API PHP Script

Google Bowling

After Google’s recent bad data push that resulted in 5 billion spam pages in the index, a lot of webmasters seem to be accepting that Google is bust and are looking to the dark side of SEO, myself included. I registered at syndk8 and have been reading up on all the methods used.

Did you know anybody could get your site banned through a method called Googlebowling? This is where you somehow insert HTML onto the target site that you know will activate Google’s spam filter, like stuffing keywords in h1 tags. You could do this by entering the HTML in a search box on the target site and then copying the resulting URL.

Then get this URL indexed in Google somehow and there’s a good chance the site will be banned. I can’t see an easy way Google could stamp out this practice without introducing a human element somewhere in the algo. Why bother with white hat when this could happen to your site at any time?

rooney off!

rooney’s just been sent off in the england v portugal game for what looks like absolutely nothing. apparently he stamped on yer man’s balls, but i just think he was getting his leg free. not a red card offence either way. and i’m not an england supporter, don’t really care who wins. but it’s like the first lions test against the all blacks when brian o’driscoll was spear tackled: a talented player is gone, so the match won’t be as good. absolutely raging.

Regular expressions

Regular expressions are among the most powerful and complicated things you can use, so I always get mine wrong when I first write them. Which is where a good online tester comes in. I used to use Sam Fullman’s one, but now it seems to be bust. So I basically cloned it. It’s available here if anybody on the interweb wants to use it. I hope to write a few more tools for this site. Have to fill it with something.
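For a for-instance of the kind of thing I end up checking in a tester, here’s a hypothetical PHP snippet (the pattern and sample date are just made up for illustration), matching a YYYY-MM-DD date:

```php
<?php
// Hypothetical example: capture the parts of a YYYY-MM-DD date.
// This sort of pattern is easy to get subtly wrong on the first go
// (missing anchors, wrong quantifiers), which is what a tester is for.
$pattern = '/^(\d{4})-(\d{2})-(\d{2})$/';
if (preg_match($pattern, '2006-07-01', $m)) {
    echo "year={$m[1]} month={$m[2]} day={$m[3]}\n"; // year=2006 month=07 day=01
}
```

The anchors matter: without `^` and `$`, the pattern would also match a date buried inside a longer string, which is usually not what you want when validating input.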