Skillett.com

This was going to be a tutorial on getting XULRunner and Crowbar running together to achieve everything I want. Sadly, I can’t work out how to get the last step working, which is to navigate the returned Ajax page so that I can extract different information.

As such, this is more a guide to getting the two things installed and working. If you have any more luck than I did with navigating Ajax pages, let me know!

XULRunner

First things first: I downloaded the Windows version of XULRunner from the address below (look in the runtimes directory!):

http://releases.mozilla.org/pub/mozilla.org/xulrunner/releases/

(Unpacking takes a while: the 8.23MB download contained 302 items totalling 18.8MB!)

Crowbar

Not such a simple download for the uninitiated. It’s not actually released as a package; the files live in a Subversion repository, so you’ll need a Subversion client to download it. I don’t have one on the machine I’m working on, so another post will cover the ins and outs of downloading Crowbar with Subversion.

All Downloaded and Unpacked – Onwards we go!

Back to the instructions, which tell me that once I’ve done all this I should open a command prompt (thankfully a place I’m familiar with) and run:

c:\> %XULRUNNER_HOME%\xulrunner.exe --install-app %CROWBAR%\xulapp
c:\> cd %CROWBAR%\xulapp
c:\> %XULRUNNER_HOME%\xulrunner.exe application.ini

Windows Firewall blocked the program, but that was kind of expected, so I unblocked it.
I now have a Crowbar window and an Error Console. Apparently I can use Crowbar by visiting:

http://127.0.0.1:10000/

On doing so, a nice little web window pops up similar to a web proxy, asking me what page I want to fetch.
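If the page doesn't load, it may be worth checking whether anything is listening on that port at all before blaming the firewall. Here is a minimal sketch in Python. The helper name is_listening is my own invention; the host and port are simply the ones from the address above.

```python
import socket


def is_listening(host="127.0.0.1", port=10000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    The defaults match the Crowbar address quoted in this post; the
    function name is hypothetical and not part of Crowbar itself.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_listening() with no arguments tests 127.0.0.1:10000. A False result suggests Crowbar isn't running or the firewall is still blocking it.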

I entered the URL of my Ajax-based page and the next thing I know, I’m being presented with the full source code for that page, including all the output from the JavaScript that wasn’t there when I fetched the page with PHP’s curl!

Apparently I can also drive this from curl (why do I foresee having to install a fair bit of software on my laptop to get this all working over there?).
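For anyone who would rather script this than type curl commands, here is a rough Python equivalent. It assumes Crowbar accepts a form-encoded POST on the address above with a url field (the page to fetch) and a delay field (milliseconds to wait for scripts to finish). Those field names are from my recollection of the Crowbar notes and I haven't verified them here, so treat them as assumptions. The function names are made up for this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Address of the local Crowbar instance, as quoted in this post.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_crowbar_request(target_url, delay_ms=3000, endpoint=CROWBAR_ENDPOINT):
    """Build (but do not send) a POST request asking Crowbar to render a page.

    ASSUMPTION: the form fields are named "url" and "delay"; check them
    against the Crowbar documentation before relying on this.
    """
    body = urlencode({"url": target_url, "delay": delay_ms}).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def fetch_rendered(target_url, delay_ms=3000):
    """Send the request and return whatever Crowbar responds with as text."""
    request = build_crowbar_request(target_url, delay_ms)
    with urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Crowbar running, fetch_rendered("http://example.com/") should hand back the same post-JavaScript source that the web form shows. Building the request separately from sending it makes it easy to inspect exactly what would be posted.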

OK, all well and good: we’ve fetched one page. But that page has a dropdown box on it that forces the entire page to change. How do I go about “Crowbarring” my way around that?

With little documentation I can’t see a way… Back to the drawing/scraping board?