XULRunner and Crowbar – Crawling of sorts?

This was going to be a tutorial on getting these two things running to achieve everything I want. Sadly, I can’t work out how to get the last step working: navigating the returned Ajax page so that I can extract different information.

As such, this is more of a guide to getting the two things installed and working. If you have any more luck than I do with navigating Ajax pages, let me know!

XULRunner

First things first: I downloaded the Windows version of XULRunner from the address below (look in the runtimes directory!):

http://releases.mozilla.org/pub/mozilla.org/xulrunner/releases/

(Unpacking takes a while: the 8.23MB download contained 302 items totalling 18.8MB!)

Crowbar

Not such a simple download for the uninitiated. It’s not actually released, so its files live in a Subversion repository and you’ll need a Subversion client to download them. I don’t have one on the machine I’m working on, so another post will cover the ins and outs of downloading Crowbar with Subversion.

All Downloaded and Unpacked – Onwards we go!

Back to the instructions here, which tell me that once I’ve done all this I should open a command prompt (thankfully a place I’m familiar with) and run:

c:\> %XULRUNNER_HOME%\xulrunner.exe --install-app %CROWBAR%\xulapp
c:\> cd %CROWBAR%\xulapp
c:\> %XULRUNNER_HOME%\xulrunner.exe application.ini

Windows Firewall blocked the program, but that was kind of expected, so I unblocked it.
I now have a Crowbar window and an Error Console. Apparently I can use Crowbar by visiting:

http://127.0.0.1:10000/

On doing so, a nice little web page pops up, similar to a web proxy, asking me what page I want to fetch.

I entered the URL of my Ajax-based page and the next thing I know, I’m being presented with all the source code for that page, including all the output from the JavaScript that wouldn’t be there when I did a PHP curl GET on the page!

Now, apparently I can drive this using curl (why can I see myself having to install a fair bit of software on my laptop to get this all working over there?).
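For anyone who would rather script the fetch than type into the web form, here is a minimal sketch in Python of what such a request might look like. The only thing taken from above is the address http://127.0.0.1:10000/. The form field names `url` and `delay` (milliseconds to wait for scripts to finish) are an assumption about how Crowbar expects its input, and the function names are made up for illustration, so check them against your own Crowbar checkout.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Address of the local Crowbar server, as given in the post.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_crowbar_request(target_url, delay_ms=3000, endpoint=CROWBAR_ENDPOINT):
    """Build a form-encoded POST asking Crowbar to load target_url.

    ASSUMPTION: the field names 'url' and 'delay' are not confirmed by the
    post; verify them against the Crowbar version you are running.
    """
    body = urlencode({"url": target_url, "delay": delay_ms}).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def fetch_via_crowbar(target_url, delay_ms=3000):
    """Return the rendered page source as text (needs Crowbar running locally)."""
    request = build_crowbar_request(target_url, delay_ms)
    with urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Crowbar running, calling `fetch_via_crowbar("http://example.com/my-ajax-page")` (a placeholder URL) should return the same rendered source that the web form shows, provided the assumed field names are right.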

OK, so all well and good: we’ve fetched one page. But that page has a dropdown box on it that forces the entire page to change. How do I go about “Crowbarring” my way around that?

With so little documentation, I can’t see a way… Back to the drawing/scraping board?
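This does not solve the navigation problem, but as a first step it is at least possible to see what the dropdown offers by parsing the source that Crowbar returns. The sketch below uses only Python's standard `html.parser`. The class and function names are hypothetical, and it assumes the dropdown is an ordinary `<select>` element in the rendered HTML rather than a script-built widget.

```python
from html.parser import HTMLParser


class SelectOptionCollector(HTMLParser):
    """Collect (value, label) pairs for each <option> inside <select> elements."""

    def __init__(self):
        super().__init__()
        self.options = {}        # select name (or id) -> list of (value, label)
        self._select = None      # key of the <select> currently open
        self._value = None       # value attribute of the open <option>
        self._label = []         # text fragments of the open <option>
        self._in_option = False

    def _finish_option(self):
        # Store the option being read, if any. Also called when a new
        # <option> or </select> appears, since </option> may be omitted.
        if self._in_option:
            label = "".join(self._label).strip()
            value = label if self._value is None else self._value
            self.options[self._select].append((value, label))
            self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._select = attrs.get("name") or attrs.get("id") or ""
            self.options.setdefault(self._select, [])
        elif tag == "option" and self._select is not None:
            self._finish_option()
            self._value = attrs.get("value")
            self._label = []
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "option":
            self._finish_option()
        elif tag == "select":
            self._finish_option()
            self._select = None

    def handle_data(self, data):
        if self._in_option:
            self._label.append(data)


def list_dropdown_options(html):
    """Return {select name: [(value, label), ...]} for the given page source."""
    collector = SelectOptionCollector()
    collector.feed(html)
    collector.close()
    return collector.options
```

Knowing the option values only tells you what the page can switch between. Actually triggering the change still needs something that can run the page's scripts, which is the part I’m stuck on.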

This entry was posted on Sunday, November 30th, 2008 at 10:50 am and is filed under Programming.