XULRunner and Crowbar – Crawling of sorts?
This was going to be a tutorial on getting these two things running to achieve everything I want. Sadly, I can’t work out how to get the last step working, which is to navigate the returned Ajax page so that I can extract different information.
As such, this is more of a guide to getting the two things installed and working – if you have any more luck than I do with navigating Ajax pages, let me know!
XULRunner
First things first: I downloaded the Windows version of XULRunner from the address below (look in the runtimes directory!):
http://releases.mozilla.org/pub/mozilla.org/xulrunner/releases/
(Unpacking takes a while: the 8.23MB download contained 302 items totalling 18.8MB!)
Crowbar
Not such a simple download for the uninitiated. It’s not actually released, so its files are kept in Subversion – you’ll need a Subversion client to download it. I don’t have one on the machine I’m working on, so another post will cover the ins and outs of downloading Crowbar with Subversion.
All Downloaded and Unpacked – Onwards we go!
Back to the instructions here, which tell me once I’ve done all this to open a command prompt (thankfully a place I’m familiar with) and run:
c:\> %XULRUNNER_HOME%\xulrunner.exe --install-app %CROWBAR%\xulapp
c:\> cd %CROWBAR%\xulapp
c:\> %XULRUNNER_HOME%\xulrunner.exe application.ini
Windows Firewall blocked the program, but that was kind of expected, so I unblocked it.
I now have a Crowbar window and an Error Console. Apparently I can use Crowbar by visiting:
http://127.0.0.1:10000/
On doing so, a nice little web window pops up similar to a web proxy, asking me what page I want to fetch.
I entered the address of my Ajax-based page, and the next thing I know I’m being presented with all the source code for that page, including all the output from the JavaScript that wouldn’t be there when I did a PHP curl GET on the page!
Now apparently I can run this using curl (why can I see me having to install a fair bit of software on my laptop to get this all working over there?).
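For anyone who would rather script that step than type it into the web form, here is a rough sketch of the same fetch in Python. It assumes Crowbar accepts a form-encoded POST on the address above with a url field and an optional delay field (in milliseconds, to give the page's JavaScript time to run). Those field names are recalled from the Crowbar documentation rather than confirmed in this post, so treat them as assumptions; the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Address Crowbar listens on, as given in the post.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_crowbar_request(page_url, delay_ms=None, endpoint=CROWBAR_ENDPOINT):
    """Build a form-encoded POST asking Crowbar to render page_url.

    ASSUMPTION: the 'url' and 'delay' field names are recalled from the
    Crowbar documentation; they are not confirmed by this post.
    """
    fields = {"url": page_url}
    if delay_ms is not None:
        # Hypothetical use: wait this long so the Ajax calls can finish.
        fields["delay"] = str(delay_ms)
    body = urlencode(fields).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def fetch_rendered_source(page_url, delay_ms=None, timeout=60):
    """Send the request to a running Crowbar and return the page source."""
    request = build_crowbar_request(page_url, delay_ms)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Crowbar running, calling fetch_rendered_source("http://example.com/ajax-page", delay_ms=3000) (a placeholder address) should hand back the same rendered source the web form shows. It only fetches a single page, though, so it does nothing for pages that change after the first load.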
OK, so all well and good: we’ve fetched one page. But that page has a dropdown box on it that forces the entire page to change – how do I go about “Crowbarring” my way around that?
With little documentation I can’t see a way… Back to the drawing/scraping board?
Comments
FireWatir might be a better tool for you:
http://wiki.openqa.org/display/WTR/FireWatir
Dan
I eventually resorted to outsourcing it via rentacoder to an absolutely excellent coder in the US.
He provided exactly what I needed in PHP to the spec I provided with no extensions or the like! Was really pleased with his work – I think he was kind of surprised when I had no complaints or changes that needed making as well!