<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: XULRunner and Crowbar &#8211; Crawling of sorts?</title>
	<atom:link href="http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=xulrunner-and-crowbar-crawling-of-sorts</link>
	<description>Keiron&#039;s daily take on life, the internet and the world around us!</description>
	<lastBuildDate>Thu, 02 Sep 2010 16:32:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
	<item>
		<title>By: John</title>
		<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/comment-page-1/#comment-28363</link>
		<dc:creator>John</dc:creator>
		<pubDate>Thu, 02 Sep 2010 14:19:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.skillett.com/?p=688#comment-28363</guid>
		<description>Excellent! I&#039;m glad to hear you got it sorted in the end.
Crowbar is an interesting program... you&#039;d think that reading/interpreting JavaScript would be something that all web spiders would be able to do - yet NONE of them do it for the simple reason that understanding JavaScript and the DOM requires, essentially, a full browser. Building a full browser into a spidering engine is overkill for just this little bit of added functionality - but when you need to scrape JS, you need to scrape JS!
As a result, the only program that will allow you to do headless JS processing is XULRunner/Crowbar... but Crowbar doesn&#039;t understand cookies!
If your outsourced method doesn&#039;t do it, I guess the only other options are to either modify Crowbar to understand cookies and send them to XULRunner - OR - point the Crowbar proxy at another proxy which can inject cookies into the headers and send/receive/modify cookies that way.

Both ways would work, and the latter way would probably be more extensible, but it&#039;s not very neat... not to mention both would require me to know how to code applications for XULRunner (which I think is actually all JavaScript and a bit of C using a bridging library, but still... all I can code in is PHP and HTML :P).
I really don&#039;t fancy doing that - so this outsourced source code of yours, depending on how it works, could *really* help me out. It would certainly save me a LOT of time!!

So yeah, hehe, thanks a lot for your help - and thanks a lot for this blog! It&#039;s probably the only resource on what Crowbar is that exists besides the under-loved and cryptic Crowbar homepage!! Good job!</description>
		<content:encoded><![CDATA[<p>Excellent! I&#8217;m glad to hear you got it sorted in the end.<br />
Crowbar is an interesting program&#8230; you&#8217;d think that reading/interpreting JavaScript would be something that all web spiders would be able to do &#8211; yet NONE of them do it for the simple reason that understanding JavaScript and the DOM requires, essentially, a full browser. Building a full browser into a spidering engine is overkill for just this little bit of added functionality &#8211; but when you need to scrape JS, you need to scrape JS!<br />
As a result, the only program that will allow you to do headless JS processing is XULRunner/Crowbar&#8230; but Crowbar doesn&#8217;t understand cookies!<br />
If your outsourced method doesn&#8217;t do it, I guess the only other options are to either modify Crowbar to understand cookies and send them to XULRunner &#8211; OR &#8211; point the Crowbar proxy at another proxy which can inject cookies into the headers and send/receive/modify cookies that way.</p>
<p>Both ways would work, and the latter way would probably be more extensible, but it&#8217;s not very neat&#8230; not to mention both would require me to know how to code applications for XULRunner (which I think is actually all JavaScript and a bit of C using a bridging library, but still&#8230; all I can code in is PHP and HTML <img src='http://www.skillett.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> ).<br />
I really don&#8217;t fancy doing that &#8211; so this outsourced source code of yours, depending on how it works, could *really* help me out. It would certainly save me a LOT of time!!</p>
<p>So yeah, hehe, thanks a lot for your help &#8211; and thanks a lot for this blog! It&#8217;s probably the only resource on what Crowbar is that exists besides the under-loved and cryptic Crowbar homepage!! Good job!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keiron</title>
		<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/comment-page-1/#comment-28361</link>
		<dc:creator>Keiron</dc:creator>
		<pubDate>Thu, 02 Sep 2010 09:26:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.skillett.com/?p=688#comment-28361</guid>
		<description>Hi John,

I outsourced it in the end as I needed to get it done quickly. Interestingly, I have another project coming up that may need it - I need to dig out the source code, and I&#039;ll let you know once I can put together some decent examples!</description>
		<content:encoded><![CDATA[<p>Hi John,</p>
<p>I outsourced it in the end as I needed to get it done quickly. Interestingly, I have another project coming up that may need it &#8211; I need to dig out the source code, and I&#8217;ll let you know once I can put together some decent examples!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/comment-page-1/#comment-28354</link>
		<dc:creator>John</dc:creator>
		<pubDate>Fri, 27 Aug 2010 10:58:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.skillett.com/?p=688#comment-28354</guid>
		<description>Keiron,

So you managed to crawl your pages with PHP using curl and Crowbar?!
I would love to see that source code, man. I&#039;m having a bit of a tough time curling my way into some JavaScript, and even with Crowbar installed and ready to go I don&#039;t seem to get the results I want.

What did the curl line you used to call JavaScript pages look like?

All the best,

~ John</description>
		<content:encoded><![CDATA[<p>Keiron,</p>
<p>So you managed to crawl your pages with PHP using curl and Crowbar?!<br />
I would love to see that source code, man. I&#8217;m having a bit of a tough time curling my way into some JavaScript, and even with Crowbar installed and ready to go I don&#8217;t seem to get the results I want.</p>
<p>What did the curl line you used to call JavaScript pages look like?</p>
<p>All the best,</p>
<p>~ John</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keiron</title>
		<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/comment-page-1/#comment-19351</link>
		<dc:creator>Keiron</dc:creator>
		<pubDate>Wed, 14 Jan 2009 17:06:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.skillett.com/?p=688#comment-19351</guid>
		<description>I eventually resorted to outsourcing it via rentacoder to an absolutely excellent coder in the US. 

He provided exactly what I needed in PHP to the spec I provided with no extensions or the like! Was really pleased with his work - I think he was kind of surprised when I had no complaints or changes that needed making as well!</description>
		<content:encoded><![CDATA[<p>I eventually resorted to outsourcing it via rentacoder to an absolutely excellent coder in the US. </p>
<p>He provided exactly what I needed in PHP to the spec I provided with no extensions or the like! Was really pleased with his work &#8211; I think he was kind of surprised when I had no complaints or changes that needed making as well!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/comment-page-1/#comment-19350</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Wed, 14 Jan 2009 15:51:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.skillett.com/?p=688#comment-19350</guid>
		<description>FireWatir might be a better tool for you:

http://wiki.openqa.org/display/WTR/FireWatir

Dan</description>
		<content:encoded><![CDATA[<p>FireWatir might be a better tool for you:</p>
<p><a href="http://wiki.openqa.org/display/WTR/FireWatir" >http://wiki.openqa.org/display/WTR/FireWatir</a></p>
<p>Dan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Using Subversion to get Crowbar &#124; Skillett.com</title>
		<link>http://www.skillett.com/technology/computers/programming/xulrunner-and-crowbar-crawling-of-sorts/comment-page-1/#comment-16200</link>
		<dc:creator>Using Subversion to get Crowbar &#124; Skillett.com</dc:creator>
		<pubDate>Sun, 30 Nov 2008 10:51:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.skillett.com/?p=688#comment-16200</guid>
		<description>[...] post is just a reference point for another post, a Subversion client is needed for downloading Crowbar, so I downloaded TortoiseSVN available [...]</description>
		<content:encoded><![CDATA[<p>[...] post is just a reference point for another post, a Subversion client is needed for downloading Crowbar, so I downloaded TortoiseSVN available [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
