<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>The Iron Sea &#187; scrubyt</title>
	<atom:link href="http://theironsea.com/tag/scrubyt/feed" rel="self" type="application/rss+xml" />
	<link>http://theironsea.com</link>
	<description>inside the never settle mind of Houston Ng</description>
	<pubDate>Tue, 13 May 2008 04:15:01 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
	<language>en</language>
			<item>
		<title>scRubyt Tutorial: Dogs of Hang Seng Index</title>
		<link>http://theironsea.com/2008/05/01/21/</link>
		<comments>http://theironsea.com/2008/05/01/21/#comments</comments>
		<pubDate>Thu, 01 May 2008 15:43:10 +0000</pubDate>
		<dc:creator>Houston</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<category><![CDATA[Ruby]]></category>

		<category><![CDATA[finance]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[scrubyt]]></category>

		<guid isPermaLink="false">http://theironsea.com/?p=21</guid>
		<description><![CDATA[Inspired by a scRubyt tutorial, Dogs of the FTSE, posted on straw-dogs. Guess it&#8217;s time for me to give it a go!
This should work for other indices too, such as the Dow Jones or the NASDAQ.
So here&#8217;s the code. The idea is very similar to Dogs of the FTSE, except this uses the intelligent learning feature of scRubyt [...]]]></description>
			<content:encoded><![CDATA[<p>Inspired by a scRubyt tutorial, <a href="http://www.straw-dogs.co.uk/09/05/scrubyt-tutorial-dogs-of-the-ftse/">Dogs of the FTSE</a>, posted on straw-dogs. Guess it&#8217;s time for me to give it a go!</p>
<p>This should work for other indices too, such as the Dow Jones or the NASDAQ.</p>
<p>So here&#8217;s the code. The idea is very similar to Dogs of the FTSE, except this uses the intelligent learning feature of scRubyt instead of straight XPath. This feature of scRubyt is especially useful on a site whose markup is consistently labelled with class and id attributes, like Yahoo Finance.</p>
<p>#<br />
# Dog of HSI written by Houston Ng</p>
<p>require 'rubygems'<br />
require 'scrubyt'</p>
<p># Initialize an empty hash for final data<br />
final_data={}</p>
<p># Extractor 1: Get an up-to-date symbol list for the HSI from Yahoo Finance<br />
hsi_list = Scrubyt::Extractor.define do<br />
fetch 'http://hk.finance.yahoo.com/q/cp?s=%5EHSI'<br />
top "//td[@class='c1']" do<br />
symbol "//a"<br />
end<br />
end<br />
hsi_stocks = hsi_list.to_hash</p>
<p># Extractor 2: Get the stock information, just modify the pattern learner<br />
hsi_stocks.each do |hsi_stock|<br />
sc = hsi_stock[:symbol]<br />
stock_data = Scrubyt::Extractor.define do<br />
fetch 'http://hk.finance.yahoo.com/q'<br />
fill_textfield 's', sc<br />
submit</p>
<p># For the div with id "stock-bar",<br />
# get the content of the h3 and of the first and second b tags.<br />
# This only works for HK Finance.<br />
top "//div[@id='stock-bar']" do<br />
name "//h3"<br />
price "//b[1]"<br />
change "//b[2]"<br />
end<br />
end<br />
#Save the stock data into the final data<br />
final_data[sc] = stock_data.to_hash<br />
end<br />
# Print out the final data; basically this just puts the hash on screen<br />
# For simplicity, the example prints only the name, price, and change<br />
final_data.each do |key, entry|<br />
puts "\n#{key}"<br />
entry.each do |datapair|<br />
puts "#{datapair[:name]} | Price: #{datapair[:price]} | Change: #{datapair[:change]}"<br />
end<br />
end</p>
<p>I know I could get everything from the same page that gave me the HSI symbol list, but just for fun, to play with the form auto-fill, I load the actual quote page for each stock.</p>
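<p>To try the printing loop without hitting Yahoo Finance, here is a minimal offline sketch. It assumes, as the loop above implies, that each entry in final_data is an array of hashes keyed by :name, :price and :change. The symbol and values are made-up placeholders, and quote_lines is a hypothetical helper, not part of scRubyt.</p>

```ruby
# Hypothetical stand-in for the scraped data: the symbol, name, and numbers
# below are placeholders, not real quotes.
sample_final_data = {}
sample_final_data["0000.HK"] = [
  { name: "Example Holdings", price: "10.00", change: "+0.10" }
]

# Build one printable line per scraped row (hypothetical helper).
def quote_lines(final_data)
  final_data.flat_map do |symbol, rows|
    rows.map do |row|
      "#{symbol}: #{row[:name]} | Price: #{row[:price]} | Change: #{row[:change]}"
    end
  end
end

puts quote_lines(sample_final_data)
```

<p>Returning the lines instead of printing them directly makes the formatting easy to check before the real extractors are plugged in.</p>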
<p>There is, however, a slight pitfall that my little brain needs to work around. If you look at the output, there are two empty hashes. They come from the HSI list extraction: the XPath matches too many cells, so it also grabs the header cell, and since the header cell does not contain any &lt;a&gt; tags, it returns an empty value. Fixing this will be the next improvement to the code&#8230;</p>
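<p>One possible workaround, sketched in plain Ruby: drop the rows with no symbol before running the second extractor. This assumes to_hash returns an array of hashes and that the header cells show up as empty hashes, as described above. drop_blank_symbols and the sample symbols are hypothetical, not part of scRubyt.</p>

```ruby
# Hypothetical sample of what hsi_list.to_hash might return, including
# the empty hashes produced by header cells (assumed shape, placeholder symbols).
sample_rows = [
  {},
  { symbol: "0001.HK" },
  { symbol: "0005.HK" },
  {}
]

# Keep only rows whose :symbol is present and not blank (hypothetical helper).
def drop_blank_symbols(rows)
  rows.reject { |row| row[:symbol].to_s.strip.empty? }
end

clean_rows = drop_blank_symbols(sample_rows)
clean_rows.each { |row| puts row[:symbol] }
```

<p>In the script, the same idea would mean iterating over the filtered list instead of hsi_stocks, so no quote page is requested for a missing symbol.</p>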
<p>Here&#8217;s the source code: <a href="http://theironsea.com/wp-content/uploads/hsi-list.rb">Dog of HSI</a></p>
<p>Dog of the Dow coming soon&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://theironsea.com/2008/05/01/21/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
