ruby-talk
Re: Using hpricot to get tables
by Lrlebron@Gmail.Com
Jul 1 2008 2:44PM
On Jul 1, 4:03 pm, Dan Diebolt <dandieb...@[...].com>  wrote:
>  [Note:  parts of this message were removed to make it a legal post.]
> 
>  >I would like to access each table individually
> 
>  doc.search returns an array even if there is only one match. The construct you are using iterates through this array:
> 
>  doc.search(strPath) do |div|
> 
>  end
> 
>  If you capture the search results into a variable named "divs", you can index it like an array (because it is one):
> 
>  divs=doc.search(strPath)
> 
>  If you want to immediately start iterating you can do this:
> 
>  doc.search(strPath).each_with_index do |div,idiv|
>    puts idiv if idiv==2
>  end
> 
>  I work with hpricot a lot, and I find it more productive not to use all the fancy Ruby idioms to shorten your code, since you are dealing with pages that are very fragile to parse when someone changes the page content.
> 
>  See code below
>  ==============
>  require 'hpricot'
>  require 'open-uri'
> 
>  strLink ="http://www.sportsline.com/mlb/gamecenter/boxscore/MLB_20080331_ARI@CIN"
>  strPath ="//div[@class='SLTables1']/div"
> 
>  doc = Hpricot(open(strLink))
>  divs=doc.search(strPath)
> 
>  puts "#{divs[0].inner_text.slice(0..70)}\n\n"
>  puts "#{divs[1].inner_text.slice(0..70)}\n\n"
>  puts "#{divs[2].inner_text.slice(0..70)}\n\n"
>  puts "#{divs[3].inner_text.slice(0..70)}\n\n"

This works. Will be very useful for future projects.

I ended up using the xpath for each table which also worked.
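
Since the quoted code indexes divs[0] through divs[3] directly, here is a rough sketch of the same indexing with a guard for the case where the page changes and a table goes missing. To keep it runnable without hpricot or network access, divs is a plain Array of placeholder strings standing in for the inner_text of each doc.search result (an assumption), and preview is a hypothetical helper name, not part of hpricot.

```ruby
# Stand-in for the search results: in real use these strings would come
# from calling inner_text on each element returned by doc.search(strPath).
# The contents here are placeholders, not real page data.
divs = [
  "placeholder text for table 0",
  "placeholder text for table 1",
]

# Hypothetical helper: return the first `width` characters of the text at
# `index`, or nil when there is no entry at that index.
# slice(0, 71) covers the same characters as slice(0..70) in the quoted code.
def preview(texts, index, width = 71)
  text = texts[index]
  return nil if text.nil?
  text.slice(0, width)
end

(0..3).each do |i|
  snippet = preview(divs, i)
  if snippet
    puts "#{snippet}\n\n"
  else
    # Indexing past the end of an Array returns nil rather than raising,
    # so report the missing table instead of calling a method on nil.
    warn "table #{i} not found; the page layout may have changed"
  end
end
```

With the two placeholder entries above, indexes 2 and 3 take the warning branch instead of failing with a NoMethodError on nil.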

Thanks,

Luis