Err the Blog
Rubyisms and Railities
  • “OpenStruct IRL”
    – Chris on January 08, 2007

    Hey, here’s a fun one. Just last week cdcarter needed to scrape and Rubyify the SciFi channel’s listings. (Wow, those guys really like tables, huh? And nondescript markup. And PHP3. (PHP3 was sooo the best.))

    Quickly carter and I dusted off Hpricot and, with it, scraped the hell out of the listing page. We then turned each listing into a Show object, easy. With OpenStruct.

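    %w[open-uri rubygems hpricot ostruct].each { |f| require f }

    # Each listing becomes a Show. OpenStruct supplies the time, program and
    # title accessors from the hash passed to new, so nothing is declared here.
    class Show < OpenStruct
      LISTINGS = 'http://www.scifi.com/schedulebot/index.php3?feed_req=US:Central:E'

      # e.g. "7:00 AM: SHADOW PLAY [TWILIGHT ZONE, THE]"
      def to_s
        "#{time}: #{title}" << (program ? " [#{program}]" : '')
      end

      def self.find_all_from_today
        shows = []
        doc = Hpricot open(LISTINGS)
        # the listing data lives in the <td class="text"> cells
        tds = (doc/:td).select { |td| td.respond_to?(:[]) && td['class'] == 'text' }
        tds.each_with_index do |td, i|
          # a listing starts at a cell holding a time; the next two cells
          # hold the program (wrapped in a link) and the episode title
          next unless td.innerHTML =~ /:.+(AM|PM)/
          time    = td.innerHTML
          program = tds[i+1].innerHTML.gsub(/<a.+>(.+)<\/a>/, '\1')
          title   = tds[i+2].innerHTML
          shows << new(:time => time, :program => program, :title => title)
        end
        shows
      end
    end

    # print all found shows for today
    Show.find_all_from_today.each { |show| puts show.to_s }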
    

    Run it. I get something like this:

    5:00 AM: [PAID PROGRAMMING]
    7:00 AM: SHADOW PLAY [TWILIGHT ZONE, THE]
    7:30 AM: BLACK MARKET [BATTLESTAR GALACTICA (SEASON 2)]
    8:30 AM: SCAR [BATTLESTAR GALACTICA (SEASON 2)]
    9:30 AM: SACRIFICE [BATTLESTAR GALACTICA (SEASON 2)]
    ...
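
    If you'd rather poke at the OpenStruct half without hitting the network, here's a minimal sketch. It needs only the standard library. The sample values are lifted from the output above, and the rating attribute is made up purely to show that a new attribute needs no declaration.

```ruby
require 'ostruct'

# Same shape as the Show class in the post, minus the scraping.
class Show < OpenStruct
  def to_s
    "#{time}: #{title}" << (program ? " [#{program}]" : '')
  end
end

# Accessors come straight from the hash keys.
show = Show.new(:time => '7:00 AM', :program => 'TWILIGHT ZONE, THE', :title => 'SHADOW PLAY')
puts show.to_s   # 7:00 AM: SHADOW PLAY [TWILIGHT ZONE, THE]

# An attribute that was never set reads as nil, so to_s skips the brackets.
bare = Show.new(:time => '9:30 AM', :title => 'SACRIFICE')
puts bare.to_s   # 9:30 AM: SACRIFICE

# Hypothetical extra attribute: assigning to it is enough to create it.
show.rating = 'TV-PG'
puts show.rating # TV-PG
```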
    

    Way cool (even though I’m dying to slip in some returning action). You can imagine how this might be expanded into a nice little pirate RSS feed or something.
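
    For anyone who hasn't met it: I'm assuming "returning" here means the helper of that name from Rails' ActiveSupport, which yields a value to a block and then hands the same value back. The sketch below defines a stand-in so it runs without Rails. The rows array and shows_from are hypothetical, standing in for the scraped cells and the collecting loop.

```ruby
# Stand-in for ActiveSupport's returning helper (an assumption, see above):
# yield the value to the block, then return the value itself.
def returning(value)
  yield value
  value
end

# Hypothetical pre-scraped rows: [time, program, title].
ROWS = [
  ['7:00 AM', 'TWILIGHT ZONE, THE', 'SHADOW PLAY'],
  ['7:30 AM', 'BATTLESTAR GALACTICA (SEASON 2)', 'BLACK MARKET']
]

# No "shows = []" up top and no bare "shows" at the bottom:
# the accumulator lives and dies inside the block.
def shows_from(rows)
  returning([]) do |shows|
    rows.each { |time, program, title| shows << "#{time}: #{title} [#{program}]" }
  end
end

puts shows_from(ROWS)
```

    Plain Ruby's Object#tap does the same job for a receiver, as in [].tap { |shows| ... }.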

    Any more cool Struct or OpenStruct uses floating around out there? Jay Fields has done lots of messin’ with OpenStruct and kindly sprinkles a few write-ups throughout his blog. How’s about yous?
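
    To get the ball rolling, here's a rough side-by-side of the two. It's a sketch, not a recommendation: FixedShow, LooseShow and the network attribute are names invented for the comparison.

```ruby
require 'ostruct'

# Struct: the attribute list is fixed when the class is defined.
FixedShow = Struct.new(:time, :program, :title)

# OpenStruct: attributes are whatever the hash (or a later assignment) says.
class LooseShow < OpenStruct; end

fixed = FixedShow.new('8:30 AM', 'BATTLESTAR GALACTICA (SEASON 2)', 'SCAR')
loose = LooseShow.new(:time => '8:30 AM', :title => 'SCAR')

# OpenStruct grows a new attribute on assignment.
loose.network = 'SciFi'

# Struct has no such member, so the writer does not exist.
begin
  fixed.network = 'SciFi'
rescue NoMethodError => e
  puts "Struct refused: #{e.class}"
end

# Unset OpenStruct attributes quietly read as nil.
puts loose.program.inspect
```

    So Struct catches a mistyped attribute name, while OpenStruct bends to whatever the scraped page happens to offer.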

  • ChrisJ, 3 months later:

    Nice article. I’ve been trying to find some “excuse” to try Hpricot. This gives me an idea for stats on nfl.com.

  • Chris Carter, 3 months later:

    Thanks for giving me the mention :) I think I’m gonna turn the Buggy code to use OpenStruct. It’s so clever. So many classes could just inherit from it, like a superjavabean.

  • Seth Thomas Rasmussen, 3 months later:

    I started playing with Hpricot just today! Me likeyy.

  • kbrown, 2 months later:

    What is the advantage of using OpenStruct in this case? I can see why module opts_parse uses it, because the methods vary based on program options, but you are just using the methods time, program, and title. No?

  • John Nunemaker, about 1 month later:

    I used Hpricot on my twitter gem and absolutely fell in love. Wouldn’t have thought of it without this article. Unfortunately, using OpenStruct slipped my mind. That is a really nice touch. Glad I revisited this.

This is Err, the weblog of PJ Hyett and Chris Wanstrath.
All original content copyright ©2006-2008 the aforementioned.