PrePAN

Web::HackerNews - Scrape the HTML of Hacker News

Author: benkasminbullock@github
Status: In Review

Synopsis

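use strict;
use warnings;
use Web::HackerNews;
# Assumes LWP::Simple supplies get() and that $story->{hide} is a full URL.
use LWP::Simple 'get';

my $hn = Web::HackerNews->new ();
my @stories = $hn->parse_file ('hn.html');
for my $story (@stories) {
    # Hide stories whose titles match topics one is tired of.
    if ($story->{title} =~ /Elon Musk|Google memo|James Damore|react\.js/i) {
        get ($story->{hide});
    }
}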

Description

Given a Hacker News page, this module scrapes the HTML to extract its contents, such as each story's title and "hide" URL. One can then automatically match the titles against a regular expression and "hide" stories about Elon Musk, James Damore, react.js, Google memos, or other tedious things and people.
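As a rough illustration of the matching step, the following sketch filters a list of story hashes by title and collects the corresponding "hide" URLs. The `title` and `hide` keys follow the synopsis; the helper `hide_urls_for` and the sample stories are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: given story hashes with 'title' and 'hide' keys
# (the keys used in the synopsis), return the hide URLs of the stories
# whose titles match $pattern.
sub hide_urls_for {
    my ($pattern, @stories) = @_;
    return map { $_->{hide} } grep { $_->{title} =~ $pattern } @stories;
}

# Made-up sample data standing in for the output of the scraper.
my @stories = (
    { title => 'Elon Musk unveils something',  hide => 'hide?id=1' },
    { title => 'A new Perl release',           hide => 'hide?id=2' },
    { title => 'Why I stopped using React.js', hide => 'hide?id=3' },
);

# The "." is escaped so that it only matches a literal dot.
my $boring = qr/Elon Musk|Google memo|James Damore|react\.js/i;
print "$_\n" for hide_urls_for ($boring, @stories);
```

Compiling the pattern once with `qr//` keeps the list of unwanted topics in one place, so it can be extended without touching the loop.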

This is an HTML scraper and is not related to WebService::HackerNews by Neil Bowers. Note that Hacker News lays out its pages with tables and "center" tags, with no particular logical subdivision of the content.
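Because the markup has no logical subdivision, one plausible approach is to pair each story row with its "hide" link through the story id they share. The sketch below does this with regular expressions over a small, made-up fragment. The `athing` and `storylink` class names, the `hide?id=` link format and the `parse_stories` function are assumptions for illustration, not a description of Hacker News's actual markup or of how Web::HackerNews works internally.

```perl
use strict;
use warnings;

# A simplified, made-up fragment in the spirit of a table-based layout.
# Class names and attribute order are assumptions, not HN's real markup.
my $html = <<'HTML';
<center><table>
<tr class="athing" id="101"><td><a href="https://example.com/a" class="storylink">First story</a></td></tr>
<tr><td><a href="hide?id=101&amp;goto=news">hide</a></td></tr>
<tr class="athing" id="102"><td><a href="https://example.com/b" class="storylink">Second story</a></td></tr>
<tr><td><a href="hide?id=102&amp;goto=news">hide</a></td></tr>
</table></center>
HTML

# Hypothetical parser: key each story by its row id and attach the
# hide link that carries the same id.
sub parse_stories {
    my ($html) = @_;
    my %hide;
    while ($html =~ /<a href="(hide\?id=(\d+)[^"]*)">hide<\/a>/g) {
        # Copy the captures before s/// resets $1 and $2.
        my ($url, $id) = ($1, $2);
        $url =~ s/&amp;/&/g;
        $hide{$id} = $url;
    }
    my @stories;
    while ($html =~ /<tr class="athing" id="(\d+)">.*?<a href="([^"]*)" class="storylink">([^<]*)<\/a>/gs) {
        push @stories, { id => $1, url => $2, title => $3, hide => $hide{$1} };
    }
    return @stories;
}

for my $story (parse_stories ($html)) {
    print "$story->{title}\t$story->{hide}\n";
}
```

Regular expressions like these break as soon as the markup changes; a fuller implementation would probably use an HTML parser such as HTML::TreeBuilder instead.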

Comments

What is the purpose of this scraper compared with the other module that accesses the API (WebService::HackerNews)? Is it just the "hide" feature?
And aren't web scraper interfaces typically in the WWW:: namespace?