2007-10-04
■ [CustomFeed-Script] Rewrote CustomFeed::Manganomori using Web::Scraper

I heard that CustomFeed::Hoge-style plugins are uncool and that Web::Scraper looks interesting, so I rewrote the CustomFeed::Manganomori I made earlier using Web::Scraper.
assets/plugins/CustomFeed-Script/manganomori.pl
For some reason the characters come out garbled... The regular expressions in the switch-case are, from top to bottom, as follows.
- /^\d+$/ # this one is not garbled
- /下/
- /中/
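The escaped bytes in the listing can be decoded to check which characters they stand for. The following is a minimal sketch, assuming the garbled `<e4><b8><8b>` and `<e4><b8><ad>` are the raw UTF-8 bytes of the original characters; the helper name `bytes_to_char` is hypothetical and not part of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not in the original script):
# decode a string of raw UTF-8 bytes into Perl characters.
sub bytes_to_char {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Assumption: these are the bytes displayed as <e4><b8><8b> and <e4><b8><ad>.
my $lower  = bytes_to_char("\xe4\xb8\x8b");
my $middle = bytes_to_char("\xe4\xb8\xad");

binmode STDOUT, ':encoding(UTF-8)';
printf "U+%04X %s\n", ord($lower),  $lower;    # U+4E0B 下
printf "U+%04X %s\n", ord($middle), $middle;   # U+4E2D 中
```

Under that assumption, the second case matches 下 and the third matches 中, which agrees with the order of the list above.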
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

use DateTime;
use Switch;
use URI;
use Web::Scraper;
use YAML;

my $date = shift || DateTime->now->strftime('%Y.%m');
my ($year, $month) = ($date =~ /^(\d{4})\.(\d{2})$/) or return;
my $id = ($year - 2003) * 12 + $month - 4; # month = 2003.5 : id = 1
my $url = "http://www.manganomori.net/list.asp?listid=$id";

my $s = scraper {
    my $publisher;
    process 'tr[bgcolor]', 'comics[]' => scraper {
        my $dummy;
        process 'td[colspan] > a > b' # publisher?
            , dummy => sub {
                if ($_) {
                    $publisher = $_->as_text;
                    $dummy = 1;
                }
            };
        return if $dummy;
        process '*' # dummy selector
            , publisher => sub { $publisher };
        process '//td[2]', part_or_day => 'text';
        process '//td[3]', title => 'text';
        process '//td[4]', author => 'text';
        process '//td[5]', price => 'text';
        result qw/ publisher part_or_day title author price /;
    };
    [grep { defined $_ } @{result qw/ comics /}];
};

binmode STDOUT, ":utf8";
print YAML::Dump +{
    title => "まんがの森 コミックリスト $date",
    link => $url,
    entry => [
        map { +{
            title => $_->{title},
            author => $_->{author},
            tags => [$_->{publisher}],
            date => &mk_date($_, $year, $month),
            body => &mk_body($_),
        } } @{ $s->scrape(URI->new($url)) || [] }
    ],
};

sub mk_date {
    my ($comic, $year, $month) = @_;
    DateTime->new(
        year => $year,
        month => $month,
        day => &part_to_day($comic->{part_or_day}),
    )->strftime('%Y-%m-%d');
}

sub mk_body {
    my $comic = shift;
    $comic->{part_or_day} eq &part_to_day($comic->{part_or_day}) # day or part?
        ? join ', ', map { $comic->{$_} } qw/ author publisher price /
        : join ', ', map { $comic->{$_} } qw/ part_or_day author publisher price /;
}

sub part_to_day {
    my $part_or_day = shift;
    switch ($part_or_day) {
        case /^\d+$/ { return $part_or_day }
        case /<e4><b8><8b>/ { return 21 }
        case /<e4><b8><ad>/ { return 11 }
        else { return 1 }
    }
};
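The `listid` arithmetic and the day mapping are the two pieces of the script that are easy to check in isolation, without fetching the page. This is only a rough sketch: `list_id` is a hypothetical helper name, `part_to_day` is rewritten here with plain `if` checks instead of `Switch`, and the 下 / 中 patterns assume the garbled bytes decode to those characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: same arithmetic as the script,
# chosen so that 2003.05 maps to listid 1.
sub list_id {
    my ($date) = @_;
    my ($year, $month) = $date =~ /^(\d{4})\.(\d{2})$/
        or return;    # undef when the date is not in YYYY.MM form
    return ($year - 2003) * 12 + $month - 4;
}

# Same mapping as the script's part_to_day, without Switch.
# Assumption: the two garbled patterns are 下 and 中.
sub part_to_day {
    my ($part_or_day) = @_;
    return $part_or_day if $part_or_day =~ /^\d+$/;    # already a day number
    return 21           if $part_or_day =~ /下/;
    return 11           if $part_or_day =~ /中/;
    return 1;                                          # anything else
}

print list_id('2003.05'), "\n";    # 1
print list_id('2007.10'), "\n";    # 54
print part_to_day('15'), "\n";     # 15
print part_to_day('下'), "\n";     # 21
print part_to_day('中'), "\n";     # 11
```

Because a row without an exact day still has to become a calendar date for Publish::iCal, the fallback days 1, 11, and 21 act as stand-ins for the part of the month.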
config.yaml
plugins:
  - module: Subscription::Config
    config:
      feed:
        - url: script:/path/to/manganomori.pl
        # - url: script:/path/to/manganomori.pl 2007.10
  - module: CustomFeed::Script
  - module: Publish::iCal
    config:
      dir: .
      filename: manganomori.ics
Trackback - http://plagger.g.hatena.ne.jp/SweetPotato/20071004