2007-06-10
■ [assets][CustomFeed-Config][EFT] scraper and upgrader for 新文化 (Shinbunka)

assets/plugins/CustomFeed-Config/shinbunka_newsflash.yaml
The news flash on the top page.
# author: SweetPotato
match: http://www\.shinbunka\.co\.jp/(?:index\.html?)?$
extract_xpath:
  title: //span[@class='bold-midashi']/text()
  body: //span[@class='bold-midashi']/../../../following-sibling::tr/td/p
  updated: //span[@class='bold-midashi']/../../../../../../following-sibling::tr/td/span
extract_after_hook: $data->{body} .= $data->{updated}
assets/plugins/CustomFeed-Config/shinbunka_joholog.yaml
# author: SweetPotato
match: http://www\.shinbunka\.co\.jp/joholog\.html?$
extract_xpath:
  title: //td[@width='600' and @bgcolor='#FAEBD7']/div[2]/b[1]/text()
  body: //td[@width='600' and @bgcolor='#FAEBD7']/div[2]
  updated: //td[@width='600' and @bgcolor='#FAEBD7']/div[1]/font/text()
extract_after_hook: |
  $data->{body} .= $data->{updated};
  $data->{body} =~ s!<b.*?>.*?</b>.*?<br.*?>!!;
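To make the joholog hook easier to follow, here is a rough Python rendering of its two Perl statements: append the updated string to the body, then strip the first bold headline up to the following line break (the headline is already captured as the title). The function name and the sample markup are assumptions made up for illustration; they are not taken from the real page or from Plagger.

```python
import re

def joholog_after_hook(data):
    """Python rendering of the two Perl statements in extract_after_hook."""
    # $data->{body} .= $data->{updated};
    data["body"] += data["updated"]
    # $data->{body} =~ s!<b.*?>.*?</b>.*?<br.*?>!!;
    # The Perl substitution has no /g flag, so only the first match is removed.
    data["body"] = re.sub(r"<b.*?>.*?</b>.*?<br.*?>", "", data["body"], count=1)
    return data

# Hypothetical markup, not copied from the real page.
sample = {
    "body": "<div><b>Headline</b><br>First paragraph.</div>",
    "updated": "2007/06/08",
}
result = joholog_after_hook(sample)
print(result["body"])
```

As in the Perl original, `.` does not match a newline here, so the headline and its line break have to sit on one line for the substitution to fire.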
assets/plugins/Filter-EntryFullText/shinbunka_yell-rue.yaml
# author: SweetPotato
custom_feed_handle: http://www\.shinbunka\.co\.jp/rensai/yell-ruelog\.html?$
custom_feed_follow_link: /rue\d+\.html?$
handle: http://www\.shinbunka\.co\.jp/rensai/yell-rue/rue\d+\.html?$
extract_xpath:
  title: //div[@class='bold-midashi']/text()
  body: //div[@class='bold-midashi']/../../following-sibling::tr[2]/td
  date: //div[@class='bold-midashi']/../../following-sibling::tr[2]/td/div/text()
extract_date_format: \(%Y/%m/%d\)
extract_date_timezone: Asia/Tokyo
extract_after_hook: $data->{body} =~ s!<td.*?>(.*)</td>!$1!
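The date settings in this asset describe a date written in parentheses, such as (2007/06/10), interpreted as Tokyo time. A minimal Python sketch of that parsing step follows. It does not reproduce Plagger's own date handling: the function name and the sample string are hypothetical, and a fixed +09:00 offset stands in for Asia/Tokyo.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +09:00 offset is used in place of the Asia/Tokyo zone.
JST = timezone(timedelta(hours=9), "JST")

def parse_rue_date(text):
    """Parse a parenthesized date such as "(2007/06/10)" as Tokyo time."""
    # Python's strptime matches the parentheses literally,
    # so the backslashes from the asset's format are dropped here.
    naive = datetime.strptime(text.strip(), "(%Y/%m/%d)")
    return naive.replace(tzinfo=JST)

parsed = parse_rue_date("(2007/06/10)")  # hypothetical sample value
print(parsed.isoformat())
```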
assets/plugins/Filter-EntryFullText/shinbunka_shuzainote.yaml
# author: SweetPotato
custom_feed_handle: http://www\.shinbunka\.co\.jp/(?:shuzainote/)?shuzainotelog.*?\.html?$
custom_feed_follow_link: /\d+\.html?$
handle: http://www\.shinbunka\.co\.jp/shuzainote/\d+\.html?$
extract_xpath:
  title: //h2
  body: //h2/../../following-sibling::tr[2]/td
  author: //h2/../../following-sibling::tr[2]/td/p[@align='right'][1]/b/text()
  date: //h2/../../following-sibling::tr[2]/td/p[@align='right'][1]/text()[last()]
extract_date_format: %Y/%m/%d
extract_date_timezone: Asia/Tokyo
extract_after_hook: |
  $data->{title} =~ s!<.*?>!!g;
  $data->{body} =~ s!<td.*?>(.*)</td>!$1!;
  $data->{date} = $1 if $data->{date} =~ m!(\d+/\d+/\d+)!;
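The three URL patterns in this asset split the work: custom_feed_handle picks out the index pages, custom_feed_follow_link picks the article links found on them, and handle picks the article pages whose full text is extracted. The Python sketch below only checks which sample URLs each pattern accepts, assuming the patterns are applied as unanchored regex searches. The two index URLs come from config.shinbunka.yaml; the article URL is a hypothetical example.

```python
import re

# Patterns copied from the shuzainote asset.
CUSTOM_FEED_HANDLE = re.compile(
    r"http://www\.shinbunka\.co\.jp/(?:shuzainote/)?shuzainotelog.*?\.html?$"
)
FOLLOW_LINK = re.compile(r"/\d+\.html?$")
HANDLE = re.compile(r"http://www\.shinbunka\.co\.jp/shuzainote/\d+\.html?$")

index_urls = [
    "http://www.shinbunka.co.jp/shuzainotelog.htm",
    "http://www.shinbunka.co.jp/shuzainote/shuzainotelog001-030.htm",
]
# Hypothetical article URL, made up to show the shape the patterns expect.
article_url = "http://www.shinbunka.co.jp/shuzainote/031.htm"

# Index pages are treated as feeds but are not followed as articles.
index_is_feed = [bool(CUSTOM_FEED_HANDLE.search(u)) for u in index_urls]
index_is_article = [bool(FOLLOW_LINK.search(u)) for u in index_urls]

# The article URL is followed from the index and handled for full text.
article_followed = bool(FOLLOW_LINK.search(article_url))
article_handled = bool(HANDLE.search(article_url))

print(index_is_feed, index_is_article, article_followed, article_handled)
```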
assets/plugins/Filter-EntryFullText/shinbunka_henshucho.yaml
The 社長室 (President's Office) column.
# author: SweetPotato
custom_feed_handle: http://www\.shinbunka\.co\.jp/(?:henshucho/)?henshucholog.*?\.html?$
custom_feed_follow_link: /hen\d+\.html?$
handle: http://www\.shinbunka\.co\.jp/henshucho/hen\d+\.html?$
extract_xpath:
  title: //h2
  body: //h2/../../following-sibling::tr[2]/td
  author: //h2/../../following-sibling::tr[2]/td/p[@align='right'][1]/b/text()
  date: //h2/../../following-sibling::tr[2]/td/p[@align='right'][2]/text()
extract_date_format: %Y\x{FF0F}%m\x{FF0F}%d
extract_date_timezone: Asia/Tokyo
extract_after_hook: |
  $data->{title} =~ s!<.*?>!!g;
  $data->{body} =~ s!<td.*?>(.*)</td>!$1!;
  $data->{date} = $1 if $data->{date} =~ m!(\d+\x{FF0F}\d+\x{FF0F}\d+)!;
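The henshucho asset differs from the shuzainote one mainly in its date: the separator is the fullwidth solidus U+FF0F rather than an ASCII slash, which is why both the date format and the hook regex spell it as \x{FF0F}. A small Python sketch of the same idea follows, with a hypothetical function name and sample strings; it is an illustration, not Plagger's own code path.

```python
import re
from datetime import datetime

# U+FF0F FULLWIDTH SOLIDUS separates the date parts on this page.
FULLWIDTH_DATE = re.compile(r"(\d+\uFF0F\d+\uFF0F\d+)")

def extract_fullwidth_date(text):
    """Pull a date with fullwidth slashes out of surrounding text and parse it."""
    match = FULLWIDTH_DATE.search(text)
    if match is None:
        return None  # an ASCII-slash date does not match this pattern
    return datetime.strptime(match.group(1), "%Y\uFF0F%m\uFF0F%d")

# Hypothetical sample strings; ASCII digits are assumed.
found = extract_fullwidth_date("\uFF082007\uFF0F06\uFF0F08\uFF09")
missed = extract_fullwidth_date("2007/06/08")
print(found, missed)
```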
config.shinbunka.yaml
plugins:
  - module: Subscription::Config
    config:
      feed:
        - url: http://www.shinbunka.co.jp/
        - url: http://www.shinbunka.co.jp/joholog.htm
        - url: http://www.shinbunka.co.jp/rensai/yell-ruelog.htm
        - url: http://www.shinbunka.co.jp/shuzainotelog.htm
#        - url: http://www.shinbunka.co.jp/shuzainote/shuzainotelog001-030.htm
        - url: http://www.shinbunka.co.jp/henshucholog.htm
#        - url: http://www.shinbunka.co.jp/henshucho/henshucholog001-035.htm
  - module: CustomFeed::Config
  - module: Filter::ForcePermalink
  - module: Filter::EntryFullText
  - module: Filter::ForceTimeZone
    config:
      timezone: Asia/Tokyo
For Filter::ForcePermalink, see the following article.