SweetPotato::Plagger

2007-06-10

scraper and upgrader 新文化 02:56

Pulling various content from 新文化 (Shinbunka), a trade paper for the publishing industry, into Plagger.

assets/plugins/CustomFeed-Config/shinbunka_newsflash.yaml

The news flash on the top page.

# author: SweetPotato
match: http://www\.shinbunka\.co\.jp/(?:index\.html?)?$
extract_xpath:
  title:   //span[@class='bold-midashi']/text()
  body:    //span[@class='bold-midashi']/../../../following-sibling::tr/td/p
  updated: //span[@class='bold-midashi']/../../../../../../following-sibling::tr/td/span
extract_after_hook: $data->{body} .= $data->{updated}
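
As a rough check of which URLs the `match` pattern above accepts, here is a small Python sketch that applies the same regular expression. It assumes the pattern is used as an ordinary, unanchored regex match against the feed URL; Python's `re` stands in for Perl's regex engine, and the function name `is_top_page` and the URL list are illustrative only.

```python
import re

# Same pattern as the `match:` line in shinbunka_newsflash.yaml.
MATCH = re.compile(r"http://www\.shinbunka\.co\.jp/(?:index\.html?)?$")


def is_top_page(url: str) -> bool:
    """Return True if the URL would be picked up by the news-flash config."""
    return MATCH.search(url) is not None


if __name__ == "__main__":
    for url in [
        "http://www.shinbunka.co.jp/",
        "http://www.shinbunka.co.jp/index.htm",
        "http://www.shinbunka.co.jp/index.html",
        "http://www.shinbunka.co.jp/joholog.htm",
    ]:
        print(url, is_top_page(url))
```

The optional group `(?:index\.html?)?` is what lets the bare top-level URL and both `index.htm` and `index.html` share a single config file.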

assets/plugins/CustomFeed-Config/shinbunka_joholog.yaml

The information bulletin board.

# author: SweetPotato
match: http://www\.shinbunka\.co\.jp/joholog\.html?$
extract_xpath:
  title:   //td[@width='600' and @bgcolor='#FAEBD7']/div[2]/b[1]/text()
  body:    //td[@width='600' and @bgcolor='#FAEBD7']/div[2]
  updated: //td[@width='600' and @bgcolor='#FAEBD7']/div[1]/font/text()
extract_after_hook: |
  $data->{body} .= $data->{updated};
  $data->{body} =~ s!<b.*?>.*?</b>.*?<br.*?>!!;
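
To see what the hook above does to an entry, here is a Python rendering of the same two steps: append the date text to the body, then delete the first `<b>...</b>...<br>` run (the title, which is already extracted separately). This is only a sketch; the function name `after_hook` and the sample markup are made up for illustration and are not copied from the real page.

```python
import re

# Python rendering of the Perl substitution s!<b.*?>.*?</b>.*?<br.*?>!!
# There is no /g flag in the original, so only the first match is removed.
LEADING_TITLE = re.compile(r"<b.*?>.*?</b>.*?<br.*?>")


def after_hook(body: str, updated: str) -> str:
    """Append the date text to the body, then drop the leading bold title."""
    body = body + updated
    return LEADING_TITLE.sub("", body, count=1)


if __name__ == "__main__":
    # Hypothetical markup for illustration.
    sample_body = "<div><b>Notice title</b><br>Body text<br>More text</div>"
    print(after_hook(sample_body, "2007/06/10"))
```

Note that `<b.*?>` also matches any other tag whose name starts with `b`, such as `<br>`, so the substitution relies on the bold title being the first such tag in the body.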

assets/plugins/Filter-EntryFullText/shinbunka_yell-rue.yaml

The serial column 「ルーエからのエール」.

# author: SweetPotato
custom_feed_handle: http://www\.shinbunka\.co\.jp/rensai/yell-ruelog\.html?$
custom_feed_follow_link: /rue\d+\.html?$
handle: http://www\.shinbunka\.co\.jp/rensai/yell-rue/rue\d+\.html?$
extract_xpath:
  title: //div[@class='bold-midashi']/text()
  body:  //div[@class='bold-midashi']/../../following-sibling::tr[2]/td
  date:  //div[@class='bold-midashi']/../../following-sibling::tr[2]/td/div/text()
extract_date_format: \(%Y/%m/%d\)
extract_date_timezone: Asia/Tokyo
extract_after_hook: $data->{body} =~ s!<td.*?>(.*)</td>!$1!
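
The date settings above expect text shaped like `(2007/06/10)` and interpret it as Tokyo time. As a sketch of that idea in Python (not Plagger's actual date handling), the code below uses a regex that mirrors the `\(%Y/%m/%d\)` format and attaches a fixed UTC+9 offset as a stand-in for `Asia/Tokyo`. The function name `parse_column_date` and the sample strings are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed UTC+9 offset standing in for the Asia/Tokyo zone (assumption:
# a fixed offset is adequate for the dates on these pages).
JST = timezone(timedelta(hours=9), "JST")

# Rough regex rendering of the strptime-style format \(%Y/%m/%d\).
DATE_RE = re.compile(r"\((\d{4})/(\d{1,2})/(\d{1,2})\)")


def parse_column_date(text: str) -> Optional[datetime]:
    """Find a parenthesized Y/M/D date in text and return it as midnight JST."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return datetime(year, month, day, tzinfo=JST)


if __name__ == "__main__":
    print(parse_column_date("(2007/06/10)"))
    print(parse_column_date("no date here"))
```

The parentheses are escaped in the format because they are literal characters on the page, not grouping.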

assets/plugins/Filter-EntryFullText/shinbunka_shuzainote.yaml

The serial column 「取材ノート」.

# author: SweetPotato
custom_feed_handle: http://www\.shinbunka\.co\.jp/(?:shuzainote/)?shuzainotelog.*?\.html?$
custom_feed_follow_link: /\d+\.html?$
handle: http://www\.shinbunka\.co\.jp/shuzainote/\d+\.html?$
extract_xpath:
  title:  //h2
  body:   //h2/../../following-sibling::tr[2]/td
  author: //h2/../../following-sibling::tr[2]/td/p[@align='right'][1]/b/text()
  date:   //h2/../../following-sibling::tr[2]/td/p[@align='right'][1]/text()[last()]
extract_date_format: %Y/%m/%d
extract_date_timezone: Asia/Tokyo
extract_after_hook: |
  $data->{title} =~ s!<.*?>!!g;
  $data->{body} =~ s!<td.*?>(.*)</td>!$1!;
  $data->{date} = $1 if $data->{date} =~ m!(\d+/\d+/\d+)!;
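
The two link patterns above work as a pair: `custom_feed_follow_link` picks article links out of the index page, and `handle` decides which article URLs this config extracts. The Python sketch below imitates that selection. It assumes the patterns are applied to absolute URLs, which may differ from what Plagger does internally, and the function name `entry_urls` and the sample hrefs (including `031.htm`) are hypothetical.

```python
import re
from urllib.parse import urljoin

# Patterns copied from shinbunka_shuzainote.yaml.
FOLLOW_LINK = re.compile(r"/\d+\.html?$")
HANDLE = re.compile(r"http://www\.shinbunka\.co\.jp/shuzainote/\d+\.html?$")


def entry_urls(index_url, hrefs):
    """Resolve hrefs against the index page and keep the article links."""
    urls = []
    for href in hrefs:
        absolute = urljoin(index_url, href)
        if FOLLOW_LINK.search(absolute) and HANDLE.search(absolute):
            urls.append(absolute)
    return urls


if __name__ == "__main__":
    index = "http://www.shinbunka.co.jp/shuzainotelog.htm"
    # Hypothetical hrefs; the real file names on the site may differ.
    hrefs = [
        "shuzainote/031.htm",
        "shuzainote/shuzainotelog001-030.htm",
        "index.htm",
    ]
    print(entry_urls(index, hrefs))
```

Because the follow pattern requires digits immediately after the last slash, archive index pages such as `shuzainotelog001-030.htm` are not treated as articles.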

assets/plugins/Filter-EntryFullText/shinbunka_henshucho.yaml

社長室 (the president's office column).

# author: SweetPotato
custom_feed_handle: http://www\.shinbunka\.co\.jp/(?:henshucho/)?henshucholog.*?\.html?$
custom_feed_follow_link: /hen\d+\.html?$
handle: http://www\.shinbunka\.co\.jp/henshucho/hen\d+\.html?$
extract_xpath:
  title:  //h2
  body:   //h2/../../following-sibling::tr[2]/td
  author: //h2/../../following-sibling::tr[2]/td/p[@align='right'][1]/b/text()
  date:   //h2/../../following-sibling::tr[2]/td/p[@align='right'][2]/text()
extract_date_format: %Y\x{FF0F}%m\x{FF0F}%d
extract_date_timezone: Asia/Tokyo
extract_after_hook: |
  $data->{title} =~ s!<.*?>!!g;
  $data->{body} =~ s!<td.*?>(.*)</td>!$1!;
  $data->{date} = $1 if $data->{date} =~ m!(\d+\x{FF0F}\d+\x{FF0F}\d+)!;
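
This config differs from the previous one mainly in the date separator: `\x{FF0F}` is U+FF0F, the full-width solidus `／`, not the ASCII slash. A small Python sketch of the same extraction follows; the function name `extract_date` and the sample strings are illustrative assumptions.

```python
import re
from datetime import date
from typing import Optional

# \uFF0F is U+FF0F FULLWIDTH SOLIDUS, the separator matched by \x{FF0F}
# in the Perl hook and in extract_date_format.
FW_DATE = re.compile(r"(\d+)\uFF0F(\d+)\uFF0F(\d+)")


def extract_date(text: str) -> Optional[date]:
    """Pull a year/month/day date separated by full-width slashes out of text."""
    m = FW_DATE.search(text)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)


if __name__ == "__main__":
    # Hypothetical sample text.
    print(extract_date("（2007／6／10）"))
    print(extract_date("2007/6/10"))  # ASCII slashes do not match
```

A date written with ordinary ASCII slashes would not match this pattern, which is presumably why this page needs its own format string.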

config.shinbunka.yaml

plugins:
  - module: Subscription::Config
    config:
      feed:
         - url: http://www.shinbunka.co.jp/
         - url: http://www.shinbunka.co.jp/joholog.htm
         - url: http://www.shinbunka.co.jp/rensai/yell-ruelog.htm
         - url: http://www.shinbunka.co.jp/shuzainotelog.htm
#        - url: http://www.shinbunka.co.jp/shuzainote/shuzainotelog001-030.htm
         - url: http://www.shinbunka.co.jp/henshucholog.htm
#        - url: http://www.shinbunka.co.jp/henshucho/henshucholog001-035.htm
  - module: CustomFeed::Config
  - module: Filter::ForcePermalink
  - module: Filter::EntryFullText
  - module: Filter::ForceTimeZone
    config:
      timezone: Asia/Tokyo

For Filter::ForcePermalink, see the following article.
