Call me old-fashioned, but I like feeds: they're clean, they're fast, they're sort of local.
The problem is that not all sites provide an Atom (or RSS) feed, and when they do, it sometimes contains more than I want.
E.g. someone recommended the language articles by Michele Berdy at The Moscow Times.
It provides an RSS feed for all opinion pieces, but not for a specific subset.
It does, however, give an HTML listing of the most recent articles.
I went out for a quick look on the webz, and found a couple of paid services and the free Feed Creator by FiveFilters.org.
It seemed to do what I wanted, but as it wasn't open source and the task actually seemed quite easy, I did a very basic implementation myself.
The hosted version can be found at http://jerous.org/tools/site2atom.php; it is self-documenting, and its source can be downloaded from my git repo.
An example for the above site:
It contains three entries:
- title optional title
- url what site to fetch. There is a limit of 1 MiB
- url_contains perform the XPath query //a[contains(@href, '$url_contains')], i.e. return all a-tags whose href contains url_contains
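To make the url_contains selection concrete, here is a minimal sketch in Python rather than the tool's own PHP. It is not the actual site2atom code: it emulates the XPath query with the standard library's html.parser (ElementTree's XPath subset has no contains()), and the names LinkCollector and select_links are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def select_links(html, base_url, url_contains):
    """Rough equivalent of //a[contains(@href, '$url_contains')]."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Keep matching links and make relative hrefs absolute.
    return [(urljoin(base_url, href), text)
            for href, text in parser.links
            if url_contains in href]
```

Each returned pair could then become one feed entry, with the absolute URL as the entry link and the link text as its title.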
Currently, a couple of keys are available.
Values are first converted to lowercase, unless otherwise noted.
- url[_not]_contains select the link if its href [does not] contain[s] the value. Multiple occurrences are
- urltitle[not_]contains select the url if the link's title [not] contains the value. Multiple occurrences are
- div_class select only urls that are inside a div with a class containing the case-sensitive value. Only the last occurrence is used.
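How the contains / not-contains keys might combine can be sketched as a small predicate. This is my reading, not the tool's code, and it rests on a few assumptions: repeated keys are all required to hold, the href and title are compared case-insensitively, and "title" is whatever text the tool treats as the link's title. The function keep_link and its parameter names are hypothetical, and div_class is left out because it needs the surrounding DOM.

```python
def keep_link(href, title, url_contains=(), url_not_contains=(),
              title_contains=(), title_not_contains=()):
    """Return True if a link passes all filters.

    Assumptions (not confirmed by the tool's documentation):
    - every value of a repeated key must be satisfied,
    - values, hrefs and titles are all compared in lowercase.
    """
    href_l, title_l = href.lower(), title.lower()

    def lower(values):
        return [v.lower() for v in values]

    return (all(v in href_l for v in lower(url_contains))
            and not any(v in href_l for v in lower(url_not_contains))
            and all(v in title_l for v in lower(title_contains))
            and not any(v in title_l for v in lower(title_not_contains)))
```

With no filters given, every link is kept, so each key only ever narrows the selection.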
In the future there'll be some additions, which will probably be documented only on http://jerous.org/tools/site2atom.php.
Feel free to use it in any way you see fit, and let me know if it suits you in any way :)