If
you are interested in getting daily RSS-feeds of one (or more) of the
following arXiv
sections : math.RA, math.AG, math.QA and
math.RT you can point your news-aggregator to
www.matrix.ua.ac.be. Most of the solution to my first
Perl-exercise I did explain already yesterday but the current program
has a few changes. First, my idea was to scrape the recent-files
from the arXiv, for example for math.RA I would get http://www.arxiv.org/list/math.RA/recent but this
contains only titles, authors and links but no abstracts of the papers.
So I thought I had to scrape for the URLs of these papers and then
download each of the abstracts-files. Fortunately, I found a way around
this. There is a lesser known way to get at all abstracts from
math of the current day (or the few last days) by using the Catchup interface. The syntax of this interface is
as follows : for example to get all math-papers with
abstracts posted on April 2, 2004 you have to get the page with
URL
http://www.arxiv.org/catchup?smonth=04&sday=02&num=50&archive= math&method=with&syear=2004
so in order to use it I had
to find a way to parse the present day into a numeric
day,month,year format. This is quite easy as there is the very
well documented Date::Manip-module in Perl. Another problem with
arXiv is that there are no posts in the weekend. I worked around
this by requesting the Catchup starting from the previous
business day (an option of the DateCalc-function. This means
that over the weekend I get the RSS feeds of papers posted on Friday, on
Monday I\’ll get those of Friday&Monday and for all other days I\’ll get
those of today&yesterday. But it is easy to change the script to allow
for a longer period so please tell me if you want to have RSS-feeds for
the last 3 or 4 days. Also, if you need feeds for other sections that
can easily be done, so tell me.
Here are the URLs to give to
your news-aggregator for these sections :
math.RA at
http://www.matrix.ua.ac.be/arxivRSS/mathRA/
math.QA at
http://www.matrix.ua.ac.be/arxivRSS/mathQA/
math.RT at
http://www.matrix.ua.ac.be/arxivRSS/mathRT/
math.AG at
http://www.matrix.ua.ac.be/arxivRSS/mathAG/
If
your news-aggregator is not clever then you may have to add an
additional index.xml at the end. If you like to use these feeds
on a Mac, a good free news-aggregator is NetNewsWire Lite. To get at the above feeds, click on the Subscribe
button and copy one of the above links in the pop-up window. I
don\’t think my Perl-script breaks the Robots Beware rule of the arXiv. All it does it to download one page a day
using their Catchup-Method. I still have to set up a cron-job to
do this daily, but I have to find out at which (local)time at night the
arXiv refreshes its pages…