Tag: mac

arxiv RSS feeds available

If you are interested in getting daily RSS feeds for one (or more) of the arXiv sections math.RA, math.AG, math.QA and math.RT, you can point your news-aggregator to www.matrix.ua.ac.be. I explained most of the solution to my first Perl-exercise yesterday, but the current program has a few changes. My first idea was to scrape the recent-files from the arXiv; for math.RA, for example, I would get http://www.arxiv.org/list/math.RA/recent, but this page contains only titles, authors and links, not the abstracts of the papers. So I thought I had to scrape the URLs of these papers and then download each of the abstract-files. Fortunately, I found a way around this. There is a lesser-known way to get all math abstracts of the current day (or the last few days): the Catchup interface. Its syntax is as follows: to get all math-papers with abstracts posted on April 2, 2004, for example, you have to get the page with URL

http://www.arxiv.org/catchup?smonth=04&sday=02&num=50&archive=math&method=with&syear=2004
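A minimal sketch of how such a URL can be assembled from a date, written in Python rather than the Perl of the actual script. The parameter names and values are simply those visible in the URL above; the function name catchup_url is made up for this illustration, and the sketch does not check whether the arXiv accepts the request.

```python
from datetime import date
from urllib.parse import urlencode

CATCHUP = "http://www.arxiv.org/catchup"


def catchup_url(day, archive="math", num=50):
    """Build a Catchup URL for `day`, mirroring the example URL in the post."""
    params = [
        ("smonth", f"{day.month:02d}"),  # two-digit month, e.g. 04
        ("sday", f"{day.day:02d}"),      # two-digit day, e.g. 02
        ("num", num),
        ("archive", archive),
        ("method", "with"),              # "with" = include abstracts
        ("syear", day.year),
    ]
    return CATCHUP + "?" + urlencode(params)


print(catchup_url(date(2004, 4, 2)))
```

Called with April 2, 2004 this reproduces the example URL character for character.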

So in order to use it I had to find a way to turn the present day into a numeric day, month, year format. This is quite easy, as there is the very well documented Date::Manip-module in Perl. Another problem with the arXiv is that there are no posts in the weekend. I worked around this by requesting the Catchup starting from the previous business day (an option of the DateCalc-function). This means that over the weekend I get the RSS feeds of papers posted on Friday, on Monday I'll get those of Friday and Monday, and on all other days I'll get those of today and yesterday. It is easy to change the script to allow for a longer period, so please tell me if you want RSS-feeds for the last 3 or 4 days. Feeds for other sections can easily be added too, so tell me if you need them.
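The "previous business day" rule can be sketched as follows, again in Python rather than with Date::Manip. The helper name previous_business_day is hypothetical, and the sketch only skips Saturdays and Sundays; it knows nothing about holidays.

```python
from datetime import date, timedelta


def previous_business_day(today):
    """Return the last weekday strictly before `today` (holidays ignored)."""
    day = today - timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


# Saturday, Sunday and Monday all start from Friday April 2, 2004;
# Tuesday starts from Monday.
for d in (date(2004, 4, 3), date(2004, 4, 4), date(2004, 4, 5), date(2004, 4, 6)):
    print(d, "->", previous_business_day(d))
```

Starting the Catchup request from this date gives exactly the behaviour described: Friday's papers over the weekend, Friday and Monday on Monday, and yesterday and today otherwise.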
Here are the URLs to give to your news-aggregator for these sections:

math.RA at http://www.matrix.ua.ac.be/arxivRSS/mathRA/
math.QA at http://www.matrix.ua.ac.be/arxivRSS/mathQA/
math.RT at http://www.matrix.ua.ac.be/arxivRSS/mathRT/
math.AG at http://www.matrix.ua.ac.be/arxivRSS/mathAG/

If your news-aggregator is not clever, you may have to append index.xml to these URLs. If you would like to use these feeds on a Mac, a good free news-aggregator is NetNewsWire Lite. To get at the above feeds, click on the Subscribe button and copy one of the above links into the pop-up window. I don't think my Perl-script breaks the Robots Beware rule of the arXiv. All it does is download one page a day using their Catchup-method. I still have to set up a cron-job to do this daily, but I have to find out at which (local) time at night the arXiv refreshes its pages…


my first scraper

As far as I know (but I am fairly ignorant) the arXiv does not provide RSS feeds for a particular section, say math.RA. Still, such feeds would be a good idea for anyone who uses a news aggregator to follow weblogs and news-channels with RSS syndication. So I decided to write one as my first Perl-exercise, and to my own surprise I have, after a few hours of work, a prototype-scraper for math.RA. It is not yet perfect: I still have to convert the local URLs to global URLs so that they can be clicked, and at the moment I have only collected the titles, authors and abstract-links, whereas it would make more sense to include the full abstract in the RSS feed. But give me a few more days…
The basic idea is fairly simple (and based on an O'Reilly hack). One uses the Template::Extract module to extract the goodies from the arXiv's template HTML. Maybe I am still not used to Perl-documentation, but it was hard for me to work out how to do this in detail from either the hack or the online module-documentation. Fortunately there is a good Perl Advent Calendar page giving me the details that I needed. Once one has this info, one can turn it into a proper RSS-page using the XML::RSS-module.
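The extract-then-syndicate idea can be sketched as below. This is Python with a regular expression and the standard library, not Template::Extract or XML::RSS, and the HTML fragment is invented for the illustration: the real arXiv listing markup is different. The helper names extract and to_rss are likewise hypothetical. The sketch also shows the local-to-global URL conversion mentioned above.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

BASE = "http://www.arxiv.org/list/math.RA/recent"

# Hypothetical listing markup, invented for this sketch.
SAMPLE = """
<dt><a href="/abs/math.RA/0000001">math.RA/0000001</a></dt>
<dd><b>Title:</b> A sample title <b>Authors:</b> A. Author, B. Author</dd>
<dt><a href="/abs/math.RA/0000002">math.RA/0000002</a></dt>
<dd><b>Title:</b> Another title <b>Authors:</b> C. Author</dd>
"""

# One pattern per listing entry: link, title, authors.
ENTRY = re.compile(
    r'<dt><a href="(?P<link>[^"]+)">[^<]*</a></dt>\s*'
    r'<dd><b>Title:</b>\s*(?P<title>.*?)\s*<b>Authors:</b>\s*(?P<authors>.*?)</dd>',
    re.S,
)


def extract(html, base=BASE):
    """Pull title, authors and an absolute link out of each entry."""
    return [
        {
            "title": m["title"],
            "authors": m["authors"],
            "link": urljoin(base, m["link"]),  # local URL -> global URL
        }
        for m in ENTRY.finditer(html)
    ]


def to_rss(items, channel_title="math.RA"):
    """Serialize the extracted items as a bare-bones RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = BASE
    ET.SubElement(channel, "description").text = "Recent papers"
    for it in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = it["title"]
        ET.SubElement(item, "link").text = it["link"]
        ET.SubElement(item, "description").text = it["authors"]
    return ET.tostring(rss, encoding="unicode")


print(to_rss(extract(SAMPLE)))
```

A template-based extractor describes the page layout declaratively instead of with a hand-written pattern, but the flow is the same: match the repeated entry structure, collect the fields, and hand them to an RSS writer.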
In fact, I spent far more time trying to get XML::RSS installed under OS X than writing the code. The usual method, that is via

iMacLieven:~ lieven$ sudo /usr/bin/perl -MCPAN -e shell
Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v1.76)
ReadLine support available (try 'install Bundle::CPAN')
cpan> install XML::RSS

failed, and even a manual install, for which the drill is: download the package from CPAN, go to the extracted directory and give the commands

sudo /usr/bin/perl Makefile.PL
sudo make
sudo make test
sudo make install

failed. A Google search didn't give immediate results either, until I found this ADC page which set me on the right track. It seems that the problem is in installing XML::Parser, for which one first needs expat to be installed. Now, the generic sourceforge page contains a version for Linux, but fortunately expat is also part of the Fink project, so I did a

sudo fink install expat

which worked without problems, but afterwards I still was not able to install XML::Parser, because Fink installs everything in the /sw tree. But after

sudo perl Makefile.PL EXPATLIBPATH=/sw/lib EXPATINCPATH=/sw/include

I finally got the manual installation
going. I will try to tidy up the script over the weekend…


chicken of the VNC

If I ever get our home automation system configured I'll use my (partly broken) old iBook as my Indigo-server (or my MisterHouse-server once I brush up my Perl-knowledge). It should then run quietly, tucked away somewhere, and I don't want to take it out every time I want to add another routine to the program.
Fortunately there is a way around this: turn the iBook into a VNC-server, where VNC stands for Virtual Network Computing. Here is how RealVNC describes it

VNC (Virtual Network Computing) software makes it
possible to view and fully-interact with one computer from any other
computer or mobile device anywhere on the Internet. VNC software is
cross-platform, allowing remote control between different types of
computer. For ultimate simplicity, there is even a Java viewer, so that
any desktop can be controlled remotely from within a browser without
having to install software.

But can all this be done under Mac OS X without too much hassle? The first step is to download OSXvnc and install it on the iBook. Some of the sourceforge-sites do not seem to have this package, but fortunately some still do. Installation is no problem, and when you fire OSXvnc up you have to fill in a password, which you need later to connect to your OSXvnc-server (the iBook). Most other options can be left at their default values, but in the Startup-pane it is useful to click on the Configure Startup Item button. When all this is done, press the Start button to launch the VNC-server.
The next step is to go to the computer you want to use to control the VNC-server (an iMac in my case). On it one needs to install the Chicken of the VNC software, which turns the iMac into a VNC-client. Fire it up and fill out the Host (the name of your OSXvnc-server, iBookLieven.local in my case) and the Password (the one of the OSXvnc-server program), press the Connect button, and the screen of your VNC-server will appear, which you can control with your mouse as if you were actually working on the thing. Very handy, as I managed to break the touch-control on my iBook when installing a new hard-drive and I need the only USB-port to connect to the X10-network…
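If the client cannot connect, a quick reachability test can tell whether the server is listening at all. The Python sketch below assumes the server uses TCP port 5900, the conventional port for VNC display 0; the post does not state which port OSXvnc was configured with, and the function name vnc_port_open is made up for this illustration.

```python
import socket


def vnc_port_open(host, port=5900, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False
```

For the setup described here one would call vnc_port_open("iBookLieven.local"). A True result only says that something accepts connections on that port, not that the password is right.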
