
Tag: google

the google matrix

This morning there was an intriguing post on arXiv/math.RA
entitled A Note on the Eigenvalues of the Google Matrix. At first I
thought it was a joke, but a quick Google revealed that the PageRank
algorithm really is at the heart of Google's technology, so I simply had
to find out more about it. An extremely readable account can be found in
The PageRank Citation Ranking: Bringing Order to the Web, which is really
the start of Google. It is coauthored by the two founders, Larry Page and
Sergey Brin. A quote from the introduction:

“To test the utility of PageRank for search, we built a web
search engine called Google (Section
5)”

Here is an intuitive idea of _PageRank_: a page has high rank if the
sum of the ranks of its _backlinks_ (that is, pages linking to the page
in question) is high, and it is computed by the _Random Surfer Model_
(see sections 2.5 and 2.6 of the paper). More formally (at least from my
quick browsing of some papers; maybe the following account is slightly
erroneous and I'll have to spend some more time reading), let N be the
number of webpages (estimated between 3 and 4 billion) and consider the
N x N matrix A, the so-called GoogleMatrix, where

$$A = cP + (1-c)\, v\, \mathrm{vec}(1)$$

where P is the column-stochastic matrix (meaning: all entries are zero
or positive and the entries in each column add up to 1) with entries

$$P(i,j) = \begin{cases} 1/N(j) & \text{if } j \to i, \\ 0 & \text{otherwise,} \end{cases}$$

where i and j are webpages, i->j denotes that page i has a link to page
j, and N(i) is the total number of pages that page i links to (all this
information is available once we download page i). Further, c is a
constant with 0 < c < 1 (it seems that Google uses c = 0.85): it is the
probability that the random surfer follows an _outlink_ (that is, a link
to another page) rather than jumping elsewhere. Finally, v is a column
vector with zero or positive entries adding up to 1, and vec(1) is the
constant row vector (1,…,1). The idea behind this last term is that in
the _Random Surfer Model_ used to compute the PageRank, the Googlebot
(which normally follows links on the pages it enters at random) at each
step, with probability 1-c, jumps to an entirely different webpage, where
the chance that it ends up at page i is given by the i-th entry of v
(this avoids being trapped in a web-loop). So, in Google's model the bot
_teleports_ itself at random roughly once every 6 or 7 links
(1/0.15 ≈ 6.7).

Now, the PageRank is a column-eigenvector of the GoogleMatrix A with
eigenvalue 1. It can be approximated by letting the random surfer run,
that is, by repeatedly multiplying a starting probability vector by A,
and the rate of convergence of this process depends on the _second_
largest eigenvalue of A (the largest being 1). In the paper posted this
morning a simple proof is given that this second eigenvalue is c,
because the matrix P has the eigenvalue 1 with multiplicity greater than
one. According to a previous paper on the subject, The Second Eigenvalue
of the Google Matrix, this statement has implications for the
convergence rate of the standard PageRank algorithm as the web scales,
for the stability of PageRank under perturbations of the link structure
of the web, for the detection of Google spammers, and for the design of
algorithms to speed up PageRank. But I'll have to read more to
understand the Google spammers bit…
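
To make these definitions concrete, here is a small Python sketch (a toy illustration, certainly not Google's implementation). It builds P and A for a made-up four-page web, approximates the PageRank by power iteration, and checks the eigenvalue-c statement on a web consisting of two separate link loops. It assumes that every page has at least one outlink (dangling pages are ignored) and that v is uniform; the function names and link lists are hypothetical.

```python
# Toy PageRank sketch: A = c*P + (1-c) * v * vec(1), solved by power iteration.
# Assumes no dangling pages (every page has at least one outlink).
from typing import Dict, List

Matrix = List[List[float]]


def link_matrix(links: Dict[int, List[int]], n: int) -> Matrix:
    """Column-stochastic P with P[i][j] = 1/N(j) if page j links to page i.

    links[j] lists the (distinct) pages that page j links to."""
    P = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            P[i][j] = 1.0 / len(targets)
    return P


def google_matrix(P: Matrix, c: float, v: List[float]) -> Matrix:
    """A[i][j] = c*P[i][j] + (1-c)*v[i]: every column of v*vec(1) equals v."""
    n = len(P)
    return [[c * P[i][j] + (1 - c) * v[i] for j in range(n)] for i in range(n)]


def mat_vec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def pagerank(A: Matrix, tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Power iteration x <- A x, started from the uniform vector.

    A is column-stochastic, so each iterate remains a probability vector;
    the error shrinks roughly by the factor |second eigenvalue| <= c per step."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = mat_vec(A, x)
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


if __name__ == "__main__":
    c, n = 0.85, 4
    v = [1.0 / n] * n  # uniform teleportation vector

    # Hypothetical web: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2 (page 3 has no backlinks).
    A = google_matrix(link_matrix({0: [1, 2], 1: [2], 2: [0], 3: [2]}, n), c, v)
    print("PageRank:", pagerank(A))

    # Two separate loops 0 <-> 1 and 2 <-> 3: P has eigenvalue 1 twice.
    A2 = google_matrix(link_matrix({0: [1], 1: [0], 2: [3], 3: [2]}, n), c, v)
    y = [0.5, 0.5, -0.5, -0.5]  # difference of two stationary vectors of P
    print("A y:", mat_vec(A2, y), "c y:", [c * yi for yi in y])
```

The last check follows the reasoning behind the result: if x1 and x2 are two different stationary probability vectors of P, then y = x1 - x2 has entries summing to 0, so the teleportation term (1-c) v vec(1) y vanishes and A y = c P y = c y. Page 3, which nobody links to, only receives the teleportation share (1-c)/N.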


homemade .mac

The other members of my family don't understand what I have been doing
these last couple of days with all those ethernet cables, Airport
stations, computer books and the like. 'Improving our network' doesn't
make much of an impression. To them, our network is fine as it is: from
every computer one has access to the internet and to the only printer in
the house, and that is all they want. To them, my computer phase is just
occupational therapy while recovering from the flu. Probably they are
right, but I obstinately keep experimenting to prove them wrong. Not that
there is much hope: searching the web for fun uses of home networks does
not turn up many interesting pages. A noteworthy exception is a series of
four articles by Alan Graham for the MacDevCenter on the homemade
dot-mac with OS X project.

In the first article, Homemade Dot-Mac with OS X, he explains how to set
up a home network (I will give a detailed account of our home network
shortly) and how to fire up your Apache web server. One nice trick I
learned from it is to connect a computer both by ethernet to the router
and via an Airport card to the network (you can force this by specifying
the order of the active network ports in the
SystemPreferences/Network/Show Network Port Configurations pane: first
Built-in Ethernet, then Airport). This way you get a faster connection to
the internet while still connecting to the other computers on the
network. In the second part of the article he explains how to get
yourself a free domain name even if you have (as we do) a dynamic IP
address, via a service like DynDNS. Indeed, it is quite easy to set this
up, but so far I have failed to reach my server under its new domain
name from outside the network, probably because of bad port mapping on
my old isb2lan router. This afternoon I lost two hours trying to fix this
(so far without success), as I didn't even know how to talk to my
router: I lost the manual, which is no longer online. A few Google
searches later I learned that I just had to type http://192.168.0.1 to
get at the set-up pages (there is even a hidden page), but you shouldn't
try these links unless you are connected to one of these routers. Maybe
I will need another look at this review.
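
Before blaming the router, it helps to test whether the forwarded ports are actually reachable. A minimal Python sketch, assuming the web server listens on TCP port 80 (and 443 for HTTPS); the function name is made up for this example:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    False covers refused connections, timeouts and names that do not resolve
    (socket.timeout and socket.gaierror are both subclasses of OSError)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, run `port_is_open("yourname.dyndns.org", 80)` (a hypothetical host name; use the one you registered) from a machine outside the home network. Many consumer routers do not forward connections that come from inside the LAN to their own public address, so a test from home can fail even when the port mapping is correct.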

In the second article, Homemade Dot-Mac with OS X, Part 2, he discusses
at length setting up a firewall with BrickHouse (shareware costing $25)
and compares it with the built-in firewall pane in
SystemPreferences/Sharing, which convinced me to stay with the built-in
option. Further, he explains which tools one can use to set up a
homepage (stressing the iPhoto option). Finally, and this is the most
interesting part (though a bit obscure), he hints at the possibility of
setting up your own iDisk facility using either FTP (insecure) or
WebDAV.

The third article in the series is Homemade Dot Mac: Home Web Radio, in
which he claims that one can turn the standard OS X Apache server into an
iTunes streaming server. For this purpose he uses the QuickTime Streaming
Server, which you can get for free from the Apple site but which I think
works only when you have an X-server. It seems that all the nice features
require an X-server, so maybe I should consider buying one…

The (so far) final article, Six Great Tips for Homemade Dot Mac Servers,
is really interesting, and I will come back to most of these
possibilities when (if) I get them to work. The most promising options
for me are the central file server (which he synchronizes using the
shareware product ExecutiveSync, $15 for an academic license, though I'm
also experimenting a bit with the freeware LaCie program Silverkeeper,
which seems to do roughly the same things) and the iTunes central hack,
which is next on my ToDo list, as are (at a later stage) the WebDAV and
the Rendezvous ideas. So it seems I'll prolong my occupational therapy a
while…


SSL on Mac OSX

A longer-term project is to get the web server www.matrix.ua.ac.be
integrated in our home network as an external WebDAV server (similar to
the .Mac service offered by Apple). But as this server hosts all the
information about the master-class on non-commutative geometry,
connecting to it via plain HTTP to use WebDAV is too great a security
risk, as all username/password combinations would be sent without
encryption. Hence the natural question whether this server can be set up
to run SSL (Secure Sockets Layer), so that one can connect via HTTPS and
all exchanged information is encrypted. As the server runs Apache, this
comes down to getting mod_ssl running. A Google on mod_ssl OS X gives the
ADC document Using mod_ssl on Mac OS X, which seems to be just what I
want. This page is very well documented, giving detailed instructions
for using the openssl command. However, the end result is rather weak: it
only makes the localhost run HTTPS, that is, one can connect safely to
one's own computer… which is pretty ridiculous (other computers in the
same network cannot even connect safely).

So, back to the Google list, on which one link raises my interest:
Configuring mod_ssl on Mac OS X. It looks like the previous link but has
one essential difference: the page is written by Marc Liyanage. If you
ever tried to get PHP and/or MySQL running under OS X, you will have
noticed that his pages are by far the most reliable on the subject, so
maybe he also has something interesting to say on mod_ssl. However, the
bottom line of the document is not very promising:

You
should now be able to access the content with https://127.0.0.1 from
the same machine.

which is again the localhost. So perhaps it is just impossible to run
mod_ssl without having an X-server. Anyway, let us try out his
procedure. Begin by issuing the following commands in the Terminal:

sudo -s
cd /etc/httpd
mkdir ssl
chmod 700 ssl
cd ssl
gzip -c --best /var/log/system.log > random.dat
openssl rand -rand file:random.dat 0

Next, we need a server certificate. If you want to do it properly, you
need a certificate from a certification authority such as Thawte, but
this costs at least $200 a year, which I am not willing to pay. The
alternative is to use a self-signed certificate, which will force the
browser to display an error message; but if the user dismisses it, all
traffic exchanged with the server will still be encrypted, which is just
what I want. So, type the command

openssl req -keyout privkey-2001.pem -newkey rsa:1024 -nodes -x509 -days 365 -out cert-2001.pem

(all on one line).
You will be asked a couple of questions; the only important one is
Common Name (eg, YOUR name). Here you should take care to enter the host
name of your web server exactly as it will be used later in the URL. In
my test case, if I want my server to be used by other computers in the
network, this name will be imaclieven.local. (note the trailing dot).
Now issue the following commands:

chmod 600 privkey-2001.pem
chown root privkey-2001.pem
apxs -e -a -n ssl /usr/libexec/httpd/libssl.so

which will activate the SSL module (if at a later stage you want to
deactivate it, replace -a by -A in the last command). Finally, we have to
change the /etc/httpd/httpd.conf file, so first save a backup copy and
then add the following lines at the end of the file:

<IfModule mod_ssl.c>
    Listen 80
    Listen 443
    SSLCertificateFile /etc/httpd/ssl/cert-2001.pem
    SSLCertificateKeyFile /etc/httpd/ssl/privkey-2001.pem
    SSLRandomSeed startup builtin
    SSLRandomSeed connect builtin
    <VirtualHost _default_:443>
        SSLEngine on
    </VirtualHost>
</IfModule>

Finally, we do

apachectl stop
apachectl start

and we are done! Going to another computer in the network and typing
https://imaclieven.local./ in Safari results in an error message (the
certificate is self-signed). Just click Continue and you will have a
secure connection to the server. Thanks Marc Liyanage!
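
Clicking Continue has a programmatic counterpart: a client can keep the encryption but skip certificate verification, or, better, trust exactly this one self-signed certificate. A sketch with Python's standard ssl module, assuming a copy of cert-2001.pem has been brought to the client machine; the function names are illustrative. Current Python builds may refuse the old protocol versions and 1024-bit keys of a 2004-era Apache 1.3 setup, so take this as an illustration of the idea rather than a tested recipe for that server.

```python
import ssl
import urllib.request


def permissive_context() -> ssl.SSLContext:
    """Encrypt, but accept any certificate: the equivalent of clicking Continue."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be switched off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def pinned_context(cert_path: str) -> ssl.SSLContext:
    """Trust only the given self-signed certificate.

    Host name checking stays on, so the host in the URL must match the
    certificate's Common Name."""
    return ssl.create_default_context(cafile=cert_path)


def fetch(url: str, ctx: ssl.SSLContext) -> bytes:
    """GET the URL over HTTPS using the given TLS context."""
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read()
```

For instance, `fetch("https://imaclieven.local./", pinned_context("cert-2001.pem"))`. The pinned variant is the safer choice, because the permissive one would equally accept a certificate presented by anyone sitting between client and server.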

(Added January 11th) Whereas the above allows one to make an HTTPS
connection, it is not enough for my intended purposes. In order to get a
secure connection to a WebDAV server, this server must have the
mod_auth_digest module running, which seems to be impossible with the
standard Apache server of Mac OS X 10.3. You need an X-server to have
this facility. So I think I have to scale down my ambitions a bit.
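
Why mod_auth_digest matters here: with HTTP Basic authentication the password travels base64-encoded, which is effectively in the clear, whereas with Digest authentication (RFC 2617), which mod_auth_digest implements, the client only sends an MD5 hash that mixes the password with a nonce chosen by the server. A minimal sketch of the client-side computation for qop=auth; the function names are made up for this example.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user: str, realm: str, password: str, method: str, uri: str,
                    nonce: str, nc: str, cnonce: str, qop: str = "auth") -> str:
    """Value of the 'response' field in an RFC 2617 Digest Authorization header."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")  # depends on the secret, never sent
    ha2 = md5_hex(f"{method}:{uri}")             # binds the request line
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

For example, with the sample values from RFC 2617: `digest_response("Mufasa", "testrealm@host.com", "Circle Of Life", "GET", "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b")`. Digest protects the password, not the content of the files; that part still relies on HTTPS.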
