Skip to content →

neverendingbooks Posts

the google matrix

This morning there was an intriguing post on arXiv/math.RA
entitled A Note on
the Eigenvalues of the Google Matrix
. At first I thought it was a
joke but a quick Google revealed that the PageRank algorithm really
is at the heart of Google technology, so I simply had to find out more
about it. An extremely readable account of it can be found in The PageRank Citation Ranking: Bringing Order to the Web which is really the
start of Google. It is coauthored by the two founders : Larry Page and
Sergey Brin. A quote from the introduction

“To test the utility of PageRank for search, we built a web
search engine called Google (Section
5)”

Here is an intuitive idea of
_PageRank_ : a page has high rank if the sum of the ranks of its
_backlinks_ (that is, pages linking to the page in question) is
high and it is computed by the _Random Surfer Model_ (see
sections 2.5 and 2.6 of the paper). More formally (at least from my
quick browsing of some papers, maybe the following account is slightly
erroneous and I’ll have to spend some more time reading) let
N be the number of webpages (estimated between 3 and 4
billion) and consider the N x N matrix
A the so called GoogleMatrix where

A = cP  + (1-c)(v x
vec(1)) 

where P is the
column-stochastic matrix (meaning : all entries are zero or positive and
the sum of all entries in each column adds up to 1) with
entries

P(i,j) = 1/N(i) if i->j and 0
otherwise 

where i and j are webpages and i->j
denotes that page i has a link to page j and where N(i) is the total
number of pages linked to in page i (all this information is available
once we download page i). c is a constant 0 < c < 1 and
corresponds to the fraction of webpages containing an _outlink (that
is, a link to another page) by all webpages (it seems that Google uses
c=0.85 as an estimate). Finally, v is a column vector with zero or
positive numbers adding up to 1 and vec(1) is the constant row vector
(1,…,1). The idea behind this term is that in the _Random Surfer
Model_ to compute the PageRank the Googlebot (normally following
links randomly in pages it enters) jumps every (1-c)x100% links randomly
to an entirely different webpage where the chance that it will end up at
page i is given by the i-th entry of v (this is to avoid being trapped
in a web-loop). So, in Googles model the bot _teleports_ itself
randomly every 6th link or so. Now, the PageRank is a
column-eigenvector for the GoogleMatrix A with eigenvalue 1 which can be
approximated by the RandomSurfer model and the rate of convergence of
this process depends on the _second_ largest eigenvalue for A
(the largest being 1). Now, in the paper posted this morning a simple
proof is given that this eigenvalue is c (because the matrix P has
multiple eigenvalues equal to 1). According to a previous paper on the
subject The
Second Eigenvalue of the Google Matrix
, this statement has
implications for the convergence rate of the standard PageRank algorithm
as the web scales, for the stability of PageRank to perturbations to the
link structure of the web, for the detection of Google spammers, and for
the design of algorithms to speed up PageRank. But I’ll have to
read more to understand the Google spammers bit…

2 Comments

the cpu 2 generation

Never
ever tell an ICT-aware person that you want to try to set up a
home-network before you understand all 65536 port-numbers and their corresponding
protocols. Here is what happened to me this week. Jan Adriaenssens returned from an extended vacation in New Zealand and I told him
about my problems with trying to set up WebDAV securely. He
stared at me with that look that teenage children have if they
find out their parents dont know how to handle the simplest things on a
mobile such as saving a number, writing an SMS let alone use the
dictionary… and asked ‘now why would you want to do that??? I just
use AppleTalk to connect to my computer securely’. Now I’m not such a
fool that I didnt try this out but I didnt manage to get matrix
mounted on my Desktop. ‘Oh, but thats probably because of the
firewall’ Jan said ‘just send an email to Peter (the guy running the
defenses here) and ask him to open up ports 548 and
427…’ And sure enough five minutes later the problem was
solved and I could trow my WebDAV-plans in the dustbin (although, I
think Ive found a use for WebDAV but will keep this a bit longer to
myself until I checked it out). If you think that was the end of it,
think twice. Never ever point an ICT-professional to your
computer. They then start looking at its firewall-logs and find all
sorts of things such as : ‘I noticed that traffic from port 53
was dropped to the firewall, could it be that you configured the
firewall as DNS-server. If this is the case, you better remove it and it
will increase your network-speed, I think.’ And sure enough that
IP-address was set on my machine as one of two possibilities for the
DNS-server so I quickly removed it and in the process thought that maybe
I should also remove the other one so I did send Peter another email
asking whether that was ok. It turned out that the second IP address was
the genuine DNS-server so I got a sec answer back ‘You better leave
this as it is otherwise not much will work…’ Oh, shame, shame eternal
shame on me!

My only defense is that I still belong
to what I would call the cpu 2 generation (I’m a few years too
old to belong to the more computer-aware generation X). When I
started out doing research in 1980 the single most important command was

cpu 2

which you had to type before you could run any program.
By typing this you asked to be given 2 minutes of central processing
time, so you had to write all your programs in such a way that either
they gave a result back within 2 minutes or to include lots of
output-commands giving you a chance to determine at which parameters you
would restart the program for your next cpu 2. I once computed in
this way all factorial maximal orders in quaternion algebras by spending
a couple of days in the computer room. These days any desktop computer
would finish this task in half a minute. Perhaps the younger generations
will appreciate all the hard computer-work we had to do back then if
they read a bit from the computer history museum page!

Leave a Comment

NOG master class


Yesterday I made reservations for lecture rooms to run the
master class on non-commutative geometry sponsored by the ESF-NOG project. We have a lecture room on
monday- and wednesday afternoon and friday the whole day which should be
enough. I will run two courses in the program : non-commutative
geometry
and projects in non-commutative geometry both 30
hours. I hope that Raf Bocklandt will do most of the work on the
Geometric invariant theory course so that my contribution to it
can be minimal. Here are the first ideas of topics I want to cover in my
courses. As always, all suggestions are wellcome (just add a
comment).

non-commutative geometry : As
I am running this course jointly with Markus Reineke and as Markus will give a
mini-course on his work on non-commutative Hilbert schemes, I will explain
the theory of formally smooth algebras. I will cover most of the
paper by Joachim Cuntz and Daniel Quillen “Algebra extensions and
nonsingularity”, Journal of AMS, v.8, no. 2, 1995, 251?289. Further,
I’ll do the first section of the paper by Alexander Rosenberg and Maxim Kontsevich,
Noncommutative smooth spaces“. Then, I will
explain some of my own work including the “One
quiver to rule them all
” paper and my recent attempts to classify
all formally smooth algebras up to non-commutative birational
equivalence. When dealing with the last topic I will explain some of Aidan Schofield‘s paper
Birational classification of moduli spaces of representations of quivers“.

projects in
non-commutative geometry
: This is one of the two courses (the other
being “projects in non-commutative algebra” run by Fred Van Oystaeyen)
for which the students have to write a paper so I will take as the topic
of my talks the application of non-commutative geometry (in particular
the theory of orders in central simple algebras) to the resolution of
commutative singularities and ask the students to carry out the detailed
analysis for one of the following important classes of examples :
quantum groups at roots of unity, deformed preprojective algebras or
symplectic reflexion algebras. I will explain in much more detail three talks I gave on the subject last fall in
Luminy. But I will begin with more background material on central simple
algebras and orders.

Leave a Comment