Sunday, August 14, 2005

TrustRank and trust.

Try the following search on a search engine:

THE PROTOCOLS OF THE LEARNED ELDERS OF ZION

Who decides which pages are false and which are true?

http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=2004-17&format=pdf&compression=

http://advogato.org/trust-metric.html

Objective search is not an easy game. When you use mathematical algorithms, you must know what they say and what they don't say, and you must know their limitations. That many people vote for the same person does not imply that he is more trustworthy; it only shows that he got a lot of votes. You are not right just because most people agree with you.

"Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper."
http://www-db.stanford.edu/~backrub/google.html
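The quoted formula can be tried out with a short iteration. This is a minimal Python sketch, not Google's implementation: the function name pagerank, the dictionary-of-lists link format, the fixed number of iterations and the choice to let pages without outgoing links pass nothing on are all assumptions made for illustration. Note that with (1-d) exactly as written, the values add up to the number of pages rather than to one whenever every page has at least one outgoing link; using (1-d)/N instead, for N pages, gives the probability-distribution form.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links maps each page to the list of distinct pages it links to,
    so C(T) is len(links[T]).
    """
    # Every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # assumption: a page with no out-links passes nothing on
            share = d * pr[source] / len(targets)  # d * PR(T)/C(T)
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

In this tiny web, C receives links from both A and B and ends up with the highest score, B (linked only from A, which splits its vote in two) ends up lowest, and the three scores add up to 3.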

A natural modification is to give each incoming link its own weight:

PR(A) = (1-d) + d1*PR(T1)/C(T1) + ... + dn*PR(Tn)/C(Tn)

where d1+d2+ ... + dn = d.
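A single update step with made-up numbers shows how the weights change the result. Assume, hypothetically, that page A has two in-links: T1 with PR(T1) = 1 and C(T1) = 2, and T2 with PR(T2) = 1.2 and C(T2) = 1, and that d = 0.85. The last two lines are two different weightings that both satisfy d1 + d2 = d:

```latex
\begin{aligned}
\text{original:}\quad PR(A) &= 0.15 + 0.85\left(\tfrac{1}{2} + \tfrac{1.2}{1}\right) = 0.15 + 1.445 = 1.595\\
d_1 = d_2 = 0.425:\quad PR(A) &= 0.15 + 0.425\cdot\tfrac{1}{2} + 0.425\cdot\tfrac{1.2}{1} = 0.15 + 0.2125 + 0.51 = 0.8725\\
d_1 = 0.85,\ d_2 = 0:\quad PR(A) &= 0.15 + 0.85\cdot\tfrac{1}{2} + 0 = 0.575
\end{aligned}
```

With two in-links, splitting d equally halves each contribution compared with the original formula, which gives every in-link the full factor d; putting all of d on T1 removes T2's influence entirely.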

If you have a reliable reference against which other sites can be measured, human reviewers, or perhaps AI, could set di = 0 whenever page Ti is false. I never promised you a rose garden.
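The weighted formula and the idea of zeroing false pages can be sketched in Python as well. Which pages are false is simply passed in as a set here; how that set is obtained (human review, AI, a trusted reference) is exactly the hard part the text leaves open. The equal split of d over the trusted in-links is only one way to satisfy d1 + ... + dn = d and is an assumption of this sketch, as are the function name weighted_pagerank and the example pages.

```python
def weighted_pagerank(links, false_pages=frozenset(), d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d1*PR(T1)/C(T1) + ... + dn*PR(Tn)/C(Tn).

    Hypothetical choice of weights: for each page A, the total d is split
    equally over the in-links coming from pages not in false_pages, and a
    link from a page in false_pages gets di = 0. The di for A therefore
    add up to d whenever A has at least one trusted in-link.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # In-links T1..Tn of every page A.
    inlinks = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            inlinks[target].append(source)

    # Weight di of each link (Ti -> A); a trusted source guarantees len(trusted) >= 1.
    weight = {}
    for page, sources in inlinks.items():
        trusted = [s for s in sources if s not in false_pages]
        for s in sources:
            weight[(s, page)] = 0.0 if s in false_pages else d / len(trusted)

    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        pr = {
            page: (1.0 - d)
            + sum(weight[(s, page)] * pr[s] / len(links[s]) for s in inlinks[page])
            for page in pages
        }
    return pr


# Hypothetical web: F is the only page linking to T.
web = {"A": ["B"], "B": ["A"], "F": ["T"]}
before = weighted_pagerank(web)
after = weighted_pagerank(web, false_pages={"F"})
print(round(before["T"], 4), round(after["T"], 4))
```

Because T's only in-link comes from F, flagging F drops T from 0.15 + 0.85 × 0.15 = 0.2775 to the floor value 1 - d = 0.15, while A and B are unaffected. Since each di is at most d, the iteration still converges, just as the unweighted version does.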

Related links:
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SrchEngCriteria.pdf

Kjell Gunnar Bleivik