Tuesday, November 27, 2012

Thoughts on Generalizability




Although the PageRank algorithm works well, it cannot handle the following case: a malicious website stuffs its pages with many popular keywords so that it ranks highly when users search with multiple keywords.
To measure this kind of situation, I use generalizability (概括度, roughly "degree of generality") to evaluate the results. A website with high generalizability tends to be a meaningless one.

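Definition:
Gene(w) = num(keywords) / con(w)   (a)
1 num means "number of", so num(keywords) is the number of keywords on website w
2 con means "content", so con(w) is the amount of content on website w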

If a website has more content and fewer keywords, its Gene is lower, and it tends to cover a specific topic in more detail.
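As a rough illustration of equation (a), the sketch below assumes that num(keywords) is the number of distinct keywords from a given list that appear on the page, and that con(w) is the page's word count. These two measures, the function name `gene_count`, the keyword list and the sample pages are all hypothetical choices for illustration, not part of the original definition.

```python
import re


def gene_count(page_text: str, keywords: set[str]) -> float:
    """Equation (a): Gene(w) = num(keywords) / con(w).

    Hypothetical measures, assumed for illustration only:
    - num(keywords): number of distinct keywords that occur on the page
    - con(w): total number of words on the page
    """
    words = re.findall(r"\w+", page_text.lower())
    if not words:
        return 0.0  # an empty page has no content to measure
    found = {w for w in words if w in keywords}
    return len(found) / len(words)


# Hypothetical list of popular keywords and two sample pages.
popular = {"music", "movie", "game", "download", "free", "news"}
spam_page = "free music movie game download news free music"
article = ("This article explains how to tune a guitar before recording music "
           "at home, covering standard tuning and common mistakes.")

print(gene_count(spam_page, popular))            # 0.75
print(round(gene_count(article, popular), 3))    # 0.053
```

The keyword-stuffed page scores far higher than the focused article, which is the behavior the definition is meant to capture.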

Extension:
The storage space of the keywords is directly proportional to num(keywords), so equation (a) can be rewritten as:
Gene(w) = Space(keywords) / con(w)   (b)

The storage space of the website is directly proportional to con(w), so equation (b) can be rewritten as:
Gene(w) = Space(keywords) / Space(w)   (c)
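To make the step from (a) to (c) explicit, assume the proportionality constants are the same for every website: on average k bytes per keyword and c bytes per unit of content for a website w (k and c are introduced here only for illustration). Then each rewrite just rescales Gene by a constant:

```latex
\mathrm{Space}(\mathit{keywords}) = k \cdot \mathrm{num}(\mathit{keywords}),
\qquad \mathrm{Space}(w) = c \cdot \mathrm{con}(w)

\mathrm{Gene}_{(c)}(w)
  = \frac{\mathrm{Space}(\mathit{keywords})}{\mathrm{Space}(w)}
  = \frac{k \cdot \mathrm{num}(\mathit{keywords})}{c \cdot \mathrm{con}(w)}
  = \frac{k}{c}\,\mathrm{Gene}_{(a)}(w)
```

Because k/c is positive and identical for all websites under this assumption, sorting websites by Gene from (c) gives the same order as sorting by Gene from (a).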


OK, with equation (c), Gene is easy to work out:
Step 1: collect statistics on the keywords used in different fields.
Step 2: compute Gene by dividing the storage space of the keywords on this website by the storage space of the whole website.
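The two steps might be sketched as follows, under several assumptions: "storage space" is taken to mean UTF-8 byte length, Step 1 keeps the `top_n` most frequent words of each field's sample documents, and only distinct keywords found on the page are counted. The names `build_keyword_stats` and `gene_space`, the sample corpus and the pages are hypothetical; a real system would build the statistics from a large crawl and would likely filter stop words.

```python
import re
from collections import Counter


def build_keyword_stats(corpus: dict[str, list[str]], top_n: int = 2) -> dict[str, set[str]]:
    """Step 1 (hypothetical): keep the top_n most frequent words of each field."""
    stats = {}
    for field, docs in corpus.items():
        counts = Counter(w for doc in docs for w in re.findall(r"\w+", doc.lower()))
        stats[field] = {w for w, _ in counts.most_common(top_n)}
    return stats


def gene_space(page_text: str, field_keywords: dict[str, set[str]]) -> float:
    """Step 2, equation (c): Space(keywords) / Space(w), measured in UTF-8 bytes."""
    page_bytes = len(page_text.encode("utf-8"))
    if page_bytes == 0:
        return 0.0
    all_keywords = set().union(*field_keywords.values())
    # Distinct keywords that appear on the page (an assumption, see lead-in).
    found = {w for w in re.findall(r"\w+", page_text.lower()) if w in all_keywords}
    keyword_bytes = sum(len(w.encode("utf-8")) for w in found)
    return keyword_bytes / page_bytes


# Hypothetical sample documents for two fields.
corpus = {
    "music": ["music download free", "free music songs", "music charts"],
    "movies": ["movie trailers", "free movie download", "movie reviews", "free movie"],
}
stats = build_keyword_stats(corpus, top_n=2)
spam = "free music movie"
article = "A short review of the new movie and its soundtrack."

print(round(gene_space(spam, stats), 3))     # 0.875
print(round(gene_space(article, stats), 3))  # 0.098
```

Using byte lengths follows the idea behind (c): the ratio can be computed from sizes alone, without deciding how to measure "content" directly.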

1 comment:

  1. Fine, I'm the first to comment on this post, lol... Although your blog is really hard for me to understand since I am poor at math and formulas, it showed me a way to measure the situation you refer to.
