Although the PageRank algorithm works quite well,
it cannot handle one case: a malicious website stuffs its pages with
a lot of popular keywords in order to get a high priority when users search with multiple keywords.
To measure this kind of situation, I use generalizability (the degree of generalization) to evaluate the results. A website with high
generalizability tends to be a meaningless one.
Definition:
Gene(web)=num(keywords)/con(w) (a)
1 num means the number of keywords
2 con means the amount of content of the website w
If a website has more content and fewer
keywords, its Gene is lower and it tends to have more details on a specific topic.
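As a rough illustration of equation (a), here is a minimal Python sketch. It rests on assumptions that are not in the definition above: num(keywords) is taken as the number of distinct popular keywords found on the page, con(w) is taken as the page's word count, and `popular_keywords` is a hypothetical list that would have to come from somewhere else.

```python
import re


def gene(page_text, popular_keywords):
    """Equation (a): num(keywords) / con(w).

    Assumptions for this sketch: keywords are counted once each
    (distinct), and content is measured as the number of words.
    """
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    if not words:
        return 0.0  # an empty page has no keywords to count
    found = {w for w in words if w in popular_keywords}
    return len(found) / len(words)


# Hypothetical keyword list and two made-up pages.
popular = {"cheap", "flights", "hotels", "loans", "insurance"}
spam_page = "cheap flights hotels loans insurance"
article = ("this article explains how cheap flights are priced "
           "by airlines during peak season")

print(gene(spam_page, popular))  # every word is a keyword
print(gene(article, popular))    # only a small share of words are keywords
```

Under these assumptions the keyword-stuffed page scores much higher than the article, which is the behaviour the definition is after.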
Extension:
The storage space of the keywords is directly
proportional to num(keywords), so up to a constant factor equation (a) can be written as:
Gene(web)=Space(keywords)/con(w) (b)
The storage space of the website is directly
proportional to con(w), so up to a constant factor equation (b) can be written as:
Gene(web)=Space(keywords)/Space(web) (c)
OK, now we can easily work out Gene.
Step 1, compile statistics on the keywords in
different fields.
Step 2, get Gene by dividing the
space of the keywords on this website by the space of the whole website.
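The two steps can be sketched in Python as follows. This is only one possible reading of equation (c): space is measured in UTF-8 bytes, each keyword found on the page is counted once, and the per-field keyword table `keywords_by_field` is a hypothetical stand-in for the statistics from Step 1.

```python
import re


def build_keyword_stats(keywords_by_field):
    """Step 1: merge the keyword lists of all fields into one lookup set."""
    merged = set()
    for words in keywords_by_field.values():
        merged.update(w.lower() for w in words)
    return merged


def gene_by_space(page_text, keyword_set):
    """Step 2: Space(keywords) / Space(web), both in UTF-8 bytes."""
    page_bytes = len(page_text.encode("utf-8"))
    if page_bytes == 0:
        return 0.0  # an empty page has nothing to measure
    tokens = set(re.findall(r"\w+", page_text.lower()))
    found = tokens & keyword_set  # distinct keywords present on the page
    keyword_bytes = sum(len(k.encode("utf-8")) for k in found)
    return keyword_bytes / page_bytes


# Hypothetical per-field keyword statistics and two made-up pages.
fields = {
    "travel": ["flights", "hotels"],
    "finance": ["loans", "insurance"],
}
keyword_set = build_keyword_stats(fields)

stuffed = "flights hotels loans insurance"
focused = ("Our guide compares hotels near the old town and explains "
           "local transport options in detail.")

print(gene_by_space(stuffed, keyword_set))
print(gene_by_space(focused, keyword_set))
```

Measuring in bytes keeps the ratio between 0 and 1, so pages of very different sizes can be compared directly. How the ratio should be turned into a ranking penalty is left open here.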
Fine, I get the first comment on this blog. lol... Although your blog is really hard for me to understand since I am poor at math and formulas, it showed me a way to measure the situation you refer to.