Search Engine Spam Definitions and Detection
Papers on defining, identifying, and combating search engine spam,
including published work applicable to the Google search engine,
particularly papers hosted on the Stanford University document server.
Search Engine and Link Spam Detection
Web Spam Taxonomy
Stanford paper authored by Z. Gyongyi and Hector Garcia-Molina
"Web spamming refers to actions intended to mislead search
engines and give some pages higher ranking than they deserve.
Recently, the amount of web spam has increased dramatically, leading
to a degradation of search results. This paper presents a comprehensive
taxonomy of current spamming techniques, which we believe can
help in developing appropriate countermeasures."
Link Spam Alliances
Stanford University technical paper published May 30, 2005, authored
by Zoltan Gyongyi and Hector Garcia-Molina
"Link spam is used to increase the ranking of certain
target web pages by misleading the connectivity-based ranking
algorithms in search engines. In this paper we study how web pages
can be interconnected in a spam farm in order to optimize rankings.
We also study alliances, that is, interconnections of spam farms.
Our results identify the optimal structures and quantify the potential
gains. In particular, we show that alliances can be synergistic
and improve the rankings of all participants. We believe that
the insights we gain will be useful in identifying and combating
link spam."
Combating Web Spam with TrustRank
Stanford paper authored by Z. Gyongyi, J. Pedersen and Hector Garcia-Molina
"Web spam pages use various techniques to achieve higher-than-deserved
rankings in a search engine's results. While human experts can
identify spam, it is too expensive to manually evaluate a large
number of pages. Instead, we propose techniques to semi-automatically
separate reputable, good pages from spam."
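The abstract above does not spell out the method, so the following is only a rough sketch of the general idea usually associated with TrustRank: start from a small set of hand-reviewed "good" seed pages and let trust flow along outgoing links, in the manner of a PageRank computation whose random jumps land only on the seeds. The function name propagate_trust, the damping value, the iteration count, and the toy link graph are all assumptions made for illustration and are not taken from the paper.

```python
def propagate_trust(links, seeds, damping=0.85, iterations=50):
    """Spread trust from seed pages along links (illustrative sketch only).

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages a human reviewer has judged reputable.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {q for targets in links.values() for q in targets}

    # Jump distribution: all mass sits on the hand-reviewed seeds.
    seed_mass = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}

    trust = dict(seed_mass)
    for _ in range(iterations):
        # Each round, a fixed share of trust is re-injected at the seeds.
        nxt = {p: (1.0 - damping) * seed_mass[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if not targets:
                continue  # page with no out-links passes no trust on
            share = damping * trust[p] / len(targets)
            for q in targets:
                nxt[q] += share
        trust = nxt
    return trust


# Hypothetical toy web: three reputable pages and an isolated two-page spam farm.
web = {
    "seed": ["good1", "good2"],
    "good1": ["good2"],
    "good2": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = propagate_trust(web, seeds={"seed"})
```

In this toy graph the spam pages link only to each other and receive no links from trusted pages, so their score stays at zero while the pages reachable from the seed accumulate trust. Real spam pages often do attract some links from good pages, so a working system would need a threshold or further signals rather than a strict zero test.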