I am currently in Toronto presenting our new paper titled “On the Robustness of Google Scholar against Spam” at Hypertext 2010. The paper describes experiments we conducted on Google Scholar to find out how reliable its citation data is. The paper will soon be available for download on our publication page, but for now I am posting a pre-print version here on the blog:
In this research-in-progress paper we present the current results of several experiments in which we analyzed whether spamming Google Scholar is possible. Our results show that it is. We ‘improved’ the ranking of articles by manipulating their citation counts, and we made articles appear in searches for keywords they did not originally contain by placing invisible text in modified versions of the articles.
Researchers should have an interest in having their articles indexed by Google Scholar and other academic search engines such as CiteSeer(X). Being included in the index makes their articles more accessible to the academic community. However, authors should be concerned not only with whether their articles are indexed but also with where they appear in the result list. As with all ranked search results, articles displayed in top positions are more likely to be read.
In recent studies we analyzed the ranking algorithm of Google Scholar [1-3] and advised researchers on how to optimize their scholarly literature for Google Scholar. However, there are reservations in the academic community about what we called “Academic Search Engine Optimization”. The concern is that some researchers might use their knowledge of ranking algorithms to ‘over-optimize’ their papers and push their articles’ rankings up by illegitimate means.