Do you use Google Scholar? For finding scientific literature? For obtaining citation counts and publication lists of researchers? Have you ever thought about how trustworthy the information you get from Google Scholar is?

My colleague and I ran several tests on Google Scholar and found that it is really easy to fool. You can easily inflate the citation counts of articles and thereby improve their rankings. You can easily add invisible keywords to articles so that they appear relevant to searches they have nothing to do with. You can also create completely nonsensical articles with the paper generator SciGen and get Google Scholar to index them. And you can place any kind of advertisement in manipulated articles and get Google Scholar users to download them.

Of course, our results do not mean that you cannot trust Google Scholar at all or that you shouldn't use it. Despite our results, I use Google Scholar frequently; IMHO it is still the best academic search engine on the market. However, as with all other search engines, you should be aware that there might be spam and manipulated information, and you should be really careful when using citation counts from Google Scholar. Maybe there are few or no manipulations right now. But the more Google Scholar's citation counts are used for performance evaluations, the greater the incentive for researchers to manipulate them (and, as I said, it's really easy).

What I am interested in now is: what's your opinion on this subject? Have you ever found anything suspicious on Google Scholar? Please let me know.

If you would like more details, read the full article, titled “Academic Search Engine Spam and Google Scholar’s Resilience Against it”, here.

Update 2010/12/31:

We got a few questions about when we did the experiments on Google Scholar (unfortunately, we didn’t state that in the paper). The answer: between early 2009 and mid-2009. We first submitted the paper to WWW2010 in November 2009, but it was rejected. Well, and then it took… many, many months (and edits) before the Journal of Electronic Publishing finally accepted and published the paper :-).

