Extracting Useful Data from Twitter for Methodological Evaluation – Part I

Posted By mcarter on Mar 7, 2013

A facial reconstruction of King Richard III, based on an analysis of his recently identified remains and artist portrayals over the years, was unveiled by an eponymous historical society on Tuesday. (Rex Features / AP Images) @SmithsonianMag on Twitter

There has been a lot of talk recently about “Digital Archaeology” being “Public Archaeology”. As part of my Methods class this semester, I wanted to put that assumption to the test, so I decided to analyse Twitter feeds on one specific subject against possible uptake on other subjects. In the last 365 days, the major archaeological event has been the discovery and, more importantly, the confirmation of Richard III’s bones, buried under a Council parking lot in Leicester. The story is ripe for public engagement, especially since the Bard skewered Richard III in Tudor times to such an extent that he is readily seen as a dastardly villain today!

Extracting meaningful data from any source is always a challenge. Twitter data calls for what Driscoll et al. (2007) describe as mixed methods research: both qualitative and quantitative material, sometimes extracted in meaningful chunks. Several tools exist for this type of analysis, but I found Topsy.com to be a great tool for first-time Twitter data extractors like myself. It is as easy as typing in a Twitter hashtag or a subject heading, and the system generates a report on the quantity and quality of Twitter results for that subject.


A scan run today, March 7, 2013, covering the last 365 days shows that the hashtag #richardiii has had 96,516 Tweets (as seen above in the chart generated in Topsy). Over that time, as displayed in the graph, two major events accelerated and promoted Twitter engagement around the topic of Richard III.

RichardIII 365 Twitter Analysis

They occur roughly at Sept 12, 2012, when archaeologists announced they had discovered what they believed to be Richard III’s bones, and on Feb 4, 2013, when the University of Leicester confirmed that DNA testing and physical analysis of the bones, supported qualitatively by oral histories and written accounts, had identified the remains as Richard III.

Digging further down, on Sept 12 there were 1,565 #richardiii Tweets. Topsy can return the top Tweets and their content for that day, which reveals that the largest number of Tweets came from a journalist from BBC History Magazine who was supposedly embedded with the archaeologists at the moment of discovery. The second top set of Tweets came from Medievalists.net, which was tweeting news from the Richard III Society about the success of finding Richard III.

Feb4 TopTweets

In comparison, on Feb 4, when Richard III’s bones were confirmed, there were 66,696 Tweets. The top tweet, with over 1,000 tweets and retweets, came from the Twitter handle @queen_uk (Elizabeth Windsor), whose message was: “Don’t even think about putting one under a car park in Slough”, a tongue-in-cheek reference to the industrial town just north of Windsor. The second highest was from BBC Breaking News, reporting that the Mayor of Leicester had announced that Richard III was going to be reinterred at Leicester Cathedral (which, as we will see in next week’s blog, has some interesting elements of its own). The last three top tweets were from the BBC and The Guardian, reporting on official archaeological and scientific information.
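To put the two peak days in proportion, a short Python sketch can express each day's count as a share of the year's total. It uses only the three figures quoted in this post, and it assumes that the daily counts are subsets of the 96,516 yearly #richardiii total, which Topsy's report does not state explicitly. The function names are my own and are not part of any Topsy tooling.

```python
# Counts quoted in this post (Topsy scan run on 7 March 2013).
TOTAL_TWEETS_365_DAYS = 96_516
PEAK_DAYS = {
    "2012-09-12": 1_565,   # bones first reported found
    "2013-02-04": 66_696,  # identification confirmed
}


def share_of_total(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


def peak_shares(peaks: dict[str, int], total: int) -> dict[str, float]:
    """Map each peak day to its percentage of the total, to one decimal place."""
    return {day: round(share_of_total(n, total), 1) for day, n in peaks.items()}


if __name__ == "__main__":
    for day, pct in peak_shares(PEAK_DAYS, TOTAL_TWEETS_365_DAYS).items():
        print(f"{day}: {pct}% of the year's #richardiii tweets")
```

Under that assumption, the confirmation day alone accounts for roughly 69% of the year's tweets, against under 2% for the initial discovery.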

Now, you’ll notice that the Top Tweets seem to be skewed in the Topsy screenshot above. This is because Elizabeth Windsor, with over 1,000 tweets and retweets, actually had the fastest uptake amongst other Twitter users. BBC Breaking News, with over 4K tweets and retweets, had the largest volume, yet the news spread more slowly than the joke.
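The difference between ranking by volume and ranking by speed of uptake can be sketched in a few lines of Python. This is only one possible reading of the behaviour described here: Topsy's actual ranking formula is not documented in this post, and the `hours_elapsed` values below are hypothetical placeholders chosen for illustration, not measurements. Only the share counts (roughly 1,000 and 4,000) come from the screenshot.

```python
from dataclasses import dataclass


@dataclass
class TweetStats:
    author: str
    shares: int           # tweets plus retweets
    hours_elapsed: float  # time taken to collect those shares

    @property
    def shares_per_hour(self) -> float:
        return self.shares / self.hours_elapsed


def rank_by_volume(tweets: list[TweetStats]) -> list[TweetStats]:
    """Order tweets by total shares, largest first."""
    return sorted(tweets, key=lambda t: t.shares, reverse=True)


def rank_by_velocity(tweets: list[TweetStats]) -> list[TweetStats]:
    """Order tweets by shares per hour, fastest first."""
    return sorted(tweets, key=lambda t: t.shares_per_hour, reverse=True)


# Share counts are approximate figures from the post;
# hours_elapsed values are HYPOTHETICAL, for illustration only.
SAMPLE = [
    TweetStats("@queen_uk", shares=1_000, hours_elapsed=2.0),
    TweetStats("BBC Breaking News", shares=4_000, hours_elapsed=12.0),
]

if __name__ == "__main__":
    print("By volume:  ", [t.author for t in rank_by_volume(SAMPLE)])
    print("By velocity:", [t.author for t in rank_by_velocity(SAMPLE)])
```

With these placeholder timings the two orderings disagree, which is the kind of skew visible in the Top Tweets list.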

Topsy can also break the tweets down into quantitative data. Beyond simple counts, and in the sense Driscoll et al. (2007) discuss, Topsy can also quantitize, albeit with mixed results, the qualitative nature of a tweet into the Twitter-industry-accepted categories of Positive, Neutral and Negative sentiment. That is, it estimates the emotional value of the Tweet as written by the Tweeter, using sentiment analysis, a form of natural language processing (NLP).
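To give a feel for what quantitizing a tweet into Positive, Neutral or Negative can look like, here is a minimal lexicon-based sketch in Python. It is not Topsy's method, which is not described in this post; the word lists are hypothetical and far too small for real use, and the second and third sample sentences are invented for illustration.

```python
import re

# HYPOTHETICAL word lists, for illustration only. Real sentiment tools
# rely on much larger lexicons or on trained models.
POSITIVE_WORDS = {"great", "amazing", "success", "exciting", "wonderful"}
NEGATIVE_WORDS = {"villain", "terrible", "awful", "sad", "boring"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text: str) -> str:
    """Label text by counting positive words minus negative words."""
    words = tokenize(text)
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


if __name__ == "__main__":
    samples = [
        # Quoted in this post (top tweet on Feb 4).
        "Don't even think about putting one under a car park in Slough",
        # Invented sample sentences.
        "What a great success for archaeology",
        "Richard III was a terrible villain",
    ]
    for text in samples:
        print(f"{sentiment(text):8} | {text}")
```

A word-counting approach like this labels the @queen_uk joke as Neutral because none of its words appear in either list, which hints at why automated sentiment scoring gives mixed results on humour and irony.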

In Part II of our exploration of mixed methods research using Twitter analysis next week, I will explore some of the issues around data generation in Twitter, and specifically in Topsy. I will also see whether my assumption holds from a Twitter perspective: that when an archaeological event like the discovery of Richard III happens, people become more publicly engaged in archaeology overall.




Driscoll, D.L., Appiah-Yeboah, A., Salib, P. and Rupert, D.J., 2007. Merging Qualitative and Quantitative Data in Mixed Methods Research: How To and Why Not. Ecological and Environmental Anthropology 3(1): 19-28.
