The Happiest Emoticons ~ Hot Damn, Data!

Wednesday, September 11, 2013

The Happiest Emoticons

Clearly, a :) is happier than a :( but what about a :-* and a :-D ? Or a :-| and a :-o ? In this post I attempt to rank emoticons in order of how happy someone has to be to use each one. (I also punctuate horribly, to avoid mixing punctuation with the emoticons.)

To start off, I need a collection of emoticons associated with some text. And where else would I find this but in that gigantic compendium of everyday emotions, the definitive corpus of our age - Twitter.
If you'd rather read the code yourself, it's available here, but the methodology is this: I collect lots of tweets containing emoticons, assign each tweet a 'sentiment' score[1], and then order the emoticons by the average sentiment score of the tweets containing each one.
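The ordering step can be sketched roughly as follows. This is a minimal illustration, not the code from the repo: the function name `rank_emoticons` and the toy scores are made up for the example.

```python
from collections import defaultdict
from statistics import mean

def rank_emoticons(scored_tweets):
    """Order emoticons by the mean score of the tweets that contain them.

    scored_tweets: iterable of (emoticons_in_tweet, sentiment_score) pairs.
    """
    scores = defaultdict(list)
    for emoticons, score in scored_tweets:
        # set() so a tweet counts once per emoticon, even if it repeats it
        for emoticon in set(emoticons):
            scores[emoticon].append(score)
    averages = {e: mean(s) for e, s in scores.items()}
    # Happiest (highest mean score) first
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Made-up toy data, purely for illustration
sample = [
    ([":)"], 3),
    ([":)", ":-D"], 5),
    ([":("], -2),
    ([":("], -4),
    ([":-D"], 4),
]
ranking = rank_emoticons(sample)
print(ranking)
```

A tweet containing several emoticons contributes its score to each of them, which is one reasonable reading of "tweets containing each emoticon".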

The tweet gathering process is fairly direct. I parse tweets obtained from the streaming API[2] which contain any of a set of predefined emoticons and write them out to a file. If you want to, have a look at the Python code here. For the purpose of the R analysis, the tweet texts are already in a file. Each line is then (a) parsed for the emoticons it contains, and (b) assigned a sentiment score[3].
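Steps (a) and (b) might look something like the sketch below. It is written in Python rather than R, and everything in it is an assumption made for illustration: the emoticon subset, the tiny `LEXICON` with its word scores, and the use of plain substring matching. The real analysis lives in the linked repo.

```python
import re

# Illustrative subset of the predefined emoticons (assumption)
EMOTICONS = [":-)", ":)", ":-(", ":(", ":-D", ":-*", "o.O"]

# Hypothetical word scores; a real lexicon would have thousands of entries
LEXICON = {"good": 3, "fun": 4, "happy": 3, "sad": -2, "hate": -3}

def find_emoticons(text):
    """(a) Return the emoticons that occur anywhere in the tweet text."""
    return [e for e in EMOTICONS if e in text]

def sentiment_score(text):
    """(b) Sum the lexicon scores of the words in the tweet.

    Words missing from the lexicon contribute 0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

tweet = "Had a good day, so much fun :-)"
found = find_emoticons(tweet)
score = sentiment_score(tweet)
print(found, score)
```

Because the score is a plain sum over words, a tweet with many positive words and a sad ending can still come out positive.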

Finally, we plot each tweet on an emoticon-score plot. Like so:

[Original image on imgur]

The tiny vertical black lines mark the mean score for each emoticon.

There is no ordering to the colour scale. The colours just help differentiate each row.

The complete data collection, analysis and plotting code can be found on this GitHub repo.

Okay, so here's a list of observations and (partial) explanations for some of the surprises:

o.O and :* score higher than :-)
I think the ubiquity of :-) is its burden. People feel :-) for all sorts of reasons. Also, the score for o.O is computed over a much smaller number of tweets, and is possibly unstable.
I can understand people using :-) for sad stuff, but what kind of a person uses :-( for happy tweets? (There aren't many of these, but a couple of them are too far to the right to ignore.) Let's look at one of those tweets:

Wow I was sleeping sooooo good which doesn't happen very often & They called from work & woke me up .. Now I can't go back to sleep :-(
That makes sense. It's a tweet that turned sour halfway through but, overall, had a pretty high density of positive words, so it's no surprise that our scorer tagged it with a positive score.
Here's a tweet with an 8D in it:

Got to take a pic with heage ! Who has by far been the most fun, funny and candid lecturer(in my... http://t.co/WaOTW8D2YO

Notice anything funny? It's a happy tweet, but the emoticon we were looking for is conspicuously absent! Actually, the 8D does occur in the tweet, albeit inside a URL: http://t.co/WaOTW8D2YO
Thanks to Twitter's automatic URL shortening via t.co, it's entirely possible for a tweet to contain an arbitrary string of alphanumeric characters that carries no semantic information. So be wary of the scores for stuff like 8D and xD.
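One possible way to guard against this, sketched here as a suggestion and not something the analysis above does, is to strip URLs before looking for emoticons. The helper name `emoticons_outside_urls` and the URL pattern are assumptions for illustration.

```python
import re

# Anything starting with http:// or https:// up to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")

def emoticons_outside_urls(text, emoticons):
    """Return the emoticons found in the text once URLs are removed."""
    cleaned = URL_PATTERN.sub(" ", text)
    return [e for e in emoticons if e in cleaned]

tweet = ("Got to take a pic with heage ! Who has by far been the most fun, "
         "funny and candid lecturer(in my... http://t.co/WaOTW8D2YO")

print(emoticons_outside_urls(tweet, ["8D", "xD"]))          # URL match is gone
print(emoticons_outside_urls("so funny 8D", ["8D", "xD"]))  # real 8D survives
```

This only handles links; letter-based emoticons like xD could still match inside ordinary words, so their scores deserve the same caution.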

So the next time you can't tell what someone is trying to convey with an emoticon, this chart might come in handy as a reference. In the meantime, if you're happy and you know it, contort your pupils o.O

Notes
[1] A linear scale where positive is happy, negative is unhappy
[2] Twitter's Search API handles punctuation poorly, so that's not an option
[3] Assignment of this score is done via a relatively simple lookup mechanism. This file provides a good evaluation

All code used has been made available on GitHub.

8 comments:

Mayur Mama, September 11, 2013 at 11:54 PM
Hmm...such a high score for o.O compared to o_O and O_o is quite interesting especially when they are used interchangeably. I'm guessing a sentence/word ending in 'o' and the next starting with 'O'(leading to a 'o.O') will also contribute to this. If not English, maybe Italian or something?
Evan, September 12, 2013 at 11:19 AM
You might lose some genuine emoticons (it's not like there's an absence of data in twitter), but could you refine the search so that you're only looking for emoticons that have a space in front? E.g. " 8D" Unless the emoticon is at the beginning of the tweet, this should remove the emoticons within urls or other strange non-emotional occurrences.
Rasmus, September 12, 2013 at 2:32 PM
And you could conclude that there are way more positive tweets in the world than negative. o.O (Unless most of the negative tweets are hidden in negative emoticons not included.)
bbischof, October 1, 2013 at 10:58 AM
Any references on your sentiment visualization? It is really great.