<div><b>Hot Damn, Data!</b> by <a href="http://www.blogger.com/profile/05553017788343435698">Eeshan Malhotra</a></div>
<div><b>xkcd: A webcomic of the internet, small talk, meta discussions, and whimsical phantasmagoria</b> (posted 2013-10-25, last updated 2014-03-22)</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
I've recently rediscovered my affection for <a href="http://www.xkcd.com/" target="_blank">xkcd</a> [1], and what better way to show it than to perform a data analysis on the comic's archives? In this post, we use Latent Dirichlet Allocation (LDA) to mine topics from xkcd strips, and see if it lives up to its tagline of "A webcomic of romance, sarcasm, math, and language".<br />
<br />
The first thing to realize is that this problem is intrinsically different from classifying documents into topics, because the topics are not known beforehand (this is also, in a way, what the 'latent' in 'latent Dirichlet allocation' means). We want to simultaneously solve the problems of discovering topic groups in our data and assigning documents to topics (the assignment metaphor isn't exact, and we'll see why in just a sec).<br />
<br />
A conventional approach to grouping documents into topics might be to cluster them using some features, and call each cluster one topic. LDA goes a step further: it allows a document to arise from a combination of topics. So, for example, <a href="http://www.xkcd.com/162/" target="_blank">comic 162</a> might be classified as 0.6 physics, 0.4 romance.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://imgs.xkcd.com/comics/angular_momentum.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="203" src="http://imgs.xkcd.com/comics/angular_momentum.jpg" width="320" /></a></div>
<br />
<b>The Data and the Features</b><br />
Processing and interpreting the contents of the images would be a formidable task, if not an impossible one. So for now, we're going to stick to the text of the comics. This is good not only because text is easier to parse, but also because it probably contains the bulk of the information. Accordingly, I <a href="https://github.com/OhLookCake/xkcd-Topics/blob/master/scripts/getTranscripts.sh" target="_blank">scraped the transcripts</a> for xkcd comics, an easy enough task from the command line. (Yes, they are crowd-transcribed! You can find a load of webcomics transcribed at <a href="http://www.ohnorobot.com/" target="_blank">OhnoRobot</a>, but Randall Munroe has conveniently put them in the source of the xkcd comic page itself.)<br />
<br />
Cleaning up the text required a number of judgement calls, and I usually went with whatever was simplest. I explain these in comments <a href="https://github.com/OhLookCake/xkcd-Topics/blob/master/scripts/topicExtraction.R" target="_blank">in the code</a>; feel free to alter it and do this in a different way.<br />
<br />
Finally, the transcripts are converted into a bag of words, exactly the kind of input LDA works with. <a href="https://github.com/OhLookCake/xkcd-Topics" target="_blank">The code is shared via GitHub</a>.<br />
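<div>A minimal sketch of the transcript-to-bag-of-words step, in Python rather than the R used in the repository. The tiny stop-word list and the suffix-stripping <code>crude_stem</code> helper are hypothetical simplifications for illustration only; the actual cleaning choices are documented in the repository's code, and a real stemmer handles many more cases.</div>

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real one has hundreds of entries)
STOP_WORDS = {"a", "an", "the", "and", "i", "you", "it", "is", "to", "of"}


def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ers", "er", "s"):
        # Only strip when a reasonably long root remains
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(transcript):
    """Lowercase, keep alphabetic tokens, drop stop words, stem, and count."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


print(bag_of_words("I keep clicking the links. Links, links everywhere!"))
```

<div>Each comic becomes one such word-count table; stacked together, these tables form the document-term matrix that LDA takes as input.</div>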
<br />
<b>What to Expect</b><br />
I'm not going to cover the details of how LDA works (there is an easy-to-understand, layman's explanation <a href="http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/" target="_blank">here</a>, and a rigorous, technical one <a href="http://ai.stanford.edu/~ang/papers/nips01-lda.pdf" target="_blank">here</a>), but I'll tell you what output we're expecting: LDA is a generative modeling technique, and it is going to give us <i>k</i> topics, where each 'topic' is basically a probability distribution over the set of all words in our vocabulary (all words ever seen in the input data). The values indicate the probability of each word being selected if you were trying to generate a random document from the given topic. Alongside the topics, LDA also gives each document a mixture of topic proportions, like the 0.6/0.4 split above.<br />
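<div>To make "generating a random document from a topic" concrete, here is a minimal Python sketch of LDA's generative story. The three-word vocabulary, the two topics and the Dirichlet parameter <code>alpha</code> are made-up values, not anything fitted to xkcd; fitting LDA amounts to running this story in reverse, inferring the topics and per-document mixtures from the observed words.</div>

```python
import random

# Made-up vocabulary and two made-up topics (each a distribution over the vocabulary)
VOCAB = ["click", "link", "love"]
TOPICS = [
    [0.5, 0.45, 0.05],  # an "internet"-like topic
    [0.05, 0.05, 0.9],  # a "romance"-like topic
]


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, rng):
    """LDA's generative story for a single document."""
    # 1. Pick this document's topic mixture, e.g. something like [0.6, 0.4]
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. For each word, first pick a topic according to the mixture...
        z = rng.choices(range(len(TOPICS)), weights=theta)[0]
        # 3. ...then pick a word according to that topic's word distribution
        words.append(rng.choices(VOCAB, weights=TOPICS[z])[0])
    return theta, words


rng = random.Random(42)
theta, words = generate_document(n_words=8, alpha=[0.5, 0.5], rng=rng)
print([round(t, 2) for t in theta], words)
```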
<br />
Each topic can then be interpreted from the words that are assigned the highest probabilities.<br />
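<div>Reading off a topic's top words is just a sort by probability. A minimal sketch, assuming a topic is stored as a word-to-probability dictionary; the probabilities below are invented for illustration:</div>

```python
def top_words(topic, k=10):
    """Return the k most probable words in a topic, most likely first."""
    return sorted(topic, key=topic.get, reverse=True)[:k]


# Invented probabilities, for illustration only
internet_topic = {"click": 0.031, "found": 0.022, "googl": 0.019, "love": 0.001}
print(top_words(internet_topic, k=3))
```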
<div>
<br /></div>
<div>
<div>
<b>The Results</b></div>
<div>
I decided to go for four topics, since that's how many Randall uses to describe xkcd (romance, sarcasm, math, language). Here are the top 10 words from each topic that LDA came up with:</div>
<div>
(Some words are stemmed[2], but the word root is easily interpretable.)</div>
<div>
<br /></div>
<div>
<u>Topic 1</u>: click, found, type, googl, link, internet, star, map, check, twitter. This is clearly about <b>the internet</b>.</div>
<div>
<u>Topic 2</u>: read, comic, line, time, panel, tri, label, busi, date, look. This one's a little fuzzy, but I think it's fair to call it <b>meta discussions</b>.</div>
<div>
<u>Topic 3</u>: yeah, hey, peopl, world, love, sorri, time, stop, run, stuff. No clear topic here. I'm just going to label this <b>small talk</b>.</div>
<div>
<u>Topic 4</u>: blam, art, ghost, spider, observ, aww, kingdom, employe, escap, hitler. A very interesting group; let's call this <b>whimsical phantasmagoria</b>.</div>
<div>
<br /></div>
</div>
<div>
I arbitrarily took the top 10 words from each topic, but it's fair to ask how many words actually 'make' a topic[3]. The plot below shows the probabilities of the top 100 words in each topic, sorted from most to least likely.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/Mcv2qjf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="267" src="http://i.imgur.com/Mcv2qjf.png" width="400" /></a></div>
<div style="text-align: center;">
[<a href="http://imgur.com/a/kqgyZ#0" target="_blank">Link to full size image on imgur</a>]</div>
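<div>One rough way to put a number on how many words "make" a topic is to count how many of its top words are needed to cover a given share of its probability mass. A minimal sketch; the 50% threshold and both toy distributions are arbitrary choices for illustration, not values from this analysis:</div>

```python
def words_to_cover(probs, share=0.5):
    """Number of top words whose probabilities add up to at least `share`."""
    total = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= share:
            return count
    return len(probs)


peaked = [0.4, 0.2, 0.1] + [0.3 / 30] * 30  # a few words dominate
flat = [1 / 33] * 33                          # no word stands out
print(words_to_cover(peaked), words_to_cover(flat))
```

<div>A sharply peaked topic reaches the threshold with a handful of words, while a flat one needs many; the curves in the plot above show the same contrast visually.</div>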
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<div>
Individual comics can also be visualized as combinations of the topics they draw from. Here are a few (each horizontal bar is one comic, and the length of each coloured segment denotes the probability of the comic belonging to that topic):</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/jfmWPer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="432" src="http://i.imgur.com/jfmWPer.png" width="640" /></a></div>
<div>
<div style="text-align: center;">
[<a href="http://imgur.com/a/kqgyZ#1" target="_blank">Link to full size image on imgur</a>]</div>
<div>
<br /></div>
<div>
As expected, most comics draw largely from one or two topics, but a few cases are difficult to assign and go all over the place. </div>
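<div>The "one or two topics" observation can also be checked mechanically, by naming each comic after its largest topic share and flagging comics where no topic dominates. A minimal sketch; the 0.5 threshold and the example mixtures are invented for illustration:</div>

```python
TOPIC_NAMES = ["the internet", "meta discussions", "small talk", "whimsical phantasmagoria"]


def describe(mixture, threshold=0.5):
    """Name a comic's dominant topic, or call it 'mixed' if no topic reaches the threshold."""
    best = max(range(len(mixture)), key=lambda i: mixture[i])
    if mixture[best] >= threshold:
        return TOPIC_NAMES[best]
    return "mixed"


# Invented topic mixtures, one per comic, in the same order as TOPIC_NAMES
print(describe([0.1, 0.7, 0.1, 0.1]))    # mostly one topic
print(describe([0.3, 0.25, 0.25, 0.2]))  # spread across several topics
```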
<div>
<br /></div>
<div>
<b>So, what does this mean? </b><br />
Well- I-</div>
<div>
I actually don't have anything to say, so I'm just going to leave this here:</div>
<div>
<a href="http://imgs.xkcd.com/comics/philosophy.png" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" height="192" src="http://imgs.xkcd.com/comics/philosophy.png" width="640" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<a href="https://github.com/OhLookCake/xkcd-Topics" target="_blank">All code used in the process is shared in a github repository</a></div>
<div>
<br /></div>
<div>
<div>
<b>Notes</b></div>
<div>
[1] There was some love lost in the middle, but the rediscovery started mostly after <a href="http://xkcd.com/1190/" target="_blank">this comic</a>.</div>
<div>
[2] <a href="http://en.wikipedia.org/wiki/Stemming" target="_blank">Stemming</a> basically means reducing "love", "loves", "loving", and other such lexical variants to a common root (here, "love"), because they are all essentially talking about the same concept.</div>
<div>
[3] As I pointed out earlier in this post, each topic assigns a probability to every word, but we can think of the words with dominant probabilities as what defines the topic.</div>
<div>
<br /></div>
</div>
<div>
<br /></div>
</div>
</div>
<div><b>The Happiest Emoticons</b> (posted 2013-09-11, last updated 2014-03-22)</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
Clearly, a :) is happier than a :( but what about a :-* and a :-D ? Or a :-| and a :-o ? In this post I attempt to rank emoticons in order of how happy someone has to be to use each one. (And punctuate horribly to avoid mixing punctuation with the emoticon)<br />
<br />
To start off, I need a collection of emoticons associated with some text. And where else would I find this, but that gigantic compendium of everyday emotions, the definitive corpus of our age - Twitter.<br />
If you'd rather read the code yourself, it's available <a href="https://github.com/OhLookCake/EmoticonSentiment" target="_blank">here</a>, but the methodology is this: I <a href="https://github.com/OhLookCake/EmoticonSentiment/blob/master/scripts/gatherTweets.py" target="_blank">collect</a> <a href="https://github.com/OhLookCake/EmoticonSentiment/blob/master/data/tweetswithemoticons.txt" target="_blank">lots of tweets</a> containing emoticons, assign each one a 'sentiment' score[1], and then order the emoticons based on the average sentiment score of tweets containing each emoticon.<br />
<br />
The tweet-gathering process is fairly direct. I take tweets from the streaming API[2] that contain any of a predefined set of emoticons and write them out to a file. If you want to, have a look at the <a href="https://github.com/OhLookCake/EmoticonSentiment/blob/master/scripts/gatherTweets.py" target="_blank">Python code here</a>. For the purpose of the <a href="https://github.com/OhLookCake/EmoticonSentiment/blob/master/scripts/scoreEmoticons.R" target="_blank">R analysis</a>, the tweet texts are already in a file. Each line is then (a) parsed for the emoticons it contains, and (b) assigned a sentiment score[3].<br />
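<div>Steps (a) and (b) can be sketched in a few lines of Python (the repository does this part in R). The emoticon list and the four-word lexicon below are hypothetical stand-ins with invented scores, and summing word scores is one simple way to score a tweet, assumed here rather than taken from the repository:</div>

```python
import re

# Hypothetical emoticon list and lexicon; the scores are invented (AFINN-style, -5..+5)
EMOTICONS = [":-)", ":-(", ":-D", ":*", "o.O", "8D"]
LEXICON = {"good": 3, "fun": 4, "funny": 4, "sad": -2}


def find_emoticons(text):
    """(a) List the known emoticons that occur anywhere in the tweet text."""
    return [e for e in EMOTICONS if e in text]


def score_tweet(text):
    """(b) Sum the lexicon scores of the tweet's words; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


tweet = "Such a good, fun day :-D"
print(find_emoticons(tweet), score_tweet(tweet))
```

<div>Note that <code>find_emoticons</code> uses plain substring matching, which is simple but, as one of the observations further down shows, can be fooled by text that is not really an emoticon.</div>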
<br />
Finally, we plot each tweet on an emoticon-vs-score chart, like so:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/nlXAqDD.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="569" src="http://i.imgur.com/nlXAqDD.png" width="640" /></a></div>
<div style="text-align: center;">
[<a href="http://imgur.com/a/gIWK2#2" target="_blank">Original image on imgur</a>]</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The tiny vertical black lines mark the mean score for each emoticon.</div>
<div style="text-align: left;">
There is no ordering to the colour scale. The colours just help differentiate each row.<br />
<br />
The complete data collection, analysis and plotting code can be found on <a href="https://github.com/OhLookCake/EmoticonSentiment" target="_blank">this github repo</a></div>
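<div>The per-emoticon means, and the ranking built from them, come from a simple group-by-and-average. A minimal Python sketch over invented (emoticon, score) pairs; the <code>min_count</code> filter is a hypothetical addition reflecting the caveat, noted below, that emoticons seen in only a few tweets give unstable averages:</div>

```python
from collections import defaultdict
from statistics import mean


def rank_emoticons(pairs, min_count=1):
    """Average tweet scores per emoticon, sorted from happiest to saddest.

    `pairs` holds one (emoticon, tweet_score) entry per tweet containing that
    emoticon; emoticons seen fewer than `min_count` times are left out.
    """
    scores = defaultdict(list)
    for emoticon, score in pairs:
        scores[emoticon].append(score)
    means = {e: mean(s) for e, s in scores.items() if len(s) >= min_count}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Invented (emoticon, tweet score) pairs, for illustration only
pairs = [(":-)", 2), (":-)", -1), (":-(", -3), (":-(", 1), (":-D", 4), (":-D", 2)]
print(rank_emoticons(pairs, min_count=2))
```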
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Okay, so here's a list of observations, and (partial) explanations for some surprises:</div>
<div style="text-align: left;">
</div>
<ol style="text-align: left;">
<li> o.O and :* score higher than :-)<br />I think the ubiquity of :-) is its burden: people use :-) for <i>all </i>sorts of reasons. Also, the score for o.O is computed over a much smaller number of tweets, and is possibly unstable.</li>
<li>I can understand people using :-) at sad stuff, but what kind of a person uses :-( for happy tweets? (There aren't many of these, but a couple of them are far to the right.) Let's look at one of those tweets:<br /><span style="color: #3d85c6; font-family: inherit;"><br />Wow I was sleeping sooooo good which doesn't happen very often & They called from work & woke me up .. Now I can't go back to sleep :-( </span><br />That makes sense. It's a tweet that turned sour halfway through, but overall it had a pretty high density of positive words, so it's no surprise that our scorer gave it a positive score.</li>
<li>Here's a tweet with an 8D in it:<br /><br /><span style="color: #3d85c6;">Got to take a pic with heage ! Who has by far been the most fun, funny and candid lecturer(in my... http://t.co/WaOTW8D2YO</span><br /><br />Notice anything funny? It's a happy tweet, but the emoticon we were looking for is conspicuously absent! Actually, the 8D does occur in the tweet, albeit inside a URL: http://t.co/WaOTW<span style="color: red;">8D</span>2YO<br />Thanks to Twitter's automatic URL shortening via t.co, a tweet can contain an arbitrary string of alphanumeric characters that carries no semantic information. So be wary of the scores for emoticons like 8D and xD.</li>
</ol>
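<div>The 8D false positive is easy to reproduce with naive substring matching. One possible guard, which is not what this analysis did, is to strip URLs before searching for emoticons. The regular expression below is a rough assumption about what a URL looks like, not Twitter's exact rules:</div>

```python
import re

# Rough URL pattern (an assumption, not Twitter's exact rules)
URL_PATTERN = re.compile(r"https?://\S+")


def emoticons_in(text, emoticons, strip_urls=False):
    """Find emoticons by plain substring search, optionally removing URLs first."""
    if strip_urls:
        text = URL_PATTERN.sub(" ", text)
    return [e for e in emoticons if e in text]


tweet = ("Got to take a pic with heage ! Who has by far been the most fun, "
         "funny and candid lecturer(in my... http://t.co/WaOTW8D2YO")
print(emoticons_in(tweet, ["8D", "xD"]))                   # 8D is found inside the URL
print(emoticons_in(tweet, ["8D", "xD"], strip_urls=True))  # no match once the URL is gone
```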
<div>
So the next time you can't tell what someone is trying to convey with an emoticon, this chart might come in handy as a reference. In the meantime, if you're happy and you know it, contort your pupils o.O</div>
<br />
<br />
Notes<br />
[1] A linear scale where positive is happy, negative is unhappy<br />
[2] Twitter's Search API handles punctuation poorly, so that's not an option<br />
[3] This score is assigned via a relatively simple lookup against the AFINN word list; <a href="http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010" target="_blank">this page</a> describes the list and provides a good evaluation of it<br />
<div>
All code used has been made <a href="https://github.com/OhLookCake/EmoticonSentiment" target="_blank">available on github</a></div>
</div>
<div><b>Visualizing Book Sentiments</b> (posted 2013-07-30, last updated 2014-03-22)</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Sentiment analysis of social media content has become pretty popular of late, and a few days ago, as I lay in bed, I wondered if we could do the same thing to books - and see how sentiments vary through the story.<br />
<br />
The answer, of course, was that yes, we could. And if you’d rather just<a href="http://spark.rstudio.com/eeshan/BookSentiments/" target="_blank"> jump to an implementation you can try yourself</a>, here’s the link: <a href="http://spark.rstudio.com/eeshan/BookSentiments">http://spark.rstudio.com/eeshan/BookSentiments</a>. Upload a book (in plaintext format), and the variation of sentiment as you scroll through the pages is computed and plotted.<br />
<br />
Here are a couple of graphs that help visualize the flow of sentiments through one of my favourite novels, <i>A Tale of Two Cities</i>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/HlvXrT0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="252" src="http://i.imgur.com/HlvXrT0.png" width="640" /></a></div>
<div style="text-align: center;">
The values above zero indicate 'positive' emotions, and the values below zero indicate 'negative' emotions</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/E63xupX.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="252" src="http://i.imgur.com/E63xupX.png" width="640" /></a></div>
<div>
<div style="text-align: center;">
Red is negative, green is positive, yellow is neutral</div>
<br /></div>
<div>
The text is freely available via <a href="http://www.gutenberg.org/cache/epub/98/pg98.txt" target="_blank">Project Gutenberg</a>. The code was written in R, and deployed using the <a href="http://www.rstudio.com/shiny/" target="_blank">shiny</a> package. The <a href="http://spark.rstudio.com/eeshan/BookSentiments/" target="_blank">app itself</a> is hosted by the generous people at <a href="http://www.rstudio.com/" target="_blank">RStudio</a>. The <a href="https://github.com/OhLookCake/BookSentiments" target="_blank">code</a> is available on github at https://github.com/OhLookCake/BookSentiments/, and a basic description of the functions used to generate the scores can also be found in this post.<br />
<br /></div>
<div>
So how does the sentiment analysis really work? We use a dictionary that maps a given word to its ‘valence’ – a single integer score in the range -5 to +5. One such freely available mapping is the <a href="http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010" target="_blank">AFINN-111</a> list. </div>
<div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I read the AFINN file into R and used it to look up the score for each word in the book file…<br />
<br /></div>
</div>
<div>
<div class="MsoNormal">
<style type="text/css">
pre.CICodeFormatter{
font-family:arial;
font-size:12px;
border:1px dashed #CCCCCC;
width:99%;
height:auto;
overflow:auto;
background:#f0f0f0;
line-height:20px;
padding:0px;
color:#000000;
text-align:left;
}
pre.CICodeFormatter code{
color:#000000;
word-wrap:normal;
}
</style>
</div>
</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter"># Score of each word in 'term' (NA if the word is not in the AFINN list)
df.sentiments[match(term, df.sentiments[, "term"]), "score"]
</code></pre>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
…divided the scores into the desired number of parts, averaged the scores within each part…</div>
<br />
<pre class="CICodeFormatter"><code class="CICodeFormatter">RollUpScores <- function(scores, parts = 100) {
  # Assign each word position to one of `parts` contiguous, roughly equal-sized chunks
  chunk <- cut(seq_along(scores), breaks = parts, labels = FALSE)
  # Average the word scores within each chunk, in reading order
  as.numeric(tapply(scores, chunk, mean))
}
</code></pre>
<br />
<div class="MsoNormal">
…and plotted the resulting data frame using ggplot2.</div>
<div class="MsoNormal">
The complete code is available <a href="https://github.com/OhLookCake/BookSentiments/tree/master/scripts" target="_blank">here</a>. There's a version to run in a standalone R session, and a version for Shiny deployment. Python files provide an alternative implementation.</div>
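<div>For readers who prefer the Python route, here is a minimal sketch of the lookup and roll-up steps. It assumes the AFINN-111 layout of one word, a tab, and an integer score per line, uses invented sample entries, and treats words missing from the list as neutral (0), which is a choice made for this example. The chunking splits the words into <code>parts</code> contiguous chunks whose sizes differ by at most one word.</div>

```python
def load_afinn(lines):
    """Parse 'word TAB score' lines (the AFINN-111 layout, assumed) into a dict."""
    lexicon = {}
    for line in lines:
        if line.strip():
            word, score = line.rstrip("\n").rsplit("\t", 1)
            lexicon[word] = int(score)
    return lexicon


def word_scores(words, lexicon):
    """Score each word; words missing from the list count as neutral (0)."""
    return [lexicon.get(w.lower(), 0) for w in words]


def roll_up_scores(scores, parts=100):
    """Split the word scores into `parts` contiguous chunks and average each chunk."""
    if not scores:
        return []
    n = len(scores)
    parts = min(parts, n)  # never ask for more chunks than there are words
    # Integer chunk boundaries, so chunk sizes differ by at most one word
    bounds = [i * n // parts for i in range(parts + 1)]
    return [sum(scores[lo:hi]) / (hi - lo) for lo, hi in zip(bounds, bounds[1:])]


# Invented sample entries in the AFINN-111 format, for illustration only
afinn = load_afinn(["best\t3\n", "worst\t-3\n"])
text = "It was the best of times it was the worst of times"
print(roll_up_scores(word_scores(text.split(), afinn), parts=4))
```

<div>In practice the lines would come from the downloaded word list, e.g. <code>open("AFINN-111.txt", encoding="utf-8")</code> (the file name is an assumption here), and the words from the book's plaintext.</div>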
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As a side note, I'd like to point out a drawback of using a lookup table for sentiment analysis – it completely overlooks the context of a keyword (“happy” in “I am not happy” certainly has a different valence than in most other scenarios), and this method cannot capture such patterns. An even harder task is detecting sarcasm. There are a <a href="http://www.aclweb.org/anthology-new/P/P11/P11-2102.pdf" target="_blank">number</a> of <a href="http://www.cs.columbia.edu/~julia/papers/davidovetal10.pdf" target="_blank">papers</a> on detecting sarcasm in text, in case you're interested, but our current approach ignores these cases. </div>
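<div>The negation problem is easy to demonstrate with a plain lookup. A minimal sketch using a hypothetical one-word lexicon: "I am not happy" scores exactly the same as "I am happy", because nothing in a word-by-word lookup knows what "not" does.</div>

```python
import re

LEXICON = {"happy": 3}  # hypothetical one-entry lexicon, AFINN-style score


def lookup_score(sentence):
    """Sum per-word scores, with no regard for neighbouring words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(LEXICON.get(w, 0) for w in words)


print(lookup_score("I am happy"), lookup_score("I am not happy"))
```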
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Finally, there may or may not be an upcoming post on author prediction using sentiment analysis in book texts. In the meantime, do play around with the app/code and suggest improvements.</div>
<div>
<br /></div>
</div>
<div><b>This, That or the Other: Of Pasta, Pokémon and The Sopranos</b> (posted 2013-07-09, last updated 2014-03-22)</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: center;">
<i>Identifying proper noun categories using machine learning</i></div>
<div style="text-align: center;">
<br /></div>
<div align="left" style="margin-left: 50px;">
<div class="separator" style="clear: both; text-align: center;">
</div>
<a href="http://media.npr.org/images/branding/programs/askmeanother/ama_279.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://media.npr.org/images/branding/programs/askmeanother/ama_279.png" /></a><b>OPHIRA EISENBERG</b><i>: Jonathan, what's our next game?</i><i><br /></i><b>JONATHAN COULTON</b><i>: Well, if you have listened to </i>Ask Me Another<i> before, then this game will probably sound familiar to you. It is one of my favorites. It's called This, That, or the Other. We will name an item and all you have to do is tell us which of three categories that item belongs to. Today's categories are grains, world currencies or Pokémon characters.</i></div>
<div>
<br />
<br />
That was an excerpt from the transcript of an old episode of <a href="http://www.npr.org/" target="_blank">NPR</a>’s whimsical trivia/puzzle radio show <a href="http://www.npr.org/programs/ask-me-another/" target="_blank">Ask Me Another</a>.</div>
<div>
<br /></div>
<div>
<div class="MsoNormal">
Other editions of the show have asked participants to distinguish between:</div>
<ul style="text-align: left;">
<li><span style="text-indent: -18pt;">A tech company, a car model, or a <i>Star Wars</i>
location;</span></li>
<li><span style="text-indent: -18pt;">A <i>Harry Potter</i> spell, a prescription drug, or a piece of IKEA furniture;</span></li>
<li><span style="text-indent: -18pt;">A type of cheese, a dance move, or a <i>Moby Dick</i>
character;<br />and my personal </span><span lang="EN-IN" style="text-indent: -18pt;">favourite</span><span style="text-indent: -18pt;">,</span></li>
<li><span style="text-indent: -18pt;">A type of pasta, a title of an opera, or a
character on <i>The Sopranos</i>.</span></li>
</ul>
<div style="text-indent: -24px;">
<br /></div>
You see how this makes for a funny radio show. Here’s how it makes for an interesting post on a data blog:<br />
<br />
If you were to guess based on what you know about how each category sounds, would you be likely to be right more than a third of the time? Better yet, could we train a model to pick up on features that are particular to each category, and then get that model to perform better than random on unseen examples?<br />
Okay, so the data collection part isn't very difficult for a number of those categories, thanks to Wikipedia maintaining parseable articles in list form, like <a href="http://en.wikipedia.org/wiki/List_of_Pok%C3%A9mon" target="_blank">List of Pokémon</a><sup>1</sup>. I easily procured lists of:</div>
<div>
<br />
<ul style="text-align: left;">
<li>Currencies</li>
<li>Pokémon</li>
<li>Pastas</li>
<li>Cheese</li>
<li>Locations in the Star Wars Universe</li>
</ul>
For starters, the features I decided to create were simply the occurrence of letters and ‘bi-letters’ in each name. For example, Naboo (die-hard followers may recognise it as the place Jar Jar Binks was from; for the uninitiated, it’s a planet in the Star Wars universe, just in case you were wondering) would have a ‘True’ value for the features:<br />
<br />
<blockquote class="tr_bq">
n, a, b, o, ^n, na, ab, bo, oo, o$</blockquote>
<div>
<div>
<br />
The ^ and the $ are special symbols I used to mark the beginning and the end of a word, respectively.</div>
<div>
<br /></div>
<div>
If you've worked in the field of Natural Language Processing, you'll recognize these features as analogues of unigrams and bigrams in language models.</div>
</div>
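<div>The letter and bi-letter features can be generated in a few lines of Python. This is a sketch of the idea as described above, returning a dictionary of features in the style NLTK classifiers accept; lowercasing the name first is an assumption made here.</div>

```python
def name_features(name):
    """Letter and bi-letter presence features for a name, as an NLTK-style dict."""
    lowered = name.lower()
    padded = "^" + lowered + "$"  # ^ marks the start of the word, $ marks the end
    features = {letter: True for letter in lowered}
    for first, second in zip(padded, padded[1:]):
        features[first + second] = True
    return features


print(sorted(name_features("Naboo")))
```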
<div>
<br /></div>
<div>
Next, I <a href="https://github.com/OhLookCake/ThisThat" target="_blank">trained a Naïve Bayes model</a> in Python, using the excellent NLTK library. I picked categories two at a time, but Naïve Bayes extends naturally to any number of categories. Feel free to play around with the <a href="https://github.com/OhLookCake/ThisThat/tree/master/scripts" target="_blank">code</a>; you can find it on GitHub <a href="https://github.com/OhLookCake/ThisThat" target="_blank">here</a>.</div>
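<div>NLTK's <code>NaiveBayesClassifier.train</code> takes a list of (feature dictionary, label) pairs. To show what such a classifier does with presence-only features without requiring NLTK, here is a from-scratch stand-in: it counts how often each feature appears in each category and scores a new name by adding log-probabilities. The add-one smoothing, the choice to ignore absent features, and the tiny training lists are simplifications made for this sketch; NLTK's implementation differs in its details.</div>

```python
import math
from collections import Counter, defaultdict


def name_features(name):
    """Set of letters and bi-letters (with ^ and $ word boundaries) in a name."""
    padded = "^" + name.lower() + "$"
    features = set(name.lower())
    features.update(a + b for a, b in zip(padded, padded[1:]))
    return features


def train(examples):
    """Count categories, and feature occurrences per category, from (name, category) pairs."""
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    for name, label in examples:
        feature_counts[label].update(name_features(name))
    return label_counts, feature_counts


def classify(name, model):
    """Return the category with the highest log-probability for this name."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior: share of training names in this category
        for f in name_features(name):
            # Add-one smoothed estimate of P(feature present | category)
            score += math.log((feature_counts[label][f] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training lists, far smaller than the scraped Wikipedia lists
TRAINING = [(p, "pastas") for p in ["Spaghetti", "Rigatoni", "Fusilli", "Linguine", "Penne", "Tortellini"]]
TRAINING += [(s, "starWars") for s in ["Naboo", "Tatooine", "Hoth", "Dagobah", "Endor", "Kashyyyk"]]

model = train(TRAINING)
print(classify("Ravioli", model), classify("Kashyyyk", model))
```

<div>With NLTK itself, the rough equivalent is <code>nltk.NaiveBayesClassifier.train(train_set)</code> followed by <code>classifier.classify(features)</code>, and <code>classifier.show_most_informative_features(10)</code> prints a feature ranking like the table below.</div>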
<div>
<br /></div>
<div>
With just those simple features, I was able to get upwards of 80% accuracy<sup>2</sup> on most pairs. In fact, on Cheese vs Pasta (I would totally watch a movie with that title. Oi! In some circles that could pass for humour!), which seems like a difficult pair to classify, I got as much as 92% accuracy.<br />
<br />
Here’s a twist on the problem statement: what if you were designing the game and wanted to pick the hardest items to guess? We can get these directly from the results of the earlier part: we simply need the items that the algorithm misclassified. So here’s a test to see how you do on the toughies. For each item in the following set, can you guess whether it’s a pasta or a location in the Star Wars universe?<br />
<br />
<ol style="text-align: left;">
<li>Bestine</li>
<li>Quelli</li>
<li>Falleen</li>
<li>Alfabeto</li>
<li>Felucia</li>
<li>Sorprese</li>
<li>Sulorine</li>
<li>Egg barley</li>
</ol>
</div>
<div>
<br /></div>
<div>
<div>
Answers:</div>
<span style="font-size: xx-small;">Star Wars Locations: 1, 2, 3, 5, 7<br />Pastas: 4, 6, 8 (Yeah, the last one was a giveaway<sup>3</sup>)</span></div>
<div>
<span style="background-color: black;"><br /></span></div>
<div>
Do well? Pat yourself on the back, for today, you have outwitted a machine.</div>
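<div>Finding the "toughies" amounts to keeping the held-out items whose predicted category disagrees with the true one. A minimal sketch; <code>vowel_rule</code> is a deliberately silly hypothetical classifier (names ending in a vowel are called pastas) so that the example runs on its own, and in practice the trained model would take its place:</div>

```python
def vowel_rule(name):
    """Hypothetical stand-in classifier: names ending in a vowel are 'pastas'."""
    return "pastas" if name[-1].lower() in "aeiou" else "starWars"


def hardest_items(labelled_items, classifier):
    """Return (name, true category, predicted category) for every misclassified item."""
    mistakes = []
    for name, true_label in labelled_items:
        predicted = classifier(name)
        if predicted != true_label:
            mistakes.append((name, true_label, predicted))
    return mistakes


# A few quiz items with their true categories, plus one easy case (Hoth)
held_out = [
    ("Bestine", "starWars"),
    ("Alfabeto", "pastas"),
    ("Hoth", "starWars"),
    ("Egg barley", "pastas"),
]
mistakes = hardest_items(held_out, vowel_rule)
accuracy = 1 - len(mistakes) / len(held_out)
print(mistakes, accuracy)
```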
<div>
<br /></div>
<div>
<div class="MsoNormal">
So why does the model classify these incorrectly? The features it relies on most might provide some insight. Here are the top 10 features the model picked:</div>
</div>
<br />
<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; margin-left: 4.65pt; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt; mso-yfti-tbllook: 1184; width: 464px;">
<tbody>
<tr style="height: 15.0pt; mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td nowrap="" style="border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b>Feature/Value</b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b>Dominant category : Lesser category</b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b>Ratio of Occurrence</b></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 1;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
li = True</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
23.2 : 1.0</div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 2;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
ti = True</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
12.8 : 1.0</div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 3;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
et = True</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
10.8 : 1.0</div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 4;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
i$ = True</div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
9.0 : 1.0<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 5;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
^p = True<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
8.8 : 1.0<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 6;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
tt = True<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
8.1 : 1.0<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 7;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
length = 5<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
starWars : pastas<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
7.9 : 1.0<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 8;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
ci = True<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
6.9 : 1.0<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 9;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
f = True<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
6.3 : 1.0<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 10; mso-yfti-lastrow: yes;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 76.8pt;" width="102"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
nn = True<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 143.7pt;" valign="bottom" width="192"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
pastas : starWars<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 127.55pt;" valign="bottom" width="170"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
6.2 : 1.0<o:p></o:p></div>
</td>
</tr>
</tbody></table>
</div>
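<div>
The feature names in the table above look like character bigrams, where <code>^</code> marks the start of a word and <code>$</code> marks its end. For example, <code>i$ = True</code> reads as "ends in i". Each ratio compares how likely a feature is under one label versus the other. The sketch below is a rough Python illustration of that idea and is not the post's code. It assumes binary bigram-presence features and plain add-one smoothing. The real classifier may use a different estimator, so its numbers will not match the table, and the word lists are hand-picked for illustration only.</div>

```python
from collections import Counter


def bigram_features(word):
    """Character bigrams, with ^ marking the start and $ marking the end."""
    padded = "^" + word.strip().lower() + "$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def presence_probs(items):
    """P(bigram present | label), add-one smoothed over present/absent."""
    counts = Counter()
    for item in items:
        counts.update(bigram_features(item))
    n = len(items)
    # Second value is the smoothed probability for a bigram never seen in this label.
    return {f: (c + 1) / (n + 2) for f, c in counts.items()}, 1 / (n + 2)


def informative_features(data, top=10):
    """Rank bigrams by how lopsided their probability is between exactly two labels."""
    (label_a, items_a), (label_b, items_b) = data.items()
    probs_a, unseen_a = presence_probs(items_a)
    probs_b, unseen_b = presence_probs(items_b)
    rows = []
    for f in set(probs_a) | set(probs_b):
        p_a = probs_a.get(f, unseen_a)
        p_b = probs_b.get(f, unseen_b)
        if p_a >= p_b:
            rows.append((f, label_a, label_b, p_a / p_b))
        else:
            rows.append((f, label_b, label_a, p_b / p_a))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


# Illustrative, hand-picked lists (not the post's data).
data = {
    "pastas": ["spaghetti", "penne", "fusilli", "rigatoni", "linguine"],
    "starWars": ["luke", "leia", "yoda", "chewbacca", "vader"],
}
for f, hi, lo, ratio in informative_features(data):
    print(f"{f} = True    {hi} : {lo}    {ratio:.1f} : 1.0")
```

<div>
The output uses the same "feature, label : label, ratio : 1.0" layout as the table.</div>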
<div>
<br /></div>
<div>
<br /></div>
<div>
That’s that for these models. Here are some suggestions for other cool things you could do with the code if you have a teeny weeny bit of coding experience (No math/stats/machine learning experience required):<br />
<br />
<br />
<ul style="text-align: left;">
<li>Find out if your name looks like a grain or a kind of pasta (And then go around claiming "I just don't get grain-people. Pasta-ites FTW!")</li>
<li>Gather more lists (Simpsons characters, varieties of chili, brands of cosmetics, ….) and see which ones look like which, er, other ones. (The data is simply a text file, with one item on each line.)</li>
<li>See which pastas are the cheesiest! (I'm so terribly sorry. I'll never ever try to be funny again. Ever.) </li>
</ul>
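<div>
If you'd rather start from scratch, here is a minimal Python sketch of the first two suggestions that uses only the standard library. It reads lists stored one item per line and trains a toy Naive Bayes classifier on character-bigram features. The feature set is an assumption, loosely modeled on the bigram names in the table above. The sketch also reports accuracy as defined in footnote 2. The class and function names, the add-one smoothing, and the file names in the usage comment are all hypothetical, not the post's actual code.</div>

```python
import math
from collections import Counter


def bigram_features(word):
    """Character bigrams with ^ (start) and $ (end) markers, e.g. "pi" -> ^p, pi, i$."""
    padded = "^" + word.strip().lower() + "$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def read_items(lines):
    """One item per line; blank lines are skipped. Works on an open file or a list."""
    return [line.strip() for line in lines if line.strip()]


class TinyNaiveBayes:
    """Toy Naive Bayes that scores only the bigrams present in the word."""

    def __init__(self, data):
        # data: {label: [items]}; every label needs at least one item.
        total = sum(len(items) for items in data.values())
        self.model = {}
        for label, items in data.items():
            counts = Counter()
            for item in items:
                counts.update(bigram_features(item))
            self.model[label] = (math.log(len(items) / total), counts, len(items))

    def classify(self, word):
        def log_score(label):
            log_prior, counts, n = self.model[label]
            # Add-one smoothing so an unseen bigram doesn't zero out a label.
            return log_prior + sum(
                math.log((counts[f] + 1) / (n + 2)) for f in bigram_features(word)
            )

        return max(self.model, key=log_score)


def accuracy(classifier, labeled):
    """#correct / (#correct + #incorrect), the definition used in footnote 2."""
    correct = sum(classifier.classify(item) == label for item, label in labeled)
    return correct / len(labeled)


# Usage (file names are hypothetical):
# with open("pastas.txt") as p, open("grains.txt") as g:
#     nb = TinyNaiveBayes({"pastas": read_items(p), "grains": read_items(g)})
# print(nb.classify("Eeshan"))
```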
<div>
I’m working on an <a href="http://www.r-project.org/" target="_blank">R</a> + <a href="http://www.rstudio.com/shiny/" target="_blank">Shiny</a> app that would make all of the above much easier for the casual browser, but it’s much more fun to play around with the code, don’t you think?</div>
</div>
<div>
<br /></div>
<div>
Don't you?</div>
<div>
Don't leave me hanging here, guys.</div>
<div>
Guys?</div>
<div>
Fine. I'll just make the app.</div>
<div>
<br />
<br /></div>
<div>
<div>
Footnotes</div>
<div>
<sup>1</sup> While on that topic, check out these weird lists on Wikipedia: </div>
<div>
<a href="http://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes" target="_blank">List of helicopter prison escapes</a>, <a href="http://en.wikipedia.org/wiki/List_of_individual_dogs#Faithful_after_master.E2.80.99s_death" target="_blank">List of dogs faithful after their masters' deaths</a>, <a href="http://en.wikipedia.org/wiki/List_of_fictional_swords" target="_blank">List of fictional swords</a>, <a href="http://en.wikipedia.org/wiki/List_Of_Fictitious_Jews" target="_blank">List of fictional Jews</a></div>
<div>
<sup>2</sup> I’m using accuracy as simply #correct predictions/(#correct predictions + #incorrect predictions)</div>
<div>
<sup>3</sup> Only goes to show that there are more/better features to be derived here.</div>
<div>
<sup>4</sup> I was working with a slightly different version here, which included a variable for length. Don't be thrown off by that.</div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
</div>Eeshan Malhotrahttp://www.blogger.com/profile/05553017788343435698noreply@blogger.com2tag:blogger.com,1999:blog-7648013450728068369.post-45917008117484481792013-06-22T11:55:00.000+05:302014-03-22T21:53:19.319+05:30Warrior Zombies from Outer Space II: Mayhem Unleashed<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div>
Given the speed at which I consume them, it's only fitting that the first post on this blog is about movies. (Although, by that logic, it could just as well have been about sandwiches, Nutella, or tissue paper. Note to self: look for a Nutella consumption dataset.)</div>
<div>
Anyway, this post is about movie taglines - specifically, the words that constitute them.</div>
<div>
The data is pretty much there for the picking – IMDb hosts a number of freely available<sup>1</sup> <a href="http://www.imdb.com/interfaces" target="_blank">datasets</a>, and one of them is about taglines.<br />
<br />
The data is in an odd format, but at least it’s all available in one place. After the <a href="https://github.com/OhLookCake/GutenTag/blob/master/scripts/1_makejson.py" target="_blank">coding equivalent</a> of jamming the fork into the toaster and jerking it around until something pops, I have the data in a usable structure.<br />
<br />
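The linked script is the author's actual parser. What follows is only an independent Python sketch of the same kind of conversion. It assumes that each movie in the tagline file starts with a line beginning <code># </code>, giving the title and then the year in parentheses, and that the movie's taglines follow on tab-indented lines. Treat that layout as an assumption and check it against the file you actually download.<br />

```python
import json
import re

# Assumed header layout: "# Title (1999)"; the year may be missing or non-numeric.
HEADER = re.compile(r"^# (?P<title>.+?) \((?P<year>\d{4})")


def parse_taglines(lines):
    """Turn the (assumed) '# title' + tab-indented-tagline layout into dicts."""
    movies = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# "):
            match = HEADER.match(line)
            current = {
                "title": match.group("title") if match else line[2:].strip(),
                "year": int(match.group("year")) if match else None,
                "taglines": [],
            }
            movies.append(current)
        elif line.startswith("\t") and line.strip() and current is not None:
            current["taglines"].append(line.strip())
    return movies


sample = [
    "# Alien (1979)\n",
    "\tIn space no one can hear you scream.\n",
    "\n",
]
print(json.dumps(parse_taglines(sample), indent=2))
```
<br />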
With the data in shape, R’s <a href="http://cran.r-project.org/web/packages/tm/index.html" target="_blank"><i>tm</i> package</a> makes quick work of the word frequency analysis, and I have derived a dataset of common words and their frequencies in movie taglines. After removing some highly frequent English words (articles, pronouns, some prepositions, etc.), here’s a list of the most used words in movie taglines, ordered by frequency:<br />
<br />
<blockquote class="tr_bq">
love, life, story, world, time, film, comedy, death, woman, don’t</blockquote>
<br />
Not many surprises there – until we look at the fraction of taglines these terms occur in:<br />
<br />
<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: none; margin-left: 85.75pt; mso-border-alt: solid windowtext .5pt; mso-border-insideh: .5pt solid windowtext; mso-border-insidev: .5pt solid windowtext; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt; mso-yfti-tbllook: 1184; width: 128px;">
<tbody>
<tr style="height: 15.0pt; mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td nowrap="" style="border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
7.5%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 1;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
6.0%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 2;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
5.0%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 3;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
3.8%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 4;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
3.1%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 5;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
film<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
2.3%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 6;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
comedy<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
2.2%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 7;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
death<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
2.1%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 8;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
woman<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
2.1%<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 9; mso-yfti-lastrow: yes;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
dont<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
1.9%<o:p></o:p></div>
</td>
</tr>
</tbody></table>
<br />
<br />
These numbers are way higher than I expected. ‘Love’ alone occurs in a whopping 7.5% of all movie taglines!<br />
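These percentages count taglines, not words, so a tagline that says "love" three times still counts once. Below is a minimal Python sketch of that document-frequency calculation. It assumes a short, hand-picked stopword list; tm's built-in English list is much longer. It also strips apostrophes before splitting, which turns "don’t" into the "dont" seen in the table. It is an illustration, not the R code the post actually uses.<br />

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is",
             "it", "you", "i", "he", "she", "we", "they", "his", "her", "for"}


def tokens(text):
    # Remove straight and curly apostrophes so "don't" -> "dont", then split on non-letters.
    text = text.lower().replace("'", "").replace("\u2019", "")
    return [w for w in re.split(r"[^a-z]+", text) if w and w not in STOPWORDS]


def tagline_fractions(taglines, top=10):
    """Fraction of taglines containing each word (each word counted once per tagline)."""
    df = Counter()
    for tagline in taglines:
        df.update(set(tokens(tagline)))
    return [(word, count / len(taglines)) for word, count in df.most_common(top)]


for word, frac in tagline_fractions(["Love is all you need", "A love story", "Don't look back"]):
    print(f"{word}: {frac:.1%}")
```
<br />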
Here’s a visual representation of the words you'd have seen most often in movie taglines (the size of each word is proportional to the frequency of its occurrence):<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8WXqhwJFLVa3aqr4bfPvsoBmmYOGPCEp63a558LuZ4AV88E_sF0i0lKhOn-d0WrcpaKEealXaJoE-2PmUQd6ikyWL7AB388Hvxk1WfOYSm_g1NFqfLPzovtMA5PHHjmPQdUqwALJpYP4/s1600/wordcloud_global.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8WXqhwJFLVa3aqr4bfPvsoBmmYOGPCEp63a558LuZ4AV88E_sF0i0lKhOn-d0WrcpaKEealXaJoE-2PmUQd6ikyWL7AB388Hvxk1WfOYSm_g1NFqfLPzovtMA5PHHjmPQdUqwALJpYP4/s400/wordcloud_global.png" height="400" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
[<a href="http://imgur.com/1P0l7VO" target="_blank">Full size image on imgur</a>]</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both;">
Yes Hollywood, we see right through you.</div>
<div class="separator" style="clear: both;">
The <a href="https://github.com/OhLookCake/GutenTag/blob/master/scripts/2_simpleFrequentTerms.R" target="_blank">R code</a> to parse the data and make the word cloud is available on GitHub, if you're interested.</div>
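<div class="separator" style="clear: both;">
In the word cloud, each word's size is proportional to its frequency. Here is a tiny Python sketch of that scaling rule. The maximum and minimum font sizes are assumed parameters, and the floor at the minimum size slightly breaks strict proportionality for very rare words. Many word-cloud tools interpolate between a minimum and a maximum size instead, so treat this as an illustration of the rule, not of how the cloud above was drawn.</div>

```python
def font_sizes(freqs, max_size=80.0, min_size=6.0):
    """Map word -> font size proportional to frequency, floored at min_size."""
    top = max(freqs.values())
    return {word: max(min_size, max_size * freq / top) for word, freq in freqs.items()}


# Fractions from the table above: love 7.5%, life 6.0%, story 5.0%.
print(font_sizes({"love": 0.075, "life": 0.060, "story": 0.050}))
```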
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
I'm kinda keen to know if this trend has been constant through the years. Let's do the same thing, except looking at the taglines decade by decade. Here are the top 10 words in movie taglines from each decade<sup>2</sup> – from the forties to the teens (Teens? Onesies? I like ‘onesies’.)</div>
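<div class="separator" style="clear: both;">
Splitting by decade only changes the counting step: bucket each movie by <code>year // 10 * 10</code> and keep a separate counter per bucket. Below is a rough Python sketch. It assumes a hypothetical structure in which each movie is a dict with a <code>year</code> (or <code>None</code>) and a list of <code>taglines</code>, and it uses a short, hand-picked stopword list. Undated entries are skipped. It ranks words by raw count; counting each word once per tagline, as in the percentage table, would be a one-line change.</div>

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "is", "you", "for"}


def decade_of(year):
    return year // 10 * 10  # 1957 -> 1950


def top_words_by_decade(movies, top=10):
    """movies: dicts with 'year' (int or None) and 'taglines' (list of str)."""
    counts = defaultdict(Counter)
    for movie in movies:
        if movie.get("year") is None:
            continue  # undated entries can't be placed in a decade
        for tagline in movie["taglines"]:
            text = tagline.lower().replace("'", "").replace("\u2019", "")
            words = re.split(r"[^a-z]+", text)
            counts[decade_of(movie["year"])].update(
                w for w in words if w and w not in STOPWORDS
            )
    return {decade: [w for w, _ in c.most_common(top)] for decade, c in sorted(counts.items())}


movies = [
    {"year": 1955, "taglines": ["A love story", "Love wins"]},
    {"year": 2003, "taglines": ["Life goes on"]},
]
print(top_words_by_decade(movies))
```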
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; margin-left: 4.65pt; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt; mso-yfti-tbllook: 1184; width: 599px;">
<tbody>
<tr style="height: 15.0pt; mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td nowrap="" style="border: solid windowtext 1.0pt; height: 15.0pt; mso-border-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>1940s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>1950s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>1960s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>1970s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>1980s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>1990s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>2000s<o:p></o:p></b></div>
</td>
<td nowrap="" style="border-left: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b>2010s</b><o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 1;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
action<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 2;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
love<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 3;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
film<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
story<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 4;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
thrills<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
terror<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
picture<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 5;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
adventure<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
adventure<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
film<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
hes<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
sometimes<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 6;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
romance<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
woman<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
woman<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
world<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
comedy<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
sometimes<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
dont<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 7;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
gun<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
screen<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
adventure<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
movie<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
comedy<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
hes<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
film<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 8;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
west<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
picture<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
life<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
death<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
adventure<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
murder<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
dont<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
film<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 9;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
thrill<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
gun<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
time<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
terror<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
terror<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
film<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
comedy<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
family<o:p></o:p></div>
</td>
</tr>
<tr style="height: 15.0pt; mso-yfti-irow: 10; mso-yfti-lastrow: yes;">
<td nowrap="" style="border-top: none; border: solid windowtext 1.0pt; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
screen<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
girl<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
motion<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
hes<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 60.0pt;" valign="bottom" width="80"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
movie<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 48.0pt;" valign="bottom" width="64"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
dont<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
family<o:p></o:p></div>
</td>
<td nowrap="" style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; height: 15.0pt; mso-border-bottom-alt: solid windowtext .5pt; mso-border-right-alt: solid windowtext .5pt; padding: 0cm 5.4pt 0cm 5.4pt; width: 50.65pt;" valign="bottom" width="68"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
cant<o:p></o:p></div>
</td>
</tr>
</tbody></table>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
Again, I think a visual representation might come in handy.</div>
</div>
<div style="text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5_PHxisKqgEMtkbGrFbnRsIzbwNFQ-8GhKR-zpvGlXx30pyA9OcdiJ5rDgnIv9nk1wlyP4umko0duDCnEUweeAC8_RCTuUY7wu5B361SQHrwoTEDSp2s30LFt16OXIjhFKjLNhXUGKyk/s1600/Top10_Tilemap.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5_PHxisKqgEMtkbGrFbnRsIzbwNFQ-8GhKR-zpvGlXx30pyA9OcdiJ5rDgnIv9nk1wlyP4umko0duDCnEUweeAC8_RCTuUY7wu5B361SQHrwoTEDSp2s30LFt16OXIjhFKjLNhXUGKyk/s400/Top10_Tilemap.png" height="390" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
[<a href="http://imgur.com/6eCm5O4" target="_blank">Full size image on imgur</a>]</div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
‘story’ and ‘love’ make the Top 10 list in every decade, but the other words are distinctly characteristic of the movies of each era:</div>
<div style="text-align: left;">
<br /></div>
<ul style="text-align: left;">
<li style="text-align: left;">The 40s are the years of ‘action’, ‘adventure’, ‘thrills’ and ‘west’.</li>
<li style="text-align: left;">The 50s get slightly more romantic and scale things up, adding ‘woman’, ‘girl’, ‘world’ and ‘terror’.</li>
<li style="text-align: left;">In the 60s, ‘girl’ is out, but ‘woman’ is still in. No more ‘gun’ and ‘terror’; instead, it’s about ‘life’ and ‘time’, both of which are here to stay.</li>
<li style="text-align: left;">‘terror’ makes a comeback in the 70s, ‘adventure’ drops out, and ‘death’ gets explored.</li>
<li style="text-align: left;">In the 80s, ‘comedy’ makes the list for the first time.</li>
<li style="text-align: left;">The 90s are the only time ‘murder’ was cool.</li>
<li style="text-align: left;">The 00s (I like to call these the noughties) and the early part of the 10s show a distinct change in the values that sell: ‘family’, ‘sometimes’ and ‘cant’ are popular.</li>
</ul>
<div style="text-align: left;">
<br /></div>
<div>
<div style="text-align: left;">
I've put up <a href="http://imgur.com/a/5RpdG#1" target="_blank">word clouds for each decade</a> on imgur, and <a href="https://github.com/OhLookCake/GutenTag/blob/master/scripts/3_frequentTermsByDecade.R" target="_blank">the code to generate them</a> on GitHub.</div>
</div>
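<div style="text-align: left;">
The linked R script is the real thing. For readers who want the gist without opening it, here is a rough Python sketch of a per-decade Top 10, not a port of that script. It assumes the data is a list of (year, tagline) pairs, uses a small made-up stopword list (the script's own list and preprocessing may differ), strips apostrophes so that "he's" and "don't" come out as ‘hes’ and ‘dont’ as in the table above, and counts raw word occurrences. The function names are hypothetical.</div>

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list for illustration only; the post's script may use a different one.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with",
    "is", "it", "he", "she", "his", "her", "they", "this", "that", "be",
}


def tokenize(tagline):
    """Lowercase, drop apostrophes ("don't" -> "dont"), keep runs of letters, remove stopwords."""
    text = tagline.lower().replace("'", "").replace("\u2019", "")
    return [word for word in re.findall(r"[a-z]+", text) if word not in STOPWORDS]


def top_terms_by_decade(movies, n=10):
    """movies: iterable of (year, tagline) pairs.

    Returns {decade: [(term, count), ...]} holding the n most frequent terms per decade.
    """
    counts = defaultdict(Counter)
    for year, tagline in movies:
        decade = (year // 10) * 10  # e.g. 1957 -> 1950
        counts[decade].update(tokenize(tagline))
    return {decade: counter.most_common(n) for decade, counter in sorted(counts.items())}


if __name__ == "__main__":
    sample = [
        (1942, "Action! Adventure! Thrills!"),
        (1945, "A story of love and adventure"),
        (1953, "The woman he loved"),
    ]
    for decade, terms in top_terms_by_decade(sample, n=3).items():
        print(decade, terms)
```

<div style="text-align: left;">
Counting each word once per tagline (a document frequency) instead of raw occurrences is an equally reasonable choice; in this sketch that means passing <code>set(tokenize(tagline))</code> to <code>update</code>.</div>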
<div>
<div style="text-align: left;">
<br />
So that’s that for frequent words. But I'm also after words that are frequent exclusively in high- (or low-) rated movies. Or, to look at it another way, words that, in retrospect, are indicative of a movie’s success.<br />
<br /></div>
<div style="text-align: left;">
One way of doing this is to segment the data by performance and repeat the analysis above for each segment. But the overall (prior) word frequencies would likely dominate those lists: ‘love’ and ‘story’, which top every decade, would probably top every rating bracket too. What I really want is words whose presence (or absence) is highly indicative of the movie’s rating.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
NOTE: Some math to follow. If you're uncomfortable with arithmetic and/or statistics, skip a couple of paragraphs.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
For a given term, let D1 be the distribution of movie ratings when the term is present in the tagline, and D2 the distribution when it is absent. I'm going to define my divergence (separation) metric as<sup>3,4</sup>:</div>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxRwQshba_WYLTmtBDdHrVRbSPTp7bq8Owx19khlyS4dQ97u5Af-9gaRURMirwCyjIh9PclVrGMvAXffDgjPbbWS-W8DhlnDcWqrQ-isxuMzXoj84UYk-ICSdsUEoZWnKqypDZfhbvbOo/s1600/divergence.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxRwQshba_WYLTmtBDdHrVRbSPTp7bq8Owx19khlyS4dQ97u5Af-9gaRURMirwCyjIh9PclVrGMvAXffDgjPbbWS-W8DhlnDcWqrQ-isxuMzXoj84UYk-ICSdsUEoZWnKqypDZfhbvbOo/s1600/divergence.png" /></a></div>
<div style="text-align: left;">
<br /></div>
</div>
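<div style="text-align: left;">
The exact metric is the one in the image above. As a loose sketch only, the Python below assumes a Fisher-style ratio: the squared gap between mean(D1) and mean(D2), divided by the sum of the two variances. That denominator is an assumption, not necessarily the post's formula. The sketch also assumes the data is a list of (tagline, rating) pairs and uses a deliberately crude tokenizer; <code>terms_in</code>, <code>divergence</code> and <code>rank_terms</code> are hypothetical names.</div>

```python
import re
from statistics import mean, pvariance


def terms_in(tagline):
    """Crude tokenizer (an assumption): lowercase, drop apostrophes, keep runs of letters."""
    text = tagline.lower().replace("'", "").replace("\u2019", "")
    return set(re.findall(r"[a-z]+", text))


def divergence(d1, d2):
    """Separation between ratings with the term (d1) and without it (d2).

    ASSUMED Fisher-style ratio: (mean(d1) - mean(d2))**2 / (var(d1) + var(d2)).
    """
    gap = mean(d1) - mean(d2)
    spread = pvariance(d1) + pvariance(d2)
    if spread == 0:
        # Both groups are constant: infinitely separated if the means differ, else not at all.
        return float("inf") if gap else 0.0
    return gap ** 2 / spread


def rank_terms(movies, min_count=2):
    """movies: list of (tagline, rating) pairs.

    Returns (term, divergence, mean_gap) tuples, most divergent first.
    A positive mean_gap means taglines using the term belong to higher-rated movies on average.
    """
    docs = [(terms_in(tagline), rating) for tagline, rating in movies]
    vocab = set().union(*(terms for terms, _ in docs))
    scored = []
    for term in vocab:
        d1 = [rating for terms, rating in docs if term in terms]      # term present
        d2 = [rating for terms, rating in docs if term not in terms]  # term absent
        # Skip rare terms and terms that appear in every tagline (D2 would be empty).
        if len(d1) < min_count or not d2:
            continue
        scored.append((term, divergence(d1, d2), mean(d1) - mean(d2)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

<div style="text-align: left;">
Because squaring the gap discards its direction, the sketch also returns the signed difference of means, which shows whether a term leans toward high- or low-rated movies. The <code>min_count</code> cut-off helps keep one-off words, whose D1 holds a single rating, from crowding the top of the ranking on noise alone.</div>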
<div>
<div style="text-align: left;">
&lt;Obligatory CORRELATION DOES NOT IMPLY CAUSATION warning&gt; </div>
</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Adding such words will not automatically make your movie successful; this is offered as a post-hoc descriptive analysis, not a predictive one. I'm not implying any causality here.
<br />
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
&lt;/warning&gt;</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
This divergence is only a magnitude (squaring the difference of the means discards its sign), so I had to separate the top-scoring terms into a ‘good movie’ keyword list and a ‘bad movie’ keyword list.</div>
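<div style="text-align: left;">
One way to make that split is to keep the sign of mean(D1) - mean(D2) alongside the score. The Python sketch below does this; it assumes each tagline has already been reduced to a set of terms, skips terms with fewer than two ratings on either side, and uses hypothetical function names and made-up toy data. The actual lists were produced by the R script linked in this section, so treat this as an illustration of the idea rather than that code.</div>

```python
import statistics


def signed_divergence(d1, d2):
    """Divergence (mean(D1) - mean(D2))**2 / (sd(D1) + sd(D2)),
    multiplied by the sign of mean(D1) - mean(D2)."""
    diff = statistics.mean(d1) - statistics.mean(d2)
    spread = statistics.stdev(d1) + statistics.stdev(d2)
    return diff * abs(diff) / spread  # diff * |diff| == sign(diff) * diff**2


def good_and_bad_terms(movies, k=10):
    """movies: list of (set_of_terms, rating) pairs.

    Returns (good, bad): up to k terms each, strongest first. 'Good' terms
    go with a higher mean rating when present, 'bad' terms with a lower one.
    """
    vocabulary = set().union(*(terms for terms, _ in movies))
    scores = {}
    for term in vocabulary:
        d1 = [r for terms, r in movies if term in terms]      # D1: term present
        d2 = [r for terms, r in movies if term not in terms]  # D2: term absent
        if len(d1) < 2 or len(d2) < 2:
            continue  # stdev needs at least two ratings per group
        try:
            scores[term] = signed_divergence(d1, d2)
        except ZeroDivisionError:
            continue  # both groups have zero spread; metric undefined
    ranked = sorted(scores, key=scores.get, reverse=True)
    good = [t for t in ranked if scores[t] > 0][:k]
    bad = [t for t in reversed(ranked) if scores[t] < 0][:k]
    return good, bad


# Hypothetical toy data, made up for illustration.
movies = [
    ({"grand", "masterpiece"}, 9.0),
    ({"masterpiece", "vision"}, 8.0),
    ({"zombies", "tonight"}, 3.0),
    ({"zombies", "grand"}, 4.0),
    ({"vision"}, 7.0),
    ({"tonight"}, 5.0),
]
print(good_and_bad_terms(movies, k=2))  # (['masterpiece', 'vision'], ['zombies', 'tonight'])
```

<div style="text-align: left;">
Folding the sign into the score lets a single sort produce both lists: the top of the ranking gives the ‘good movie’ terms and the bottom gives the ‘bad movie’ terms.</div>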
<div style="text-align: left;">
So, without further math or ado, here are the 10 terms that correspond to the highest ratings:</div>
<div>
<div>
<div style="text-align: left;">
<br /></div>
</div>
<div>
<blockquote class="tr_bq">
<div style="text-align: left;">
animation, masterpiece, vision, magnificent, production, startling, french, glorious, smashing, grand </div>
</blockquote>
</div>
<div>
<div style="text-align: left;">
<br /></div>
</div>
<div>
<div style="text-align: left;">
And the 10 terms that correspond to the lowest ratings:</div>
</div>
<div>
<div style="text-align: left;">
<br /></div>
</div>
<div>
<blockquote class="tr_bq">
<div style="text-align: left;">
outer, zombies, ancient, woods, experiment, pray, tonight, mayhem, warrior, unleashed</div>
</blockquote>
</div>
<div>
<div style="text-align: left;">
<br />
Again, I've put the <a href="https://github.com/OhLookCake/GutenTag/blob/master/scripts/4_Divergence.R" target="_blank">code to generate these lists</a> up on GitHub.</div>
<div style="text-align: left;">
<br /></div>
</div>
<div>
<div style="text-align: left;">
If these lists make you have second thoughts about making <i>Warrior Zombies from Outer Space II: Mayhem Unleashed</i>, don't be disheartened: as I said earlier, there is certainly a correlation, but it’s not necessarily a causal relationship<sup>5</sup>. And hey, I know a bunch of people who would watch the hell out of that movie.</div>
</div>
</div>
<div>
<div style="text-align: left;">
<br /></div>
<div class="MsoNormal">
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Footnotes<o:p></o:p></div>
</div>
<div class="MsoNormal">
<div style="text-align: left;">
<sup>1</sup> Going through and adhering to the legal terms of use
of the datasets is left as an exercise for the reader.<o:p></o:p></div>
</div>
<div class="MsoNormal">
<div style="text-align: left;">
<sup>2</sup> The punctuation has been removed from the data to make
the analysis easier. So if you see “cant”, that’s probably “can't”, and so on.<o:p></o:p></div>
</div>
<div class="MsoNormal">
<div style="text-align: left;">
<sup>3</sup> A better metric, or even a simpler one, could well have been used,
but for some reason I went with this one. Suggestions
are welcome.<o:p></o:p></div>
</div>
<div class="MsoNormal">
<div style="text-align: left;">
<sup>4</sup> IMDb ratings are arguably not the best indicator of
movie success, but they are certainly one way of estimating it, and there will probably be a
future post analyzing how reliable a measure they are.</div>
</div>
<div class="MsoNormal">
<div style="text-align: left;">
<sup>5</sup> EDIT: On revisiting this, the final two lists of words don't
seem particularly robust. <o:p></o:p></div>
</div>
</div>
<div>
<div style="text-align: left;">
<br /></div>
</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
</div>
</div>Eeshan Malhotrahttp://www.blogger.com/profile/05553017788343435698noreply@blogger.com0