{"id":3185,"date":"2017-04-14T16:12:27","date_gmt":"2017-04-14T21:12:27","guid":{"rendered":"http:\/\/commons.trincoll.edu\/amst-data-driven\/?p=3185"},"modified":"2017-04-15T15:40:19","modified_gmt":"2017-04-15T20:40:19","slug":"classic-data-mapping-lab-5-working-your-twitter-data-into-graphs-and-stats","status":"publish","type":"post","link":"http:\/\/commons.trincoll.edu\/amst-data-driven\/2017\/04\/14\/classic-data-mapping-lab-5-working-your-twitter-data-into-graphs-and-stats\/","title":{"rendered":"Classic Data Mapping &#8211; Lab 5: Working Your Twitter Data into Graphs and Stats"},"content":{"rendered":"<p>In today&#8217;s post, we use Microsoft Excel to produce graphs and statistics that summarize our data. The first thing we will explore is the range of languages within the data set. In my data set, I found: ar (Arabic), ca (Catalan), cs (Czech), de (German), en (English), en-gb (British English), es (Spanish), es-MX (Mexican Spanish), fi (Finnish), fr (French), it (Italian), ja (Japanese), nl (Dutch), pt (Portuguese), ru (Russian), sv (Swedish), and tr (Turkish). By volume, English is clearly the most common language, with 2,290 tweets. 
That puts the share of tweets in English at about 76% of the 3,000 in my subset.<\/p>\n<p><img loading=\"lazy\" class=\" wp-image-3385 aligncenter\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/Chart-brian-300x186.jpg\" alt=\"Chart brian\" width=\"765\" height=\"474\" srcset=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/Chart-brian-300x186.jpg 300w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/Chart-brian-768x475.jpg 768w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/Chart-brian.jpg 808w\" sizes=\"(max-width: 765px) 100vw, 765px\" \/><\/p>\n<p>The pie chart above shows that the majority of the tweets in the data set are in English, which makes sense considering that English is the most prominent language in the United States. That is not to say, however, that English dominates Spanish in the United States by this same margin. Perhaps there are fewer Spanish-speaking Americans on Twitter than English-speaking Americans. Considering the current political climate, the President&#8217;s insistence that the media reports fake news, and the opponents who make the same claim about the President, it makes sense that these tweets would stem mostly from the United States. Perhaps, in order to be heard more widely, those who predominantly speak Spanish chose to write their tweets in English. Whatever the reason for these results, it is important to talk about the Twitter language algorithm:<\/p>\n<p>&#8220;To measure performance on Twitter overall, we can simply take a uniform sample of all Tweets and manually annotate it. The problem is that each annotator can recognize only one or two languages, and it\u2019s prohibitively inefficient and expensive to have every annotator look at every Tweet&#8221;\u00a0(&#8220;Evaluating Language Identification Performance | Twitter Blogs&#8221;). 
Twitter does mention that it cannot determine around 0.18% of the languages on Twitter, so I don&#8217;t believe this statistic is significant. Speakers of the other languages seem either to have news sources of their own that they feel are untrustworthy (especially Arabic speakers, considering the climate in the Middle East) or to be following events in the United States.<\/p>\n<p>Now it&#8217;s time to take a look at the number of tweets per day. The subset of tweets I selected to analyze contains only four dates, yet it is interesting to look at because they are the most recent. The dates run from April 12th, 2017 to April 15th, 2017 (today).<\/p>\n<p><img loading=\"lazy\" class=\" wp-image-3387 aligncenter\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/days-brian-300x188.jpg\" alt=\"days brian\" width=\"809\" height=\"507\" srcset=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/days-brian-300x188.jpg 300w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/days-brian-480x300.jpg 480w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/04\/days-brian.jpg 648w\" sizes=\"(max-width: 809px) 100vw, 809px\" \/><\/p>\n<p>In the most recent news, the MOAB was dropped in Afghanistan on the order of the United States administration. I could see how this could get people talking about fake news, in the sense that there was absolutely no notification about this occurrence prior to the incident, nor was there any reputable coverage of the matter for a while. I remember that I first heard about it from a news source I had never heard of before, because the major networks didn&#8217;t have access to audiovisual content on the strike, yet this source did. I questioned it immediately, which suggests that many other people may have felt the same way. 
There are multiple other political reasons why people could be tweeting heavily about fake news on the 13th and 14th of April.<\/p>\n<table width=\"143\">\n<tbody>\n<tr>\n<td width=\"69\">Average<\/td>\n<td width=\"74\">750<\/td>\n<\/tr>\n<tr>\n<td>Median<\/td>\n<td>742.5<\/td>\n<\/tr>\n<tr>\n<td>Mode<\/td>\n<td>N\/A<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>3000<\/td>\n<\/tr>\n<tr>\n<td>Max<\/td>\n<td>1395<\/td>\n<\/tr>\n<tr>\n<td>Min<\/td>\n<td>120<\/td>\n<\/tr>\n<tr>\n<td>Range<\/td>\n<td>1275<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Given the time frame of my analysis, my mean seems to be much higher than most of my classmates&#8217; because it covers a narrower window and there was a big political event that caused a major spike in the discussion. In that sense, my data does stand out. However, I chose a smaller subset than most of my classmates, so my other numbers blend in more. Twitter clearly makes this data fairly simple to access, in that I was able to easily determine trends among all those who used &#8216;fake news&#8217; as a hashtag.<\/p>\n<p>As mentioned, I chose a subset of 3000 tweets to analyze, but even so, I was right up there with the highest number of tweets because of the spike-causing event. I stand out in this sense, but my range falls in the middle of my classmates&#8217;.<\/p>\n<p>What really gets me thinking is the fact that people clearly resort to fake news as an answer for a lot of contemporary issues. Whether people see something they do not like, do not want to believe, do not believe is portrayed correctly, or is actually wrong, or do not see anything at all and believe they should, they will take on this anti-media view. It really does not matter what the event is. The fact that my data covered only four days and still showed this pattern is incredible, and it would be valuable to go back into the set, grab all the data, and match it up with other events. 
I would hypothesize, as I have here, that no matter what political event occurs, a group of people will blame fake news or comment on those who have recourse to it.<\/p>\n<p>Bibliography:<\/p>\n<p>&#8220;Evaluating Language Identification Performance | Twitter Blogs&#8221;. <i>Blog.twitter.com<\/i>. N.p., 2017. Web. 12 Apr. 2017.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In today&#8217;s post, we use Microsoft Excel to produce graphs and statistics that summarize our data. The first thing we will explore is the range of languages within the data set. In my data set, I found: ar (Arabic), ca (Catalan), cs (Czech), de (German), en (English), en-gb (British&#8230;<\/p>\n","protected":false},"author":1160,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts\/3185"}],"collection":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/users\/1160"}],"replies":[{"embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/comments?post=3185"}],"version-history":[{"count":11,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts\/3185\/revisions"}],"predecessor-version":[{"id":3389,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts\/3185\/revisions\/3389"}],"wp:attachment":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/media?parent=3185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\
/categories?post=3185"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/tags?post=3185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}