{"id":2820,"date":"2017-03-27T09:40:42","date_gmt":"2017-03-27T14:40:42","guid":{"rendered":"http:\/\/commons.trincoll.edu\/amst-data-driven\/?p=2820"},"modified":"2017-03-31T15:49:12","modified_gmt":"2017-03-31T20:49:12","slug":"lab-4-visualizing-syria-tweets","status":"publish","type":"post","link":"http:\/\/commons.trincoll.edu\/amst-data-driven\/2017\/03\/27\/lab-4-visualizing-syria-tweets\/","title":{"rendered":"Lab 4: Visualizing #Syria Tweet Connections"},"content":{"rendered":"<div class=\"mceTemp\"><\/div>\n<p><em>Part I: Cleaning (Organizing) Data for Visualization<\/em><\/p>\n<p>I chose the first 3,700 tweets from my dataset. I have over 130,000 tweets, but this small sample matters because it falls close to the start of #Trump&#8217;s #muslimban and to when #Trump took office, which clearly sparked a lot of tweeting that included my topic, #Syria. The tweets are from early February (February 7-8). This lab uses Gephi, an <a href=\"https:\/\/gephi.org\/\">interactive visualization platform<\/a>, to assess connections between different Twitter users who discuss the same topic of #Syria in their tweets.<\/p>\n<p>After running a Python script to extract some of the data from my Excel file of tweets, I was left with a comma-separated values (.csv) sheet with only two columns (A and B). Column A was the source, the user who wrote the tweet, and column B was the connection to another user\/tweeter. Some cells in the &#8216;source&#8217; column do not have a matching cell in column B; in other words, that cell is blank. I believe this is because that source made no connection to another user and simply tweeted about the subject of #Syria.<\/p>\n<hr \/>\n<p><em>Part II: Les Mis Gephi<\/em><\/p>\n<p>After downloading the Gephi file and opening it, I got a graph that was not yet analyzed to show the connections between the characters. 
See below.<\/p>\n<p><img loading=\"lazy\" class=\"size-medium wp-image-3109 aligncenter\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.34.46-PM-283x300.png\" alt=\"Screen Shot 2017-03-31 at 3.34.46 PM\" width=\"283\" height=\"300\" srcset=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.34.46-PM-283x300.png 283w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.34.46-PM.png 309w\" sizes=\"(max-width: 283px) 100vw, 283px\" \/><\/p>\n<p>From the list of Nodes, Valjean is ID# 11 and Fantine is ID# 23. The Data Laboratory tab suggests that Fantine is most likely the source because of how many targets there are from #23. Although Valjean is the main character in Les Mis, Gephi helps to show the complexity of the plot because Fantine has 5 more targets, or connections, than he does. There are a total of 77 nodes and 254 edges in the graph. 
Next, I analyzed the graph with various preset algorithms to get a better idea of the relationships\/connections between the characters.<\/p>\n<p>Using the Force Atlas layout algorithm to spread the nodes out in a readable way, I got a result that looked like this:<img loading=\"lazy\" class=\"size-medium wp-image-3117 aligncenter\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.58.30-PM-300x181.png\" alt=\"Screen Shot 2017-03-31 at 3.58.30 PM\" width=\"300\" height=\"181\" srcset=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.58.30-PM-300x181.png 300w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.58.30-PM-768x463.png 768w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-3.58.30-PM.png 783w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>The nodes are extremely clustered, so I raised the repulsion strength from 200 to 10,000 as instructed to better see the relationships and connections in Les Mis. Below is the result after increasing the repulsion strength. 
We can now see how significant certain connections are compared to others based on the edges!<\/p>\n<p><img loading=\"lazy\" class=\" wp-image-3120 aligncenter\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-4.01.43-PM-300x189.png\" alt=\"Screen Shot 2017-03-31 at 4.01.43 PM\" width=\"362\" height=\"228\" srcset=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-4.01.43-PM-300x189.png 300w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-4.01.43-PM-768x484.png 768w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-4.01.43-PM.png 889w\" sizes=\"(max-width: 362px) 100vw, 362px\" \/><\/p>\n<p>After adjusting the attributes of the nodes and giving them a color scheme, I ran the Average Path Length algorithm, which shows the average number of connections on the shortest path between two nodes in the network (degrees of separation). The diameter is also an important figure, showing the largest number of degrees of separation between any two nodes. For the Les Mis graph, the APL is 2.64 and the Diameter is 5, meaning an average of 2.64 degrees between characters and no more than 5 degrees of separation between any two characters in the story. Next I used the Betweenness Centrality to change the size of the nodes; the larger the node, the more often a character sits on the shortest paths connecting other characters in the story. Setting the node size range for betweenness centrality to Min 10 and Max 200 gave a great contrast in who is most significant as a bridge between other characters in the story. As seen below, and as predicted, Fantine is certainly the source and Valjean the main character. Running a Modularity algorithm and adjusting labels and sizes shows us sub-groups within the larger network of characters. 
Connected Components, which measures whether the nodes split into sub-groups with no path between them, came back as 1 because all of the characters in the story are connected, so it is the Modularity class we look at to determine the interactions. Even though all characters are connected, they do not all interact. This is seen in the final Les Mis visual analysis below. After applying a personal colorization to the Modularity class results and filtering a little to limit clutter, I produced my final graph. Check it out below and see how each algorithm we ran played a role in visualizing this Les Mis character relationship graph to tell us what is going on in the story!<\/p>\n<p><img loading=\"lazy\" class=\" wp-image-3137 aligncenter\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-4.37.48-PM-300x204.png\" alt=\"Screen Shot 2017-03-31 at 4.37.48 PM\" width=\"398\" height=\"289\" \/><\/p>\n<hr \/>\n<p><em>Part III: Analyzing the Data<\/em><\/p>\n<p>For this part of the lab I used the Gephi application to visualize the connections between Twitter users I captured with my TAGS scraper. For the sake of this lab we used 10,000 as the repulsion strength (how strongly the nodes push apart, which sets how spread out the graph is) because this works well for visualizing roughly 3k tweets. Other important figures include the Diameter and Average Path Length. The Diameter for this graph was 14, meaning connected users were at most 14 connections, or degrees, apart. The APL (average path length) is 5.132, meaning on average there were about 5 connections between users in my dataset, or about 5 &#8216;degrees of separation&#8217;. Out of 3,700 tweets, there were 422 connected components, meaning the network splits into 422 separate groups of users with no connection from one group to another. Not everyone is connected to everyone, but there are a good number of conversations happening here. 
I sized the nodes by Betweenness Centrality, at Min 10 and Max 300, for a nice proportion to see who is a prominent tweeter on the topic and who is not.<\/p>\n<figure id=\"attachment_3097\" aria-describedby=\"caption-attachment-3097\" style=\"width: 398px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" class=\" wp-image-3097\" src=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-2.48.41-PM-247x300.png\" alt=\"#Syria\" width=\"398\" height=\"483\" srcset=\"http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-2.48.41-PM-247x300.png 247w, http:\/\/commons.trincoll.edu\/amst-data-driven\/files\/2017\/03\/Screen-Shot-2017-03-31-at-2.48.41-PM.png 630w\" sizes=\"(max-width: 398px) 100vw, 398px\" \/><figcaption id=\"caption-attachment-3097\" class=\"wp-caption-text\">This is the final outcome after around an hour of Gephi processing my data.<\/figcaption><\/figure>\n<p>In analyzing my data visualization, I noticed quite a few things. For one, the graph is extremely interesting looking. A ring of smaller nodes surrounds a large, almost <a href=\"https:\/\/www.google.com\/search?q=jackson+pollock&amp;espv=2&amp;source=lnms&amp;tbm=isch&amp;sa=X&amp;ved=0ahUKEwi49PeJuYHTAhWa0YMKHcPwCPQQ_AUIBigB&amp;biw=1280&amp;bih=823\">Jackson Pollock<\/a>-esque creation of nodes and edges that go all over the place. The sources that created the outer clusters of nodes are frequently not connected with the inner cluster of nodes and groupings, and I believe there is a good explanation for these groups of singletons. These clusters seem to be users with few followers on Twitter, writing in broken English about #Syria conflicts, unconnected to the major accounts that are connected to a lot of people, like the more prominent accounts tweeting on the topic during this period in February. 
The most prominent accounts include @Amnesty, @Syriatweeten, and @Tahrirlive. These users have very differently sized nodes, yet all are much more prominent than the others. @Amnesty stands out the most. With 911k followers and a little over 14k tweets, this user has made a significant number of connections to other users under the topic of #Syria. This, however, is logical and somewhat expected. Second is @Syriatweeten, which has a very large number of tweets (&gt;250k) but only 3k followers. This account is most likely prominent because of the large number of tweets it has; statistically speaking, it makes sense that this user would show up, having tweeted about #Syria on a great many occasions.<\/p>\n<p>As the graph shows, a lot of Twitter users remain unconnected with each other and with the major accounts associated with the topics they choose to tweet on. Because this topic is foreign in nature and extremely political, I expect to see a lot of random &#8216;rant&#8217; tweets on the subject with few facts or little information included but still pertaining to #Syria. This particular time period was obviously a great choice for this graphing lab: the clusters and connections created at this time of heavy tweeting on the subject, right at the time of the #muslimban, painted a great graph with phenomenal clustering. The singletons also tell a great story, as described above. The only things I can determine are missing from this graph are some of the common terms used in the Middle East, as seen in some of the tweets that pertain directly to #Syria topics and discussions. Examples of these words can be found in the last lab on word analysis. If I were able to capture tweets with these words in them, directly relating to #Syria, I am confident there would be a ton more connections. 
\u00a0Overall, this lab was very helpful in establishing a good visualization of the connections between twitter users who tweeted on the topic of #Syria collected between Feb 7-8, 2017.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part I: Cleaning (Organizing) Data\u00a0for\u00a0Visualization I chose the first 3,700 tweets from my dataset. \u00a0I have over 130,000 tweets, but this small sample of focus is quite important because it is closer to the beginning of #Trump &#8216;s #muslimban, as well as when #Trump took office, which clearly sparked a lot of tweeting that included&#8230;<\/p>\n","protected":false},"author":1787,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[],"_links":{"self":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts\/2820"}],"collection":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/users\/1787"}],"replies":[{"embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/comments?post=2820"}],"version-history":[{"count":23,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts\/2820\/revisions"}],"predecessor-version":[{"id":2835,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/posts\/2820\/revisions\/2835"}],"wp:attachment":[{"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/media?parent=2820"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/categories?post=2820"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/commons.trincoll.edu\/amst-data-driven\/wp-json\/wp\/v2\/tags?post=2
820"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}