With the social network analysis, I hope to gain a better understanding of what connections exist between Twitter users discussing #NoDAPL. For this analysis, I chose to use tweets from February 21-23, yielding a total of approximately 1,800 tweets. This time period includes the days leading up to the forced clearing of the protestors from the Standing Rock camp.
The social network analysis was performed using Gephi. To familiarize myself with the software, I used one of the sample networks provided, Les Miserables. The first choice available upon opening the file is between a directed and an undirected graph. For the purposes of the exercise, I chose the undirected graph. However, a directed graph may be useful for my analysis of #NoDAPL tweets, as it will better display the flows of the conversation: whether individuals are directing messages at (or retweeting) more connected users, or whether the connected users are speaking to a large group of followers. The Les Mis file contains 77 nodes (characters) and 254 edges (interactions).

The Data Laboratory shows the connections between sources and targets: essentially, who is speaking, and to whom. Valjean, the main character (ID no. 11), only speaks to four other characters, but he is spoken about by considerably more. Fantine (ID no. 23), on the other hand, speaks to Valjean and nine others, but is spoken about by fewer. Valjean speaks to Myriel, Labarre, Mademoiselle Baptistine, and Madame Magloire. Fantine's targets include Valjean, Marguerite, Tholomyes, Fameuil, Blacheville, Dahlia, and Zephine. Curiously, the only apparent overlap between the two is when Fantine speaks directly to Valjean. It will be interesting to see how Fantine is connected to Valjean such that she can speak to him directly, but is never spoken to in return (at least by name).
Running the statistical analyses yielded some additional information. The average path length between nodes was 2.64, and the diameter, or most “degrees of separation,” between nodes was 5.
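To make these two statistics concrete, here is a minimal sketch of how they can be computed by hand. This is not Gephi's own code: it assumes an undirected network, averages only over pairs of nodes that can actually reach one another, and uses a made-up five-node network (`toy_edges`) rather than the Les Mis file.

```python
from collections import deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: number of hops from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def path_statistics(edges):
    """Return (average path length, diameter) for an undirected edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    lengths = []
    for start in adjacency:
        for target, hops in shortest_path_lengths(adjacency, start).items():
            if target != start:
                lengths.append(hops)  # only reachable pairs are counted

    # Average over all reachable pairs; the diameter is the longest shortest path.
    return sum(lengths) / len(lengths), max(lengths)


# Hypothetical toy network, invented for illustration only.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
average, diameter = path_statistics(toy_edges)
print(f"average path length: {average:.2f}, diameter: {diameter}")
```

In this toy network the two nodes farthest apart (A and D, or D and E) are three steps from each other, so the diameter is 3, while most pairs are closer than that, which pulls the average down.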


Several adjustments were necessary to make the network graph readable. Increasing the repulsion strength separated the nodes from one another, lengthening the edges between them. The relative scale of the connections did not change, but a stronger “push” between nodes began to separate them into distinct groups. Changing the node size parameters to a range of 10 to 200 magnified the more connected nodes, better showing their level of connectedness. The nodes were ranked by number of connections and assigned a size within the given range based on that rank.
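The sizing step can be sketched roughly as follows. This is an assumption about how a degree-based size ranking might work, not a description of Gephi's internals: it stretches node degree linearly across the 10 to 200 range, and the function name `scale_node_sizes` and the three example nodes are hypothetical.

```python
def scale_node_sizes(degrees, min_size=10.0, max_size=200.0):
    """Map each node's degree linearly onto the range [min_size, max_size]."""
    low = min(degrees.values())
    high = max(degrees.values())
    if high == low:
        # Every node has the same degree, so every node gets the smallest size.
        return {node: min_size for node in degrees}
    span = high - low
    return {
        node: min_size + (degree - low) / span * (max_size - min_size)
        for node, degree in degrees.items()
    }


# Hypothetical degrees, invented for illustration only.
toy_degrees = {"hub": 20, "mid": 11, "leaf": 1}
sizes = scale_node_sizes(toy_degrees)
for node, size in sizes.items():
    print(f"{node}: {size:.0f}")
```

The least connected node gets the minimum size, the most connected gets the maximum, and everything else falls in between, which is why a handful of hubs dominate the plot visually.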

Running Gephi with my own data, I was able to be more flexible with the settings in order to create a better visualization. I chose a repulsion strength of 6,000, a compromise between the default value and the value used in the Les Mis analysis. Using a repulsion strength of 10,000 moved the nodes too far apart, increasing the overall size of the plot to the point that large nodes and groups were difficult to see clearly. The node size range was kept the same, between 10 and 200.
The statistical analyses of #NoDAPL produced very different results from those for Les Mis. The average path length between users was 5.23, and the diameter was 13. A real-world sample of Twitter users will not be as tightly connected as the characters of a novel, which explains why both metrics are higher, by factors of about 2 and 2.6, respectively. With 1,778 nodes and 1,994 edges, I am surprised that the distance between nodes is still relatively short. The fact that the average path length (5.23) is less than half of the maximum path length (13) indicates a tendency of users of the #NoDAPL hashtag to be connected to one another. A larger diameter indicates a larger gap between two users or their groups; this may have many causes, but one likely explanation is that these disconnected users are invested in different issues (perhaps the environment and indigenous rights) under the same banner.
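The comparison between the two networks comes down to a few lines of arithmetic on the figures reported above. The average-degree calculation is my own addition and assumes each edge is counted once for each of its two endpoints, ignoring direction.

```python
# Figures reported above for the two networks.
les_mis = {"nodes": 77, "edges": 254, "avg_path": 2.64, "diameter": 5}
nodapl = {"nodes": 1778, "edges": 1994, "avg_path": 5.23, "diameter": 13}

# How much larger the #NoDAPL metrics are than the Les Mis ones.
path_ratio = nodapl["avg_path"] / les_mis["avg_path"]
diameter_ratio = nodapl["diameter"] / les_mis["diameter"]


def average_degree(network):
    """Average connections per node, counting each edge at both endpoints."""
    return 2 * network["edges"] / network["nodes"]


print(f"average path length ratio: {path_ratio:.2f}")
print(f"diameter ratio: {diameter_ratio:.2f}")
print(f"Les Mis average degree: {average_degree(les_mis):.2f}")
print(f"#NoDAPL average degree: {average_degree(nodapl):.2f}")
```

On these numbers the average #NoDAPL user has only about two connections, compared with more than six for a Les Mis character, which fits the picture of a sparse network held together by a few heavily connected accounts.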

The final visualization reveals several small clusters, each centered around a single, larger node. These large nodes appear to be mostly individual activists or advocacy groups, each with thousands or tens of thousands of tweets. User @ruthhhopkins is the largest node, but is not disproportionately larger than other central nodes. Many users are not strongly connected to any others, and sit in small clusters with a degree of one or two. I chose to filter out singletons, the nodes with a degree of 0. I was conflicted about filtering beyond degree zero: it created a less cluttered, more understandable plot, but eliminated many of the clusters around larger nodes. These single connections suggest that more connected users are simply being retweeted, rather than really being engaged in conversation. The directional edges support this assumption, with most pointing at the large nodes, indicating that those users were mentioned in other users' tweets, as happens when retweeting.
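The filtering and the direction of the edges can be illustrated with a small sketch. It is not how Gephi implements its degree filter: it assumes each edge is a (source, target) pair pointing from the user who tweeted to the user who was mentioned or retweeted, and the account names below are invented.

```python
def degree_counts(nodes, edges):
    """Count incoming and outgoing edges for every node in a directed network."""
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    for source, target in edges:
        out_degree[source] += 1  # source mentioned or retweeted someone
        in_degree[target] += 1   # target was mentioned or retweeted
    return in_degree, out_degree


def drop_singletons(nodes, edges):
    """Keep only nodes with at least one connection (total degree above 0)."""
    in_degree, out_degree = degree_counts(nodes, edges)
    return [node for node in nodes if in_degree[node] + out_degree[node] > 0]


# Hypothetical accounts, invented for illustration only.
toy_nodes = ["hub", "fan1", "fan2", "fan3", "loner"]
toy_edges = [("fan1", "hub"), ("fan2", "hub"), ("fan3", "hub")]

in_degree, out_degree = degree_counts(toy_nodes, toy_edges)
kept = drop_singletons(toy_nodes, toy_edges)
print("in-degree:", in_degree)
print("kept after filtering singletons:", kept)
```

Here the hub has a high in-degree and an out-degree of zero, the pattern I would expect from an account that is retweeted often but does not reply. Raising the threshold above zero would remove the one-connection fans as well, and the cluster around the hub would disappear with them.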
As I mentioned, most of the major nodes are activists involved in the protests of the Dakota Access pipeline, either individuals or organizations. Most of these accounts are highly active. Even those claiming to belong to a single person are most likely managed: the largest node, @ruthhhopkins, has almost 65 thousand tweets and over 44 thousand followers. The other notable group of major nodes is online news organizations. Users @fusion and @UR_Ninja, as well as some others, are general news sites or media-specific branches. Most of the users surrounding the main nodes are unremarkable: regular users with a smaller following and fewer tweets. A couple of interesting groupings did appear, however. @Wmn4Srvl (Women for Survival) was linked with several major news organizations (BBC, ABC, CNN, and others), but not strongly linked to any other nodes. In some clusters, there were two major nodes (@joshfoxfilm and @npr, @potus and @americanindian8, and @fusion and @markruffalo), with a group of shared connections. Most of these pairings had a large, public account (@npr, @potus, and @markruffalo) and a smaller, non-mainstream account. Perhaps these are instances of one account sharing a message directed at another, which is then retweeted by a group of followers.
Although it is interesting to see the small clusters that did form, it appears that the majority of the hashtag users did not have particularly strong connections to other users. Also, while there are several large nodes focused on similar topics, there are few connections between them. I was surprised to see this fragmentation within the community. Instead, I would have expected users with similar concerns to group together, creating clusters based on interest in particular social or environmental issues, rather than simply choosing a particular figure and following them.
Great analysis of “who is talking to who”! I agree that a repulsion strength of 6,000 allowed us to see the primary clusters, while also revealing that many users are not connected to any cluster. Might this be the result of your hashtag being relatively new? Are a majority of #NoDAPL supporters yet to support a specific activist group?
Unlike your data, mine showed very few clusters surrounding news sources. I wonder if our Twitter data was scraped for the same date(s). If not, then this may explain why you had more clusters centered around news sources and I had next to none.
I am still curious to know if the clusters in your data are tweeting about the same topics. Or, as you mentioned, are some tweeting about indigenous rights and others about environmental rights? You might want to consider looking at the data itself to determine this.
Unlike in my analysis, you were able to focus yours on a few specific days that were important to the #NoDAPL discussion, which I find interesting. Even with the large number of nodes in your graph, you were still able to understand and explain who is talking to who. Along with this, your analysis of the singletons cluttering up your plot was a cool thing to hear after seeing your plot as it is. The Twitter users who aren’t part of conversations are still important to the analysis, but I could see how they would clutter the graph. Maybe expand a little more on the two sides of #NoDAPL on Twitter. Why are some conversations happening and others aren’t? Through your analysis I can see how different my hashtag is from yours because of how it can be used. It is really interesting to see a much larger number of nodes than in mine.