Expand Twitter Follower Network for NGO with Python Tweepy Package

Dec 27, 2018


This project was requested by my boss at UMASH. After automating our data and building dashboards for social media and website traffic, I returned to analyzing Twitter data. This time the goal is not only to visualize existing data: I use data extracted from Twitter, together with the programming skills I polished in my program, to work toward a concrete goal.

Problem Intro

UMASH, as an NGO, faces a problem common to most NGOs: it is hard to acquire new followers. NGO social media accounts usually appeal to specific groups or segments of users. Posts do not reach many people because they are not intriguing, exciting, or general enough, so they are rarely shared and discussed across the social network. Furthermore, NGOs usually have a limited budget for post promotion, which makes the task even more daunting.

Possible Solution

Therefore, my boss brought up some alternative solutions. At first I had little idea how to execute them, as I am new to the NGO field. NGOs usually expand their influence through offline connections and partnerships rather than online exposure. UMASH has one or two relationship/event coordinators who are in charge of building relationships with local organizations and key opinion leaders (KOLs). Through these connections, we hope to reach the networks of those key accounts as well, expanding both our network and our influence. This can create greater synergy among these NGOs.

How Programming Provides Help

At this point, you may still be wondering how data science or programming skills apply to this scenario. It is relatively straightforward.

In this task, I mostly use the Python Tweepy package to generate name lists for our public relations team. With a simple and clear list, coordinators can follow and contact the organizations and opinion leaders in various ways, such as by email or phone.

Furthermore, UMASH may acquire followers from those similar organizations and opinion leaders as well. It is a kind of O2O strategy, here meaning offline to online.

This is my first time formally working with the Tweepy package, and I am still not very familiar with its API. However, after some research with online resources, I got results and generated several lists for my boss and colleagues, enabling them to start conversations with other organizations.

I break down my execution plan into several steps. I will list them in the next section.

Tools

Tools: Python (Spyder IDE)

Skills: Data processing with pandas; tweet processing and scraping with Tweepy

Methodology & List Generation

1. Generate the list of UMASH’s followers

2. Generate the list of UMASH’s followers’ followers

3. Generate the list of what UMASH’s followers are following

4. Generate the list of followers and their location

With the process above, I anchor the search on UMASH's followers and expand it to their followers and the accounts they follow. It might give us a glimpse of who our potential followers and opinion leaders are.

The fourth point is specific to UMASH. Because UMASH is an NGO whose goal is to convey health information to farmers in the Midwest, my boss wants to know the geographical distribution of our followers. Therefore, I added this part to my code.

In the next section, I will showcase some of the code I used to scrape data from Twitter with Tweepy. Because of Twitter's API rate limits, I start by scraping small amounts of data, and the code shown here is a demo.

Generate list of UMASH Followers

To begin with, I decided to get the full follower list of our UMASH Twitter account. The UMASH page shows that it has 553 followers so far. The layout of Twitter's follower page makes it hard for management to get a list of follower names at a glance, and Twitter's Analytics feature does not provide a follower list either. So this is a perfect time to use the Tweepy package.

First, I import the needed packages. Tweepy is a must when dealing with data from Twitter, and pandas is always our best friend for munging and processing data.

import tweepy
import pandas as pd

The second step for any Tweepy operation is to provide the four keys/tokens from a Twitter developer app. There are many tutorials online that explain how to apply for tokens, so feel free to get your own.

C_KEY = "YOUR_CONSUMER_API_KEY"            # consumer API key
C_SECRET = "YOUR_CONSUMER_API_SECRET_KEY"  # consumer API secret key
A_TOKEN = "YOUR_ACCESS_TOKEN"              # access token
A_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"  # access token secret
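If you would rather not paste the keys into the script, one option is to read them from environment variables. The sketch below is only an illustration of that idea: the variable names in `REQUIRED_VARS` and the helper `load_credentials` are hypothetical, not something Tweepy or Twitter requires.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED_VARS = ("TWITTER_C_KEY", "TWITTER_C_SECRET",
                 "TWITTER_A_TOKEN", "TWITTER_A_TOKEN_SECRET")


def load_credentials(env=os.environ):
    """Return the four credentials from `env`, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demo with a plain dict standing in for the real environment.
demo_env = {name: "dummy-" + name.lower() for name in REQUIRED_VARS}
creds = load_credentials(demo_env)
print(sorted(creds))  # only the variable names are printed, never the values
```

In a real run you would call `load_credentials()` with no argument and assign the four values to C_KEY, C_SECRET, A_TOKEN and A_TOKEN_SECRET.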

After assigning the keys and tokens to variables, I set up the authentication and pass it to the Tweepy API.

auth = tweepy.OAuthHandler(C_KEY, C_SECRET)
auth.set_access_token(A_TOKEN, A_TOKEN_SECRET)
api = tweepy.API(auth)

With the api variable, I can start the scraping process. Scraping the followers of UMASH is relatively easy.

users = tweepy.Cursor(api.followers, screen_name="umash_umn", count = 200).items()

From the cursor object created by this one line of code, I can collect the name and screen name of each UMASH follower with simple lists and a for loop. Note that name is the display name users choose for themselves, while screen name is more like an account ID.

screennamelst = list()
namelst0 = list()
# Collect both fields in one pass; the cursor cannot be iterated a second time.
for u in users:
    namelst0.append(u.name)
    screennamelst.append(u.screen_name)
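A Tweepy cursor is consumed as it is iterated, so it is safest to read every field you need from each user in a single pass. The sketch below illustrates that behavior with a plain Python generator standing in for the cursor. The function `fake_followers` and its sample accounts are hypothetical and only there so the example runs without API access.

```python
from types import SimpleNamespace


def fake_followers():
    """Hypothetical stand-in for a cursor: yields each user exactly once."""
    yield SimpleNamespace(name="Farm Safety Org", screen_name="farmsafety")
    yield SimpleNamespace(name="Jane Doe", screen_name="jane_doe")


users = fake_followers()

names, screen_names = [], []
for u in users:
    # Read both fields while the user object is in hand.
    names.append(u.name)
    screen_names.append(u.screen_name)

# The generator is now exhausted, so a second loop would add nothing.
leftover = list(users)
print(names, screen_names, leftover)
```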

I combine the two lists, screennamelst and namelst0, into a pandas dataframe and output it as a csv file for easier use.

umash_follower = pd.DataFrame({'screen_name':screennamelst, 'user_name':namelst0})
umash_follower.to_csv('follower.csv', sep=',')

The output of the dataframe is shown below:

With the list, my boss and colleagues can first roughly identify what kind of followers we have, whether organizations or individuals.

Generate the list of UMASH’s followers’ followers

The second section builds on section 1. I write another for loop to get the screen names of each follower's followers.

Owing to the rate limits of the Twitter API, I ran only the first five followers in screennamelst to showcase what I did.

all_follower = list()
for i in screennamelst[0:5]:
    for u in tweepy.Cursor(api.followers, screen_name=i, count=200).items():
        all_follower.append(u.screen_name)
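To get a feel for why I only sampled five accounts, here is a rough estimate of how many requests a crawl needs. It assumes the limits I understand the standard followers endpoint to have had at the time: up to 200 users per request and 15 requests per 15-minute window. Treat those three constants as assumptions and check the current documentation. The follower counts in the second call are made up.

```python
import math

PAGE_SIZE = 200            # assumed max users returned per request
REQUESTS_PER_WINDOW = 15   # assumed requests allowed per window
WINDOW_MINUTES = 15        # assumed window length


def estimate_crawl(follower_counts):
    """Return (requests, minimum minutes of waiting) to page through every account."""
    requests = sum(max(1, math.ceil(n / PAGE_SIZE)) for n in follower_counts)
    windows = math.ceil(requests / REQUESTS_PER_WINDOW)
    # The first window is available immediately; each further window costs a wait.
    wait_minutes = max(0, windows - 1) * WINDOW_MINUTES
    return requests, wait_minutes


# UMASH's own 553 followers fit in three requests.
print(estimate_crawl([553]))

# Hypothetical follower counts for five sampled accounts.
print(estimate_crawl([120, 950, 4300, 60, 15000]))
```

Under these assumptions, even five accounts can take over an hour once one of them has a large following, which is why the demo stays small.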

Just as in section 1, I turn the all_follower list into a pandas dataframe and output it as a csv file for easy checking and querying.

follower_follower_15 = pd.DataFrame({'screen_name':all_follower})
follower_follower_15.to_csv("follower's followers.csv", sep = ',')

The output is shown as follows:

I group by screen name to see whether any accounts appear more than once, which might point to opinion leaders.

follower_count = follower_follower_15.groupby('screen_name').size().sort_values(ascending = False)

From the sample of followers' followers, I find that the first account in the result shows up twice. It might be a potential opinion leader or a follower who is interested in UMASH too.
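The same kind of count can be sketched without pandas, using `collections.Counter` from the standard library. The screen names in `sample` are invented for illustration and do not come from the UMASH data.

```python
from collections import Counter

# Hypothetical sample of followers' followers.
sample = ["ag_news", "farm_mom", "ag_news", "dairy_dan", "farm_mom", "ag_news"]

counts = Counter(sample)

# Keep only accounts that follow at least two of the sampled followers.
repeated = [(name, n) for name, n in counts.most_common() if n >= 2]
print(repeated)
```

Accounts that survive the `n >= 2` filter are the ones worth a closer look, since they are connected to more than one of our followers.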

Generate the list of what UMASH’s followers are following

This task is much harder than task 2. The reason is that I didn't find any API method or cursor object that returns the accounts a user is following. The closest solution is the follower's friend IDs, which are the IDs of the accounts the user is following.

Therefore, as in section two, I write two nested for loops to return a list of friend IDs, and I write it out as a csv file.

all_friend2 = list()
for i in screennamelst[6:10]:
    for u in tweepy.Cursor(api.friends_ids, screen_name=i,count=200).items():
        all_friend2.append(u)
follower_friend610 = pd.DataFrame({'friend_id': all_friend2})
follower_friend610.to_csv("follower's friend.csv", sep = ',')

The dataframe lists the accounts UMASH followers are following. That motivates running groupby to see the distribution of whom they follow. The accounts at the top of the ranking are the organizations/individuals we might want to connect with.

friend_count = follower_friend610.groupby('friend_id').size().sort_values(ascending = False)
friend_count2 = friend_count[friend_count>=2]
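The friend IDs are numeric, so the top-ranked ones still need to be turned into account names before anyone can contact them. As far as I know, Twitter's user lookup accepts at most 100 IDs per request, so the IDs have to be sent in batches. The sketch below only does the batching. The helper `chunked` and the sample IDs are hypothetical, and the lookup call itself (for example Tweepy's `lookup_users`) is left out because its argument names differ between Tweepy versions.

```python
def chunked(ids, size=100):
    """Yield successive lists of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical friend IDs that appeared at least twice.
top_ids = list(range(1000, 1250))

batches = list(chunked(top_ids))
print([len(b) for b in batches])
```

Each batch can then be passed to one lookup request, and the returned screen names joined back to the counts.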

Generate the list of followers and their location

The last task is to find where UMASH followers are based. As with the question of whom people are following, I didn't find an API method that queries a follower's location directly.

Instead, I found another way to approach the problem: the user timeline. With tweepy, an analyst can easily pull the tweets of an individual user, and each tweet returned by user_timeline comes with numerous metadata fields.

I conduct a small experiment below with a single account. I first extract two tweets from the account “wiscoag”, an agricultural organization.

tweets = api.user_timeline(screen_name="wiscoag", count=2)
print ("Number of tweets extracted: {}.\n".format(len(tweets)))

Next I print out all fields of each tweet in the object returned by user_timeline (a ResultSet).

for item in tweets:
    print(item._json)

I find that the name, screen_name and location related to this tweet can be extracted from the json-like result.

for tweet in tweets[1:2]:
    print (tweet.user.location, '/' ,tweet.user.name)

There it is: with a few lines of code, I get the location and name attached to the tweet. Note that tweet.user.location is the free-text location from the account's profile, not a geotag of the tweet itself. The big assumption behind this operation is therefore that the location an account puts in its profile reflects where it is actually based. Under that assumption, it makes sense to query a recent tweet and use its user.location to represent where the account is.
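It helps to see the two kinds of location side by side. In the tweet JSON, `user.location` is free text from the profile, while a geotagged tweet carries a separate `place` object. The sketch below uses plain dictionaries shaped like `item._json`. The helper `location_fields` is hypothetical, and the sample payloads are made up and heavily trimmed.

```python
def location_fields(tweet_json):
    """Return (profile_location, tweet_place) from a tweet-shaped dict."""
    profile_location = tweet_json.get("user", {}).get("location") or None
    place = tweet_json.get("place") or {}
    tweet_place = place.get("full_name")
    return profile_location, tweet_place


# Hypothetical, heavily trimmed payloads.
untagged = {"user": {"name": "Example Ag", "location": "Madison, WI"},
            "place": None}
tagged = {"user": {"name": "Example Ag", "location": ""},
          "place": {"full_name": "Minneapolis, MN"}}

print(location_fields(untagged))
print(location_fields(tagged))
```

Most tweets are not geotagged, which is why the profile field is the more practical source here even though users can type anything into it.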

After the small experiment, I load the follower file saved earlier and turn the screen name column into a list.

df_follower = pd.read_csv('follower.csv')
namelst = df_follower['screen_name'].tolist()

Then I write a for loop to append names and locations to two empty lists.

loclist = []
namelist = []
for name in namelst[1:300]:
    try:
        user_tweets = api.user_timeline(screen_name=name, count=2)
    except Exception as e:
        # The request failed (e.g. a protected account): use a plain list as a marker
        user_tweets = list([0])
    if type(user_tweets) == list:
        loclist.append('Private Account')
        namelist.append('Private Account')
    else:
        # A successful call returns a ResultSet; read the second tweet's user info
        for tweet in user_tweets[1:2]:
            loclist.append(tweet.user.location)
            namelist.append(tweet.user.name)

Within the for loop, I use try and except to skip locked and private accounts. Then I use if-else to append the correct information to the two lists.

Finally, I write the lists into a pandas dataframe and output it as a csv file.

df_300 = pd.DataFrame({'1. name':namelist, '2. location':loclist})
df_300.to_csv('follower_location.csv', sep=',')

The result is clear: name and location. Some names come with a null location, most likely because the account left the location field in its profile empty. But at least I get something representative out of the tweet data.
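Because the location column is free text, a natural next step is to map it to states so the geographical distribution can be tallied. The sketch below is one simple way to do that, not part of my original script. The `STATES` dictionary is an illustrative subset rather than an official definition of the Midwest, and `match_state` and the sample strings are hypothetical.

```python
import re

# Illustrative subset of states; extend or replace to match the region of interest.
STATES = {
    "MN": "Minnesota",
    "WI": "Wisconsin",
    "IA": "Iowa",
    "ND": "North Dakota",
    "SD": "South Dakota",
}


def match_state(location):
    """Return the state abbreviation found in a free-text location, or None."""
    if not location:
        return None
    lowered = location.lower()
    for abbr, full_name in STATES.items():
        if full_name.lower() in lowered:
            return abbr
    # Abbreviations must appear as standalone upper-case tokens ("Madison, WI").
    tokens = set(re.findall(r"\b[A-Z]{2}\b", location))
    for abbr in STATES:
        if abbr in tokens:
            return abbr
    return None


samples = ["Madison, WI", "St. Paul, Minnesota", "Worldwide", None, "Fargo ND"]
print([match_state(s) for s in samples])
```

Simple matching like this will miss city-only entries and jokes typed into the profile, so the tally should be read as a rough picture.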

Last Word

Tweepy is a powerful package with numerous further applications, such as sentiment analysis and social listening. In this article, however, I explore a fundamental application of tweepy: simply getting the list of followers and their related network. It is the first step for UMASH to expand its followers both online and offline. I will definitely go further as my internship proceeds.

Code on GitHub: yunhanfeng/tweepy_scraping_follower_data

Thanks for reading.