[Analytics Intern] How I failed my first A/B test in my U.S. internship, and what I learned from it
I have been working at UMASH, part of the School of Public Health at UMN, as the social analytics intern for nearly three months. In that time I completed visualization dashboards for yearly and monthly performance, and customized and semi-automated the process of producing monthly reports. (By the way, don't forget to follow us on Facebook and Twitter.) I also started training myself to look at the organization's operations, as well as my daily work, from a product management viewpoint.
The challenge arrives
The first challenge came when my boss was pondering how to spend wisely on our Facebook post campaigns to reach our goals. Since we are an NGO, we don't have much budget for promoting posts and raising awareness among people who are not yet our followers; the regular monthly spend is limited.
My boss tended to spend the same amount each month promoting the farm safety posts. Facebook offers two goals for a promoted post: link clicks and engagement. Previously, the organization would more or less randomly pick one of the two for the monthly promoted post. With a PM-style mindset, I suggested to my boss: why not apply some A/B testing principles and check which goal performs better? That suggestion led to my first experiment in the U.S.
Process of Experiment Design
I separated the process into the steps listed below. I had run some experiments in my previous jobs, but most of them were assisted by product managers and engineers. Without colleagues with a technical background, I could hardly create a 100% controlled environment for this experiment; still, I tried my best to follow the logic of A/B testing.
a) Define the metrics to observe
Our comparison is between two promotion goals: link clicks and engagement. We want to see which one brings more benefit to our organization, but how do we define benefit? What are the key success factors here?
We are promoting posts that link out to the UMASH website, so are we aiming to increase traffic to the website, or engagement with the posts themselves? In the end I settled on a metric: under the two different promotion goals, which post reaches more users?
Website referrals and post engagement are equally important to UMASH, so I chose to compare which goal leads to more reach. The more reach we get, the lower the cost per reach, which means we are spending the promotion budget more efficiently.
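To make the arithmetic concrete, cost per reach is simply spend divided by the number of unique users reached. Here is a minimal Python sketch; the budget and reach figures are made up for illustration, not our real campaign numbers:

```python
def cost_per_reach(spend_usd: float, reach: int) -> float:
    """Promotion spend divided by the number of unique users reached."""
    return spend_usd / reach

# Hypothetical figures for illustration only
post_a = cost_per_reach(spend_usd=50.0, reach=4000)  # goal: link clicks
post_b = cost_per_reach(spend_usd=50.0, reach=2000)  # goal: engagement

print(f"Post A: ${post_a:.4f} per user reached")  # $0.0125
print(f"Post B: ${post_b:.4f} per user reached")  # $0.0250
```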
b) State my assumption
My assumption is quite simple. Of the two promotion goals, link clicks are harder to achieve than engagement: a click on an external link is a rarer action than a like or a comment. So I assumed Facebook's delivery algorithm would have to show the link-click post to more users in order to collect those rarer actions, which would boost the chance that the post is seen by a larger audience.
c) Decide the time frame
We usually run a promotion for one week. To prevent our potential audience from seeing two similar posts with the same link at once, we decided to run the experiment over two weeks: post A in the first week, post B in the second.
d) Fix everything except one variable
For the posts, I tried to convince the social editor and my boss to hold everything constant except the promotion goal, so that the A/B test results would be as accurate and comparable as possible.
For the two posts, the social editor wrote similar text, picked similar pictures, and attached the same link, so that the two posts would look as alike as possible and the comparison would stay unbiased.
For the promotion, my boss scheduled both campaigns to start on the same weekday, with the same duration and the same target audience.
The one and only variable we changed was the goal of the post: link clicks vs. engagement. After two weeks, we could check the results.
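One way to picture the setup: every setting is shared between the two promotions except the goal. A small sketch of the design (the field names and values here are my own shorthand, not Facebook's actual campaign settings):

```python
# Settings held constant across both promotions (values hypothetical)
controlled = {
    "start_day": "Monday",
    "duration_days": 7,
    "audience": "same saved target audience",
    "budget_usd": 50.0,
    "link": "same UMASH article URL",
    "creative": "similar text and picture",
}

# The single variable under test: the promotion goal
variants = {
    "post_A": {**controlled, "goal": "link_clicks"},
    "post_B": {**controlled, "goal": "engagement"},
}

for name, settings in variants.items():
    print(name, "->", settings["goal"])
```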
e) Result check
It turned out that the post with the link-clicks goal got more reach, which seemed to prove my assumption right: the first post reached twice as many users as the second.
But was the result of the experiment worth trusting? After looking deeper into the posts, I would say no. It turned out that I had failed my exam.
There are several reasons the A/B test is not trustworthy.
1. The durations of the two promotions were not the same. Without anyone noticing, the first post's promotion ended within three days, while the second ran the whole week. The two posts are simply not comparable within the intended time frame (see the sketch after this list).
2. The second big difference is that the posts' formats were different. In the first post, a user could click the picture and jump to the URL we provided; in the second post, it was just a plain picture. The format mismatch introduces a basic confound, which makes the experiment no longer credible.
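Here is the sanity check I should have run on the duration problem, with hypothetical numbers shaped like what actually happened (the first post stopped after three days but still reached about twice as many users in total):

```python
# Hypothetical figures shaped like the real outcome
posts = {
    "A (link clicks)": {"reach": 4000, "days_promoted": 3},
    "B (engagement)":  {"reach": 2000, "days_promoted": 7},
}

for name, p in posts.items():
    print(f"Post {name}: {p['reach'] / p['days_promoted']:.0f} reach per day")

# Even this per-day view is only a rough sanity check: reach does not
# accumulate linearly over a campaign, so unequal durations break the
# comparison no matter how you normalize.
```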
What have I learned from my failure?
Although the results of the A/B test may seem to confirm my assumption, I still cannot fully rely on the finding for next steps, given that some variables were not well controlled. I did learn something from the mistakes, though, and I will improve next time for sure.
1) Communication and follow-up are important
Even though the experiment was put together in an improvised way, I still needed to keep communicating with the team. The problems with the post format and the campaign duration could easily have been prevented beforehand; all I had to do was double-confirm the details with the stakeholders. Sometimes team members are not familiar with the experimental process, and it is the analyst's responsibility to help them understand it better.
2) Use different tools to overcome the experiment's limitations
The post format issue stems from Facebook's default behavior of showing a large preview picture for a link, so two posts sharing the same link display the same picture. After the experiment, I suggested to the social editor that in the future we could use https://picsee.co/?lang=zh-tw to customize the default preview picture, which would make the two posts more identical in format. Sometimes it really takes a bit of creativity to work around this kind of environmental limitation.
3) Spend more time on defining success
After the experiment, a team member raised the point that even though the first post got more reach, the second post attracted more people to like our fan page. During that discussion, our defined key success factors kept shifting. A product analyst should never settle for the information as given; I should have asked back and forth to collect more input and then designed the A/B test in a more comprehensive way.
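One practical habit that follows from this: record every candidate success metric for each variant up front, so the team can revisit what "success" means without rerunning the test. A tiny sketch with invented numbers:

```python
# Hypothetical results recorded across several candidate success metrics
results = {
    "post_A (link clicks)": {"reach": 4000, "link_clicks": 120, "page_likes": 5},
    "post_B (engagement)":  {"reach": 2000, "link_clicks": 40,  "page_likes": 25},
}

for post, metrics in results.items():
    print(post, metrics)

# Which post "won" depends on which metric the team values most, which is
# exactly why success should be pinned down before the experiment runs.
```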
Conclusion
My first A/B test in the U.S. did not go well, but I tried my best to standardize the process and to better understand how the team works. I learned a lot from the mistakes I made, and I am now ready for my next experiment.