NLP Social Media Research
Project Overview
Recently, there has been much discussion surrounding freedom of speech and anonymity on social media, driven primarily by the revelations about Cambridge Analytica and how it used Facebook data to influence elections. Many activists and intellectuals have questioned how one can effectively combat this type of manipulation while also respecting people's rights to free speech and privacy. However, freedom of speech is a contentious topic, and there are many facets to consider when discussing it. This report covers an analysis of how much freedom of speech varies across four social media platforms (4chan, Reddit, YouTube, and Twitter) and compares them based on anonymity and moderation. The project applies natural language processing to user-generated posts and develops a Speech Scoring metric for the analysis. The results showed little to no significant difference in Speech Scores between the four platforms.
Platform Anonymity
Each platform has unique qualities which influence how users interact with each other and how they can express their ideas online. YouTube and Twitter both have highly censored cultures where speech is often stifled due to threats of legal action and censorship from corporations and governments. Reddit's moderation efforts vary across content types (e.g., pornography vs. politics) but tend to lean towards moderate levels of censorship for controversial topics. Finally, 4chan has a hands-off approach to moderating content and is highly tolerant of offensive content. In terms of anonymity, the platforms fall into four levels:
Incognito - No personal data is needed to create an account to interact on the site; a platform example is 4chan.
Concealed - No personal data is required, but providing it is strongly advised at account creation. Users may have to comment on smaller forums or meet account requirements before they can interact on popular forums; a platform example is Reddit.
Partial - Some personal data is needed for account creation, like an email; a platform example is YouTube.
Limited - Account creation requires an email and a phone number; a platform example is Twitter.
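The four levels above can be captured as a small lookup table, which makes it easier to group or sort per-platform results by anonymity. This is only an illustrative sketch: the names `Anonymity`, `PLATFORM_ANONYMITY`, and `platforms_by_anonymity` are hypothetical, and the numeric ordering (1 = most anonymous) is an assumption made here for sorting, not something defined by the project.

```python
from enum import IntEnum


class Anonymity(IntEnum):
    """Anonymity levels, ordered from most to least anonymous (assumed ordering)."""
    INCOGNITO = 1  # no personal data needed to interact
    CONCEALED = 2  # no personal data needed, but strongly advised
    PARTIAL = 3    # some personal data needed, such as an email
    LIMITED = 4    # email and phone number required


# Platform examples taken from the level descriptions above.
PLATFORM_ANONYMITY = {
    "4chan": Anonymity.INCOGNITO,
    "Reddit": Anonymity.CONCEALED,
    "YouTube": Anonymity.PARTIAL,
    "Twitter": Anonymity.LIMITED,
}


def platforms_by_anonymity():
    """Return platform names sorted from most to least anonymous."""
    return sorted(PLATFORM_ANONYMITY, key=PLATFORM_ANONYMITY.get)
```

Using an ordered enum rather than plain strings means per-platform scores can be compared against anonymity level directly, for example by sorting on it.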
Speech Score Results
Below are the results for each platform. 4chan had the highest Hate, Offensive, and Violent scores. However, the biggest surprise was that Violent scores were uniformly high across all four platforms. This result shows that users on every platform used words related to hurting others, usually physically. Given the subject matter of the Iran protest, the results are reasonable.
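To make the idea of a word-based score concrete, here is a minimal sketch of one way a per-category score could be computed: the share of tokens in a set of posts that match a category lexicon. This is not the project's actual Speech Scoring metric, whose definition is not given here. The lexicons, the tokenizer, and the names `LEXICONS`, `tokenize`, and `speech_scores` are all hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicons; a real analysis would use curated word lists.
LEXICONS = {
    "hate": {"examplehateword"},
    "offensive": {"idiot", "stupid"},
    "violent": {"kill", "hurt", "attack", "beat"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def speech_scores(posts):
    """Return, per category, the fraction of all tokens found in that lexicon."""
    counts = Counter()
    total_tokens = 0
    for post in posts:
        tokens = tokenize(post)
        total_tokens += len(tokens)
        for category, words in LEXICONS.items():
            counts[category] += sum(1 for token in tokens if token in words)
    if total_tokens == 0:
        # No text to score: report zero for every category.
        return {category: 0.0 for category in LEXICONS}
    return {category: counts[category] / total_tokens for category in LEXICONS}
```

Normalizing by token count keeps scores comparable between platforms with different post lengths and sample sizes. A pure lexicon match ignores context, so it cannot distinguish a report about violence from a threat.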
Recommendations
In the project's limited data extraction on the Iran protest for each platform, speech scoring showed little difference between the platforms, regardless of their level of anonymity. For instance, 4chan does not have any moderation. Reddit has no profile validation and only a small group of volunteer moderators (50 people), whereas YouTube and Twitter have a large number (a few thousand moderators). However, regardless of the number of moderators, the Violent Scores are similar across all four platforms.
When the project compares Hate Scores and Offensive Scores, there is quite a noticeable difference between 4chan and Twitter. Twitter might be devoting more resources to moderating hateful and offensive words. This area might be the most productive for minimizing offensive content while maximizing ad revenue. In contrast, the Violent Scores suggest it might be less productive to allocate capital to that type of moderation, either because violent language is harder to detect or because the backlash from such occurrences is not as severe.
Therefore, the question is whether companies need to allocate considerable capital to hiring moderators or whether the process can be automated using a moderation model. For example, does Twitter need thousands of employees to moderate the platform, or can it use machine learning algorithms to perform moderation and save money? In addition, Twitter and YouTube could adopt Reddit's moderation processes to further reduce the headcount required for moderation.