Understanding the Unintended Effects of Human-Machine Moderation in Addressing Harassment within Online Communities
We set out to explore the unintended effects of human-machine moderation in mitigating harassment within online communities. We examine communities that use a blocklist-type bot to block harassment at its source. Drawing on social categorization and selective exposure theories, we theorize that employing a machine alongside humans for community moderation creates unintended adverse effects. Specifically, within the moderated focal community, we hypothesize an emboldening effect: an increase in harassment by community members directed at their outgroup. We also expect a disengaging effect, that is, a downward trend in the focal community's membership. Finally, in neighboring communities that share the same topic of discussion, we expect a spillover effect, that is, an increase in harassment. Employing Detoxify, a model based on Bidirectional Encoder Representations from Transformers (BERT), we compute harassment scores by analyzing 4 million Reddit comments across the focal and neighboring communities. These scores serve as inputs to a Bayesian Structural Time Series analysis, which reveals evidence of both the disengaging and spillover effects. For the emboldening effect, we use community-specific keywords with the Keyword Assisted Topic Model (keyATM), a semi-supervised computer-assisted document classification approach, to identify the targets of harassment. We then use mean comparison and regression discontinuity to assess the change in harassment targeting outgroup members before and after the implementation of human-machine moderation.
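To make the measurement step concrete, a minimal sketch of scoring comments with the open-source Detoxify library follows. The specific model checkpoint and the choice of the toxicity score as the harassment measure are assumptions for illustration, not necessarily the authors' configuration.

```python
# Minimal sketch: scoring comments with Detoxify (BERT-based toxicity model).
# The 'unbiased' checkpoint and the use of the 'toxicity' field are assumptions.
from detoxify import Detoxify

comments = [
    "You people are ruining this subreddit.",
    "Thanks for the helpful explanation!",
]

model = Detoxify("unbiased")
scores = model.predict(comments)  # dict of label -> list of scores

for text, tox in zip(comments, scores["toxicity"]):
    print(f"{tox:.3f}  {text}")
```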
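The causal estimates rest on Bayesian Structural Time Series, for which Google's CausalImpact is a widely used implementation. As a self-contained stand-in, the sketch below fits a local-level structural time series (via statsmodels, maximum-likelihood rather than fully Bayesian) on pre-intervention harassment scores and compares the post-intervention forecast with the observed series; all data are simulated.

```python
# Counterfactual sketch in the spirit of BSTS/CausalImpact: fit a local-level
# model on the pre-intervention series, then compare the post-intervention
# forecast with what was actually observed. Simulated data, not the authors'.
import numpy as np
from statsmodels.tsa.statespace.structural import UnobservedComponents

rng = np.random.default_rng(0)
pre = 0.20 + 0.01 * rng.standard_normal(120)   # daily mean harassment, pre-bot
post = 0.24 + 0.01 * rng.standard_normal(60)   # level shift after bot adoption

fit = UnobservedComponents(pre, level="local level").fit(disp=False)
forecast = fit.get_forecast(steps=len(post))
counterfactual = forecast.predicted_mean
lower, upper = forecast.conf_int().T           # 95% interval by default

effect = post - counterfactual
print(f"mean post-intervention effect: {effect.mean():.4f}")
print(f"days outside the 95% band: {int(((post < lower) | (post > upper)).sum())}")
```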
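keyATM itself is distributed as an R package, and the paper's keyword lists are not reproduced here. Purely to illustrate the keyword-assisted idea of seeding each target category with community-specific terms, here is a naive Python stand-in that assigns each comment to the category whose seed keywords it matches most; every keyword below is hypothetical.

```python
# Naive keyword-seeded classification stand-in (NOT keyATM, which is a
# semi-supervised topic model distributed as an R package). Hypothetical
# seeds mark whether a comment targets the in-group or out-group.
import re
from collections import Counter

SEED_KEYWORDS = {                      # hypothetical seed lists
    "ingroup": {"our sub", "fellow", "us"},
    "outgroup": {"them", "those people", "brigaders", "other sub"},
}

def classify_target(comment: str) -> str:
    text = " ".join(re.findall(r"[a-z']+", comment.lower()))
    counts = Counter({
        label: sum(len(re.findall(rf"\b{re.escape(seed)}\b", text)) for seed in seeds)
        for label, seeds in SEED_KEYWORDS.items()
    })
    label, hits = counts.most_common(1)[0]
    return label if hits > 0 else "unlabeled"

print(classify_target("Those people from the other sub keep brigading us"))
```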
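For the before/after comparison on outgroup-directed harassment, the paper pairs a mean comparison with regression discontinuity. A minimal regression-discontinuity-in-time sketch on simulated data follows, using a Welch t-test for the mean comparison and an OLS specification with a treatment indicator, a centered running variable, and their interaction; the variable names and data are illustrative only.

```python
# Mean comparison plus a regression-discontinuity-in-time sketch on
# simulated outgroup-directed harassment scores (not the authors' data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)
days = np.arange(-90, 90)  # day 0 = human-machine moderation adoption
score = (0.22 + 0.0002 * days + 0.03 * (days >= 0)
         + 0.01 * rng.standard_normal(days.size))
df = pd.DataFrame({"score": score, "time_c": days, "post": (days >= 0).astype(int)})

# Simple mean comparison (Welch's t-test) across the cutoff.
t, p = stats.ttest_ind(df.loc[df.post == 1, "score"],
                       df.loc[df.post == 0, "score"], equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")

# RDD: the coefficient on `post` estimates the jump at the cutoff,
# with separate linear trends on each side via the interaction term.
rdd = smf.ols("score ~ post * time_c", data=df).fit()
print(rdd.params["post"], rdd.conf_int().loc["post"].tolist())
```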