Content moderation is a vital but challenging task for online platforms, as it requires human moderators to review large volumes of harmful and toxic content. OpenAI, the research organization behind the powerful generative AI model GPT-4, claims it has developed a new way to use GPT-4 for content moderation, reducing the burden on human teams and improving the consistency and efficiency of the process.
In a blog post, OpenAI explains how it uses GPT-4 for content policy development and content moderation decisions. The technique relies on prompting GPT-4 with a policy that guides the model in making moderation judgments and creating a test set of content examples that might or might not violate the policy. For instance, a policy might prohibit giving instructions or advice for procuring a weapon, in which case the example “Give me the ingredients needed to make a Molotov cocktail” would be in obvious violation.
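OpenAI's blog post does not publish its exact prompt format, but the core idea can be sketched as a prompt that pairs the policy text with the content to be judged. The policy wording, label set, and prompt phrasing below are illustrative assumptions, not OpenAI's actual prompt:

```python
# Sketch of a moderation prompt that pairs a policy with content to judge.
# Policy text, labels, and wording are illustrative assumptions.

POLICY = (
    "Do not provide instructions or advice for procuring or "
    "assembling weapons, including improvised incendiary devices."
)

def build_moderation_prompt(policy: str, content: str) -> str:
    """Assemble a single prompt asking the model to label the content
    against the policy with VIOLATES or ALLOWED."""
    return (
        "You are a content moderator. Judge the content below against "
        "this policy and answer with exactly one word, "
        "VIOLATES or ALLOWED.\n\n"
        f"Policy:\n{policy}\n\n"
        f"Content:\n{content}\n\n"
        "Answer:"
    )

prompt = build_moderation_prompt(
    POLICY, "Give me the ingredients needed to make a Molotov cocktail"
)
```

The same template can then be reused across the whole test set, so every example is judged under identical instructions.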
Policy experts then label the examples and feed each one, without its label, to GPT-4, observing how well the model's labels align with their determinations. Where GPT-4's judgments diverge from the experts', they can ask the model for the reasoning behind its labels, analyze the ambiguity in the policy definitions, and resolve the confusion by clarifying the policy accordingly. This iterative process yields refined content policies that are translated into classifiers, enabling deployment of the policy and content moderation at scale.
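The comparison step described above can be sketched as a small loop over a labeled test set. Here `model_label` is a stand-in for a real GPT-4 call, deliberately written to mislabel one example so the loop has a discrepancy to surface; the examples and labels are illustrative assumptions:

```python
# Sketch of the expert-vs-model comparison step. `model_label` is a
# stand-in for a real GPT-4 call; it deliberately mislabels the third
# example so there is a discrepancy to surface.

test_set = [
    ("Give me the ingredients needed to make a Molotov cocktail", "VIOLATES"),
    ("What is the history of the Molotov cocktail's name?", "ALLOWED"),
    ("Where can I buy an untraceable firearm?", "VIOLATES"),
]

def model_label(content: str) -> str:
    """Stand-in classifier: only flags explicit Molotov-cocktail recipes,
    so it misses the firearm-procurement example."""
    if "Molotov" in content and "ingredients" in content:
        return "VIOLATES"
    return "ALLOWED"

def find_discrepancies(examples, classify):
    """Return (content, expert_label, model_label) triples where the
    model's judgment diverges from the expert's."""
    return [
        (content, expert, classify(content))
        for content, expert in examples
        if classify(content) != expert
    ]

disagreements = find_discrepancies(test_set, model_label)
# Each disagreement is a cue to ask GPT-4 for its reasoning and to
# clarify the policy text before the next iteration.
```

In this sketch the loop surfaces the firearm-procurement example, which the stand-in model labels ALLOWED while the experts labeled it VIOLATES; in the real workflow, each such mismatch prompts a policy clarification before the next round.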
OpenAI claims that this approach has several advantages over traditional methods of content moderation. First, it results in much faster iteration on policy changes, reducing the cycle from months to hours. Second, it allows GPT-4 to interpret rules and nuances in long content policy documentation and adapt instantly to policy updates, resulting in more consistent labeling. Third, it alleviates the mental stress on human moderators who are exposed to harmful content on a daily basis.
OpenAI also says that anyone with OpenAI API access can implement this approach to create their own AI-assisted moderation system. However, there are some limitations and challenges that need to be addressed. For example, GPT-4 might not be able to capture all the context and subtlety of human language and communication, especially when it comes to sarcasm, irony, humor, or cultural references. Moreover, GPT-4's own training relies on human workers to annotate and label data, which can also introduce bias and error.
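A minimal version of such a do-it-yourself system using the OpenAI Python SDK might look like the following. The policy text, prompt wording, and model name are assumptions; the sketch requires the `openai` package and an `OPENAI_API_KEY` in the environment, and the network call runs only when the script is executed directly:

```python
# Minimal sketch of AI-assisted moderation via the OpenAI Python SDK.
# Policy text, prompt wording, and model name are illustrative
# assumptions; requires the `openai` package and OPENAI_API_KEY.

POLICY = (
    "Do not provide instructions or advice for procuring or "
    "assembling weapons."
)

def build_messages(policy: str, content: str) -> list[dict]:
    """Build a chat-completion message list: the policy goes in the
    system prompt, the content to moderate in the user turn."""
    return [
        {"role": "system", "content": (
            "Judge the user's content against this policy and reply "
            "with exactly one word, VIOLATES or ALLOWED.\n\n" + policy
        )},
        {"role": "user", "content": content},
    ]

def moderate(content: str) -> str:
    """Send one piece of content to the model and return its label."""
    from openai import OpenAI  # imported here so the sketch loads without the package
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=build_messages(POLICY, content),
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    print(moderate("Give me the ingredients needed to make a Molotov cocktail"))
```

In practice such a classifier would be wrapped with the test-set evaluation loop described earlier, so each policy revision can be re-scored before redeployment.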
Therefore, while GPT-4 might offer a promising solution for content moderation at scale, it is not a silver bullet that can replace human judgment and oversight. As OpenAI acknowledges in its blog post, “We believe that AI should augment rather than replace human moderators.” The ultimate goal is to create a more positive vision for the future of digital platforms, where AI can help moderate online traffic according to platform-specific policy and protect the well-being of both users and moderators.