
Wednesday, June 07, 2023

How Do We Know if a Text Is AI-generated?

Useful thoughts, click through for all of it.

How Do We Know if a Text Is AI-generated?  

Different Statistical Approaches to Detecting AI-generated Text.

By Sara A. Metwalli, in Towards Data Science

In the fascinating and rapidly advancing realm of artificial intelligence, one of the most exciting advances has been the development of AI text generation. AI models, like GPT-3, Bloom, BERT, AlexaTM, and other large language models, can produce remarkably human-like text. This is both exciting and concerning. Such technological advances allow us to be creative in ways we couldn’t before, but they also open the door to deception. And the better these models get, the more challenging it will be to distinguish between a human-written text and an AI-generated text.

Since the release of ChatGPT, people all over the globe have been testing the limits of such AI models, using them to gain knowledge but also, in the case of some students, to solve homework and exams, which raises ethical questions about such technology. Although these models have become sophisticated enough to mimic human writing styles and maintain context over multiple passages, they are not flawless and still need fixing, even if their errors are minor.

That raises an important question, one I have been asked many, many times by my friends and family members since ChatGPT was released:

How can we know if a text is human-written or AI-generated?

This question is not new to the research world, where detecting AI-generated text is known as “deep fake text detection.” Today, there are different tools that you can use to detect if a text is human-written or AI-generated, such as the GPT-2 Output Detector by OpenAI. But how do such tools work?

Different approaches are currently used to detect AI-generated text, and new techniques are being researched and implemented as the models used to generate such text get more advanced.

This article will explore 5 different statistical approaches that can be used to detect AI-generated text.

Let’s get right to it…

1. N-gram Analysis:

An N-gram is a sequence of N words or tokens from a given text sample. The “N” in N-gram is how many words are in the N-gram. For example:

New York (2-gram).

The Three Musketeers (3-gram).

The group met regularly (4-gram).

Analyzing the frequency of different N-grams in a text makes it possible to determine patterns. For example, among the three N-gram examples we just went through, the first is the most common, and the third is the least common. By tracking the different N-grams, we can determine whether they are more or less common in AI-generated text than in human-written text. For instance, an AI might use specific phrases or word combinations more frequently than a human writer. We can learn how N-gram frequencies differ between AI and human writing by training a model on text produced by both.
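To make the idea concrete, here is a minimal sketch of counting N-grams in Python. It is not the method any particular detector uses: the function names `ngrams` and `ngram_frequencies` are my own, and splitting on whitespace is a simplifying assumption (real tools use a proper tokenizer).

```python
from collections import Counter


def ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(text, n):
    """Return the relative frequency of each n-gram in `text`."""
    grams = ngrams(text, n)
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# The 4-gram example from above contains three 2-grams.
print(ngrams("The group met regularly", 2))
```

Running `ngram_frequencies` on a collection of human-written text and on a collection of AI-generated text gives two frequency tables. The N-grams whose frequencies differ most between the two tables are the kind of signal a classifier could be trained on.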

2. Perplexity:

If you look up the word perplexed in the English dictionary, it will be defined as confused or puzzled, but in the context of AI, and NLP in particular, perplexity measures how confidently a language model predicts a text. Perplexity is computed from the probabilities the model assigns to each token in the text: it is the exponential of the average negative log-probability per token, or in other words, a measure of how “surprised” the model is by the new text. The better the model predicts the text, the lower the perplexity. AI-generated text tends to score a lower perplexity under a language model than human-written text, because the model finds it easier to predict. Perplexity is fast to calculate, which gives it an advantage over other approaches.
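Here is a rough sketch of how perplexity can be calculated, using a toy unigram model in place of a real language model. Everything in it is an assumption for illustration: the function names `train_unigram` and `perplexity`, the whitespace tokenization, and the add-one smoothing that reserves a single bucket for unseen words. An actual detector would take token probabilities from a large language model instead.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word occurrences in `corpus` (lowercased, whitespace-split)."""
    tokens = corpus.lower().split()
    return Counter(tokens), len(tokens)


def perplexity(text, counts, total):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Computed as exp of the average negative log-probability per token.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    # Known words plus one shared bucket for every unseen word.
    vocab_size = len(counts) + 1
    log_prob = 0.0
    for token in tokens:
        p = (counts.get(token, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


counts, total = train_unigram("the cat sat on the mat")
familiar = perplexity("the cat sat", counts, total)
unfamiliar = perplexity("quantum zebra flux", counts, total)
print(familiar < unfamiliar)  # text the model has seen is less "surprising"
```

The text made of words the model has seen gets the lower score, which is the same effect a detector relies on: text that a language model predicts easily is more likely to have been written by a similar model.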

3. Burstiness: ....
