[LINK] OpenAI just admitted it can't identify AI-generated text.
Roger Clarke
Roger.Clarke at xamax.com.au
Sat Jul 29 11:24:27 AEST 2023
I wonder whether an emerging understanding of the limitations of
'generative AI', and in particular of LLM-based approaches, will
undermine one of the very silly premises of the last 15-20 years.
The 'big data' spruikers put the proposition that data quality no longer
matters if you have enough data. And a lot of data analysis activity
seems to have proceeded on that assumption.
When you're drawing inferences from data, and using those inferences to
make decisions, and then implementing the decisions, the key question
isn't "Was that data generated by a human, an 'unintelligent' artefact,
or an 'intelligent' artefact?".
Data quality requires that many criteria be satisfied, but the common
element is the reliability of the association that the data has with any
real-world phenomenon that it purports to represent.
Overlaid on that is the cluster of issues that underlie
misinformation, such as selective quotation and incompleteness /
acontextuality; and bias and discrimination arising from dominance of
some value-sets over others.
Inability to pick whether something is 'AI-generated' is an issue, yes.
But beneath that are far bigger issues, including the laziness,
ignorance and recklessness of failing to exercise humans' capacity for
critical thinking, and failing either to assure adequate quality of
source-data or to qualify the conclusions reached commensurately with
the quality factors.
_________________
On 29/7/23 12:23 am, Stephen Loosley wrote:
> OpenAI just admitted it can't identify AI-generated text.
>
> That's bad for the internet and it could be really bad for AI models.
>
>
> By Alistair Barr Jul 28, 2023
> https://www.businessinsider.com/openai-cant-identify-ai-generated-text-bad-for-internet-models-2023-7
>
>
>
> Large language models and AI chatbots are beginning to flood the
> internet with auto-generated text.
>
> It's becoming hard to distinguish AI-generated text from human writing.
> OpenAI launched a system to spot AI text, but just shut it down because
> it didn't work.
>
>
> Beep beep boop. Did a machine write that, or did I?
>
> As the generative AI race picks up, this will be one of the most
> important questions the technology industry must answer.
>
> ChatGPT, GPT-4, Google Bard, and other new AI services can create
> convincing and useful written content. Like all technology, this is
> being used for good and bad things. It can make writing software code
> faster and easier, but also churn out factual errors and lies.
>
> So, developing a way to tell AI-generated text from human-written text
> is foundational.
>
>
> OpenAI, the creator of ChatGPT and GPT-4, realized this a while ago. In
> January, it unveiled a "classifier to distinguish between text written
> by a human and text written by AIs from a variety of providers."
>
> The company warned that it's impossible to reliably detect all
> AI-written text.
>
> However, OpenAI said good classifiers are important for tackling several
> problematic situations. Those include false claims that AI-generated
> text was written by a human, automated misinformation campaigns, and
> the use of AI tools to cheat on homework.
>
> Less than seven months later, the project was scrapped.
>
> "As of July 20, 2023, the AI classifier is no longer available due to
> its low rate of accuracy," OpenAI wrote in a recent blog. "We are
> working to incorporate feedback and are currently researching more
> effective provenance techniques for text."
>
>
> The implications
>
> If OpenAI can't spot AI writing, how can anyone else? Others are working
> on this challenge, including a startup called GPTZero. But OpenAI, with
> Microsoft's backing, is considered the best at this AI stuff.
>
> Once we can't tell the difference between AI and human text, the world
> of online information becomes more problematic.
>
> There are already spammy websites churning out automated content using
> new AI models. Some of them have been generating ad revenue, along with
> lies such as "Biden dead. Harris acting President, address 9 a.m.",
> according to Bloomberg.
>
> This is a very journalistic way of looking at the world. I get it. Not
> everyone is obsessed with making sure information is accurate. So here's
> a more worrying possibility for the AI industry:
>
> If tech companies inadvertently use AI-produced data to train new
> models, some researchers worry those models will get worse. They will
> feed on their own automated content and fold in on themselves in what's
> being called an AI "Model Collapse."
>
> A group of AI researchers from fancy universities including Oxford,
> Cambridge and Toronto has been studying what happens when text produced
> by a GPT-style AI model (like GPT-4) forms most of the training dataset
> for the next models.
>
> "We find that use of model-generated content in training causes
> irreversible defects in the resulting models," they concluded in a
> recent research paper.
>
> After seeing what could go wrong, the authors issued a plea and made an
> interesting prediction.
>
> "It has to be taken seriously if we are to sustain the benefits of
> training from large-scale data scraped from the web," they wrote.
>
> "Indeed, the value of data collected about genuine human interactions
> with systems will be increasingly valuable in the presence of content
> generated by LLMs in data crawled from the Internet."
>
> We can't begin to tackle this existential problem if we can't tell
> whether a human or a machine wrote something online. I emailed OpenAI to
> ask about their failed AI text classifier and the implications,
> including Model Collapse. A spokesperson responded with this statement:
> "We have nothing to add outside of the update outlined in our blog post."
>
> I wrote back, just to check if the spokesperson was a human. "Hahaha,
> yes I am very much a human, appreciate you for checking in though!" they
> replied.
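
For intuition on the "Model Collapse" claim above, here is a minimal toy
sketch in Python (my own illustration, not the researchers' code, and a
deliberate simplification of their setting): each generation fits a
Gaussian to samples drawn from the previous generation's fitted model,
standing in for "training on the previous model's output".

    import random
    import statistics

    random.seed(0)
    mu, sigma = 0.0, 1.0   # generation 0: the "real" data distribution
    n = 50                 # samples per generation; small, so drift is visible

    for generation in range(20):
        # "Train" the next model on data emitted by the previous one.
        data = [random.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(data)       # refitted mean
        sigma = statistics.pstdev(data)   # refitted spread (MLE, biased low)
        print(f"gen {generation:2d}: mu={mu:+.3f} sigma={sigma:.3f}")

Each pass adds sampling noise and slightly underestimates the true
spread, so over the generations the fitted distribution narrows and
wanders away from the original: a crude analogue of models feeding on
their own automated content.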
>
--
Roger Clarke mailto:Roger.Clarke at xamax.com.au
T: +61 2 6288 6916 http://www.xamax.com.au http://www.rogerclarke.com
Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Visiting Professor in the Faculty of Law, University of N.S.W.
Visiting Professor in Computer Science, Australian National University