[LINK] The NIST GenAI Evaluation

Stephen Loosley stephenloosley at outlook.com
Thu May 2 21:46:39 AEST 2024


National Institute of Standards and Technology

U.S. Department of Commerce


Registration for the NIST GenAI evaluation is now open. You can sign up here.


GenAI: Text-to-Text (T2T)

Evaluating generators and discriminators for AI-generated text vs human-written text.

https://ai-challenges.nist.gov/t2t

Overview

NIST GenAI T2T is an evaluation series that supports research in the Generative AI Text-to-Text modality.

Which generative AI models are capable of producing synthetic content that can deceive the best discriminators as well as humans? The performance of generative AI models can be measured by (a) humans and (b) discriminative AI models.

To evaluate the "best" generative AI models, we need the most competent humans and discriminators. The most proficient discriminators are those that possess the highest accuracy in detecting the "best" generative AI models.

Therefore, it is crucial to evaluate both generative AI models (generators) and discriminative AI models (discriminators).

What

The Text-to-Text Generators (T2T-G) task is to automatically generate high-quality summaries given a statement of information needed ("topic") and a set of source documents to summarize. For more details, please see the generator data specification.

The Text-to-Text Discriminators (T2T-D) task is to detect whether a target output summary was generated by a Generative AI system or written by a human. For more details, please see the discriminator evaluation plan.

Who

We welcome and encourage teams from academia, industry, and other research labs to contribute to Generative AI research through the GenAI platform.

The platform is designed to support various modalities and technologies, including both "Generators" and "Discriminators".

Generators will supplement the evaluation test material with their own AI-generated content based on the given task (e.g., automatic summarization of documents). These participants will use cutting-edge tools and techniques to create synthetic content. By incorporating this data into our test material, our test sets will evolve in pace with technology advancements.

In the GenAI pilot, generators do “well” when their synthetic content is not detected by humans or AI discriminators.

Discriminators are automatic algorithms that identify whether a piece of media (text, audio, image, video, code) originated from generative AI or a human.

In the GenAI pilot, discriminators do “well” when they correctly categorize test material as produced by AI or by humans.
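
The two notions of doing “well” described above can be illustrated with a minimal Python sketch. It is not the official NIST scoring, which is defined in the evaluation plans. Plain accuracy and a simple "evasion rate" are used here as stand-ins, and the label strings, function names, and sample data are all hypothetical.

```python
from typing import Sequence

# Hypothetical label values; the real evaluation may encode these differently.
AI = "ai"
HUMAN = "human"


def discriminator_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of test items the discriminator labeled correctly."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one test item is required")
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def generator_evasion_rate(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of AI-generated items that the discriminator labeled as human."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    ai_predictions = [p for t, p in zip(truth, predicted) if t == AI]
    if not ai_predictions:
        raise ValueError("no AI-generated items in the test set")
    undetected = sum(p == HUMAN for p in ai_predictions)
    return undetected / len(ai_predictions)


# Made-up example: four summaries, two AI-generated and two human-written.
truth = [AI, AI, HUMAN, HUMAN]
predicted = [AI, HUMAN, HUMAN, HUMAN]

accuracy = discriminator_accuracy(truth, predicted)
evasion = generator_evasion_rate(truth, predicted)
print(f"discriminator accuracy: {accuracy:.2f}")
print(f"generator evasion rate: {evasion:.2f}")
```

In this sketch, a higher accuracy favors the discriminator, while a higher evasion rate favors the generator, which mirrors why the two kinds of systems are evaluated against each other.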

How

To take part in the GenAI evaluations, you need to register on this website and complete the data usage agreement and the data transfer agreement before you can download or upload data.

NIST will make all necessary data resources available to the generator and discriminator participants. Each team will receive access to data resources once it has completed all required data agreement forms, in line with the published data release date for each task.

Please refer to the published schedule for data release dates. Once your system is functional, you will be able to upload your data (generators) or system outputs (discriminators) to the challenge website and see your results displayed on the leaderboard.


Task Coordinator
If you have any questions, please email the NIST GenAI team.


