[LINK] Re: AI search engines

Sat Dec 21 12:51:01 AEDT 2024

Thanks for this work, Tony, and perhaps of interest:

OpenAI Skips o2 and Debuts New o3 ‘Reasoning’ Model

Reasoning models are supposed to fact-check themselves by producing a 
step-by-step plan to find a correct answer.

By Thomas Maxwell Dec 20, 2024 
https://gizmodo.com/openai-skips-o2-and-debuts-new-o3-reasoning-model-2000541796

OpenAI has unveiled o3, a new chain-of-thought “reasoning” model that 
the company claims is its most advanced yet.

The model is not yet available for general use, but safety researchers 
can sign up for a preview starting today.

https://openai.com/index/early-access-for-safety-testing/

OpenAI and others hope that reasoning models will go a long way toward 
solving the pernicious problem of chatbots frequently producing wrong 
answers. Chatbots fundamentally do not “think” like humans, and different 
techniques are needed to try to create the best simulacrum of a human 
thought process.

When asked a question, reasoning models pause and consider related 
prompts that could help produce an accurate answer.

For example, if you ask the o3 model, “can habaneros be grown in the 
Pacific Northwest,” the model might lay out a series of questions it 
will research to come to a conclusion, such as “where do habaneros 
typically grow,” “what are the ideal conditions for growing habaneros,” 
and “what type of climate does the Pacific Northwest have.”

Anyone who has used chatbots knows you sometimes have to prompt a 
chatbot with additional follow-ups until it finally gets the right 
result. Reasoning models are supposed to do this additional work for you.

o3 is the successor to o1, OpenAI’s first chain-of-thought reasoning 
model. Reps said they decided to skip the name “o2” “out of respect” for 
O2, the British telecommunications company, but it certainly doesn’t 
hurt that the jump makes the product sound more advanced.

The company says the new model comes with the ability to adjust its 
reasoning time. Users can choose low, medium, or high reasoning time; 
the greater the compute, the better o3 is supposed to perform. OpenAI 
says it will spend time “red-teaming” the new model with researchers to 
prevent it from producing potentially harmful responses (since, again, 
it is not a human and does not know right from wrong).

Reasoning is the buzzword of the day in the field of generative AI, as 
industry insiders believe it is the next unlock necessary to improve the 
performance of large language models. Eventually, adding more compute 
stops yielding proportional performance gains, so new techniques are 
needed.

Google DeepMind recently unveiled its own reasoning model called Gemini 
Deep Research, which can take 5-10 minutes to generate a report that 
analyzes many sources across the web in order to come to its findings.

https://gizmodo.com/google-releases-faster-gemini-2-0-with-deep-research-2000537349

OpenAI is confident in o3 and offers impressive benchmarks. It says that 
on Codeforces, a competitive programming platform whose ratings are used 
to measure coding ability, o3 achieved a rating of 2727. For context, a 
rating of 2400 would put an engineer in the 99th percentile of 
programmers. The model also scored 96.7% on the 2024 American 
Invitational Mathematics Examination (AIME), missing just one question.

We will have to see how the model holds up in real-world testing; 
OpenAI’s recently released Sora still needs work. Optimists are 
confident that the problem of accuracy is being solved, but tread 
lightly when relying on AI models for important work where accuracy is 
necessary.

AI model companies like OpenAI and Perplexity are in a race to become 
the next Google, collecting the world’s knowledge and helping users make 
sense of it all.

They even have search products now that are meant to more directly 
replicate Google with access to real-time web results.

All of these players seem to leapfrog one another with every passing 
day, however. The feeling is somewhat reminiscent of the late ’90s, when 
there were myriad search engines to choose from (Google, Yahoo, 
AltaVista, and Ask Jeeves, just to name a few), all hoovering up the 
internet’s data and presenting it with a different UX.

Most of them disappeared after one came along that was supremely better 
than the rest—Google.

OpenAI clearly has a strong lead right now with hundreds of millions of 
monthly active users and a partnership with Apple, but Google has 
received a lot of plaudits recently for advancements in its Gemini models.

The Verge reports that Google will soon integrate Gemini more deeply 
into its search interface.

--