[LINK] Re: AI search engines
Stephen Loosley
stephenloosley at outlook.com
Sat Dec 21 12:51:01 AEDT 2024
Thanks for this work Tony .. and perhaps of interest ..
OpenAI Skips o2 and Debuts New o3 ‘Reasoning’ Model
Reasoning models are supposed to fact-check themselves by producing a
step-by-step plan to find a correct answer.
By Thomas Maxwell Dec 20, 2024
https://gizmodo.com/openai-skips-o2-and-debuts-new-o3-reasoning-model-2000541796
OpenAI has unveiled o3, a new chain-of-thought “reasoning” model
that the company claims is its most advanced yet.
The model is not yet available for general use, but safety researchers
can sign up for a preview starting today.
https://openai.com/index/early-access-for-safety-testing/
OpenAI and others hope that reasoning models will go a long way toward
solving the pernicious problem of chatbots frequently producing wrong
answers. Chatbots fundamentally do not “think” like humans, and different
techniques are needed to try to create the best simulacrum of a human
thought process.
When asked a question, reasoning models pause and consider related
prompts that could help produce an accurate answer.
For example, if you ask the o3 model, “can habaneros be grown in the
Pacific Northwest,” the model might lay out a series of questions it
will research to come to a conclusion, such as “where do habaneros
typically grow,” “what are the ideal conditions for growing habaneros,”
and “what type of climate does the Pacific Northwest have.”
Anyone who has used chatbots knows you sometimes have to prompt a
chatbot with additional follow-ups until it finally gets the right
result. Reasoning models are supposed to do this additional work for you.
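
The habanero example above can be sketched in a few lines of Python.
This is only an illustration of the "break the question into
sub-questions, then work through them" idea as the article describes it,
not OpenAI's implementation. The names plan, answer_with_plan and
SUB_QUESTIONS are hypothetical, and the lookup step is a stub; a real
model generates and answers its own sub-questions.

```python
from typing import Callable, Dict, List

# Hypothetical table of sub-questions, copied from the article's example.
# A real reasoning model would generate these itself.
SUB_QUESTIONS: Dict[str, List[str]] = {
    "can habaneros be grown in the Pacific Northwest": [
        "where do habaneros typically grow",
        "what are the ideal conditions for growing habaneros",
        "what type of climate does the Pacific Northwest have",
    ],
}


def plan(question: str) -> List[str]:
    """Return the intermediate questions to work through before answering."""
    # Unknown questions fall back to a one-step plan: the question itself.
    return SUB_QUESTIONS.get(question, [question])


def answer_with_plan(question: str, lookup: Callable[[str], str]) -> List[str]:
    """Answer each sub-question in order and collect the notes."""
    return [f"{sub}: {lookup(sub)}" for sub in plan(question)]


if __name__ == "__main__":
    # Stub lookup: no research is actually performed in this sketch.
    stub = lambda sub: "(not researched in this sketch)"
    question = "can habaneros be grown in the Pacific Northwest"
    for note in answer_with_plan(question, stub):
        print(note)
```

The point of the structure is that the follow-up prompts a user would
otherwise type by hand become an explicit list the program walks through.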
o3 is the successor to o1, OpenAI’s first chain-of-thought reasoning
model. Reps said they decided to skip the “o2” name “out of respect” for
O2, the British telecommunications company, but it certainly doesn’t
hurt that it makes the product sound more advanced.
The company says the new model comes with the ability to adjust its
reasoning time. Users can choose low, medium, or high reasoning time;
the greater the compute, the better o3 is supposed to perform. OpenAI
says it will spend time “red-teaming” the new model with researchers to
prevent it from producing potentially harmful responses (since again, it
is not a human and does not know right versus wrong).
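
Why might more reasoning time help? The article does not say how o3
spends its extra compute, so the following is a toy model only, and the
technique shown (taking a majority vote over several independent
attempts) is swapped in for illustration. The mapping of low, medium and
high to 1, 5 and 25 attempts and the 60% single-attempt accuracy are
assumptions made up for this sketch.

```python
from math import comb

# Hypothetical mapping from effort level to number of independent attempts.
ATTEMPTS = {"low": 1, "medium": 5, "high": 25}


def majority_accuracy(p_single: float, attempts: int) -> float:
    """Probability that a strict majority of independent attempts is correct.

    Each attempt is assumed to be right with probability p_single,
    independently of the others (a simplifying assumption).
    """
    need = attempts // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(attempts, k) * p_single**k * (1 - p_single) ** (attempts - k)
        for k in range(need, attempts + 1)
    )


if __name__ == "__main__":
    for level, n in ATTEMPTS.items():
        print(level, n, round(majority_accuracy(0.6, n), 3))
```

Under these assumptions the chance of a correct majority rises with the
number of attempts, which is one simple way extra compute can buy
accuracy.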
Reasoning is the buzzword of the day in the field of generative AI, as
industry insiders believe it is the next unlock necessary to improve the
performance of large language models. Simply adding more compute
eventually yields diminishing performance gains, so new techniques are
needed.
Google DeepMind recently unveiled its own reasoning model called Gemini
Deep Research, which can take 5-10 minutes to generate a report that
analyzes many sources across the web in order to come to its findings.
https://gizmodo.com/google-releases-faster-gemini-2-0-with-deep-research-2000537349
OpenAI is confident in o3 and offers impressive benchmarks: it says that
in Codeforces testing, which measures coding ability, o3 got a score
of 2727. For context, a score of 2400 would put an engineer in the 99th
percentile of programmers. It also scored 96.7% on the 2024 American
Invitational Mathematics Examination, missing just one question.
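
As a quick check on the arithmetic, "96.7%" and "missing just one
question" fit together if the score covers 30 questions. The sketch
below assumes the figure spans both 2024 AIME papers at 15 questions
each; the article does not state the question count, and
percent_correct is a name made up for this example.

```python
def percent_correct(total_questions: int, missed: int) -> float:
    """Percentage of questions answered correctly."""
    return 100.0 * (total_questions - missed) / total_questions


# Assumption: two AIME papers in 2024, 15 questions each.
QUESTIONS_PER_PAPER = 15
ASSUMED_PAPERS = 2
total = QUESTIONS_PER_PAPER * ASSUMED_PAPERS

print(f"{percent_correct(total, 1):.1f}%")                # 96.7%
print(f"{percent_correct(QUESTIONS_PER_PAPER, 1):.1f}%")  # 93.3%
```

One miss on a single 15-question paper would give 93.3%, so the reported
figure is consistent with 29 correct out of 30.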
We will have to see how the model holds up in real-world testing;
OpenAI’s recently released Sora still needs work. But optimists are
confident that the problem of accuracy is being solved. Still, tread
lightly when relying on AI models for important work where accuracy is
necessary.
AI model companies like OpenAI and Perplexity are in a race to become
the next Google, collecting the world’s knowledge and helping users make
sense of it all.
They even have search products now that are meant to more directly
replicate Google with access to real-time web results.
All of these players seem to leapfrog one another with every passing
day, however. The feeling is somewhat reminiscent of the late ’90s, when
there were a myriad of search engines to choose from: Google, Yahoo,
AltaVista, and Ask Jeeves, just to name a few, all hoovering up the
internet’s data and presenting it with just a different UX.
Most of them disappeared after one came along that was supremely better
than the rest—Google.
OpenAI clearly has a strong lead right now with hundreds of millions of
monthly active users and a partnership with Apple, but Google has
received a lot of plaudits recently for advancements in its Gemini models.
The Verge reports that Google will soon integrate Gemini more deeply
into its search interface.