Transcript
9A-iJKGmt2I • LLM Council for Code Review: How Multiple AIs Debate and Judge Each Other (Claude, ChatGPT, Gemini)
Kind: captions
Language: en
All right, let's talk about something
that could totally change how we use AI.
It's this idea of an AI council where,
you know, a bunch of AIs team up to get
you the best possible answer. It's all
about many minds being better than one.
You've been there, right? You ask an LLM
a question, you get this super confident
answer, but there's still that little
voice in your head going, "Hm, is that
really the best I can get? Could another
AI have done better?" And you know what?
You're right to feel that way because
relying on just one AI is like asking a
single expert for their take. I mean, no
matter how brilliant they are, they're
going to have their own biases, their
own blind spots, a certain way of seeing
things. And this is where the big
difference comes in. A single LLM, well,
that's a single point of failure, isn't
it? If it's wrong, it's wrong. But an
LLM council, now you're talking. You're
bringing in all these different
perspectives, creating a whole system of
checks and balances. The result, a much
more solid, reliable answer. Okay, so
where did this whole council idea even
come from? Well, like a lot of really
cool stuff in tech, it started as a
project from a pretty big name in the
field. We're talking about Andrej
Karpathy, who's a huge deal in the AI
space. He dropped this project on GitHub
called llm-council, and the idea was
just brilliant in its simplicity.
Instead of just firing off a question to
one AI, what if you asked a whole group
of them? And then here's the kicker.
What if you had them critique each
other's answers? So, here's how his
process basically works. First, you send
the question to every AI in the council.
Simple enough. Then, step two, each
model gets to review and rank all the
other answers. But, and this is super
important, it's all done anonymously.
That way, you avoid any kind of
favoritism. Finally, a chairman model
takes all that feedback, all those
rankings, and puts it all together into
one final polished answer. But you know
what I love about this? The origin
story. Karpathy himself said, and I
quote, "This project was 99% vibe coded
as a fun Saturday hack." How cool is
that? It's just a perfect reminder that
sometimes the most game-changing ideas
start out as just messing around with
something fun. And of course, an idea
that good wasn't just going to stay a
fun little hack for long. The community
saw the potential right away and just
started running with it. Which brings us
to Reddit, specifically the Claude AI
community. A developer totally inspired
by Karpathy's project decided to build
their own version for a really specific,
really practical purpose. They called it
agent council. And what they did was
create a specialized council of AIs.
We're talking Claude Code, ChatGPT, and
Gemini all focused on one thing,
automated code review. So instead of one
AI looking at the code, you've got this
whole panel of experts giving their
feedback. You can imagine how much more
thorough that review is going to be.
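The three-step flow described above — ask everyone, rank anonymously, synthesize — can be sketched in a few lines of Python. This is a minimal illustration, not Karpathy's actual code: `ask_model` here is a hypothetical placeholder for whatever real LLM API call you'd plug in.

```python
import random

def council_answer(question, models, chairman, ask_model):
    """Three-step council flow: answer, anonymous peer review, synthesis.

    `ask_model(model, prompt)` is a stand-in for a real LLM API call;
    here it can be any callable that returns a string.
    """
    # Step 1: every council member answers the question independently.
    answers = {m: ask_model(m, question) for m in models}

    # Step 2: shuffle the answers and strip author names, so each model
    # reviews anonymously and can't play favorites (including with itself).
    items = list(answers.items())
    random.shuffle(items)
    anonymous = {f"Response {i + 1}": text
                 for i, (_, text) in enumerate(items)}

    reviews = {}
    for m in models:
        prompt = "Rank these anonymous answers:\n" + "\n".join(
            f"{label}: {text}" for label, text in anonymous.items())
        reviews[m] = ask_model(m, prompt)

    # Step 3: a chairman model folds the answers and rankings into one reply.
    summary = "\n".join(f"{m} says: {r}" for m, r in reviews.items())
    return ask_model(
        chairman,
        f"Question: {question}\n"
        f"Answers and rankings:\n{summary}\n"
        "Write one final, polished answer.")
```

With real API calls behind `ask_model`, that's roughly the shape of the pipeline; the point is that the orchestration itself is tiny — the value comes from the anonymized cross-review in step 2.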
Okay, so we've got a brilliant idea from
a top researcher and a super practical
tool built by the community. But does
this thing actually hold up under some
serious academic testing? Well, you
know, researchers couldn't resist
finding out. So in a paper that popped
up on arXiv, researchers gave this a
formal name, the language model council
or LMC for short. They defined it as a
system where a bunch of LLMs actually
work together democratically to not just
answer questions but to create the tests
and evaluate each other's performance.
It's a whole self-contained system. And
the big problem they were trying to
solve is something called intra-model
bias. Basically, what that means is if
you use one super powerful model like
GPT-4o to judge other AIs, it tends to
like answers that sound, well, like
itself. It has a built-in preference for
its own style, which isn't always fair
or even the most accurate way to judge.
And what the research showed is that
this council approach really, really
shines when things get subjective. Think
about tasks like judging creative
writing or figuring out emotional
intelligence or how persuasive a piece
of text is. In those cases, you
absolutely need a diversity of opinions.
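One simple way to turn that diversity of opinions into a single verdict is plain rank aggregation: average each answer's position across every judge's ranking. To be clear, this is a generic illustration of the idea, not the specific aggregation scheme from the LMC paper.

```python
def aggregate_rankings(rankings):
    """Average each candidate's rank across judges (lower = better).

    `rankings` is a list of ranked lists, one per judge, best first.
    This is simple mean-rank aggregation, one of many possible schemes.
    """
    positions = {}
    for ranked in rankings:
        for pos, candidate in enumerate(ranked):
            positions.setdefault(candidate, []).append(pos)
    mean_rank = {c: sum(p) / len(p) for c, p in positions.items()}
    # Sort candidates by average position: consensus winner first.
    return sorted(mean_rank, key=mean_rank.get)

# Three judges rank three answers; B wins on average, even though
# no single judge gets to be the final authority.
judges = [["B", "A", "C"], ["A", "B", "C"], ["B", "C", "A"]]
print(aggregate_rankings(judges))  # ['B', 'A', 'C']
```

Notice that judge two preferred A, but the consensus still lands on B — that's the checks-and-balances effect in miniature: one judge's quirk gets averaged out.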
And listen, they really went for it.
This wasn't just a test with two or
three models to really prove their
point. Their big case study on emotional
intelligence used a council of 20
different large language models. I mean,
that is a seriously robust test. So, the
big moment, they compared the council's
judgments against what actual humans
thought. And the result, it was crystal
clear. The rankings that came from the
language model council were way more in
line with human evaluations than the
rankings from any single AI judge. It
just works better. All right, let's tie
this all together. Why does this whole
collaboration thing work so well? Well,
it really comes down to a few key
things. First, the council approach cuts
down on the bias you'd get from any one
model. It also gives you much clearer,
more distinct rankings of what's good
and what's not. And maybe most
importantly, like we just saw, the final
result is just more consistent with what
we humans think is right. At the end of
the day, it's all about creating that
system of checks and balances. No single
AI gets to be the dictator. By forcing
them to debate and review and come to a
consensus, the council effectively
filters out the individual weaknesses
and gives you a final answer that's way
more reliable, more nuanced, and frankly
more trustworthy. And this all leads to
a really big fascinating question to
think about. We often picture the future
of AI as this race to build one single
all-knowing supermind. But what if
that's completely the wrong way to look
at it? What if the future isn't about a
single super intelligence, but about
collaboration? What if the real
breakthrough is teaching AIs how to work
together, to argue, to debate, and to
build on each other's ideas? You know,
kind of like we do.