LLM Council for Code Review: How Multiple AIs Debate and Judge Each Other (Claude, ChatGPT, Gemini)
9A-iJKGmt2I • 2025-12-03
Kind: captions Language: en

All right, let's talk about something that could totally change how we use AI. It's this idea of an AI council where, you know, a bunch of AIs team up to get you the best possible answer. It's all about many minds being better than one.

You've been there, right? You ask an LLM a question, you get this super confident answer, but there's still that little voice in your head going, "Hm, is that really the best I can get? Could another AI have done better?" And you know what? You're right to feel that way, because relying on just one AI is like asking a single expert for their take. No matter how brilliant they are, they're going to have their own biases, their own blind spots, a certain way of seeing things. And this is where the big difference comes in. A single LLM is a single point of failure: if it's wrong, it's wrong. But an LLM council, now you're talking. You're bringing in all these different perspectives, creating a whole system of checks and balances. The result: a much more solid, reliable answer.

Okay, so where did this whole council idea even come from? Like a lot of really cool stuff in tech, it started as a project from a pretty big name in the field: Andrej Karpathy, who's a huge deal in the AI space. He dropped a project on GitHub called llm-council, and the idea was brilliant in its simplicity. Instead of just firing off a question to one AI, what if you asked a whole group of them? And then, here's the kicker: what if you had them critique each other's answers?

So here's how his process basically works. First, you send the question to every AI in the council. Simple enough. Then, step two, each model gets to review and rank all the other answers. But, and this is super important, it's all done anonymously. That way, you avoid any kind of favoritism.
Finally, a chairman model takes all that feedback, all those rankings, and puts it together into one final, polished answer.

But you know what I love about this? The origin story. Karpathy himself said, and I quote, "This project was 99% vibe coded as a fun Saturday hack." How cool is that? It's a perfect reminder that sometimes the most game-changing ideas start out as just messing around with something fun. And of course, an idea that good wasn't going to stay a fun little hack for long. The community saw the potential right away and started running with it.

Which brings us to Reddit, specifically the Claude AI community. A developer, totally inspired by Karpathy's project, decided to build their own version for a really specific, really practical purpose. They called it Agent Council. What they did was create a specialized council of AIs, in this case Claude Code, ChatGPT, and Gemini, all focused on one thing: automated code review. So instead of one AI looking at the code, you've got a whole panel of experts giving their feedback. You can imagine how much more thorough that review is going to be.

Okay, so we've got a brilliant idea from a top researcher and a super practical tool built by the community. But does this thing actually hold up under serious academic testing? Well, researchers couldn't resist finding out. In a paper that popped up on arXiv, researchers gave this a formal name: the Language Model Council, or LMC for short. They defined it as a system where a bunch of LLMs work together democratically, not just to answer questions but to create the tests and evaluate each other's performance. It's a whole self-contained system. And the big problem they were trying to solve is something called intra-model bias. Basically, what that means is that if you use one super powerful model like GPT-4o to judge other AIs, it tends to like answers that sound, well, like itself.
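The three-step loop described above (independent answers, anonymous peer ranking, chairman synthesis) can be sketched in code. To be clear, this is a minimal illustration, not Karpathy's actual implementation: `make_stub_model`, the position-sum scoring, and the "promote the top answer" chairman are all made up for the demo, and a real version would call the Claude, OpenAI, and Gemini APIs instead of stubs.

```python
import random

# Hypothetical stand-in for a real model client (Claude, ChatGPT, Gemini, ...).
# A real implementation would call each provider's API here.
def make_stub_model(name):
    def answer(question):
        return f"{name}'s answer to: {question}"

    def rank(anonymous_answers):
        # A real model would judge quality; this stub ranks by text length
        # purely so the demo runs deterministically.
        return sorted(range(len(anonymous_answers)),
                      key=lambda i: len(anonymous_answers[i]))

    return answer, rank

def council(question, member_names):
    members = {name: make_stub_model(name) for name in member_names}

    # Step 1: every council member answers the question independently.
    answers = {name: fns[0](question) for name, fns in members.items()}

    # Step 2: shuffle and strip authorship, then let each member rank the
    # anonymized answers so no model can favor a known author.
    anon = list(answers.items())
    random.shuffle(anon)
    texts = [text for _, text in anon]
    scores = [0] * len(texts)
    for _, (_, rank) in members.items():
        for position, idx in enumerate(rank(texts)):
            scores[idx] += position  # lower total = ranked better overall

    # Step 3: a "chairman" synthesizes the result; here it simply promotes
    # the top-ranked answer and credits its author.
    best = min(range(len(scores)), key=scores.__getitem__)
    author, text = anon[best]
    return f"[chairman] Best-ranked answer (from {author}): {text}"

print(council("Is a council better than one model?",
              ["claude", "gpt", "gemini"]))
```

The anonymization in step 2 is the part the transcript calls "super important": because each ranker sees only shuffled, unattributed texts, a model cannot systematically upvote its own (or a friend's) style.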
It has a built-in preference for its own style, which isn't always fair, or even the most accurate way to judge. And what the research showed is that this council approach really shines when things get subjective. Think about tasks like judging creative writing, or figuring out emotional intelligence, or how persuasive a piece of text is. In those cases, you absolutely need a diversity of opinions.

And listen, they really went for it. This wasn't just a test with two or three models. To really prove their point, their big case study on emotional intelligence used a council of 20 different large language models. That is a seriously robust test. So, the big moment: they compared the council's judgments against what actual humans thought. And the result was crystal clear. The rankings that came from the Language Model Council were way more in line with human evaluations than the rankings from any single AI judge. It just works better.

All right, let's tie this all together. Why does this whole collaboration thing work so well? It really comes down to a few key things. First, the council approach cuts down on the bias you'd get from any one model. It also gives you much clearer, more distinct rankings of what's good and what's not. And maybe most importantly, like we just saw, the final result is just more consistent with what we humans think is right.

At the end of the day, it's all about creating that system of checks and balances. No single AI gets to be the dictator. By forcing them to debate, review, and come to a consensus, the council effectively filters out the individual weaknesses and gives you a final answer that's way more reliable, more nuanced, and frankly more trustworthy. And this all leads to a really big, fascinating question to think about. We often picture the future of AI as a race to build one single, all-knowing supermind. But what if that's completely the wrong way to look at it?
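The headline result, that council rankings agree with humans more than any single judge's do, comes down to rank correlation. Here's a small, self-contained sketch using Kendall's tau; the rankings below are made-up illustrations, not numbers from the paper.

```python
def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings of the same items.

    Each ranking maps item -> position (0 = best). Returns a value in
    [-1, 1]: 1 means identical order, -1 means fully reversed.
    """
    items = list(rank_a)
    n = len(items)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = rank_a[items[i]] - rank_a[items[j]]
            b = rank_b[items[i]] - rank_b[items[j]]
            if a * b > 0:
                concordant += 1   # pair ordered the same way in both
            elif a * b < 0:
                discordant += 1   # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy rankings over four hypothetical models m1..m4:
human   = {"m1": 0, "m2": 1, "m3": 2, "m4": 3}
council = {"m1": 0, "m2": 1, "m3": 3, "m4": 2}  # one swapped pair
single  = {"m1": 3, "m2": 0, "m3": 1, "m4": 2}  # judge demotes a rival

print(kendall_tau(human, council))  # closer to 1.0: closer to human judgment
print(kendall_tau(human, single))   # lower: less agreement with humans
```

This is exactly the kind of comparison a "which judge matches humans better" evaluation needs: a higher correlation with the human ranking means the judging scheme, council or solo, is making calls people would agree with.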
What if the future isn't about a single super intelligence, but about collaboration? What if the real breakthrough is teaching AIs how to work together, to argue, to debate, and to build on each other's ideas? You know, kind of like we do.