Mastering AI Agents with Hugging Face Smolagents: Build Your Own Multi-Tool Chatbot
7XipjLJEGP4 • 2026-01-12
Have you ever wanted to build your own AI assistant? I'm not talking about just a chatbot, but something that can actually do things: use tools, browse the internet, generate images, and wrap it all up in a slick user interface. Well, if that sounds exciting, you are in exactly the right place. Today, we're going to tear down the mystery behind all of that. We're doing a deep dive into the smolagents library from Hugging Face, and you're about to see just how easy and incredibly powerful building your own AI agent can be.

So, what's on the docket for today? Here's our road map. We'll start by defining our quest. Then we'll set up our digital workshop, getting all the tools we need. After that, the real fun starts: we're going to bring our agent to life, teach it some frankly amazing new skills, and then build it a sleek control panel. By the end of this, you'll be more than ready to kick off your very own agent-building adventure.

All right, let's jump right into part one: our quest. The whole world of AI agents can feel super intimidating, but our goal today is to cut right through that noise. We're going to make this whole process not just something you can understand, but something you can actually do. And let's be real, it's so easy to feel that way. You hear all these buzzwords flying around (LangChain, LangGraph, agentic loops), you see these super complicated diagrams, and you think, "Nope, this is way beyond me." Well, today we are flipping that script completely. We're going to replace that overwhelming feeling with a genuine sense of "I can do this" by showing you exactly what's going on under the hood, but in a way that's so much easier to get your head around.

Now, this isn't just some hypothetical. This is our actual goal today: we are going to build the exact application you see on screen.
A chatbot that can think for itself, figure out a plan, use a whole bunch of different tools to solve a problem, and then give you the answer. For instance, look at this prompt: generate an image of the chancellor of Germany from 2010 playing the flute. Now, think about that. The agent first has to figure out who the chancellor even was back in 2010, and then it has to use a different tool to create the image. That kind of sophisticated multi-step thinking is exactly what we're going to build together by the end of this explainer.

Okay, time to roll up our sleeves. Any great project, whether it's a painting or a piece of code, starts with getting your workshop set up correctly. So let's get our digital tools and materials in order. The foundation here is actually way simpler than you might think. It all kicks off with one single line in your terminal: pip install smolagents. That's it. That one command installs the entire minimalist framework we're about to use.

Next up, we need our API keys. Think of these like the keys to a supercar's engine: they're what give our code access to the powerful AI models that will act as our agent's brain. Our primary key is going to be a Hugging Face token. This thing unlocks a massive universe of really powerful open-source models, and getting one is super easy: just go to your Hugging Face profile, find Access Tokens, and create a new one. Now, we're also going to grab keys for OpenAI and Anthropic. We're not doing this because we absolutely need them right now, but to show you just how flexible this framework is. You'll see in a bit just how easy it is to swap different brains in and out of our agent.

Okay, our workshop is set up, the tools are laid out. Now for the really exciting part, the moment we've been building towards: we're going to put together the core pieces of our agent and see it think for the very first time. First up, we have the tool-calling agent.
The best way to think about this is as the project manager, or maybe the conductor of an orchestra. When you give it a task, this is the part that looks at the problem, checks out all the tools it has available, and decides: okay, I need this tool, I need to give it this information, and then I'll do this with the result. It's the absolute maestro of the entire operation.

Now, if the agent is the manager, the inference client model is the connection to the actual brain, the large language model (LLM) that does all the reasoning. We're going to start with an open-source model from Hugging Face called Kimi K2 Instruct. You can actually find models like this yourself: just go to the Models tab on Hugging Face, filter for text generation with inference available, and see what pops up. And here's the best part: not only are these models incredibly powerful, but as we're about to see, they are ridiculously cost-effective compared to some of the big proprietary names out there.

And this is where the magic of AI agents really starts to click. Our third component is the web search tool. It comes pre-built with smolagents, and with this one little addition, our agent is no longer trapped by its training data. It's not stuck in the past. It can now tap into the entire internet for real-time, up-to-the-second information. This is literally our agent's window to the world.

So here's the beautiful part: putting all these pieces together is simple. We initialize the model, telling it which one we want to use. Then we initialize our tool-calling agent: we give it a list of its tools, right now just the web search, and we tell it which model to use for its brain. And that's it. Seriously, in just a handful of lines of code, we've built a thinking, reasoning AI agent that is connected to the internet.

Okay, let's take it for a spin. We're going to ask it something it could never know from its training data. Now watch this.
The agent actually logs its entire thought process for us. First, it sees the query and realizes: hey, I don't know this. So it logs a tool call: it decides to use the web search tool to find the answer. Then it gets an observation back, the result from that web search. And now, armed with that new piece of information, it puts it all together to formulate the final answer. This, right here, is that famous agentic loop in action. And smolagents just lays it all out for us, clear as day.

So, our agent can think and it can search the web. That's already pretty cool. But the real magic, the thing that makes agents so special, starts right now. We're going to teach it brand new custom skills. This is where our agent goes from being a generic tool to becoming our specialized assistant.

Now, just take a second to appreciate how elegant this is. If you've ever tried to do this from scratch, you know the pain: you have to write your function, and then you have to manually create a complex JSON schema just to describe what your function does to the AI. It's awful. But with smolagents, all of that just disappears. All you do is write a normal Python function and add one little line, @tool, right above it. That's it. The library does all the complicated background work to make it understandable for the AI. It's incredible.

Okay, this is absolutely critical, so listen up. When you make a custom tool, the single most important part is not your code. It's the docstring, that little text description right underneath the function name. The AI, the LLM brain, never ever looks at your Python code. It only reads this description. You have to think of this as the instruction manual you're writing for the AI: you're telling it exactly what this tool is for and when it should use it. A clear, well-written docstring is the secret to a tool that works every time.

So, let's talk about a few best practices here. First, always use type hints.
Saying that an input like prompt is a string (str) just helps the agent structure its requests properly. Second, and I can't say this enough, describe your tool thoroughly. Be super explicit. And here's a little pro tip for you: if you think you might ever share your tool on the Hugging Face Hub for others to use, put your import statements inside the function itself. That makes it totally self-contained and portable.

And this is where it all comes together. We're going to give our agent a brand new custom tool, a function called generate_image. Now, let's go back to that really complex query from the beginning and watch the agent's logic. It knows it can't make the image right away because it's missing a key piece of information. So what's its first move? It uses the web search tool to find out who the chancellor was. As soon as it gets that answer, Angela Merkel, it immediately triggers a second tool call to our image generator, feeding the output from the first tool directly into the second. This ability to chain tools together is what separates a simple chatbot from a true AI agent.

But what if you don't even want to write the tool yourself? Well, this is where things get absolutely wild. Hugging Face Spaces has thousands of public AI apps that people have built for all sorts of things. And with one command, Tool.from_space, you can point your agent at any one of those public apps and add it to its toolbox. You just give it the Space ID and a quick description, and your agent can now use that app as if it were its own built-in skill. It is a massive, game-changing shortcut.

So, our agent is now a multi-talented powerhouse. It can think, it can search, it can perform custom tasks. Now it's time for the final upgrades: we're going to look at swapping out its brain, and then we're going to give it a professional-grade control panel, a real user interface. Okay, let's talk brains.
We started with that open model, Kimi K2, which costs about two bucks for every million tokens of output, but smolagents makes it so easy to switch. You can import the OpenAI model class and pop in a model like GPT-4o, which for that same million tokens might run you closer to $14. You can even use that exact same class to connect to other providers like Anthropic, just by changing the base URL. This flexibility is key: it lets you pick the right balance of power and price for whatever you're building.

I mean, this chart really puts it in perspective. The blue bar is our open model at $2. The red one is the proprietary model at $14. That's a seven-times price difference. When you're just starting out, developing and experimenting and running tons of tests, that cost difference is an absolute game-changer. Starting with these powerful but super affordable open models is just a smarter way to build.

And now for the grand finale. This might be the most magical part of the whole thing. We've got this amazing agent running in our code, but how do we let other people use it? How do we share it? Well, it's almost laughably simple. You import GradioUI from the library, you call it with your agent, and then you just launch it. That is it. Two lines of code, and that command instantly spins up a complete, professional, sharable web application for your agent. It even gives you a public URL. It's just wow.

We have covered so much ground today. We went from just an idea all the way to a fully functional multi-tool AI agent with its own web app. So let's just take a quick second to recap the powerful new skills you've just picked up and talk about what's next for you on your own agent-building adventure. I mean, just look at what you can do now. You know how to set up the entire environment. You can initialize an agent with a brain and tools. You can build your own custom tools from scratch. You can pull in thousands of other tools from the Hugging Face Hub.
And you can deploy the whole thing as a web app for anyone to use. You've now seen the entire life cycle of creating an AI agent, from the first line of code to the final product. But really, this isn't the end. It's the starting line. The real question now is: what are you going to build with these new skills? Are you going to make an agent that automates your email, or one that interacts with your favorite APIs to get work done? Maybe a creative partner to help you write or code. The possibilities are, and I mean this literally, endless.

I really hope you enjoyed this deep dive into smolagents. It is such a fantastic, simple way to really understand how agents work under the hood. We're going to be exploring more advanced frameworks like OpenAI's Assistants and LangChain in future explainers to solve even bigger problems. So, if you want to continue on this journey with us, make sure you subscribe. Thanks so much for watching, and I can't wait to see you in the next one.