jyo's blog

Building PokéAgents: When AI Agents Paint Together (Part 2)

pokeagent-painting

In Part 1, we saw how personality changed the conversation. Now, let’s see what happens when we ask them to stop talking and start painting.

Chatting is easy. You can ramble, you can hedge, you can politely disagree. But art is unforgiving. You either put the pixel in the right spot, or you don't.

After seeing my agents argue about gym designs in text, I wanted to raise the stakes. I wanted to see if they could coordinate spatially. Could four distinct personalities, unable to speak to one another, work together to paint a single coherent image?

I built a new "Paint Mode" to find out. Here is what happened.

The Setup: Digital Lego

I didn't want them to generate a full image like Midjourney. That’s too easy. I wanted them to build it, brick by brick.

I set up a blank 16x16 grid where every cell started as white (#FFFFFF). I gave the team a single, simple objective: "Paint a Tree".

Then I imposed some strict rules in the code:

The Silence: They cannot talk to each other. No strategizing in chat.

The View: They can see the entire grid — which pixels are filled and what color they are — but they don't know who painted what.

The Move: On their turn, they can change exactly 5 pixels. That's it.
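To make the rules concrete, here's a tiny sketch of the canvas and move format. The class and constant names are my own illustration, not the actual PokéAgents code; I'm only assuming a move is a list of (row, col, color) tuples.

```python
from dataclasses import dataclass, field

WHITE = "#FFFFFF"
GRID_SIZE = 16
PIXELS_PER_TURN = 5

@dataclass
class Canvas:
    """A 16x16 grid of hex colors. Agents see the full state,
    but the grid stores no authorship — only colors."""
    cells: list = field(default_factory=lambda: [
        [WHITE] * GRID_SIZE for _ in range(GRID_SIZE)
    ])

    def apply_move(self, pixels):
        """A move is a list of (row, col, hex_color) tuples, capped at 5."""
        assert len(pixels) <= PIXELS_PER_TURN, "only 5 pixels per turn"
        for row, col, color in pixels:
            self.cells[row][col] = color

canvas = Canvas()
canvas.apply_move([(2, 7, "#228B22"), (2, 8, "#228B22")])
```

Note that because the grid never records who painted a cell, anonymity falls out of the data model for free.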

The Experiment: The Silent Tree

I hit start.

Turn 1: Bulbasaur went first. He ignored the ground completely. Being the nature-loving type, he went straight for the foliage. Action: Painted 5 green pixels floating in the upper-middle of the grid. Reasoning: "A tree is defined by its leaves. Starting with the canopy."

Turn 2: Alakazam. This is where it gets interesting. Alakazam logged in and saw a random cluster of green pixels floating in the void. A generic AI might have just added more green nearby. But Alakazam is the "Analyst" archetype. He looked at the coordinates of the green cluster, calculated the center point, and dropped 5 brown pixels at the very bottom of the grid. Reasoning: "Leaves require support. Constructing the trunk foundation to align with the canopy."
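Alakazam's inference is simple enough to write down. This is my reconstruction of the geometry, not the agent's actual reasoning: find every canopy-colored pixel, average their column indices to get the center, and stack trunk pixels upward from the bottom row under that center. The color constants are illustrative.

```python
def plan_trunk(cells, canopy_color="#228B22", trunk_color="#8B4513"):
    """Infer where a trunk belongs from the canopy's geometry alone —
    no authorship metadata and no chat required."""
    # Columns of every canopy-colored pixel on the grid.
    cols = [c for row in cells for c, color in enumerate(row)
            if color == canopy_color]
    center = round(sum(cols) / len(cols))
    bottom = len(cells) - 1
    # Five trunk pixels, stacked upward from the bottom row.
    return [(bottom - i, center, trunk_color) for i in range(5)]
```

On Bulbasaur's floating canopy, this returns a column of brown cells directly beneath the cluster's center of mass, which matches what Alakazam did.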

Turn 3: Squirtle. Squirtle looked at the green top and the brown bottom. The tree was technically "there," but the scene was empty. He didn't touch the tree. Instead, he painted 5 light blue pixels in the top corner. Reasoning: "The subject is clear, but it lacks context. Adding sky."

Turn 4: Charmander. I was worried Charmander would just burn the thing down. Instead, he placed 3 orange pixels near the blue sky and 2 red ones inside Bulbasaur's green leaves. Reasoning: "It's too boring! Adding a sun and some fruit."

The "Telepathy" of Pixels

The coolest part wasn't the drawing; it was the inference.

When Alakazam saw those green pixels, he didn't need a text message from Bulbasaur saying "I'm making the top." He inferred the intent from the geometry. He realized that for those green pixels to be "leaves," there had to be a trunk beneath them.

In biology, this is called stigmergy: it's how termites build massive cathedrals without a blueprint. They respond to the state of the structure itself. My agents were doing the same thing. They were reading each other's minds through hex codes.

When It Goes Wrong

Of course, it wasn't always harmonious. Personality clashes look different in pixel art.

In a later run, I asked them to paint an "Ocean Scene."

Squirtle spent his turns meticulously filling the bottom rows with deep blue.

Pikachu, trying to be high-energy, kept dropping bright yellow pixels right in the middle of the water.

Alakazam eventually got frustrated. On his turn, he actually overwrote Pikachu’s yellow pixels with dark blue.

Alakazam's Reasoning: "The yellow creates too much contrast and breaks the pattern. Correcting."

It was a visual argument. Pikachu wanted chaos; Alakazam wanted order.
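Alakazam's "correction" behavior can be sketched as an outlier filter. This is my guess at the rule he converged on, not anything in his prompt: flag any pixel whose four neighbors all share one color different from its own, and overwrite it with that surrounding color, up to the 5-pixel budget.

```python
def correction_move(cells, limit=5):
    """Find up to `limit` color outliers: pixels whose four in-bounds
    neighbors all agree on a color the pixel itself lacks. Returns
    (row, col, color) overwrites in the shared move format."""
    n = len(cells)
    fixes = []
    for r in range(n):
        for c in range(n):
            neighbors = [cells[nr][nc]
                         for nr, nc in ((r - 1, c), (r + 1, c),
                                        (r, c - 1), (r, c + 1))
                         if 0 <= nr < n and 0 <= nc < n]
            # An outlier: every neighbor agrees, and this pixel disagrees.
            if len(set(neighbors)) == 1 and neighbors[0] != cells[r][c]:
                fixes.append((r, c, neighbors[0]))
                if len(fixes) == limit:
                    return fixes
    return fixes
```

Run on Squirtle's deep-blue water with a lone yellow Pikachu pixel in the middle, this proposes exactly the overwrite Alakazam made.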

Why This Matters

We're used to AI that generates everything in one shot. You type a prompt, you get a result.

But this experiment showed me that the future of AI might not be about generation, but collaboration. The "final image" here was just a blocky, low-res tree. But the process was startlingly human. It involved anticipation, specialization, and even non-verbal corrections.

If we can get AI agents to understand "intent" just by looking at the partial work of a teammate — whether that's a half-written codebase or a half-painted grid — we're one step closer to agents that actually work with us, rather than just for us.