In the last year, I’ve shipped six Figma plugins (one with 37k+ users), a restaurant-matching app (Twibble), and Pewbeam (an AI agent that projects scriptures to the screen as a pastor preaches, in under 2 seconds). I mention that up front only so you know the practical ideas ahead come from shipping real products. This article lays out the exact loop I use daily: pitfalls, patterns, prompts, and concrete examples you can adapt.


What “Coding with AI” Is (for me)

I’m not sold on the word vibecoding. If you’ve actually tried coding with AI, you know there’s nothing “vibe” about it; it’s disciplined, iterative work. You’re at a disadvantage if you don’t know how to code and you don’t know how to prompt. It’s a misconception that you can toss a few words at a model and get clean, production-ready code. Maybe one day, but not yet.

As you read, I’ll use vibecoding and coding with AI interchangeably. Coding with AI is about mastering prompting, understanding the basics of code (bonus if you know HTML, CSS, and JS), and having the patience to face countless “I can see the issue” moments without getting frustrated. For me, AI is like a genius junior engineer: it knows a lot but lacks experience, and it only excels when you guide it clearly.

What actually happens inside an AI model

Understanding how models work helps you use them better:

  1. Pre-training. LLMs learn to predict the next token from huge text corpora. They don't memorize every sentence; they adjust billions of parameters to capture patterns.
  2. Architecture: Transformers. With self‑attention each token can "look at" others to capture long‑range links (like variable definitions ↔ usage). Positional encoding adds order, and multi‑head attention spots different patterns in parallel.
  3. Instruction tuning and RLHF. The raw model is fine‑tuned on instruction–response pairs and aligned with human preferences so it follows directions better and avoids bad behavior.
  4. Inference. Your prompt is tokenized, run through the network, and the next token is sampled from a probability distribution; temperature and top‑p control how adventurous that sampling gets (see the sketch after this list).
  5. Tool use and retrieval. RAG fetches external docs and feeds them in. MCP is an open standard that lets AI apps call tools and databases safely.
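
To make step 4 less abstract, here’s a toy sketch of temperature and top‑p (nucleus) sampling over a hand-made next-token distribution. The tokens and scores are invented for illustration; real models do this over a vocabulary of tens of thousands of tokens.

```typescript
// Toy next-token distribution; real models produce one over ~100k tokens.
type TokenProb = { token: string; p: number };

const logitsToProbs = (logits: number[], temperature: number): number[] => {
  // Temperature rescales logits before softmax: low T sharpens, high T flattens.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max)); // subtract max for numerical stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
};

const sampleTopP = (candidates: TokenProb[], topP: number): string => {
  // Keep the smallest set of tokens whose cumulative probability reaches topP,
  // then sample proportionally from that set.
  const sorted = [...candidates].sort((a, b) => b.p - a.p);
  const kept: TokenProb[] = [];
  let cumulative = 0;
  for (const c of sorted) {
    kept.push(c);
    cumulative += c.p;
    if (cumulative >= topP) break;
  }
  const total = kept.reduce((a, c) => a + c.p, 0);
  let r = Math.random() * total;
  for (const c of kept) {
    r -= c.p;
    if (r <= 0) return c.token;
  }
  return kept[kept.length - 1].token;
};

// Example: the model has just seen "const user =" and scores four continuations.
const tokens = ["await", "new", "{", "fetch"];
const logits = [2.1, 1.3, 0.4, 0.2]; // made-up scores
const probs = logitsToProbs(logits, 0.7); // temperature 0.7
const next = sampleTopP(tokens.map((t, i) => ({ token: t, p: probs[i] })), 0.9);
console.log(next); // one of the four tokens, weighted toward "await"
```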

The biggest pitfalls I see (and how to fix them)

a) Hallucinations. Models can sound confident and still be wrong. Think autocomplete without ground truth. This is a big part of why the Model Context Protocol exists: to give models structured access to the context and tools they need. Your job is to be context‑specific and define boundaries so the model knows how far to go.

Practical prompting plays you can drop into your loop:

  1. Ask for sources or evidence.
    Weak: “Explain why my Next.js build is slow.”
    Strong: “Explain likely causes for a slow Next.js build and cite the specific config files or code paths you’re inferring from (e.g., next.config.mjs, babel.config.js, pages/_app.tsx).”
  2. Set clear boundaries.
    “Refactor this React hook to remove race conditions. If you lack enough code context to guarantee correctness, say exactly what’s missing and stop.”
  3. Break down the task.
    Step 1: “List three hypotheses for why this PostgreSQL query is slow based on the provided schema.”
    Step 2: "For hypothesis 1, propose an index and show the exact CREATE INDEX statement. If a needed column is missing, say so."
  4. Use MCP or a retrieval step to provide ground‑truth docs (Figma spec, API schema, DB schema).
  5. Constrain the output format.
    "Return the migration plan as valid JSON that matches the schema I provide." (A validation sketch follows this list.)
  6. Require tests or proofs.
    "Propose a refactor and include pytest tests that reproduce the bug. If you can't reproduce it from the snippet, list the missing fixture."
  7. Make the model verify itself.
    “Generate a SQL query, then restate the schema, the join keys, and why the query cannot duplicate rows. Flag any unverified assumptions.”
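
One cheap way to back up play #5 (and #7) on your side of the wire is to validate whatever comes back against a schema before you act on it. Here’s a minimal sketch using zod; the migration-plan shape is hypothetical, so swap in whatever fields you actually asked for.

```typescript
import { z } from "zod";

// Hypothetical shape for the "migration plan as valid JSON" prompt above.
const MigrationPlanSchema = z.object({
  summary: z.string(),
  steps: z.array(
    z.object({
      order: z.number().int().positive(),
      sql: z.string(),                  // e.g. a CREATE INDEX statement
      reversible: z.boolean(),
      assumptions: z.array(z.string()), // anything the model couldn't verify
    })
  ),
});

type MigrationPlan = z.infer<typeof MigrationPlanSchema>;

export function parseModelReply(raw: string): MigrationPlan {
  // Models sometimes wrap JSON in prose or code fences; extract the outermost object first.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end === -1) throw new Error("No JSON object found in model reply");
  const result = MigrationPlanSchema.safeParse(JSON.parse(raw.slice(start, end + 1)));
  if (!result.success) {
    // Feed the validation errors straight back into the next prompt.
    throw new Error(`Model output failed validation: ${result.error.message}`);
  }
  return result.data;
}
```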

b) Token limits. Context windows vary. If your input is too long, parts get dropped or summarized, and the model starts guessing. It also affects cost because pricing is per token. More words ≠ better results. Keep prompts tight and scoped.

Tactics that help:

  1. Chunk your code. Don’t paste a whole repo or ask for all features at once. Start with “Here’s auth.ts. Later I’ll share db.ts. First, fix auth.” (A rough chunking helper is sketched after this list.)
  2. Iterative prompting. First: “List likely bugs in this hook.” Then: “Fix bug #2 only.” Smaller steps, fewer wasted tokens.
  3. Externalize context. Store big schemas and docs elsewhere and inject only what’s needed.
  4. Prompt compression. Draft long, then compress it with your chat model: “Summarize this into a prompt for a coding AI without losing context; keep my tone.”
  5. Model choice. Some models offer larger context windows; switch when you need more capacity.
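
For a rough sense of how close you are to a context window before you paste, a crude character-based estimate is usually enough. Roughly 4 characters per token holds for English and typical code, but the real count depends on the model’s tokenizer, so treat this sketch as an estimate only.

```typescript
// Rough heuristic: ~4 characters per token for English text and typical code.
// Real counts depend on the model's tokenizer; treat this as an estimate only.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Split a large file into chunks that fit a per-message budget,
// breaking on blank lines so functions stay mostly intact.
function chunkForPrompt(source: string, maxTokensPerChunk: number): string[] {
  const blocks = source.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = "";
  for (const block of blocks) {
    const candidate = current ? `${current}\n\n${block}` : block;
    if (estimateTokens(candidate) > maxTokensPerChunk && current) {
      chunks.push(current);
      current = block;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Usage: send auth.ts in pieces instead of the whole repo.
// import { readFileSync } from "node:fs";
// const pieces = chunkForPrompt(readFileSync("auth.ts", "utf8"), 2000);
```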

c) Security. Models are much better about this now, but don't get lax. Keep secrets in .env. Before you deploy, run a quick security pass: check for logs that leak sensitive data, tighten CORS, and sanitize inputs. Also use linters and type checks. For JS/TS, start with ESLint and typescript‑eslint.
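
Here’s what that pre-deploy pass can look like in practice, as a minimal sketch assuming an Express backend: secrets come from .env, CORS is restricted to known origins instead of a wildcard, and request bodies are validated before use. The route and origin below are placeholders; adapt to your own stack.

```typescript
import "dotenv/config";          // loads .env into process.env; never commit .env
import express from "express";
import cors from "cors";
import { z } from "zod";

const app = express();
app.use(express.json({ limit: "100kb" })); // cap request body size

// Tighten CORS: allow only the origins you actually serve, not "*".
app.use(
  cors({
    origin: ["https://app.example.com"], // placeholder origin
    methods: ["GET", "POST"],
  })
);

// Validate and sanitize inputs instead of trusting req.body.
const FavoriteSchema = z.object({ restaurantId: z.string().min(1).max(64) });

app.post("/favorites", (req, res) => {
  const parsed = FavoriteSchema.safeParse(req.body);
  if (!parsed.success) return res.status(400).json({ error: "Invalid input" });
  // Don't log the whole body or any secrets; log only what you need.
  console.log("favorite added", parsed.data.restaurantId);
  res.status(201).end();
});

app.listen(Number(process.env.PORT ?? 3000));
```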

The “which model is best?” dilemma

People debate this every day. There's no single winner, and the landscape moves fast. My approach: use more than one model and pick the best tool for each job. This is why I like Cursor as a coding assistant (it lets me switch models easily). I also keep an eye on tools like Warp and Windsurf, and I try Claude Code or Grok depending on the task.

My four operating modes

Now that the jargon is out of the way, here’s my process:

  • Product Requirements Document (PRD). This is important. You don’t have to write it from scratch, but having one is a big advantage. If you have an idea, ask your model to draft a PRD and tailor it. I tell it to include a vibecoding plan that chunks the work into stages. If you think you can tell the AI to build your app in one shot and it will just appear, you’re kidding yourself. It’s also dangerous: you’ll invite bugs you’ll be fixing later. Save the PRD as a .md file; models handle Markdown well.
  • Scaffold. When you’re ready, feed the PRD to the model, ask it to study it, then scaffold the project. It should write specs, constraints, acceptance tests, and create folders or stubs. Tell it not to invent runnable code if context is missing.
  • Drive. Iterate in tiny diffs. Follow your vibecoding plan feature by feature. Test after each slice. Because models lose context between sessions, ask the model to create a memory.md and append changes after every session. Before writing new code, it should check memory.md. (A tiny append helper is sketched after this list.)
  • Review. Crucial. If you can code, add at least one human‑written test the model didn’t anticipate. If you can’t, have the model write tests, then still run linters, unit tests, and type checks. I also run a build before committing.
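
For the memory.md habit in the Drive step, nothing fancy is needed. A tiny Node helper like the sketch below (the file name and format are just my convention) appends a timestamped entry after each session; you then tell the model to read the file before it writes new code.

```typescript
import { appendFileSync, existsSync, writeFileSync } from "node:fs";

const MEMORY_FILE = "memory.md";

// Append a dated entry describing what changed this session.
export function logSession(summary: string, filesTouched: string[]): void {
  if (!existsSync(MEMORY_FILE)) {
    writeFileSync(MEMORY_FILE, "# Project memory\n\n");
  }
  const entry = [
    `## ${new Date().toISOString()}`,
    summary,
    `Files touched: ${filesTouched.join(", ")}`,
    "",
  ].join("\n");
  appendFileSync(MEMORY_FILE, entry + "\n");
}

// Usage after a session:
// logSession("Added offline favorites sync with dedupe on restaurant id", ["favorites.ts", "db.ts"]);
```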

What you should never do

  • Ask for a whole app in one go.
  • Merge code you didn’t run locally.
  • Let the model invent APIs you won’t own long‑term.
  • Skip tests and telemetry “for later.”
  • Paste secrets or customer data into chats.

Prompt patterns that never fail me

  • Spec → tests first
    Goal: “Users can save a restaurant to Favorites.”
    Constraints: offline‑first, dedupe on key, sync later.
    Ask: “Write 3 acceptance tests that will pass when done.” (See the acceptance‑test sketch after this list.)

  • Difficult bug, Thanos ring
    “Using first‑principles thinking, approach this bug like a senior engineer.”

  • Senior‑engineer critique
    “Critique this function for naming, complexity, and failure modes. Suggest a smaller version. Then show the patch.”

  • Strict instruction
    “Do not invent APIs or code. If you hit a roadblock, stop and ask for help.”
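
Here’s what the “Spec → tests first” pattern can produce for the Favorites example, written here as a small TypeScript suite with vitest. The favorites module and its API are hypothetical; the point is that the three tests pin down the constraints (offline‑first, dedupe on key, sync later) before any implementation exists.

```typescript
import { describe, it, expect } from "vitest";
// Hypothetical module the tests are written against before it exists.
import { createFavoritesStore } from "./favorites";

describe("Favorites (offline-first, dedupe on key, sync later)", () => {
  it("saves a restaurant locally even when the network is down", async () => {
    const store = createFavoritesStore({ online: false });
    await store.add({ id: "resto-42", name: "Mama's Kitchen" });
    expect(await store.list()).toHaveLength(1);
  });

  it("dedupes on restaurant id when the same place is saved twice", async () => {
    const store = createFavoritesStore({ online: false });
    await store.add({ id: "resto-42", name: "Mama's Kitchen" });
    await store.add({ id: "resto-42", name: "Mama's Kitchen" });
    expect(await store.list()).toHaveLength(1);
  });

  it("queues offline saves and syncs them once back online", async () => {
    const store = createFavoritesStore({ online: false });
    await store.add({ id: "resto-7", name: "Twibble Test Cafe" });
    store.setOnline(true);
    await store.flush();
    expect(store.pendingSyncCount()).toBe(0);
  });
});
```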

What “good” AI coding should mean to you

  • Lead time to first slice: hours → minutes.
  • Review delta: the percentage of AI‑generated code you changed before merge (should drop over time).
  • Exploration breadth: you compared at least two viable options before deciding.
  • Bug source mix: spec vs. model vs. integration (track it).
  • Cost per merged LOC: a rough metric to keep token spend honest.
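
Two of these (review delta and cost per merged LOC) are just arithmetic, so track them rather than eyeball them. The numbers below are placeholders:

```typescript
// Review delta: share of AI-generated lines you changed before merging.
const aiGeneratedLines = 420;    // placeholder
const linesChangedInReview = 63; // placeholder
const reviewDelta = linesChangedInReview / aiGeneratedLines; // 0.15 → 15%

// Cost per merged LOC: token spend divided by lines that actually shipped.
const tokenSpendUsd = 4.8; // placeholder for the feature's total API cost
const mergedLoc = 310;     // placeholder
const costPerMergedLoc = tokenSpendUsd / mergedLoc; // ≈ $0.015 per merged line

console.log({ reviewDelta, costPerMergedLoc });
```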

Quick reference links (for the jargon we used)

LLMs, Tokens, Transformer, Self‑attention (visual guide), Instruction tuning, RLHF, Temperature & Top‑p, Context window, RAG, MCP, CREATE INDEX (Postgres), pytest, ESLint, typescript‑eslint, Cursor, Warp, Windsurf, Claude Code, Grok


Additional resources

Model Context Protocol (MCP)

Proper vibecoding (prompting for coding, workflow, and patterns)

How LLMs work (friendly intros)

Avoiding hallucination (especially in tool/MCP workflows)

Coding with Cursor and Claude Code