CI for AI features

Fail the build when an AI feature regresses.

PromptGate is a GitHub App that turns prompt and model changes into a CI status check, so a quality regression fails the build the same way a failing unit test does. Catch silent regressions before your users do.

Built by Yelle Software. Free tier for one repo. BYO LLM key.

Who PromptGate is for

  • Engineering managers shipping AI features

    You own a repo with one or more AI-touching code paths and you merge prompt or model changes weekly. Your evals live in a notebook. You have no automated regression bar between "merge to main" and "ship to prod."

  • Platform engineers and staff engineers

    You want CI to fail loudly when a colleague swaps a model or tweaks a system prompt and quality drops. You do not have budget to build an in-house eval platform.

  • Teams of 20–200 engineers

    You are past the prototype phase. AI features are in production, customers depend on them, and "we tested it locally" is no longer a release strategy.

How it works

  1. Install the GitHub App

    Grant access to the repos with AI code paths.

  2. Drop in promptgate.yaml

    Point at one or more eval suites. Deterministic judges (string, regex, JSON schema, function-call shape) and LLM judges (BYO key) both work out of the box.

  3. Every PR gets a required status check

    Regressions fail the build. The dashboard shows the diff between the PR run and main, so reviewers can decide whether the regression is intentional.
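
To make steps 2 and 3 concrete, here is a minimal Python sketch of what deterministic judges and a regression gate could look like. It is an illustration only, not PromptGate's implementation: every name in it (`contains`, `matches`, `has_json_keys`, `Case`, `pass_rate`, `gate`) is hypothetical, the JSON check is a simplified stand-in for full JSON schema validation, and treating a "regression" as a drop in pass rate relative to main is an assumption.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout: this is not PromptGate's actual API.
Judge = Callable[[str], bool]


def contains(expected: str) -> Judge:
    """String judge: passes when the output contains the expected text."""
    return lambda output: expected in output


def matches(pattern: str) -> Judge:
    """Regex judge: passes when the pattern is found anywhere in the output."""
    compiled = re.compile(pattern)
    return lambda output: compiled.search(output) is not None


def has_json_keys(required: set[str]) -> Judge:
    """Simplified stand-in for a schema or function-call-shape judge:
    the output must parse as a JSON object holding every required key."""
    def judge(output: str) -> bool:
        try:
            parsed = json.loads(output)
        except json.JSONDecodeError:
            return False
        return isinstance(parsed, dict) and required.issubset(parsed.keys())
    return judge


@dataclass
class Case:
    name: str
    judge: Judge


def pass_rate(cases: list[Case], outputs: dict[str, str]) -> float:
    """Fraction of cases whose judge accepts the recorded output."""
    passed = sum(1 for case in cases if case.judge(outputs.get(case.name, "")))
    return passed / len(cases)


def gate(main_rate: float, pr_rate: float, tolerance: float = 0.0) -> bool:
    """Assumed rule: the check passes unless the PR's pass rate falls
    below main's by more than the tolerance."""
    return pr_rate >= main_rate - tolerance


cases = [
    Case("greeting", contains("Hello")),
    Case("order_id", matches(r"ORD-\d{6}")),
    Case("tool_call", has_json_keys({"name", "arguments"})),
]

main_outputs = {
    "greeting": "Hello, Sam!",
    "order_id": "Your order ORD-123456 has shipped.",
    "tool_call": '{"name": "lookup", "arguments": {"id": 1}}',
}

# In this made-up PR run, a prompt tweak makes the model emit prose
# instead of a JSON tool call, so one case stops passing.
pr_outputs = dict(main_outputs, tool_call="lookup(id=1)")

check_passes = gate(pass_rate(cases, main_outputs), pass_rate(cases, pr_outputs))
```

With these sample outputs the PR run passes two of three cases while main passes all three, so `check_passes` is `False` and the status check would fail. A non-zero `tolerance` is one way a team might absorb noise from non-deterministic model output.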

Join the waitlist

We are validating demand before we ship. Tell us which AI feature in your product is hardest to test today, and we will email you when PromptGate is open for installs.

We only use this to email you about PromptGate access.

FAQ

Does PromptGate run my LLM calls for me?

No. You bring your own LLM provider key (OpenAI today, more providers on the roadmap). PromptGate orchestrates the eval runs and reports pass/fail. Your prompts, your data, your provider, your bill.

What does the free tier cover?

One repo, one eval suite, 100 LLM-judge runs per month, and 30 days of run history. That is enough to prove the workflow on a side project or a single production path.

What about GitLab or Bitbucket?

GitHub only at launch. GitLab support is on the roadmap for later this year. Bitbucket support will depend on demand.

How do you store my evals?

Eval configs live in your repo as promptgate.yaml. We store run results (pass/fail, judge output, durations) in our managed Postgres with an S3-compatible artifact store behind it. Inputs and outputs you mark as sensitive can be hashed before storage.
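
As a rough illustration of hashing sensitive fields before storage, the Python sketch below replaces marked values with a digest. The function name `redact`, the record layout, and the choice of keyed SHA-256 (HMAC) are all assumptions made for this example; the hashing scheme PromptGate actually uses is not specified here.

```python
import hashlib
import hmac


def redact(record: dict[str, str], sensitive: set[str], key: bytes) -> dict[str, str]:
    """Return a copy of the record with sensitive values replaced by a
    keyed SHA-256 digest. Hypothetical helper, not PromptGate's API."""
    redacted = {}
    for field, value in record.items():
        if field in sensitive:
            digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
            redacted[field] = digest.hexdigest()
        else:
            redacted[field] = value
    return redacted


run = {
    "input": "Customer email: sam@example.com",
    "output": "Refund approved",
    "verdict": "pass",
}

stored = redact(run, sensitive={"input", "output"}, key=b"example-secret")
```

A keyed hash is used in this sketch because a plain hash of short or predictable text can be guessed by hashing candidate values. Identical inputs still produce identical digests, so two runs can be compared for equality without keeping the plaintext.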

Who is building this?

Yelle Software, a small custom-software studio. PromptGate is the first SaaS product we have built on our own IP.