u/Few-Ad-1358

▲ 0 r/webdev

Devs using AI coding agents: where does trust break in your workflow?

For people using AI coding agents in real codebases, I’m trying to understand the actual workflow — not the hype version.

When you give an agent a task, what usually happens?

- Do you write a detailed plan/spec first?
- Do you give it a short GitHub issue and let it figure things out?
- Do you review mainly after the PR/diff is done?
- Do you break work into tiny tasks because larger ones get risky?

I’m especially curious where your time goes:

- How much time do you spend planning before the agent writes code?
- How much time do you spend reviewing/fixing after it writes code?
- At what point do you stop trusting the agent?
- What mistakes happen most often?
  - scope drift
  - wrong assumptions
  - touching unrelated files
  - missing tests
  - passing CI but still doing the wrong thing
  - messy PRs
  - hard-to-review diffs

What are you currently doing to make AI-written code safer?

- strict prompts
- checklists
- CI/tests
- manual PR review
- asking the agent for a plan first
- limiting file access/scope
- smaller issues
- another agent reviewing the first one
- something else?

One thing I’m trying to figure out:

**If you wanted 99% confidence before merging AI-written code, what would need to be true?**

For example, would you want:

- a better pre-coding plan?
- a way to lock the agent to approved scope?
- proof of what tests/checks it ran?
- a summary comparing the final diff against the original issue?
- a warning when the agent touches unrelated files?
- a trust score/check on the PR?
- something more like CI, but for agent behavior instead of just tests?

Also: would adding this kind of gate feel useful, or would it feel like annoying process overhead?
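
To make the "lock the agent to approved scope" and "warn on unrelated files" ideas concrete, here is a rough sketch of what such a gate could look like. It assumes the approved scope is expressed as glob patterns agreed on before coding starts and that the list of changed paths is already available (for example from `git diff --name-only`). The function name, the patterns, and the file paths are all hypothetical.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files, approved_globs):
    """Return the changed paths that match none of the approved glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in approved_globs)
    ]


# Hypothetical inputs: the globs would come from the approved plan,
# the changed paths from the agent's diff.
approved = ["src/billing/*", "tests/billing/*"]
changed = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "ci/deploy.yml",
]

violations = out_of_scope(changed, approved)
if violations:
    print("Out-of-scope files touched:", ", ".join(violations))
```

Note that `fnmatch` lets `*` match across `/`, so `src/billing/*` also covers nested directories. A check like this only catches files outside the agreed paths; it says nothing about whether the changes inside scope are correct.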

Trying to learn how people actually work with coding agents today, and what would make them trustworthy enough for serious team usage.

u/Few-Ad-1358 — 2 days ago
▲ 1 r/github+1 crossposts

Roast my PR summary format: I'm trying to compress AI-generated PRs into a 60-second risk assessment. Would this actually save you time?

Example PR from n8n-io/n8n #30589

# Evidence Brief: n8n-io/n8n#30589


**Risk:** HIGH  
**Confidence:** MEDIUM  
**Scope clarity:** PARTIAL


**Title:** test(core): Add Playwright LangSmith eval scaffolding (no-changelog)


## 60-Second Receipt
### Open reviewer/bot concerns
- 6 open concern(s): 4 medium, 2 low
- **MEDIUM / bot_review** by codecov[bot] — ## [Codecov](https://app.codecov.io/gh/n8n-io/n8n/pull/30589?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_…
- **MEDIUM / bot_review** by cubic-dev-ai[bot] — **1 issue found** across 11 files <details> <summary>Prompt for AI agents (unresolved issues)</summary> ```text Check if these issues are valid — if so, understand the root cause…
- **MEDIUM / bot_review** by cubic-dev-ai[bot] `packages/testing/playwright/fixtures/langsmith.ts` — <!-- metadata:{"confidence":9} --> P2: The timeout fallback in `Promise.race` is never cleared, so a pending 30s timer can keep the worker alive after flush completes. <details> <…
- ... 3 more concern(s) in fetched PR discussion


### Decision
- **BLOCK BEFORE MERGE** — CI/checks are failing or cancelled; resolve or explain before relying on the PR.


### Blast Radius
- 11 file(s) changed
- supply-chain/deps — package, lockfile, workflow, or install surface touched
- CI/checks — validation status is part of the review surface
- sensitive files — config, agent instructions, security, release, or repo-control files touched
- external effects — network/transport, dependency fetch, or external trust boundary may change


### Claims vs Evidence
- No explicit author/agent implementation claims extracted.


### Top 3 Falsifiable Reviewer Checks
1. **CI/checks need attention** — Falsify this: Which failed, errored, or cancelled checks need attention before review? Receipt: “CI: PR Quality Checks / Ownership Acknowledgement: failure”; “CI: PR Quality Checks / Required PR Quality Checks: failure”
2. **Dependency/supply-chain changed** — Falsify this: Do dependency, lockfile, package-manager, or CI install changes preserve trusted sources, pinned versions, reproducible installs, and expected vulnerability posture? Receipt: “packages/testing/playwright/package.json”; “pnpm-lock.yaml”
3. **Sensitive path changed** — Falsify this: Do these sensitive files match the intended scope and have adequate verification? Receipt: “packages/testing/playwright/package.json”; “packages/testing/playwright/playwright.config.ts”


### Validation Receipt
- CI/check aggregate: success 41, failure 2, skipped 10
- Failed/error/cancelled checks needing attention: CI: PR Quality Checks / Ownership Acknowledgement: failure; CI: PR Quality Checks / Required PR Quality Checks: failure
- Passing/neutral/skipped checks: 51 total; examples: CI: Check merge source and destination / enforce-bundle-branches-only-in-private: skipped; CI: PR Quality Checks / Handle /size-limit-override: skipped; Build: Windows / build: success; CI: CLA Check / Verify CLA signatures: success; CI: Check PR Title / check-pr-title: success
- Reported validation: UNVERIFIED — `pnpm --filter=n8n-playwright typecheck` clean
- Reported validation: UNVERIFIED — `pnpm --filter=n8n-playwright test:evals:smoke` (offline) → 2 passed, 2 skipped
- Reported validation: UNVERIFIED — ... 3 more omitted


### Assumptions / Unknowns
- No linked issue or task reference
- No explicit acceptance criteria
- No local repo checkout provided; deeper call-site/test context not expanded
- Reported validation was not independently run by this CLI
- ... 1 more omitted



<details>
<summary>Full evidence details</summary>


## Changed Files
- .gitignore
- packages/testing/playwright/fixtures/eval-base.ts
- packages/testing/playwright/fixtures/langsmith.ts
- packages/testing/playwright/package.json
- packages/testing/playwright/playwright-projects.ts
- packages/testing/playwright/playwright.config.ts
- packages/testing/playwright/reporters/langsmith-eval.ts
- packages/testing/playwright/tests/evals/_smoke/anthropic.spec.ts
- ... 3 more omitted



## CI / Check Evidence
- CI/check aggregate: success 41, failure 2, skipped 10
- Failed/error/cancelled checks needing attention: CI: PR Quality Checks / Ownership Acknowledgement: failure; CI: PR Quality Checks / Required PR Quality Checks: failure
- Passing/neutral/skipped checks: 51 total; examples: CI: Check merge source and destination / enforce-bundle-branches-only-in-private: skipped; CI: PR Quality Checks / Handle /size-limit-override: skipped; Build: Windows / build: success; CI: CLA Check / Verify CLA signatures: success; CI: Check PR Title / check-pr-title: success



## Reported Validation
- UNVERIFIED (reported by PR author): `pnpm --filter=n8n-playwright typecheck` clean
- UNVERIFIED (reported by PR author): `pnpm --filter=n8n-playwright test:evals:smoke` (offline) → 2 passed, 2 skipped
- UNVERIFIED (reported by PR author): Same with `LANGSMITH_TRACING=true LANGSMITH_API_KEY=...` → 3 passed, 1 skipped; runs visible in LangSmith `playwright` project with `passed` feedback
- UNVERIFIED (reported by PR author): Same with `ANTHROPIC_API_KEY=...` → real `claude-haiku-4-5-20251001` call captured in LangSmith
- UNVERIFIED (reported by PR author): Worker-scoped flush verified (suite duration jumps from 0ms-flush to ~600ms when tracing on — proves batch HTTP flush is happening)



## Scope Evidence
**Signals**
- PR title present: test(core): Add Playwright LangSmith eval scaffolding (no-changelog)
- PR body present
- Branch name available: qa-playwright-langsmith-eval-scaffold
- 3 commit(s) available for scope inference



**Gaps**
- No linked issue or task reference
- No explicit acceptance criteria



## Claims Check
- No explicit agent/task claims extracted.


## Risk Signals
- **MEDIUM / supply_chain_security_change**
  - Evidence: packages/testing/playwright/package.json
  - Evidence: pnpm-lock.yaml
  - Evidence: ... 10 more omitted
  - Human question: Do dependency, lockfile, package-manager, or CI install changes preserve trusted sources, pinned versions, reproducible installs, and expected vulnerability posture?
- **MEDIUM / sensitive_path**
  - Evidence: packages/testing/playwright/package.json
  - Evidence: packages/testing/playwright/playwright.config.ts
  - Evidence: ... 1 more omitted
  - Human question: Do these sensitive files match the intended scope and have adequate verification?
- **MEDIUM / large_diff**
  - Evidence: 371 additions, 84 deletions
  - Human question: Can this PR be reviewed safely as one unit, or should it be split?
- **HIGH / failing_ci**
  - Evidence: CI: PR Quality Checks / Ownership Acknowledgement: failure
  - Evidence: CI: PR Quality Checks / Required PR Quality Checks: failure
  - Human question: Which failed, errored, or cancelled checks need attention before review?


## Context Gaps
- No local repo checkout provided; deeper call-site/test context not expanded



</details>
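
For anyone judging whether the top-line Risk and Decision follow from the evidence, here is a minimal sketch of one way they could be derived. It is not the CLI's actual logic, which the brief does not show. It assumes the overall risk is simply the highest level among the risk signals and that any failed, errored, or cancelled check forces a block. The `summarize` function, the `REVIEW` label, and the choice to feed in only six of the checks named in the brief are illustrative.

```python
from collections import Counter

# Assumed labels, mirroring the wording used in the brief.
NEEDS_ATTENTION = {"failure", "error", "cancelled"}
RISK_ORDER = ["LOW", "MEDIUM", "HIGH"]


def summarize(checks, signal_levels):
    """Count check conclusions, list the failing ones, and pick the top risk level."""
    counts = Counter(checks.values())
    failing = [
        name for name, conclusion in checks.items()
        if conclusion in NEEDS_ATTENTION
    ]
    # Highest level present among the signals; LOW if there are none.
    risk = max(signal_levels, key=RISK_ORDER.index, default="LOW")
    # "REVIEW" is a hypothetical label for the non-blocking case.
    decision = "BLOCK BEFORE MERGE" if failing else "REVIEW"
    return {
        "counts": dict(counts),
        "failing": failing,
        "risk": risk,
        "decision": decision,
    }


# A subset of the checks named in the brief, not the full set of 53.
checks = {
    "CI: PR Quality Checks / Ownership Acknowledgement": "failure",
    "CI: PR Quality Checks / Required PR Quality Checks": "failure",
    "CI: PR Quality Checks / Handle /size-limit-override": "skipped",
    "Build: Windows / build": "success",
    "CI: CLA Check / Verify CLA signatures": "success",
    "CI: Check PR Title / check-pr-title": "success",
}
signals = ["MEDIUM", "MEDIUM", "MEDIUM", "HIGH"]

brief = summarize(checks, signals)
print(brief["risk"], "/", brief["decision"])
```

Under these assumptions the single `failing_ci` signal is what lifts the whole PR to HIGH, which is worth keeping in mind when reading the other three MEDIUM signals.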
u/Few-Ad-1358 — 5 days ago