Documentation
Peekr records every LLM call, tool invocation, token count, and error as a tree you can inspect. This page covers everything from installation to advanced usage.
Installation
```shell
pip install peekr                # base, no LLM SDK required
pip install "peekr[openai]"      # with OpenAI
pip install "peekr[anthropic]"   # with Anthropic
pip install "peekr[all]"         # both
```
Requires Python 3.9+. No accounts, no backend, no environment variables.
Quickstart
Script / standalone agent — 2 lines
Add these two lines at the very top of your entrypoint, before any other imports that load an LLM SDK:
agent.py
```python
import peekr
peekr.instrument()

# everything below is unchanged
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Every LLM call is now traced automatically. Peekr writes to traces.jsonl and prints to the console.
```shell
peekr view traces.jsonl        # tree view
peekr view --io traces.jsonl   # include inputs and outputs
```
FastAPI / Starlette service — 3 lines
If your agent runs inside an HTTP server, add the middleware after creating the app. This creates a root span for every request so all LLM calls appear nested under the route that triggered them:
main.py
```python
import peekr
peekr.instrument()                             # line 1

from fastapi import FastAPI

app = FastAPI()
app.add_middleware(peekr.FastAPIMiddleware)    # line 2

# optional: ship traces to Peekr Cloud
peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",                   # line 3
    ),
)
```
Waterfall with middleware:
```text
● POST /v1/answer 19.3s        ← root: full request duration
├─ Embed query 1.4s
└─ Generate answer 16.4s · 6.0k tok · 0.20¢
```
Without middleware, the same spans appear as a flat list with no parent. See FastAPI middleware for all options.
Example: Debug a wrong answer
Your agent returns an incorrect response and you don't know why. Add @trace to your tool functions and run with --io:
agent.py
```python
import peekr
peekr.instrument()

import openai
from peekr import trace

@trace
def fetch_user(user_id: int) -> dict:
    return db.get(user_id)  # db is your own data layer; returns None if not found

@trace(name="agent.run")
def run(user_id: int):
    user = fetch_user(user_id)
    # bug: no null check before passing to LLM
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": f"User: {user}"}],
    )
```
peekr view --io traces.jsonl
```text
agent.run 2100ms
├─ tool.fetch_user 12ms
│    in: {"args": [42], "kwargs": {}}
│    out: null   ← found it
└─ openai.chat.completions [gpt-4o] 2088ms
     in: [{"role": "system", "content": "User: None..."}]
```
The tool returned None (serialized as null in the trace), so the LLM received "User: None" instead of a user object. The fix is a null check in run(), not a prompt change.
Example: Find slow steps
Wrap every step in your agent with @trace and look at the durations:
agent.py
```python
import openai
from peekr import trace

@trace
def search_web(query: str) -> list: ...

@trace
def rerank_results(results: list) -> list: ...

@trace(name="agent.run")
def run(query: str):
    results = search_web(query)
    ranked = rerank_results(results)
    return openai.chat.completions.create(...)
```
peekr view traces.jsonl
```text
agent.run 4300ms
├─ tool.search_web 3800ms   ← 88% of time
├─ tool.rerank_results 18ms
└─ openai.chat 490ms
```
Cache search_web results for repeated queries, or run it in parallel with other setup work. The LLM is not the bottleneck.
Example: Reduce token costs
Run your agent a few times on the same task and compare token counts across traces:
```shell
peekr view traces.jsonl
```
```text
Trace a3f2b1c0   18,432 tokens
Trace b2e4c8f1   21,104 tokens
Trace c5d9e2a7   24,891 tokens
```
Token count growing each run is the signature of unbounded history — the agent appends every message to the next call. Fix: summarize or truncate the conversation after a fixed number of turns.
agent.py
```python
# Before: growing history
messages = conversation_history  # gets longer every turn

# After: summarize once the history exceeds 5 exchanges
if len(conversation_history) > 10:  # 5 exchanges = 10 messages
    summary = summarize(conversation_history)  # summarize() is your own helper
    messages = [{"role": "system", "content": summary}]
else:
    messages = conversation_history
```
Example: Prod vs local bugs
Your agent passes tests locally but fails in production. Capture traces in both environments and compare tool outputs:
agent.py
```python
@trace
def fetch_inventory(sku: str) -> list:
    return inventory_api.get(sku)
```
local trace
```text
tool.fetch_inventory 8ms
  in: {"sku": "ABC-123"}
  out: [{"id": 1, "qty": 42}]   ← data present locally
```
prod trace
```text
tool.fetch_inventory 8ms
  in: {"sku": "ABC-123"}
  out: []   ← empty in prod
```
The agent logic is identical. The inventory API returns different data in prod — likely a missing data migration or environment-specific config. Fix the data source, not the agent.
instrument()
Call once at startup — before any LLM SDK imports or client instantiation. Patches OpenAI (sync + async, chat + embeddings), Anthropic, AWS Bedrock, and Google Gemini at the class level so every client instance is covered automatically.
peekr.instrument() must run before any LLM client is imported or created. The safest place is immediately after load_dotenv() / env setup, before any other application imports:
```python
# ✓ correct: instrument before importing anything that touches an LLM SDK
from dotenv import load_dotenv
load_dotenv()

import peekr
peekr.instrument(...)   # ← before everything else

from myapp.routes import answer, recall   # ← these import openai, anthropic, etc.
from myapp.llm import client              # ← also fine: patched at class level
```
```python
# ✗ wrong: openai is imported and a client created before instrument() runs
from openai import OpenAI
client = OpenAI()       # module-level singleton, already created

import peekr
peekr.instrument(...)   # too late for this specific instance if lru_cache is used
```
Why? peekr patches at the class level, so new instances created after instrument() are always captured. Pre-existing instances also work because Python resolves methods on the class at call time — not at instantiation. The one exception is @lru_cache or module-level singletons whose internal state bypasses the patched method: calling instrument() first avoids this entirely.
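The class-level behaviour described above can be illustrated without Peekr at all. The sketch below is only an illustration of Python's method lookup, not Peekr's actual patching code: `FakeClient` and `patch` are hypothetical names invented for the example. It shows why an instance created before patching is still covered, and why a bound method captured earlier is not.

```python
import functools

calls = []  # records the name of every traced call


class FakeClient:
    """Hypothetical stand-in for an LLM SDK client (not a real SDK class)."""

    def create(self, prompt):
        return f"echo: {prompt}"


def patch(cls, method_name):
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        calls.append(method_name)
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)


early_client = FakeClient()      # instance created BEFORE patching
captured = early_client.create   # bound method captured BEFORE patching

patch(FakeClient, "create")

early_client.create("a")   # traced: the lookup goes through the patched class
captured("b")              # not traced: still bound to the original function
FakeClient().create("c")   # traced: new instance of the patched class
```

Anything that holds on to the original function object (a cache, a stored bound method) sidesteps the wrapper, which is one plausible reading of the caveat about cached or module-level clients.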
```python
peekr.instrument(
    console=True,               # print spans live (default: True)
    storage="jsonl",            # "jsonl" | "sqlite" | "both"
    jsonl_path="traces.jsonl",  # JSONL output path
    db_path="traces.db",        # SQLite output path
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| console | bool | True | Print each span to stdout as it completes |
| storage | str | "jsonl" | "jsonl", "sqlite", or "both" |
| jsonl_path | str | "traces.jsonl" | Path for JSONL output |
| db_path | str | "traces.db" | Path for SQLite output |
SQLite storage
SQLite uses WAL mode so multiple processes — Docker containers, CI workers, parallel agents — can write spans safely at the same time. And because it's a real database, you can query across all your runs without any extra tooling.
```python
# Enable SQLite
peekr.instrument(storage="sqlite")

# Write to both JSONL and SQLite
peekr.instrument(storage="both")
```
View with the same CLI command:
```shell
peekr view traces.db
peekr view --io traces.db
```
Or query directly with any SQLite client:
Useful queries:
```shell
# Slowest tool calls
sqlite3 traces.db "SELECT name, ROUND(AVG(duration_ms)) avg_ms FROM spans GROUP BY name ORDER BY avg_ms DESC"

# Token spend by model
sqlite3 traces.db "SELECT json_extract(attributes,'$.model') model, SUM(json_extract(attributes,'$.tokens_total')) tokens FROM spans GROUP BY model"

# All errors
sqlite3 traces.db "SELECT name, trace_id, json_extract(attributes,'$.error') msg FROM spans WHERE status = 'error'"

# Token growth over time (detect unbounded history)
sqlite3 traces.db "SELECT trace_id, SUM(json_extract(attributes,'$.tokens_total')) total FROM spans GROUP BY trace_id ORDER BY start_time"
```
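The same kind of query can be run from Python with the standard-library `sqlite3` module. The sketch below builds a throwaway database so it runs on its own; the three-column `spans` table is an assumed, simplified schema for illustration and is not Peekr's real one.

```python
import os
import sqlite3
import tempfile

# Assumed minimal schema: only the columns the query below needs.
db_path = os.path.join(tempfile.mkdtemp(), "traces.db")
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")  # WAL allows concurrent readers while writing
conn.execute("CREATE TABLE spans (name TEXT, duration_ms REAL, status TEXT)")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?)",
    [
        ("tool.search_web", 3800.0, "ok"),
        ("tool.search_web", 4200.0, "ok"),
        ("tool.rerank_results", 18.0, "ok"),
        ("openai.chat", 490.0, "error"),
    ],
)
conn.commit()

# Same shape as the "slowest tool calls" query above.
slowest = conn.execute(
    "SELECT name, ROUND(AVG(duration_ms)) AS avg_ms "
    "FROM spans GROUP BY name ORDER BY avg_ms DESC"
).fetchall()
conn.close()
```

Against a real traces.db you would skip the table creation and point `sqlite3.connect` at the existing file.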
…grep or tail -f.
@trace decorator
Wraps a function as a span. Works on sync and async functions.
```python
from peekr import trace

# Auto-names from module.function
@trace
def search_web(query: str) -> list: ...

# Custom name
@trace(name="tool.search")
def search(query: str) -> list: ...

# Opt out of capturing inputs/outputs (latency + status still recorded)
@trace(capture_io=False)
def fetch_api_key() -> str: ...

# Async
@trace
async def fetch_user(user_id: int) -> dict: ...
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str or None | module.function | Custom span name |
| capture_io | bool | True | Record function args and return value |
Use capture_io=False for functions that handle secrets or large payloads.
Manual spans
For cases where a decorator doesn't fit — e.g. a loop, a context manager, or code you can't modify:
```python
from peekr import start_span, end_span

span, token = start_span("my.operation")
span.attributes["custom_key"] = "value"
try:
    result = do_work()
    span.status = "ok"
except Exception as e:
    span.status = "error"
    span.attributes["error"] = str(e)
    raise
finally:
    end_span(span, token)  # always call, even on error
```
Any spans created inside do_work() will automatically nest as children of this span.
CLI viewer
```shell
peekr view traces.jsonl        # tree view
peekr view --io traces.jsonl   # + inputs and outputs
```
Each trace is shown as a tree grouped by trace_id. The --io flag prints up to 120 characters of the serialized input and output for each span.
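To make the grouping concrete, here is a minimal sketch of how spans could be arranged into a tree using the `trace_id`, `span_id`, and `parent_id` fields. It is not the viewer's real implementation: `render_trace` and `truncate` are hypothetical helpers, and the sample spans are made up.

```python
from collections import defaultdict

spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "name": "agent.run", "duration_ms": 4300},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "tool.search_web", "duration_ms": 3800},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "name": "openai.chat", "duration_ms": 490},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "name": "agent.run", "duration_ms": 900},
]


def truncate(text, limit=120):
    """Keep at most `limit` characters, mirroring the --io display limit."""
    return text[:limit]


def render_trace(spans, trace_id):
    """Return the lines of an indented tree for one trace."""
    children = defaultdict(list)
    for span in spans:
        if span["trace_id"] == trace_id:
            children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span['name']} {span['duration_ms']}ms")
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have parent_id None
    return lines


tree = render_trace(spans, "t1")
```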
Custom exporters
Any object with an export(span) method works as an exporter:
```python
import requests

import peekr
from peekr.exporters import add_exporter

class HttpExporter:
    def export(self, span):
        # Send the serialized span to your own backend
        requests.post(
            "https://your-backend.com/spans",
            json=span.to_dict(),
        )

peekr.instrument()
add_exporter(HttpExporter())
```
Multiple exporters can be active at once. The built-in JSONLExporter and ConsoleExporter are added by instrument(). You can add your own on top.
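Because an exporter is just an object with an `export(span)` method, one can be written and exercised without any network access. The sketch below is an assumption-laden illustration: `InMemoryExporter` and `FakeSpan` are hypothetical names, and the only span behaviour modelled is the `to_dict()` call used in the example above.

```python
class InMemoryExporter:
    """Collects exported spans in a list, e.g. for assertions in tests."""

    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span.to_dict())


class FakeSpan:
    """Hypothetical stand-in for a Peekr span; only to_dict() is modelled."""

    def __init__(self, name, status="ok"):
        self.name = name
        self.status = status

    def to_dict(self):
        return {"name": self.name, "status": self.status}


exporter = InMemoryExporter()
for span in (FakeSpan("tool.search"), FakeSpan("openai.chat", status="error")):
    exporter.export(span)

failed = [s["name"] for s in exporter.spans if s["status"] == "error"]
```

In a real setup the same class would presumably be registered with `add_exporter(InMemoryExporter())`, as in the HTTP example.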
Span fields
Every span written to traces.jsonl is a JSON object with these fields:
| Field | Type | Description |
|---|---|---|
| trace_id | string | Groups all spans in one agent run |
| span_id | string | Unique ID for this span |
| parent_id | string or null | ID of the parent span, or null for root |
| name | string | Span name |
| start_time | float | Unix timestamp |
| end_time | float | Unix timestamp |
| duration_ms | float | Wall-clock duration in milliseconds |
| status | "ok" or "error" | Whether the span succeeded |
| tenant_id | string or null | Customer org (B2B). First-class: top-level column in SQLite, top-level key in JSONL. Set via peekr.session(tenant_id=...), instrument(tenant_id=...), or env PEEKR_TENANT_ID. |
| retention_class | string or null | Storage-tier hint (e.g. "default", "short", "long", "pii"). OSS stores it; storage tier interprets it. |
| attributes.model | string | LLM model name (auto-captured) |
| attributes.tokens_input | int | Prompt tokens (auto-captured) |
| attributes.tokens_output | int | Completion tokens (auto-captured) |
| attributes.tokens_total | int | Total tokens (auto-captured) |
| attributes.input | string | Serialized function args (truncated) |
| attributes.output | string | Serialized return value (truncated) |
| attributes.error | string | Exception message if status is "error" |
| attributes.session_id | string | Set when span is inside a peekr.session() |
| attributes.user_id | string | Set when span is inside a peekr.session(user_id=...) |
| attributes.eval_scores | dict | Evaluator name → score (0.0–1.0) when evaluators are configured |
| attributes.experiment_variant | string | Variant name when inside a @peekr.experiment |
Sessions
Group all spans for a user, tenant, or conversation by passing identifiers to peekr.session(). Uses ContextVar so it propagates correctly across async code.
```python
import peekr

with peekr.session(
    user_id="user_123",       # end-user (B2C)
    tenant_id="acme",         # customer org (B2B)
    session_id="sess_abc",    # auto-generated if omitted
    retention_class="long",   # storage-tier hint
):
    run_agent()
```
tenant_id and retention_class are first-class columns on the span — see Multi-tenant traces.
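The reason a ContextVar works across async code is that each asyncio task runs in its own copy of the context. The sketch below demonstrates this with the standard library only; the `session` helper here is a hypothetical stand-in written for the example, not Peekr's implementation.

```python
import asyncio
import contextvars
from contextlib import contextmanager

_session = contextvars.ContextVar("session", default={})


@contextmanager
def session(**ids):
    """Hypothetical helper: attach identifiers to the current context."""
    token = _session.set({**_session.get(), **ids})
    try:
        yield
    finally:
        _session.reset(token)  # restore whatever was set before


async def llm_call():
    await asyncio.sleep(0)        # yield to the event loop
    return dict(_session.get())   # what a span created here would record


async def handle(user_id):
    with session(user_id=user_id):
        return await llm_call()


async def main():
    # Two concurrent requests; each task sees only its own identifiers.
    return await asyncio.gather(handle("user_1"), handle("user_2"))


results = asyncio.run(main())
outside = dict(_session.get())  # nothing leaks out of the with-blocks
```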
Query by user in SQLite:
```sql
SELECT trace_id,
       AVG(duration_ms),
       SUM(json_extract(attributes,'$.tokens_total'))
FROM spans
WHERE json_extract(attributes,'$.user_id') = 'user_123'
GROUP BY trace_id;
```
Multi-tenant traces
Every span carries two first-class fields — tenant_id (the customer org) and retention_class (a storage-tier hint) — separate from user_id (the end-user). A B2B agent can tag both without conflict.
```python
import peekr

peekr.instrument(tenant_id="acme", retention_class="default")

with peekr.session(user_id="alice", tenant_id="acme", retention_class="long"):
    run_agent()
```
Resolution order, highest priority first:
- peekr.session(tenant_id=..., retention_class=...)
- peekr.instrument(tenant_id=..., retention_class=...)
- Env vars PEEKR_TENANT_ID / PEEKR_RETENTION_CLASS
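The resolution order above amounts to "first value that is set wins". A minimal sketch of that rule follows; `resolve` is a hypothetical helper for illustration, and only the environment variable names are taken from this page.

```python
import os


def resolve(field, session_value=None, instrument_value=None, environ=os.environ):
    """Return the first value that is set, highest priority first."""
    env_name = f"PEEKR_{field.upper()}"  # e.g. PEEKR_TENANT_ID
    for candidate in (session_value, instrument_value, environ.get(env_name)):
        if candidate is not None:
            return candidate
    return None


env = {"PEEKR_TENANT_ID": "from-env"}

from_session = resolve("tenant_id", "acme", "default-co", env)
from_instrument = resolve("tenant_id", None, "default-co", env)
from_env = resolve("tenant_id", environ=env)
unset = resolve("retention_class", environ=env)
```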
Both fields are top-level columns in SQLite (with indices) and top-level keys in JSONL — no json_extract needed:
```sql
SELECT tenant_id, COUNT(*) FROM spans GROUP BY tenant_id;

SELECT * FROM spans WHERE retention_class = 'long' AND start_time > ?;
```
retention_class is a free-form string in the OSS SDK — recommended values are default, short, long, and pii. The meaning of each is enforced by your storage tier (or by Peekr Cloud).
Why not attributes.tenant_id? So you can filter and index without JSON extraction, which matters the moment you have more than a handful of tenants or want to route ingestion. The SQLite exporter migrates pre-v0.3 databases automatically via PRAGMA user_version; legacy rows back-fill as NULL.
Alerts
Alerts fire after each complete trace (identified by the root span). Pass them to instrument():
```python
peekr.instrument(alerts=[
    peekr.alert.ErrorRate(threshold=0.05, window=100),  # >5% errors in last 100 traces
    peekr.alert.CostSpike(multiplier=2.0),              # tokens 2× above rolling avg
    peekr.alert.LatencyP95(ms=5000),                    # p95 latency > 5s
    peekr.alert.TokenGrowth(runs=5),                    # growing 5 consecutive runs
])
```
Override on_trigger to send to Slack, PagerDuty, or anywhere:
```python
class SlackAlert(peekr.alert.ErrorRate):
    def on_trigger(self, message: str) -> None:
        slack.send(f"#alerts: {message}")

peekr.instrument(alerts=[SlackAlert(threshold=0.05)])
```
| Alert | Triggers when | Key params |
|---|---|---|
| ErrorRate | Error % in last N traces > threshold | threshold, window=100 |
| CostSpike | This trace tokens > multiplier × rolling avg | multiplier, window=50 |
| LatencyP95 | p95 span latency in trace > ms | ms |
| TokenGrowth | Token count strictly increasing for N runs | runs=5 |
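Two of the trigger conditions in the table are simple enough to sketch in plain Python. This is an interpretation, not Peekr's code: `token_growth` and `ErrorRateWindow` are hypothetical names, and `runs` is read here as the number of consecutive totals compared.

```python
from collections import deque


def token_growth(totals, runs=5):
    """True if the last `runs` totals are strictly increasing."""
    recent = totals[-runs:]
    return len(recent) == runs and all(a < b for a, b in zip(recent, recent[1:]))


class ErrorRateWindow:
    """Tracks the error fraction over the last `window` traces."""

    def __init__(self, threshold, window=100):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # old outcomes fall off automatically

    def record(self, is_error):
        self.outcomes.append(bool(is_error))
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.threshold  # True means the alert would fire


growing = token_growth([18432, 19000, 21104, 22000, 24891])
flat = token_growth([18432, 19000, 19000, 22000, 24891])  # one plateau breaks the streak

monitor = ErrorRateWindow(threshold=0.05, window=100)
fired = [monitor.record(i == 10) for i in range(20)]  # a single error on the 11th trace
```

With one error in the window, the alert fires as soon as the error is recorded and stops once enough clean traces dilute the rate back to the threshold.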
Eval — LLM-as-judge
Evaluators run after each LLM span completes and write scores to span.attributes["eval_scores"]. A _in_eval guard prevents infinite recursion.
```python
peekr.instrument(evaluators=[
    peekr.eval.Rubric("Be concise and factually accurate"),
    peekr.eval.Hallucination(),   # groundedness check (see below)
    peekr.eval.NotEmpty(),        # output must be non-empty
    peekr.eval.NoError(),         # span must have status=ok
])
```
Scores are written to span.attributes["eval_scores"] as a {evaluator_name: float} dict and shown inline by peekr view --io:
peekr view --io traces.jsonl
```text
openai.chat.completions [gpt-4o] 843ms 312tok
  in: "Summarise this doc..."
  out: "The doc argues that..."
  eval_scores: {Rubric: 0.92, Hallucination: 0.95, NotEmpty: 1.0, NoError: 1.0}
```
Query scores in SQLite:
```sql
SELECT name,
       AVG(json_extract(attributes,'$.eval_scores.Rubric')) rubric_avg,
       AVG(json_extract(attributes,'$.eval_scores.Hallucination')) hallucination_avg
FROM spans
WHERE json_extract(attributes,'$.eval_scores') IS NOT NULL
GROUP BY name;
```
Write your own evaluator:
```python
from peekr.eval import BaseEvaluator

class LengthCheck(BaseEvaluator):
    def evaluate(self, span) -> float:
        output = span.attributes.get("output", "")
        return 1.0 if len(output) < 500 else 0.0
```
| Evaluator | What it checks | Requires |
|---|---|---|
| Rubric(criteria) | LLM scores output against your criteria (0.0–1.0) | openai or anthropic SDK |
| Hallucination() | Fraction of claims grounded in the input/context (0.0–1.0) | openai or anthropic SDK |
| NotEmpty() | Output attribute is non-empty string | Nothing |
| NoError() | Span status is "ok" | Nothing |
Hallucination detection
The Hallucination evaluator scores how well an LLM output is supported by its input context. It uses an LLM-as-judge under the hood — the same fallback pattern as Rubric (OpenAI first, then Anthropic).
| Score | Meaning |
|---|---|
| 1.0 | Every factual claim in the output is supported by the context |
| 0.0 | No claim is supported — the output is fully hallucinated |
| between | The fraction of claims grounded in the context |
Plug it in like any other evaluator:
```python
import peekr
peekr.instrument(evaluators=[peekr.eval.Hallucination()])

import openai

openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "The Eiffel Tower was completed in 1889 in Paris."},
        {"role": "user", "content": "When was the Eiffel Tower built and by whom?"},
    ],
)
```
peekr view --io traces.jsonl
```text
openai.chat.completions [gpt-4o] 843ms
  in: [{"role": "system", "content": "The Eiffel Tower was completed in 1889 in Paris."}, ...]
  out: "The Eiffel Tower was built in 1923 by Frank Lloyd Wright."
  eval_scores: {Hallucination: 0.0}   ← invented year and architect
```
RAG flows: point it at retrieved documents
By default the evaluator uses the span's input (the messages sent to the LLM) as the grounding context. For RAG flows where the source documents live elsewhere — say, on a parent tool span — pass a context_extractor:
```python
peekr.instrument(evaluators=[
    peekr.eval.Hallucination(
        context_extractor=lambda span: span.attributes.get("retrieved_docs", ""),
        model="gpt-4o-mini",  # optional: defaults to gpt-4o-mini / claude-haiku
    ),
])
```
Spans with an empty context score 1.0 (nothing to judge), so non-RAG spans don't drag down your average. The judge LLM call is cheap (max 10 output tokens) and runs after the original span completes, so it never blocks the main request.
Find your worst hallucinations
```sql
-- Lowest-scoring outputs across all runs
SELECT trace_id,
       json_extract(attributes,'$.eval_scores.Hallucination') AS hallucination,
       json_extract(attributes,'$.output') AS output
FROM spans
WHERE hallucination IS NOT NULL AND hallucination < 0.5
ORDER BY hallucination ASC, start_time DESC
LIMIT 20;

-- Hallucination rate by model
SELECT json_extract(attributes,'$.model') model,
       AVG(json_extract(attributes,'$.eval_scores.Hallucination')) avg_groundedness,
       COUNT(*) runs
FROM spans
WHERE json_extract(attributes,'$.eval_scores.Hallucination') IS NOT NULL
GROUP BY model
ORDER BY avg_groundedness ASC;
```
Detailed mode — RAGAS-style claim decomposition
The default mode returns a single score. detailed=True switches to a RAGAS Faithfulness-style pipeline: the judge first decomposes the output into atomic factual claims, then assigns each claim one of three verdicts:
| Verdict | Meaning |
|---|---|
| supported | Claim is directly entailed by CONTEXT |
| contradicted | Claim directly conflicts with CONTEXT |
| unsupported | CONTEXT is silent about the claim |
The score becomes supported_count / total_claims and the full breakdown lands on the span at attributes.hallucination_details:
```python
peekr.instrument(evaluators=[peekr.eval.Hallucination(detailed=True)])
```
span.attributes.hallucination_details
```json
{
  "total": 3,
  "supported": 1,
  "contradicted": 2,
  "unsupported": 0,
  "score": 0.33,
  "claims": [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"}
  ]
}
```
This is what powers the drift dashboard's drill-down — you can see exactly which claims the model invented, not just an average score.
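The arithmetic behind the detailed score is easy to reproduce from the per-claim verdicts. The sketch below recomputes the summary for the three example claims; `summarize_claims` is a hypothetical helper, and returning 1.0 when there are no claims is an assumption based on the "nothing to judge" behaviour described earlier.

```python
from collections import Counter


def summarize_claims(claims):
    """Build a summary dict from judged claims: score = supported / total."""
    counts = Counter(claim["verdict"] for claim in claims)
    total = len(claims)
    score = counts["supported"] / total if total else 1.0  # assumed: no claims, nothing to judge
    return {
        "total": total,
        "supported": counts["supported"],
        "contradicted": counts["contradicted"],
        "unsupported": counts["unsupported"],
        "score": round(score, 2),
    }


claims = [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"},
]
details = summarize_claims(claims)
```

One supported claim out of three gives 0.33, matching the example breakdown.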
…evaluators= list.
Observability dashboard
Generate a self-contained, tabbed HTML report from your traces. Designed as a drop-in observability layer for any RAG or memory/agent pipeline — open the file in any browser, no server, no build step.
```shell
peekr dashboard traces.db -o report.html   # SQLite
peekr dashboard traces.jsonl               # JSONL → ./dashboard.html
open report.html
```
Five tabs, one URL
The dashboard is organised so a non-technical observer can stay on the Overview tab and still get the gist, while an engineer can drill into Traces / Quality / Diagnose for specifics. Tab state is in the URL hash so links are shareable. A persistent filter bar at the top applies across every tab.
| Tab | For | Contents |
|---|---|---|
| #overview | First-impression / exec | Health hero (0–100), narrative bullets, 4 metric cards with sparklines, top 3 action items pulled from the diagnostic engine. |
| #traces | "Find me that call" | Search box (trace ID, model, content, error), sortable table, click any row → side panel with full context vs answer, claim verdicts, citations, per-call action items. |
| #quality | Trend monitoring | Rolling chart with warning (0.7) / critical (0.5) threshold lines, score distribution histogram, channel × time heatmap, claim-verdict doughnut, citation panel. |
| #diagnose | Incident response | "Likely causes & next steps" with severity-tagged cards and numbered fix lists, plus the full worst-offenders panel with side-by-side highlighted context vs answer. |
| #help | First-time setup | Setup checklist (auto-ticks live), glossary, evaluator configuration snippets, troubleshooting, keyboard shortcuts. |
Keyboard shortcuts
| Key | Action |
|---|---|
| 1–5 | Switch tabs |
| / | Jump to Traces tab and focus the search box |
| R | Clear all filters |
| Esc | Close the trace detail panel |
Filter bar
One persistent bar at the top of every tab. Click any chip to toggle that filter; every panel on every tab refilters immediately. The time-range chips include 5m, 15m, 30m, 1h, 24h, 7d, 30d presets plus a Custom… option with datetime-local from/to inputs. The "When = Custom" mode seeds itself to "last 1h up to the newest timestamp" so first activation isn't empty.
Panels at a glance
| Panel | What it shows | How to act on it |
|---|---|---|
| Health hero | One 0–100 score with a coloured dot (green/yellow/orange/red), tier label, count of flagged calls, and Δ vs baseline. | Red → open the recommendations panel below. |
| What's happening | 3–5 plain-English bullets summarising the situation: drift, worst channel, citation invention rate, error count. | Read top-to-bottom; the highest-priority finding is first. |
| Filter chips | Tenant · Model · Endpoint · Time range. Stack to drill in. | Click chips to refilter every panel. Click again to clear. |
| Metric cards | Hallucination · Rubric · Citations · Errors. Each with sparkline, Δ vs baseline, count of scored calls, and an action hint. | The hint at the bottom tells you the next step (e.g. "30 flagged — review worst offenders below"). |
| Likely causes & next steps | Diagnostic engine — runs eight pattern-detection rules and surfaces ranked recommendations with cause + numbered fix list. | Each card has a severity badge and a "what to try" list specific to that pattern. |
| Score over time | Rolling 20-call mean of every evaluator, with dashed warning (0.7) and critical (0.5) threshold lines. | Hover for trace details; click a point to jump to its worst-offender card. |
| Failure breakdown heatmap | Channel × time grid. Rows = your models/tenants/endpoints. Columns = time buckets. Colour = mean Hallucination. | Red rows tell you which channel is failing; rows that go green → red tell you when. Click a cell to filter. |
| Worst offenders | 12 lowest-scoring calls. Side-by-side context vs answer with contradicted claims highlighted, claim verdicts, citation list. | Each card ends with a "What to try for this call" box prescribing fixes specific to that span's failure pattern. |
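The "Score over time" panel's rolling mean and threshold lines can be approximated in a few lines. This is a sketch of the calculation only, not the dashboard's code: `rolling_mean` and `tier` are hypothetical names, and treating the thresholds as strict lower bounds is an assumption.

```python
from collections import deque


def rolling_mean(scores, window=20):
    """Mean of the last `window` scores at each position."""
    recent = deque(maxlen=window)
    means = []
    for score in scores:
        recent.append(score)
        means.append(sum(recent) / len(recent))
    return means


def tier(mean, warning=0.7, critical=0.5):
    """Classify a rolling mean against the two threshold lines."""
    if mean < critical:
        return "critical"
    if mean < warning:
        return "warning"
    return "ok"


# 20 fully grounded calls followed by 20 fully hallucinated ones.
scores = [1.0] * 20 + [0.0] * 20
means = rolling_mean(scores)
tiers = [tier(m) for m in means]
```

The rolling mean degrades gradually, so the series passes through "warning" before reaching "critical", which is what makes the chart useful for spotting when a regression began.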
Diagnostic rules
The recommendations panel inspects the filtered rows and emits cards from eight pattern-detection rules. Each card has a severity (high / medium / low / info / good), a plausible cause in plain English, and a numbered list of concrete fixes.
| Pattern | Triggers when | Sample recommendation |
|---|---|---|
| Invented citations | > 30% of detected citation patterns aren't in context | Tighten prompt; verify citations post-hoc; try hybrid retrieval |
| High contradiction rate | > 20% of judged claims directly contradict context | Strengthen system prompt; move context closer to question; reduce max_tokens |
| Out-of-context elaboration | > 25% unsupported claims with low contradiction | Add refusal instruction; check retrieval recall; coverage prompt |
| Channel concentration | > 50% of flagged calls share one model/tenant/endpoint | Diff deploys; compare prompts; verify index coverage for that channel |
| Hallucination drift | Δ vs baseline < −0.1 | Use heatmap to localise; cross-reference deploys; use peekr replay |
| Error spikes | > 5% of calls have status="error" | Check rate limits; verify fallback model quality; add retries |
| Citations all grounded | ≥ 5 citations, 0 invented | Add an alert on citation invention rate to catch future regressions |
| Healthy | No patterns triggered | Set up peekr.alert.ScoreFloor; run the offline benchmark periodically |
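A few of the rules in the table reduce to simple threshold checks over aggregate counts. The sketch below covers three of them plus the "Healthy" fallback, using the thresholds listed above; the `diagnose` function and the shape of the `stats` dict are assumptions made for illustration, not the diagnostic engine's real interface.

```python
def diagnose(stats):
    """Return the names of the patterns whose thresholds are exceeded."""
    findings = []
    claims = stats["supported"] + stats["contradicted"] + stats["unsupported"]
    if claims and stats["contradicted"] / claims > 0.20:
        findings.append("High contradiction rate")
    if stats["calls"] and stats["errors"] / stats["calls"] > 0.05:
        findings.append("Error spikes")
    if stats["delta_vs_baseline"] < -0.1:
        findings.append("Hallucination drift")
    return findings or ["Healthy"]


degraded = diagnose({
    "supported": 60, "contradicted": 30, "unsupported": 10,
    "calls": 200, "errors": 4, "delta_vs_baseline": -0.15,
})
healthy = diagnose({
    "supported": 95, "contradicted": 2, "unsupported": 3,
    "calls": 200, "errors": 1, "delta_vs_baseline": 0.02,
})
```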
Per-span action items
Every worst-offender card ends with a tailored "What to try for this call" panel — separate from the aggregate recommendations. It inspects that one span's claims, citations, and context to suggest fixes targeted to its specific failure pattern:
| Detected on this span | What the action box suggests |
|---|---|
| Empty / short context but long answer | Retrieval miss — inspect what your retriever returned |
| Invented URLs / arXiv / DOIs / titles | Per-kind prompt fix + post-hoc citation verification |
| Contradicted numbers / dates | "Copy numerics verbatim" instruction; temperature=0 |
| Contradicted proper nouns | Explicit "don't substitute names" instruction |
| Mostly unsupported claims, no contradictions | Add refusal: "say I don't know if not in context" |
| Mostly contradicted claims | Move context closer to question; "context wins" instruction |
| Low score but no detailed claims | Enable Hallucination(detailed=True) to see what failed |
| Output much longer than context | Reduce max_tokens; long completions drift |
Tag spans for the channel breakdown
The heatmap groups by attributes.model (set automatically by the patches), attributes.user_id (set via peekr.session(user_id=...)), and attributes.endpoint (you set this). Without an endpoint attribute, the endpoint row of the heatmap simply doesn't render — the others still do.
```python
import peekr
from peekr import trace, get_current_span

@trace
def handle_request(req):
    get_current_span().attributes["endpoint"] = req.path
    return call_llm(...)

# Or in a FastAPI middleware: one place, every request tagged
@app.middleware("http")
async def tag_span(request, call_next):
    with peekr.session(user_id=request.headers.get("X-Tenant-Id")):
        span, token = peekr.start_span(f"http.{request.method}")
        span.attributes["endpoint"] = request.url.path
        try:
            return await call_next(request)
        finally:
            peekr.end_span(span, token)
```
…peekr.instrument(). It's a post-hoc tool: rerun it whenever you want a fresh snapshot. For a real-time view, use peekr view --io in the terminal.
Feedback + export
Label traces as good or bad. Export labelled data as a fine-tuning dataset.
```python
import peekr

# Rate a trace
peekr.feedback(trace_id="a3f2b1c0...", rating="good", note="perfect answer")
peekr.feedback(trace_id="b2e4c8f1...", rating="bad", note="hallucinated")

# Export good traces as OpenAI fine-tuning data
peekr.export_feedback(
    db_path="traces.db",
    filter="good",
    output="training.jsonl",
    format="openai-ft",  # or "raw"
)
```
The openai-ft format produces one JSON object per trace:
training.jsonl
```json
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```
A/B experiments
Route traffic between variants and tag each span. Analyse results with SQL — no separate tracking tool needed.
```python
from peekr import experiment

# List variants: equal split by default
@experiment(variants=["control", "test_v2"])
def run_agent(query: str, variant: str):
    model = "gpt-4o" if variant == "control" else "claude-opus-4-5"
    return call_llm(model, query)

# Dict variants: passes config too
@experiment(variants={
    "control": {"model": "gpt-4o"},
    "test": {"model": "claude-opus-4-5"},
})
def run_agent(query: str, variant: str, variant_config: dict):
    return call_llm(variant_config["model"], query)
```
Analyse in SQLite:
```sql
SELECT json_extract(attributes,'$.experiment_variant') variant,
       COUNT(*) runs,
       AVG(CASE WHEN status='error' THEN 1.0 ELSE 0.0 END) error_rate,
       AVG(json_extract(attributes,'$.tokens_total')) avg_tokens,
       AVG(duration_ms) avg_ms
FROM spans
WHERE json_extract(attributes,'$.experiment_variant') IS NOT NULL
GROUP BY variant;
```
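For intuition about what an equal-split experiment decorator does, here is a self-contained sketch. It is not Peekr's `experiment` implementation: this local `experiment` function is a hypothetical re-creation, and the `rng` parameter exists only to make the example reproducible.

```python
import functools
import random


def experiment(variants, rng=random):
    """Hypothetical decorator: pick a variant uniformly and pass it to the function."""
    names = list(variants)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            variant = rng.choice(names)  # equal split across variants
            return func(*args, variant=variant, **kwargs)

        return wrapper

    return decorator


@experiment(variants=["control", "test_v2"], rng=random.Random(0))
def run_agent(query, variant):
    # A real agent would call an LLM here; we just record the assignment.
    return {"query": query, "experiment_variant": variant}


runs = [run_agent("hi")["experiment_variant"] for _ in range(1000)]
share = runs.count("control") / len(runs)  # should land near 0.5
```

Random assignment per call keeps the split even in aggregate; if the same user must always see the same variant, hashing a stable identifier would be the usual alternative.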
Trace replay
Re-run a stored trace with the same inputs. Useful for reproducing production bugs locally or verifying a fix against a real failure.
```python
from peekr.replay import replay_trace

# Re-run from SQLite
new_trace_id = replay_trace(trace_id="a3f2b1c0...", db_path="traces.db")
print(f"New trace: {new_trace_id}")

# Re-run from JSONL
new_trace_id = replay_trace(trace_id="a3f2b1c0...", jsonl_path="traces.jsonl")
```
Or use the CLI:
```shell
peekr replay a3f2b1c0
peekr replay a3f2b1c0 --db traces.db
peekr replay a3f2b1c0 --jsonl traces.jsonl
```
Guardrails
Guardrails are synchronous, in-path enforcement rules that run on every LLM span. Two categories:
- Mutating guardrails — run before eval and storage. Redact PII, strip secrets, or log warnings without blocking the response.
- Blocking guardrails — run after storage. Raise GuardrailError to prevent the response from reaching application code.
```python
import peekr

peekr.instrument(
    guardrails=[
        peekr.guard.PIIRedact(),  # strip PII before storage
        peekr.guard.Blocklist(
            terms=["confidential", "internal only"],
            action="raise",       # block the response
        ),
        peekr.guard.Blocklist(
            patterns=peekr.guard.Blocklist.COMMON_SECRETS,
            action="redact",      # redact API keys from traces
        ),
        peekr.guard.HallucinationBlock(threshold=0.5),  # block ungrounded responses
    ]
)
```
HallucinationBlock reuses the score from peekr.eval.Hallucination when both are wired — the LLM judge runs only once.
```python
peekr.instrument(
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[peekr.guard.HallucinationBlock(threshold=0.4)],
)
```
PIIRedact
Strips personal data from span.attributes["input"] and span.attributes["output"] before any storage exporter runs. Detected categories: email, phone, ssn, credit_card, ip_address.
```python
# Redact everything (default)
peekr.guard.PIIRedact()

# Only emails and phones, only from output
peekr.guard.PIIRedact(
    fields=("output",),
    categories=("email", "phone"),
)
```
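To show what field-and-category scoped redaction looks like in practice, here is a deliberately simple regex-based sketch. It is not how PIIRedact is implemented: the patterns, the `redact` helper, and the `[EMAIL]` / `[SSN]` placeholder format are all assumptions for illustration, and real PII detection needs far more careful patterns.

```python
import re

# Simplified patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(attributes, fields=("input", "output"), categories=("email", "ssn")):
    """Return a copy of attributes with matches replaced by a placeholder."""
    cleaned = dict(attributes)
    for field in fields:
        text = cleaned.get(field)
        if not isinstance(text, str):
            continue  # nothing to scan in this field
        for category in categories:
            text = PATTERNS[category].sub(f"[{category.upper()}]", text)
        cleaned[field] = text
    return cleaned


span_attributes = {
    "input": "Contact alice@example.com about SSN 123-45-6789",
    "output": "Emailed alice@example.com",
}
only_output = redact(span_attributes, fields=("output",), categories=("email",))
everything = redact(span_attributes)
```

Restricting `fields` leaves the other attribute untouched, which mirrors the "only from output" configuration shown above.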
Blocklist
Three actions:
- "raise" — raises GuardrailError after the LLM call (post-storage). The violation is audited before the error propagates.
- "redact" — replaces matches with [BLOCKED] before storage (pre-storage). Useful for API keys in prompts.
- "warn" — records to guardrail_warnings without modifying the span.
```python
# Block on exact terms
peekr.guard.Blocklist(terms=["confidential"], action="raise")

# Redact common API key / secret patterns
peekr.guard.Blocklist(
    patterns=peekr.guard.Blocklist.COMMON_SECRETS,
    action="redact",
)

# Scan only inputs, case-sensitive
peekr.guard.Blocklist(
    terms=["SECRET"],
    fields=("input",),
    case_sensitive=True,
    action="warn",
)
```
HallucinationBlock
Raises GuardrailError when a response scores below the faithfulness threshold. The violation span is always stored before the error propagates — full audit trail guaranteed.
```python
# Block any response less than 40% grounded
peekr.guard.HallucinationBlock(threshold=0.4)

# With detailed RAGAS claim breakdown
peekr.guard.HallucinationBlock(threshold=0.5, detailed=True)
```
Handling GuardrailError
```python
from peekr.guard import GuardrailError

try:
    response = client.chat.completions.create(...)
except GuardrailError as e:
    print(f"Blocked by {e.guardrail_name}: {e}")
    # e.span contains the full span that was stored; inspect its attributes
```
OTel receive — enterprise ingest
Enterprise teams with existing OpenTelemetry pipelines can send spans to Peekr Cloud without changing any instrumentation. Just add Peekr as a second exporter target alongside Datadog, Honeycomb, or any other backend. Peekr applies hallucination scoring, compliance guardrails, and the dashboard on top.
architecture
Your agents
└─ OTel SDK / LangChain / LlamaIndex / Traceloop
   └─ OTel Collector
      ├─ Datadog exporter (existing)
      ├─ Honeycomb exporter (existing)
      └─ Peekr exporter ←── add this, zero other changes
OpenTelemetry Collector
otel-collector-config.yaml
exporters:
  otlphttp/peekr:
    endpoint: https://peekr.starkspherelabs.com/otlp
    headers:
      Authorization: "Bearer pk_live_…"
service:
  pipelines:
    traces:
      exporters: [otlphttp/datadog, otlphttp/peekr]
Grafana Alloy
alloy.config
otelcol.exporter.otlphttp "peekr" {
  client {
    endpoint = "https://peekr.starkspherelabs.com/otlp"
    headers = { Authorization = "Bearer pk_live_…" }
  }
}
Python — existing OTel SDK setup
python
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

peekr_exporter = OTLPSpanExporter(
    endpoint="https://peekr.starkspherelabs.com/otlp/v1/traces",
    headers={"Authorization": "Bearer pk_live_…"},
)
How Peekr reads your spans
Different frameworks use different attribute names for the same thing. Peekr normalises them automatically — no config needed:
| What Peekr needs | Gen AI (OTel standard) | OpenInference (LangChain) | LangSmith legacy |
|---|---|---|---|
| Model name | gen_ai.request.model | llm.model_name | llm_model_name |
| Input tokens | gen_ai.usage.input_tokens | llm.token_count.prompt | token_usage.prompt_tokens |
| Output tokens | gen_ai.usage.output_tokens | llm.token_count.completion | token_usage.completion_tokens |
| Input text | gen_ai.prompt | input.value | — |
| Output text | gen_ai.completion | output.value | — |
| Tenant | tenant.id (resource) | tenant.id | tenant_id |
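The table above amounts to an alias lookup. As a minimal sketch, assuming a first-match-wins order across the three conventions (the `ALIASES` map and `normalise` function are illustrative names, not Peekr internals), the normalisation could look like this:

```python
# Attribute names per logical field, in lookup order, copied from the table above.
ALIASES = {
    "model": ("gen_ai.request.model", "llm.model_name", "llm_model_name"),
    "input_tokens": ("gen_ai.usage.input_tokens", "llm.token_count.prompt",
                     "token_usage.prompt_tokens"),
    "output_tokens": ("gen_ai.usage.output_tokens", "llm.token_count.completion",
                      "token_usage.completion_tokens"),
    "input_text": ("gen_ai.prompt", "input.value"),
    "output_text": ("gen_ai.completion", "output.value"),
    "tenant": ("tenant.id", "tenant_id"),
}


def normalise(span_attributes):
    """Map framework-specific attribute names onto one common set of keys."""
    normalised = {}
    for field, names in ALIASES.items():
        for name in names:
            if name in span_attributes:
                normalised[field] = span_attributes[name]
                break  # first matching convention wins (an assumption)
    return normalised


openinference_span = {
    "llm.model_name": "gpt-4o",
    "llm.token_count.prompt": 120,
    "output.value": "Hello!",
}
print(normalise(openinference_span))
```

Fields with no matching attribute are simply absent from the result, which mirrors how spans without text can still show cost and latency but skip evaluation.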
One thing you need to do
Set a distinct service.name in each service's OTel resource — this is already standard practice. Peekr uses it to separate traffic in the dashboard:
python
from opentelemetry.sdk.resources import Resource

resource = Resource.create({"service.name": "my-agent", "tenant.id": "acme"})
What OTel receive cannot do (vs the Peekr SDK)
Hallucination scoring and compliance guardrails require the full input/output text in span attributes. Many OTel instrumentations only capture token counts, not text — those spans will show cost and latency but skip eval. For full coverage, use peekr.instrument() directly. OTel receive is for teams who cannot change their instrumentation.
Peekr Cloud
The OSS SDK runs in your process, writes to local files, and is MIT licensed forever. When a single-process file isn't the right fit any more — multiple services, a team that needs shared dashboards, longer retention, audit-grade trace storage — Peekr Cloud is the managed backend.
Sign up at peekr.starkspherelabs.com — free up to 10k spans/month, no card required. Once you have a pk_live_ key from your project's Settings page:
1 — Install
bash
pip install "peekr[openai]"  # or anthropic, langchain, crewai, etc.
2 — Instrument
python
import peekr

peekr.instrument(
    tenant_id="acme",  # your customer's org; optional but recommended
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",  # from Settings → API keys
    ),
)
That's the entire integration. instrument() auto-patches whichever LLM SDKs and agent frameworks are installed — every call is captured with zero further changes to your code. Spans batch in the background and appear in the dashboard within 5 seconds.
3 — TypeScript
typescript
import { instrument } from "@peekr/sdk";

instrument({
  exporter: {
    type: "http",
    endpoint: "https://peekr.starkspherelabs.com",
    apiKey: "pk_live_…",
  },
});
Pricing
| Tier | Spans / month | Price |
|---|---|---|
| Free | 10k | $0 |
| Starter | 500k | $29/mo |
| Pro | 5M | $99/mo |
| Scale | 50M | $399/mo |
FastAPI middleware
peekr.FastAPIMiddleware (alias: PeekrASGIMiddleware) is a pure ASGI middleware that creates a root span for every HTTP request. All child spans — LLM calls, embeddings, @trace decorators — nest under it automatically. This turns a flat list of sibling spans into a proper trace tree.
Without middleware — spans appear as unrelated siblings:
waterfall
● openai.embeddings 1.4s ← no parent
● openai.chat.completions 16.4s ← no parent (same trace, but floating)
With middleware — every span is a child of the request:
waterfall
● POST /v1/answer 19.3s ← root span, full duration
├─ Embed query 1.4s
└─ Generate answer 16.4s · 6.0k tok · 0.20¢
FastAPI — one line
python
import peekr
from fastapi import FastAPI

peekr.instrument(...)  # existing call, no changes needed

app = FastAPI()
app.add_middleware(peekr.FastAPIMiddleware)  # ← one new line
Starlette / raw ASGI
python
from peekr import PeekrASGIMiddleware

app = PeekrASGIMiddleware(app)  # wrap any ASGI app
Options
python
app.add_middleware(
    peekr.FastAPIMiddleware,
    tenant_header="X-Tenant-Id",  # header → span.attributes["tenant_id"]
    user_header="X-User-Id",  # header → span.attributes["user_id"]
    skip_paths={"/healthz", "/metrics"},  # paths that get no span (default set included)
)
| Option | Default | Description |
|---|---|---|
| tenant_header | "X-Tenant-Id" | Request header copied to span.attributes["tenant_id"] |
| user_header | "X-User-Id" | Request header copied to span.attributes["user_id"] |
| skip_paths | {"/healthz", "/health", "/metrics", "/ping"} | Exact paths that skip instrumentation entirely |
Span attributes set by the middleware
| Attribute | Example value |
|---|---|
| http.method | "POST" |
| http.path | "/v1/answer" |
| http.status_code | 200 |
| endpoint | "/v1/answer" (FastAPI route pattern when available) |
| tenant_id | Value of X-Tenant-Id header |
| user_id | Value of X-User-Id header |
Streaming responses
The middleware uses pure ASGI (not BaseHTTPMiddleware), so it handles streaming correctly. The root span closes when the last byte is sent — not when the response headers are sent. This means the span duration for a Server-Sent Events (SSE) endpoint reflects the full time the client is connected and receiving data.
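To show why a pure-ASGI wrapper can time a streamed response to its last byte, the sketch below wraps `send` and fires a callback only on the final `http.response.body` message (the one without `more_body`). This is a generic ASGI illustration, not Peekr's middleware: `TimingMiddleware` and its `on_finish` callback are hypothetical names, and a production version would also close the span in a `try/finally` if the app raises.

```python
import asyncio
import time

SKIP_PATHS = {"/healthz", "/health", "/metrics", "/ping"}


class TimingMiddleware:
    """Pure-ASGI wrapper that reports duration once the final body chunk is sent."""

    def __init__(self, app, on_finish, skip_paths=SKIP_PATHS):
        self.app = app
        self.on_finish = on_finish  # stand-in for "close the root span"
        self.skip_paths = skip_paths

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http" or scope["path"] in self.skip_paths:
            await self.app(scope, receive, send)  # exact-path skip, no span
            return
        started = time.perf_counter()
        status = None

        async def wrapped_send(message):
            nonlocal status
            await send(message)
            if message["type"] == "http.response.start":
                status = message["status"]  # headers sent; keep the span open
            elif (message["type"] == "http.response.body"
                  and not message.get("more_body", False)):
                self.on_finish(scope["path"], status, time.perf_counter() - started)

        await self.app(scope, receive, wrapped_send)


async def streaming_app(scope, receive, send):
    """Toy ASGI app that streams two chunks with a pause in between."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"chunk 1", "more_body": True})
    await asyncio.sleep(0.05)
    await send({"type": "http.response.body", "body": b"chunk 2", "more_body": False})


finished = []


async def demo():
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # a real server would write to the socket here

    app = TimingMiddleware(streaming_app, lambda *args: finished.append(args))
    await app({"type": "http", "path": "/v1/answer", "method": "GET"}, receive, send)
    await app({"type": "http", "path": "/healthz", "method": "GET"}, receive, send)


asyncio.run(demo())
print(finished)  # one entry for /v1/answer, none for /healthz
```

Because the callback fires on the last body message rather than on `http.response.start`, the recorded duration includes the pause between chunks.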
What it does NOT trace automatically
The middleware creates the root span and nests all peekr-patched LLM calls under it. It does not automatically trace:
- Database queries (Supabase, PostgreSQL, Redis): use @peekr.trace
- HTTP calls to external APIs via httpx/aiohttp: use @peekr.trace
- Cohere, Pinecone, or other non-LLM SDKs: use @peekr.trace
For those, wrap the function:
python
from peekr import trace, get_current_span

@trace(name="db.recall_memories")
def recall_memories(query: str, tenant_id: str):
    span = get_current_span()
    if span:
        span.attributes["query_preview"] = query[:80]
        span.attributes["tenant_id"] = tenant_id
    return db.rpc("recall_memories_hybrid", {...}).execute()
Guardrails
Guardrails enforce rules on LLM inputs and outputs. Three built-in types — each wires into the exporter pipeline automatically via instrument(guardrails=[...]).
PIIRedact — strip sensitive data before storage
Scans span.attributes["input"] and span.attributes["output"] and replaces PII with redaction tokens before the span is persisted. Runs before the storage exporter so your observability data stays clean.
python
peekr.instrument(
    guardrails=[
        peekr.guard.PIIRedact(),  # strips email, phone, SSN, card, IP
        peekr.guard.PIIRedact(
            fields=("output",),  # only scan outputs
            categories=("email", "phone"),  # specific categories only
        ),
    ]
)
Detected categories: email, phone, ssn, credit_card, ip_address.
Blocklist — block or redact forbidden patterns
Three actions: "raise" (abort the call), "redact" (replace with [BLOCKED]), "warn" (log only). Use Blocklist.COMMON_SECRETS to catch OpenAI, Anthropic, GitHub, and Slack API keys.
python
peekr.instrument(
    guardrails=[
        # Redact API keys from stored traces
        peekr.guard.Blocklist(
            patterns=peekr.guard.Blocklist.COMMON_SECRETS,
            action="redact",
        ),
        # Block calls where the input contains forbidden terms
        peekr.guard.Blocklist(
            terms=["confidential", "internal only"],
            action="raise",
            fields=("input",),  # checked PRE-CALL: the API is never invoked
        ),
    ]
)
Pre-call blocking: Blocklist(action="raise", fields=("input",)) runs before the LLM API call. If the input matches, the call is aborted before any request is sent, so no API cost is incurred.
HallucinationBlock — enforce faithfulness threshold
Raises GuardrailError (or records a warning) when the hallucination score falls below the threshold. Re-uses the score from EvalExporter if available — no second judge call.
python
peekr.instrument(
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[
        # Block responses below 40% grounded: raises GuardrailError
        peekr.guard.HallucinationBlock(threshold=0.4),
        # Warn only: records violation but lets the response through
        peekr.guard.HallucinationBlock(threshold=0.2, action="warn"),
    ]
)
Full example
python
import peekr

peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[
        peekr.guard.PIIRedact(),
        peekr.guard.Blocklist(
            patterns=peekr.guard.Blocklist.COMMON_SECRETS,
            action="redact",
        ),
        peekr.guard.Blocklist(
            terms=["confidential"],
            action="raise",
            fields=("input",),
        ),
        peekr.guard.HallucinationBlock(threshold=0.3, action="warn"),
    ]
)
Pipeline order (load-bearing): PIIRedact → EvalExporter → storage → HallucinationBlock. PII is stripped before evaluation. Violations are persisted before any GuardrailError propagates.
Trace naming
By default, every chat completion span shows as "LLM call". Peekr infers a purpose label from span attributes — but you can also set it explicitly for precise control.
Option 1 — Set the feature attribute (recommended)
python
from peekr import trace, get_current_span

@trace(name="llm.generate_answer")
async def generate_answer(query: str) -> str:
    span = get_current_span()
    if span:
        span.attributes["feature"] = "generate_answer"
        span.attributes["query_preview"] = query[:80]
    ...  # LLM call inside here becomes a child span
The waterfall shows: Generate answer (gpt-4o-mini) · 4.4k instead of LLM call.
Built-in feature → label mappings:
| feature value | Displayed as |
|---|---|
| generate_answer | Generate answer |
| generate_structured | Structured output |
| goal_copilot | Goal suggestions |
| entity_extraction | Entity extraction |
| classify | Classify |
| summarize | Summarise |
| recall | Memory recall |
| remember | Store memory |
Option 2 — Automatic inference (no code changes)
Peekr reads the system prompt and applies pattern matching. Works for any app with no instrumentation required:
| System prompt contains | Displayed as |
|---|---|
| "RAGAS Faithfulness" / "strict fact-checker" | Hallucination eval |
| "memories listed below" / "cited sources" | Generate answer |
| "JSON schema" / "structured output" | Structured output |
| "extract entit-" | Entity extraction |
| "summari-" | Summarise |
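The inference table above can be read as an ordered list of pattern-to-label rules. The sketch below is one plausible reading, not Peekr's actual matcher: case-insensitive matching, first-match-wins ordering, and the `infer_label` name are all assumptions.

```python
import re

# (pattern, label) pairs taken from the table above; order matters in this sketch.
RULES = [
    (r"RAGAS Faithfulness|strict fact-checker", "Hallucination eval"),
    (r"memories listed below|cited sources", "Generate answer"),
    (r"JSON schema|structured output", "Structured output"),
    (r"extract entit", "Entity extraction"),
    (r"summari", "Summarise"),  # prefix covers "summarise" and "summarize"
]


def infer_label(system_prompt, default="LLM call"):
    """Return the label of the first rule whose pattern appears in the prompt."""
    for pattern, label in RULES:
        if re.search(pattern, system_prompt, re.IGNORECASE):
            return label
    return default


print(infer_label("Please summarize the ticket thread."))
print(infer_label("You are a helpful assistant."))
```

With first-match-wins, a prompt that mentions both a JSON schema and entity extraction resolves to whichever rule is listed first, which is one reason to set the feature attribute explicitly when the label matters.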
Option 3 — Custom label rules (Peekr Cloud)
In the dashboard under Settings → Advanced, define per-project rules that map prompt patterns, feature names, or span names to your own labels:
| Match field | Pattern | Label |
|---|---|---|
| prompt | You are a support agent for Acme | Acme Support Reply |
| feature | onboarding_flow | Onboarding Q&A |
| span_name | llm.generate_answer | RAG Answer |
Rules are applied in priority order before built-in inference. No SDK update required.
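A small sketch of "custom rules first, built-in inference second" follows. The `LabelRule` dataclass, substring matching, and the convention that a lower priority number is evaluated first are assumptions for illustration; the dashboard's real rule format may differ.

```python
from dataclasses import dataclass


@dataclass
class LabelRule:
    match_field: str  # "prompt", "feature", or "span_name"
    pattern: str
    label: str
    priority: int  # assumption: lower number is evaluated first


def resolve_label(span, rules, builtin_inference):
    """Try custom rules in priority order, then fall back to built-in inference."""
    for rule in sorted(rules, key=lambda r: r.priority):
        value = span.get(rule.match_field) or ""
        if rule.pattern in value:  # assumption: plain substring match
            return rule.label
    return builtin_inference(span)


rules = [
    LabelRule("span_name", "llm.generate_answer", "RAG Answer", priority=2),
    LabelRule("prompt", "You are a support agent for Acme", "Acme Support Reply", priority=1),
]
span = {
    "prompt": "You are a support agent for Acme. Be concise.",
    "span_name": "llm.generate_answer",
}
print(resolve_label(span, rules, builtin_inference=lambda s: "LLM call"))
```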
Compliance guardrails (Cloud)
Peekr Cloud Pro includes industry-specific compliance packs — patterns and required disclosures are maintained server-side and fetched at instrument() time. Rules update when regulations change without requiring an SDK upgrade.
python
import peekr

peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
    compliance=["FDCPA", "HIPAA"],  # fetched from Cloud; enforced locally
)
Available packs:
| Pack | Regulation | Blocks / warns |
|---|---|---|
| FDCPA | Fair Debt Collection Practices Act | Unauthorized fee waivers, missing mini-Miranda, threats |
| HIPAA | HIPAA Privacy Rule + FDA | PHI in output, diagnosis as fact, prescribing AI |
| FINRA | FINRA Rule 2111 / SEC Reg BI | Specific investment recommendations, guaranteed returns |
| FAIR_HOUSING | Fair Housing Act / RESPA | Demographic steering, coded neighborhood language |
| EEOC_ADA | ADA / GINA / Title VII | Pre-offer disability, genetic, pregnancy inquiries |
| UPL | Unauthorized Practice of Law | Specific legal strategy, outcome predictions |
| TCPA | Telephone Consumer Protection Act | Missing AI identity disclosure, no opt-out mechanism |
| GDPR | GDPR Art. 22 (EU) | Missing automated-decision disclosure |
| EU_AI_ACT | EU AI Act Art. 50 | Missing chatbot AI identity disclosure |
| TILA_ECOA | Truth in Lending / Equal Credit | Guaranteed approval, discriminatory basis, missing APR |
| UAE / MENA | | |
| UAE_PDPL | UAE Personal Data Protection Law | Consent violations, indefinite retention, sensitive data without consent |
| UAE_DIFC | DIFC Data Protection Law 2020 | Missing automated decision disclosure, no human review option |
| UAE_ADGM | ADGM Data Protection Regulations | Automated decision disclosure, cross-border transfer without safeguards |
| UAE_CBUAE | Central Bank UAE Consumer Standards | Guaranteed returns, unauthorized fee changes, missing financial disclaimer |
| UAE_DHA | Dubai Health Authority (Health Data) | Clinical diagnosis claims, prescribing AI, patient data without consent |
| UAE_RERA | Dubai RERA (Real Estate) | Guaranteed property returns, demographic steering, unregistered property |
| KSA_PDPL | Saudi Arabia PDPL | Cross-border transfer without SDAIA approval, sensitive data consent, automated decision disclosure |
UAE / MENA Compliance Packs
UAE and Saudi Arabia are building major AI hubs (Dubai AI Roadmap 2031, ADGM AI Framework, NEOM) while simultaneously introducing data protection and sector-specific AI regulations. Peekr's MENA packs are modelled on the actual regulatory frameworks but should be reviewed by local legal counsel before production deployment.
python
peekr.instrument(
    exporter=peekr.HTTPExporter(endpoint="...", api_key="pk_live_…"),
    compliance=["UAE_PDPL", "UAE_CBUAE"],  # mix UAE packs freely
)
UAE PDPL — Federal Data Protection
Federal Decree-Law No. 45 of 2021. Applies across UAE outside DIFC and ADGM free zones. Covers any AI agent that processes personal data of UAE residents.
Prohibited outputs:
- Selling or sharing personal data without consent
- Retaining data indefinitely without a defined period
- Processing sensitive data (health, biometric, financial, religious, political) without explicit consent
Required disclosures:
- This is an automated system.
- You have the right to access your personal data.
Penalties: Up to AED 1,000,000 per violation. Criminal liability for intentional breaches.
UAE DIFC — Dubai International Financial Centre
DIFC Data Protection Law 2020, Arts. 36-38. Applies to all DIFC-licensed entities. AI making automated decisions with legal or significant effects must provide disclosure and human review rights.
Required disclosures:
- This decision involved automated processing.
- You have the right to request human review.
- You are interacting with an AI.
Prohibited: Claiming an automated decision is final or irreversible without offering human review. Processing sensitive data without explicit consent.
Penalties: DIFC Commissioner can issue enforcement notices; fines up to USD 100,000 per violation for serious breaches.
UAE ADGM — Abu Dhabi Global Market
ADGM Data Protection Regulations. GDPR-equivalent framework for all ADGM-registered entities. Closely mirrors GDPR Art. 22 automated decision requirements.
Required disclosures: Same as DIFC — automated processing disclosure, right to object, AI identity. Cross-border transfer requires adequacy or safeguards equivalent to GDPR SCCs.
Penalties: FSRA enforcement; fines aligned with DIFC scale.
UAE CBUAE — Central Bank Consumer Finance
Central Bank of UAE Consumer Protection Standards. Required for AI agents in banking, lending, investment, and financial advice operating in UAE.
Prohibited outputs:
- Guaranteed returns, risk-free investments, certain profit predictions
- Specific AED/dirham return amounts without disclaimer
- Unauthorized fee or charge modifications ("I can waive that fee")
- Omitting documentation or verification requirements
Required disclosures:
- This is general information and not financial advice.
- Past performance does not guarantee future results.
Penalties: CBUAE enforcement action; fines up to AED 3,000,000 for consumer protection violations.
UAE DHA — Health Data
Dubai Health Authority Digital Health Strategy + UAE health data regulations. Similar principles to HIPAA but UAE-specific. Required for any AI handling patient data or providing health information in UAE.
Prohibited outputs:
- Stating a clinical diagnosis as fact
- Claims that the AI can diagnose, treat, or cure disease
- Medication prescribing or dosage recommendations
- Sharing patient data without consent
Required disclosures: This is not medical advice. + Consult a licensed healthcare professional.
Penalties: DHA regulatory action; criminal liability for unlicensed medical practice.
UAE RERA — Real Estate
Dubai Real Estate Regulatory Agency advertising and consumer protection rules. Required for property search, investment, and rental AI agents operating in Dubai.
Prohibited outputs:
- Guaranteed property investment returns or price predictions stated as certain
- Demographic steering language in property descriptions
- Advertising unregistered or off-plan projects without DLD/RERA registration reference
Required disclosures: Prices are indicative; not a binding offer.
KSA PDPL — Saudi Arabia Data Protection
Saudi Arabia Personal Data Protection Law (Royal Decree M/19, 2021; updated 2023, enforced September 2023). Applies to any AI processing data of Saudi residents regardless of where the processor is located.
Prohibited outputs:
- Cross-border data transfer without SDAIA (Saudi Data and AI Authority) approval
- Processing sensitive data (health, biometric, genetic, financial, religious, criminal) without explicit consent
- Retaining data beyond the stated purpose period
- Claiming an automated decision cannot be reviewed or appealed
Required disclosures:
- Your data is processed in accordance with the Saudi Arabia Personal Data Protection Law.
- You have the right to access, correct, and request deletion of your personal data.
- This decision involved automated processing.
Penalties: Up to SAR 5,000,000 (≈USD 1.3M) per violation. Criminal liability for intentional violations. SDAIA has active enforcement from 2023.
Enable packs in the dashboard under Compliance. Each pack can be set to raise (block the response) or warn (record violation, allow through).
FDCPA — Debt Collection
Fair Debt Collection Practices Act + CFPB Regulation F. Required for any AI agent that communicates with debtors.
Prohibited outputs (regex-matched):
- we (will|are going to) (sue|arrest) you: false threat of legal action or arrest (§1692e)
- we can (waive|remove|forgive) (the|this) (fee|debt|balance): unauthorized fee modification (§1692f)
- (government|federal|official) (agency|collector): misrepresentation as government entity (§1692e)
- (legal action|lawsuit|court) (has been|is) (filed|initiated): false claim of legal action
- Abusive language: stupid|idiot|deadbeat|loser (§1692d)
Required disclosures (must appear in every response):
- Initial contact: This is an attempt to collect a debt. Any information obtained will be used for that purpose.
- All subsequent contacts: This communication is from a debt collector.
Example violation → User asks "Can you waive my $50 late fee?" AI replies "We can remove this fee for you." → blocked by we can (waive|remove|forgive) (the|this) (fee|debt|balance) + missing mini-Miranda.
Severity: Civil $1,000/violation + attorney's fees. Criminal up to $5,000 + 1 year imprisonment for willful violations.
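The two kinds of FDCPA check described above (prohibited patterns and required disclosures) can be sketched as follows. This is an illustration only: the `check_response` helper is hypothetical, it uses just two of the listed patterns and the subsequent-contact disclosure, and case-insensitive matching is an assumption.

```python
import re

# Two of the patterns listed above, paired with the reason reported on a match.
PROHIBITED = [
    (r"we (will|are going to) (sue|arrest) you",
     "false threat of legal action or arrest (§1692e)"),
    (r"we can (waive|remove|forgive) (the|this) (fee|debt|balance)",
     "unauthorized fee modification (§1692f)"),
]
REQUIRED = ["This communication is from a debt collector."]


def check_response(text):
    """Return a list of violations: matched patterns plus missing disclosures."""
    violations = [reason for pattern, reason in PROHIBITED
                  if re.search(pattern, text, re.IGNORECASE)]
    violations += [f"missing disclosure: {d}" for d in REQUIRED
                   if d.lower() not in text.lower()]
    return violations


print(check_response("We can waive the fee today."))
print(check_response("This communication is from a debt collector. Your balance is $50."))
```

Literal patterns like these are narrow: a reply such as "I can remove that fee" does not match the second pattern as written, so test your own phrasing against the pack before relying on it.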
HIPAA — Healthcare
HIPAA Privacy Rule (45 CFR § 164) + FDA 21 CFR. Required for any AI agent accessing or communicating protected health information.
Prohibited outputs:
- (diagnosed with|you have|patient has).*(cancer|HIV|diabetes|depression): stating diagnosis as fact
- this (app|tool|assistant) (can|will) (diagnose|treat|cure): prohibited medical device claim
- (prescribe|recommend you take).*(mg|dose|medication): AI prescribing medication
- \d{3}-\d{2}-\d{4}: SSN in output (automatic PHI violation)
- FDA[- ]approved.*(this|our): false clearance claim
Required disclosures:
- This is not a diagnosis or medical advice.
- Consult a licensed healthcare provider.
Example violation → AI says "Based on your symptoms, you have Type 2 diabetes and should take metformin 500mg." → diagnosis + prescribing blocked.
Severity: Civil $100–$50,000/violation. Criminal up to $250,000 + 10 years imprisonment.
FINRA / SEC Reg BI — Financial Advice
FINRA Rule 2111 + SEC Regulation Best Interest. Required for financial advice AI agents serving retail clients.
Prohibited outputs:
- you should (buy|sell|hold|invest in).*(stock|fund|ETF|bond|crypto): specific recommendation without suitability
- (guaranteed|risk-free|no-risk) (return|investment|profit): guaranteed returns
- (will|going to) (go up|increase|return) \d+%: performance prediction
- best investment for (you|everyone): blanket suitability claim
Required disclosures:
- This is general information only and not personalized investment advice.
- Investment recommendations require a suitability assessment.
Example violation → AI says "You should buy NVIDIA stock, it's guaranteed to go up 20% this year." → specific recommendation + guaranteed return both blocked.
Severity: FINRA censure/suspension/bar. SEC civil penalties up to $1M/violation. Criminal: securities fraud up to 20 years.
Fair Housing Act — Real Estate
Fair Housing Act (42 U.S.C. § 3604) + RESPA. Required for property search, rental, and mortgage AI agents.
Prohibited outputs (demographic steering):
- (perfect|great|ideal) for (young professionals|families|singles|retirees): demographic steering
- (good|safe|nice|quiet) (neighborhood|area|community): coded language for racial composition
- (changing|transitional|up-and-coming) (neighborhood|area): coded steering language
- (close to|near) (churches|temples|mosques): religious group steering
- you (would|might) (prefer|fit in) better in: explicit demographic steering
Required disclosure: Equal Housing Opportunity.
Example violation → AI says "This neighborhood is perfect for young families and has great schools." → demographic steering blocked.
Severity: Civil up to $70,000/violation + unlimited punitive. Criminal up to $1M + 1 year.
EEOC / ADA — Employment & HR
ADA + GINA + Title VII. Blocks prohibited pre-employment questions from HR and recruiting AI agents.
Prohibited inputs (checked pre-call — the LLM never receives these):
- do you have (a|an) (disability|impairment|medical condition): pre-offer disability inquiry (ADA)
- what medications? (are you|do you) (taking|use): medical inquiry (ADA)
- (family (medical )?history|genetic|hereditary): genetic information (GINA)
- are you (pregnant|planning to (get pregnant|have children)): pregnancy inquiry (Title VII/PWFA)
- (how old are you|what year were you born): age inquiry (ADEA)
Required disclosure: We are an equal opportunity employer.
Example violation → Interview bot asks "Do you have any disabilities we should know about?" → blocked pre-call, LLM never invoked, no API cost.
Severity: EEOC compensatory + punitive up to $300,000. Pattern/practice: unlimited.
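Pre-call checking means the prompt is screened before the model function is invoked at all. The sketch below shows the idea with two of the patterns listed above. `guarded_call`, the local `GuardrailError` class, and the fake `llm` callable are hypothetical names for this example, not Peekr APIs.

```python
import re


class GuardrailError(Exception):
    """Local stand-in for the error type named in the docs."""


PROHIBITED_INPUTS = [
    r"do you have (a|an) (disability|impairment|medical condition)",
    r"(how old are you|what year were you born)",
]


def guarded_call(prompt, llm):
    """Screen the prompt first; only call the model if nothing matches."""
    for pattern in PROHIBITED_INPUTS:
        if re.search(pattern, prompt, re.IGNORECASE):
            raise GuardrailError(f"prohibited input matched: {pattern}")
    return llm(prompt)


calls = []


def llm(prompt):
    """Fake model that records each invocation."""
    calls.append(prompt)
    return "ok"


print(guarded_call("Tell me about your last project.", llm))
try:
    guarded_call("Do you have a disability?", llm)
except GuardrailError as err:
    print(f"blocked before the call: {err}")
print(len(calls))  # the blocked prompt never reached the model
```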
UPL — Unauthorized Practice of Law
State bar rules (ABA Model Rule 5.5). Prevents legal AI assistants from providing specific legal advice.
Prohibited outputs:
- you should (file|sue|countersue|appeal|plead): specific legal strategy
- (your case|you) (will|should) (win|prevail|succeed): outcome prediction
- (you have|you don't have) (a valid claim|grounds|standing): legal standing opinion
- (sign|don't sign) (this|the) (contract|agreement): advice on specific document
- (you are|you're) (liable|not liable|guilty|not guilty): legal liability conclusion
- I am (acting as|serving as) your (attorney|lawyer): false attorney claim
Required disclosures: This is not legal advice. and Consult a licensed attorney.
Severity: Criminal misdemeanor to felony by state. Unlimited civil liability for damages caused by reliance.
TCPA — Voice AI & Messaging
Telephone Consumer Protection Act + FCC 2024 AI Voice Ruling. Required for outbound voice AI and SMS marketing agents.
Required disclosures (must appear in every interaction):
- This is an AI: identity disclosure required at call/chat start
- To stop receiving + reply STOP: opt-out mechanism required in marketing messages
Prohibited outputs:
- your previous purchase (means|gives us permission) to (call|text): misrepresentation of EBR consent
Severity: $500/call (negligent), $1,500/call (willful). No cap — a 10,000-call campaign = up to $15M exposure. Class actions extremely common.
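The exposure figure above is straightforward multiplication. A tiny worked example using only the per-call amounts quoted in this section (the `tcpa_exposure` helper name is ours):

```python
NEGLIGENT_PER_CALL = 500    # USD, from the severity line above
WILLFUL_PER_CALL = 1_500    # USD, from the severity line above


def tcpa_exposure(calls, willful=False):
    """Statutory exposure in USD for a campaign of the given size."""
    per_call = WILLFUL_PER_CALL if willful else NEGLIGENT_PER_CALL
    return calls * per_call


print(tcpa_exposure(10_000))                # 5000000
print(tcpa_exposure(10_000, willful=True))  # 15000000
```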
GDPR — Automated Decisions (EU)
GDPR Art. 22. Required disclosure for any AI making automated decisions with legal or significant effects on EU residents.
Required disclosures:
- This decision was made using automated processing.
- You have the right to request human review.
Prohibited outputs:
- we (keep|store|retain) your data (forever|indefinitely): retention without defined period
- (your data|this conversation) (is not|isn't|won't be) (stored|saved|logged): false data minimisation claim
Severity: Up to €20M or 4% global annual revenue (whichever is higher).
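"Whichever is higher" means the cap scales with company size. A short worked example of the rule as stated above (the `gdpr_max_fine` helper name is ours, and revenue is taken in whole euros):

```python
FLAT_CAP_EUR = 20_000_000
REVENUE_PERCENT = 4


def gdpr_max_fine(global_annual_revenue_eur):
    """Upper bound in EUR: the larger of the flat cap and 4% of revenue."""
    return max(FLAT_CAP_EUR, global_annual_revenue_eur * REVENUE_PERCENT // 100)


print(gdpr_max_fine(100_000_000))    # 20000000 (flat cap applies)
print(gdpr_max_fine(1_000_000_000))  # 40000000 (4% of revenue applies)
```

The crossover is at €500M in annual revenue: below it the flat €20M cap is the larger figure, above it the 4% figure is.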
EU AI Act — Chatbot Identity
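EU AI Act Art. 50. The Act entered into force on August 1, 2024; the Art. 50 transparency obligations apply from August 2, 2026 (the Art. 5 prohibitions have applied since February 2, 2025). All EU-facing chatbots must disclose AI nature. AI-generated synthetic media must be labeled.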
Required disclosures:
- You are interacting with an AI: must appear at the start of every interaction
- This content was AI-generated: required on synthetic media
Severity: Prohibited practices: €35M or 7% global turnover. Chatbot non-compliance: €15M or 3%.
TILA / ECOA — Banking & Lending
Truth in Lending Act + Equal Credit Opportunity Act + Fair Lending. Required for credit, mortgage, and lending AI agents.
Prohibited outputs:
- (guaranteed|definitely|certainly) (approved|qualify) for (a|this) (loan|credit|mortgage): guaranteed approval before underwriting
- no (credit check|income verification|documentation) required: material omission about underwriting
- (because you are|since you're).*(woman|Black|Hispanic|disabled): explicit discriminatory lending basis (ECOA)
- (we don't lend|not available).*(neighborhood|zip code|that area): potential redlining
Required disclosures:
- Annual Percentage Rate: must appear in any credit offer
- We do not discriminate on the basis of race: equal credit opportunity statement
Example violation → AI says "You're definitely approved — no credit check needed!" → guaranteed approval + material omission both blocked.
Severity: TILA criminal $5,000 + 1 year. ECOA $10,000/violation + punitive. Fair lending: DOJ referral, unlimited compensatory.