Documentation

Peekr records every LLM call, tool invocation, token count, and error as a tree you can inspect. This page covers everything from installation to advanced usage.


Installation

terminal
pip install peekr                # base, no LLM SDK required
pip install "peekr[openai]"      # with OpenAI
pip install "peekr[anthropic]"   # with Anthropic
pip install "peekr[all]"         # both

Requires Python 3.9+. No accounts, no backend, no environment variables.

Quickstart

Script / standalone agent — 2 lines

Add these two lines at the very top of your entrypoint, before any other imports that load an LLM SDK:

agent.py
import peekr
peekr.instrument()

# everything below is unchanged
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

Every LLM call is now traced automatically. Peekr writes to traces.jsonl and prints to the console.

terminal
peekr view traces.jsonl        # tree view
peekr view --io traces.jsonl   # include inputs and outputs

FastAPI / Starlette service — 3 lines

If your agent runs inside an HTTP server, add the middleware after creating the app. This creates a root span for every request so all LLM calls appear nested under the route that triggered them:

main.py
import peekr
peekr.instrument()  # line 1

from fastapi import FastAPI

app = FastAPI()
app.add_middleware(peekr.FastAPIMiddleware)  # line 2

# optional: ship traces to Peekr Cloud
peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",  # line 3
    ),
)

Waterfall with middleware:

output
● POST /v1/answer        19.3s   ← root: full request duration
├─ Embed query            1.4s
└─ Generate answer       16.4s · 6.0k tok · 0.20¢

Without middleware, the same spans appear as a flat list with no parent. See FastAPI middleware for all options.

Example: Debug a wrong answer

Your agent returns an incorrect response and you don't know why. Add @trace to your tool functions and run with --io:

agent.py
import peekr
peekr.instrument()

import openai
from peekr import trace

@trace
def fetch_user(user_id: int) -> dict:
    return db.get(user_id)  # db is your own data layer; returns None if not found

@trace(name="agent.run")
def run(user_id: int):
    user = fetch_user(user_id)
    # bug: no null check before passing to LLM
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": f"User: {user}"}]
    )
peekr view --io traces.jsonl
agent.run  2100ms
├─ tool.fetch_user  12ms
│    in:  {"args": [42], "kwargs": {}}
│    out: null   ← found it
└─ openai.chat.completions [gpt-4o]  2088ms
     in:  [{"role": "system", "content": "User: null..."}]

The LLM received null as the user object. The fix is a null check in run(), not a prompt change.

Example: Find slow steps

Wrap every step in your agent with @trace and look at the durations:

agent.py
from peekr import trace

@trace
def search_web(query: str) -> list: ...

@trace
def rerank_results(results: list) -> list: ...

@trace(name="agent.run")
def run(query: str):
    results = search_web(query)
    ranked = rerank_results(results)
    return openai.chat.completions.create(...)
peekr view traces.jsonl
agent.run  4300ms
├─ tool.search_web      3800ms   ← 88% of time
├─ tool.rerank_results    18ms
└─ openai.chat           490ms

Cache search_web results for repeated queries, or run it in parallel with other setup work. The LLM is not the bottleneck.

Example: Reduce token costs

Run your agent a few times on the same task and compare token counts across traces:

terminal
peekr view traces.jsonl
output
Trace a3f2b1c0   18,432 tokens
Trace b2e4c8f1   21,104 tokens
Trace c5d9e2a7   24,891 tokens

Token count growing each run is the signature of unbounded history — the agent appends every message to the next call. Fix: summarize or truncate the conversation after a fixed number of turns.

agent.py
# Before: growing history
messages = conversation_history  # gets longer every turn

# After: summarize after 5 turns
if len(conversation_history) > 10:  # 5 exchanges = 10 messages
    summary = summarize(conversation_history)
    messages = [{"role": "system", "content": summary}]

Example: Prod vs local bugs

Your agent passes tests locally but fails in production. Capture traces in both environments and compare tool outputs:

agent.py
@trace
def fetch_inventory(sku: str) -> list:
    return inventory_api.get(sku)
local trace
tool.fetch_inventory  8ms
  in:  {"sku": "ABC-123"}
  out: [{"id": 1, "qty": 42}]   ← data present locally
prod trace
tool.fetch_inventory  8ms
  in:  {"sku": "ABC-123"}
  out: []   ← empty in prod

The agent logic is identical. The inventory API returns different data in prod — likely a missing data migration or environment-specific config. Fix the data source, not the agent.

instrument()

Call once at startup — before any LLM SDK imports or client instantiation. Patches OpenAI (sync + async, chat + embeddings), Anthropic, AWS Bedrock, and Google Gemini at the class level so every client instance is covered automatically.

Call order matters. peekr.instrument() must run before any LLM client is imported or created. The safest place is immediately after load_dotenv() / env setup, before any other application imports:
# ✓ correct — instrument before importing anything that touches an LLM SDK
from dotenv import load_dotenv
load_dotenv()

import peekr
peekr.instrument(...)   # ← before everything else

from myapp.routes import answer, recall   # ← these import openai, anthropic, etc.
from myapp.llm import client               # ← also fine: patched at class level

# ✗ wrong: openai is imported and a client created before instrument() runs
from openai import OpenAI
client = OpenAI()          # module-level singleton, already created

import peekr
peekr.instrument(...)   # too late for this specific instance if lru_cache is used
Why? peekr patches at the class level, so new instances created after instrument() are always captured. Pre-existing instances also work because Python resolves methods on the class at call time — not at instantiation. The one exception is @lru_cache or module-level singletons whose internal state bypasses the patched method: calling instrument() first avoids this entirely.
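The class-level behaviour described above can be illustrated with a small pure-Python sketch. Client and instrument() here are hypothetical stand-ins, not Peekr's or any SDK's actual code; the point is only how Python resolves methods.

```python
class Client:
    """Hypothetical stand-in for an SDK client class."""

    def create(self, prompt: str) -> str:
        return f"echo: {prompt}"


calls = []  # every prompt seen by the patched method


def instrument() -> None:
    """Swap Client.create on the class for a recording wrapper."""
    original = Client.create

    def traced_create(self, prompt: str) -> str:
        calls.append(prompt)
        return original(self, prompt)

    Client.create = traced_create


early = Client()       # instance created before patching
saved = early.create   # bound method captured before patching
instrument()
late = Client()        # instance created after patching

early.create("a")      # recorded: the method is looked up on the class at call time
late.create("b")       # recorded
saved("c")             # not recorded: this reference still points at the original
print(calls)
```

The saved reference plays the role of cached internal state: anything that captured the original method before instrument() ran keeps calling it.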
python
peekr.instrument(
    console=True,               # print spans live (default: True)
    storage="jsonl",            # "jsonl" | "sqlite" | "both"
    jsonl_path="traces.jsonl",  # JSONL output path
    db_path="traces.db",        # SQLite output path
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| console | bool | True | Print each span to stdout as it completes |
| storage | str | "jsonl" | "jsonl", "sqlite", or "both" |
| jsonl_path | str | "traces.jsonl" | Path for JSONL output |
| db_path | str | "traces.db" | Path for SQLite output |

SQLite storage

SQLite uses WAL mode so multiple processes — Docker containers, CI workers, parallel agents — can write spans safely at the same time. And because it's a real database, you can query across all your runs without any extra tooling.

python
# Enable SQLite
peekr.instrument(storage="sqlite")

# Write to both JSONL and SQLite
peekr.instrument(storage="both")

View with the same CLI command:

terminal
peekr view traces.db
peekr view --io traces.db

Or query directly with any SQLite client:

terminal — useful queries
# Slowest tool calls
sqlite3 traces.db "SELECT name, ROUND(AVG(duration_ms)) avg_ms FROM spans GROUP BY name ORDER BY avg_ms DESC"

# Token spend by model
sqlite3 traces.db "SELECT json_extract(attributes,'$.model') model, SUM(json_extract(attributes,'$.tokens_total')) tokens FROM spans GROUP BY model"

# All errors
sqlite3 traces.db "SELECT name, trace_id, json_extract(attributes,'$.error') msg FROM spans WHERE status = 'error'"

# Token growth over time (detect unbounded history); traces ordered by their first span
sqlite3 traces.db "SELECT trace_id, SUM(json_extract(attributes,'$.tokens_total')) total FROM spans GROUP BY trace_id ORDER BY MIN(start_time)"
SQLite is ideal for Docker and CI where multiple processes share a single file. JSONL is better for quick local debugging where you want to grep or tail -f.
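The same kind of query can be run from Python's standard sqlite3 module. The sketch below is illustrative only: it creates its own throwaway database with a minimal, assumed schema (just name and duration_ms), which is not Peekr's real table layout, and it turns on WAL mode explicitly to show what that setting looks like.

```python
import os
import sqlite3
import tempfile

# Throwaway database file; WAL mode needs a real file, not ":memory:".
db_path = os.path.join(tempfile.mkdtemp(), "traces.db")
conn = sqlite3.connect(db_path)

# In WAL mode readers are not blocked while a writer is active.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Assumed minimal schema: only the columns the query below needs.
conn.execute("CREATE TABLE spans (name TEXT, duration_ms REAL)")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?)",
    [
        ("tool.search_web", 3800.0),
        ("tool.search_web", 4200.0),
        ("openai.chat", 490.0),
    ],
)
conn.commit()

# Average duration per span name, slowest first.
slowest = conn.execute(
    "SELECT name, ROUND(AVG(duration_ms)) AS avg_ms "
    "FROM spans GROUP BY name ORDER BY avg_ms DESC"
).fetchall()
conn.close()
print(journal_mode, slowest)
```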

@trace decorator

Wraps a function as a span. Works on sync and async functions.

python
from peekr import trace

# Auto-names from module.function
@trace
def search_web(query: str) -> list: ...

# Custom name
@trace(name="tool.search")
def search(query: str) -> list: ...

# Opt out of capturing inputs/outputs (latency + status still recorded)
@trace(capture_io=False)
def fetch_api_key() -> str: ...

# Async
@trace
async def fetch_user(user_id: int) -> dict: ...

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str or None | module.function | Custom span name |
| capture_io | bool | True | Record function args and return value |
Inputs and outputs are serialized to JSON and truncated at 500 characters. Use capture_io=False for functions that handle secrets or large payloads.
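A rough sketch of what "serialize to JSON, then truncate" can look like. The serialize() helper and its repr() fallback are assumptions made for illustration, not Peekr's implementation; only the 500-character limit comes from the text above.

```python
import json

MAX_CHARS = 500  # truncation limit stated above


def serialize(value, limit: int = MAX_CHARS) -> str:
    """Serialize to JSON (repr() for non-JSON types), then cut to `limit` characters."""
    try:
        text = json.dumps(value, default=repr)
    except (TypeError, ValueError):
        # e.g. circular references; fall back to a plain repr
        text = repr(value)
    return text[:limit]


short = serialize({"args": [42], "kwargs": {}})
long = serialize("x" * 2000)
print(short, len(long))
```

Truncated values are no longer guaranteed to be valid JSON, which is one more reason to treat captured I/O as a debugging aid rather than a data store.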

Manual spans

For cases where a decorator doesn't fit — e.g. a loop, a context manager, or code you can't modify:

python
from peekr import start_span, end_span

span, token = start_span("my.operation")
span.attributes["custom_key"] = "value"
try:
    result = do_work()
    span.status = "ok"
except Exception as e:
    span.status = "error"
    span.attributes["error"] = str(e)
    raise
finally:
    end_span(span, token)  # always call, even on error
Any spans started inside do_work() will automatically nest as children of this span.

CLI viewer

terminal
peekr view traces.jsonl        # tree view
peekr view --io traces.jsonl   # + inputs and outputs

Each trace is shown as a tree grouped by trace_id. The --io flag prints up to 120 characters of the serialized input and output for each span.

Custom exporters

Any object with an export(span) method works as an exporter:

python
import requests

import peekr
from peekr.exporters import add_exporter

class HttpExporter:
    def export(self, span):
        # Send each finished span to your own backend
        requests.post(
            "https://your-backend.com/spans",
            json=span.to_dict()
        )

peekr.instrument()
add_exporter(HttpExporter())

Multiple exporters can be active at once. The built-in JSONLExporter and ConsoleExporter are added by instrument(). You can add your own on top.

Span fields

Every span written to traces.jsonl is a JSON object with these fields:

| Field | Type | Description |
| --- | --- | --- |
| trace_id | string | Groups all spans in one agent run |
| span_id | string | Unique ID for this span |
| parent_id | string or null | ID of the parent span, or null for root |
| name | string | Span name |
| start_time | float | Unix timestamp |
| end_time | float | Unix timestamp |
| duration_ms | float | Wall-clock duration in milliseconds |
| status | "ok" or "error" | Whether the span succeeded |
| tenant_id | string or null | Customer org (B2B). First-class: top-level column in SQLite, top-level key in JSONL. Set via peekr.session(tenant_id=...), instrument(tenant_id=...), or env PEEKR_TENANT_ID. |
| retention_class | string or null | Storage-tier hint (e.g. "default", "short", "long", "pii"). OSS stores it; storage tier interprets it. |
| attributes.model | string | LLM model name (auto-captured) |
| attributes.tokens_input | int | Prompt tokens (auto-captured) |
| attributes.tokens_output | int | Completion tokens (auto-captured) |
| attributes.tokens_total | int | Total tokens (auto-captured) |
| attributes.input | string | Serialized function args (truncated) |
| attributes.output | string | Serialized return value (truncated) |
| attributes.error | string | Exception message if status is "error" |
| attributes.session_id | string | Set when span is inside a peekr.session() |
| attributes.user_id | string | Set when span is inside a peekr.session(user_id=...) |
| attributes.eval_scores | dict | Evaluator name → score (0.0–1.0) when evaluators are configured |
| attributes.experiment_variant | string | Variant name when inside a @peekr.experiment |
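Because every span carries span_id and parent_id, a trace tree can be rebuilt from the JSONL with a few lines of standard-library Python. This is a minimal sketch; the three sample records are made up for illustration and include only a subset of the fields listed above.

```python
import json
from collections import defaultdict

# Sample records in the shape described above (values are illustrative).
jsonl = """\
{"trace_id": "t1", "span_id": "a", "parent_id": null, "name": "agent.run", "duration_ms": 2100.0}
{"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "tool.fetch_user", "duration_ms": 12.0}
{"trace_id": "t1", "span_id": "c", "parent_id": "a", "name": "openai.chat.completions", "duration_ms": 2088.0}
"""

spans = [json.loads(line) for line in jsonl.splitlines() if line.strip()]

# Index spans by their parent so each level can be looked up directly.
children = defaultdict(list)
for span in spans:
    children[span["parent_id"]].append(span)


def render(parent_id=None, depth=0):
    """Return indented lines for all spans below parent_id (None = roots)."""
    lines = []
    for span in children[parent_id]:
        lines.append(f"{'  ' * depth}{span['name']} {span['duration_ms']:.0f}ms")
        lines.extend(render(span["span_id"], depth + 1))
    return lines


tree = render()
print("\n".join(tree))
```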

Sessions

Group all spans for a user, tenant, or conversation by passing identifiers to peekr.session(). Uses ContextVar so it propagates correctly across async code.

python
import peekr

with peekr.session(
    user_id="user_123",       # end-user (B2C)
    tenant_id="acme",         # customer org (B2B)
    session_id="sess_abc",    # auto-generated if omitted
    retention_class="long",   # storage-tier hint
):
    run_agent()

tenant_id and retention_class are first-class columns on the span — see Multi-tenant traces.

Query by user in SQLite:

sql
SELECT trace_id,
       AVG(duration_ms),
       SUM(json_extract(attributes,'$.tokens_total'))
FROM spans
WHERE json_extract(attributes,'$.user_id') = 'user_123'
GROUP BY trace_id;

Multi-tenant traces

Every span carries two first-class fields — tenant_id (the customer org) and retention_class (a storage-tier hint) — separate from user_id (the end-user). A B2B agent can tag both without conflict.

python
import peekr

peekr.instrument(tenant_id="acme", retention_class="default")

with peekr.session(user_id="alice", tenant_id="acme", retention_class="long"):
    run_agent()

Resolution order, highest priority first:

  1. peekr.session(tenant_id=..., retention_class=...)
  2. peekr.instrument(tenant_id=..., retention_class=...)
  3. Env vars PEEKR_TENANT_ID / PEEKR_RETENTION_CLASS
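The resolution order above amounts to a first-non-empty lookup. A minimal sketch, assuming a hypothetical resolve_tenant() helper that is not part of the Peekr API; only the PEEKR_TENANT_ID variable name comes from the list.

```python
import os
from typing import Optional


def resolve_tenant(session_value: Optional[str],
                   instrument_value: Optional[str]) -> Optional[str]:
    """Pick the tenant: session first, then instrument(), then the env var."""
    if session_value is not None:
        return session_value
    if instrument_value is not None:
        return instrument_value
    return os.environ.get("PEEKR_TENANT_ID")


os.environ["PEEKR_TENANT_ID"] = "from-env"
from_session = resolve_tenant("acme", "globex")
from_instrument = resolve_tenant(None, "globex")
from_env = resolve_tenant(None, None)
print(from_session, from_instrument, from_env)
```

The same lookup would apply to retention_class with PEEKR_RETENTION_CLASS as the last fallback.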

Both fields are top-level columns in SQLite (with indices) and top-level keys in JSONL — no json_extract needed:

sql
SELECT tenant_id, COUNT(*) FROM spans GROUP BY tenant_id;

SELECT * FROM spans WHERE retention_class = 'long' AND start_time > ?;

retention_class is a free-form string in the OSS SDK — recommended values are default, short, long, and pii. The meaning of each is enforced by your storage tier (or by Peekr Cloud).

Why first-class instead of attributes.tenant_id? So you can filter and index without JSON extraction — relevant the moment you have more than a handful of tenants or want to route ingestion. The SQLite exporter migrates pre-v0.3 databases automatically via PRAGMA user_version; legacy rows back-fill as NULL.

Alerts

Alerts fire after each complete trace (identified by the root span). Pass them to instrument():

python
peekr.instrument(alerts=[
    peekr.alert.ErrorRate(threshold=0.05, window=100),  # >5% errors in last 100 traces
    peekr.alert.CostSpike(multiplier=2.0),              # tokens 2× above rolling avg
    peekr.alert.LatencyP95(ms=5000),                    # p95 latency > 5s
    peekr.alert.TokenGrowth(runs=5),                    # growing 5 consecutive runs
])

Override on_trigger to send to Slack, PagerDuty, or anywhere:

python
class SlackAlert(peekr.alert.ErrorRate):
    def on_trigger(self, message: str) -> None:
        slack.send(f"#alerts: {message}")

peekr.instrument(alerts=[SlackAlert(threshold=0.05)])

| Alert | Triggers when | Key params |
| --- | --- | --- |
| ErrorRate | Error % in last N traces > threshold | threshold, window=100 |
| CostSpike | This trace tokens > multiplier × rolling avg | multiplier, window=50 |
| LatencyP95 | p95 span latency in trace > ms | ms |
| TokenGrowth | Token count strictly increasing for N runs | runs=5 |
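To make two of these trigger conditions concrete, here is a small sketch of a windowed error rate and a strictly-increasing check. ErrorRateWindow and strictly_increasing are hypothetical helpers written for this page; how Peekr behaves before the window fills, for example, is not specified above and is an assumption here.

```python
from collections import deque


class ErrorRateWindow:
    """Tracks the error fraction over the last `window` traces."""

    def __init__(self, threshold: float, window: int = 100):
        self.threshold = threshold
        self.results = deque(maxlen=window)  # old results fall off automatically

    def record(self, is_error: bool) -> bool:
        """Record one trace; return True when the rate exceeds the threshold."""
        self.results.append(is_error)
        return sum(self.results) / len(self.results) > self.threshold


def strictly_increasing(token_counts, runs: int = 5) -> bool:
    """True when the last `runs` counts each exceed the one before."""
    recent = token_counts[-runs:]
    return len(recent) == runs and all(a < b for a, b in zip(recent, recent[1:]))


monitor = ErrorRateWindow(threshold=0.05, window=4)
fired = [monitor.record(is_error) for is_error in (False, False, True, False)]
growing = strictly_increasing([18432, 21104, 24891], runs=3)
flat = strictly_increasing([100, 120, 120], runs=3)
print(fired, growing, flat)
```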

Eval — LLM-as-judge

Evaluators run after each LLM span completes and write scores to span.attributes["eval_scores"]. A _in_eval guard prevents infinite recursion.

python
peekr.instrument(evaluators=[
    peekr.eval.Rubric("Be concise and factually accurate"),
    peekr.eval.Hallucination(),   # groundedness check (see below)
    peekr.eval.NotEmpty(),        # output must be non-empty
    peekr.eval.NoError(),         # span must have status=ok
])

Scores are written to span.attributes["eval_scores"] as a {evaluator_name: float} dict and shown inline by peekr view --io:

peekr view --io traces.jsonl
openai.chat.completions [gpt-4o]  843ms  312tok
  in:  "Summarise this doc..."
  out: "The doc argues that..."
  eval_scores: {Rubric: 0.92, Hallucination: 0.95, NotEmpty: 1.0, NoError: 1.0}

Query scores in SQLite:

sql
SELECT name,
       AVG(json_extract(attributes,'$.eval_scores.Rubric')) rubric_avg,
       AVG(json_extract(attributes,'$.eval_scores.Hallucination')) hallucination_avg
FROM spans
WHERE json_extract(attributes,'$.eval_scores') IS NOT NULL
GROUP BY name;

Write your own evaluator:

python
from peekr.eval import BaseEvaluator

class LengthCheck(BaseEvaluator):
    def evaluate(self, span) -> float:
        output = span.attributes.get("output", "")
        return 1.0 if len(output) < 500 else 0.0

| Evaluator | What it checks | Requires |
| --- | --- | --- |
| Rubric(criteria) | LLM scores output against your criteria (0.0–1.0) | openai or anthropic SDK |
| Hallucination() | Fraction of claims grounded in the input/context (0.0–1.0) | openai or anthropic SDK |
| NotEmpty() | Output attribute is non-empty string | Nothing |
| NoError() | Span status is "ok" | Nothing |

Hallucination detection

The Hallucination evaluator scores how well an LLM output is supported by its input context. It uses an LLM-as-judge under the hood — the same fallback pattern as Rubric (OpenAI first, then Anthropic).

| Score | Meaning |
| --- | --- |
| 1.0 | Every factual claim in the output is supported by the context |
| 0.0 | No claim is supported: the output is fully hallucinated |
| between | The fraction of claims grounded in the context |

Plug it in like any other evaluator:

python
import peekr
peekr.instrument(evaluators=[peekr.eval.Hallucination()])

import openai

openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "The Eiffel Tower was completed in 1889 in Paris."},
        {"role": "user", "content": "When was the Eiffel Tower built and by whom?"},
    ],
)
peekr view --io traces.jsonl
openai.chat.completions [gpt-4o]  843ms
  in:  [{"role": "system", "content": "The Eiffel Tower was completed in 1889 in Paris."}, ...]
  out: "The Eiffel Tower was built in 1923 by Frank Lloyd Wright."
  eval_scores: {Hallucination: 0.0}   ← invented year and architect

RAG flows: point it at retrieved documents

By default the evaluator uses the span's input (the messages sent to the LLM) as the grounding context. For RAG flows where the source documents live elsewhere — say, on a parent tool span — pass a context_extractor:

python
peekr.instrument(evaluators=[
    peekr.eval.Hallucination(
        context_extractor=lambda span: span.attributes.get("retrieved_docs", ""),
        model="gpt-4o-mini",  # optional; defaults to gpt-4o-mini / claude-haiku
    ),
])
Spans with empty output or no available context return 1.0 (nothing to judge), so non-RAG spans don't drag down your average. The judge LLM call is cheap (max 10 output tokens) and runs after the original span completes — it never blocks the main request.

Find your worst hallucinations

sql
-- Lowest-scoring outputs across all runs
SELECT trace_id,
       json_extract(attributes,'$.eval_scores.Hallucination') AS hallucination,
       json_extract(attributes,'$.output') AS output
FROM spans
WHERE hallucination IS NOT NULL AND hallucination < 0.5
ORDER BY hallucination ASC, start_time DESC
LIMIT 20;

-- Hallucination rate by model
SELECT json_extract(attributes,'$.model') model,
       AVG(json_extract(attributes,'$.eval_scores.Hallucination')) avg_groundedness,
       COUNT(*) runs
FROM spans
WHERE json_extract(attributes,'$.eval_scores.Hallucination') IS NOT NULL
GROUP BY model
ORDER BY avg_groundedness ASC;
Hallucination scoring uses an LLM judge, which is itself imperfect and costs tokens. Treat it as a useful smoke alarm — a sudden drop in average groundedness is a strong signal — not as ground truth for any single trace.

Detailed mode — RAGAS-style claim decomposition

The default mode returns a single score. detailed=True switches to a RAGAS Faithfulness-style pipeline: the judge first decomposes the output into atomic factual claims, then assigns each claim one of three verdicts:

| Verdict | Meaning |
| --- | --- |
| supported | Claim is directly entailed by CONTEXT |
| contradicted | Claim directly conflicts with CONTEXT |
| unsupported | CONTEXT is silent about the claim |

The score becomes supported_count / total_claims and the full breakdown lands on the span at attributes.hallucination_details:

python
peekr.instrument(evaluators=[peekr.eval.Hallucination(detailed=True)])
span.attributes.hallucination_details
{
  "total": 3,
  "supported": 1,
  "contradicted": 2,
  "unsupported": 0,
  "score": 0.33,
  "claims": [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"}
  ]
}
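The arithmetic behind that breakdown is a simple count over verdicts. The sketch below recomputes the summary from the three example claims; the 1.0 fallback for an empty claim list is an assumption, chosen to match the "nothing to judge" behaviour described for simple mode.

```python
from collections import Counter

claims = [
    {"text": "The Eiffel Tower is in Paris", "verdict": "supported"},
    {"text": "It was built in 1923", "verdict": "contradicted"},
    {"text": "It was designed by Frank Lloyd Wright", "verdict": "contradicted"},
]

# Count how many claims received each verdict.
counts = Counter(claim["verdict"] for claim in claims)
total = len(claims)

details = {
    "total": total,
    "supported": counts["supported"],
    "contradicted": counts["contradicted"],
    "unsupported": counts["unsupported"],
    # score = supported_count / total_claims; 1.0 when there are no claims (assumption)
    "score": round(counts["supported"] / total, 2) if total else 1.0,
}
print(details)
```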

This is what powers the drift dashboard's drill-down — you can see exactly which claims the model invented, not just an average score.

Detailed mode uses one judge call per span (just with more output tokens — JSON, capped at 800). Use simple mode for cheap continuous monitoring across many traces, and detailed mode for the spans you want to investigate. You can switch by re-running with a different evaluators= list.

Observability dashboard

Generate a self-contained, tabbed HTML report from your traces. Designed as a drop-in observability layer for any RAG or memory/agent pipeline — open the file in any browser, no server, no build step.

terminal
peekr dashboard traces.db -o report.html   # SQLite
peekr dashboard traces.jsonl               # JSONL → ./dashboard.html
open report.html

Five tabs, one URL

The dashboard is organised so a non-technical observer can stay on the Overview tab and still get the gist, while an engineer can drill into Traces / Quality / Diagnose for specifics. Tab state is in the URL hash so links are shareable. A persistent filter bar at the top applies across every tab.

| Tab | For | Contents |
| --- | --- | --- |
| #overview | First-impression / exec | Health hero (0–100), narrative bullets, 4 metric cards with sparklines, top 3 action items pulled from the diagnostic engine. |
| #traces | "Find me that call" | Search box (trace ID, model, content, error), sortable table, click any row → side panel with full context vs answer, claim verdicts, citations, per-call action items. |
| #quality | Trend monitoring | Rolling chart with warning (0.7) / critical (0.5) threshold lines, score distribution histogram, channel × time heatmap, claim-verdict doughnut, citation panel. |
| #diagnose | Incident response | "Likely causes & next steps" with severity-tagged cards and numbered fix lists, plus the full worst-offenders panel with side-by-side highlighted context vs answer. |
| #help | First-time setup | Setup checklist (auto-ticks live), glossary, evaluator configuration snippets, troubleshooting, keyboard shortcuts. |

Keyboard shortcuts

| Key | Action |
| --- | --- |
| 1–5 | Switch tabs |
| / | Jump to Traces tab and focus the search box |
| R | Clear all filters |
| Esc | Close the trace detail panel |

Filter bar

One persistent bar at the top of every tab. Click any chip to toggle that filter; every panel on every tab refilters immediately. The time-range chips include 5m, 15m, 30m, 1h, 24h, 7d, 30d presets plus a Custom… option with datetime-local from/to inputs. The "When = Custom" mode seeds itself to "last 1h up to the newest timestamp" so first activation isn't empty.

Panels at a glance

| Panel | What it shows | How to act on it |
| --- | --- | --- |
| Health hero | One 0–100 score with a coloured dot (green/yellow/orange/red), tier label, count of flagged calls, and Δ vs baseline. | Red → open the recommendations panel below. |
| What's happening | 3–5 plain-English bullets summarising the situation: drift, worst channel, citation invention rate, error count. | Read top-to-bottom; the highest-priority finding is first. |
| Filter chips | Tenant · Model · Endpoint · Time range. Stack to drill in. | Click chips to refilter every panel. Click again to clear. |
| Metric cards | Hallucination · Rubric · Citations · Errors. Each with sparkline, Δ vs baseline, count of scored calls, and an action hint. | The hint at the bottom tells you the next step (e.g. "30 flagged — review worst offenders below"). |
| Likely causes & next steps | Diagnostic engine: runs eight pattern-detection rules and surfaces ranked recommendations with cause + numbered fix list. | Each card has a severity badge and a "what to try" list specific to that pattern. |
| Score over time | Rolling 20-call mean of every evaluator, with dashed warning (0.7) and critical (0.5) threshold lines. | Hover for trace details; click a point to jump to its worst-offender card. |
| Failure breakdown heatmap | Channel × time grid. Rows = your models/tenants/endpoints. Columns = time buckets. Colour = mean Hallucination. | Red rows tell you which channel is failing; rows that go green → red tell you when. Click a cell to filter. |
| Worst offenders | 12 lowest-scoring calls. Side-by-side context vs answer with contradicted claims highlighted, claim verdicts, citation list. | Each card ends with a "What to try for this call" box prescribing fixes specific to that span's failure pattern. |
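For readers who want the same "Score over time" view in their own notebooks, the rolling mean and the two threshold lines are easy to reproduce. This is a sketch only: rolling_mean() and tier() are hypothetical helpers, and treating a value exactly on a threshold as the better tier is an assumption.

```python
from collections import deque


def rolling_mean(scores, window: int = 20):
    """Mean of the last `window` scores at each position."""
    recent = deque(maxlen=window)
    means = []
    for score in scores:
        recent.append(score)
        means.append(sum(recent) / len(recent))
    return means


def tier(value: float, warning: float = 0.7, critical: float = 0.5) -> str:
    """Classify a rolling mean against the warning and critical lines."""
    if value < critical:
        return "critical"
    if value < warning:
        return "warning"
    return "ok"


means = rolling_mean([1.0, 0.0, 0.5], window=2)
tiers = [tier(m) for m in means]
print(means, tiers)
```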

Diagnostic rules

The recommendations panel inspects the filtered rows and emits cards from eight pattern-detection rules. Each card has a severity (high / medium / low / info / good), a plausible cause in plain English, and a numbered list of concrete fixes.

| Pattern | Triggers when | Sample recommendation |
| --- | --- | --- |
| Invented citations | > 30% of detected citation patterns aren't in context | Tighten prompt; verify citations post-hoc; try hybrid retrieval |
| High contradiction rate | > 20% of judged claims directly contradict context | Strengthen system prompt; move context closer to question; reduce max_tokens |
| Out-of-context elaboration | > 25% unsupported claims with low contradiction | Add refusal instruction; check retrieval recall; coverage prompt |
| Channel concentration | > 50% of flagged calls share one model/tenant/endpoint | Diff deploys; compare prompts; verify index coverage for that channel |
| Hallucination drift | Δ vs baseline < −0.1 | Use heatmap to localise; cross-reference deploys; use peekr replay |
| Error spikes | > 5% of calls have status="error" | Check rate limits; verify fallback model quality; add retries |
| Citations all grounded | ≥ 5 citations, 0 invented | Add an alert on citation invention rate to catch future regressions |
| Healthy | No patterns triggered | Set up peekr.alert.ScoreFloor; run the offline benchmark periodically |

Per-span action items

Every worst-offender card ends with a tailored "What to try for this call" panel — separate from the aggregate recommendations. It inspects that one span's claims, citations, and context to suggest fixes targeted to its specific failure pattern:

| Detected on this span | What the action box suggests |
| --- | --- |
| Empty / short context but long answer | Retrieval miss: inspect what your retriever returned |
| Invented URLs / arXiv / DOIs / titles | Per-kind prompt fix + post-hoc citation verification |
| Contradicted numbers / dates | "Copy numerics verbatim" instruction; temperature=0 |
| Contradicted proper nouns | Explicit "don't substitute names" instruction |
| Mostly unsupported claims, no contradictions | Add refusal: "say I don't know if not in context" |
| Mostly contradicted claims | Move context closer to question; "context wins" instruction |
| Low score but no detailed claims | Enable Hallucination(detailed=True) to see what failed |
| Output much longer than context | Reduce max_tokens; long completions drift |
Output much longer than contextReduce max_tokens; long completions drift

Tag spans for the channel breakdown

The heatmap groups by attributes.model (set automatically by the patches), attributes.user_id (set via peekr.session(user_id=...)), and attributes.endpoint (you set this). Without an endpoint attribute, the endpoint row of the heatmap simply doesn't render — the others still do.

python
import peekr
from peekr import trace, get_current_span

@trace
def handle_request(req):
    get_current_span().attributes["endpoint"] = req.path
    return call_llm(...)

# Or in a FastAPI middleware: one place, every request tagged
# (app is your existing FastAPI instance)
@app.middleware("http")
async def tag_span(request, call_next):
    with peekr.session(user_id=request.headers.get("X-Tenant-Id")):
        span, token = peekr.start_span(f"http.{request.method}")
        span.attributes["endpoint"] = request.url.path
        try:
            return await call_next(request)
        finally:
            peekr.end_span(span, token)
The dashboard reads from JSONL or SQLite — whatever you configured in peekr.instrument(). It's a post-hoc tool: rerun it whenever you want a fresh snapshot. For a real-time view, use peekr view --io in the terminal.

Feedback + export

Label traces as good or bad. Export labelled data as a fine-tuning dataset.

python
import peekr

# Rate a trace
peekr.feedback(trace_id="a3f2b1c0...", rating="good", note="perfect answer")
peekr.feedback(trace_id="b2e4c8f1...", rating="bad", note="hallucinated")

# Export good traces as OpenAI fine-tuning data
peekr.export_feedback(
    db_path="traces.db",
    filter="good",
    output="training.jsonl",
    format="openai-ft",  # or "raw"
)

The openai-ft format produces one JSON object per trace:

training.jsonl
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
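If you want to post-process labelled traces yourself, the same one-object-per-line shape is straightforward to produce. The sketch below assumes a hypothetical list of already-labelled rows with input and output strings; it does not read Peekr's database and is not how export_feedback is implemented.

```python
import json

# Hypothetical labelled rows; real data would come from your trace store.
traces = [
    {"trace_id": "a3f2b1c0", "rating": "good", "input": "What is 2 + 2?", "output": "4"},
    {"trace_id": "b2e4c8f1", "rating": "bad", "input": "Capital of France?", "output": "Berlin"},
]


def to_openai_ft(rows, rating: str = "good") -> str:
    """Return JSONL text with one chat-format record per matching trace."""
    lines = []
    for row in rows:
        if row["rating"] != rating:
            continue  # keep only traces with the requested label
        record = {
            "messages": [
                {"role": "user", "content": row["input"]},
                {"role": "assistant", "content": row["output"]},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


training_jsonl = to_openai_ft(traces)
print(training_jsonl)
```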

A/B experiments

Route traffic between variants and tag each span. Analyse results with SQL — no separate tracking tool needed.

python
from peekr import experiment

# List variants: equal split by default
@experiment(variants=["control", "test_v2"])
def run_agent(query: str, variant: str):
    model = "gpt-4o" if variant == "control" else "claude-opus-4-5"
    return call_llm(model, query)

# Dict variants: passes config too
@experiment(variants={
    "control": {"model": "gpt-4o"},
    "test": {"model": "claude-opus-4-5"},
})
def run_agent(query: str, variant: str, variant_config: dict):
    return call_llm(variant_config["model"], query)

Analyse in SQLite:

sql
SELECT json_extract(attributes,'$.experiment_variant') variant,
       COUNT(*) runs,
       AVG(CASE WHEN status='error' THEN 1.0 ELSE 0.0 END) error_rate,
       AVG(json_extract(attributes,'$.tokens_total')) avg_tokens,
       AVG(duration_ms) avg_ms
FROM spans
WHERE json_extract(attributes,'$.experiment_variant') IS NOT NULL
GROUP BY variant;

Trace replay

Re-run a stored trace with the same inputs. Useful for reproducing production bugs locally or verifying a fix against a real failure.

python
from peekr.replay import replay_trace

# Re-run from SQLite
new_trace_id = replay_trace(trace_id="a3f2b1c0...", db_path="traces.db")
print(f"New trace: {new_trace_id}")

# Re-run from JSONL
new_trace_id = replay_trace(trace_id="a3f2b1c0...", jsonl_path="traces.jsonl")

Or use the CLI:

terminal
peekr replay a3f2b1c0
peekr replay a3f2b1c0 --db traces.db
peekr replay a3f2b1c0 --jsonl traces.jsonl
Replay re-runs the stored LLM inputs through the live SDK. The agent itself is not re-invoked — only the LLM calls are replayed. This means tool calls are not replicated, but you get a new trace showing exactly what the model produces with those inputs today.

Guardrails

Guardrails are synchronous, in-path enforcement rules that run on every LLM span. Two categories:

python
import peekr

peekr.instrument(
    guardrails=[
        peekr.guard.PIIRedact(),                         # strip PII before storage
        peekr.guard.Blocklist(
            terms=["confidential", "internal only"],
            action="raise",                              # block the response
        ),
        peekr.guard.Blocklist(
            patterns=peekr.guard.Blocklist.COMMON_SECRETS,
            action="redact",                             # redact API keys from traces
        ),
        peekr.guard.HallucinationBlock(threshold=0.5),   # block ungrounded responses
    ]
)
Combine with evaluators. HallucinationBlock reuses the score from peekr.eval.Hallucination when both are wired — the LLM judge runs only once.
peekr.instrument(
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[peekr.guard.HallucinationBlock(threshold=0.4)],
)

PIIRedact

Strips personal data from span.attributes["input"] and span.attributes["output"] before any storage exporter runs. Detected categories: email, phone, ssn, credit_card, ip_address.

python
# Redact everything (default)
peekr.guard.PIIRedact()

# Only emails and phones, only from output
peekr.guard.PIIRedact(
    fields=("output",),
    categories=("email", "phone"),
)
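Category-based redaction of this kind is typically built on regular expressions. The sketch below shows the general idea for two of the categories; the patterns and the redact() helper are deliberately simplified assumptions and are not the detectors PIIRedact actually uses.

```python
import re

# Simplified illustrative patterns; production detectors are usually stricter.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, categories=("email", "ssn")) -> str:
    """Replace every match of the selected categories with a placeholder."""
    for category in categories:
        text = PATTERNS[category].sub(f"[{category.upper()}]", text)
    return text


sample = "Contact jane@example.com, SSN 123-45-6789."
everything = redact(sample)
emails_only = redact(sample, categories=("email",))
print(everything)
print(emails_only)
```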

Blocklist

Three actions: "raise" (block the response), "redact" (remove the match from stored traces), and "warn".

python
# Block on exact terms
peekr.guard.Blocklist(terms=["confidential"], action="raise")

# Redact common API key / secret patterns
peekr.guard.Blocklist(
    patterns=peekr.guard.Blocklist.COMMON_SECRETS,
    action="redact",
)

# Scan only inputs, case-sensitive
peekr.guard.Blocklist(
    terms=["SECRET"],
    fields=("input",),
    case_sensitive=True,
    action="warn",
)

HallucinationBlock

Raises GuardrailError when a response scores below the faithfulness threshold. The violation span is always stored before the error propagates — full audit trail guaranteed.

python
# Block any response less than 40% grounded
peekr.guard.HallucinationBlock(threshold=0.4)

# With detailed RAGAS claim breakdown
peekr.guard.HallucinationBlock(threshold=0.5, detailed=True)
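The "store first, then raise" ordering can be sketched in a few lines. Everything here is a local stand-in written for illustration: the GuardrailError class below only mirrors the attribute names used on this page, the span is a plain dict, and stored is an in-memory list rather than a real exporter.

```python
class GuardrailError(Exception):
    """Local stand-in carrying the guardrail name and the offending span."""

    def __init__(self, guardrail_name: str, message: str, span: dict):
        super().__init__(message)
        self.guardrail_name = guardrail_name
        self.span = span


stored = []  # stand-in for a storage exporter


def hallucination_block(span: dict, threshold: float = 0.5) -> dict:
    """Persist the span, then raise if its groundedness score is too low."""
    score = span["attributes"]["eval_scores"]["Hallucination"]
    stored.append(span)  # audit trail is written before any error propagates
    if score < threshold:
        raise GuardrailError(
            "HallucinationBlock", f"score {score} below threshold {threshold}", span
        )
    return span


bad_span = {"name": "openai.chat.completions",
            "attributes": {"eval_scores": {"Hallucination": 0.0}}}
try:
    hallucination_block(bad_span, threshold=0.4)
    blocked_by = None
except GuardrailError as err:
    blocked_by = err.guardrail_name
print(blocked_by, len(stored))
```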

Handling GuardrailError

python
from peekr.guard import GuardrailError

try:
    response = client.chat.completions.create(...)
except GuardrailError as e:
    print(f"Blocked by {e.guardrail_name}: {e}")
    # e.span contains the full span that was stored; inspect attributes
Guardrails in Peekr Cloud. Every violation, redaction, and warning is recorded on the span and visible in the Guardrails tab of your project dashboard at peekr.starkspherelabs.com.

OTel receive — enterprise ingest

Enterprise teams with existing OpenTelemetry pipelines can send spans to Peekr Cloud without changing any instrumentation. Just add Peekr as a second exporter target alongside Datadog, Honeycomb, or any other backend. Peekr applies hallucination scoring, compliance guardrails, and the dashboard on top.

architecture
Your agents
└─ OTel SDK / LangChain / LlamaIndex / Traceloop
   └─ OTel Collector
      ├─ Datadog exporter (existing)
      ├─ Honeycomb exporter (existing)
      └─ Peekr exporter ←── add this, zero other changes

OpenTelemetry Collector

otel-collector-config.yaml
exporters:
  otlphttp/peekr:
    endpoint: https://peekr.starkspherelabs.com/otlp
    headers:
      Authorization: "Bearer pk_live_…"

service:
  pipelines:
    traces:
      exporters: [otlphttp/datadog, otlphttp/peekr]

Grafana Alloy

alloy.config
otelcol.exporter.otlphttp "peekr" {
  client {
    endpoint = "https://peekr.starkspherelabs.com/otlp"
    headers = { Authorization = "Bearer pk_live_…" }
  }
}

Python — existing OTel SDK setup

python
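from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

peekr_exporter = OTLPSpanExporter(
    endpoint="https://peekr.starkspherelabs.com/otlp/v1/traces",
    headers={"Authorization": "Bearer pk_live_…"},
)

# Attach alongside your existing exporters.
# Assumes an SDK TracerProvider has already been configured in this process.
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(peekr_exporter))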

How Peekr reads your spans

Different frameworks use different attribute names for the same thing. Peekr normalises them automatically — no config needed:

| What Peekr needs | Gen AI (OTel standard) | OpenInference (LangChain) | LangSmith legacy |
| --- | --- | --- | --- |
| Model name | gen_ai.request.model | llm.model_name | llm_model_name |
| Input tokens | gen_ai.usage.input_tokens | llm.token_count.prompt | token_usage.prompt_tokens |
| Output tokens | gen_ai.usage.output_tokens | llm.token_count.completion | token_usage.completion_tokens |
| Input text | gen_ai.prompt | input.value | |
| Output text | gen_ai.completion | output.value | |
| Tenant | tenant.id (resource) | tenant.id | tenant_id |
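
The mapping in the table can be pictured as a first-match lookup over alias lists. The following is a minimal sketch under that assumption, not Peekr's actual ingest code; the names ALIASES and normalise are hypothetical, and only the attribute keys come from the table.

```python
from typing import Any, Dict, Optional, Tuple

# Attribute keys from the table, in column order:
# Gen AI (OTel standard), OpenInference, LangSmith legacy.
ALIASES: Dict[str, Tuple[str, ...]] = {
    "model": ("gen_ai.request.model", "llm.model_name", "llm_model_name"),
    "input_tokens": (
        "gen_ai.usage.input_tokens",
        "llm.token_count.prompt",
        "token_usage.prompt_tokens",
    ),
    "output_tokens": (
        "gen_ai.usage.output_tokens",
        "llm.token_count.completion",
        "token_usage.completion_tokens",
    ),
    "input_text": ("gen_ai.prompt", "input.value"),
    "output_text": ("gen_ai.completion", "output.value"),
    "tenant": ("tenant.id", "tenant_id"),
}


def normalise(attributes: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return canonical fields, taking the first alias present on the span."""
    result: Dict[str, Optional[Any]] = {}
    for field, keys in ALIASES.items():
        result[field] = next((attributes[k] for k in keys if k in attributes), None)
    return result


# A span that carries token counts but no output text (hypothetical values).
openinference_span = {
    "llm.model_name": "gpt-4o",
    "llm.token_count.prompt": 120,
    "llm.token_count.completion": 45,
    "input.value": "Hello",
}
normalised = normalise(openinference_span)
print(normalised)
```

Fields that no convention supplies come back as None, which is the case where a span can show cost and latency but has no text to evaluate.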

One thing you need to do

Set a distinct service.name in each service's OTel resource — this is already standard practice. Peekr uses it to separate traffic in the dashboard:

python
from opentelemetry.sdk.resources import Resource

resource = Resource.create({"service.name": "my-agent", "tenant.id": "acme"})

What OTel receive cannot do (vs the Peekr SDK)

Hallucination scoring and compliance guardrails require the full input/output text in span attributes. Many OTel instrumentations only capture token counts, not text — those spans will show cost and latency but skip eval. For full coverage, use peekr.instrument() directly. OTel receive is for teams who cannot change their instrumentation.

Peekr Cloud

The OSS SDK runs in your process, writes to local files, and is MIT licensed forever. When a single-process file isn't the right fit any more — multiple services, a team that needs shared dashboards, longer retention, audit-grade trace storage — Peekr Cloud is the managed backend.

Sign up at peekr.starkspherelabs.com — free up to 10k spans/month, no card required. Once you have a pk_live_ key from your project's Settings page:

1 — Install

bash
pip install "peekr[openai]" # or anthropic, langchain, crewai, etc.

2 — Instrument

python
import peekr

peekr.instrument(
    tenant_id="acme",  # your customer's org; optional but recommended
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",  # from Settings → API keys
    ),
)

That's the entire integration. instrument() auto-patches whichever LLM SDKs and agent frameworks are installed — every call is captured with zero further changes to your code. Spans batch in the background and appear in the dashboard within 5 seconds.

3 — TypeScript

typescript
import { instrument } from "@peekr/sdk";

instrument({
  exporter: {
    type: "http",
    endpoint: "https://peekr.starkspherelabs.com",
    apiKey: "pk_live_…",
  },
});

Pricing

| Tier | Spans / month | Price |
| --- | --- | --- |
| Free | 10k | $0 |
| Starter | 500k | $29/mo |
| Pro | 5M | $99/mo |
| Scale | 50M | $399/mo |

FastAPI middleware

peekr.FastAPIMiddleware (alias: PeekrASGIMiddleware) is a pure ASGI middleware that creates a root span for every HTTP request. All child spans — LLM calls, embeddings, @trace decorators — nest under it automatically. This turns a flat list of sibling spans into a proper trace tree.

Without middleware — spans appear as unrelated siblings:

waterfall
● openai.embeddings 1.4s ← no parent
● openai.chat.completions 16.4s ← no parent (same trace, but floating)

With middleware — every span is a child of the request:

waterfall
● POST /v1/answer 19.3s ← root span, full duration
├─ Embed query 1.4s
└─ Generate answer 16.4s · 6.0k tok · 0.20¢

FastAPI — one line

python
import peekr
from fastapi import FastAPI

peekr.instrument(...)  # existing call, no changes needed

app = FastAPI()
app.add_middleware(peekr.FastAPIMiddleware)  # ← one new line

Starlette / raw ASGI

python
from peekr import PeekrASGIMiddleware

app = PeekrASGIMiddleware(app)  # wrap any ASGI app

Options

python
app.add_middleware(
    peekr.FastAPIMiddleware,
    tenant_header="X-Tenant-Id",  # header → span.attributes["tenant_id"]
    user_header="X-User-Id",  # header → span.attributes["user_id"]
    skip_paths={"/healthz", "/metrics"},  # paths that get no span (default set included)
)

| Option | Default | Description |
| --- | --- | --- |
| tenant_header | "X-Tenant-Id" | Request header copied to span.attributes["tenant_id"] |
| user_header | "X-User-Id" | Request header copied to span.attributes["user_id"] |
| skip_paths | {"/healthz", "/health", "/metrics", "/ping"} | Exact paths that skip instrumentation entirely |

Span attributes set by the middleware

| Attribute | Example value |
| --- | --- |
| http.method | "POST" |
| http.path | "/v1/answer" |
| http.status_code | 200 |
| endpoint | "/v1/answer" (FastAPI route pattern when available) |
| tenant_id | Value of X-Tenant-Id header |
| user_id | Value of X-User-Id header |

Streaming responses

The middleware uses pure ASGI (not BaseHTTPMiddleware), so it handles streaming correctly. The root span closes when the last byte is sent — not when the response headers are sent. This means the span duration for a Server-Sent Events (SSE) endpoint reflects the full time the client is connected and receiving data.
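
To make the "closes on the last byte" behaviour concrete, here is a minimal pure-ASGI sketch using only the standard library. It is not Peekr's middleware: TimingMiddleware, on_finish, and the record fields are hypothetical, and error handling is left out. The wrapper intercepts send and finishes the record only when a body message arrives without more_body.

```python
import asyncio
import time


class TimingMiddleware:
    """Hypothetical pure-ASGI wrapper that times a request to its final body chunk."""

    def __init__(self, app, skip_paths=frozenset({"/healthz"}), on_finish=print):
        self.app = app
        self.skip_paths = skip_paths
        self.on_finish = on_finish

    async def __call__(self, scope, receive, send):
        # Non-HTTP traffic and exact-match skipped paths pass straight through.
        if scope["type"] != "http" or scope["path"] in self.skip_paths:
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        record = {"path": scope["path"], "chunks": 0}

        async def wrapped_send(message):
            await send(message)
            if message["type"] == "http.response.start":
                record["status"] = message["status"]
            elif message["type"] == "http.response.body":
                record["chunks"] += 1
                if not message.get("more_body", False):
                    # Last byte sent: only now is the duration final.
                    record["duration_s"] = time.perf_counter() - start
                    self.on_finish(record)

        await self.app(scope, receive, wrapped_send)


async def streaming_app(scope, receive, send):
    """Toy streaming endpoint that sends three chunks with a small delay."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for i in range(3):
        await asyncio.sleep(0.01)
        await send({"type": "http.response.body", "body": b"chunk", "more_body": i < 2})


async def run_request(path):
    finished, sent = [], []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    app = TimingMiddleware(streaming_app, on_finish=finished.append)
    await app({"type": "http", "path": path}, receive, send)
    return finished, sent


finished, sent = asyncio.run(run_request("/v1/stream"))
skipped, _ = asyncio.run(run_request("/healthz"))
print(finished)
```

A middleware that stopped its timer on http.response.start would report only the time to first headers, which is the failure mode the paragraph above warns about.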

What it does NOT trace automatically

The middleware creates the root span and nests all peekr-patched LLM calls under it. It does not automatically trace:

For those, wrap the function:

python
from peekr import trace, get_current_span

@trace(name="db.recall_memories")
def recall_memories(query: str, tenant_id: str):
    span = get_current_span()
    if span:
        span.attributes["query_preview"] = query[:80]
        span.attributes["tenant_id"] = tenant_id
    return db.rpc("recall_memories_hybrid", {...}).execute()

Guardrails

Guardrails enforce rules on LLM inputs and outputs. Three built-in types — each wires into the exporter pipeline automatically via instrument(guardrails=[...]).

PIIRedact — strip sensitive data before storage

Scans span.attributes["input"] and span.attributes["output"] and replaces PII with redaction tokens before the span is persisted. Runs before the storage exporter so your observability data stays clean.

python
peekr.instrument(
    guardrails=[
        peekr.guard.PIIRedact(),  # strips email, phone, SSN, card, IP
        peekr.guard.PIIRedact(
            fields=("output",),  # only scan outputs
            categories=("email", "phone"),  # specific categories only
        ),
    ]
)

Detected categories: email, phone, ssn, credit_card, ip_address.
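
For intuition, field-and-category redaction can be sketched with plain regular expressions. This is an illustration only: the patterns, the [EMAIL]-style tokens, and the redact helper are assumptions made for the example, not Peekr's detectors or its actual redaction tokens, and only three of the five categories are covered.

```python
import re

# Deliberately simple, hypothetical patterns; real detectors are stricter.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\+?\d{1,3}[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}


def redact(attributes, fields=("input", "output"), categories=tuple(PATTERNS)):
    """Return a copy of the span attributes with matches replaced by tokens."""
    cleaned = dict(attributes)
    for field in fields:
        text = cleaned.get(field)
        if not isinstance(text, str):
            continue
        for category in categories:
            text = PATTERNS[category].sub(f"[{category.upper()}]", text)
        cleaned[field] = text
    return cleaned


span_attributes = {
    "input": "Email jane@example.com or call +1 415 555 0100",
    "output": "Thanks, I noted jane@example.com and SSN 123-45-6789.",
}

everything = redact(span_attributes)
output_emails_only = redact(span_attributes, fields=("output",), categories=("email",))
print(everything)
print(output_emails_only)
```

The second call mirrors the fields and categories arguments shown above: the input is left untouched and only email addresses in the output are replaced.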

Blocklist — block or redact forbidden patterns

Three actions: "raise" (abort the call), "redact" (replace with [BLOCKED]), "warn" (log only). Use Blocklist.COMMON_SECRETS to catch OpenAI, Anthropic, GitHub, and Slack API keys.

python
peekr.instrument(
    guardrails=[
        # Redact API keys from stored traces
        peekr.guard.Blocklist(
            patterns=peekr.guard.Blocklist.COMMON_SECRETS,
            action="redact",
        ),
        # Block calls where the input contains forbidden terms
        peekr.guard.Blocklist(
            terms=["confidential", "internal only"],
            action="raise",
            fields=("input",),  # checked PRE-CALL: the API is never invoked
        ),
    ]
)

Pre-call blocking: Blocklist(action="raise", fields=("input",)) runs before the LLM API call. If the input matches, the call is aborted, so the request is never sent and incurs no API cost.
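
The three actions and the pre-call behaviour can be sketched without Peekr at all. In the example below, BlockedInput stands in for GuardrailError, and apply_blocklist, guarded_call, and fake_llm are hypothetical helpers written for this illustration; matching is assumed to be case-insensitive unless case_sensitive is set.

```python
import re


class BlockedInput(Exception):
    """Local stand-in for GuardrailError in this sketch."""


def apply_blocklist(text, terms, action, case_sensitive=False):
    """Return (possibly redacted text, matched strings) or raise on 'raise'."""
    if not terms:
        return text, []
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile("|".join(re.escape(term) for term in terms), flags)
    hits = [match.group(0) for match in pattern.finditer(text)]
    if not hits:
        return text, hits
    if action == "raise":
        raise BlockedInput(f"input matched blocklist: {hits}")
    if action == "redact":
        return pattern.sub("[BLOCKED]", text), hits
    if action == "warn":
        return text, hits  # caller logs the hits; text passes through unchanged
    raise ValueError(f"unknown action: {action!r}")


api_calls = []


def fake_llm(prompt):
    """Pretend LLM client that records every prompt it receives."""
    api_calls.append(prompt)
    return "ok"


def guarded_call(prompt):
    # The check runs first, so a match means fake_llm is never reached.
    checked, _ = apply_blocklist(prompt, ["confidential", "internal only"], "raise")
    return fake_llm(checked)


blocked_message = None
try:
    guarded_call("Summarise this CONFIDENTIAL memo")
except BlockedInput as exc:
    blocked_message = str(exc)

redacted, hits = apply_blocklist("For Internal Only use", ["internal only"], "redact")
print(blocked_message, api_calls, redacted)
```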

HallucinationBlock — enforce faithfulness threshold

Raises GuardrailError (or records a warning) when the hallucination score falls below the threshold. Re-uses the score from EvalExporter if available — no second judge call.

python
peekr.instrument(
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[
        # Block responses below 40% grounded; raises GuardrailError
        peekr.guard.HallucinationBlock(threshold=0.4),
        # Warn only: records violation but lets the response through
        peekr.guard.HallucinationBlock(threshold=0.2, action="warn"),
    ]
)

Full example

python
import peekr

peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
    evaluators=[peekr.eval.Hallucination(detailed=True)],
    guardrails=[
        peekr.guard.PIIRedact(),
        peekr.guard.Blocklist(
            patterns=peekr.guard.Blocklist.COMMON_SECRETS,
            action="redact",
        ),
        peekr.guard.Blocklist(
            terms=["confidential"],
            action="raise",
            fields=("input",),
        ),
        peekr.guard.HallucinationBlock(threshold=0.3, action="warn"),
    ]
)

Pipeline order (load-bearing): PIIRedact → EvalExporter → storage → HallucinationBlock. PII is stripped before evaluation. Violations are persisted before any GuardrailError propagates.
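
Why the order matters is easiest to see in a toy version of the four stages. The sketch below is an assumption-laden illustration, not Peekr's exporter chain: GuardrailViolation stands in for GuardrailError, fake_judge replaces the real evaluator, and the store is a plain list.

```python
import re


class GuardrailViolation(Exception):
    """Local stand-in for GuardrailError in this sketch."""


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def run_pipeline(span, judge, store, threshold=0.4):
    # 1. Redact first, so neither the judge nor storage ever sees raw PII.
    span = {**span, "output": EMAIL.sub("[EMAIL]", span["output"])}
    # 2. Evaluate once; the blocking step reuses this score.
    span["faithfulness"] = judge(span["output"])
    span["violation"] = span["faithfulness"] < threshold
    # 3. Persist the span, violation flag included.
    store.append(span)
    # 4. Only after storage may the error propagate to the caller.
    if span["violation"]:
        raise GuardrailViolation(f"faithfulness {span['faithfulness']} < {threshold}")
    return span


seen_by_judge = []


def fake_judge(text):
    """Hypothetical judge that records its input and returns a low score."""
    seen_by_judge.append(text)
    return 0.25


store = []
error = None
try:
    run_pipeline({"output": "Contact jane@example.com for the refund."}, fake_judge, store)
except GuardrailViolation as exc:
    error = exc

print(store, error)
```

Swapping steps 3 and 4 would lose the audit record for every blocked response, and moving step 1 later would leak PII to the judge.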

Trace naming

By default, every chat completion span shows as "LLM call". Peekr infers a purpose label from span attributes — but you can also set it explicitly for precise control.

Option 1 — Set the feature attribute (recommended)

python
from peekr import trace, get_current_span

@trace(name="llm.generate_answer")
async def generate_answer(query: str) -> str:
    span = get_current_span()
    if span:
        span.attributes["feature"] = "generate_answer"
        span.attributes["query_preview"] = query[:80]
    ...  # LLM call inside here becomes a child span

The waterfall shows: Generate answer (gpt-4o-mini) · 4.4k instead of LLM call.

Built-in feature → label mappings:

| feature value | Displayed as |
| --- | --- |
| generate_answer | Generate answer |
| generate_structured | Structured output |
| goal_copilot | Goal suggestions |
| entity_extraction | Entity extraction |
| classify | Classify |
| summarize | Summarise |
| recall | Memory recall |
| remember | Store memory |

Option 2 — Automatic inference (no code changes)

Peekr reads the system prompt and applies pattern matching. Works for any app with no instrumentation required:

| System prompt contains | Displayed as |
| --- | --- |
| "RAGAS Faithfulness" / "strict fact-checker" | Hallucination eval |
| "memories listed below" / "cited sources" | Generate answer |
| "JSON schema" / "structured output" | Structured output |
| "extract entit-" | Entity extraction |
| "summari-" | Summarise |

Option 3 — Custom label rules (Peekr Cloud)

In the dashboard under Settings → Advanced, define per-project rules that map prompt patterns, feature names, or span names to your own labels:

| Match field | Pattern | Label |
| --- | --- | --- |
| prompt | You are a support agent for Acme | Acme Support Reply |
| feature | onboarding_flow | Onboarding Q&A |
| span_name | llm.generate_answer | RAG Answer |

Rules are applied in priority order before built-in inference. No SDK update required.
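
The resolution order (custom rules by priority, then the Option 2 prompt patterns, then the default "LLM call") can be sketched as follows. This is an illustration under stated assumptions: the LabelRule type, the numeric priority field, substring matching for custom rules, and case-insensitive prompt matching are all hypothetical; only the patterns and labels come from the tables on this page.

```python
import re
from dataclasses import dataclass

# Built-in prompt patterns from the Option 2 table (matching details assumed).
BUILTIN_PATTERNS = [
    (re.compile(r"RAGAS Faithfulness|strict fact-checker", re.IGNORECASE), "Hallucination eval"),
    (re.compile(r"memories listed below|cited sources", re.IGNORECASE), "Generate answer"),
    (re.compile(r"JSON schema|structured output", re.IGNORECASE), "Structured output"),
    (re.compile(r"extract entit", re.IGNORECASE), "Entity extraction"),
    (re.compile(r"summari", re.IGNORECASE), "Summarise"),
]


@dataclass(frozen=True)
class LabelRule:
    priority: int  # lower number is checked first (assumption)
    field: str     # "prompt", "feature", or "span_name"
    pattern: str
    label: str


# The example rules from the Option 3 table.
CUSTOM_RULES = [
    LabelRule(1, "prompt", "You are a support agent for Acme", "Acme Support Reply"),
    LabelRule(2, "feature", "onboarding_flow", "Onboarding Q&A"),
    LabelRule(3, "span_name", "llm.generate_answer", "RAG Answer"),
]


def resolve_label(span, rules=CUSTOM_RULES):
    """Custom rules in priority order, then built-in inference, then the default."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.pattern in span.get(rule.field, ""):
            return rule.label
    prompt = span.get("prompt", "")
    for pattern, label in BUILTIN_PATTERNS:
        if pattern.search(prompt):
            return label
    return "LLM call"


acme = resolve_label({"prompt": "You are a support agent for Acme. Summarise tickets."})
summary = resolve_label({"prompt": "Summarize the following text."})
fallback = resolve_label({"prompt": "You are a helpful assistant."})
print(acme, summary, fallback, sep=" | ")
```

The first call shows the precedence: the prompt also contains "Summarise", but the custom rule wins because it is checked before built-in inference.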

Compliance guardrails (Cloud)

Peekr Cloud Pro includes industry-specific compliance packs — patterns and required disclosures are maintained server-side and fetched at instrument() time. Rules update when regulations change without requiring an SDK upgrade.

python
import peekr

peekr.instrument(
    exporter=peekr.HTTPExporter(
        endpoint="https://peekr.starkspherelabs.com",
        api_key="pk_live_…",
    ),
    compliance=["FDCPA", "HIPAA"],  # fetched from Cloud; enforced locally
)

Available packs:

| Pack | Regulation | Blocks / warns |
| --- | --- | --- |
| FDCPA | Fair Debt Collection Practices Act | Unauthorized fee waivers, missing mini-Miranda, threats |
| HIPAA | HIPAA Privacy Rule + FDA | PHI in output, diagnosis as fact, prescribing AI |
| FINRA | FINRA Rule 2111 / SEC Reg BI | Specific investment recommendations, guaranteed returns |
| FAIR_HOUSING | Fair Housing Act / RESPA | Demographic steering, coded neighborhood language |
| EEOC_ADA | ADA / GINA / Title VII | Pre-offer disability, genetic, pregnancy inquiries |
| UPL | Unauthorized Practice of Law | Specific legal strategy, outcome predictions |
| TCPA | Telephone Consumer Protection Act | Missing AI identity disclosure, no opt-out mechanism |
| GDPR | GDPR Art. 22 (EU) | Missing automated-decision disclosure |
| EU_AI_ACT | EU AI Act Art. 50 | Missing chatbot AI identity disclosure |
| TILA_ECOA | Truth in Lending / Equal Credit | Guaranteed approval, discriminatory basis, missing APR |
| UAE / MENA | | |
| UAE_PDPL | UAE Personal Data Protection Law | Consent violations, indefinite retention, sensitive data without consent |
| UAE_DIFC | DIFC Data Protection Law 2020 | Missing automated decision disclosure, no human review option |
| UAE_ADGM | ADGM Data Protection Regulations | Automated decision disclosure, cross-border transfer without safeguards |
| UAE_CBUAE | Central Bank UAE Consumer Standards | Guaranteed returns, unauthorized fee changes, missing financial disclaimer |
| UAE_DHA | Dubai Health Authority (Health Data) | Clinical diagnosis claims, prescribing AI, patient data without consent |
| UAE_RERA | Dubai RERA (Real Estate) | Guaranteed property returns, demographic steering, unregistered property |
| KSA_PDPL | Saudi Arabia PDPL | Cross-border transfer without SDAIA approval, sensitive data consent, automated decision disclosure |

UAE / MENA Compliance Packs

UAE and Saudi Arabia are building major AI hubs (Dubai AI Roadmap 2031, ADGM AI Framework, NEOM) while simultaneously introducing data protection and sector-specific AI regulations. Peekr's MENA packs are modelled on the actual regulatory frameworks but should be reviewed by local legal counsel before production deployment.

python
peekr.instrument(
    exporter=peekr.HTTPExporter(endpoint="...", api_key="pk_live_…"),
    compliance=["UAE_PDPL", "UAE_CBUAE"],  # mix UAE packs freely
)

UAE PDPL — Federal Data Protection

Federal Decree-Law No. 45 of 2021. Applies across UAE outside DIFC and ADGM free zones. Covers any AI agent that processes personal data of UAE residents.

Prohibited outputs:

Required disclosures:

Penalties: Up to AED 1,000,000 per violation. Criminal liability for intentional breaches.

UAE DIFC — Dubai International Financial Centre

DIFC Data Protection Law 2020, Arts. 36-38. Applies to all DIFC-licensed entities. AI making automated decisions with legal or significant effects must provide disclosure and human review rights.

Required disclosures:

Prohibited: Claiming an automated decision is final or irreversible without offering human review. Processing sensitive data without explicit consent.

Penalties: DIFC Commissioner can issue enforcement notices; fines up to USD 100,000 per violation for serious breaches.

UAE ADGM — Abu Dhabi Global Market

ADGM Data Protection Regulations. GDPR-equivalent framework for all ADGM-registered entities. Closely mirrors GDPR Art. 22 automated decision requirements.

Required disclosures: Same as DIFC — automated processing disclosure, right to object, AI identity. Cross-border transfer requires adequacy or safeguards equivalent to GDPR SCCs.

Penalties: FSRA enforcement; fines aligned with DIFC scale.

UAE CBUAE — Central Bank Consumer Finance

Central Bank of UAE Consumer Protection Standards. Required for AI agents in banking, lending, investment, and financial advice operating in UAE.

Prohibited outputs:

Required disclosures:

Penalties: CBUAE enforcement action; fines up to AED 3,000,000 for consumer protection violations.

UAE DHA — Health Data

Dubai Health Authority Digital Health Strategy + UAE health data regulations. Similar principles to HIPAA but UAE-specific. Required for any AI handling patient data or providing health information in UAE.

Prohibited outputs:

Required disclosures: This is not medical advice. + Consult a licensed healthcare professional.

Penalties: DHA regulatory action; criminal liability for unlicensed medical practice.

UAE RERA — Real Estate

Dubai Real Estate Regulatory Agency advertising and consumer protection rules. Required for property search, investment, and rental AI agents operating in Dubai.

Prohibited outputs:

Required disclosures: Prices are indicative; not a binding offer.

KSA PDPL — Saudi Arabia Data Protection

Saudi Arabia Personal Data Protection Law (Royal Decree M/19, 2021; updated 2023, enforced September 2023). Applies to any AI processing data of Saudi residents regardless of where the processor is located.

Prohibited outputs:

Required disclosures:

Penalties: Up to SAR 5,000,000 (≈USD 1.3M) per violation. Criminal liability for intentional violations. SDAIA has active enforcement from 2023.

Enable packs in the dashboard under Compliance. Each pack can be set to raise (block the response) or warn (record violation, allow through).

FDCPA — Debt Collection

Fair Debt Collection Practices Act + CFPB Regulation F. Required for any AI agent that communicates with debtors.

Prohibited outputs (regex-matched):

Required disclosures (must appear in every response):

Example violation → User asks "Can you waive my $50 late fee?" AI replies "We can remove that fee for you." → blocked by we can (waive|remove) .* fee + missing mini-Miranda.
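
A rough sketch of how a pattern like this, combined with a required-disclosure check, could flag a reply. Only the regex comes from the example above; the check_reply helper, the violation names, case-insensitive matching, and the mini-Miranda phrase used here (the usual "attempt to collect a debt" wording) are assumptions, since the pack's actual rules are fetched from Peekr Cloud.

```python
import re

# Pattern quoted in the example above; IGNORECASE is an assumption.
FEE_WAIVER = re.compile(r"we can (waive|remove) .* fee", re.IGNORECASE)

# Assumed marker for the mini-Miranda disclosure (hypothetical check).
MINI_MIRANDA_PHRASE = "attempt to collect a debt"


def check_reply(reply):
    """Return the list of violation names found in a single reply."""
    violations = []
    if FEE_WAIVER.search(reply):
        violations.append("unauthorized_fee_waiver")
    if MINI_MIRANDA_PHRASE not in reply.lower():
        violations.append("missing_mini_miranda")
    return violations


flagged = check_reply("We can remove that fee for you.")
compliant = check_reply(
    "This is an attempt to collect a debt and any information obtained "
    "will be used for that purpose. Your balance is $250."
)
first_person = check_reply("I can remove that fee for you.")
print(flagged, compliant, first_person)
```

Note that the literal pattern only matches "we can" phrasing; a first-person reply such as "I can remove that fee" is caught here only by the missing disclosure, so a real pack presumably needs broader patterns.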

Severity: Civil $1,000/violation + attorney's fees. Criminal up to $5,000 + 1 year imprisonment for willful violations.

HIPAA — Healthcare

HIPAA Privacy Rule (45 CFR Part 164) + FDA 21 CFR. Required for any AI agent accessing or communicating protected health information.

Prohibited outputs:

Required disclosures:

Example violation → AI says "Based on your symptoms, you have Type 2 diabetes and should take metformin 500mg." → diagnosis + prescribing blocked.

Severity: Civil $100–$50,000/violation. Criminal up to $250,000 + 10 years imprisonment.

FINRA / SEC Reg BI — Financial Advice

FINRA Rule 2111 + SEC Regulation Best Interest. Required for financial advice AI agents serving retail clients.

Prohibited outputs:

Required disclosures:

Example violation → AI says "You should buy NVIDIA stock, it's guaranteed to go up 20% this year." → specific recommendation + guaranteed return both blocked.

Severity: FINRA censure/suspension/bar. SEC civil penalties up to $1M/violation. Criminal: securities fraud up to 20 years.

Fair Housing Act — Real Estate

Fair Housing Act (42 U.S.C. § 3604) + RESPA. Required for property search, rental, and mortgage AI agents.

Prohibited outputs (demographic steering):

Required disclosure: Equal Housing Opportunity.

Example violation → AI says "This neighborhood is perfect for young families and has great schools." → demographic steering blocked.

Severity: Civil up to $70,000/violation + unlimited punitive. Criminal up to $1M + 1 year.

EEOC / ADA — Employment & HR

ADA + GINA + Title VII. Blocks prohibited pre-employment questions from HR and recruiting AI agents.

Prohibited inputs (checked pre-call — the LLM never receives these):

Required disclosure: We are an equal opportunity employer.

Example violation → Interview bot asks "Do you have any disabilities we should know about?" → blocked pre-call, LLM never invoked, no API cost.

Severity: EEOC compensatory + punitive up to $300,000. Pattern/practice: unlimited.

UPL — Unauthorized Practice of Law

State bar rules (ABA Model Rule 5.5). Prevents legal AI assistants from providing specific legal advice.

Prohibited outputs:

Required disclosures: This is not legal advice. and Consult a licensed attorney.

Severity: Criminal misdemeanor to felony by state. Unlimited civil liability for damages caused by reliance.

TCPA — Voice AI & Messaging

Telephone Consumer Protection Act + FCC 2024 AI Voice Ruling. Required for outbound voice AI and SMS marketing agents.

Required disclosures (must appear in every interaction):

Prohibited outputs:

Severity: $500/call (negligent), $1,500/call (willful). No cap — a 10,000-call campaign = up to $15M exposure. Class actions extremely common.
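
The exposure figure is simple per-call multiplication, sketched here using the per-call amounts quoted above (the function name is hypothetical):

```python
NEGLIGENT_USD_PER_CALL = 500
WILLFUL_USD_PER_CALL = 1_500


def tcpa_exposure(calls: int):
    """Return (negligent, willful) statutory exposure in USD for a campaign."""
    return calls * NEGLIGENT_USD_PER_CALL, calls * WILLFUL_USD_PER_CALL


negligent_total, willful_total = tcpa_exposure(10_000)
print(f"${negligent_total:,} to ${willful_total:,}")  # prints "$5,000,000 to $15,000,000"
```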

GDPR — Automated Decisions (EU)

GDPR Art. 22. Required disclosure for any AI making automated decisions with legal or significant effects on EU residents.

Required disclosures:

Prohibited outputs:

Severity: Up to €20M or 4% global annual revenue (whichever is higher).

EU AI Act — Chatbot Identity

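EU AI Act Art. 50. The Act's first obligations (prohibited practices) have applied since February 2, 2025; the Art. 50 transparency obligations apply from August 2, 2026. All EU-facing chatbots must disclose AI nature. AI-generated synthetic media must be labeled.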

Required disclosures:

Severity: Prohibited practices: €35M or 7% global turnover. Chatbot non-compliance: €15M or 3%.

TILA / ECOA — Banking & Lending

Truth in Lending Act + Equal Credit Opportunity Act + Fair Lending. Required for credit, mortgage, and lending AI agents.

Prohibited outputs:

Required disclosures:

Example violation → AI says "You're definitely approved — no credit check needed!" → guaranteed approval + material omission both blocked.

Severity: TILA criminal $5,000 + 1 year. ECOA $10,000/violation + punitive. Fair lending: DOJ referral, unlimited compensatory.