Troubleshooting
Inference pod not becoming ready
Symptom: inference pod stays in Init or readiness probe fails
The inference container loads two ML models (~1.5 GB combined) on startup. This takes 2–5 minutes on a cold pull and 30–60 seconds on a warm node with cached layers. The readiness probe has a 3-minute initial delay to account for this.
# Watch pod status
kubectl get pods -n kysira -w
# Stream inference logs to see load progress
kubectl logs -n kysira -l app.kubernetes.io/name=kysira-inference -f
Look for Model loaded or status: ok in the logs. If you see a Python traceback instead, or the container never starts at all, the usual causes are:
- OOMKilled: the container ran out of memory. The two models need ~1.5 GB resident, so the memory limit must sit above that; increase it via Helm.
- Image pull error: the kysira-pullsecret may be missing or expired. Re-create it with a fresh token from app.kysira.com.
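Instead of watching the logs, you can poll the health endpoint from your workstation until the models finish loading. This is a minimal sketch with three assumptions. First, kubectl port-forward -n kysira deployment/kysira-inference 8081:8081 is running. Second, /health returns a JSON object whose status field reads "ok" once loading is done; that field name is inferred from the status: ok log line and is not confirmed. Third, the default deadline of 330 s is chosen only to cover the 5-minute cold-start worst case with some slack.

```python
import json
import time
import urllib.request

# Reached through `kubectl port-forward ... 8081:8081` (assumption).
HEALTH_URL = "http://localhost:8081/health"


def is_ready(body: bytes) -> bool:
    """True if the health payload is a JSON object with status "ok" (assumed field)."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON yet, e.g. an error page
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def wait_until_ready(url: str = HEALTH_URL, deadline_s: float = 330.0,
                     interval_s: float = 5.0) -> bool:
    """Poll url until is_ready() succeeds or deadline_s elapses."""
    stop = time.monotonic() + deadline_s
    while time.monotonic() < stop:
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                if is_ready(resp.read()):
                    return True
        except OSError:
            # Covers refused connections, HTTP errors such as 503 while
            # loading, and timeouts: keep polling until the deadline.
            pass
        time.sleep(interval_s)
    return False
```

For example, wait_until_ready() returns False if the models are still loading after the deadline. At that point, the pod status and logs from the commands above are the next place to look.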
High inference latency / scoring timeouts
Symptom: proxy logs inference error: context deadline exceeded
The proxy and ext-proc time out after 500 ms of waiting for a score. If inference is regularly slower than that, check which device it is running on:
# Check which device inference is using
kubectl exec -n kysira deployment/kysira-inference -- \
wget -qO- http://localhost:8081/health | python3 -m json.tool
On CPU, the prompt-injection classifier alone takes 200–400 ms per request, which leaves little headroom under the 500 ms timeout. Options:
- GPU: set KYSIRA_DEVICE=cuda and add a GPU resource limit.
- DaemonSet mode: running one inference pod per node eliminates the network hop.
- Increase the timeout: the proxy and ext-proc read KYSIRA_INFERENCE_TIMEOUT (default 500ms). Raise it if you're on CPU and timeouts are occasional rather than systematic.
The proxy and ext-proc fail open on timeout — the request passes through with an error logged, never dropped.
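Fail-open behaviour works like a scoring call wrapped in a deadline. If no score arrives within the budget, the request goes through and only an error is logged. The Python sketch below illustrates that idea and is not the proxy's actual code. score_or_pass and score_fn are hypothetical names, and timeout_s stands in for KYSIRA_INFERENCE_TIMEOUT.

```python
import concurrent.futures
import logging
import time

log = logging.getLogger("kysira-failopen-sketch")

# Shared pool: a slow scoring call keeps running in the background,
# but the caller stops waiting for it once the deadline passes.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def score_or_pass(score_fn, payload, timeout_s=0.5):
    """Return the score if it arrives within timeout_s, otherwise fail open.

    score_fn is a hypothetical stand-in for the call to the inference
    service; timeout_s plays the role of KYSIRA_INFERENCE_TIMEOUT.
    """
    started = time.monotonic()
    future = _pool.submit(score_fn, payload)
    try:
        return {"score": future.result(timeout=timeout_s), "failed_open": False}
    except concurrent.futures.TimeoutError:
        elapsed_ms = (time.monotonic() - started) * 1000
        # Log the error and let the request through; never drop it.
        log.error("inference error: no score after %.0f ms, passing request through",
                  elapsed_ms)
        return {"score": None, "failed_open": True}
```

This design means a slow classifier costs you detection, not availability. That is why occasional timeouts on CPU show up as logged errors rather than failed requests.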
False positives (legitimate requests flagged)
Symptom: normal API calls appear in the dashboard with high scores
# Check what the inference service returns for a specific payload
kubectl exec -n kysira deployment/kysira-inference -- \
wget -qO- --post-data='{"request_text":"your payload here"}' \
--header='Content-Type: application/json' \
http://localhost:8081/score/all | python3 -m json.tool
The detector field in the response tells you which classifier fired. Common causes:
- sqli: the SQL injection classifier can be over-eager on SQL keywords in prose. Raise the threshold:
  helm upgrade kysira oci://ghcr.io/kysira/charts/kysira-platform \
    --namespace kysira --reuse-values \
    --set "kysira-proxy.config.scoreThreshold=0.98"
  A threshold of 0.98 significantly reduces false positives at some cost to recall.
- xss: the regex fires on <script, javascript:, onerror=, and similar patterns. If your app legitimately POSTs HTML content, route those specific paths outside Kysira.
- nosqli: legitimate Mongo $in/$gte/$lte filter operators in JSON bodies trigger an advisory score (below the kill line by default). They only block in active mode if you've lowered the threshold below 0.7.
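Sending real payloads through wget --post-data gets awkward once they contain quotes or newlines. A small Python helper can build the body with json.dumps instead. This is a sketch under a few assumptions. It expects kubectl port-forward -n kysira deployment/kysira-inference 8081:8081 to be running. The request_text field comes from the command above. The response is assumed to be a JSON object that includes the detector field; any other fields are returned untouched. The helper names are hypothetical.

```python
import json
import urllib.request

# Reached through `kubectl port-forward ... 8081:8081` (assumption).
SCORE_URL = "http://localhost:8081/score/all"


def build_body(text: str) -> bytes:
    """Encode a payload the same way as the --post-data example above."""
    return json.dumps({"request_text": text}).encode("utf-8")


def score_all(text: str, url: str = SCORE_URL, timeout_s: float = 5.0) -> dict:
    """POST text to /score/all and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_body(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```

One way to confirm that the SQL injection classifier is the one over-reacting to prose is to run a sentence from your own docs that mentions SELECT or WHERE through score_all() and read the detector field of the result.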
Active mode not blocking requests
Symptom: mode is active but malicious requests still reach your app
- Confirm the mode change took effect.
- Check the dashboard: if action shows shadow_kill instead of active_kill, the pod is still in shadow mode. The mode API sets the mode in memory on a single pod. With several replicas, a call through the Service changes only the pod that received it, and every pod returns to its configured mode on restart. Use Helm to persist the mode across all pods and restarts.
- If action shows passed, the score is below the threshold and the request is genuinely not being flagged. Lower the threshold or test with a more obvious payload such as ' OR 1=1--.
- ext-proc only: the mode API is on the HTTP port (:9090), not the gRPC port. Confirm you're hitting the right service and port.
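It is easier to read the action field with the decision logic in mind. The minimal sketch below is written from the descriptions above, not from Kysira's source. The function name is hypothetical, and counting a score exactly equal to the threshold as flagged is an assumption.

```python
def expected_action(score: float, threshold: float, mode: str) -> str:
    """Map score, threshold and mode to the dashboard action value (illustrative).

    Assumes a score at or above the threshold is flagged; Kysira's real
    comparison may be strict.
    """
    if mode not in ("shadow", "active"):
        raise ValueError(f"unknown mode: {mode!r}")
    if score < threshold:
        return "passed"  # not flagged, so the mode makes no difference
    # Flagged: only active mode actually blocks the request.
    return "active_kill" if mode == "active" else "shadow_kill"
```

Suppose mode changes were made only through the API and you run several replicas. You may then see the same payload logged as active_kill on one pod and shadow_kill on another. That pattern points at per-pod mode, not at scoring. A steady stream of passed points at the threshold instead.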
Dashboard shows no events
Symptom: dashboard loads but the event feed is empty
# Look for upstream connection errors from the dashboard to the proxy
kubectl logs -n kysira -l app.kubernetes.io/name=kysira-dashboard
The dashboard proxies /api/ requests to the kysira-proxy service. Upstream connection errors in these logs usually mean the proxy service name or port is wrong in the dashboard ConfigMap. Check that the proxyServiceName Helm value matches the proxy's Service name as listed by kubectl get svc -n kysira.
If the service names look right, check that the proxy itself is receiving traffic.
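A connection test from inside the cluster can tell a wrong service name apart from a wrong port. This sketch makes two assumptions. First, Python 3 is available in the pod you run it from, for example a throwaway debug pod. Second, the proxy's Service is kysira-proxy in namespace kysira on port 8080; that port is only inferred from the proxy metrics command later on this page and may differ in your install. The check_tcp helper is hypothetical.

```python
import socket


def check_tcp(host: str, port: int, timeout_s: float = 2.0) -> str:
    """Classify a TCP connection attempt to host:port."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror:
        return "dns-failure"  # wrong service name, or wrong namespace in the name
    try:
        with socket.create_connection((addr, port), timeout=timeout_s):
            return "ok"
    except ConnectionRefusedError:
        return "refused"  # name resolves, but nothing accepts on that port
    except OSError:
        return "unreachable"  # timeout, network policy, or routing problem


# Hypothetical in-cluster name; adjust service, namespace and port:
# print(check_tcp("kysira-proxy.kysira.svc.cluster.local", 8080))
```

A dns-failure result points at proxyServiceName. A refused result points at the port, or at a Service with no ready proxy pods behind it.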
HPA not scaling inference
Symptom: inference pod count stays at 1 under load
The HPA depends on the Metrics Server for CPU metrics. Check whether it's installed: kubectl top pods -n kysira returns an error if it isn't.
If it fails, install the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
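Then check the HPA status with kubectl get hpa -n kysira. The TARGETS column shows current versus target utilization; a value of <unknown>/80% means metrics still aren't available. kubectl describe hpa -n kysira shows recent scaling events.
The HPA targets 80% average CPU utilization by default, measured against the pods' CPU requests. If inference is GPU-bound, CPU utilization stays low and the HPA won't trigger; set a fixed replica count or use a custom Prometheus-based metric instead.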
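The Kubernetes documentation gives the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Replicas stay unchanged while that ratio is within a tolerance of 1.0, which is 0.1 by default. The function below applies the rule to show why a GPU-bound inference pod at low CPU stays at one replica. It leaves out stabilization windows and pod-readiness handling. The max_replicas default of 10 is a hypothetical value, not the chart's.

```python
import math


def desired_replicas(current: int, cpu_util_pct: float, target_pct: float = 80.0,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for a single CPU-utilization metric.

    cpu_util_pct is average usage as a percentage of the pods' CPU requests.
    """
    ratio = cpu_util_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no change
    else:
        desired = math.ceil(current * ratio)
    # Clamp to the HPA's configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With these defaults, desired_replicas(1, 30) returns 1 no matter how many requests are queued. desired_replicas(1, 200) returns 3. CPU utilization alone decides the outcome, which is why a GPU-bound workload needs a different metric.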
Checking metrics directly
# Proxy metrics
kubectl exec -n kysira deployment/kysira-proxy -- \
wget -qO- http://localhost:8080/metrics | grep kysira_
# ext-proc metrics
kubectl exec -n kysira deployment/kysira-ext-proc -- \
wget -qO- http://localhost:9090/metrics | grep kysira_extproc_
# Inference metrics
kubectl exec -n kysira deployment/kysira-inference -- \
wget -qO- http://localhost:8081/metrics | grep kysira_inference_
See Observability for connecting these to Grafana or Datadog.
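If you need values rather than raw grep output, for example to watch a single counter between two scrapes, a few lines of Python can turn the Prometheus text format into (name, labels, value) tuples. This sketch uses only the standard library. It skips # HELP and # TYPE lines and handles the common name{labels} value [timestamp] sample form. It does not implement the format's full escaping rules; the prometheus_client package's text_string_to_metric_families is the robust option. The helper name is hypothetical.

```python
import re

# name{labels} value [timestamp]; labels are kept as one raw string.
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+-?\d+)?\s*$"
)


def parse_samples(text: str, prefix: str = "kysira_"):
    """Yield (name, labels, value) for sample lines whose name starts with prefix."""
    for line in text.splitlines():
        if not line.startswith(prefix):
            continue  # comments, blank lines, and other metrics
        m = _SAMPLE.match(line)
        if m:
            yield m["name"], m["labels"] or "", float(m["value"])
```

For example, redirect the output of one of the wget commands above to a local file such as proxy-metrics.txt. Then call parse_samples(open("proxy-metrics.txt").read()), passing prefix="kysira_extproc_" or prefix="kysira_inference_" for the other two services.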