Troubleshooting


Inference pod not becoming ready

Symptom: inference pod stays in Init or readiness probe fails

The inference container loads two ML models (~1.5 GB combined) on startup. This takes 2–5 minutes on a cold pull and 30–60 seconds on a warm node with cached layers. The readiness probe has a 3-minute initial delay to account for this.

# Watch pod status
kubectl get pods -n kysira -w

# Stream inference logs to see load progress
kubectl logs -n kysira -l app.kubernetes.io/name=kysira-inference -f

Look for Model loaded or status: ok in the logs. If you see a Python traceback instead, or the pod keeps restarting or never starts, check these common causes:

  • OOMKilled: the container exceeded its memory limit, or the node ran out of memory. The two models need ~1.5 GB resident. Confirm the kill reason first:

    kubectl describe pod -n kysira -l app.kubernetes.io/name=kysira-inference | grep -A5 OOMKilled
    
    Then increase via Helm:
    helm upgrade kysira oci://ghcr.io/kysira/charts/kysira-platform \
      --namespace kysira --reuse-values \
      --set "kysira-inference.resources.limits.memory=4Gi" \
      --set "kysira-inference.resources.requests.memory=2Gi"
    

  • Image pull error — the kysira-pull secret may be missing or expired. Re-create it with a fresh token from app.kysira.com:

    kubectl delete secret kysira-pull -n kysira
    kubectl create secret docker-registry kysira-pull \
      --docker-server=ghcr.io \
      --docker-username=<username> \
      --docker-password=<token> \
      --namespace kysira
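
To tell the two failure modes above apart quickly, it can help to print each inference pod's last termination reason and current waiting reason on one line. The following is a minimal sketch and not part of the Kysira tooling: the function names inference_reasons and suggest_fix are hypothetical, and the reason strings are the standard Kubernetes ones (OOMKilled, ErrImagePull, ImagePullBackOff).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not shipped with Kysira.

# Print "<pod> <last termination reason> <waiting reason>" (tab-separated)
# for every inference pod. Missing fields are printed as empty strings.
inference_reasons() {
  kubectl get pods -n kysira -l app.kubernetes.io/name=kysira-inference \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
}

# Map a Kubernetes reason string to the matching remedy from this section.
suggest_fix() {
  case "$1" in
    OOMKilled) echo "raise memory limit" ;;
    ErrImagePull|ImagePullBackOff) echo "recreate kysira-pull secret" ;;
    *) echo "check logs" ;;
  esac
}
```

Run inference_reasons against the cluster, then pass whichever reason it prints to suggest_fix, for example suggest_fix OOMKilled.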
    


High inference latency / scoring timeouts

Symptom: proxy logs inference error: context deadline exceeded

The proxy and ext-proc time out after 500 ms waiting for a score. If inference is slower than that, start by checking which device it runs on:

# Check which device inference is using
kubectl exec -n kysira deployment/kysira-inference -- \
  wget -qO- http://localhost:8081/health | python3 -m json.tool

On CPU, the prompt-injection classifier takes 200–400 ms per request, which leaves little headroom under the 500 ms timeout. Options:

  1. GPU — set KYSIRA_DEVICE=cuda and add a GPU resource limit:

    kysira-inference:
      config:
        device: cuda
      resources:
        limits:
          nvidia.com/gpu: "1"
    

  2. DaemonSet mode — one inference pod per node eliminates the network hop:

    helm upgrade kysira oci://ghcr.io/kysira/charts/kysira-platform \
      --namespace kysira --reuse-values \
      --set "kysira-inference.daemonSet.enabled=true"
    

  3. Increase timeout — the proxy and ext-proc read KYSIRA_INFERENCE_TIMEOUT (default 500ms). Increase if you're on CPU and timeouts are occasional rather than systematic.

The proxy and ext-proc fail open on timeout — the request passes through with an error logged, never dropped.
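
Option 3 can be tried without touching the chart by setting the environment variable directly on the deployments. This is a sketch under assumptions: the deployment names are the ones used elsewhere on this page, the 750ms value is only an example in the same duration format as the default, and a later helm upgrade will likely overwrite a change made this way, so treat it as a quick experiment rather than a persistent setting. The function name set_inference_timeout is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KYSIRA_INFERENCE_TIMEOUT on both consumers.
# If you run only one of the two deployments, kubectl reports an error
# for the missing one and the loop carries on.
set_inference_timeout() {
  local timeout="${1:?usage: set_inference_timeout <duration, e.g. 750ms>}"
  local deploy
  for deploy in kysira-proxy kysira-ext-proc; do
    # kubectl set env changes the pod template, which triggers a rolling restart.
    kubectl set env "deployment/${deploy}" -n kysira \
      "KYSIRA_INFERENCE_TIMEOUT=${timeout}"
  done
}
```

For example, set_inference_timeout 750ms, then watch the proxy logs to see whether context deadline exceeded errors stop.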


False positives (legitimate requests flagged)

Symptom: normal API calls appear in the dashboard with high scores

# Check what the inference service returns for a specific payload
kubectl exec -n kysira deployment/kysira-inference -- \
  wget -qO- --post-data='{"request_text":"your payload here"}' \
  --header='Content-Type: application/json' \
  http://localhost:8081/score/all | python3 -m json.tool

The detector field in the response tells you which classifier fired. Common causes:

  • sqli — the SQL injection classifier can be over-eager on SQL keywords in prose. Raise the threshold:

    helm upgrade kysira oci://ghcr.io/kysira/charts/kysira-platform \
      --namespace kysira --reuse-values \
      --set "kysira-proxy.config.scoreThreshold=0.98"
    
    0.98 significantly reduces false positives at some cost to recall.

  • xss — the regex fires on <script, javascript:, onerror=, and similar. If your app legitimately POSTs HTML content, route those specific paths outside Kysira.

  • nosqli — legitimate Mongo $in/$gte/$lte filter operators in JSON bodies trigger an advisory score (below the kill line by default). They only block in active mode if you've lowered the threshold below 0.7.
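
Before changing a threshold, it may be worth scoring a handful of known-good payloads in one pass to see which detector fires on each. The sketch below assumes a local python3 (already used above for json.tool) to build the JSON body safely, and a text file with one sample payload per line. The names build_payload and score_file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for batch-scoring sample payloads.

# Build the JSON body for /score/all, letting Python handle quoting and escapes.
build_payload() {
  python3 -c 'import json, sys; print(json.dumps({"request_text": sys.argv[1]}))' "$1"
}

# Score every non-empty line of the file given as $1.
score_file() {
  local line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    # stdin is redirected so kubectl cannot consume the rest of the file.
    kubectl exec -n kysira deployment/kysira-inference -- \
      wget -qO- --post-data="$(build_payload "$line")" \
      --header='Content-Type: application/json' \
      http://localhost:8081/score/all < /dev/null
    echo
  done < "$1"
}
```

For example, score_file samples.txt prints one JSON response per sample; compare the detector field across them before deciding which threshold or route to change.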


Active mode not blocking requests

Symptom: mode is active but malicious requests still reach your app

  1. Confirm the mode change took effect:

    kubectl exec -n kysira deployment/kysira-proxy -- \
      wget -qO- http://localhost:8080/api/mode
    # → {"mode":"active"}
    

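  2. Check the dashboard. If action shows shadow_kill instead of active_kill, the pod that handled the request is still in shadow mode. The mode API sets the mode in memory, per pod, so other replicas and restarted pods keep the configured value. Use Helm to set it for every pod and persist it across restarts: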

    helm upgrade kysira oci://ghcr.io/kysira/charts/kysira-platform \
      --namespace kysira --reuse-values \
      --set "kysira-proxy.config.mode=active"
    

  3. If action shows passed, the score is below the threshold — the request is genuinely not being flagged. Lower the threshold or test with a more obvious payload like ' OR 1=1--.

  4. ext-proc only — the mode API is on the HTTP port (:9090), not the gRPC port. Confirm you're hitting the right service and port:

    kubectl exec -n kysira deployment/kysira-ext-proc -- \
      wget -qO- http://localhost:9090/api/mode
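
Because the mode is held in memory per pod, a check through deployment/... only reaches one replica. The following sketch queries every pod behind a label instead. It assumes the ext-proc pods carry the label app.kubernetes.io/name=kysira-ext-proc, following the pattern of the other components on this page; the function name modes_per_pod is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mode reported by every pod of a component.
#   $1 = value of the app.kubernetes.io/name label
#   $2 = HTTP port that serves /api/mode
modes_per_pod() {
  local name="$1" port="$2" pod
  for pod in $(kubectl get pods -n kysira -l "app.kubernetes.io/name=${name}" -o name); do
    printf '%s\t' "$pod"
    kubectl exec -n kysira "$pod" -- wget -qO- "http://localhost:${port}/api/mode"
    echo
  done
}
```

Use modes_per_pod kysira-proxy 8080 for the proxy or modes_per_pod kysira-ext-proc 9090 for ext-proc. Any pod that still reports shadow explains requests slipping through.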
    


Dashboard shows no events

Symptom: dashboard loads but the event feed is empty

# Check the proxy is healthy and the dashboard can reach it
kubectl logs -n kysira -l app.kubernetes.io/name=kysira-dashboard

The dashboard proxies /api/ requests to the kysira-proxy service. Upstream connection errors in these logs mean the proxy service name or port is wrong in the dashboard ConfigMap. Check that the proxyServiceName Helm value matches the actual proxy service name:

kubectl get svc -n kysira

If the service names look right, check the proxy itself is receiving traffic:

kubectl logs -n kysira -l app.kubernetes.io/name=kysira-proxy | tail -20
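
A Service can exist with the right name and still have no ready pods behind it, which also leaves the feed empty. The sketch below checks for that case. It assumes the Service is named kysira-proxy as described above (your release may prefix it differently, so pass the real name as the first argument); the function name check_proxy_endpoints is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Service has any ready endpoints.
check_proxy_endpoints() {
  local svc="${1:-kysira-proxy}" ips
  # Collect the IPs of all ready addresses behind the Service.
  ips="$(kubectl get endpoints "$svc" -n kysira \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -z "$ips" ]; then
    echo "no ready endpoints behind ${svc}"
    return 1
  fi
  echo "endpoints behind ${svc}: ${ips}"
}
```

If this reports no ready endpoints, compare the Service selector with the proxy pod labels and check that the proxy pods are passing their readiness probe.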


HPA not scaling inference

Symptom: inference pod count stays at 1 under load

The HPA requires the Metrics Server. Check if it's installed:

kubectl top pods -n kysira

If this fails, install the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Then check the HPA status:

kubectl get hpa -n kysira
kubectl describe hpa -n kysira kysira-inference

The HPA targets 80% CPU by default. If inference is GPU-bound, CPU utilization will be low and the HPA won't trigger — set a fixed replica count or use a custom Prometheus-based metric instead.
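
To see why a GPU-bound workload never scales, it helps to work through the replica calculation that the Kubernetes HPA documents: desired = ceil(currentReplicas × currentUtilization / targetUtilization). The sketch below reproduces only that formula with integer arithmetic. It ignores the HPA's tolerance band and the min/max replica bounds, and the function name hpa_desired_replicas is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the core HPA formula, simplified.
#   $1 = current replica count
#   $2 = current average CPU utilization (percent of request)
#   $3 = target utilization (percent), defaults to 80 as on this page
hpa_desired_replicas() {
  local replicas="$1" current="$2" target="${3:-80}"
  # Integer ceiling of (replicas * current / target).
  echo $(( (replicas * current + target - 1) / target ))
}
```

With one replica at 95% CPU, hpa_desired_replicas 1 95 prints 2, so the HPA scales out. With one GPU-bound replica idling at 15% CPU, hpa_desired_replicas 1 15 prints 1, so the replica count never moves regardless of request load.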


Checking metrics directly

# Proxy metrics
kubectl exec -n kysira deployment/kysira-proxy -- \
  wget -qO- http://localhost:8080/metrics | grep kysira_

# ext-proc metrics
kubectl exec -n kysira deployment/kysira-ext-proc -- \
  wget -qO- http://localhost:9090/metrics | grep kysira_extproc_

# Inference metrics
kubectl exec -n kysira deployment/kysira-inference -- \
  wget -qO- http://localhost:8081/metrics | grep kysira_inference_

See Observability for connecting these to Grafana or Datadog.