Security

How multivon-eval handles your data, how to report a vulnerability, and what we're building toward for compliance-sensitive deployments.

Data handling

No telemetry. The library does not phone home. There is no opt-out toggle because there is nothing to opt out of.
Local-first by default. Evaluation runs in your process. Outputs, traces, and audit logs are written to your filesystem.
PII detection uses regular expressions, not an LLM. PIIEvaluator matches local patterns, so detecting PII in an output never requires sending that output to a third party.
Judge calls are explicit. LLM-as-judge evaluators call the model provider you configure (Anthropic, OpenAI, Vertex, Bedrock, or your on-prem endpoint). Those calls are not proxied through Multivon.
Audit log is yours. ComplianceReporter writes hash-chained NDJSON to a directory you choose. No upload step.
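
To illustrate what "hash-chained NDJSON" means in practice, here is a minimal sketch using only the Python standard library. It is not ComplianceReporter's implementation or on-disk format: the field names (prev_hash, hash, event), the GENESIS sentinel, and the helper functions are all hypothetical, chosen only to show the idea that each line commits to the hash of the line before it.

```python
import hashlib
import json

# Assumed sentinel used as the "previous hash" of the very first record.
GENESIS = "0" * 64


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(lines: list[str], event: dict) -> None:
    """Append one NDJSON line whose hash covers the previous line's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    body = {"prev_hash": prev_hash, "event": event}
    lines.append(json.dumps({**body, "hash": _digest(body)}, sort_keys=True))


def verify_chain(lines: list[str]) -> bool:
    """Return True only if every line links to the one before it."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        claimed = record.pop("hash")
        # A broken link means a removed/reordered line; a bad digest means an edit.
        if record["prev_hash"] != prev_hash or _digest(record) != claimed:
            return False
        prev_hash = claimed
    return True
```

In this sketch, removing a line from the middle breaks the prev_hash link of the line that follows it, and editing a line in place changes its digest. Removing lines from the end is not caught by the chain alone; detecting that requires recording the latest hash somewhere outside the log.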

Reporting a vulnerability

Email security@multivon.ai with reproduction steps. We aim to acknowledge reports within two business days and, for high-severity issues, to ship a fix or coordinate disclosure within thirty days.

Please do not open public GitHub issues for security reports. If you would like to use PGP, request our key in your first message.

Supply chain

Releases are built and published from the public multivon-ai/multivon-eval repository.
Runtime dependencies are pinned in pyproject.toml and limited to a small set: anthropic, openai, jsonschema, python-dotenv, rich.
Heavier optional packs (bertscore, litellm, playwright) are opt-in extras and never pulled in by default.

Compliance posture

multivon-eval is the measurement instrument; it does not by itself certify that your AI system is compliant. What the library does provide:

EU AI Act high-risk evaluators wired to the correct Articles (9(2)(b), 10(2)(f-g), 10(5), 15(1), 15(2)) with paragraph-level audit annotations.
NIST AI RMF subcategory mappings (MEASURE 2.3 / 2.5 / 2.6 / 2.10 / 2.11) on every evaluator result in the audit log.
DPDP (India) jurisdiction: detection of Aadhaar, PAN, GSTIN, IFSC, Voter ID, and +91 mobile numbers. Detection is regex-only with zero egress, which matters under the DPDP §16 cross-border transfer restrictions.
Hash-chained audit log, so removing entries from the middle of the log is detectable, as are in-place edits.
Coverage gap report: reporter.coverage(suite) tells you which Articles your suite exercises and which it does not, including the process controls (Art. 11, 13, 14, 15(4-5)) that cannot be satisfied by evaluator output alone.
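
As an illustration of the regex-only DPDP detection listed above, the following sketch matches a few of the identifier formats with the standard re module and makes no network calls. The patterns reflect the commonly documented shapes of these identifiers and are assumptions for this example, not the patterns PIIEvaluator ships. PATTERNS and find_pii are hypothetical names, and Voter ID is left out for brevity.

```python
import re

# Illustrative patterns only (assumed formats, not the library's own rules).
PATTERNS = {
    # PAN: five letters, four digits, one letter.
    "pan": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    # IFSC: four-letter bank code, a literal 0, six alphanumerics.
    "ifsc": re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b"),
    # GSTIN: two-digit state code, embedded PAN, entity code, 'Z', check character.
    "gstin": re.compile(r"\b[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b"),
    # Aadhaar: twelve digits, first digit 2-9, optionally grouped in fours.
    "aadhaar": re.compile(r"\b[2-9][0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}\b"),
    # Mobile: +91 prefix followed by a ten-digit number starting 6-9.
    "mobile_in": re.compile(r"(?<!\w)\+91[ -]?[6-9][0-9]{9}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match per label; the text never leaves the process."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits
```

Format-only matching like this will produce false positives. Aadhaar numbers, for example, carry a Verhoeff check digit that a regex cannot validate, so a production detector would likely add checksum or context checks on top of the pattern match.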

Process controls (technical documentation, transparency, human oversight, cybersecurity) require organizational measures beyond evaluation. Treat the audit log as evidence in a wider compliance program, not as a stand-in for one.