Security
How multivon-eval handles your data, how to report a vulnerability, and what we're building toward for compliance-sensitive deployments.
Data handling
- No telemetry. The library does not phone home. There is no opt-out toggle because there is nothing to opt out of.
- Local-first by default. Evaluation runs in your process. Outputs, traces, and audit logs are written to your filesystem.
- PII detection is regex, not LLM. PIIEvaluator uses local patterns. Detecting PII in an output does not require sending that output to a third party.
- Judge calls are explicit. LLM-as-judge evaluators call the model provider you configure (Anthropic, OpenAI, Vertex, Bedrock, or your on-prem endpoint). No proxy through Multivon.
- Audit log is yours. ComplianceReporter writes hash-chained NDJSON to a directory you choose. No upload step.
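To illustrate what "regex, not LLM" means in practice, here is a minimal sketch of local pattern-based PII detection. The patterns and the find_pii function are hypothetical and chosen for illustration; they are not the patterns or API that PIIEvaluator ships with.

```python
import re

# Hypothetical patterns for illustration only; not PIIEvaluator's actual rules.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return all matches per category using only in-process regex scans."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```

Because the scan is a plain regex pass inside your process, the text being checked never leaves the machine. The trade-off is that patterns like these catch well-structured identifiers and miss free-form PII such as names.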
Reporting a vulnerability
Email security@multivon.ai with reproduction steps. We aim to acknowledge within two business days and fix or coordinate disclosure within thirty days for high-severity issues.
Please do not open public GitHub issues for security reports. If you would like to use PGP, request our key in the first message.
Supply chain
- Releases are built and published from the public multivon-ai/multivon-eval repository.
- Runtime dependencies are pinned in pyproject.toml and limited to a small set: anthropic, openai, jsonschema, python-dotenv, rich.
- Heavier optional packs (bertscore, litellm, playwright) are opt-in extras and never pulled in by default.
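If you want to check the runtime/extras split in your own environment, one option is to read the installed package metadata with the standard library. The sketch below assumes the distribution is installed under the name multivon-eval (inferred from the repository name, not confirmed here); split_requirements and declared_requirements are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(requirement_lines):
    """Separate always-installed requirements from those gated behind an extra."""
    runtime, extras = [], []
    for line in requirement_lines:
        # Requirements tied to an extra carry an environment marker after ';',
        # for example: some-package; extra == "name"
        marker = line.split(";", 1)[1] if ";" in line else ""
        (extras if "extra" in marker else runtime).append(line)
    return runtime, extras


def declared_requirements(dist_name: str):
    """Return the requirement strings a distribution declares, or [] if absent."""
    try:
        return requires(dist_name) or []
    except PackageNotFoundError:
        return []


# Assumed distribution name; adjust if the package is published differently.
runtime, extras = split_requirements(declared_requirements("multivon-eval"))
```

Anything that lands in extras is only installed when you request that extra explicitly, so the runtime list is the set to review for a default install.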
Compliance posture
multivon-eval is the measurement instrument; it does not by itself certify that your AI system is compliant. What the library does provide:
- EU AI Act high-risk evaluators wired to the correct Articles (9(2)(b), 10(2)(f-g), 10(5), 15(1), 15(2)) with paragraph-level audit annotations.
- NIST AI RMF subcategory mappings (MEASURE 2.3 / 2.5 / 2.6 / 2.10 / 2.11) on every evaluator result in the audit log.
- Hash-chained audit log, so deleting records from the middle of the log is detectable, not just in-place edits.
- Coverage gap report: reporter.coverage(suite) tells you which Articles your suite exercises and which it does not, including the process controls (Art. 11, 13, 14, 15(4-5)) that cannot be satisfied by evaluator output alone.
Process controls (technical documentation, transparency, human oversight, cybersecurity) require organizational measures beyond evaluation. Treat the audit log as evidence in a wider compliance program, not as a stand-in for one.
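For readers unfamiliar with hash chaining, the sketch below shows the general idea behind a hash-chained NDJSON log and why it exposes mid-log deletion. It is not ComplianceReporter's actual record format: the field names (prev, payload, hash), the all-zero genesis value, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(lines: list[str], payload: dict) -> None:
    """Append one NDJSON line whose hash covers the previous record's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list[str]) -> bool:
    """Walk the log and confirm every record links to the one before it."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev"] != prev_hash:
            return False  # a record was removed or reordered before this one
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False  # this record's payload was edited in place
        prev_hash = record["hash"]
    return True
```

Removing a record from the middle breaks the prev link of the record that followed it, and editing a payload breaks that record's own hash. Note that in a scheme like this one, truncating records from the end still leaves a valid shorter chain, so the most recent hash would need to be recorded somewhere else to catch that case.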