Security

How multivon-eval handles your data, how to report a vulnerability, and what we're building toward for compliance-sensitive deployments.

Data handling

  • No telemetry. The library does not phone home. There is no opt-out toggle because there is nothing to opt out of.
  • Local-first by default. Evaluation runs in your process. Outputs, traces, and audit logs are written to your filesystem.
  • PII detection is regex-based, not LLM-based. PIIEvaluator matches local patterns, so detecting PII in an output does not require sending that output to a third party.
  • Judge calls are explicit. LLM-as-judge evaluators call the model provider you configure (Anthropic, OpenAI, Vertex, Bedrock, or your on-prem endpoint). Requests are not proxied through Multivon.
  • Audit log is yours. ComplianceReporter writes hash-chained NDJSON to a directory you choose. No upload step.
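To show why regex-based detection keeps data in your process, here is a minimal sketch of pattern-based PII detection using only Python's standard library. The pattern set, the `PII_PATTERNS` dict and the `find_pii` function are hypothetical and deliberately simple. This is not PIIEvaluator's actual pattern set or API, only an illustration of the technique.

```python
import re

# Hypothetical, deliberately simple patterns; NOT PIIEvaluator's real pattern set.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return all matches per PII category. Runs entirely in-process; no network calls."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com, SSN 123-45-6789."
    print(find_pii(sample))
```

Real detectors need far broader patterns (phone formats, national IDs, locale variants), but the property the bullet describes holds for any pattern set: the text being scanned never leaves the process.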

Reporting a vulnerability

Email security@multivon.ai with reproduction steps. We aim to acknowledge within two business days and fix or coordinate disclosure within thirty days for high-severity issues.

Please do not open public GitHub issues for security reports. If you would like to use PGP, request our key in the first message.

Supply chain

  • Releases are built and published from the public multivon-ai/multivon-eval repository.
  • Runtime dependencies are pinned in pyproject.toml and limited to a small set: anthropic, openai, jsonschema, python-dotenv, rich.
  • Heavier optional packs (bertscore, litellm, playwright) are opt-in extras and never pulled in by default.

Compliance posture

multivon-eval is the measurement instrument; it does not by itself certify that your AI system is compliant. What the library does provide:

  • EU AI Act high-risk evaluators mapped to specific Articles (9(2)(b), 10(2)(f-g), 10(5), 15(1), 15(2)), with paragraph-level audit annotations.
  • NIST AI RMF subcategory mappings (MEASURE 2.3 / 2.5 / 2.6 / 2.10 / 2.11) on every evaluator result in the audit log.
  • Hash-chained audit log, so deleting an entry from the middle of the log is detectable, not just editing one in place.
  • Coverage gap report: reporter.coverage(suite) tells you which Articles your suite exercises and which it does not, including the process controls (Art. 11, 13, 14, 15(4-5)) that cannot be satisfied by evaluator output alone.
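To illustrate why a hash chain exposes mid-log deletion as well as in-place edits, here is a minimal sketch of a hash-chained NDJSON log using only the Python standard library. The field names (`record`, `prev`, `hash`), the SHA-256 choice, the all-zeros genesis value, and the functions `append_entry` and `verify_chain` are assumptions for illustration; this is not ComplianceReporter's actual on-disk format or API.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical "previous hash" for the first entry; not ComplianceReporter's format.
GENESIS = "0" * 64


def _digest(prev: str, record: dict) -> str:
    # Hash the previous entry's hash together with a canonical JSON encoding of the record.
    payload = prev + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(path: Path, record: dict) -> None:
    """Append one NDJSON line whose hash covers the record and the previous line's hash."""
    prev = GENESIS
    if path.exists():
        # Sketch only: rereads the file to find the last hash; a real writer would cache it.
        lines = path.read_text(encoding="utf-8").splitlines()
        if lines:
            prev = json.loads(lines[-1])["hash"]
    entry = {"record": record, "prev": prev, "hash": _digest(prev, record)}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def verify_chain(path: Path) -> bool:
    """Return False if any entry was edited, reordered, or removed from the middle."""
    prev = GENESIS
    for line in path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = Path(tempfile.mkdtemp()) / "audit.ndjson"
    for i in range(3):
        append_entry(log, {"evaluator": "demo", "step": i})
    print(verify_chain(log))  # True

    lines = log.read_text(encoding="utf-8").splitlines()
    del lines[1]  # simulate deleting a mid-log entry
    log.write_text("\n".join(lines) + "\n", encoding="utf-8")
    print(verify_chain(log))  # False: entry 2's "prev" no longer matches entry 0's hash
```

Because each entry commits to its predecessor's hash, removing an entry breaks the link at the next line, and editing a record breaks its own hash. A chain like this cannot, on its own, detect removal of the final entries; that requires recording the latest hash somewhere outside the log.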

Process controls (technical documentation, transparency, human oversight, cybersecurity) require organizational measures beyond evaluation. Treat the audit log as evidence in a wider compliance program, not as a stand-in for one.