Formal Evaluation of Cryptographic Schemes
— Let the data speak.

Coverage × Correctness × Reproducibility — from high-level intent to implementation artifacts, across five model capabilities.

See latest results Why choose C.F.B

9Models

7Formal languages

5Capabilities

100%Reproducible

Why C.F.B

Multi-backend validation

ProVerif, Tamarin, Scyther, Maude-NPA, AVISPA, CryptoVerif, EasyCrypt — cross-tool verification for robust conclusions.

Design → Implementation

From protocol goals/assumptions to implementation-level artifacts (e.g., Jasmin/constant-time checks), end-to-end coverage.

Five capabilities

Interpretation / Generation/ Completion / Transformation / Correction — not just “write”, but “verify”.

Comprehensive dataset

Curated end-to-end from design intent to implementation, covering protocols/schemes, security properties (secrecy, auth, integrity, freshness, PFS, IND-CPA/CCA, equivalence), adversary models (Dolev–Yao, state/ephemeral leaks, CK/eCK).

Overall Leaderboard (recomputed)

Sort by

Capability Radar

Coverage

Schemes/Protocols × Security properties × Adversary models × Formal languages

SecrecyAuthenticationIntegrityFreshness Forward SecrecyIND-CPA/CCAEquivalenceDolev–Yao CK/eCKProVerifTamarinScytherAVISPAMaude-NPACryptoVerifEasyCrypt

Evaluation Pipeline

1 Generate
2 Verify
3 Aggregate
4 Viz
5 Render

Fully scriptable & CI-friendly; scoring formulas and weights are configurable.

Get Started

Use scripts/cfb.sh to reproduce results and figures locally in one go.

Explore Results See Pipeline

Visualization of the Evaluation Process (Real-world Cases)

Task Language Model

① Dataset Content (inputdata)

file—

—

② Prompt (prompt)

—

③ Model Input (modelinput)

—

④ Model Output (modeloutput)

—

⑤ Eval Result (evalresult)

—

Video: Automatic Validation for Generation Task of Seven Formal Languages (grok-3 LLM)

Can’t play inline? Open in a new tab or Download the video.

Formal Evaluation of Cryptographic Schemes— Let the data speak.