How Enclara runs AI on encrypted medical images
A deep dive into the cryptography, model design, and system architecture that make private diagnosis possible.
What is Fully Homomorphic Encryption?
FHE is a form of encryption that allows computation on ciphertext. The result, when decrypted, matches the result of performing the same operations on the plaintext — but no one except the key holder ever sees the unencrypted data.
Encrypt
Data is encrypted client-side with a secret key. It becomes mathematically opaque to anyone without the key.
Compute
The server performs additions and multiplications directly on the encrypted values. It processes data it cannot read.
Decrypt
The encrypted result is returned to the client. Only the secret key holder can decrypt the output.
Enc(x) + Enc(y) = Enc(x + y)
Enc(x) * Enc(y) = Enc(x * y)
The server sees ciphertext. The math still works. The patient sees results.
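The property is easiest to trust once you've seen it run. Below is a deliberately toy, insecure DGHV-style scheme over the integers, written only to show the algebra; the production scheme Enclara relies on (TFHE, via Concrete-ML) is far more sophisticated.

```python
import random

def keygen(bits=256):
    # Secret key: a large odd integer. (Real schemes use structured lattices.)
    return (1 << bits) | random.getrandbits(bits) | 1

def encrypt(sk, bit):
    # Ciphertext: bit + small even noise + a large multiple of the key.
    noise = 2 * random.randint(0, 2**16)
    return bit + noise + sk * random.getrandbits(512)

def decrypt(sk, ct):
    # mod sk strips the key multiple; mod 2 strips the even noise.
    return (ct % sk) % 2

sk = keygen()
for a, b in [(0, 1), (1, 1)]:
    ca, cb = encrypt(sk, a), encrypt(sk, b)
    assert decrypt(sk, ca + cb) == (a + b) % 2  # Enc(a) + Enc(b) decrypts to a XOR b
    assert decrypt(sk, ca * cb) == (a * b) % 2  # Enc(a) * Enc(b) decrypts to a AND b
```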
QuantVGG11Patch
A 5-bit quantized VGG11 variant purpose-built for FHE. Every layer is chosen to be compatible with encrypted arithmetic.
| Standard VGG11 | QuantVGG11Patch (FHE) | Why |
|---|---|---|
| Conv2d | QuantConv2d (5-bit weights) | Reduces multiplication depth |
| ReLU | QuantReLU (5-bit activations) | Bounded activations suit FHE polynomial approximation |
| MaxPool2d | AvgPool2d | Max is a comparison, which is expensive in FHE |
| FC layers (4096, 4096, 1000) | Single QuantLinear(512, 7) | Fewer parameters = faster encrypted inference |
| 224×224 input | 32×32 patch input | Smaller input = manageable FHE circuit size |
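As a rough illustration, a model in this shape can be declared with Brevitas, the quantization-aware-training library whose models Concrete-ML compiles. This is a hedged sketch: the layer sequence mirrors the table, but beyond the class name, none of the settings are confirmed as Enclara's exact code.

```python
import torch.nn as nn
import brevitas.nn as qnn

# VGG11 feature config; "A" marks an AvgPool2d where VGG11 has a MaxPool2d.
CFG = [64, "A", 128, "A", 256, 256, "A", 512, 512, "A", 512, 512, "A"]

class QuantVGG11Patch(nn.Module):
    def __init__(self, num_classes=7, bit_width=5):
        super().__init__()
        layers, in_ch = [qnn.QuantIdentity(bit_width=bit_width)], 3
        for v in CFG:
            if v == "A":
                layers.append(nn.AvgPool2d(2))  # additions only, FHE-friendly
            else:
                layers += [
                    qnn.QuantConv2d(in_ch, v, 3, padding=1,
                                    weight_bit_width=bit_width),
                    qnn.QuantReLU(bit_width=bit_width),  # bounded activations
                ]
                in_ch = v
        self.features = nn.Sequential(*layers)
        # Five 2x2 pools shrink a 32x32 patch to 1x1x512, so one small
        # classifier replaces VGG11's three large FC layers.
        self.classifier = qnn.QuantLinear(512, num_classes, bias=True,
                                          weight_bit_width=bit_width)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```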
PatchAggregator
The aggregation layer sits outside the FHE circuit. It takes the 49 per-patch classification vectors (each with 7 logits), computes the max logit per class across all patches, and returns the final diagnosis. This runs in plaintext on the client after decryption — no encrypted comparisons needed.
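A minimal sketch of that step, assuming `client` is a Concrete-ML FHEModelClient (the deployment object that appears in the pipeline below) and `encrypted_results` holds the 49 ciphertexts returned by the server:

```python
import numpy as np

def aggregate(client, encrypted_results):
    # Decrypt each encrypted result into a (1, 7) logit vector, stack them
    # into a (49, 7) array, take the max logit per class across patches,
    # and return the index of the winning class.
    logits = np.vstack([
        client.deserialize_decrypt_dequantize(ct) for ct in encrypted_results
    ])
    return int(logits.max(axis=0).argmax())
```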
From training to encrypted inference
Train
train.py
- Fine-tune QuantVGG11Patch on HAM10000 (10,015 dermoscopic images, 7 classes)
- Initialize conv layers from pretrained ImageNet VGG11 weights
- Split train/val by lesion_id to prevent data leakage between patients
- Inverse-frequency class weighting handles severe class imbalance (split and weighting sketched below)
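A sketch of the split and the weighting, assuming HAM10000's standard metadata CSV (lesion_id and dx are its real column names; everything else is illustrative):

```python
import pandas as pd
import torch
from sklearn.model_selection import GroupShuffleSplit

meta = pd.read_csv("HAM10000_metadata.csv")

# Group by lesion_id so every image of a lesion lands on one side of the
# split, preventing leakage between train and validation.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(meta, groups=meta["lesion_id"]))

# Inverse-frequency weights for the 7 imbalanced diagnosis classes.
counts = meta.iloc[train_idx]["dx"].value_counts().sort_index()
weights = torch.tensor((1.0 / counts).values, dtype=torch.float32)
criterion = torch.nn.CrossEntropyLoss(weight=weights / weights.sum())
```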
Compile to FHE
fhe_convert.py
- Load trained weights into QuantVGG11Patch
- Build calibration dataset from random validation patches
- Compile to an FHE circuit via Concrete-ML’s compile_brevitas_qat_model (sketched after this list)
- Optionally sweep rounding_threshold_bits (8→4) for speed/accuracy tradeoffs
- Outputs client.zip (quantization params + key-generation specs) and the server circuit
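The compile step maps onto Concrete-ML's real API roughly as follows; `model` and `calib` (a small array of representative validation patches) are placeholder names:

```python
from concrete.ml.torch.compile import compile_brevitas_qat_model
from concrete.ml.deployment import FHEModelDev

# Compile the trained Brevitas model into an FHE circuit, calibrating the
# quantization on representative validation patches of shape (N, 3, 32, 32).
quantized_module = compile_brevitas_qat_model(
    model,
    calib,
    rounding_threshold_bits=6,  # sweep 8 -> 4 to trade accuracy for speed
)

# Writes client.zip (quantization params + key specs) and the server circuit.
FHEModelDev(path_dir="deployment", model=quantized_module).save()
```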
Client encryption
Mobile app
- On first launch: generate FHE secret key + evaluation key via Concrete-ML
- Upload the evaluation key to the server with a random client ID for reuse
- For each scan: crop to 224×224, split into 49 patches (32×32)
- Quantize each patch per client.zip parameters, encrypt, and send one by one (see the sketch below)
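A hedged sketch of the client flow using Concrete-ML's deployment API (FHEModelClient and its methods are real; the patching helper, the stand-in `image`, and the transport call are illustrative):

```python
import numpy as np
from concrete.ml.deployment import FHEModelClient

client = FHEModelClient(path_dir="deployment", key_dir="keys")
client.generate_private_and_evaluation_keys()        # first launch only
eval_keys = client.get_serialized_evaluation_keys()  # uploaded once with the client ID

image = np.random.rand(3, 224, 224).astype(np.float32)  # stand-in for a cropped scan

def patches(image):
    # (3, 224, 224) array -> 49 patches of shape (1, 3, 32, 32) in a 7x7 grid.
    for i in range(7):
        for j in range(7):
            yield image[None, :, 32 * i:32 * (i + 1), 32 * j:32 * (j + 1)]

for patch in patches(image):
    payload = client.quantize_encrypt_serialize(patch)  # quantize + encrypt
    # send(payload, client_id)  # transport omitted; patches go one at a time
```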
Server inference
AWS EC2
- Receive 49 encrypted patches + client ID
- Load the evaluation key for that client
- Run the FHE circuit on each encrypted patch
- Return 49 encrypted classification vectors (7 logits each), as sketched below
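And the matching server side, again using Concrete-ML's real deployment classes with illustrative variable names:

```python
from concrete.ml.deployment import FHEModelServer

server = FHEModelServer(path_dir="deployment")
server.load()

# encrypted_patches: the 49 ciphertexts received for one client;
# eval_keys: that client's serialized evaluation keys, looked up by client ID.
encrypted_logits = [server.run(ct, eval_keys) for ct in encrypted_patches]
# 49 encrypted 7-logit vectors go back to the client, never decrypted here.
```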
Why every choice exists
Why patches instead of the full image?
FHE circuit complexity grows with input size. A 32×32 patch keeps the circuit small enough for practical inference times. The 7×7 grid covers the full 224×224 image.
Why 5-bit quantization?
Lower bit-width reduces the multiplicative depth of the encrypted circuit. 5 bits is the sweet spot where model accuracy is preserved while FHE computation remains feasible.
Why AvgPool instead of MaxPool?
Max requires comparison operations, which are extremely expensive in FHE (they require many encrypted multiplications). Average pooling uses only additions and a constant division.
Why a single linear classifier?
VGG11 normally has three fully-connected layers (4096, 4096, and 1000 neurons). Each encrypted multiplication adds latency. A single QuantLinear(512, 7) minimizes the FC overhead.
Why send patches one-by-one?
Each encrypted patch is large, and the evaluation key alone is already substantial. Sending patches sequentially keeps memory manageable on both client and server.
Why client-side aggregation?
The PatchAggregator takes the max logit per class across 49 patches. Max is a comparison — doing this in FHE would be expensive. In plaintext after decryption, it’s trivial.
Client-server trust model
Client (iOS app)
- Holds secret key — never leaves device
- Generates evaluation key (public, sent to server once)
- Encrypts patches, decrypts results
- Runs PatchAggregator in plaintext
- Stores scan history locally as JSON
Server (AWS EC2)
- Holds only evaluation keys and FHE circuit
- Cannot decrypt any patient data
- Processes encrypted patches through FHE circuit
- Returns encrypted classification vectors
- Stateless per-inference — no patient data stored
Key insight: Even a compromised server reveals nothing. Without the secret key, the encrypted patches and results are computationally indistinguishable from random noise.
Cryptography as a cooperation tool
FHE transforms the trust model for medical AI. Instead of asking patients to trust a server, Enclara lets them verify mathematically that their data was never exposed. This is what programmable cryptography unlocks — useful computation between parties who don't need to trust each other.