How Enclara runs AI on encrypted medical images
A deep dive into the cryptography, model design, and system architecture that make private diagnosis possible.
What is Fully Homomorphic Encryption?
FHE is a form of encryption that allows computation on ciphertext. The result, when decrypted, matches the result of performing the same operations on the plaintext — but no one except the key holder ever sees the unencrypted data.
Encrypt
Data is encrypted client-side with a secret key. It becomes mathematically opaque to anyone without the key.
Compute
The server performs additions and multiplications directly on the encrypted values. It processes data it cannot read.
Decrypt
The encrypted result is returned to the client. Only the secret key holder can decrypt the output.
Enc(x) + Enc(y) = Enc(x + y)
Enc(x) * Enc(y) = Enc(x * y)
The server sees ciphertext. The math still works. The patient sees results.
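The property is easiest to trust once you've seen it run. Below is a deliberately toy, insecure DGHV-style scheme over the integers, written only to show the algebra; the production scheme Enclara relies on (TFHE, via Concrete-ML) is far more sophisticated.

```python
import random

def keygen(bits=256):
    # Secret key: a large odd integer. (Real schemes use structured lattices.)
    return (1 << bits) | random.getrandbits(bits) | 1

def encrypt(sk, bit):
    # Ciphertext: bit + small even noise + a large multiple of the key.
    noise = 2 * random.randint(0, 2**16)
    return bit + noise + sk * random.getrandbits(512)

def decrypt(sk, ct):
    # mod sk strips the key multiple; mod 2 strips the even noise.
    return (ct % sk) % 2

sk = keygen()
for a, b in [(0, 1), (1, 1)]:
    ca, cb = encrypt(sk, a), encrypt(sk, b)
    assert decrypt(sk, ca + cb) == (a + b) % 2  # Enc(a) + Enc(b) decrypts to a XOR b
    assert decrypt(sk, ca * cb) == (a * b) % 2  # Enc(a) * Enc(b) decrypts to a AND b
```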
QuantVGG11Patch
A 5-bit quantized VGG11 variant purpose-built for FHE. Every layer is chosen to be compatible with encrypted arithmetic.
| Standard VGG11 | QuantVGG11Patch (FHE) | Why |
|---|---|---|
| Conv2d | QuantConv2d (5-bit weights) | Reduces multiplication depth |
| ReLU | QuantReLU (5-bit activations) | Bounded activations suit FHE polynomial approximation |
| MaxPool2d | AvgPool2d | Max is a comparison, which is expensive in FHE |
| FC layers (4096, 4096, 1000) | Single QuantLinear(512, 7) | Fewer parameters = faster encrypted inference |
| 224×224 input | 32×32 patch input | Smaller input = manageable FHE circuit size |
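As a rough illustration, a model in this shape can be declared with Brevitas, the quantization-aware-training library whose models Concrete-ML compiles. This is a hedged sketch: the layer sequence mirrors the table, but beyond the class name, none of the settings are confirmed as Enclara's exact code.

```python
import torch.nn as nn
import brevitas.nn as qnn

# VGG11 feature config; "A" marks an AvgPool2d where VGG11 has a MaxPool2d.
CFG = [64, "A", 128, "A", 256, 256, "A", 512, 512, "A", 512, 512, "A"]

class QuantVGG11Patch(nn.Module):
    def __init__(self, num_classes=7, bit_width=5):
        super().__init__()
        layers, in_ch = [qnn.QuantIdentity(bit_width=bit_width)], 3
        for v in CFG:
            if v == "A":
                layers.append(nn.AvgPool2d(2))  # additions only, FHE-friendly
            else:
                layers += [
                    qnn.QuantConv2d(in_ch, v, 3, padding=1,
                                    weight_bit_width=bit_width),
                    qnn.QuantReLU(bit_width=bit_width),  # bounded activations
                ]
                in_ch = v
        self.features = nn.Sequential(*layers)
        # Five 2x2 pools shrink a 32x32 patch to 1x1x512, so one small
        # classifier replaces VGG11's three large FC layers.
        self.classifier = qnn.QuantLinear(512, num_classes, bias=True,
                                          weight_bit_width=bit_width)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```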
PatchAggregator
The aggregation layer sits outside the FHE circuit. It takes the 49 per-patch classification vectors (each with 7 logits), computes the max logit per class across all patches, and returns the final diagnosis. This runs in plaintext on the client after decryption — no encrypted comparisons needed.
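A minimal sketch of that step, assuming `client` is a Concrete-ML FHEModelClient (the deployment object that appears in the pipeline below) and `encrypted_results` holds the 49 ciphertexts returned by the server:

```python
import numpy as np

def aggregate(client, encrypted_results):
    # Decrypt each encrypted result into a (1, 7) logit vector, stack them
    # into a (49, 7) array, take the max logit per class across patches,
    # and return the index of the winning class.
    logits = np.vstack([
        client.deserialize_decrypt_dequantize(ct) for ct in encrypted_results
    ])
    return int(logits.max(axis=0).argmax())
```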
From training to encrypted inference
Train
train.py
- Fine-tune QuantVGG11Patch on HAM10000 (10,015 dermoscopic images, 7 classes)
- Initialize conv layers from pretrained ImageNet VGG11 weights
- Split train/val by lesion_id to prevent data leakage between patients
- Inverse-frequency class weighting handles severe class imbalance (split and weighting sketched below)
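A sketch of the split and the weighting, assuming HAM10000's standard metadata CSV (lesion_id and dx are its real column names; everything else is illustrative):

```python
import pandas as pd
import torch
from sklearn.model_selection import GroupShuffleSplit

meta = pd.read_csv("HAM10000_metadata.csv")

# Group by lesion_id so every image of a lesion lands on one side of the
# split, preventing leakage between train and validation.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(meta, groups=meta["lesion_id"]))

# Inverse-frequency weights for the 7 imbalanced diagnosis classes.
counts = meta.iloc[train_idx]["dx"].value_counts().sort_index()
weights = torch.tensor((1.0 / counts).values, dtype=torch.float32)
criterion = torch.nn.CrossEntropyLoss(weight=weights / weights.sum())
```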
Compile to FHE
fhe_convert.py
- Load trained weights into QuantVGG11Patch
- Build calibration dataset from random validation patches
- Compile to an FHE circuit via Concrete-ML’s compile_brevitas_qat_model (sketched after this list)
- Optionally sweep rounding_threshold_bits (8→4) for speed/accuracy tradeoffs
- Outputs client.zip (quantization params + key-generation specs) and the server circuit
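The compile step maps onto Concrete-ML's real API roughly as follows; `model` and `calib` (a small array of representative validation patches) are placeholder names:

```python
from concrete.ml.torch.compile import compile_brevitas_qat_model
from concrete.ml.deployment import FHEModelDev

# Compile the trained Brevitas model into an FHE circuit, calibrating the
# quantization on representative validation patches of shape (N, 3, 32, 32).
quantized_module = compile_brevitas_qat_model(
    model,
    calib,
    rounding_threshold_bits=6,  # sweep 8 -> 4 to trade accuracy for speed
)

# Writes client.zip (quantization params + key specs) and the server circuit.
FHEModelDev(path_dir="deployment", model=quantized_module).save()
```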
Client encryption
Mobile app
- On first launch: generate FHE secret key + evaluation key via Concrete-ML
- Upload the evaluation key to the server with a random client ID for reuse
- For each scan: crop to 224×224, split into 49 patches (32×32)
- Quantize each patch per client.zip parameters, encrypt, and send one by one (see the sketch below)
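A hedged sketch of the client flow using Concrete-ML's deployment API (FHEModelClient and its methods are real; the patching helper, the stand-in `image`, and the transport call are illustrative):

```python
import numpy as np
from concrete.ml.deployment import FHEModelClient

client = FHEModelClient(path_dir="deployment", key_dir="keys")
client.generate_private_and_evaluation_keys()        # first launch only
eval_keys = client.get_serialized_evaluation_keys()  # uploaded once with the client ID

image = np.random.rand(3, 224, 224).astype(np.float32)  # stand-in for a cropped scan

def patches(image):
    # (3, 224, 224) array -> 49 patches of shape (1, 3, 32, 32) in a 7x7 grid.
    for i in range(7):
        for j in range(7):
            yield image[None, :, 32 * i:32 * (i + 1), 32 * j:32 * (j + 1)]

for patch in patches(image):
    payload = client.quantize_encrypt_serialize(patch)  # quantize + encrypt
    # send(payload, client_id)  # transport omitted; patches go one at a time
```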
Server inference
AWS EC2
- Receive 49 encrypted patches + client ID
- Load the evaluation key for that client
- Run the FHE circuit on each encrypted patch
- Return 49 encrypted classification vectors (7 logits each), as sketched below
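And the matching server side, again using Concrete-ML's real deployment classes with illustrative variable names:

```python
from concrete.ml.deployment import FHEModelServer

server = FHEModelServer(path_dir="deployment")
server.load()

# encrypted_patches: the 49 ciphertexts received for one client;
# eval_keys: that client's serialized evaluation keys, looked up by client ID.
encrypted_logits = [server.run(ct, eval_keys) for ct in encrypted_patches]
# 49 encrypted 7-logit vectors go back to the client, never decrypted here.
```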
Why every choice exists
Why patches instead of the full image?
FHE circuit complexity grows with input size. A 32×32 patch keeps the circuit small enough for practical inference times. The 7×7 grid covers the full 224×224 image.
Why 5-bit quantization?
Lower bit-width reduces the multiplicative depth of the encrypted circuit. 5 bits is the sweet spot where model accuracy is preserved while FHE computation remains feasible.
Why AvgPool instead of MaxPool?
Max requires comparison operations, which are extremely expensive in FHE (they require many encrypted multiplications). Average pooling uses only additions and a constant division.
Why a single linear classifier?
VGG11 normally has three fully-connected layers (4096, 4096, and 1000 neurons). Each encrypted multiplication adds latency. A single QuantLinear(512, 7) minimizes the FC overhead.
Why send patches one-by-one?
Each encrypted patch is large, and the evaluation key alone is already substantial. Sending patches sequentially keeps memory manageable on both client and server.
Why client-side aggregation?
The PatchAggregator takes the max logit per class across 49 patches. Max is a comparison — doing this in FHE would be expensive. In plaintext after decryption, it’s trivial.
Client-server trust model
Client (iOS app)
- Holds secret key — never leaves device
- Generates evaluation key (public, sent to server once)
- Encrypts patches, decrypts results
- Runs PatchAggregator in plaintext
- Stores scan history locally as JSON
Server (AWS EC2)
- Holds only evaluation keys and FHE circuit
- Cannot decrypt any patient data
- Processes encrypted patches through FHE circuit
- Returns encrypted classification vectors
- Stateless per-inference — no patient data stored
Key insight: Even a compromised server reveals nothing. Without the secret key, the encrypted patches and results are computationally indistinguishable from random noise.
Cryptography as a cooperation tool
FHE transforms the trust model for medical AI. Instead of asking patients to trust a server, Enclara lets them verify mathematically that their data was never exposed. This is what programmable cryptography unlocks — useful computation between parties who don't need to trust each other.