01 · A de-identification engine for clinical Arabic · 31.7±0.4 ms / record

Unlock your hospital's data for research — safely.

Saudi Arabia is investing billions in health AI and genomics under Vision 2030. But PDPL requires patient data to be anonymised before it leaves the wall — to researchers, AI vendors, or partners abroad. We make that step fast, accurate, and auditable, running entirely on your own infrastructure.

Saudi Arabia's Personal Data Protection Law (PDPL), in force since September 2023, classifies all health data as sensitive. Sharing it with researchers, AI vendors, or partners abroad without proper de-identification carries fines of up to SAR 3 million and, for wilful breach, custodial sentences.

Vision 2030 commits over USD 65 billion to digital health, genomics, and clinical AI. Hospitals are expected to feed national datasets, partner with AI labs, and publish research, yet most have no operational way to scrub Arabic clinical text at scale.

The result is a quiet bottleneck: data sits in EHRs, projects stall in legal review, and external collaborators wait. We built VeilHealth to remove this single, narrow blockage.

SAR 3M

Maximum PDPL penalty

Per breach involving sensitive personal data. Custodial sentences apply for wilful disclosure.

USD 65B

Vision 2030 health-tech commitment

Across genomics, AI, and digital infrastructure programmes — all upstream of de-identified data.

72%

Current Arabic recall, MSA

Honest baseline. Improving monthly. We always recommend a human-in-the-loop review pass.

0 bytes

Leave your network

Runs entirely on your infrastructure — VM, container, or air-gapped appliance. No cloud egress, ever.

VeilHealth ingests free-text notes, scanned PDFs, structured fields, and DICOM metadata. It detects and redacts personally identifying entities — patient names, national IDs, MRNs, phone numbers, addresses, dates of admission and discharge, family relations — across both Modern Standard Arabic and English, including dialectal spellings common in Gulf clinical notes.
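To make the detect-and-redact idea concrete, here is a minimal rule-based sketch in Python. It is not VeilHealth's detection method (the product is described as built on an adapted, re-trained privacy filter); this sketch swaps in plain regular expressions. The patterns, the label names, and the `redact` function are all hypothetical, and the ID, phone, and MRN formats are simplified assumptions.

```python
import re

# Map Arabic-Indic digits to ASCII digits. The mapping is one character to one
# character, so match offsets in the normalised text line up with the original.
ARABIC_INDIC = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")

# Hypothetical, simplified patterns (assumptions for illustration only).
PATTERNS = [
    ("PHONE", re.compile(r"(?<!\d)(?:\+?966|0)5\d{8}(?!\d)")),
    ("NATIONAL_ID", re.compile(r"(?<!\d)[12]\d{9}(?!\d)")),
    ("MRN", re.compile(r"MRN[:\s-]*\d{5,10}")),
]


def redact(text):
    """Return (redacted_text, spans), where spans are (start, end, label)."""
    normalised = text.translate(ARABIC_INDIC)
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(normalised):
            spans.append((match.start(), match.end(), label))
    # Earliest match first; on a tie, prefer the longest match.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))

    pieces, kept, cursor = [], [], 0
    for start, end, label in spans:
        if start < cursor:  # overlaps a span that was already redacted
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        kept.append((start, end, label))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), kept


if __name__ == "__main__":
    clean, found = redact("ID 1023456789, tel 0551234567, MRN: 445566")
    print(clean)  # ID [NATIONAL_ID], tel [PHONE], [MRN]
```

Rules like these only catch well-formatted identifiers. Names, addresses, and family relations in free text need a trained model, which is why the recall figures quoted on this page matter.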

Every redaction event writes to a tamper-evident log. An optional encrypted re-identification map, accessible only to roles you nominate, allows authorised staff to relink records when a regulator, IRB, or treating physician requires it.

Demo · raw input (before) and redacted output (after)

A · Coverage

19 PII categories, two languages.

Names, IDs, MRNs, contact details, dates, addresses, providers, family members. Tuned for MSA and Gulf clinical idioms.

B · Format

PDFs, free text, DICOM, FHIR.

OCR for scanned Arabic forms. JSON + DOCX + PDF in, same out, with structure preserved.
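As an illustration of "same out, with structure preserved" for JSON input, the sketch below masks a handful of identifying elements of a FHIR Patient resource while keeping every key and nesting level in place. The choice of elements, the `[REDACTED]` placeholder, and the function names are assumptions made for this example, not VeilHealth's policy. It only looks at top-level elements and masks every leaf value beneath them.

```python
# FHIR Patient elements that carry direct identifiers.
# This is an illustrative subset, not a complete policy.
IDENTIFYING_KEYS = {"name", "identifier", "telecom", "address", "birthDate"}


def mask(value):
    """Replace every leaf value with a placeholder, keeping the shape."""
    if isinstance(value, dict):
        return {key: mask(inner) for key, inner in value.items()}
    if isinstance(value, list):
        return [mask(inner) for inner in value]
    return "[REDACTED]"


def redact_resource(resource):
    """Return a new dict; the input resource is left untouched."""
    return {
        key: mask(value) if key in IDENTIFYING_KEYS else value
        for key, value in resource.items()
    }


if __name__ == "__main__":
    patient = {
        "resourceType": "Patient",
        "gender": "female",
        "name": [{"family": "Example", "given": ["Test"]}],
        "birthDate": "1980-01-01",
    }
    print(redact_resource(patient))
```

Because the output has the same keys and nesting as the input, downstream FHIR tooling can still parse it. A production policy would also need to consider the resource `id`, extensions, and free-text narrative.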

C · Integrity

Tamper-evident audit trail.

Every detection, redaction, and re-identification request is logged and hashed. Exportable to your SIEM.

D · Sovereignty

Open source, on your kit.

Apache-2.0 core. Deploys via Docker, Kubernetes, or RHEL appliance. No telemetry, no phone-home.

04 · Workflow

Three steps. No data leaves the wall.

Step 01 · Ingest ↓

Upload or stream.

Drop a folder, a file, or pipe a live HL7 / FHIR feed into the connector. Files never leave your network — VeilHealth runs as a container alongside your existing systems.

Step 02 · Detect & redact ⊘

Identify, redact, log.

19 entity classes recognised across Arabic + English. Each redaction writes to a hashed audit ledger with the analyst's identity, timestamp, and policy rule that triggered it.
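One common way to build a hashed audit ledger is a hash chain, where each entry stores the hash of the entry before it. The sketch below is a minimal version under that assumption; the source does not say which scheme VeilHealth uses, and the field names, the all-zero genesis value, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger, analyst, timestamp, rule, record_id):
    """Append an entry that commits to the previous entry's hash."""
    body = {
        "analyst": analyst,
        "timestamp": timestamp,
        "rule": rule,
        "record_id": record_id,
        "prev_hash": ledger[-1]["hash"] if ledger else GENESIS,
    }
    ledger.append({**body, "hash": entry_hash(body)})


def verify(ledger):
    """Return True only if every entry is intact and correctly linked."""
    prev = GENESIS
    for entry in ledger:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if body["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_entry(ledger, "analyst-01", "2026-01-01T09:00:00Z", "rule-national-id", "rec-1")
    append_entry(ledger, "analyst-01", "2026-01-01T09:00:01Z", "rule-phone", "rec-1")
    print(verify(ledger))   # True
    ledger[0]["rule"] = "edited-after-the-fact"
    print(verify(ledger))   # False
```

A chain like this detects edits to past entries, but only if the latest hash is also held somewhere the editor cannot reach. Exporting it to a separate system, such as a SIEM, is one way to do that.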

Step 03 · Export →

De-identified out.

Receive the clean dataset, plus an encrypted re-identification map kept on your premises. Authorised staff with a key — and a logged reason — can relink when a regulator or IRB requires it.
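The relinking rule described here (an authorised role plus a logged reason) can be sketched as follows. This is an assumption-laden illustration, not VeilHealth's design: the class, the HMAC-based token format, and the role names are hypothetical. Encryption of the map at rest is deliberately left out, since Python's standard library has no authenticated symmetric cipher and a real deployment would use a vetted cryptography library.

```python
import hashlib
import hmac


def pseudonym(secret_key, label, value):
    """Derive a stable token from a value with a keyed hash (HMAC-SHA256)."""
    message = f"{label}:{value}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return f"{label}_{digest[:12]}"


class ReidentificationMap:
    """In-memory token-to-value map with role checks and an access log."""

    def __init__(self, secret_key, authorised_roles):
        self._key = secret_key
        self._roles = set(authorised_roles)
        self._map = {}
        self.access_log = []

    def tokenise(self, label, value):
        token = pseudonym(self._key, label, value)
        self._map[token] = value
        return token

    def relink(self, token, role, reason):
        # Every attempt is logged, whether or not it is granted.
        granted = role in self._roles and bool(reason.strip())
        self.access_log.append(
            {"token": token, "role": role, "reason": reason, "granted": granted}
        )
        if not granted:
            raise PermissionError("relink needs an authorised role and a reason")
        return self._map[token]


if __name__ == "__main__":
    remap = ReidentificationMap(b"demo-key-not-for-production", {"dpo"})
    token = remap.tokenise("MRN", "445566")
    print(token.startswith("MRN_"))                    # True
    print(remap.relink(token, "dpo", "IRB request"))   # 445566
```

The keyed hash means the same identifier always maps to the same token, so records for one patient stay linkable in the exported dataset, while anyone without the key cannot recompute the tokens from guessed values.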

05 · Use cases

Built for the work that doesn't ship today.

Research data sharing

Send IRB-approved cohorts to academic partners.

Run the redaction across thousands of patient records, attach the de-identification certificate to the data transfer agreement, and ship without a six-month legal review.

AI / ML vendor onboarding

Hand a vendor real data — without handing them PHI.

Whether the vendor is a domestic startup or a global cloud, you keep the original records and a re-identification key. They get only what they need to train.

Audit & regulator response

Demonstrate de-identification provenance on demand.

When SDAIA or a Ministry of Health auditor asks how a record was anonymised, hand them the hashed log entry, the policy version, and the human reviewer's signature.

Cross-border collaboration

Move data abroad inside the PDPL fence.

Anonymised data is exempt from PDPL's cross-border restrictions. VeilHealth's output, paired with the audit trail, is what your data-protection officer signs off on.

06 · Honest limits

What VeilHealth is not.

Compliance buyers prefer to read the limits before the features. We agree.

What it does

Detects and redacts 19 PII classes in Arabic + English clinical text.
Runs entirely on your infrastructure — Docker, K8s, or appliance.
Produces a hashed, exportable audit trail per record.
Supports OCR for scanned forms in Arabic and English.
Provides an encrypted re-identification map for authorised use.

What it doesn't do

Replace your DPO or legal-compliance review of data transfers.
Guarantee 100% recall — current MSA Arabic recall is ~72%, English ~96%. We recommend a human-in-the-loop review pass.
De-identify imaging pixel data (DICOM headers only — pixel-level redaction is on the 2026 roadmap).
Send any data to external services. There is no SaaS mode.
Provide ML training itself — we anonymise; your team trains.
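To see why the review pass matters, it helps to turn the recall figures above into counts. Recall is the share of true identifiers that the system finds: true positives divided by true positives plus false negatives. The short sketch below applies the quoted rates to a hypothetical corpus containing 1,000 identifiers; the corpus size and function names are assumptions for illustration.

```python
def recall(true_positives, false_negatives):
    """Share of real identifiers that were detected."""
    return true_positives / (true_positives + false_negatives)


def expected_missed(total_identifiers, recall_rate):
    """Identifiers expected to slip through at a given recall rate."""
    return round(total_identifiers * (1.0 - recall_rate))


if __name__ == "__main__":
    for language, rate in [("MSA Arabic", 0.72), ("English", 0.96)]:
        print(language, expected_missed(1000, rate))
    # MSA Arabic 280
    # English 40
```

At the quoted rates, roughly 280 of every 1,000 Arabic identifiers and 40 of every 1,000 English identifiers would reach the reviewer undetected, so the human pass is a required control rather than an optional one.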

07 · Provenance

Open. Auditable. Built for this region.

Foundation

OpenAI Privacy Filter

Adapted and re-trained for clinical Arabic and Saudi healthcare entity formats.

Code

Apache-2.0 on GitHub

Inspectable, forkable, no proprietary blob. Read every line that touches your patients' data.

Compliance

PDPL · UAE PDPL · GCC alignment

Designed against the Saudi PDPL and the UAE Federal Data Protection Law. Mappable to the definition of pseudonymisation in GDPR Article 4(5).

Sovereignty

Air-gap capable

No outbound calls, no telemetry. Runs in fully isolated networks where many hospital workloads have to live.

08 · Pilot programme

Ten hospitals. Eight weeks. One de-identified dataset.

We're onboarding ten partners for the Q3 2026 cohort. Each pilot includes deployment support, a tailored MSA fine-tune on your own corpus, a PDPL impact assessment, and direct access to the engineering team. No fee during the pilot window.