Security

Security by design, not optional

The LMbox architecture is designed first and foremost so that your data never leaves your premises. Here is how, concretely.

🏢

100% on-premise

No outbound connections by default. The Box operates within your LAN, air-gappable if needed. Your data NEVER transits through a third-party service.

🔐

End-to-end encryption

Internal TLS, LUKS disk encryption with TPM 2.0, Postgres encrypted at rest, encrypted Restic backups. The keys stay with you, never with us.

👤

Enterprise SSO

Native integration with Active Directory, Azure AD, Okta and Google Workspace via OIDC. Authentik sits in front for fine-grained, group-based authorization.

📋

Audit logs

Every interaction is traced and signed: who asked what, when, to which model, and with which sources. Retention is configurable to match your policy.
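
The fields listed above map naturally onto a small record type. The sketch below is illustrative only: the field names, the JSON canonicalization and the use of HMAC-SHA256 as the signing scheme are assumptions, not LMbox's actual log format (a production system might use asymmetric signatures instead).

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry: who asked what, when, to which model, with which sources."""
    user: str                      # who (e.g. the SSO subject)
    prompt: str                    # what was asked
    timestamp: str                 # when (ISO 8601, UTC)
    model: str                     # which model answered
    sources: tuple[str, ...] = ()  # which sources were retrieved

    def canonical(self) -> bytes:
        # Sorted keys + compact separators give one stable byte string to sign.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: AuditRecord, key: bytes) -> str:
    # Assumed scheme: HMAC-SHA256 over the canonical bytes.
    return hmac.new(key, record.canonical(), hashlib.sha256).hexdigest()


def verify(record: AuditRecord, signature: str, key: bytes) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(record, key), signature)
```

Any change to a signed field (for example the prompt text) makes `verify` return `False` for the original signature.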

Architecture

One Box, multiple layers

┌──────────────────────────────────────────────────────┐
│  Internet                              ❌ no access  │
└──────────────────────────────────────────────────────┘
                         │
                ┌────────┴────────┐
                │  LAN firewall   │   Inbound port 443 only
                └────────┬────────┘
                         │
       ┌─────────────────┴─────────────────┐
       │  Reverse proxy + TLS (Caddy)      │   Enterprise CA certificate
       └─────────────────┬─────────────────┘
                         │
       ┌─────────────────┴─────────────────┐
       │  Authentik SSO   (OIDC to AD)     │   Authentication + authorization
       └─────────────────┬─────────────────┘
                         │
   ┌──────────────┬──────┴──────┬──────────────┐
   │ Open WebUI   │ LiteLLM     │ lmbox-rag    │
   │ (chat UI)    │ (API gw)    │ (RAG, KB)    │
   └──────┬───────┴──────┬──────┴──────┬───────┘
          │              │             │
          └──────────────┴─────────────┘
                         │
                ┌────────┴────────┐
                │  Ollama         │   Local models
                │  (Gemma 4 ...)  │   Ryzen 9 7950X CPU / RTX Pro 6000 GPU
                └─────────────────┘

  Postgres (encrypted at rest, LUKS + TPM 2.0)
  Centralized audit logs (configurable retention)
  Encrypted Restic backups to an internal NAS

Comparison

LMbox vs cloud generative AI services

Criterion	LMbox (on-prem)	ChatGPT / Claude / Gemini cloud
Where does your data live?	In your datacenter	Vendor US/EU datacenters
Data in transit	Internal LAN, TLS	Internet to public cloud
Data at rest	Your disks, your keys	Vendor disks, their keys
Enterprise SSO	Native (OIDC/SAML)	Depends on plan
Encryption keys	Under your control	Managed by vendor
Audit logs	Complete, yours	Partial, their scope
GDPR setup	Internal processing (your own Article 6 basis)	Vendor as processor + DPA (Article 28)
Breach notification	Handled by your own incident process	Depends on the vendor's notification SLA
Vendor lock-in	None, you keep everything	High

Compliance

Our compliance roadmap

🇪🇺

GDPR

Compliant by design: no transfers outside the EU and no subprocessing by default.

Compliant

🛡️

SOC 2 Type 1

Audit process starting Q3 2026 via Vanta; report expected in Q1 2027.

In progress

🌐

ISO 27001

Aligned with ISO 27001 controls since v1.0. Certification targeted for Q3 2027.

Aligned

🏥

HDS (Health Data Hosting)

`lmbox-health` mode available: hardened configuration and enhanced audit logs. Certification targeted for Q4 2027.

Available

⚖️

EU AI Act readiness

Documentation compliant with AI Act requirements, transparency about the models used, and human oversight by default.

Compliant

Anti-hallucination

Four deterministic layers stop the agent from making things up.

"What if the AI just makes stuff up?" is the question we hear from law firms and compliance officers. The honest answer isn't "our model is better": all LLMs hallucinate, from Mistral-7B to GPT-4. The LMbox answer is architectural: four independent layers, each able to block a hallucination on its own, applied in parallel to every agent in the catalogue.

Layer A - Citation Verifier

Every reference the agent emits (Cour de cassation ruling, Code article, statute, decree, EU regulation, case-file exhibit) is extracted by 11 regex families and checked live against Légifrance and EUR-Lex. Reference not found → CRITICAL. Malformed reference ("12 jav 2024") → CRITICAL. Coverage: 11 families, 7 French codes mapped, production-grade LRU cache and rate limiting.

LMbox command: `lmbox agent verify`
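
A minimal sketch of what one citation family could look like, assuming a simplified pattern for Cour de cassation rulings. The `lookup` callable is a hypothetical stand-in for the live Légifrance query (no real Légifrance API call is shown), `functools.lru_cache` illustrates the cache layer, and the `malformed_reference` reason name is invented for the example. The actual 11 families and their rules are not reproduced here.

```python
import re
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, List

FRENCH_MONTHS = {
    "janvier", "février", "mars", "avril", "mai", "juin", "juillet",
    "août", "septembre", "octobre", "novembre", "décembre",
}

# One illustrative family: Cour de cassation rulings such as
# "Cass. com., 12 janvier 2024, n° 22-15.487" (simplified pattern).
CASSATION_RE = re.compile(
    r"Cass\.\s+(?P<chamber>[^,]+),\s*"
    r"(?P<day>\d{1,2})(?:er)?\s+(?P<month>\w+)\s+(?P<year>\d{4}),\s*"
    r"n°\s*(?P<number>\d{2}-\d{2}\.\d{3})"
)


@dataclass(frozen=True)
class Finding:
    citation: str
    severity: str  # "OK" or "CRITICAL"
    reason: str


def make_verifier(lookup: Callable[[str], bool], cache_size: int = 4096):
    """`lookup(pourvoi_number) -> bool` stands in for the live Légifrance query."""
    cached_lookup = lru_cache(maxsize=cache_size)(lookup)  # repeated citations hit the cache

    def verify(text: str) -> List[Finding]:
        findings = []
        for m in CASSATION_RE.finditer(text):
            citation = m.group(0)
            day_ok = 1 <= int(m.group("day")) <= 31
            if m.group("month").lower() not in FRENCH_MONTHS or not day_ok:
                # Shape matched but the date is wrong, e.g. "12 jav 2024".
                findings.append(Finding(citation, "CRITICAL", "malformed_reference"))
            elif not cached_lookup(m.group("number")):
                findings.append(Finding(citation, "CRITICAL", "external_not_found"))
            else:
                findings.append(Finding(citation, "OK", "found"))
        return findings

    return verify
```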

Layer B - Runtime Guard

A real-time guardrail on the LLM output stream: as soon as a hallucinated citation is complete in the output, generation is cancelled (`strict`), annotated inline (`annotate`) or only logged for observability (`warn`). There is no need to wait for the brief to finish to find the error: the agent is stopped 200 ms after the bad token.

LMbox command: `lmbox agent run --guard strict`
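
The streaming behaviour can be sketched as a generator wrapping the model's chunk stream. This is not the `--guard` implementation; it assumes a simplified Cassation-only citation pattern, checks a citation once it is followed by 40 characters (the threshold given for Layer B in the concrete example), and takes a hypothetical `exists` callable in place of the Layer A lookup. The three modes mirror `strict`, `annotate` and `warn`.

```python
import logging
import re
from typing import Callable, Iterable, Iterator

log = logging.getLogger("runtime_guard")

# Simplified citation pattern (Cassation rulings only), for illustration.
CITATION_RE = re.compile(
    r"Cass\.\s+[^,]+,\s*\d{1,2}(?:er)?\s+\w+\s+\d{4},\s*n°\s*\d{2}-\d{2}\.\d{3}"
)
TRAILING_CHARS = 40  # verify once a citation is followed by this many characters


class GuardTripped(Exception):
    """Raised in strict mode to cancel generation."""


def guarded_stream(chunks: Iterable[str], exists: Callable[[str], bool],
                   mode: str = "strict") -> Iterator[str]:
    if mode not in {"strict", "annotate", "warn"}:
        raise ValueError(f"unknown mode: {mode}")
    buffer = ""
    checked = set()  # start offsets of citations already verified

    def react(final: bool) -> Iterator[str]:
        for m in CITATION_RE.finditer(buffer):
            if m.start() in checked:
                continue
            if not final and len(buffer) - m.end() < TRAILING_CHARS:
                continue  # the citation may still be growing
            checked.add(m.start())
            citation = m.group(0)
            if exists(citation):
                continue
            if mode == "strict":
                raise GuardTripped(citation)
            if mode == "annotate":
                yield f" [UNVERIFIED CITATION: {citation}] "
            else:  # "warn": observability only, output untouched
                log.warning("unverified citation: %s", citation)

    for chunk in chunks:
        buffer += chunk
        yield chunk
        yield from react(final=False)
    yield from react(final=True)  # flush citations near the end of the stream
```

In strict mode the consumer stops receiving chunks as soon as `GuardTripped` propagates, so the rest of the brief is never generated or displayed.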

Layer C - Structured Output

Every agent declares a JSON Schema (draft 2020-12) in its manifest. The output is validated and, if invalid, the model is re-prompted (up to 2 retries). A 7-rule linter catches schema design bugs at authoring time, before deployment. The result: no more shape drift between agents and downstream pipelines.

LMbox command: `lmbox agent lint-schema --strict`
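
The validate-and-re-prompt loop can be sketched as follows. To stay dependency-free, `validate` implements only a tiny subset of JSON Schema (required keys and the types of top-level properties); a real deployment would use a complete draft 2020-12 validator. `call_model` is a hypothetical stand-in for the LLM call, and the repair-prompt wording is invented.

```python
import json
from typing import Callable, List

# Tiny subset of JSON Schema for illustration: "required" plus the "type" of
# top-level properties. A real deployment would use a full draft 2020-12 validator.
_TYPES = {"string": str, "array": list, "object": dict, "integer": int, "boolean": bool}


def validate(instance, schema: dict) -> List[str]:
    if not isinstance(instance, dict):
        return ["root: expected an object"]
    errors = [f"missing required property '{k}'"
              for k in schema.get("required", []) if k not in instance]
    for key, sub in schema.get("properties", {}).items():
        expected = _TYPES.get(sub.get("type")) if isinstance(sub.get("type"), str) else None
        value = instance.get(key)
        if key in instance and expected is not None and (
            not isinstance(value, expected) or (expected is int and isinstance(value, bool))
        ):
            errors.append(f"'{key}': expected {sub['type']}")
    return errors


def generate_structured(call_model: Callable[[str], str], prompt: str, schema: dict,
                        max_retries: int = 2) -> dict:
    """Validate the model output; on failure, re-prompt with the errors (up to max_retries)."""
    current_prompt, errors = prompt, []
    for _ in range(max_retries + 1):
        raw = call_model(current_prompt)
        try:
            data = json.loads(raw)
            errors = validate(data, schema)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        if not errors:
            return data
        # Repair loop: show the model what was wrong and ask again.
        current_prompt = (prompt + "\n\nYour previous answer was invalid:\n- "
                          + "\n- ".join(errors) + "\nReturn only JSON matching the schema.")
    raise ValueError(f"output still invalid after {max_retries} retries: {errors}")
```

Feeding the concrete validation errors back, rather than simply retrying, gives the model a specific target for the repair.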

Layer D - Source Grounding

Every identifier cited in the output (`document_id`, `protocol_id`, `source_id`) MUST come from a tool call made in the current turn. If the agent cites `interne-2019-453` without a call to `search_dossiers_internes` having returned it, the output is rejected. This is the architectural layer that makes an invented source structurally impossible.

LMbox command: `lmbox agent check-grounding`
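
The grounding rule reduces to a set difference: collect every identifier the output cites, collect every identifier returned by this turn's tool calls, and reject whatever is left over. The `ToolCall` shape, the `ungrounded` helper and the recursive walk over `document_id` / `protocol_id` / `source_id` fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Set

ID_FIELDS = ("document_id", "protocol_id", "source_id")


@dataclass
class ToolCall:
    name: str               # e.g. "search_dossiers_internes"
    returned_ids: Set[str]  # identifiers present in the tool's result this turn


def cited_ids(output: Any) -> Iterator[str]:
    """Walk a structured output and yield every value stored under an ID field."""
    if isinstance(output, dict):
        for key, value in output.items():
            if key in ID_FIELDS and isinstance(value, str):
                yield value
            else:
                yield from cited_ids(value)
    elif isinstance(output, list):
        for item in output:
            yield from cited_ids(item)


def ungrounded(output: Any, tool_calls: List[ToolCall]) -> List[str]:
    """Identifiers cited in the output that no tool call in this turn returned."""
    grounded = set().union(*(call.returned_ids for call in tool_calls))
    return sorted(set(cited_ids(output)) - grounded)
```

A non-empty result means at least one source was never returned by a tool, so the answer is rejected.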

Concrete example

The agent invents « Cass. Com., 12 janvier 2024, n° 22-15.487 »

This ruling does not exist, yet its form is flawless: valid court, valid month, valid pourvoi (appeal) number format. Few human readers would catch it in 30 pages of written submissions (conclusions). Here is how each of the 4 layers catches it, independently:

A - Layer A queries Légifrance live and finds nothing → CRITICAL: external_not_found. That alone is enough to fail the golden test suite and block the deployment.
B - Layer B picks up Layer A's verdict during streaming: as soon as the citation is complete and followed by 40 trailing characters, the verifier queries Légifrance. On NOT_FOUND, generation stops. The lawyer never reads the tainted brief.
C - Layer C requires (in the Conclusions Drafter schema) that any case law cited in the markdown also appears in `cited_jurisprudence[]` with a `source_id`. An inline citation without a matching metadata entry → schema invalid, repair loop, hard fail if not corrected.
D - Layer D verifies that the claimed `source_id` came from a tool call in this turn. Without a call to `search_jurisprudence` that returned that identifier, the `source_id` is structurally false → blocked.
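
Layer C's cross-check in this example can be sketched as a comparison between the pourvoi numbers cited inline in the markdown and the entries of `cited_jurisprudence[]`. Only `cited_jurisprudence` and `source_id` come from the text; the `markdown` and `number` field names and the simplified pourvoi pattern are hypothetical.

```python
import re
from typing import List

# Simplified: pourvoi numbers such as "n° 22-15.487" stand for cited rulings.
POURVOI_RE = re.compile(r"n°\s*(\d{2}-\d{2}\.\d{3})")


def unlisted_citations(doc: dict) -> List[str]:
    """Rulings cited inline in doc["markdown"] that lack a cited_jurisprudence entry with a source_id."""
    listed = {entry.get("number")
              for entry in doc.get("cited_jurisprudence", [])
              if entry.get("source_id")}
    inline = set(POURVOI_RE.findall(doc.get("markdown", "")))
    return sorted(inline - listed)
```

A non-empty list marks the output as schema-invalid, which triggers the repair loop.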

11

citation families verified (Cour de cassation, courts of appeal, Conseil d'État, Conseil constitutionnel, Code articles, statutes, decrees, EU regulations…)

Légifrance + EUR-Lex

live verification with an LRU cache, token-bucket rate limiting and exponential-backoff retries

228

automated tests on the anti-hallucination stack (extractor, lookups, streaming, CLI dispatch)
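
Token-bucket rate limiting and exponential-backoff retries are standard techniques; a generic sketch follows. The rate, capacity, retry count and base delay are illustrative values, and treating `OSError` (which includes `ConnectionError` and `TimeoutError`) as the retryable error class is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Allows bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def with_backoff(fn: Callable[[], T], retries: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` on transient errors, waiting base_delay * 2**attempt between tries."""
    attempt = 0
    while True:
        try:
            return fn()
        except OSError:  # assumption: network failures surface as OSError subclasses
            if attempt >= retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

The bucket protects the public endpoints from bursts, while the backoff spaces out retries after transient failures instead of hammering the service.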

Detailed architecture: the 4 ADRs (002 → 005) published on GitHub describe the design, trade-offs, failure modes and code references.


Chained audit log

Every portal action is cryptographically chained.

Each LMbox audit-log entry is hashed with SHA-256 together with the hash of the previous entry. The CISO can prove to a SOC 2 auditor or to the CNIL that no line has been deleted, modified or inserted, including by an LMbox administrator.

Chaining formula

chain_hash[N] = SHA-256(
    chain_hash[N-1]
  || canonical(payload[N])
)

genesis = SHA-256(
    "lmbox.ai/audit-chain/v1"
  || "customer=" + customer.id
  || "created=" + customer.created_at
)
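
The chaining formula translates almost line for line into Python. What the formula leaves open is assumed here: `canonical()` is taken to be sorted-key compact JSON, `||` is byte concatenation, and the customer identifier and creation date are plain strings. `verify_chain` returns the index of the first entry that no longer matches.

```python
import hashlib
import json
from typing import List, Optional


def canonical(payload: dict) -> bytes:
    # Assumed canonical form: sorted-key, compact JSON encoded as UTF-8.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def genesis(customer_id: str, created_at: str) -> bytes:
    # genesis = SHA-256("lmbox.ai/audit-chain/v1" || "customer=" + id || "created=" + created_at)
    return hashlib.sha256(
        b"lmbox.ai/audit-chain/v1"
        + b"customer=" + customer_id.encode("utf-8")
        + b"created=" + created_at.encode("utf-8")
    ).digest()


def chain_hashes(genesis_hash: bytes, payloads: List[dict]) -> List[bytes]:
    # chain_hash[N] = SHA-256(chain_hash[N-1] || canonical(payload[N]))
    hashes, prev = [], genesis_hash
    for payload in payloads:
        prev = hashlib.sha256(prev + canonical(payload)).digest()
        hashes.append(prev)
    return hashes


def verify_chain(genesis_hash: bytes, payloads: List[dict],
                 stored: List[bytes]) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None if intact."""
    if len(payloads) != len(stored):
        raise ValueError("each payload needs exactly one stored hash")
    prev = genesis_hash
    for i, (payload, stored_hash) in enumerate(zip(payloads, stored)):
        if hashlib.sha256(prev + canonical(payload)).digest() != stored_hash:
            return i
        prev = stored_hash
    return None
```

A design note: a hash chain proves integrity relative to a trusted reference. Someone able to rewrite every row could also recompute every later hash, and truncating the tail leaves the remaining prefix valid, so the latest hash (or periodic checkpoints) needs to be kept somewhere the audited party cannot change.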

Insertion / deletion detection

Modifying, deleting or inserting a line breaks the chain: every subsequent entry becomes invalid at the next `verify_chain` run.

Per-customer genesis

Each tenant has its own chain, and two customers never share the same prefix. No cross-tenant leakage is possible.

Verifiable in one click

The CISO clicks "Verify chain" ("Vérifier la chaîne") in the portal. LMbox re-walks the N entries in a few seconds and displays a green or red banner that can be presented as evidence.

See the audit chain live: open the public demo, click "Journal" in the portal, then "Verify chain". 200 chained entries verified in 3 seconds.


Connectors and trust boundary

Is connecting SharePoint, Salesforce, Jira to LMbox a sovereignty flaw?

No, but this is a topic where precision matters. LMbox does not move your data: it reads it where you already store it. Storage sovereignty depends on the SaaS vendor you chose before LMbox came along. What LMbox provides is sovereignty over the AI model, the indexing and the audit trail, and no one else can claim that for your existing data.

The 4 layers of an AI system - who controls what

Layer	Question	Who controls
1. Storage	Where are the source documents?	Your earlier choice (M365, Salesforce, on-prem, …)
2. Indexing / RAG	Who reads and embeds them?	LMbox - local, on-prem, on the box
3. AI inference	Where does the model run?	LMbox - local model, never an external API
4. Audit	Who keeps the trail?	LMbox - verifiable SHA-256 chain

Three typical scenarios

Sovereignty: 7/10

Pure-SaaS connectors (Salesforce, HubSpot, Notion, Slack, Drive, Teams)

Your data already lives in the vendor's cloud, a choice made before LMbox. LMbox reads it via OAuth and processes it locally, without moving the source data. Sovereignty is on the AI side, not the storage side.

Sovereignty: 8/10

Hybrid connectors on cloud (SharePoint Online, Confluence Cloud, Jira Cloud)

The vendor offers a self-hosted edition, but you chose the cloud one. LMbox still reads and processes the data locally. If storage sovereignty becomes critical, migrate to the self-hosted edition: LMbox supports it without changes.

Sovereignty: 10/10

Hybrid connectors self-hosted (SharePoint Server, Confluence DC, GitLab self-managed)

A fully on-prem stack, end to end: no data leaves your datacenter. For SecNumCloud, defence or strict HDS requirements, this is defensible to an auditor with no caveats.

The 7 technical controls already in place

Credentials encrypted at rest

Connector OAuth tokens are stored using Rails 8 attribute encryption (AES-GCM, with the key kept outside the database), so a Postgres dump never reveals a usable token.

Scrubbing after push

Once the credential reaches the box, the cloud-side `credentials` sub-field is wiped. Only its digest is kept, for future rotation; the plaintext value is gone.

Every access traced in the audit chain

Every read of a SharePoint document creates an entry in the SHA-256 audit chain. The CISO can re-walk the chain at any time and prove that no read was hidden.

Outbound-only heartbeat

The box accepts no inbound connection from outside. Every cloud → box command rides on the outbound heartbeat that the box itself initiates, so there is no externally exposed attack surface.

Minimum scope on the App Registration

Required scopes are documented per use case: for the NDA Reviewer, for example, `Sites.Selected` rather than `Sites.Read.All`. The integrator partner tunes the scope with the client.

Token rotation

A documented process covers periodic rotation (typically every 90 days) and immediate revocation when a user leaves, handled on the identity-provider side (Azure AD, Google Workspace, etc.).

Read-only RAG

No connector writes back to the source, so there is no risk of malicious content being injected into a shared library. If an agent needs write access (0.5+), explicit human approval is required.
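
The outbound-only heartbeat inverts the usual control flow: the box polls, and commands come back in the response. A minimal sketch of the box side, in which `send_heartbeat`, the command format and the handler names are hypothetical; transport, authentication and the polling interval are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Box:
    """Box side of an outbound-only heartbeat: it initiates every exchange and never listens."""
    send_heartbeat: Callable[[dict], List[dict]]  # outbound request; the reply carries queued commands
    handlers: Dict[str, Callable[[dict], dict]]   # allow-list of supported command types
    pending_results: List[dict] = field(default_factory=list)

    def tick(self) -> None:
        # 1. Outbound call: report status plus the results of the previous commands.
        commands = self.send_heartbeat({"status": "ok", "results": self.pending_results})
        self.pending_results = []
        # 2. Run the commands that came back in the reply; unknown types are refused.
        for cmd in commands:
            handler = self.handlers.get(cmd.get("type"))
            result = handler(cmd) if handler else {"error": "unsupported command"}
            self.pending_results.append({"id": cmd.get("id"), **result})
```

In this model `tick()` runs on a timer; results are reported on the next heartbeat, so the cloud never needs a route into the box.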

Recommendation for maximum sovereignty: if you operate in a regulated sector (defence, strict HDS, ACPR-supervised banking), choose the self-hosted variants of hybrid connectors: SharePoint Server, Confluence Data Center, GitLab self-managed, Outlook/Exchange on-prem. LMbox supports them natively, giving you a full on-prem stack. See the connector catalogue.

Security or compliance question?

Our team can answer a CIO questionnaire, provide a detailed architecture dossier, or arrange a call with your CISO.

