LMbox
Security

Security by design, not optional

The LMbox architecture is designed first and foremost so that your data never leaves your premises. Here's how, concretely.

🏢

100% on-premise

No outbound connections by default. The Box operates within your LAN, air-gappable if needed. Your data NEVER transits through a third-party service.

🔐

End-to-end encryption

Internal TLS, LUKS on disk with TPM 2.0, Postgres encrypted at-rest, Restic encrypted backups. Keys stay with you, never with us.

👤

Enterprise SSO

Native Active Directory, Azure AD, Okta, Google Workspace integration via OIDC. Authentik front-end for fine-grained group-based authorization.

📋

Audit logs

Every interaction traced and signed: who asked what, when, to which model, with which sources. Configurable retention per your policy.

Architecture

One Box, multiple layers

┌──────────────────────────────────────────────────────┐
│  Internet                            ❌ no access    │
└──────────────────────────────────────────────────────┘
                         │
                ┌────────┴────────┐
                │  LAN firewall   │   Inbound port 443 only
                └────────┬────────┘
                         │
       ┌─────────────────┴─────────────────┐
       │  Reverse proxy + TLS (Caddy)      │   Enterprise CA cert.
       └─────────────────┬─────────────────┘
                         │
       ┌─────────────────┴─────────────────┐
       │  Authentik SSO   (OIDC to AD)     │   Auth + authorization
       └─────────────────┬─────────────────┘
                         │
   ┌──────────────┬──────┴──────┬──────────────┐
   │ Open WebUI   │ LiteLLM     │ lmbox-rag    │
   │ (chat UI)    │ (API gw)    │ (RAG, KB)    │
   └──────┬───────┴──────┬──────┴──────┬───────┘
          │              │             │
          └──────────────┴─────────────┘
                         │
                ┌────────┴────────┐
                │  Ollama         │   Local models
                │  (Gemma 4 ...)  │   Ryzen 9 7950X CPU / RTX Pro 6000 GPU
                └─────────────────┘

  Postgres (encrypted at rest, LUKS + TPM 2.0)
  Centralized audit logs (configurable retention)
  Encrypted Restic backups to internal NAS
Comparison

LMbox vs cloud generative AI services

| Criterion | LMbox (on-prem) | ChatGPT / Claude / Gemini cloud |
|---|---|---|
| Where does your data live? | In your datacenter | Vendor US/EU datacenters |
| Data in transit | Internal LAN, TLS | Internet to public cloud |
| Data at rest | Your disks, your keys | Vendor disks, their keys |
| Enterprise SSO | Native (OIDC/SAML) | Depends on plan |
| Encryption keys | Under your control | Managed by vendor |
| Audit logs | Complete, yours | Partial, their scope |
| GDPR legal basis | Internal processing (Article 6) | Subprocessing + DPA |
| Breach notification | You decide | Per vendor SLA |
| Vendor lock-in | None, you keep everything | High |
Compliance

Our compliance roadmap

🇪🇺
GDPR
Native compliance by architecture (no transfers outside the EU, no subprocessing by default).
Compliant
🛡️
SOC 2 Type 1
Starting Q3 2026 via Vanta. Expected availability Q1 2027.
In progress
🌐
ISO 27001
Aligned with ISO 27001 controls from v1.0. Certification targeted Q3 2027.
Aligned
🏥
HDS (Health Data Hosting)
`lmbox-health` mode available: hardened configuration, enhanced audit logs. Certification Q4 2027.
Available
⚖️
EU AI Act readiness
Documentation compliant with AI Act requirements, transparency on models used, human control by default.
Compliant
Anti-hallucination

Four deterministic layers stop the agent from making things up.

"What if the AI just makes stuff up?" is the question law firms and compliance officers ask. The honest answer isn't "our model is better". All LLMs hallucinate, from Mistral-7B to GPT-4. The LMbox answer is architectural: four independent layers, each capable of blocking the hallucination, wired in parallel on every catalogue agent.

A

Layer A - Citation Verifier

Every reference emitted by the agent (Cassation ruling, Code article, statute, decree, EU regulation, case-file exhibit) is extracted by 11 regex families and verified live against Légifrance and EUR-Lex. Reference not found → CRITICAL. Malformed reference ("12 jav 2024") → CRITICAL. Coverage: 11 families and 7 French codes mapped, backed by a production-grade LRU cache and rate limiting.

LMbox command
lmbox agent verify
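
As a rough illustration of the extract-then-verify idea, here is a minimal Python sketch covering a single citation family (Cour de cassation rulings). The regex, the `lookup` callable, the `Finding` type and the `CRITICAL: malformed` label are assumptions made for illustration, not the LMbox implementation; a real verifier would query Légifrance where the stub lookup sits.

```python
import re
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, List

FRENCH_MONTHS = {
    "janvier", "février", "mars", "avril", "mai", "juin", "juillet",
    "août", "septembre", "octobre", "novembre", "décembre",
}

# Hypothetical pattern for one family, e.g.
# "Cass. Com., 12 janvier 2024, n° 22-15.487"
CASS_RE = re.compile(
    r"Cass\.\s*(?P<chamber>[^,]+),\s*"
    r"(?P<day>\d{1,2})(?:er)?\s+(?P<month>\w+)\s+(?P<year>\d{4}),\s*"
    r"n°\s*(?P<number>\d{2}-\d{2}\.\d{3})"
)


@dataclass(frozen=True)
class Finding:
    citation: str
    verdict: str  # "ok" or a "CRITICAL: ..." label


def verify_citations(text: str, lookup: Callable[[str], bool]) -> List[Finding]:
    """Extract rulings from `text` and check each one with `lookup`.

    `lookup(number)` stands in for the external registry call and
    returns True when the ruling exists.
    """
    # Per-call cache for the sketch; a long-running service would share it.
    cached_lookup = lru_cache(maxsize=1024)(lookup)
    findings = []
    for m in CASS_RE.finditer(text):
        well_formed = (
            m["month"].lower() in FRENCH_MONTHS and 1 <= int(m["day"]) <= 31
        )
        if not well_formed:
            verdict = "CRITICAL: malformed"  # label assumed for illustration
        elif not cached_lookup(m["number"]):
            verdict = "CRITICAL: external_not_found"
        else:
            verdict = "ok"
        findings.append(Finding(m.group(0), verdict))
    return findings
```

Malformed references are rejected before any lookup, so an obviously broken date never costs a network round-trip.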
B

Layer B - Runtime Guard

Real-time guardrail over the LLM stream: the moment a hallucinated citation completes in the output, generation is cancelled (strict) or annotated inline (annotate) or logged for observability (warn). No more waiting for the brief to finish to find the error - 200 ms after the bad token, the agent is stopped.

LMbox command
lmbox agent run --guard strict
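
The three guard modes can be pictured as a wrapper around the token stream. The sketch below is an assumption-laden illustration, not the LMbox runtime: `guard_stream`, `GuardStop`, the single-family regex and the `exists` callable are hypothetical names, and the 40-character look-ahead mirrors the figure given in the worked example on this page.

```python
import re
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical pattern for one family only (a pourvoi number).
CITATION_RE = re.compile(r"n°\s*\d{2}-\d{2}\.\d{3}")
TRAILING_CHARS = 40  # a citation is treated as complete once this much text follows


class GuardStop(Exception):
    """Raised in strict mode when an unverifiable citation completes."""


def guard_stream(
    chunks: Iterable[str],
    exists: Callable[[str], bool],
    mode: str = "strict",
    warnings: Optional[List[str]] = None,
) -> Iterator[str]:
    buffer = ""
    checked_upto = 0  # end offset of the last citation already verified
    for chunk in chunks:
        buffer += chunk
        yield chunk
        for m in CITATION_RE.finditer(buffer, checked_upto):
            if len(buffer) - m.end() < TRAILING_CHARS:
                break  # not enough trailing context yet; retry on next chunk
            checked_upto = m.end()
            if exists(m.group(0)):
                continue
            if mode == "strict":
                raise GuardStop(m.group(0))  # cancel generation
            if mode == "annotate":
                # Emitted once the look-ahead window has passed,
                # so slightly after the citation itself.
                yield f" [UNVERIFIED: {m.group(0)}] "
            elif warnings is not None:
                warnings.append(m.group(0))  # warn: log only
```

In strict mode the consumer simply stops reading when `GuardStop` is raised; in the other two modes the stream runs to completion.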
C

Layer C - Structured Output

Every agent declares a JSON Schema in its manifest (draft 2020-12). Output is validated + re-prompted if invalid (up to 2 retries). A 7-rule linter catches schema design bugs at write-time, before deployment. No more shape drift between agents and downstream pipelines.

LMbox command
lmbox agent lint-schema --strict
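
A validate-and-repair loop of this kind might look like the following sketch. It is illustrative only: `check_shape` is a tiny hand-written stand-in for a real JSON Schema (draft 2020-12) validator, the field names are borrowed from the Conclusions Drafter example on this page, and `call_model`, `generate_validated` and `SchemaError` are hypothetical names.

```python
import json
from typing import Callable, List

MAX_RETRIES = 2  # the text states up to 2 re-prompts


class SchemaError(Exception):
    """Raised when the output is still invalid after all retries."""


def check_shape(doc: dict) -> List[str]:
    """Stand-in for schema validation; returns a list of problems."""
    errors = []
    if not isinstance(doc.get("markdown"), str):
        errors.append("markdown: expected string")
    cited = doc.get("cited_jurisprudence")
    if not isinstance(cited, list):
        errors.append("cited_jurisprudence: expected array")
    else:
        for i, item in enumerate(cited):
            if not isinstance(item, dict) or not item.get("source_id"):
                errors.append(f"cited_jurisprudence[{i}].source_id: required")
    return errors


def generate_validated(call_model: Callable[[str], str], prompt: str) -> dict:
    attempt_prompt = prompt
    errors: List[str] = []
    for _ in range(1 + MAX_RETRIES):
        raw = call_model(attempt_prompt)
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            if isinstance(doc, dict):
                errors = check_shape(doc)
            else:
                errors = ["root: expected object"]
            if not errors:
                return doc
        # Re-prompt with the validation errors so the model can repair them.
        attempt_prompt = (
            f"{prompt}\n\nYour previous output was invalid:\n- "
            + "\n- ".join(errors)
        )
    raise SchemaError("; ".join(errors))
```

Feeding the validator's error list back into the prompt gives the model something concrete to fix, and the hard failure after the last retry keeps malformed output out of downstream pipelines.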
D

Layer D - Source Grounding

Every identifier cited in the output (document_id, protocol_id, source_id) MUST come from a tool call made during this turn. If the agent cites interne-2019-453 without a search_dossiers_internes call having returned it, the output is rejected. This is the architectural layer that makes an invented source structurally impossible.

LMbox command
lmbox agent check-grounding
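
The grounding rule reduces to a set-membership check over the identifiers returned by tool calls in the current turn. The sketch below assumes a hypothetical `Turn` record and `ungrounded_ids` helper; it shows the principle, not the LMbox code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Turn:
    """Identifiers returned by each tool call during one agent turn."""
    returned: Dict[str, Set[str]] = field(default_factory=dict)

    def record_tool_result(self, tool_name: str, ids: Iterable[str]) -> None:
        self.returned.setdefault(tool_name, set()).update(ids)

    def all_ids(self) -> Set[str]:
        merged: Set[str] = set()
        for ids in self.returned.values():
            merged |= ids
        return merged


def ungrounded_ids(turn: Turn, cited_ids: Iterable[str]) -> List[str]:
    """Return the cited identifiers that no tool call returned this turn."""
    grounded = turn.all_ids()
    return [cited for cited in cited_ids if cited not in grounded]
```

A non-empty result means the output cites something the agent never retrieved, so it can be rejected without any judgment about the content itself.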
Concrete example

The agent invents "Cass. Com., 12 janvier 2024, n° 22-15.487"

This ruling does not exist. But its shape is perfect: valid court, valid month, valid pourvoi number format. No human eye will catch it in a 30-page conclusion. Here's how the 4 layers catch it - each independently:

  • Layer A calls Légifrance live, finds nothing → CRITICAL: external_not_found. Enough on its own to fail the golden suite and block the deployment.
  • Layer B picks up Layer A's verdict during streaming: as soon as the citation is complete and followed by 40 trailing characters, the verifier pings Légifrance. On NOT_FOUND, generation stops. The lawyer never reads the tainted brief.
  • Layer C requires (in the Conclusions Drafter schema) that any jurisprudence cited in the markdown also appears in cited_jurisprudence[] with a source_id. An inline citation without a metadata entry → schema invalid, repair loop, hard fail if not corrected.
  • Layer D verifies the claimed source_id came from a tool call this turn. Without a call to search_jurisprudence that returned that identifier, the source_id is structurally false → blocked.
11
citation families verified (cassation, court of appeal, council of state, constitutional council, Code articles, statutes, decrees, EU regulations…)
Légifrance + EUR-Lex
live verification with LRU cache, token-bucket rate-limit, exponential retry
228
automated tests on the anti-hallucination stack (extractor, lookups, streaming, CLI dispatch)

Detailed architecture: the 4 ADRs (002 → 005) published on GitHub describe the design, trade-offs, failure modes and code references.

Read the ADRs
Chained audit log

Every portal action is cryptographically chained.

Every entry in the LMbox audit log is hashed with SHA-256 together with the hash of the previous entry. The CISO can prove to a SOC 2 auditor or to the CNIL that no line was deleted, modified, or inserted, including by an LMbox administrator.

Chaining formula
chain_hash[N] = SHA-256(
    chain_hash[N-1]
  || canonical(payload[N])
)

genesis = SHA-256(
    "lmbox.ai/audit-chain/v1"
  || "customer=" + customer.id
  || "created=" + customer.created_at
)
  • Insertion / deletion detection

    Modifying or deleting a line breaks the chain: every subsequent entry becomes invalid at the next verify_chain.

  • Per-customer genesis

    Each tenant has its own chain: two customers never share the same prefix. No cross-tenant leakage is possible.

  • Verifiable in 1 click

    The CISO clicks "Vérifier la chaîne" (Verify chain) in the portal. LMbox re-walks the N entries in a few seconds and displays a green or red banner that can be produced as evidence.
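
The chaining formula and the re-walk can be sketched in a few lines of Python. Several details here are assumptions, since the page does not specify them: the canonical form (sorted-key compact JSON in UTF-8), concatenation of raw digest bytes rather than hex strings, and the `append_entry` helper and entry layout. Only `verify_chain` and the formula itself come from the text.

```python
import hashlib
import json
from typing import List, Optional


def genesis_hash(customer_id: str, created_at: str) -> bytes:
    h = hashlib.sha256()
    h.update(b"lmbox.ai/audit-chain/v1")
    h.update(f"customer={customer_id}".encode("utf-8"))
    h.update(f"created={created_at}".encode("utf-8"))
    return h.digest()


def canonical(payload: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def chain_hash(prev: bytes, payload: dict) -> bytes:
    return hashlib.sha256(prev + canonical(payload)).digest()


def append_entry(log: List[dict], genesis: bytes, payload: dict) -> None:
    prev = bytes.fromhex(log[-1]["chain_hash"]) if log else genesis
    log.append({"payload": payload, "chain_hash": chain_hash(prev, payload).hex()})


def verify_chain(log: List[dict], genesis: bytes) -> Optional[int]:
    """Return the index of the first broken entry, or None if intact."""
    prev = genesis
    for i, entry in enumerate(log):
        expected = chain_hash(prev, entry["payload"])
        if expected.hex() != entry["chain_hash"]:
            return i
        prev = expected
    return None
```

Editing or removing an entry in the middle of the log changes the hash its successor was built on, so the walk fails at that position. Detecting removal of the most recent entries additionally requires keeping the latest hash somewhere outside the log, which this sketch does not cover.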

See the audit chain live: open the public demo, click "Journal" (Log) in the portal, then "Vérifier la chaîne" (Verify chain). 200 chained entries in 3 seconds.

Open the demo
Connectors and trust boundary

Is connecting SharePoint, Salesforce, Jira to LMbox a sovereignty flaw?

No - but it's a topic where precision matters. LMbox does not move your data. It reads it where you already put it. Storage sovereignty depends on your earlier choice of SaaS vendor, made before LMbox came along. AI-model sovereignty, indexing and audit are what LMbox provides - and no one else can claim that for your existing data.

The 4 layers of an AI system - who controls what

| Layer | Question | Who controls |
|---|---|---|
| 1. Storage | Where are the source documents? | Your earlier choice (M365, Salesforce, on-prem, …) |
| 2. Indexing / RAG | Who reads them and embeds them? | LMbox - local, on-prem, on the box |
| 3. AI inference | Where does the model run? | LMbox - local model, never an external API |
| 4. Audit | Who keeps the trail? | LMbox - verifiable SHA-256 chain |

Three typical scenarios

7/10
Sovereignty

Pure-SaaS connectors (Salesforce, HubSpot, Notion, Slack, Drive, Teams)

Your data is already in the vendor's cloud - a choice made before LMbox. We read it locally via OAuth, without moving anything. AI-side sovereignty, not storage-side.

8/10
Sovereignty

Hybrid connectors on cloud (SharePoint Online, Confluence Cloud, Jira Cloud)

The vendor offers a self-hosted edition, you chose cloud. LMbox reads locally. If storage sovereignty becomes critical, migrate to the self-hosted variant - LMbox supports it without changes.

10/10
Sovereignty

Hybrid connectors self-hosted (SharePoint Server, Confluence DC, GitLab self-managed)

Full on-prem stack end-to-end. No data leaves your DC. SecNumCloud, defence, strict HDS argument - defensible to an auditor with no caveats.

The 7 technical controls already in place

  • Credentials encrypted at rest

    OAuth tokens for connectors are stored via Rails 8 attribute encryption (AES-GCM, key outside the DB). A Postgres dump never reveals a usable token.

  • Scrubbing after push

    Once the credential reaches the box, the cloud-side `credentials` sub-field is wiped. The digest stays for future rotation; the plaintext value is gone.

  • Every access traced in the audit chain

    Reading a SharePoint document creates a SHA-256-hashed audit entry. The CISO can re-walk the chain anytime and prove no read was hidden.

  • Outbound-only heartbeat

    The box accepts no inbound connection from the internet. Every cloud → box command rides the outbound heartbeat the box initiates. No external attack surface.

  • Minimum scope on the App Registration

    Explicit documentation per use case: for NDA Reviewer, `Sites.Selected` rather than `Sites.Read.All`. The integrator partner tunes the scope with the client.

  • Token rotation

    Documented process for periodic rotation (90 days typical) and immediate revocation when a user leaves. Handled on the Azure AD / Google Workspace / etc. side.

  • Read-only RAG

    No connector writes back to the source. No risk of malicious injection into a shared library. If an agent needs to write (0.5+), explicit human approval required.
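
To make the outbound-only heartbeat from the list above more concrete, here is a small in-memory Python sketch of the pattern. It involves no networking, and `CloudPortal`, `Box`, `queue_command` and `handle_heartbeat` are hypothetical names chosen for illustration: the point is only that the cloud side can queue commands but never delivers them until the box calls in.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class CloudPortal:
    """Stand-in for the cloud side: it queues commands and never dials in."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)

    def queue_command(self, box_id: str, command: str) -> None:
        self._pending[box_id].append(command)

    def handle_heartbeat(self, box_id: str) -> List[str]:
        # Commands travel only in the response to a box-initiated request.
        queue = self._pending[box_id]
        commands = list(queue)
        queue.clear()
        return commands


class Box:
    """Stand-in for the on-prem box: it is always the one opening the call."""

    def __init__(self, box_id: str, portal: CloudPortal) -> None:
        self.box_id = box_id
        self.portal = portal
        self.executed: List[str] = []

    def heartbeat(self) -> None:
        for command in self.portal.handle_heartbeat(self.box_id):
            self.executed.append(command)
```

Because delivery depends on the box's own outbound call, the firewall never needs an inbound rule for the cloud side.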

Recommendation for maximum sovereignty: if you're in a regulated sector (defence, strict HDS, ACPR-sensitive banking), pick the self-hosted variants of hybrid connectors: SharePoint Server, Confluence Data Center, GitLab self-managed, Outlook/Exchange on-prem. LMbox supports them natively and you get the full on-prem stack.

See the connector catalogue

Security or compliance question?

Our team can answer a CIO questionnaire, provide a detailed architecture dossier, or arrange a call with your CISO.

Talk to an expert