Diksuchi-AI: Secure, Offline Document Intelligence for Defense and Aerospace

December 16, 2025

A Secure, Offline Document Intelligence Platform for Defense and Aerospace Systems

Whitepaper


Executive Summary

Defense and aerospace organizations operate in environments where data sovereignty, operational security, and system reliability are non-negotiable. Technical documentation—maintenance manuals, engineering procedures, safety bulletins, and system specifications—forms the backbone of mission readiness. Yet extracting actionable intelligence from these documents remains difficult due to security restrictions, offline requirements, and organizational compartmentalization.

Diksuchi-AI is a privacy-first, offline document intelligence platform designed specifically for such environments. It enables secure, multi-tenant retrieval-augmented generation (RAG) over sensitive technical documents while guaranteeing strict data isolation, air-gapped operation, and multilingual, voice-enabled access.

This whitepaper presents the architecture, design principles, and operational advantages of Diksuchi-AI for defense and aerospace deployments.


1. Operational Context and Problem Statement

1.1 Defense Documentation Challenges

Defense organizations manage vast volumes of documentation, including:

  • S1000D technical manuals
  • Maintenance and overhaul procedures
  • Engineering change orders
  • Safety advisories and fault isolation guides

These documents are often:

  • Highly classified or export-controlled
  • Stored across multiple programs and organizations
  • Accessed in bandwidth-constrained or offline environments

1.2 Limitations of Existing Solutions

Conventional document intelligence platforms face critical limitations:

  • Cloud dependency: Unusable in air-gapped or restricted networks
  • Shared indices: Risk of cross-program data leakage
  • Monolithic architectures: Poor scalability and auditability
  • Single-mode retrieval: Low accuracy on technical content
  • Limited language/voice support: Reduced usability in field operations

Defense deployments require architectural guarantees, not policy-based controls.


2. Design Principles

Diksuchi-AI is built on the following core principles:

  1. Offline-first operation: No external network or cloud dependency

  2. Isolation by architecture: Data boundaries enforced at the storage and retrieval layers

  3. Hybrid intelligence: Combining semantic, lexical, and relational understanding

  4. Modular services: Independent scaling, auditing, and failure containment

  5. Operational usability: Multilingual and voice-based interaction in real environments


3. System Architecture Overview

Diksuchi-AI employs a four-service microservices architecture, optimized for secure on-premise deployment.

3.1 Web Application Layer

Next.js 16 + React 19

Responsibilities:

  • Multi-tenant authentication and authorization
  • Organization and collection context management
  • Document ingestion orchestration
  • Secure chat interface

Port 3000
├── Multi-tenant authentication
├── PostgreSQL-backed access control
└── API orchestration layer

This layer does not access document embeddings or indices directly.


3.2 Document Intelligence (RAG) Service

Python FastAPI

Responsibilities:

  • Parsing of PDF and S1000D XML documents
  • Hybrid retrieval execution
  • Knowledge graph traversal
  • Background ingestion and indexing

Port 5001
├── Vector search (ChromaDB)
├── BM25 keyword indices
├── Knowledge graphs
└── Redis-based async workers

This separation enables secure evolution of AI components without affecting user interfaces.
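
As an illustration of the parsing responsibility listed above, the following sketch pulls procedural steps out of a heavily simplified XML fragment. The element names (dmodule, proceduralStep, para) are modeled loosely on S1000D data modules, and the extract_steps helper and the sample text are hypothetical. The whitepaper does not describe the actual parser, so this is only one plausible shape for that step.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; real S1000D data modules carry far more
# structure (identification, applicability, warnings, references).
SAMPLE = """<dmodule>
  <content>
    <procedure>
      <mainProcedure>
        <proceduralStep><para>Depressurize the hydraulic system.</para></proceduralStep>
        <proceduralStep><para>Remove the pump mounting bolts.</para></proceduralStep>
      </mainProcedure>
    </procedure>
  </content>
</dmodule>"""


def extract_steps(xml_text):
    """Return one record per proceduralStep, in document order."""
    root = ET.fromstring(xml_text)
    steps = []
    for number, step in enumerate(root.iter("proceduralStep"), start=1):
        # Collapse whitespace inside each paragraph, then join the paragraphs.
        paras = [" ".join("".join(p.itertext()).split()) for p in step.findall("para")]
        steps.append({"step": number, "text": " ".join(paras)})
    return steps


steps = extract_steps(SAMPLE)
```

Each record could then be chunked and indexed as a unit, which keeps a step's text together at retrieval time.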


3.3 Speech-to-Text Service

OpenAI Whisper

  • Multilingual speech recognition supporting 99+ languages
  • Enables hands-free queries in maintenance and operational settings
  • Runs on-prem via whisper.cpp for air-gapped deployments; the hosted OpenAI API is an alternative only where network policy permits

Port 8080

3.4 Text-to-Speech Service

ParlerTTS

  • Supports 18+ Indian languages
  • On-prem inference with no data egress

Port 8002

4. Multi-Tenant Data Isolation Model

4.1 Collection-Level Isolation

Each document collection represents a hard security boundary.

For every collection, Diksuchi-AI maintains:

  • A dedicated vector store
  • A dedicated BM25 keyword index
  • A dedicated knowledge graph

Organization / Program
├── Collection A
│   ├── Vector DB
│   ├── BM25 Index
│   └── Knowledge Graph
├── Collection B
└── Collection C

4.2 Security Implications

  • Retrieval requests must specify a collection ID
  • Cross-collection queries are impossible by design
  • Deletion equals namespace removal, simplifying compliance

This model aligns with defense compartmentalization practices.
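
The isolation rules above can be sketched as a registry in which every lookup names exactly one collection. The class and function names here (CollectionStore, CollectionRegistry, keyword_search) are hypothetical and the stores are plain dictionaries, so this is a minimal model of the idea rather than the platform's storage layer.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionStore:
    """Hypothetical per-collection bundle: vectors, keyword texts, and a graph."""
    vectors: dict = field(default_factory=dict)    # chunk_id -> embedding
    bm25_docs: dict = field(default_factory=dict)  # chunk_id -> text
    graph: dict = field(default_factory=dict)      # entity -> related entities


class CollectionRegistry:
    """Maps collection IDs to isolated stores; no method spans collections."""

    def __init__(self):
        self._stores = {}

    def create(self, collection_id):
        if collection_id in self._stores:
            raise ValueError(f"collection already exists: {collection_id}")
        self._stores[collection_id] = CollectionStore()
        return self._stores[collection_id]

    def get(self, collection_id):
        # Every retrieval call must name exactly one existing collection.
        if collection_id not in self._stores:
            raise KeyError(f"unknown collection: {collection_id}")
        return self._stores[collection_id]

    def delete(self, collection_id):
        # Deleting a collection drops its whole namespace in one step.
        del self._stores[collection_id]


def keyword_search(registry, collection_id, term):
    """Return chunk IDs in one collection whose text contains the term."""
    store = registry.get(collection_id)
    needle = term.lower()
    return sorted(cid for cid, text in store.bm25_docs.items() if needle in text.lower())


registry = CollectionRegistry()
registry.create("program-a").bm25_docs["a1"] = "Replace hydraulic pump seal"
registry.create("program-b").bm25_docs["b1"] = "Hydraulic pump torque values"

hits_a = keyword_search(registry, "program-a", "hydraulic")
hits_b = keyword_search(registry, "program-b", "hydraulic")
registry.delete("program-b")
```

Because the search function takes a single collection ID and the registry exposes no iteration over stores, a query against program-a cannot see program-b's chunks, and after deletion any request for program-b fails outright.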


5. Hybrid Retrieval Strategy

Technical documentation requires multiple retrieval approaches.

5.1 Semantic Vector Retrieval

  • Uses BGE-M3 multilingual embeddings
  • Captures conceptual meaning and paraphrased queries
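
To make the idea of semantic retrieval concrete, the sketch below ranks chunks by cosine similarity between a query vector and stored chunk vectors. The three-dimensional vectors and chunk names are made up for illustration; in the platform these would be BGE-M3 embeddings held in the collection's vector store, and the similarity metric actually used is not stated in this whitepaper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the k (chunk_id, similarity) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, not real model output).
chunk_vecs = {
    "pump-removal": [0.9, 0.1, 0.0],
    "pump-install": [0.8, 0.3, 0.1],
    "radio-check": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
results = top_k(query_vec, chunk_vecs, k=2)
```

Here the two pump-related chunks rank ahead of the unrelated radio chunk even though no keywords are compared, which is the behavior that lets paraphrased queries find the right passage.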

5.2 Lexical Keyword Retrieval (BM25)

  • Precise matching for part numbers, procedures, warnings
  • Critical for fault isolation and maintenance steps
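
A minimal BM25 scorer shows why lexical retrieval suits part numbers. This is a sketch of the standard Okapi BM25 formula with a non-negative IDF variant; the tokenizer, the parameter values (k1 = 1.5, b = 0.75), and the sample chunks are assumptions, not details taken from the platform.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep hyphenated part numbers such as "hp-4471-b" as single tokens.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class BM25Index:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        self.term_freqs = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_lens) / max(len(self.doc_lens), 1)
        # Number of documents containing each term.
        self.doc_freq = Counter(term for tf in self.term_freqs for term in tf)

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        total = len(self.doc_ids)
        return math.log(1 + (total - n + 0.5) / (n + 0.5))

    def search(self, query, k=3):
        query_terms = tokenize(query)
        scores = []
        for doc_id, tf, length in zip(self.doc_ids, self.term_freqs, self.doc_lens):
            score = 0.0
            for term in query_terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue
                # Length normalization: long documents are penalized slightly.
                denom = f + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self.idf(term) * f * (self.k1 + 1) / denom
            if score > 0:
                scores.append((doc_id, score))
        return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


docs = {
    "step-12": "Remove pump HP-4471-B and cap the hydraulic lines",
    "step-13": "Install pump HP-4471-C and torque the fittings",
    "warn-02": "WARNING: depressurize the hydraulic system before removal",
}
index = BM25Index(docs)
part_hits = index.search("HP-4471-B")
hydraulic_hits = index.search("hydraulic")
```

The query for HP-4471-B matches only the chunk that names that exact part, whereas an embedding model might treat HP-4471-B and HP-4471-C as near neighbors.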

5.3 Knowledge Graph Expansion

  • Entity extraction during ingestion
  • Enables relationship-driven discovery across procedures and components
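
Graph expansion can be pictured as a bounded breadth-first walk from the entities found in a query. The adjacency dictionary, entity names, and expand_entities helper below are hypothetical; the whitepaper does not specify the graph schema or the traversal depth.

```python
from collections import deque


def expand_entities(graph, seeds, max_hops=1):
    """Return the seeds plus every entity reachable within max_hops edges."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in sorted(graph.get(entity, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Toy graph built at ingestion time (entity -> directly related entities).
graph = {
    "hydraulic pump": {"pressure relief valve", "task: pump removal"},
    "pressure relief valve": {"task: valve inspection"},
    "task: pump removal": {"warning: depressurize system"},
}

one_hop = expand_entities(graph, ["hydraulic pump"], max_hops=1)
two_hops = expand_entities(graph, ["hydraulic pump"], max_hops=2)
```

With two hops, a question about the pump also surfaces the depressurization warning attached to its removal task, a link that neither vector nor keyword search would necessarily make.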

5.4 Cross-Encoder Reranking

  • Optional reranking using BAAI/bge-reranker-v2-m3
  • Improves precision for mission-critical queries
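
The sketch below shows where a reranker sits in the pipeline: candidate lists from the retrievers are merged, then each candidate is rescored against the query. Two parts are assumptions. The whitepaper does not say how candidate lists are merged, so reciprocal rank fusion is used here as one common choice, and overlap_score is a trivial word-overlap stand-in for a real cross-encoder such as BAAI/bge-reranker-v2-m3.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


def overlap_score(query, passage):
    """Stand-in scorer: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def rerank(query, candidates, texts, score_fn, top_n=2):
    """Rescore each candidate against the query and keep the best top_n."""
    scored = [(doc_id, score_fn(query, texts[doc_id])) for doc_id in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable sort
    return [doc_id for doc_id, _ in scored[:top_n]]


texts = {
    "c1": "torque the pump fittings to the specified value",
    "c2": "remove the hydraulic pump",
    "c3": "radio check procedure",
    "c4": "pump fittings torque table",
}
vector_hits = ["c2", "c1", "c3"]   # hypothetical semantic ranking
bm25_hits = ["c1", "c4", "c2"]     # hypothetical keyword ranking

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
final = rerank("pump fittings torque", fused, texts, overlap_score, top_n=2)
```

Swapping overlap_score for a function that calls a locally hosted cross-encoder would leave the rest of the flow unchanged, which is what makes the reranking stage optional.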

6. Conversational and Contextual Intelligence

Diksuchi-AI supports multi-turn technical conversations:

  • Context-aware follow-up questions
  • Conversation window tracking (default: last 3 turns)
  • Optional query reformulation agents

This enables natural interaction without repeated restatement of context.
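
Conversation window tracking can be sketched with a fixed-size buffer that silently drops the oldest turn. The ConversationWindow class and the prompt layout are hypothetical; only the default of three turns comes from the text above.

```python
from collections import deque


class ConversationWindow:
    """Keeps the most recent question/answer turns for follow-up context."""

    def __init__(self, max_turns=3):
        # A deque with maxlen discards the oldest item when full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, question, answer):
        self._turns.append((question, answer))

    def turns(self):
        return list(self._turns)

    def build_prompt(self, new_question):
        """Prepend the retained turns to the new question."""
        lines = []
        for question, answer in self._turns:
            lines.append(f"User: {question}")
            lines.append(f"Assistant: {answer}")
        lines.append(f"User: {new_question}")
        return "\n".join(lines)


window = ConversationWindow(max_turns=3)
window.add_turn("Q1: Where is the hydraulic pump?", "A1")
window.add_turn("Q2: How do I remove it?", "A2")
window.add_turn("Q3: What torque applies?", "A3")
window.add_turn("Q4: Any warnings?", "A4")

prompt = window.build_prompt("And for reinstallation?")
```

After the fourth turn the first one has been dropped, so a follow-up such as "And for reinstallation?" is interpreted against the three most recent exchanges only.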


7. Technology Stack (Offline-Optimized)

  • PostgreSQL 16: Reliable, auditable relational storage
  • Redis 8.4: Job orchestration and caching
  • ChromaDB: Lightweight offline vector database
  • LM Studio: Local LLM inference
  • BGE-M3: Multilingual technical embeddings
  • OpenAI Whisper: Multilingual speech-to-text (99+ languages)
  • ParlerTTS: On-prem speech synthesis

All components can run fully on-premise; cloud-hosted alternatives, where they exist, are optional and are not required for operation.


8. Deployment and Scaling Model

Diksuchi-AI supports horizontal scaling:

  • Multiple RAG workers via Redis Queue
  • Stateless web nodes behind load balancers
  • Independent service scaling
  • PostgreSQL replication for high availability

Scaling is targeted, not monolithic.
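
The worker model can be illustrated with several consumers draining one shared job queue. In this sketch an in-process queue and threads stand in for Redis Queue and separate worker processes, and index_document is a placeholder for the real parsing and indexing work; none of these names come from the platform itself.

```python
import queue
import threading


def index_document(doc_id, text):
    # Placeholder for parsing, embedding, and indexing a document.
    return {"doc_id": doc_id, "tokens": len(text.split())}


def run_workers(jobs, num_workers=2):
    """Process (doc_id, text) jobs with num_workers concurrent consumers."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                job_queue.task_done()
                return
            outcome = index_document(*job)
            with lock:
                results.append(outcome)
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:
        job_queue.put(None)  # one sentinel per worker
    job_queue.join()
    for thread in threads:
        thread.join()
    return sorted(results, key=lambda r: r["doc_id"])


jobs = [
    ("doc-1", "Remove the pump mounting bolts"),
    ("doc-2", "Torque fittings"),
    ("doc-3", "Depressurize the hydraulic system before removal"),
]
indexed = run_workers(jobs, num_workers=2)
```

Raising num_workers adds ingestion throughput without touching the producer, which mirrors how RAG workers can be scaled independently of the web tier.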


9. Defense and Aerospace Use Cases

9.1 Maintenance and Repair Operations (MRO)

  • Rapid retrieval of fault isolation steps
  • Voice-driven queries in hangars and field environments

9.2 Program-Level Documentation Isolation

  • Separate collections per aircraft, system, or program
  • Shared infrastructure without shared data

9.3 Training and Knowledge Transfer

  • Conversational access to legacy manuals
  • Multilingual support for diverse personnel

10. Security and Compliance Alignment

Diksuchi-AI aligns with:

  • Air-gapped deployment requirements
  • Data sovereignty mandates
  • Compartmentalized access models
  • Audit-friendly data lifecycle management

Security is enforced structurally, not procedurally.


11. Roadmap and Future Enhancements

  • Domain-specific embedding fine-tuning
  • Advanced query reformulation agents
  • Usage analytics for documentation gaps
  • Collaborative workflows with access controls
  • Expanded language and dialect support

12. Conclusion

Diksuchi-AI demonstrates that defense-grade document intelligence can be achieved without cloud dependence or security compromise.

By enforcing isolation at the architectural level, embracing hybrid retrieval, and designing for offline environments, the platform delivers:

  • Operational reliability
  • Strong security guarantees
  • Scalable intelligence across programs

Diksuchi-AI provides a modern foundation for AI-assisted technical knowledge systems in defense and aerospace domains.


Intended Audience

  • Defense ministries and armed forces
  • DRDO / DPSU engineering teams
  • Aerospace OEMs and MRO providers
  • Secure system integrators