# Storage Module (`lib/storage`)

The `lib/storage` module is the **single source of truth** for reading delivery note metadata from the NAS share.

All code that needs to **list** branches/years/months/days/files on the NAS should go through this module instead of using Node.js `fs` directly. This keeps filesystem listing logic centralized and makes it easier to change conventions later.

> Note:
>
> - Binary streaming endpoints (e.g. PDF streaming) may use direct filesystem streaming APIs (`fs.createReadStream`) for performance and memory safety.
> - Such endpoints must still follow the same safety rules (validated segments, no path traversal) and the same error mapping conventions (`lib/api/storageErrors.js`).

---

## 1. High-Level Responsibilities

- Resolve paths under the NAS root (`NAS_ROOT_PATH`).
- Provide intention-revealing helpers:
  - `listBranches()` → `['NL01', 'NL02', ...]`
  - `listYears(branch)` → `['2023', '2024', ...]`
  - `listMonths(branch, year)` → `['01', '02', ...]`
  - `listDays(branch, year, month)` → `['01', '02', ...]`
  - `listFiles(branch, year, month, day)` → `[{ name, relativePath }, ...]`
- Enforce **read-only** behavior.
- Use async filesystem APIs (`fs/promises`).

---

## 2. Environment Configuration

### 2.1 `NAS_ROOT_PATH` (required)

The module depends on:

- `NAS_ROOT_PATH`: absolute Unix path where the NAS share is mounted **inside the app container**.

Default/typical value:

```env
NAS_ROOT_PATH=/mnt/niederlassungen
```

Important:

- `lib/storage` reads `process.env.NAS_ROOT_PATH` on demand and does not cache it at module load.
- If `NAS_ROOT_PATH` is missing, `lib/storage` throws (fail fast).

---

## 3. Docker Mount Strategy (Local vs Server)

The application code always expects the NAS path **inside the container** to be:

- `/mnt/niederlassungen`

Which host folder is mounted there is an environment concern:

- **Server (`docker-compose.yml`)** mounts the real NAS:

  ```yaml
  volumes:
    - /mnt/niederlassungen:/mnt/niederlassungen:ro
  ```

- **Local development (`docker-compose.local.yml`)** mounts a local fixture folder:

  ```yaml
  volumes:
    - ./.local_nas:/mnt/niederlassungen:ro
  ```

This separation keeps code identical across environments while allowing safe local testing.

---

## 4. Directory Layout Assumptions

`lib/storage` assumes the following structure under `NAS_ROOT_PATH`:

```text
NAS_ROOT_PATH/
  @Recently-Snapshot/   # ignored
  NL01/
    2024/
      10/
        23/
          file1.pdf
          file2.pdf
          ...
```

Rules:

- Branch directories follow the pattern `NL` followed by digits (e.g. `NL01`).
- Year directories are 4-digit numeric (`2024`).
- Month/day directories are numeric and normalized to **two digits**.
- Only `.pdf` files are returned by `listFiles()`.

---

## 5. Error Handling

### 5.1 Storage layer behavior

`lib/storage` does not swallow errors:

- If a folder does not exist or is not accessible, `fs.promises.readdir` throws.
- `lib/storage` remains intentionally small and focused on filesystem reads.

### 5.2 API-level mapping

API routes map filesystem errors into standardized HTTP responses:

- If a requested path does not exist (e.g. `ENOENT`) and the NAS root is accessible:
  - `404` with `FS_NOT_FOUND`
- If the NAS root itself is missing/unreachable or other unexpected filesystem errors occur:
  - `500` with `FS_STORAGE_ERROR`

This mapping is implemented in `lib/api/storageErrors.js` and used by route handlers.

---

## 6. Caching & Freshness (RHL-006)

The NAS content can change at any time (new scans).

To reduce filesystem load while keeping freshness predictable, `lib/storage` implements a small process-local TTL micro-cache.
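
A process-local TTL micro-cache of this kind could look roughly like the sketch below. `createTtlCache` and `cacheKey` are hypothetical names chosen for illustration and are not taken from `lib/storage`; the actual implementation may differ. The clock is injectable so that a test can advance time without waiting.

```javascript
// Hypothetical sketch of a process-local TTL cache (not the actual lib/storage code).
function createTtlCache(now = Date.now) {
  const entries = new Map(); // key -> { expiresAt, value }

  // Return the cached value while it is fresh; otherwise call loader() and cache the result.
  async function getOrLoad(key, ttlMs, loader) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value;
    }
    // If loader() throws, nothing is cached and the error propagates to the caller.
    const value = await loader();
    entries.set(key, { expiresAt: now() + ttlMs, value });
    return value;
  }

  // Test-only style helper: drop all entries.
  function clear() {
    entries.clear();
  }

  return { getOrLoad, clear };
}

// Build a cache key that includes the NAS root, read on demand from the environment.
// The '|' separator is an assumption made for this sketch.
function cacheKey(kind, ...segments) {
  return [process.env.NAS_ROOT_PATH, kind, ...segments].join('|');
}
```

A listing helper would then wrap its `readdir` call, for example `cache.getOrLoad(cacheKey('listYears', branch), ttlMs, () => readYearsFromDisk(branch))`, where `readYearsFromDisk` stands in for whatever uncached read the module performs.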

### 6.1 TTLs

- `listBranches()` / `listYears()` → **60 seconds**
- `listMonths()` / `listDays()` / `listFiles()` → **15 seconds**

### 6.2 Semantics

- TTL is a **maximum staleness** guarantee.
  - New files may appear immediately.
  - New files must appear after TTL expires.
- Cache is **process-local**.
  - If the app is scaled to multiple instances, each instance maintains its own cache.
- Cache keys include `NAS_ROOT_PATH`.
  - This avoids cross-environment/test pollution when the env var changes.

### 6.3 Testing

- Unit tests clear the storage cache between tests via a test-only helper.
- A TTL test verifies “stable within TTL” and “refresh after TTL”.

---

## 7. File streaming endpoints (PDF delivery)

The storage module currently focuses on **listing** directory contents.

For endpoints that must return **binary file data** (PDF streaming/download), a direct stream approach is preferred:

- **Do not** read the whole PDF into memory.
- Use `fs.stat()` first (for existence/type) and then `fs.createReadStream()`.

### 7.1 Security rules

A streaming endpoint must never accept arbitrary paths.

Rules:

- Build the absolute file path from:
  - `NAS_ROOT_PATH`
  - validated route segments (`branch`, `year`, `month`, `day`)
  - validated filename (`filename`)
- Validate route segments using strict patterns:
  - `branch`: `^NL\d+$`
  - `year`: `^\d{4}$`
  - `month`: `01–12`
  - `day`: `01–31`
- Validate filename:
  - must be a simple file name (no `/`, `\`, or `..` segments)
  - only `.pdf` is allowed
- Apply a root containment check:
  - after resolving the absolute path, ensure the resolved path stays within `NAS_ROOT_PATH`

### 7.2 Error mapping rules

Even when the happy-path response is binary, **errors must remain standardized JSON**.

Recommended approach:

1. `stat(absPath)`
2. If it throws:
   - map via `mapStorageReadError(err, { details })`
3. If `stat` succeeds but `!stat.isFile()`:
   - return `404 FS_NOT_FOUND`

### 7.3 HTTP headers

For PDF streaming:

- `Content-Type: application/pdf`
- `Content-Disposition: inline; filename="..."` (default)
- `Content-Disposition: attachment; filename="..."` (when `download=1`)
- `Cache-Control: no-store`

---

## 8. Future extensions

Potential follow-ups for the storage layer:

- A dedicated helper for streaming files (e.g. `openPdfStream(...)`) that centralizes:
  - strict validation
  - safe path construction
  - `stat()` + stream creation
  - consistent error details

This is optional; the current v1 design keeps `lib/storage` focused on listing operations.
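
As a starting point for such a helper, the validation and containment rules from section 7.1 could be sketched as a single path-building function. `resolvePdfPath` and its error messages are hypothetical and not part of `lib/storage`; matching the `.pdf` extension case-insensitively is also an assumption of this sketch. A real helper would likely throw errors that `lib/api/storageErrors.js` can map to the standardized responses.

```javascript
const path = require('node:path');

// Patterns mirror the rules in section 7.1.
const BRANCH_RE = /^NL\d+$/;
const YEAR_RE = /^\d{4}$/;
const MONTH_RE = /^(0[1-9]|1[0-2])$/; // 01–12
const DAY_RE = /^(0[1-9]|[12]\d|3[01])$/; // 01–31

// Hypothetical helper: validate all inputs, then build an absolute path under rootPath.
function resolvePdfPath(rootPath, { branch, year, month, day, filename }) {
  const segmentsValid =
    BRANCH_RE.test(branch) &&
    YEAR_RE.test(year) &&
    MONTH_RE.test(month) &&
    DAY_RE.test(day);
  if (!segmentsValid) {
    throw new Error('Invalid route segment');
  }

  // A simple file name contains no path separators and is not a dot segment.
  const isSimpleName =
    typeof filename === 'string' &&
    !filename.includes('/') &&
    !filename.includes('\\') &&
    filename !== '.' &&
    filename !== '..';
  // Assumption: the extension check ignores case.
  if (!isSimpleName || !filename.toLowerCase().endsWith('.pdf')) {
    throw new Error('Invalid filename');
  }

  const root = path.resolve(rootPath);
  const absPath = path.resolve(root, branch, year, month, day, filename);

  // Root containment check (defense in depth): the resolved path must stay inside root.
  const rel = path.relative(root, absPath);
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error('Path escapes NAS root');
  }
  return absPath;
}
```

A route handler could call `resolvePdfPath(process.env.NAS_ROOT_PATH, params)` and pass the result to `fs.stat()` and `fs.createReadStream()`, following the error mapping steps in section 7.2.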