Storage Module (lib/storage)

The lib/storage module is the single source of truth for reading delivery note metadata from the NAS share.

All code that needs to list branches/years/months/days/files on the NAS should go through this module instead of using Node.js fs directly. This keeps filesystem listing logic centralized and makes it easier to change conventions later.

Note:

  • Binary streaming endpoints (e.g. PDF streaming) may use direct filesystem streaming APIs (fs.createReadStream) for performance and memory safety.
  • Such endpoints must still follow the same safety rules (validated segments, no path traversal) and the same error mapping conventions (lib/api/storageErrors.js).

1. High-Level Responsibilities

  • Resolve paths under the NAS root (NAS_ROOT_PATH).

  • Provide intention-revealing helpers:

    • listBranches() → ['NL01', 'NL02', ...]
    • listYears(branch) → ['2023', '2024', ...]
    • listMonths(branch, year) → ['01', '02', ...]
    • listDays(branch, year, month) → ['01', '02', ...]
    • listFiles(branch, year, month, day) → [{ name, relativePath }, ...]
  • Enforce read-only behavior.

  • Use async filesystem APIs (fs/promises).
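
As an illustration of the helpers listed above, here is a minimal sketch of how a branch listing could be built on fs/promises. The name listBranchesSketch and the explicit root parameter are assumptions made for this example; the actual implementation in lib/storage may look different.

```javascript
const fs = require('node:fs/promises');

// Hypothetical sketch: list branch directories (NL<Number>) directly under a root.
// Read-only by construction: only readdir is used, nothing is written.
async function listBranchesSketch(root) {
  const entries = await fs.readdir(root, { withFileTypes: true });
  return entries
    .filter((entry) => entry.isDirectory() && /^NL\d+$/.test(entry.name))
    .map((entry) => entry.name)
    .sort();
}
```

Entries such as @Recently-Snapshot are skipped in this sketch simply because they do not match the branch pattern.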


2. Environment Configuration

2.1 NAS_ROOT_PATH (required)

The module depends on:

  • NAS_ROOT_PATH — absolute Unix path where the NAS share is mounted inside the app container.

Default/typical value:

NAS_ROOT_PATH=/mnt/niederlassungen

Important:

  • lib/storage reads process.env.NAS_ROOT_PATH on demand and does not cache it at module load.
  • If NAS_ROOT_PATH is missing, lib/storage throws (fail fast).
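
The two points above can be sketched as a small accessor. The function name getNasRoot and the error message are hypothetical; only the behavior (read on demand, throw when missing) comes from this document.

```javascript
// Hypothetical helper: resolve the NAS root on every call (no module-level caching).
function getNasRoot() {
  const root = process.env.NAS_ROOT_PATH;
  if (!root) {
    // Fail fast instead of silently falling back to a default path.
    throw new Error('NAS_ROOT_PATH environment variable is not set');
  }
  return root;
}
```

Reading the variable per call means tests can point NAS_ROOT_PATH at a fixture folder without reloading the module.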

3. Docker Mount Strategy (Local vs Server)

The application code always expects the NAS path inside the container to be:

  • /mnt/niederlassungen

Which host folder is mounted there is an environment concern:

  • Server (docker-compose.yml) mounts the real NAS:

    volumes:
    - /mnt/niederlassungen:/mnt/niederlassungen:ro
    
  • Local development (docker-compose.local.yml) mounts a local fixture folder:

    volumes:
    - ./.local_nas:/mnt/niederlassungen:ro
    

This separation keeps code identical across environments while allowing safe local testing.


4. Directory Layout Assumptions

lib/storage assumes the following structure under NAS_ROOT_PATH:

NAS_ROOT_PATH/
  @Recently-Snapshot/   # ignored
  NL01/
    2024/
      10/
        23/
          file1.pdf
          file2.pdf
  ...

Rules:

  • Branch directories follow NL<Number> (e.g. NL01).
  • Year directories are 4-digit numeric (2024).
  • Month/day directories are numeric and normalized to two digits.
  • Only .pdf files are returned by listFiles().
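
The rules above could be expressed as small predicates like the following. All names are illustrative, and the case-insensitive .pdf match is an assumption of this sketch, not something this document specifies.

```javascript
// Hypothetical predicates mirroring the layout rules.
const isBranchDir = (name) => /^NL\d+$/.test(name);
const isYearDir = (name) => /^\d{4}$/.test(name);

// Normalize a numeric month/day directory name to two digits ('7' -> '07').
// Returns null for anything else so callers can skip the entry.
function normalizeTwoDigits(name) {
  if (!/^\d{1,2}$/.test(name)) return null;
  return name.padStart(2, '0');
}

// Assumption: the extension check ignores case ('SCAN.PDF' is accepted).
const isPdfFile = (name) => name.toLowerCase().endsWith('.pdf');
```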

5. Error Handling

5.1 Storage layer behavior

lib/storage does not swallow errors:

  • If a folder does not exist or is not accessible, fs.promises.readdir rejects and the error propagates to the caller.
  • lib/storage remains intentionally small and focused on filesystem reads.

5.2 API-level mapping

API routes map filesystem errors into standardized HTTP responses:

  • If a requested path does not exist (e.g. ENOENT) and the NAS root is accessible:

    • 404 with FS_NOT_FOUND
  • If the NAS root itself is missing or unreachable, or any other unexpected filesystem error occurs:

    • 500 with FS_STORAGE_ERROR

This mapping is implemented in lib/api/storageErrors.js and used by route handlers.
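
To make the two cases concrete, here is a minimal sketch of such a mapping. It is not the code in lib/api/storageErrors.js: the function name, its parameters, and the returned shape are assumptions, and the root check via fs.access is just one possible way to distinguish the cases.

```javascript
const fs = require('node:fs/promises');

// Hypothetical sketch of the mapping; the real storageErrors.js may differ.
async function mapStorageErrorSketch(err, nasRoot) {
  if (err && err.code === 'ENOENT') {
    try {
      // Root reachable: the requested path is simply missing.
      await fs.access(nasRoot);
      return { status: 404, code: 'FS_NOT_FOUND' };
    } catch {
      // Root itself is missing or unreachable: fall through to the 500 case.
    }
  }
  return { status: 500, code: 'FS_STORAGE_ERROR' };
}
```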


6. Caching & Freshness (RHL-006)

The NAS content can change at any time (new scans). To reduce filesystem load while keeping freshness predictable, lib/storage implements a small process-local TTL micro-cache.

6.1 TTLs

  • listBranches() / listYears() → 60 seconds
  • listMonths() / listDays() / listFiles() → 15 seconds

6.2 Semantics

  • TTL is a maximum staleness guarantee.

    • New files may appear immediately (when no cached entry exists or the entry has just expired).
    • New files are guaranteed to appear once the TTL has expired.
  • Cache is process-local.

    • If the app is scaled to multiple instances, each instance maintains its own cache.
  • Cache keys include NAS_ROOT_PATH.

    • This avoids cross-environment/test pollution when the env var changes.
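
The semantics above can be illustrated with a small sketch of a process-local TTL cache. The names cachedList and clearCacheForTests, the key format, and the injectable clock are assumptions made for this example; the actual cache in lib/storage may be structured differently.

```javascript
// Hypothetical process-local TTL cache; names and shape are illustrative only.
const cache = new Map();

async function cachedList(nasRoot, op, args, ttlMs, loader, now = Date.now) {
  // The key includes the NAS root so different roots never share entries.
  const key = JSON.stringify([nasRoot, op, ...args]);
  const hit = cache.get(key);
  if (hit && hit.expiresAt > now()) return hit.value;

  // Miss or expired: read from the filesystem again.
  // A rejected loader is not cached because set() is never reached.
  const value = await loader();
  cache.set(key, { value, expiresAt: now() + ttlMs });
  return value;
}

// Test-only helper to reset state between tests.
function clearCacheForTests() {
  cache.clear();
}
```

Passing the clock as a parameter lets a test check "stable within TTL" and "refresh after TTL" without real waiting.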

6.3 Testing

  • Unit tests clear the storage cache between tests via a test-only helper.
  • A TTL test verifies “stable within TTL” and “refresh after TTL”.

7. File streaming endpoints (PDF delivery)

The storage module currently focuses on listing directory contents.

For endpoints that must return binary file data (PDF streaming/download), a direct stream approach is preferred:

  • Do not read the whole PDF into memory.
  • Use fs.stat() first (for existence/type) and then fs.createReadStream().

7.1 Security rules

A streaming endpoint must never accept arbitrary paths.

Rules:

  • Build the absolute file path from:

    • NAS_ROOT_PATH
    • validated route segments (branch, year, month, day)
    • validated filename (filename)
  • Validate route segments using strict patterns:

    • branch: ^NL\d+$
    • year: ^\d{4}$
    • month: 01–12
    • day: 01–31
  • Validate filename:

    • must be a simple file name (no /, \, or .. segments)
    • only .pdf is allowed
  • Apply a root containment check:

    • after resolving the absolute path, ensure the resolved path stays within NAS_ROOT_PATH
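
The rules above could be combined into one path builder, sketched below. The function name buildPdfPath, its parameter shape, and the null return for invalid input are assumptions. Rejecting any ".." inside the filename is deliberately stricter than the rule requires, and the case-insensitive .pdf check is also an assumption.

```javascript
const path = require('node:path');

// Hypothetical helper: returns an absolute path, or null when any input is invalid.
function buildPdfPath(nasRoot, { branch, year, month, day, filename }) {
  const segmentsOk =
    /^NL\d+$/.test(branch) &&
    /^\d{4}$/.test(year) &&
    /^(0[1-9]|1[0-2])$/.test(month) &&
    /^(0[1-9]|[12]\d|3[01])$/.test(day);
  if (!segmentsOk) return null;

  const filenameOk =
    typeof filename === 'string' &&
    !filename.includes('/') &&
    !filename.includes('\\') &&
    !filename.includes('..') && // stricter than "no .. segments"
    filename.toLowerCase().endsWith('.pdf');
  if (!filenameOk) return null;

  const root = path.resolve(nasRoot);
  const abs = path.resolve(root, branch, year, month, day, filename);

  // Root containment: the resolved path must stay inside the NAS root.
  const rel = path.relative(root, abs);
  if (rel.startsWith('..') || path.isAbsolute(rel)) return null;
  return abs;
}
```

The containment check is redundant once the patterns pass, but it is kept as a second line of defense in case a pattern is loosened later.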

7.2 Error mapping rules

Even when the happy-path response is binary, errors must remain standardized JSON.

Recommended approach:

  1. stat(absPath)

  2. If it throws:

    • map via mapStorageReadError(err, { details })
  3. If stat succeeds but !stat.isFile():

    • return 404 FS_NOT_FOUND
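
The steps above can be sketched as follows. This is an illustration only: openPdfSketch and its result shape are hypothetical, and the mapError callback stands in for mapStorageReadError, whose exact signature and return value are not shown in this document.

```javascript
const fs = require('node:fs');

// Hypothetical sketch; mapError is a stand-in for mapStorageReadError.
async function openPdfSketch(absPath, mapError) {
  let stat;
  try {
    stat = await fs.promises.stat(absPath);
  } catch (err) {
    // Step 2: stat failed, delegate to the shared error mapping.
    return { ok: false, error: await mapError(err) };
  }
  if (!stat.isFile()) {
    // Step 3: directories and other non-files are reported as "not found".
    return { ok: false, error: { status: 404, code: 'FS_NOT_FOUND' } };
  }
  // Stream lazily; the file is never read into memory as a whole.
  return { ok: true, size: stat.size, stream: fs.createReadStream(absPath) };
}
```

A route handler would turn the error object into the standardized JSON response and otherwise pipe the stream to the client. The file can still disappear between stat and read, so the stream's 'error' event should be handled as well.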

7.3 HTTP headers

For PDF streaming:

  • Content-Type: application/pdf
  • Content-Disposition: inline; filename="..." (default)
  • Content-Disposition: attachment; filename="..." (when download=1)
  • Cache-Control: no-store
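
A small helper could build these headers in one place. The name pdfHeaders and the boolean download parameter (for example derived from download=1 in the query string) are assumptions of this sketch. It only handles the quoted-string form; non-ASCII filenames would need additional encoding that is not covered here.

```javascript
// Hypothetical helper building the response headers listed above.
function pdfHeaders(filename, download) {
  const disposition = download ? 'attachment' : 'inline';
  // Escape backslashes and double quotes for the quoted-string form.
  const safeName = filename.replace(/["\\]/g, '\\$&');
  return {
    'Content-Type': 'application/pdf',
    'Content-Disposition': `${disposition}; filename="${safeName}"`,
    'Cache-Control': 'no-store',
  };
}
```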

8. Future extensions

Potential follow-ups for the storage layer:

  • A dedicated helper for streaming files (e.g. openPdfStream(...)) that centralizes:

    • strict validation
    • safe path construction
    • stat() + stream creation
    • consistent error details

This is optional; the current v1 design keeps lib/storage focused on listing operations.