Mental Model
Any code that logs through NovaLogger during a request automatically gets the CLS context (correlation ID, user, geo, route) merged into every Winston log line. Separately, AlertDispatcherService reacts to error- and fatal-level logs delivered via AlertTransport: it sends alerts to email and Monday.com and applies signature-based deduplication.
Purpose
Detects scanners and automated attacks by trapping requests to known exploit paths before they reach any real route handler. Applied globally via LoggerModule.configure — runs first, before CorrelationMiddleware.
Behavior
Compares req.path.toLowerCase() against the HONEYPOT_PATHS set (e.g. /.env, /.git, /wp-admin, /phpmyadmin, /actuator). On match: logs a warn with tags HONEYPOT_HIT, SECURITY, plus honeypotPath, ip, userAgent, method. Responds 404 with { success: false, data: null, error: 'Not Found' }. Does not call next() — request ends here.
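The matching step can be sketched as a pure function. This is a minimal illustration rather than the actual middleware: `checkHoneypot` and `HoneypotResponse` are hypothetical names, the path list is only the subset named above, and the warn log is left out.

```typescript
// Hypothetical subset of trap paths; the real HONEYPOT_PATHS set may contain more entries.
const HONEYPOT_PATHS: ReadonlySet<string> = new Set([
  "/.env",
  "/.git",
  "/wp-admin",
  "/phpmyadmin",
  "/actuator",
]);

interface HoneypotResponse {
  status: number;
  body: { success: false; data: null; error: string };
}

// Returns the 404 payload when the path is a trap, or null when the
// request should continue to the next middleware.
function checkHoneypot(path: string): HoneypotResponse | null {
  if (!HONEYPOT_PATHS.has(path.toLowerCase())) {
    return null; // not a trap: the caller would invoke next()
  }
  return {
    status: 404,
    body: { success: false, data: null, error: "Not Found" },
  };
}
```

In a middleware, a non-null result would be written to the response without calling next().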
Steps (in order)
1. IP: reads x-forwarded-for first segment, else req.socket.remoteAddress.
2. Sets CLS: ip, userAgent, route (originalUrl), method.
3. Geo: GeoIpService.lookup(ip) → sets geoCountry, geoCity, geoFlag.
4. JWT: If Authorization: Bearer present — verifies with JWT_SECRET. On success sets userId, studentId, isGhostAdmin. On failure: silent (no throw).
5. Entity IDs: EntityContextService.extract(req) → sets entityIds on CLS if non-empty.
6. Calls next().
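Step 1 above can be sketched as follows. `RequestLike` and `resolveClientIp` are hypothetical names used only for illustration; the real middleware works on the framework's request object.

```typescript
// Minimal request shape for illustration only.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  socket: { remoteAddress?: string };
}

// Prefers the first segment of x-forwarded-for, else the socket address.
function resolveClientIp(req: RequestLike): string | undefined {
  const forwarded = req.headers["x-forwarded-for"];
  // The header may arrive as a single string or as an array of strings.
  const raw = Array.isArray(forwarded) ? forwarded[0] : forwarded;
  const first = raw?.split(",")[0]?.trim();
  return first ? first : req.socket.remoteAddress;
}
```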
Fields extracted from JWT payload
| CLS Field | JWT Source |
|---|---|
| userId | payload.userId or payload.sub |
| studentId | payload.studentId |
| isGhostAdmin | payload.ghostMode === true |
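The mapping in the table can be written as a small function. The type and function names are hypothetical, and using `??` for "or" is an assumption (the source does not say whether an empty-string userId should fall through to sub).

```typescript
interface JwtPayloadLike {
  userId?: string;
  sub?: string;
  studentId?: string;
  ghostMode?: unknown;
}

interface ClsUserFields {
  userId: string | undefined;
  studentId: string | undefined;
  isGhostAdmin: boolean;
}

function mapJwtToCls(payload: JwtPayloadLike): ClsUserFields {
  return {
    userId: payload.userId ?? payload.sub,
    studentId: payload.studentId,
    // Strict comparison: only the boolean true enables ghost mode.
    isGhostAdmin: payload.ghostMode === true,
  };
}
```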
Resolution logic
Uses geoip-lite (offline DB). Normalizes IPv4-mapped IPv6 addresses (::ffff:x). Private IPs return { country: 'LOCAL', city: 'localhost', flag: '🏠' }. Public IPs return ISO country, city, and a flag emoji derived from regional indicator codepoints.
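Two of the steps above, prefix normalization and flag derivation, can be sketched without the geoip-lite lookup itself. The function names are hypothetical; the private-IP check and the database lookup are not shown.

```typescript
// Strips the IPv4-mapped IPv6 prefix, e.g. "::ffff:10.0.0.1" becomes "10.0.0.1".
function normalizeIp(ip: string): string {
  return ip.toLowerCase().startsWith("::ffff:") ? ip.slice(7) : ip;
}

// Converts a two-letter ISO country code into a flag emoji by mapping each
// letter A-Z onto the regional indicator range starting at U+1F1E6.
function countryToFlag(iso: string): string {
  const REGIONAL_INDICATOR_A = 0x1f1e6;
  const CHAR_CODE_A = 65;
  return iso
    .toUpperCase()
    .split("")
    .map((ch) => String.fromCodePoint(REGIONAL_INDICATOR_A + ch.charCodeAt(0) - CHAR_CODE_A))
    .join("");
}
```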
Fields on the CLS store
| Field | Meaning |
|---|---|
| correlationId | UUID per request (set at CLS mount in AppModule) |
| requestStartTime | Date.now() at request start |
| userId | From JWT userId or sub |
| studentId | From JWT studentId |
| geoCountry / geoCity / geoFlag | From GeoIpService |
| ip / userAgent / route / method | Request metadata |
| isGhostAdmin | true if JWT has ghostMode: true |
| entityIds | Map of extracted route/query IDs |
| isBlackBoxTarget | For forcing DEBUG verbosity on a user |
Log levels & methods
| Method | Level | Use |
|---|---|---|
| log / info | info (3) | General information |
| warn | warn (2) | Warnings, shadow exceptions, security |
| error | error (1) | Errors; triggers alert transport |
| fatal | fatal (0) | Catastrophic; triggers all alert channels |
| debug | debug (4) | Verbose diagnostics |
| system | system (2) | Boot, deployment, operational banners |
Automatic context merging
Every write() call builds LogContext via getContext() and passes it to Winston. This means correlationId, userId, geo, route and all CLS fields are merged automatically — you never need to pass them manually when logging inside a request.
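The merging idea can be illustrated with a dependency-free sketch. It does not use Winston or a CLS library: `ContextMergingLogger`, the injected `getContext` provider, and the `sink` callback are stand-ins, and the precedence of call-site metadata over request context is an assumption.

```typescript
type LogContext = Record<string, unknown>;

class ContextMergingLogger {
  private readonly getContext: () => LogContext;
  private readonly sink: (entry: LogContext) => void;

  constructor(getContext: () => LogContext, sink: (entry: LogContext) => void) {
    this.getContext = getContext; // stand-in for reading the per-request CLS store
    this.sink = sink;             // stand-in for the underlying Winston logger
  }

  info(message: string, meta: LogContext = {}): void {
    this.write("info", message, meta);
  }

  private write(level: string, message: string, meta: LogContext): void {
    // Request context first, then call-site metadata (precedence is an assumption).
    this.sink({ ...this.getContext(), ...meta, level, message });
  }
}
```

The caller only passes the message and any extra metadata; the request fields arrive through the context provider.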
setLevel(level) mutates the Winston level at runtime; it is used by the Discord /logs-level slash command.
Transports (all attached to main logger)
| Transport | Details |
|---|---|
| Console | consoleFormat with chalk colors per level |
| Daily Rotate File | logs/nova-%DATE%.log · max 20MB · 30-day retention · zipped archive |
| Error-only File | logs/nova-errors-%DATE%.log · error level only · 60-day retention |
| DiscordTransport | Silent if no DISCORD_BOT_TOKEN. Buffers 100 items until channel resolver is set |
| MongoDbAuditTransport | Buffers 200 entries. Note: modelResolver may be unset — logs may not persist until bound |
| AlertTransport | level: error. Invokes AlertDispatcherService for error/fatal events |
With exitOnError: false, handleExceptions: true, and handleRejections: true, uncaught errors and rejections are logged instead of crashing the process.
Behavior
Skips non-HTTP contexts. On first request: calls BootBannerService.recordFirstRequest() to record TTFR on the deployment fingerprint.
| Event | Log level | Tags |
|---|---|---|
| Request in | info | REQUEST_IN |
| Response out (success) | info | RESPONSE_OUT + durationMs + statusCode |
| Slow request (>500ms) | warn | 🐢 SLOW_REQUEST |
| Response error | error | RESPONSE_ERROR |
AllExceptionsFilter (500+)
Catches everything for HTTP. Status from HttpException or 500. If status ≥ 500: logger.fatal with tags UNHANDLED_EXCEPTION, FATAL, stack, path, method. Response: { success: false, data: null, error: message }.
DomainExceptionFilter (4xx — "Shadow")
Catches only HttpException. For status 400–499: logger.warn with 👻 [SHADOW], tags SHADOW_EXCEPTION, full stack. Implements "shadow tracking" — client errors are tracked without treating them as server crashes.
Frontend log ingestion pipeline
1. Validates payload via Zod (clientLogPayloadSchema): level, message (1–2000 chars), optional stack, context, rrwebEvents (max 500).
2. PiiRedactorService.redact(payload) on full payload.
3. Logs via NovaLogger at appropriate level with tags CLIENT_LOG.
4. If rrwebEvents non-empty and userId set: stores session via RrwebSessionService (in-memory Map, max 500 sessions).
5. Returns { correlationId: "client-{Date.now()}", replayUrl }.
The /api/logs/replay/:id route does not exist yet; replayUrl is a future contract.
Services
| Service | Role |
|---|---|
| DiscordBotService | Bot login, env-aware channel resolution (dev vs prod). Sets channel resolver on DiscordTransport. |
| DiscordFeedService | sendLog (drops if isGhostAdmin), sendEmbed, sendAlert to configured channels. |
| DiscordDashboardService | setInterval 30s → updates Dependency Health + Traffic Hub embeds. |
| DiscordIncidentsService | Alert embeds with Acknowledge / Mute 1hr / Create Ticket buttons. |
Slash Commands
| Command | Access | What it does |
|---|---|---|
| /server-health | public | StateSnapshot + deployment fingerprint embed |
| /logs-level | owner | NovaLogger.setLevel() at runtime |
| /logs-sample | owner | Updates in-memory sampling rate |
| /debug-user | devops | BlackBoxService.activate(userId) for 10 min |
| /user-trace | devops | AuditService.getByActor(): last 10 actions |
| /export-trace | devops | getRecent(200) filtered by correlationId → JSON file |
| /run-diagnostics | devops | Parallel: Mongo ping, Redis, Discord WS, Resend, Monday, OpenAI |
Schema — collection: audit_logs
| Field | Type | Notes |
|---|---|---|
| action | string (required) | Stable action string e.g. logs.level_changed |
| actor | object | type: user\|admin\|discord\|system · id · name? |
| metadata | object | default {} |
| source | enum | api \| discord \| system |
| correlationId | string? | optional |
| createdAt | Date | TTL index: 90 days auto-delete |
Indexes
actor.id + createdAt · action + createdAt · TTL on createdAt (expireAfterSeconds: 7,776,000)
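The TTL value follows directly from the 90-day retention. The sketch below shows the arithmetic and lists the three indexes as plain objects; the sort directions and the `auditLogIndexes` shape are assumptions, not the project's actual schema code.

```typescript
const AUDIT_LOG_TTL_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// 90 days * 86,400 s/day = 7,776,000 s
const expireAfterSeconds = AUDIT_LOG_TTL_DAYS * SECONDS_PER_DAY;

// Index specifications as plain data (sort directions are assumed).
const auditLogIndexes = [
  { keys: { "actor.id": 1, createdAt: -1 } },
  { keys: { action: 1, createdAt: -1 } },
  { keys: { createdAt: 1 }, options: { expireAfterSeconds } },
];
```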
How it works
Call recordRequest() to increment a per-minute counter. The @Cron(EVERY_MINUTE) checkPulse method: rolls the minute bucket, skips if fewer than 10 buckets, computes baseline as average of all buckets, compares latest minute to baseline.
If deviation > 40%: logs 📈 SPIKE or 📉 DROP with tag PULSE_ANOMALY. Keeps up to 1440 snapshots (~24h) in history.
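The baseline comparison can be sketched as a pure function over the per-minute buckets. `evaluatePulse` and `PulseResult` are hypothetical names; including the latest minute in the baseline follows "average of all buckets", and skipping a zero baseline is an assumption added to avoid dividing by zero.

```typescript
const PULSE_DEVIATION_THRESHOLD_PERCENT = 40;
const MIN_PULSE_BUCKETS = 10;

type PulseResult = { kind: "SPIKE" | "DROP"; deviationPercent: number } | null;

function evaluatePulse(buckets: number[]): PulseResult {
  if (buckets.length < MIN_PULSE_BUCKETS) return null; // not enough data yet

  const baseline = buckets.reduce((sum, n) => sum + n, 0) / buckets.length;
  if (baseline === 0) return null; // assumption: no traffic means nothing to compare

  const latest = buckets[buckets.length - 1] ?? 0;
  const deviationPercent = ((latest - baseline) / baseline) * 100;

  if (Math.abs(deviationPercent) <= PULSE_DEVIATION_THRESHOLD_PERCENT) return null;
  return { kind: deviationPercent > 0 ? "SPIKE" : "DROP", deviationPercent };
}
```

For nine minutes at 100 requests followed by one minute at 300, the baseline is 120 and the latest minute deviates by 150%, so the check reports a spike.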
dispatch(alert) flow
1. Key = alert.errorSignature ?? alert.title.
2. rateLimiter.shouldAlert(signature) — if false, muted.
3. If severity === CRITICAL: stateSnapshot.capture('fatal').
4. Expand ALL channels → Discord + Email + Monday.
5. Parallel Promise.allSettled: Resend (email), Monday.com.
Alert Rate Limiter
Per signature: within 60 seconds, if more than 3 alerts → mute for another 60s. muteSignature(sig, 3_600_000) used by Discord's "Mute 1hr" button.
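A minimal sketch of this limiter, assuming a timestamp list per signature. The class name and the optional `now` parameter (added so the behavior can be exercised deterministically) are not part of the documented API.

```typescript
const ALERT_RATE_LIMIT_COUNT = 3;
const ALERT_RATE_LIMIT_WINDOW_MS = 60_000;

class AlertRateLimiterSketch {
  private readonly hits = new Map<string, number[]>();
  private readonly mutedUntil = new Map<string, number>();

  shouldAlert(signature: string, now: number = Date.now()): boolean {
    const muteEnd = this.mutedUntil.get(signature);
    if (muteEnd !== undefined && now < muteEnd) return false; // still muted
    this.mutedUntil.delete(signature);

    // Keep only timestamps inside the current window, then record this alert.
    const recent = (this.hits.get(signature) ?? []).filter(
      (t) => now - t < ALERT_RATE_LIMIT_WINDOW_MS,
    );
    recent.push(now);
    this.hits.set(signature, recent);

    if (recent.length > ALERT_RATE_LIMIT_COUNT) {
      // More than 3 alerts inside the window: mute for another window.
      this.muteSignature(signature, ALERT_RATE_LIMIT_WINDOW_MS, now);
      return false;
    }
    return true;
  }

  muteSignature(signature: string, durationMs: number, now: number = Date.now()): void {
    this.mutedUntil.set(signature, now + durationMs);
  }
}
```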
Error Signature deduplication
ErrorSignatureService.evaluate(error): strips line numbers + node_modules frames, SHA-256 hex slice(0,16). New errors → escalate to ALL channels. Known errors → Discord only.
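One way to realize this normalization is sketched below using Node's built-in crypto module. The regular expression, the in-memory `seenSignatures` set, and the lowercase channel names are assumptions; the real service may normalize stacks and persist known signatures differently.

```typescript
import { createHash } from "node:crypto";

// Drops node_modules frames and strips ":line" / ":line:col" suffixes so the
// same error raised from shifted line numbers hashes to the same value.
function computeErrorSignature(error: Error): string {
  const normalized = (error.stack ?? error.message)
    .split("\n")
    .filter((line) => !line.includes("node_modules"))
    .map((line) => line.replace(/:\d+(:\d+)?/g, ""))
    .join("\n");
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}

const seenSignatures = new Set<string>();

// New signatures escalate to all channels; known ones go to Discord only.
function evaluateError(error: Error): { signature: string; isNew: boolean; channels: string[] } {
  const signature = computeErrorSignature(error);
  const isNew = !seenSignatures.has(signature);
  seenSignatures.add(signature);
  return { signature, isNew, channels: isNew ? ["discord", "email", "monday"] : ["discord"] };
}
```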
PiiRedactorService
Recursively walks objects (max depth 10). Keys matching PII_SENSITIVE_KEYS (case-insensitive) → [REDACTED]. Strings: regex replace for email, phone, IPv4 patterns → [REDACTED]. Strings > 5120 bytes truncated first. Used automatically by ClientBridgeService and GdprSanitizerService.
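The recursive walk can be sketched as follows. The key list and the regular expressions are illustrative assumptions (the phone pattern is omitted because the source does not specify it), the truncation counts characters rather than bytes, and the placeholder returned past the depth limit is also assumed.

```typescript
// Hypothetical key list; the real PII_SENSITIVE_KEYS is not shown in the source.
const PII_SENSITIVE_KEYS = new Set(["password", "token", "email", "phone"]);
const PII_MAX_DEPTH = 10;
const PII_MAX_STRING = 5120;
const REDACTED = "[REDACTED]";

const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const IPV4_PATTERN = /\b(?:\d{1,3}\.){3}\d{1,3}\b/g;

function redactPii(value: unknown, depth: number = 0): unknown {
  if (depth > PII_MAX_DEPTH) return REDACTED; // assumption: drop anything nested too deeply

  if (typeof value === "string") {
    // Truncate first, then scrub patterns inside the remaining text.
    return value
      .slice(0, PII_MAX_STRING)
      .replace(EMAIL_PATTERN, REDACTED)
      .replace(IPV4_PATTERN, REDACTED);
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactPii(item, depth + 1));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = PII_SENSITIVE_KEYS.has(key.toLowerCase())
        ? REDACTED
        : redactPii(inner, depth + 1);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}
```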
PayloadTruncatorService
Deep truncation: max depth 8, strings 5120 chars, arrays max 50 items, objects max 100 keys, Buffer summarized. Not auto-applied to every Winston log — call manually when logging large objects.
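A sketch of the deep truncation with the limits listed above. `truncatePayload` is a hypothetical name, the "[MAX_DEPTH]" placeholder and the exact depth boundary are assumptions, and Buffer summarization is left out.

```typescript
const TRUNCATION_LIMITS = {
  depth: 8,
  stringChars: 5120,
  arrayItems: 50,
  objectKeys: 100,
};

function truncatePayload(value: unknown, depth: number = 0): unknown {
  if (typeof value === "string") {
    return value.length > TRUNCATION_LIMITS.stringChars
      ? value.slice(0, TRUNCATION_LIMITS.stringChars)
      : value;
  }
  if (value === null || typeof value !== "object") return value;

  // Containers nested beyond the limit are replaced by a placeholder (assumed format).
  if (depth >= TRUNCATION_LIMITS.depth) return "[MAX_DEPTH]";

  if (Array.isArray(value)) {
    return value
      .slice(0, TRUNCATION_LIMITS.arrayItems)
      .map((item) => truncatePayload(item, depth + 1));
  }
  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value).slice(0, TRUNCATION_LIMITS.objectKeys)) {
    out[key] = truncatePayload(inner, depth + 1);
  }
  return out;
}
```

Because truncation is not applied automatically, a call site would wrap large objects before logging them, for example by passing `truncatePayload(response)` as log metadata.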
trackApiCost(userId, service, estimatedCost)
Rolling hourly bucket per user. Logs each track with tag COST_TRACKED. If hourly total > $10 USD: logs 💸 [FINANCIAL_ALERT]. Call from code that wraps OpenAI / Resend / etc. with estimated USD cost.
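The bucket logic might look like the sketch below. It assumes a fixed clock-hour bucket per user (the source says "rolling" without defining it), returns a result object instead of logging, and adds an optional `now` parameter for deterministic checks; all of these are illustrative choices.

```typescript
const FINANCIAL_HOURLY_THRESHOLD_USD = 10;
const HOUR_MS = 3_600_000;

interface CostTrackResult {
  service: string;
  hourlyTotalUsd: number;
  alert: boolean; // true where the real service would log the financial alert
}

class CostTrackerSketch {
  private readonly buckets = new Map<string, { hour: number; totalUsd: number }>();

  trackApiCost(
    userId: string,
    service: string,
    estimatedCost: number,
    now: number = Date.now(),
  ): CostTrackResult {
    const hour = Math.floor(now / HOUR_MS);
    const current = this.buckets.get(userId);
    // Start a fresh bucket when the hour has rolled over.
    const bucket =
      current !== undefined && current.hour === hour ? current : { hour, totalUsd: 0 };
    bucket.totalUsd += estimatedCost;
    this.buckets.set(userId, bucket);
    return {
      service,
      hourlyTotalUsd: bucket.totalUsd,
      alert: bucket.totalUsd > FINANCIAL_HOURLY_THRESHOLD_USD,
    };
  }
}
```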
recordUserError(userId)
Keeps timestamps in a 2-minute (120,000ms) sliding window. If count ≥ 5 errors and not yet alerted: logs 😤 [FRUSTRATION]. Resets alerted when count drops back below threshold. Call from shadow filter or auth failure handler.
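The sliding window and the alerted flag can be sketched like this. The class name, the boolean return value (true where the frustration log would be emitted), and the optional `now` parameter are assumptions for illustration.

```typescript
const FRUSTRATION_ERROR_COUNT = 5;
const FRUSTRATION_WINDOW_MS = 120_000;

class FrustrationTrackerSketch {
  private readonly errors = new Map<string, number[]>();
  private readonly alerted = new Set<string>();

  recordUserError(userId: string, now: number = Date.now()): boolean {
    // Drop timestamps that have slid out of the 2-minute window.
    const recent = (this.errors.get(userId) ?? []).filter(
      (t) => now - t < FRUSTRATION_WINDOW_MS,
    );
    recent.push(now);
    this.errors.set(userId, recent);

    if (recent.length < FRUSTRATION_ERROR_COUNT) {
      this.alerted.delete(userId); // back below threshold: allow a future alert
      return false;
    }
    if (this.alerted.has(userId)) return false; // already alerted for this burst
    this.alerted.add(userId);
    return true;
  }
}
```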
Daily OpenAI analysis
Requires OPENAI_API_KEY env var. Every day at 06:00: loads 100 recent audit docs via AuditService.getRecent(100), builds a text summary of actions + metadata snippets, calls OpenAI gpt-4o-mini with a DevOps-style system prompt. Logs result as system with tag 🔮 [PREDICTIVE_WARNING].
recordFailedAuth(ip, userId?)
5-minute sliding window per IP. Each failure logs 🛡️ [SECURITY]. At ≥ 5 failures: logs 🚨 brute-force style error. Call from your auth failure path.
API
| Method | Behavior |
|---|---|
| activate(userId, reason) | Stores target for 10 minutes (BLACK_BOX_RECORDING_DURATION_MS = 600,000) |
| isActive(userId) | Returns true while the user is an active target; expires old entries |
| markCurrentRequest() | If CLS userId is active, sets cls isBlackBoxTarget = true |
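The first two methods can be sketched with an in-memory map. This is an assumption about the storage, not the documented implementation; the optional `now` parameter is added for deterministic checks, and markCurrentRequest() is omitted because it depends on the CLS store.

```typescript
const BLACK_BOX_RECORDING_DURATION_MS = 600_000;

class BlackBoxSketch {
  private readonly targets = new Map<string, { reason: string; expiresAt: number }>();

  activate(userId: string, reason: string, now: number = Date.now()): void {
    this.targets.set(userId, { reason, expiresAt: now + BLACK_BOX_RECORDING_DURATION_MS });
  }

  isActive(userId: string, now: number = Date.now()): boolean {
    const target = this.targets.get(userId);
    if (target === undefined) return false;
    if (now >= target.expiresAt) {
      this.targets.delete(userId); // lazily expire the stale entry
      return false;
    }
    return true;
  }
}
```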
Behavior
isGhostAdmin() reads CLS isGhostAdmin (from JWT ghostMode). activateForCurrentRequest() sets CLS flag manually. DiscordFeedService.sendLog drops messages when isGhostAdmin is true — reduces noise for admin ghost actions.
scrubExpiredPii (cron)
Finds audit docs with createdAt older than 14 days (batch 1000). Redacts metadata via PiiRedactorService, then updateOne.
purgeUserData(userId)
deleteMany on actor.id. Returns count. Logs purge action for audit trail.
API
| Method | Behavior |
|---|---|
| registerStrategy(name, async fn) | Stores a recovery function for the named service |
| attemptRecovery(name, reason) | Logs phases: Analysis → Strategy → Execution → Verification. Invokes strategy. On success: updates CircuitObserver to CLOSED. |
| onCircuitOpen(name, reason) | If a strategy exists for this service, starts recovery automatically |
reportState(serviceName, newState, reason?)
On state change, pushes to history (max 500 entries). Logs: OPEN as ⚡ [CIRCUIT_OPEN], HALF_OPEN as 🔄, CLOSED as ✅. Methods: getState, getHistory, getAllStates for diagnostics. Call from your resilience layer (e.g. opossum).
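A sketch of the state tracking with the bounded history. The class and interface names are hypothetical, logging is omitted, and the boolean return value (whether a change was recorded) is added for illustration.

```typescript
type CircuitState = "OPEN" | "HALF_OPEN" | "CLOSED";

interface CircuitStateChange {
  serviceName: string;
  from: CircuitState | undefined;
  to: CircuitState;
  reason: string | undefined;
  at: number;
}

const CIRCUIT_HISTORY_MAX = 500;

class CircuitObserverSketch {
  private readonly states = new Map<string, CircuitState>();
  private readonly changes: CircuitStateChange[] = [];

  reportState(serviceName: string, newState: CircuitState, reason?: string): boolean {
    const previous = this.states.get(serviceName);
    if (previous === newState) return false; // only actual changes are recorded

    this.states.set(serviceName, newState);
    this.changes.push({ serviceName, from: previous, to: newState, reason, at: Date.now() });
    if (this.changes.length > CIRCUIT_HISTORY_MAX) this.changes.shift(); // drop the oldest
    return true;
  }

  getState(serviceName: string): CircuitState | undefined {
    return this.states.get(serviceName);
  }

  getHistory(): CircuitStateChange[] {
    return this.changes.slice();
  }
}
```

A resilience layer would call reportState from its open, half-open, and close callbacks.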
capture(reason?)
Reasons: 'fatal' (default) | 'health-check' | 'diagnostic'. Captures: heap used/total (MB), RSS, external, uptime (s), PID, Node version, NODE_ENV, active handle count, CPU user/system (ms). Logs as system level with distinct message per reason.
| Service | Schedule | Behavior |
|---|---|---|
| PulseAnomalyService | Every minute | Baseline deviation check (needs 10+ buckets) |
| GdprSanitizerService | Daily 03:00 | Scrub PII on audit docs older than 14 days |
| AiPredictorService | Daily 06:00 | OpenAI analysis of recent audit entries (if OPENAI_API_KEY set) |
| Constant | Value | Meaning |
|---|---|---|
| SLOW_REQUEST_THRESHOLD_MS | 500 | HTTP/WS "slow" warning threshold |
| SLOW_DB_QUERY_THRESHOLD_MS | 100 | Slow query warning |
| PAYLOAD_TRUNCATION_LIMIT_BYTES | 5120 | ~5KB string cap for PII/truncation |
| ALERT_RATE_LIMIT_COUNT | 3 | Alerts per window before mute |
| ALERT_RATE_LIMIT_WINDOW_MS | 60,000 | 1 minute rate limit window |
| LOG_GROUP_BUFFER_MS | 60,000 | LogGrouper flush interval |
| FRUSTRATION_ERROR_COUNT | 5 | Errors in window to trigger frustration log |
| FRUSTRATION_WINDOW_MS | 120,000 | 2-minute frustration detection window |
| FINANCIAL_HOURLY_THRESHOLD_USD | $10 | Per-user hourly spend alert |
| BLACK_BOX_RECORDING_DURATION_MS | 600,000 | 10 minutes of black box recording |
| GDPR_RETENTION_DAYS | 14 | Age threshold for PII metadata scrub |
| AUDIT_LOG_TTL_DAYS | 90 | MongoDB TTL index for audit logs |
| RRWEB_SESSION_TTL_DAYS | 7 | rrweb session retention |
| PULSE_DEVIATION_THRESHOLD_PERCENT | 40 | Pulse anomaly sensitivity |
Architecture layers
Entry Points (Red): Inbound requests pass through HoneypotMiddleware first — trap paths are blocked with 404. Legitimate requests flow to CorrelationMiddleware which enriches the CLS store with IP, geo, JWT claims, and entity IDs.
Core Processing (Blue): NestJS AppModule + CLS store per request. NovaLogger automatically merges all CLS context into every log line via Winston. HttpLoggingInterceptor and Exception Filters both route through NovaLogger.
Observability (Green): Winston fans out to Console, rotating files, Discord (live feed + alerts), MongoDB audit, and AlertTransport. AlertDispatcherService deduplicates via error signatures and rate-limits to avoid alert storms.
Self-Healing (Purple): CircuitObserverService tracks circuit breaker state changes. SelfHealingService executes registered recovery strategies. StateSnapshotService captures system health on demand or on CRITICAL events.