Log Patrol
AI-Assisted Log Monitoring & Incident Management
Problem
Alert fatigue is real. The standard approach to log-based incident monitoring, which fires an alert for every ERROR-level event, floods the issue queue with noise faster than any team can respond to it. Engineers then start ignoring alerts, which defeats the purpose of monitoring. I wanted a system that watches Loki log streams continuously and opens a GitLab issue only when something genuinely warrants human attention, not merely because a log line contains the word "error."
Approach
Log Patrol runs on a configurable patrol loop and applies three complementary layers of analysis to incoming log streams:
- Fast-path deterministic rules catch explicit, unambiguous error-level events immediately. No ML overhead, no inference latency. These are direct pattern matches that should always produce an issue.
- Drain3 log template mining runs on the slow path to surface rare or suspicious patterns that don't match known error signatures. Drain3 clusters log messages into templates, which makes it possible to detect novel failure modes without hardcoding every possible error string.
- LLM sentiment gate: before opening or updating any GitLab issue, a locally hosted Ollama model evaluates the candidate finding and decides whether it's genuinely incident-worthy. This gate exists specifically to filter out the cases where Drain3 flags something structurally unusual but contextually benign.
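The three layers can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Log Patrol's own code: the fast-path regex, the prompt, the verdict schema (`incident`/`reason`), and the model name `llama3` are hypothetical, and the middle layer is a simplified regex-masking stand-in for Drain3, not Drain3 itself (the real library clusters messages with a fixed-depth parse tree and a similarity threshold). The gate calls Ollama's `/api/generate` endpoint with `"stream": false` and `"format": "json"`, and every candidate, fast-path or slow-path, passes through it before an issue would be opened.

```python
"""Minimal sketch of the three layers. Rules, masks, prompt, verdict
schema, and model name are hypothetical, not Log Patrol's own."""
import json
import re
import urllib.request
from collections import Counter
from typing import Callable

# Layer 1 (hypothetical rule): explicit error-level markers, plain regex.
FAST_PATH = re.compile(r"\b(ERROR|FATAL|CRITICAL)\b")

# Layer 2: a simplified STAND-IN for Drain3. Regex masks collapse
# variable tokens into a template; real Drain3 clusters with a
# fixed-depth parse tree and a similarity threshold instead.
MASKS = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)*\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens with placeholders (hex first, then numbers)."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


def rare_templates(lines: list[str], max_count: int = 1) -> list[str]:
    """Return templates seen at most max_count times in this batch."""
    counts = Counter(to_template(line) for line in lines)
    return [tmpl for tmpl, n in counts.items() if n <= max_count]


# Layer 3: Ollama's generate endpoint on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_gate_request(model: str, finding: str) -> urllib.request.Request:
    """Build a non-streaming request that asks the model for JSON output."""
    prompt = (
        "You triage production logs. Answer only with JSON like "
        '{"incident": true, "reason": "..."}.\n'
        f"Finding:\n{finding}"
    )
    body = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_gate_reply(raw: bytes) -> bool:
    """Read {"response": "<model JSON>"}; anything malformed counts as 'no'."""
    try:
        verdict = json.loads(json.loads(raw)["response"])
        return verdict.get("incident") is True
    except (ValueError, KeyError, TypeError, AttributeError):
        return False


def ollama_gate(finding: str, model: str = "llama3") -> bool:
    """Ask a local Ollama model whether the finding is incident-worthy."""
    req = build_gate_request(model, finding)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_gate_reply(resp.read())


def triage(lines: list[str], gate: Callable[[str], bool]) -> list[str]:
    """Fast-path hits plus rare-template lines, each confirmed by the gate."""
    fast = [line for line in lines if FAST_PATH.search(line)]
    rare = set(rare_templates(lines))
    slow = [
        line for line in lines
        if not FAST_PATH.search(line) and to_template(line) in rare
    ]
    return [finding for finding in fast + slow if gate(finding)]
```

Injecting the gate as a plain callable keeps the deterministic layers testable without a running model; in practice `ollama_gate` would be passed in. Treating a malformed model reply as "not an incident" is one possible design choice; failing open for fast-path hits would be equally defensible.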
Findings are fingerprinted across patrol runs so the same underlying problem doesn't generate duplicate issues. Patrol state persists in SQLite. Issues whose findings stop recurring are closed automatically as stale. The entire system deploys via Docker Compose with a full smoke test harness.
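Fingerprinting, SQLite state, and stale closing can fit together roughly like this. It is a sketch under assumptions, not the project's implementation: the table schema, the fingerprint recipe (stream labels plus the message with digits masked, hashed with SHA-256), and the "quiet runs" threshold are all hypothetical. The close call targets GitLab's REST "edit issue" endpoint (`PUT /api/v4/projects/:id/issues/:issue_iid` with `state_event=close`); the request is only built here, not sent.

```python
"""Minimal sketch of cross-run dedup and stale closing. The schema,
fingerprint recipe, and quiet-run threshold are hypothetical."""
import hashlib
import re
import sqlite3
import urllib.parse
import urllib.request
from typing import Optional


def fingerprint(stream: str, message: str) -> str:
    """Hash stream labels plus the message with digits masked."""
    normalized = re.sub(r"\d+", "<NUM>", message)
    return hashlib.sha256(f"{stream}|{normalized}".encode()).hexdigest()[:16]


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the patrol state database."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS findings (fp TEXT PRIMARY KEY, "
            "issue_iid INTEGER NOT NULL, last_seen_run INTEGER NOT NULL)"
        )
    return conn


def lookup(conn: sqlite3.Connection, fp: str, run: int) -> Optional[int]:
    """Return the open issue IID for fp and mark it seen, or None if new."""
    row = conn.execute("SELECT issue_iid FROM findings WHERE fp = ?", (fp,)).fetchone()
    if row is None:
        return None
    with conn:  # commits on success
        conn.execute("UPDATE findings SET last_seen_run = ? WHERE fp = ?", (run, fp))
    return int(row[0])


def remember(conn: sqlite3.Connection, fp: str, run: int, issue_iid: int) -> None:
    """Record a newly opened issue for fp."""
    with conn:
        conn.execute("INSERT INTO findings VALUES (?, ?, ?)", (fp, issue_iid, run))


def pop_stale(conn: sqlite3.Connection, run: int, quiet_runs: int) -> list[int]:
    """Forget and return IIDs whose finding has not recurred for quiet_runs runs."""
    cutoff = run - quiet_runs
    with conn:
        rows = conn.execute(
            "SELECT issue_iid FROM findings WHERE last_seen_run <= ?", (cutoff,)
        ).fetchall()
        conn.execute("DELETE FROM findings WHERE last_seen_run <= ?", (cutoff,))
    return [int(r[0]) for r in rows]


def close_issue_request(
    base: str, project_id: int, iid: int, token: str
) -> urllib.request.Request:
    """Build GitLab's edit-issue call with state_event=close (not sent here)."""
    url = f"{base}/api/v4/projects/{project_id}/issues/{iid}"
    data = urllib.parse.urlencode({"state_event": "close"}).encode()
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"PRIVATE-TOKEN": token}
    )
```

In this sketch `pop_stale` forgets a finding before its issue is actually closed; a sturdier version would delete the row only after GitLab confirms the close.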
Outcome
The result is a monitoring system that runs in the background without flooding the issue queue. When it opens a GitLab issue, it's because a finding was surfaced by the fast-path rules or by Drain3 and then confirmed by the LLM gate. The LLM gate, in particular, catches the "technically an anomaly, contextually fine" cases that would otherwise produce false positives. The project is fully typed (mypy strict), linted (pylint, pydocstyle with the Google convention), and tested (pytest, including the smoke test suite).