Best Practices for Syslog Priority Levels in SOC Pipelines

When the syslog PRI byte is extracted without bounds-checking, a routine NAT timeout and an active privilege-escalation alert reach the correlation engine carrying identical weight — and the alert drowns. This page drills the single decoding step that prevents that, as part of the broader Syslog RFC Standards parsing contract inside the SOC Log Architecture & Taxonomy pipeline.

Root-Cause Context

Every syslog line begins with a priority value enclosed in angle brackets, computed as (Facility * 8) + Severity. Decoding it is two bit operations: facility = PRI >> 3 and severity = PRI & 7. The arithmetic is trivial; the failures come from the data, not the math.
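As a quick worked check of the formula, the sketch below encodes and then decodes one priority value. The helper names `encode_pri` and `split_pri` are illustrative only and are not part of the decoder presented later on this page.

```python
def encode_pri(facility: int, severity: int) -> int:
    """Combine facility (0-23) and severity (0-7) into a PRI value."""
    return (facility << 3) | severity  # same as facility * 8 + severity


def split_pri(pri: int) -> tuple[int, int]:
    """Split a PRI value into (facility, severity)."""
    return pri >> 3, pri & 7


pri = encode_pri(16, 6)   # local0 with severity informational
print(pri)                # 134
print(split_pri(pri))     # (16, 6)
print(split_pri(191))     # (23, 7), the largest legal PRI
```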

RFC 3164 never mandated validation, so a decade of appliances ships PRI values that violate the spec in predictable ways. The recurring defects a SOC receiver sees on UDP/514 are:

Over-long or zero-padded PRI such as <00134>, where a vendor pads the value with leading zeros; a naive int() still parses it as 134, but a length-bounded regex rejects it as malformed because it exceeds three digits.
Out-of-range values above 191 (the maximum legal PRI, local7.debug), usually a sign the field is carrying something other than a priority — a sequence number, a truncated header, or a binary smear.
Missing PRI entirely, where a relay strips the marker and the first integer in the line is a port or PID that a greedy parser mistakes for a priority.
Severity-only or facility-only emitters, where a device hard-codes one half of the byte and the other half is junk.
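To make these defects concrete, here is a deliberately naive parser of the kind they break. `naive_pri` is a hypothetical anti-example, and the sample lines are invented for illustration rather than captured from a real device.

```python
import re

# Greedy anti-pattern: grab the first integer anywhere in the line.
FIRST_INT = re.compile(r"(\d+)")


def naive_pri(line: str) -> tuple[int, int]:
    """Return (facility, severity) from the first integer found. Do not use."""
    pri = int(FIRST_INT.search(line).group(1))  # assumes the line has a digit
    return pri >> 3, pri & 7


samples = [
    "<00134>gateway nat: session timeout",     # over-long, zero-padded PRI
    "<999>gateway nat: session timeout",       # out-of-range PRI
    "gateway sshd[514]: session opened",       # PRI stripped by a relay
]

for line in samples:
    print(naive_pri(line), "<-", line)
```

The three lines decode to (16, 6), (124, 7), and (64, 2). The first silently normalizes the padding, the second invents facility 124, and the third turns a PID of 514 into a severity 2 event. None of them raises an error, which is why the validation has to be explicit.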

The downstream cost is severity collapse. If the byte is not split into the facility that drives routing and the severity that drives triage, the alert correlation and rule engines downstream have no clean signal to key on, rate limiting strategies shed high-severity lines indiscriminately during volume spikes, and dynamic severity scoring computes risk from a constant. Deterministic PRI handling restores the signal before any of those stages run.

Prerequisites

The decoder targets Python 3.11+ and uses only the standard library (re, asyncio, enum, typing, and dataclasses), so the hot path carries no third-party dependency. Assumptions:

Syslog is already framed one event per line (UDP) or octet-counted per RFC 5425 (TLS) or RFC 6587 (TCP); this code parses the line payload after framing.
The compiled PRI_PATTERN is a module-level constant, compiled once rather than per event.
The decoder is pure — it takes a str and returns a result object, performing no I/O — so it is safe to run inside an asyncio worker pool and trivial to unit-test.
A separate quarantine sink (file, Kafka topic, or DLQ) exists for lines that fail validation; nothing malformed is force-fit into a partial record.
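Framing is out of scope for the decoder, but for orientation, the octet-counted layout in RFC 6587 is a decimal length, one space, then exactly that many bytes of message. The sketch below splits a receive buffer on that basis. `split_octet_counted` is a hypothetical helper, not part of the module on this page, and it omits the length cap a production receiver would want.

```python
def split_octet_counted(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split 'LEN SP MSG' frames; return (complete messages, unconsumed tail)."""
    messages: list[bytes] = []
    pos = 0
    while pos < len(buffer):
        space = buffer.find(b" ", pos)
        if space == -1:
            break  # length prefix not complete yet; wait for more bytes
        prefix = buffer[pos:space]
        if not prefix.isdigit() or prefix.startswith(b"0"):
            raise ValueError(f"bad MSG-LEN prefix: {prefix!r}")
        end = space + 1 + int(prefix)
        if end > len(buffer):
            break  # message body not complete yet; wait for more bytes
        messages.append(buffer[space + 1:end])
        pos = end
    return messages, buffer[pos:]


stream = b"16 <134>nat timeout12 <10>su: root9 <13>par"
msgs, tail = split_octet_counted(stream)
print(msgs)  # two complete messages
print(tail)  # the incomplete third frame, kept for the next read
```

Each element of `msgs`, once decoded to `str`, is the kind of line payload the decoder expects.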

Production-Ready Implementation

The following self-contained module decodes the PRI byte, validates it against the legal 0–191 range, splits facility and severity into named enums, and routes each event to a severity-tiered asyncio.Queue. Rejected lines carry a typed error code rather than a silent default.

```python
from __future__ import annotations

import asyncio
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


FACILITY_MAP: dict[int, str] = {
    0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth",
    5: "syslog", 6: "lpr", 7: "news", 8: "uucp", 9: "cron",
    10: "authpriv", 11: "ftp", 12: "ntp", 13: "audit",
    14: "alert", 15: "clock", 16: "local0", 17: "local1",
    18: "local2", 19: "local3", 20: "local4", 21: "local5",
    22: "local6", 23: "local7",
}

# Bounded 1–3 digits anchored at line start: rejects PRI fields longer than
# three digits (e.g. <00134>). A short zero-padded value such as <007> still matches.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")
MAX_PRI = 191  # (23 << 3) | 7  -> local7.debug


@dataclass(frozen=True)
class PriResult:
    pri_raw: int
    facility_id: int
    facility_name: str
    severity_id: int
    severity_name: str


def decode_pri(raw_event: str) -> tuple[Optional[PriResult], Optional[str]]:
    """Decode and validate the syslog PRI. Returns (result, error_code)."""
    match = PRI_PATTERN.match(raw_event)
    if not match:
        return None, "ERR_PRI_001"  # missing or malformed PRI header

    pri = int(match.group(1))  # regex guarantees 1–3 digits
    if pri > MAX_PRI:
        return None, "ERR_PRI_002"  # PRI exceeds RFC maximum (191)

    facility_id = pri >> 3
    severity_id = pri & 7
    if facility_id not in FACILITY_MAP:
        # facility outside 0–23 (defensive; unreachable while MAX_PRI == 191)
        return None, "ERR_PRI_003"

    return PriResult(
        pri_raw=pri,
        facility_id=facility_id,
        facility_name=FACILITY_MAP[facility_id],
        severity_id=severity_id,
        severity_name=Severity(severity_id).name,
    ), None


class SyslogRouter:
    """Route validated events into severity-tiered queues; quarantine the rest."""

    def __init__(self) -> None:
        self.critical: asyncio.Queue[PriResult] = asyncio.Queue()   # severity <= 2
        self.correlate: asyncio.Queue[PriResult] = asyncio.Queue()  # severity 3–5
        self.archive: asyncio.Queue[PriResult] = asyncio.Queue()    # severity 6–7
        self.quarantine: asyncio.Queue[tuple[str, str]] = asyncio.Queue()

    async def route(self, raw_event: str) -> None:
        result, error = decode_pri(raw_event)
        if error is not None:
            await self.quarantine.put((raw_event, error))
            return
        assert result is not None
        if result.severity_id <= Severity.CRITICAL:
            await self.critical.put(result)
        elif result.severity_id <= Severity.NOTICE:
            await self.correlate.put(result)
        else:
            await self.archive.put(result)

    async def consume(self, lines: asyncio.Queue[str]) -> None:
        while True:
            raw = await lines.get()
            try:
                await self.route(raw)
            finally:
                lines.task_done()
```

The >> 3 / & 7 split is the load-bearing operation: it is constant-time, allocation-free, and impossible to get wrong once the input has passed the range gate. Validation happens once, at the edge, so every stage after SyslogRouter can trust that severity_id is a real 0–7 value rather than a vendor string.

Error-Code Reference

The decoder emits stable, typed codes following the site-wide ERR_CATEGORY_NNN convention so the error categorization framework can route and alert on them without parsing free text.

| Code | Meaning | Action |
| --- | --- | --- |
| `ERR_PRI_001` | No `<...>` PRI marker, or more than three digits | Quarantine; check upstream relay for header stripping or PRI padding |
| `ERR_PRI_002` | Numeric PRI exceeds `191` | Quarantine; the field is not a priority, so inspect framing and source config |
| `ERR_PRI_003` | Facility code outside `0–23` | Quarantine; confirm whether the source uses a non-standard facility extension |
| `ERR_PRI_004` | Severity not in `0–7` (reserved for the enrichment stage; unreachable inside `decode_pri` because of `& 7`) | Tag for manual review; fires only when an enrichment override supplies a severity outside `0–7` |

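Severity extracted with & 7 is mathematically always 0–7, so ERR_PRI_004 cannot originate inside decode_pri; it is reserved for an enrichment stage that overwrites severity from a parsed application field and must re-validate the override. ERR_PRI_003 is similarly defensive in the module as written: any PRI that passes the 191 gate already has a facility of 23 or less, so the code fires only if MAX_PRI or FACILITY_MAP is later changed.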
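A minimal sketch of that re-validation, assuming the enrichment stage has already converted the application field to an integer. The function name `validate_severity_override` is hypothetical; only the error code comes from the table above.

```python
from typing import Optional


def validate_severity_override(value: object) -> tuple[Optional[int], Optional[str]]:
    """Return (severity, None) for an int in 0-7, else (None, 'ERR_PRI_004')."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 7:
        return value, None
    return None, "ERR_PRI_004"


print(validate_severity_override(6))    # (6, None)
print(validate_severity_override(9))    # (None, 'ERR_PRI_004')
print(validate_severity_override("6"))  # (None, 'ERR_PRI_004'): not converted yet
```

Returning the same (value, error) pair shape as decode_pri lets the enrichment stage feed the same quarantine path.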

Operational Notes

CPU profile. Decoding is a single anchored regex match plus two bit operations, roughly 1–2 microseconds per line on a modern core, which is on the order of 0.5–1 million lines per second per core. The regex is the only meaningful cost; keeping PRI_PATTERN module-level (compiled once) keeps the per-line cost flat, and sustaining millions of events per second means spreading the decode across multiple workers.
Memory. PriResult is a frozen dataclass with five small fields; it adds no per-event allocation pressure beyond the queue entry. Cap each asyncio.Queue with maxsize in production so an overloaded consumer applies backpressure instead of growing the heap unbounded.
Safe batch sizes. When draining queues into a downstream sink, batch severity 6–7 archive writes at 500–1000 events and flush severity <= 2 immediately (batch size 1) so critical alerts never wait on a buffer.
Vendor quirks. Cisco ASA emits standard PRI but encodes its own severity inside the message body (%ASA-6-...); trust the PRI byte for routing, not the in-message digit, unless you have explicitly validated they agree. Some appliances reuse facility 13–15 for non-standard purposes — confirm against vendor docs before treating those as the IANA-registered audit/alert/clock facilities.
Edge pre-filter (optional). Diverting obviously malformed PRI at the collector spares the Python stage from junk. An rsyslog RainerScript guard can write lines that lack a valid leading PRI to a quarantine file and stop processing them before they are forwarded:
```
# rsyslog: quarantine lines without a 1–3 digit PRI, then stop
if not re_match($rawmsg, "^<[0-9]{1,3}>") then {
  action(type="omfile" file="/var/log/syslog-quarantine.log")
  stop
}
```
Normalized PRI output should align with JSON event normalization so severity_id and facility_name become first-class fields, and routing thresholds should match the threshold tuning strategies used downstream. NIST SP 800-92 covers the broader log-management context for these retention and routing tiers.
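The backpressure and batching notes above can be sketched as follows. `drain_batch` and `demo` are hypothetical names, and the queue size and batch size are illustrative values rather than tuned recommendations.

```python
import asyncio


async def drain_batch(queue: asyncio.Queue, max_batch: int) -> list:
    """Wait for one item, then take up to max_batch - 1 more without waiting."""
    batch = [await queue.get()]
    while len(batch) < max_batch:
        try:
            batch.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            break
    # If the producer relies on queue.join(), call queue.task_done() per item here.
    return batch


async def demo() -> tuple[int, int]:
    # Bounded queue: put() suspends the producer once 2000 items are waiting.
    archive: asyncio.Queue = asyncio.Queue(maxsize=2000)
    for i in range(1200):
        await archive.put(i)
    first = await drain_batch(archive, max_batch=1000)
    second = await drain_batch(archive, max_batch=1000)
    return len(first), len(second)


print(asyncio.run(demo()))  # (1000, 200)
```

For the critical tier the same helper with `max_batch=1` returns each event as soon as it arrives, which matches the flush-immediately guidance.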

Verification Checklist

A line with no <PRI> marker returns ERR_PRI_001 and lands in the quarantine queue, not the archive tier.
<192> and <00134> are both rejected (ERR_PRI_002 and ERR_PRI_001), proving the range gate and the digit-length gate both fire.
<134> decodes to facility 16 (local0) and severity 6 (INFORMATIONAL) and routes to the archive queue.
<10> decodes to facility 1 (user) and severity 2 (CRITICAL) and routes to the critical queue for immediate triage.
PRI_PATTERN is defined once at module scope (not rebuilt per call) — confirm with a profiler that regex compilation does not appear in the hot path.
Every quarantined event retains its raw line alongside the error code, so the source can be traced and the upstream config corrected.
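The decode cases in this checklist can be run as a table-driven check. So that the block runs on its own, `classify` is a condensed restatement of the decode steps rather than the module's decode_pri, and it returns a plain string instead of a PriResult; the queue-routing items still need a test against SyslogRouter itself.

```python
import re

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")  # same pattern as the module above


def classify(line: str) -> str:
    """Return an error code, or 'facility.severity' for a valid PRI."""
    match = PRI_PATTERN.match(line)
    if not match:
        return "ERR_PRI_001"
    pri = int(match.group(1))
    if pri > 191:
        return "ERR_PRI_002"
    return f"{pri >> 3}.{pri & 7}"


CASES = {
    "no marker here": "ERR_PRI_001",
    "<192>x": "ERR_PRI_002",
    "<00134>x": "ERR_PRI_001",
    "<134>x": "16.6",  # local0, informational
    "<10>x": "1.2",    # user, critical
}

for line, expected in CASES.items():
    assert classify(line) == expected, (line, classify(line))
print("all checklist cases pass")
```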

FAQ

Should I trust the PRI byte or the severity digit inside the message?

Trust the PRI byte for routing. It is the transport-level priority every RFC-compliant relay preserves, and severity = PRI & 7 is unambiguous. Vendor in-message digits (like Cisco’s %ASA-6-) are application conventions that can drift from the PRI, be remapped by intermediate devices, or be absent entirely. If you need the application severity, parse it into a separate field and reconcile it explicitly rather than letting it silently override the byte.
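One way to do that reconciliation is sketched below, assuming the tag follows the `%ASA-<severity>-<message id>` shape mentioned above. The function names are hypothetical and the sample messages are shortened for illustration.

```python
import re
from typing import Optional

# Matches the severity digit in tags such as "%ASA-6-302013".
ASA_TAG = re.compile(r"%ASA-([0-7])-\d+")


def asa_severity(message: str) -> Optional[int]:
    """Return the in-message ASA severity digit, or None if no tag is present."""
    match = ASA_TAG.search(message)
    return int(match.group(1)) if match else None


def severities_agree(pri: int, message: str) -> Optional[bool]:
    """Compare PRI severity with the ASA tag; None means nothing to compare."""
    app_severity = asa_severity(message)
    if app_severity is None:
        return None
    return (pri & 7) == app_severity


print(severities_agree(134, "%ASA-6-302013: Built outbound TCP connection"))  # True
print(severities_agree(134, "%ASA-4-106023: Deny tcp src outside"))           # False
print(severities_agree(134, "plain message with no tag"))                     # None
```

A False result is a signal to record both values and investigate the source, not a reason to overwrite the PRI-derived severity.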

What is the maximum legal PRI value, and why reject anything above it?

The maximum is 191, computed as (23 << 3) | 7 — facility local7 (23) with severity debug (7). Any value above 191 cannot be a valid facility/severity pair, so it almost always means the field is carrying something else: a truncated header, a sequence number, or binary noise. Rejecting it with ERR_PRI_002 keeps that garbage out of the severity-routing logic instead of letting it decode into a nonsensical facility.

Why use a regex with a digit-length bound instead of just int()?

int("00134") happily returns 134, hiding a malformed zero-padded PRI that a strict parser should flag. The anchored ^<(\d{1,3})> pattern enforces both the framing (leading <...>) and the legal width (1–3 digits) before any conversion, so over-long, non-numeric, or marker-less lines are caught as ERR_PRI_001 rather than silently normalized. The width bound alone does not catch a short zero-padded value such as <007>, which RFC 5424 also forbids; a parser that must enforce that rule needs a stricter pattern. The regex is the validation gate; int() alone is not.
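If leading zeros must be rejected even within three digits, one possible variant is sketched here. `STRICT_PRI` is a hypothetical name and is not the pattern the module on this page uses; it assumes that a lone "0" is legal and that every other value must start with a non-zero digit.

```python
import re

LENGTH_ONLY = re.compile(r"^<(\d{1,3})>")        # the pattern used on this page
STRICT_PRI = re.compile(r"^<(0|[1-9]\d{0,2})>")  # assumption: no leading zeros

for sample in ("<0>", "<7>", "<134>", "<007>", "<034>", "<00134>"):
    print(sample, bool(LENGTH_ONLY.match(sample)), bool(STRICT_PRI.match(sample)))
```

Both patterns accept <0>, <7>, and <134> and reject <00134>; only the strict variant rejects <007> and <034>. The 0–191 range check is still required afterwards, since <999> passes either pattern.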
