Regex with re: Extracting Counters and Parsing show Output in Python

Network gear emits walls of text. show interfaces, show ip bgp summary, syslog lines, MAC tables — all of it semi-structured, none of it JSON. Yesterday’s topic was reasoning about addresses; today’s is pulling facts out of text. Regular expressions are the scalpel: a tiny pattern language for “find the thing that looks like this.” Python ships it in the re module.

A warning up front: regex is powerful and easy to overuse. For a single field, string methods (.split(), .startswith()) are clearer. Regex is the right tool when the data has shape — an IP here, a counter there, a state in the middle — and several pieces need capturing at once.
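As a rough illustration of the single-field case, here is a sketch that needs no regex at all. The variable names are only for illustration.

```python
line = "GigabitEthernet0/1 is up, line protocol is up"

# One field: the first whitespace-separated token is the interface name.
intf = line.split()[0]

# One yes/no question: does the line start with a given prefix?
is_gig = line.startswith("Gig")

print(intf, is_gig)   # GigabitEthernet0/1 True
```

Once a second or third field is needed from the same line, the split-and-index approach gets brittle, and that is the point where a pattern starts to pay off.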

The Five Functions That Cover Most Cases

import re

line = "GigabitEthernet0/1 is up, line protocol is up"

# search: find the first match anywhere
m = re.search(r"line protocol is (\w+)", line)
print(m.group(1))          # up

# match: must match at the START of the string
print(bool(re.match(r"Gig", line)))   # True

# findall: every match, as a list
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", "from 10.0.0.1 to 10.0.0.2")
print(ips)                 # ['10.0.0.1', '10.0.0.2']

# finditer: every match as objects (keeps position + groups)
for m in re.finditer(r"(\d+)\.(\d+)", "1.2 and 3.4"):
    print(m.group(0))      # 1.2 then 3.4

# sub: find and replace
print(re.sub(r"\d+\.\d+\.\d+\.\d+", "x.x.x.x", "ping 8.8.8.8"))
# ping x.x.x.x
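One detail the calls above skip: re.search and re.match return None when nothing matches, so calling .group() on the result without checking raises AttributeError. A minimal sketch of the guard follows; proto_state is a hypothetical helper name, not something the re module provides.

```python
import re

def proto_state(line):
    """Return the line-protocol state, or None if the line has no such field."""
    m = re.search(r"line protocol is (\w+)", line)
    # m is None when the pattern is absent, so test it before using it.
    return m.group(1) if m else None

print(proto_state("GigabitEthernet0/1 is up, line protocol is up"))  # up
print(proto_state("  5 minute input rate 1000 bits/sec"))            # None
```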

The Metacharacters That Cover 90% of Network Parsing

The whole language isn’t necessary. This handful does the job:

# \d  digit          \w  word char (letter/digit/_)     \s  whitespace
# +   one or more    *   zero or more    ?   optional    {n}  exactly n
# .   any char except newline    ^   start    $   end
# []  a set, e.g. [0-9A-Fa-f]      |   OR
# ()  a capture group    (?:...)  a non-capturing group

# A MAC address in xxxx.xxxx.xxxx (Cisco) form:
mac = re.search(r"([0-9a-f]{4}\.){2}[0-9a-f]{4}", "0050.56c0.0008")
print(mac.group(0))   # 0050.56c0.0008
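The difference between () and (?:...) shows up as soon as the MAC pattern is handed to findall, which returns the captured group instead of the whole match when a pattern contains one. A small sketch, with sample text made up for illustration:

```python
import re

text = "0050.56c0.0008 and 0011.2233.4455"

# Capturing group: findall returns the group's last repetition, not the MAC.
captured = re.findall(r"([0-9a-f]{4}\.){2}[0-9a-f]{4}", text)
print(captured)   # ['56c0.', '2233.']

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"(?:[0-9a-f]{4}\.){2}[0-9a-f]{4}", text)
print(whole)      # ['0050.56c0.0008', '0011.2233.4455']
```

When a group exists only to repeat or alternate, and its contents are not wanted separately, the non-capturing form is usually the safer choice.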

Raw Strings and Why They Matter

Notice every pattern starts with r"...". Backslashes mean something to both Python strings and regex. Without the r, a literal \d would have to be written "\\d". The raw-string prefix turns off Python’s own backslash handling so the pattern reads the way the regex engine sees it. Make it a habit: regex patterns are always raw strings.
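A case where leaving off the r silently changes the pattern is \b: in an ordinary Python string it is a backspace character, while in a regex it is a word boundary. The sketch below shows the difference on the interface line used earlier.

```python
import re

# In a normal string, "\b" is a single backspace character (0x08).
print(len("\b"), len(r"\b"))   # 1 2

line = "GigabitEthernet0/1 is up, line protocol is up"

# r"\bup\b" reaches the regex engine as a word-boundary assertion.
boundary_hits = re.findall(r"\bup\b", line)
print(boundary_hits)    # ['up', 'up']

# "\bup\b" asks for backspace + "up" + backspace, which is not in the line.
backspace_hits = re.findall("\bup\b", line)
print(backspace_hits)   # []
```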

Named Groups: Parsing That Reads Like Documentation

Numbered groups (.group(1)) get confusing fast. Naming them with (?P<name>...) allows pulling the result as a dictionary:

line = "GigabitEthernet0/1 is up, line protocol is down"

pat = re.compile(
    r"(?P<intf>\S+) is (?P<link>up|down|administratively down),"
    r" line protocol is (?P<proto>up|down)"
)
m = pat.search(line)
print(m.groupdict())
# {'intf': 'GigabitEthernet0/1', 'link': 'up', 'proto': 'down'}

Two things to note. re.compile builds the pattern once — worth doing when it runs over many lines (every line of a 5,000-line config). And .groupdict() returns a clean dict that drops straight into a report.
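A sketch of both points together: the same pattern compiled once, then run over several lines to build a list of dicts. The sample lines and the names lines and rows are assumptions for illustration.

```python
import re

# Same pattern as above, compiled once and reused for every line.
pat = re.compile(
    r"(?P<intf>\S+) is (?P<link>up|down|administratively down),"
    r" line protocol is (?P<proto>up|down)"
)

lines = [
    "GigabitEthernet0/1 is up, line protocol is up",
    "GigabitEthernet0/2 is administratively down, line protocol is down",
    "  5 minute input rate 1000 bits/sec, 2 packets/sec",  # no match, skipped
]

rows = []
for line in lines:
    m = pat.search(line)
    if m:                      # skip lines that are not status lines
        rows.append(m.groupdict())

for row in rows:
    print(row)
# {'intf': 'GigabitEthernet0/1', 'link': 'up', 'proto': 'up'}
# {'intf': 'GigabitEthernet0/2', 'link': 'administratively down', 'proto': 'down'}
```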

Cisco Context: Parsing Interface Counters

Here is a realistic chunk of show interfaces output and a parser that extracts the error counters for every interface — the kind of thing to run nightly to catch a flapping link before users do.

import re

output = """
GigabitEthernet0/1 is up, line protocol is up
  5 minute input rate 1000 bits/sec, 2 packets/sec
     12 input errors, 3 CRC, 0 frame, 0 overrun
GigabitEthernet0/2 is up, line protocol is up
     0 input errors, 0 CRC, 0 frame, 0 overrun
"""

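# Include "administratively down" so a shut interface still updates `current`;
# otherwise its counters would be credited to the previous interface.
intf_re = re.compile(r"^(\S+) is (administratively down|up|down)", re.MULTILINE)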
err_re  = re.compile(r"(\d+) input errors, (\d+) CRC")

current = None
report = {}
for line in output.splitlines():
    m = intf_re.search(line)
    if m:
        current = m.group(1)
        continue
    e = err_re.search(line)
    if e and current:
        report[current] = {"input_errors": int(e.group(1)),
                           "crc": int(e.group(2))}

for intf, stats in report.items():
    flag = "  <-- CHECK" if stats["input_errors"] else ""
    print(f"{intf}: {stats['input_errors']} errors, {stats['crc']} CRC{flag}")
# GigabitEthernet0/1: 12 errors, 3 CRC  <-- CHECK
# GigabitEthernet0/2: 0 errors, 0 CRC

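The re.MULTILINE flag makes ^ match at the start of every line, not just the start of the whole string. That matters when a pattern is applied to multi-line output in a single call, as with findall or finditer over the full text. In the loop above each line is searched on its own, so the flag is harmless but not strictly required. Note also the conversion of captured strings to int: regex always returns text.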
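To see what the flag changes, apply one pattern to a whole block of output at once, with and without it. This is a minimal sketch on a shortened copy of the sample output.

```python
import re

output = """GigabitEthernet0/1 is up, line protocol is up
  5 minute input rate 1000 bits/sec, 2 packets/sec
GigabitEthernet0/2 is up, line protocol is up
"""

pattern = r"^(\S+) is (up|down)"

# Without the flag, ^ anchors only at the very start of the string.
first_only = re.findall(pattern, output)
print(first_only)
# [('GigabitEthernet0/1', 'up')]

# With re.MULTILINE, ^ also anchors right after every newline.
every_line = re.findall(pattern, output, re.MULTILINE)
print(every_line)
# [('GigabitEthernet0/1', 'up'), ('GigabitEthernet0/2', 'up')]
```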

A Word on When NOT to Use Regex

Reaching for regex to parse deeply nested or table-structured output is a signal to stop. Week 3 covers TextFSM and ntc-templates, which turn show output into clean dictionaries using community-maintained templates. Regex is for targeted extraction; TextFSM is for full tables. Use the right tool.

Exercises

1. Warm-up. Extract every IPv4 address from the string "OSPF neighbor 10.1.1.2 on 10.1.1.1, dead 10.0.0.0" using findall.
2. Validation. Write is_cisco_mac(s) that returns True only when s is a full Cisco-format MAC like aabb.ccdd.eeff (and nothing else around it). Hint: anchor with ^ and $.
3. Named capture. Parse "%LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to down" into a dict with keys severity, interface, and state.
4. Scrub. Given a config snippet, replace every password 7 XXXX and secret 5 XXXX value with <redacted> before it goes into a ticket.
5. Challenge. From multi-line show ip bgp summary output, extract each neighbor IP and its State/PfxRcd value, then print only the neighbors that are NOT in an established (numeric prefix count) state.

Answers

1. Warm-up

import re
s = "OSPF neighbor 10.1.1.2 on 10.1.1.1, dead 10.0.0.0"
print(re.findall(r"\d+\.\d+\.\d+\.\d+", s))
# ['10.1.1.2', '10.1.1.1', '10.0.0.0']

2. Validation

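def is_cisco_mac(s):
    # fullmatch succeeds only if the entire string matches the pattern
    return bool(re.fullmatch(r"([0-9a-fA-F]{4}\.){2}[0-9a-fA-F]{4}", s))

print(is_cisco_mac("aabb.ccdd.eeff"))        # True
print(is_cisco_mac("aabb.ccdd.eeff extra"))  # False

re.fullmatch is what rejects the trailing junk. Wrapping the pattern in ^ and $ and calling re.match gives almost the same result, with one catch: $ also matches just before a trailing newline, so "aabb.ccdd.eeff\n" would pass. With no end anchor at all, match succeeds on the prefix alone.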

3. Named capture

line = "%LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to down"
pat = re.compile(
    r"%\w+-(?P<severity>\d)-\w+: Interface (?P<interface>\S+),"
    r" changed state to (?P<state>up|down)"
)
print(pat.search(line).groupdict())
# {'severity': '3', 'interface': 'GigabitEthernet0/2', 'state': 'down'}

4. Scrub

cfg = "enable secret 5 $1$abcd\n username bob password 7 070C285F"
clean = re.sub(r"(password 7|secret 5) \S+", r"\1 <redacted>", cfg)
print(clean)
# enable secret 5 <redacted>
#  username bob password 7 <redacted>

\1 in the replacement re-inserts the first captured group, keeping the keyword and redacting only the value.
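The same scrub can be written with a named group, which reads better once a pattern has several groups. In this sketch the group name kw is an arbitrary choice; \g<kw> in the replacement refers back to it.

```python
import re

cfg = "enable secret 5 $1$abcd\n username bob password 7 070C285F"

# (?P<kw>...) names the keyword group; \g<kw> re-inserts it in the replacement.
clean = re.sub(r"(?P<kw>password 7|secret 5) \S+", r"\g<kw> <redacted>", cfg)
print(clean)
# enable secret 5 <redacted>
#  username bob password 7 <redacted>
```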

5. Challenge

summary = """
Neighbor    V    AS  MsgRcvd  Up/Down  State/PfxRcd
10.1.1.2    4 65001    1203   01:20:11        15
10.1.1.6    4 65002      0    never           Idle
10.1.1.10   4 65003      0    00:00:30        Active
"""
pat = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s+.*\s+(\S+)$", re.MULTILINE)
for ip, state in pat.findall(summary):
    if not state.isdigit():
        print(f"{ip} is DOWN (state={state})")
# 10.1.1.6 is DOWN (state=Idle)
# 10.1.1.10 is DOWN (state=Active)

The insight: in show ip bgp summary, an established peer shows a number (prefixes received) in the last column. Anything non-numeric (Idle, Active, Connect) is a session that has not come up. state.isdigit() is the whole test.

Previously: The ipaddress Module. Coming tomorrow — subprocess: wrapping ping, traceroute, and nslookup so scripts can drive the tools already trusted at the CLI.

This is Day 9 of the 21‑post Python for Network Engineers series.
