Regex with re: Extracting Counters and Parsing show Output in Python

Network gear emits walls of text. show interfaces, show ip bgp summary, syslog lines, MAC tables — all of it semi-structured, none of it JSON. Yesterday you learned to reason about addresses; today you learn to pull facts out of text. Regular expressions are the scalpel: a tiny pattern language for “find me the thing that looks like this.” Python ships it in the re module.

A warning up front: regex is powerful and easy to overuse. For a single field, string methods (.split(), .startswith()) are clearer. Reach for regex when the data has shape — an IP here, a counter there, a state in the middle — and you want to capture several pieces at once.
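
To make that trade-off concrete, here is a minimal sketch. The sample line and the variable names are invented for illustration: one field comes out with plain string methods, and a single regex takes over once you want three fields at the same time.

```python
import re

line = "Vlan10 is up, line protocol is up"

# One field: plain string methods are enough.
name = line.split()[0]
is_vlan = line.startswith("Vlan")

# Several fields at once: this is where a regex earns its keep.
m = re.search(r"^(\S+) is (\w+), line protocol is (\w+)", line)
fields = m.groups() if m else None

print(name, is_vlan)   # Vlan10 True
print(fields)          # ('Vlan10', 'up', 'up')
```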

The Five Functions You Actually Use

import re

line = "GigabitEthernet0/1 is up, line protocol is up"

# search: find the first match anywhere
m = re.search(r"line protocol is (\w+)", line)
print(m.group(1))          # up

# match: must match at the START of the string
print(bool(re.match(r"Gig", line)))   # True

# findall: every match, as a list
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", "from 10.0.0.1 to 10.0.0.2")
print(ips)                 # ['10.0.0.1', '10.0.0.2']

# finditer: every match as objects (keeps position + groups)
for m in re.finditer(r"(\d+)\.(\d+)", "1.2 and 3.4"):
    print(m.group(0))      # 1.2 then 3.4

# sub: find and replace
print(re.sub(r"\d+\.\d+\.\d+\.\d+", "x.x.x.x", "ping 8.8.8.8"))
# ping x.x.x.x
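
One thing the examples above gloss over: re.search and re.match return None when nothing matches, so calling .group() on the result raises AttributeError. A small sketch of the usual guard follows; the helper name proto_state is hypothetical, not part of re.

```python
import re

line = "GigabitEthernet0/1 is up, line protocol is up"

def proto_state(text):
    """Return the line-protocol state, or None if the text has no such field."""
    m = re.search(r"line protocol is (\w+)", text)
    return m.group(1) if m else None

print(proto_state(line))                          # up
print(proto_state("Building configuration..."))   # None
```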

The Metacharacters That Cover 90% of Network Parsing

You do not need the whole language. This handful does the job:

# \d  digit          \w  word char (letter/digit/_)     \s  whitespace
# +   one or more    *   zero or more    ?   optional
# .   any char       ^   start    $   end
# []  a set, e.g. [0-9A-Fa-f]      |   OR
# ()  a capture group    (?:...)  a non-capturing group

# A MAC address in xxxx.xxxx.xxxx (Cisco) form:
mac = re.search(r"([0-9a-f]{4}\.){2}[0-9a-f]{4}", "0050.56c0.0008")
print(mac.group(0))   # 0050.56c0.0008
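
Here is a second sketch that combines several of these pieces. It assumes you only care about GigabitEthernet names in long or short (Gi) form, with an optional subinterface number; the sample names are made up.

```python
import re

# ^ and $ anchor the whole string; (?:...) groups without capturing;
# | is alternation; the trailing ? makes the subinterface part optional.
intf_re = re.compile(r"^(?:GigabitEthernet|Gi)(\d+)/(\d+)(?:\.(\d+))?$")

results = {}
for name in ["Gi0/1", "GigabitEthernet1/24", "GigabitEthernet0/0.100", "Vlan10"]:
    m = intf_re.search(name)
    results[name] = m.groups() if m else None
    print(name, "->", results[name])
# Gi0/1 -> ('0', '1', None)
# GigabitEthernet1/24 -> ('1', '24', None)
# GigabitEthernet0/0.100 -> ('0', '0', '100')
# Vlan10 -> None
```

A group that did not take part in the match comes back as None, which is why the first two results end with None.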

Raw Strings and Why You Always Use Them

Notice every pattern starts with r"...". Backslashes mean something to both Python strings and regex. Without the r, you would have to write "\\d" to get a literal \d. The raw-string prefix turns off Python’s own backslash handling so the pattern reads the way the regex engine sees it. Make it a habit: regex patterns are always raw strings.
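
The difference is easiest to see with \b, which means "word boundary" to the regex engine but "backspace" to a normal Python string. A short sketch, with an invented sample string:

```python
import re

text = "interface up"

plain = "\bup\b"    # Python converts each \b to a backspace character first
raw = r"\bup\b"     # the backslashes reach the regex engine untouched

print(len(plain), len(raw))           # 4 6

print(re.search(plain, text))         # None: searches for backspace, 'up', backspace
print(re.search(raw, text).group())   # up: \b is a word boundary
```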

Named Groups: Parsing That Reads Like Documentation

Numbered groups (.group(1)) get confusing fast. Name them with (?P<name>...) and pull the result as a dictionary:

line = "GigabitEthernet0/1 is up, line protocol is down"

pat = re.compile(
    r"(?P<intf>\S+) is (?P<link>up|down|administratively down),"
    r" line protocol is (?P<proto>up|down)"
)
m = pat.search(line)
print(m.groupdict())
# {'intf': 'GigabitEthernet0/1', 'link': 'up', 'proto': 'down'}

Two things to note. re.compile builds the pattern once — do this when you will run it over many lines (every line of a 5,000-line config). And .groupdict() hands you a clean dict you can drop straight into a report.
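
A sketch of that reuse, with a few made-up lines standing in for real device output: compile once, loop over every line, and collect a dict for each line that matches.

```python
import re

pat = re.compile(
    r"(?P<intf>\S+) is (?P<link>up|down|administratively down),"
    r" line protocol is (?P<proto>up|down)"
)

# Made-up sample lines standing in for real device output.
lines = [
    "GigabitEthernet0/1 is up, line protocol is up",
    "  MTU 1500 bytes, BW 1000000 Kbit/sec",
    "GigabitEthernet0/2 is administratively down, line protocol is down",
]

rows = []
for line in lines:
    m = pat.search(line)      # same compiled pattern, reused for every line
    if m:                     # lines that do not match are skipped
        rows.append(m.groupdict())

for row in rows:
    print(row)
# {'intf': 'GigabitEthernet0/1', 'link': 'up', 'proto': 'up'}
# {'intf': 'GigabitEthernet0/2', 'link': 'administratively down', 'proto': 'down'}
```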

Cisco Context: Parsing Interface Counters

Here is a realistic chunk of show interfaces output and a parser that extracts the error counters for every interface. It is the kind of thing you would run nightly to catch a flapping link before users do.

import re

output = """
GigabitEthernet0/1 is up, line protocol is up
  5 minute input rate 1000 bits/sec, 2 packets/sec
     12 input errors, 3 CRC, 0 frame, 0 overrun
GigabitEthernet0/2 is up, line protocol is up
     0 input errors, 0 CRC, 0 frame, 0 overrun
"""

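# Header lines can also read "administratively down"; allow for that so a
# shut interface's counters are not credited to the previous interface.
intf_re = re.compile(r"^(\S+) is (?:administratively )?(?:up|down)", re.MULTILINE)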
err_re  = re.compile(r"(\d+) input errors, (\d+) CRC")

current = None
report = {}
for line in output.splitlines():
    m = intf_re.search(line)
    if m:
        current = m.group(1)
        continue
    e = err_re.search(line)
    if e and current:
        report[current] = {"input_errors": int(e.group(1)),
                           "crc": int(e.group(2))}

for intf, stats in report.items():
    flag = "  <-- CHECK" if stats["input_errors"] else ""
    print(f"{intf}: {stats['input_errors']} errors, {stats['crc']} CRC{flag}")
# GigabitEthernet0/1: 12 errors, 3 CRC  <-- CHECK
# GigabitEthernet0/2: 0 errors, 0 CRC

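The re.MULTILINE flag makes ^ match the start of every line, not just the start of the whole string. That is essential when you run a pattern over an entire block of command output in one call. In the loop above each line is already its own string, so the flag is harmless there rather than required. Note also that we convert captured strings to int: regex always hands you text.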
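
To see the flag do real work, run the pattern over the whole string in one call instead of line by line. A minimal sketch using a trimmed copy of the sample output:

```python
import re

output = """GigabitEthernet0/1 is up, line protocol is up
     12 input errors, 3 CRC, 0 frame, 0 overrun
GigabitEthernet0/2 is up, line protocol is up
     0 input errors, 0 CRC, 0 frame, 0 overrun
"""

pattern = r"^(\S+) is (?:up|down)"

# Without the flag, ^ matches only at the very start of the string.
first_only = re.findall(pattern, output)
print(first_only)
# ['GigabitEthernet0/1']

# With re.MULTILINE, ^ also matches at the start of every line.
every_line = re.findall(pattern, output, re.MULTILINE)
print(every_line)
# ['GigabitEthernet0/1', 'GigabitEthernet0/2']
```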

A Word on When NOT to Use Regex

If you find yourself writing a regex to parse deeply nested or table-structured output, stop. Later this week we will cover TextFSM and ntc-templates, which turn show output into clean dictionaries using community-maintained templates. Regex is for targeted extraction; TextFSM is for full tables. Use the right tool.

Exercises

  1. Warm-up. Extract every IPv4 address from the string "OSPF neighbor 10.1.1.2 on 10.1.1.1, dead 10.0.0.0" using findall.
  2. Validation. Write is_cisco_mac(s) that returns True only if s is a full Cisco-format MAC like aabb.ccdd.eeff (and nothing else around it). Hint: anchor with ^ and $.
  3. Named capture. Parse "%LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to down" into a dict with keys severity, interface, and state.
  4. Scrub. Given a config snippet, replace every password 7 XXXX and secret 5 XXXX value with <redacted> before you paste it into a ticket.
  5. Challenge. From multi-line show ip bgp summary output, extract each neighbor IP and its State/PfxRcd value, then print only the neighbors that are NOT in an established (numeric prefix count) state.

Answers

1. Warm-up

import re
s = "OSPF neighbor 10.1.1.2 on 10.1.1.1, dead 10.0.0.0"
print(re.findall(r"\d+\.\d+\.\d+\.\d+", s))
# ['10.1.1.2', '10.1.1.1', '10.0.0.0']

2. Validation

def is_cisco_mac(s):
    return bool(re.match(r"^([0-9a-fA-F]{4}\.){2}[0-9a-fA-F]{4}$", s))

print(is_cisco_mac("aabb.ccdd.eeff"))      # True
print(is_cisco_mac("aabb.ccdd.eeff extra")) # False

re.match already anchors at the start of the string, so the ^ is redundant but harmless; it is the $ that rejects the trailing junk. Without it, match would succeed on the prefix.
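
One subtlety: $ also matches just before a trailing newline, so a string read from a file with its "\n" still attached passes the check above. If that matters to you, re.fullmatch is a stricter alternative. A sketch, where the name is_cisco_mac_strict is made up for this example:

```python
import re

MAC_RE = re.compile(r"(?:[0-9a-fA-F]{4}\.){2}[0-9a-fA-F]{4}")

def is_cisco_mac_strict(s):
    # fullmatch requires the pattern to cover the entire string,
    # so no ^ or $ is needed and a trailing newline is rejected.
    return MAC_RE.fullmatch(s) is not None

print(is_cisco_mac_strict("aabb.ccdd.eeff"))     # True
print(is_cisco_mac_strict("aabb.ccdd.eeff\n"))   # False

# The anchored version lets the trailing newline through:
anchored = re.match(r"^(?:[0-9a-fA-F]{4}\.){2}[0-9a-fA-F]{4}$", "aabb.ccdd.eeff\n")
print(bool(anchored))                            # True
```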

3. Named capture

line = "%LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to down"
pat = re.compile(
    r"%\w+-(?P<severity>\d)-\w+: Interface (?P<interface>\S+),"
    r" changed state to (?P<state>up|down)"
)
print(pat.search(line).groupdict())
# {'severity': '3', 'interface': 'GigabitEthernet0/2', 'state': 'down'}

4. Scrub

cfg = "enable secret 5 $1$abcd\n username bob password 7 070C285F"
clean = re.sub(r"(password 7|secret 5) \S+", r"\1 <redacted>", cfg)
print(clean)
# enable secret 5 <redacted>
#  username bob password 7 <redacted>

\1 in the replacement re-inserts the first captured group, so we keep the keyword and redact only the value.
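
If you prefer the named groups from earlier, the same scrub can be sketched with (?P<kw>...) in the pattern and \g<kw> in the replacement. The group name kw is arbitrary.

```python
import re

cfg = "enable secret 5 $1$abcd\n username bob password 7 070C285F"

# (?P<kw>...) names the keyword; \g<kw> puts it back in the replacement.
scrub_re = re.compile(r"(?P<kw>password 7|secret 5) \S+")
clean = scrub_re.sub(r"\g<kw> <redacted>", cfg)
print(clean)
# enable secret 5 <redacted>
#  username bob password 7 <redacted>
```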

5. Challenge

summary = """
Neighbor    V    AS  MsgRcvd  Up/Down  State/PfxRcd
10.1.1.2    4 65001    1203   01:20:11        15
10.1.1.6    4 65002      0    never           Idle
10.1.1.10   4 65003      0    00:00:30        Active
"""
pat = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s+.*\s+(\S+)$", re.MULTILINE)
for ip, state in pat.findall(summary):
    if not state.isdigit():
        print(f"{ip} is DOWN (state={state})")
# 10.1.1.6 is DOWN (state=Idle)
# 10.1.1.10 is DOWN (state=Active)

The insight: in show ip bgp summary, an established peer shows a number (prefixes received) in the last column. Anything non-numeric (Idle, Active, Connect) is a session that has not come up. state.isdigit() is the whole test.
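
Because every data row here is plain whitespace-separated columns, the same check can also be sketched with string methods alone, in line with the warning at the top of this post. This assumes the state is a single word in the last column; a state printed as more than one word would need the regex or a smarter split.

```python
summary = """
Neighbor    V    AS  MsgRcvd  Up/Down  State/PfxRcd
10.1.1.2    4 65001    1203   01:20:11        15
10.1.1.6    4 65002      0    never           Idle
10.1.1.10   4 65003      0    00:00:30        Active
"""

down = {}
for line in summary.splitlines():
    cols = line.split()
    # Skip blank lines and the header: data rows start with a digit.
    if not cols or not cols[0][0].isdigit():
        continue
    neighbor, state = cols[0], cols[-1]
    if not state.isdigit():
        down[neighbor] = state

print(down)
# {'10.1.1.6': 'Idle', '10.1.1.10': 'Active'}
```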


Previously: The ipaddress Module. Coming tomorrow — subprocess: wrapping ping, traceroute, and nslookup so your scripts can use the tools you already trust.
