Regular Expression Syntax

A regular expression is a special text string that describes a search pattern. It functions similarly to a wildcard, but is more expressive. You can use regular expressions to search for and replace information in a file.

The following regular expression syntax is available:

Literal Characters

A single literal character matches the first occurrence of that character in the string. For example, "a" matches the first "a" in Jack is a boy, which is the "a" in "Jack."

Character

Description

Example

Any character except [\^$.|?*+()

All characters except the listed special characters match a single instance of themselves.

a matches a

\ (backslash) followed by any of [\^$.|?*+()

Some characters have special meaning. A backslash placed before a special character escapes the special character to suppress its special meaning.

\+ matches +

\xFF, where FF are 2 hexadecimal digits

Matches the character with the specified ASCII/ANSI value, which depends on the code page used. Can also be used in character classes.

\xA9 matches © when using the Latin-1 code page.

\n, \r and \t

Match an LF character, a CR character, and a tab character, respectively. Can also be used in character classes.

\r\n matches a DOS/Windows CRLF line break.
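
The rows above can be tried out with a short sketch. It assumes Python's built-in re module, which this reference does not name, so treat it as an illustration of the syntax rather than a description of the product's own regex engine. The sample strings and variable names are hypothetical.

```python
import re

# A literal character matches its first occurrence in the string.
first_a = re.search(r"a", "Jack is a boy")
print(first_a.start())  # 1, the "a" in "Jack"

# A backslash suppresses the special meaning of "+".
plus = re.search(r"\+", "1+1=2")
print(plus.group())  # +

# \xA9 matches the character with hexadecimal value A9.
# Python strings are Unicode, where that value is the copyright sign.
copyright_sign = re.search(r"\xA9", "\u00a9 Example Corp")
print(copyright_sign.start())  # 0

# \r\n matches a DOS/Windows CRLF line break.
crlf = re.search(r"\r\n", "line one\r\nline two")
print(crlf.span())  # (8, 10)
```

When a literal string comes from user input, Python's re.escape function adds the backslashes for you, which avoids escaping special characters by hand.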

Character classes or character sets [abc]

A character class matches one out of several characters. For example, "gr[ae]y" matches either gray or grey, whichever word it finds first. The order of the characters in the set is irrelevant.

Character

Description

Example

[ (opening square bracket)

Starts a character class. A character class matches a single character out of all the possibilities the character class offers. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except \n, \r, \t and \xFF.

 

Any character except ^-]\

All characters except the listed special characters add themselves to the possible matches for the character class.

[abc] matches a, b, or c

\ (backslash) followed by any of the following: ^-]\

A backslash escapes special characters to suppress their special meaning.

[\^\]] matches ^ or ]

- (hyphen) except immediately after the opening [

Specifies a range of characters. A hyphen placed immediately after the opening bracket is treated as a literal hyphen.

[a-zA-Z0-9] matches any letter or digit

^ (caret) immediately after the opening [

Negates the character class, causing it to match a single character not listed in the character class. A caret placed anywhere except immediately after the opening bracket is treated as a literal caret.

[^a-d] matches x (any character except a, b, c or d)

\d, \w, and \s

Shorthand character classes match digits 0-9, word characters (alphanumeric characters plus underscore), and white space (including tabs and line breaks), respectively. Can be used inside and outside character classes.

[\d\s] matches a character that is a digit or white space

\D, \W, and \S

Negated versions of the shorthand character classes. We recommend using them only outside character classes, because using them inside can be confusing.

\D matches a character that is not a digit
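
The character class rules above can be checked with a small sketch, again assuming Python's re module rather than the product's own engine. The sample strings are hypothetical. One difference to be aware of: in Python 3, \d also matches non-ASCII Unicode digits unless the re.ASCII flag is passed, so the flag is used here to match the "digits 0-9" description.

```python
import re

# [ae] matches one character: either "a" or "e".
gray_or_grey = re.findall(r"gr[ae]y", "gray or grey")
print(gray_or_grey)  # ['gray', 'grey']

# Inside a class, a backslash escapes ^ and ].
caret_or_bracket = re.findall(r"[\^\]]", "a^b]c")
print(caret_or_bracket)  # ['^', ']']

# A hyphen between two characters specifies a range.
letters_digits = re.findall(r"[a-zA-Z0-9]", "a-Z 9!")
print(letters_digits)  # ['a', 'Z', '9']

# A hyphen immediately after the opening bracket is a literal hyphen.
hyphen_or_a = re.findall(r"[-a]", "a-b")
print(hyphen_or_a)  # ['a', '-']

# A caret immediately after the opening bracket negates the class.
not_a_to_d = re.findall(r"[^a-d]", "abcdxy")
print(not_a_to_d)  # ['x', 'y']

# Shorthand classes work inside and outside character classes.
digit_or_space = re.findall(r"[\d\s]", "a1 b2", re.ASCII)
print(digit_or_space)  # ['1', ' ', '2']

non_digits = re.findall(r"\D", "a1b2", re.ASCII)
print(non_digits)  # ['a', 'b']
```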

Dot

A dot matches a single character except line break characters. For example, "gr.y" matches gray and grey. Often, a character class or negated character class is faster and more precise than the dot.

Character

Description

Example

. (dot)

Matches any single character except line break characters \r and \n. Most regex engines have a "dot matches all" or "single line" mode that causes the dot to match line break characters as well.

. matches x or (almost) any other character
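
The following sketch shows the dot with and without "dot matches all" mode. It assumes Python's re module, where that mode is the re.DOTALL flag. Note that Python's dot excludes only \n by default, not \r, so the example sticks to \n. The sample strings are hypothetical.

```python
import re

# The dot matches any single character except a line break.
dot_words = re.findall(r"gr.y", "gray grey gr\ny")
print(dot_words)  # ['gray', 'grey']

# By default the dot does not match the \n between "a" and "b".
no_match = re.search(r"a.b", "a\nb")
print(no_match)  # None

# With re.DOTALL ("dot matches all"), the dot matches \n too.
dotall_match = re.search(r"a.b", "a\nb", re.DOTALL)
print(repr(dotall_match.group()))  # 'a\nb'

# A negated character class is often more precise than the dot.
quoted = re.search(r'"[^"]*"', 'say "hi" and "bye"')
print(quoted.group())  # "hi"
```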

Anchors

An anchor matches a position rather than a character. For example, "^b" only matches the first "b" in bob.

Character

Description

Example

^ (caret)

Matches at the start of the string to which you apply the pattern. Most regex engines have a "multi-line" mode that causes the caret to match after any line break (for example, at the start of a line in a file).

^. matches a in abc\ndef. Also matches d in "multi-line" mode.

$ (dollar)

Matches at the end of the string to which you apply the pattern. Most regex engines have a "multi-line" mode that causes the dollar sign to match before any line break (for example, at the end of a line in a file). Also matches before the last line break when the string ends with a line break.

.$ matches f in abc\ndef. Also matches c in "multi-line" mode.

\A

Matches at the start of the string to which you apply the pattern. Never matches after line breaks.

\A. matches a in abc

\Z

Matches at the end of the string to which you apply the pattern. Never matches before line breaks, except for the last line break when the string ends with a line break.

.\Z matches f in abc\ndef

\z

Matches at the end of the string to which you apply the pattern. Never matches before line breaks.

.\z matches f in abc\ndef
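
A short sketch of the anchors follows, assuming Python's re module, where "multi-line" mode is the re.MULTILINE flag. Python differs from the table above in one respect: its \Z matches only at the very end of the string, which is the behavior described for \z here, and versions before 3.14 do not accept \z at all. The example therefore demonstrates only ^, $, \A, and Python's \Z.

```python
import re

text = "abc\ndef"

# ^ matches at the start of the string, and after line breaks
# only in multi-line mode.
start_default = re.findall(r"^.", text)
start_multi = re.findall(r"^.", text, re.MULTILINE)
print(start_default, start_multi)  # ['a'] ['a', 'd']

# $ matches at the end of the string, and before line breaks
# only in multi-line mode.
end_default = re.findall(r".$", text)
end_multi = re.findall(r".$", text, re.MULTILINE)
print(end_default, end_multi)  # ['f'] ['c', 'f']

# \A never matches after a line break, even in multi-line mode.
start_only = re.findall(r"\A.", text, re.MULTILINE)
print(start_only)  # ['a']

# $ also matches before a final line break.
before_final_break = re.findall(r".$", "abc\ndef\n")
print(before_final_break)  # ['f']

# Python's \Z matches only at the very end, so nothing matches here:
# the last character is \n, which the dot does not match.
strict_end = re.findall(r".\Z", "abc\ndef\n")
print(strict_end)  # []
```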

Word boundaries

A word boundary matches at a position between a word character and a non-word character, or at the start or end of the string when the adjacent character is a word character.

Character

Description

Example

\b

Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W). It also matches at the start of the string, the end of the string, or both, when the first character, the last character, or both are word characters.

.\b matches c in abc

\B

Matches at the position between two word characters (for example, the position between \w\w) and the position between two non-word characters (for example, \W\W).

\B.\B matches b in abc
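
The two examples above, plus a whole-word search, can be sketched as follows. This assumes Python's re module, and the sample sentence is hypothetical.

```python
import re

# .\b matches a character that is followed by a word boundary.
last_char = re.findall(r".\b", "abc")
print(last_char)  # ['c']

# \B.\B matches a character with no word boundary on either side.
middle_char = re.findall(r"\B.\B", "abc")
print(middle_char)  # ['b']

# \bcat\b finds "cat" only as a whole word, not inside
# "concat" or "cats".
whole_words = [m.start() for m in re.finditer(r"\bcat\b", "cat concat cats cat.")]
print(whole_words)  # [0, 16]
```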

Alternation

Alternation matches one item out of a group of items and is equivalent to the "or" operator. For example, "cat|dog" matches "cat" in About cats and dogs. If it's applied again, it matches "dog." You can add as many alternatives as you want (for example, "cat|dog|mouse|fish"). Use parentheses for grouping. For example, "(cat|dog) food" matches cat food and dog food.

The pipe (|) character has the lowest precedence of all regex operators.

Character

Description

Example

| (pipe)

Matches either the part on the left side, or the part on the right side. You can string characters together into a series of options.

abc|def|xyz matches abc, def, or xyz

| (pipe) inside a group

Use grouping to alternate only part of the regular expression.

abc(def|xyz) matches abcdef or abcxyz
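
The sketch below illustrates alternation, grouping, and the low precedence of the pipe character. It assumes Python's re module, and the sample strings are hypothetical.

```python
import re

# Each application of cat|dog finds the next alternative that matches.
animals = re.findall(r"cat|dog", "About cats and dogs")
print(animals)  # ['cat', 'dog']

# Parentheses limit the alternation to part of the pattern.
foods = [m.group() for m in re.finditer(r"(cat|dog) food", "cat food and dog food")]
print(foods)  # ['cat food', 'dog food']

grouped = re.fullmatch(r"abc(def|xyz)", "abcxyz")
print(grouped.group())  # abcxyz

# The pipe has the lowest precedence: ^cat|dog$ means
# "cat at the start" OR "dog at the end", not "exactly cat or dog".
ungrouped = re.search(r"^cat|dog$", "catalog")
print(ungrouped.group())  # cat

# Grouping applies both anchors to both alternatives.
anchored = re.search(r"^(cat|dog)$", "catalog")
print(anchored)  # None
```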

Quantifiers

A quantifier specifies how many times the preceding item must repeat. There are two types of quantifier searches: greedy (maximal) and lazy (minimal). A greedy search matches as many characters as it can while still allowing the overall pattern to match. A lazy search matches as few characters as it can. For example, if you search for one to four "b's" in a row and have a string with three "b's" in a row, greedy matches the three "b's" and lazy only matches the first "b."

Character

Description

Example

? (question mark)

Greedy. Makes the preceding item optional. Includes the optional item in the match when possible.

abc? matches ab or abc

?? (lazy question mark)

Lazy. Makes the preceding item optional. Excludes the optional item in the match when possible.

abc?? matches ab or abc

* (star)

Greedy. Repeats the previous item zero or more times. Matches as many items as possible before trying permutations with fewer matches of the preceding item, up to the point where it doesn't match the preceding item at all.

".*" matches "def" "ghi" in abc "def" "ghi" jkl

*? (lazy star)

Lazy. Repeats the previous item zero or more times. The regex engine attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.

".*?" matches "def" in abc "def" "ghi" jkl

+ (plus)

Greedy. Repeats the previous item once or more. Matches as many items as possible before trying permutations with fewer matches of the preceding item, up to the point where the regex engine matches the preceding item once only.

".+" matches "def" "ghi" in abc "def" "ghi" jkl

+? (lazy plus)

Lazy. Repeats the previous item once or more. The regex engine matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.

".+?" matches "def" in abc "def" "ghi" jkl

{n}, where n is an integer >= 1

Repeats the previous item exactly n times.

a{3} matches aaa

{n,m}, where n >= 1 and m >= n

Greedy. Repeats the previous item between n and m times. Tries repeating m times before reducing the repetition to n times.

a{2,4} matches aa, aaa or aaaa

{n,m}?, where n >= 1 and m >= n

Lazy. Repeats the previous item between n and m times. Tries repeating n times before increasing the repetition to m times.

a{2,4}? matches aa, aaa or aaaa

{n,}, where n >= 1

Greedy. Repeats the previous item at least n times. Matches as many items as possible before trying permutations with fewer matches of the preceding item, up to the point where it matches the preceding item only n times.

a{2,} matches aaaaa in aaaaa

{n,}?, where n >= 1

Lazy. Repeats the previous item at least n times. The regex engine matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.

a{2,}? matches aa in aaaaa
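
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below assumes Python's re module and reuses the sample strings from the table. The "abbbc" string is a hypothetical stand-in for the three "b's" example.

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy star and plus run to the last quotation mark.
greedy_star = re.search(r'".*"', text).group()
greedy_plus = re.search(r'".+"', text).group()
print(greedy_star)  # "def" "ghi"

# Lazy star and plus stop at the first closing quotation mark.
lazy_star = re.search(r'".*?"', text).group()
lazy_plus = re.search(r'".+?"', text).group()
print(lazy_star)  # "def"

# One to four "b" characters against a run of three.
greedy_range = re.search(r"b{1,4}", "abbbc").group()
lazy_range = re.search(r"b{1,4}?", "abbbc").group()
print(greedy_range, lazy_range)  # bbb b

# At least two "a" characters.
greedy_min = re.search(r"a{2,}", "aaaaa").group()
lazy_min = re.search(r"a{2,}?", "aaaaa").group()
print(greedy_min, lazy_min)  # aaaaa aa

# Optional item: greedy includes it, lazy leaves it out when possible.
greedy_optional = re.search(r"abc?", "abc").group()
lazy_optional = re.search(r"abc??", "abc").group()
print(greedy_optional, lazy_optional)  # abc ab
```

A lazy quantifier still expands when the rest of the pattern requires it. For example, re.fullmatch(r"abc??", "abc") succeeds, because matching the whole string forces the optional "c" to be included.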

Related Topics

Define the File Layout

Source File Layout