Regular Expression

A regular expression, commonly known as regex, describes a pattern of characters.

Usage: Regular expressions are often used to search text, replace substrings and validate string data.

Alternation

Alternation uses the pipe symbol |. It matches either the expression before the | or the expression after it.

For example: cat|dog will match cat as well as dog.
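
As a minimal sketch, the cat|dog pattern can be tried with Python's standard re module. The sample sentence and the variable names are made up for illustration.

```python
import re

# cat|dog matches either alternative wherever it appears in the text.
pattern = re.compile(r"cat|dog")

alternation_matches = pattern.findall("my cat chased the dog and a bird")
print(alternation_matches)  # ['cat', 'dog']
```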

Character Set

Character sets, denoted by a pair of brackets [], match any one of the characters listed inside the brackets.
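
A small sketch of a character set in Python's re module follows. The pattern [cbr]at and the sample text are illustrative assumptions, not taken from the notes above.

```python
import re

# [cbr] matches exactly one character: c, b, or r.
# "hat" is skipped because h is not in the set.
charset_matches = re.findall(r"[cbr]at", "cat bat rat hat")
print(charset_matches)  # ['cat', 'bat', 'rat']
```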

Wildcards

Wildcards, denoted by a period ., match any single character (letter, number, symbol or whitespace). In most regex flavors the period does not match a newline by default.

For example: ... will match cat or dog or any other 3-character text.

To match an actual period, escape it with a backslash: \.
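
The sketch below shows the wildcard and the escaped period in Python's re module. The word list and the version string are made-up sample data, and re.fullmatch is used so that the whole string must fit the pattern.

```python
import re

words = ["cat", "dog", "a1!", "cats"]

# Three wildcards match any text that is exactly 3 characters long.
wildcard_matches = [w for w in words if re.fullmatch(r"...", w)]
print(wildcard_matches)  # ['cat', 'dog', 'a1!']

# An escaped period matches only a literal period.
literal_periods = re.findall(r"\.", "v1.2.3")
print(literal_periods)  # ['.', '.']
```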

Ranges

Regular expression ranges are used to specify a range of characters that can be matched.

Common regular expression ranges include:

  • [A-Z] : match any uppercase letter
  • [a-z] : match any lowercase letter
  • [0-9] : match any digit
  • [A-Za-z] : match any uppercase or lowercase letter
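
As a rough illustration, each of the ranges listed above can be applied to a sample string with Python's re module. The string "Room 42B" is an arbitrary example.

```python
import re

text = "Room 42B"

upper = re.findall(r"[A-Z]", text)       # uppercase letters only
lower = re.findall(r"[a-z]", text)       # lowercase letters only
digits = re.findall(r"[0-9]", text)      # digits only
letters = re.findall(r"[A-Za-z]", text)  # any letter, either case

print(upper)    # ['R', 'B']
print(lower)    # ['o', 'o', 'm']
print(digits)   # ['4', '2']
print(letters)  # ['R', 'o', 'o', 'm', 'B']
```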

Shorthand Character Classes

Shorthand character classes simplify writing regular expressions.

  • \w: [A-Za-z0-9_] and it matches a single uppercase character, lowercase character, digit or underscore.
  • \d: [0-9], and it matches a single digit character.
  • \s: matches a single whitespace character, such as a space, tab or line break.
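
Here is a short sketch of the three shorthand classes in Python's re module. One assumption to note: for Python string patterns, \w, \d and \s are Unicode-aware by default, so the re.ASCII flag is passed here to restrict them to the ASCII sets described above. The sample text is made up.

```python
import re

text = "id_7 ok"

# re.ASCII limits \w to [A-Za-z0-9_] and \d to [0-9].
word_chars = re.findall(r"\w", text, flags=re.ASCII)
digit_chars = re.findall(r"\d", text, flags=re.ASCII)
space_chars = re.findall(r"\s", text, flags=re.ASCII)

print(word_chars)   # ['i', 'd', '_', '7', 'o', 'k']
print(digit_chars)  # ['7']
print(space_chars)  # [' ']
```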

Grouping

Grouping lets us group parts of a regular expression together.

For example: I like (tea|coffee) will match the text I like and then match either tea or coffee.
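
The grouping example can be sketched in Python's re module as shown below. In Python, parentheses also capture the text they match, which is why group(1) is available. The sample sentences are illustrative.

```python
import re

pattern = re.compile(r"I like (tea|coffee)")

match = pattern.search("I like coffee in the morning")
whole_match = match.group(0)   # the entire matched text
group_match = match.group(1)   # only the part inside the parentheses
print(whole_match)  # I like coffee
print(group_match)  # coffee

# Neither alternative is present, so there is no match.
no_match = pattern.search("I like juice")
print(no_match)  # None
```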

Fixed Quantifiers

Fixed quantifiers, indicated by curly braces {}, specify how many times the preceding character or group must occur.

  • \w{3} will match exactly 3 word characters.
  • \w{4,7} will match at minimum 4 word characters and at maximum 7 word characters.
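
A minimal sketch of both quantifiers in Python's re module follows. re.fullmatch is used so that the length of the whole string is checked, and the test words are arbitrary.

```python
import re

# \w{3}: exactly 3 word characters.
exactly_three = [bool(re.fullmatch(r"\w{3}", w)) for w in ["ab", "abc", "abcd"]]
print(exactly_three)  # [False, True, False]

# \w{4,7}: between 4 and 7 word characters, inclusive.
four_to_seven = [
    bool(re.fullmatch(r"\w{4,7}", w))
    for w in ["abc", "abcd", "abcdefg", "abcdefgh"]
]
print(four_to_seven)  # [False, True, True, False]
```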

Optional Quantifiers

The optional quantifier, a question mark ?, indicates that the preceding character in a regex is optional: it may occur 0 or 1 times.
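
As an illustration, the pattern colou?r (a hypothetical example, not from the notes above) makes the letter u optional. The sketch uses Python's re module.

```python
import re

pattern = re.compile(r"colou?r")

# u may appear once or not at all, but not twice.
optional_results = {
    word: bool(pattern.fullmatch(word))
    for word in ["color", "colour", "colouur"]
}
print(optional_results)  # {'color': True, 'colour': True, 'colouur': False}
```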

Kleene Star

Kleene star * indicates that the preceding character can occur 0 or more times.

For example: ha*t will match ht, hat, haat, haaaat, or haaaaaaaaat.
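
The ha*t pattern can be checked with Python's re module, as in this short sketch. Note that zero occurrences of a also count, so ht matches. The word list is made up.

```python
import re

pattern = re.compile(r"ha*t")

star_results = {
    word: bool(pattern.fullmatch(word))
    for word in ["ht", "hat", "haaaat", "hot"]
}
print(star_results)  # {'ht': True, 'hat': True, 'haaaat': True, 'hot': False}
```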

Kleene Plus

Kleene plus + indicates that the preceding character can occur 1 or more times.

For example: meo+w will match meow, meooow, and meoooooooooooow, but will not match mew.
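
A quick sketch of meo+w in Python's re module shows the difference from the star: at least one o is required. The word list is illustrative.

```python
import re

pattern = re.compile(r"meo+w")

plus_results = {
    word: bool(pattern.fullmatch(word))
    for word in ["mew", "meow", "meooow"]
}
print(plus_results)  # {'mew': False, 'meow': True, 'meooow': True}
```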

Anchors

Anchors (hat ^ and dollar sign $) match positions rather than characters: ^ matches at the start of a string and $ matches at the end.
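
The sketch below tries both anchors with Python's re module. The patterns ^cat and dog$ and the sample strings are assumptions chosen for illustration. re.search is used because it scans the whole string, so the anchors alone decide where a match may occur.

```python
import re

# ^cat matches only when the string starts with "cat".
starts_with_cat = [bool(re.search(r"^cat", s)) for s in ["cat and dog", "a cat"]]
print(starts_with_cat)  # [True, False]

# dog$ matches only when the string ends with "dog".
ends_with_dog = [bool(re.search(r"dog$", s)) for s in ["cat and dog", "dog food"]]
print(ends_with_dog)  # [True, False]
```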