Ch4 Regular expression / SQL
Character Classes
Class | Description | Matched Example | Unmatched Example |
---|---|---|---|
[abc] | Matches a, b, or c. | a | |
[a-z] | Matches any character between a and z. | x | |
[^A-Z] | Matches any character that is not between A and Z. | a | |
\w | Matches any "word" character. Equivalent to [A-Za-z0-9_] for ASCII text. | A, a, 0, _ | -, +, *, /, ... |
\d | Matches any digit. Equivalent to [0-9] for ASCII text. | | |
[0-9] | Matches a single digit in the range 0-9. Equivalent to \d for ASCII text. | 1 | |
\s | Matches any whitespace character (spaces, tabs, line breaks). | | |
. | Matches any character except a newline. | | |
Character classes can be combined, as in [a-zA-Z0-9].
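As a rough illustration, the classes in the table can be tried with Python's standard `re` module. The choice of Python and the sample strings are assumptions, since these notes do not name a regex engine.

```python
import re

sample = "Ab_9 -x\t7"  # hypothetical sample text

# [a-z]: lowercase ASCII letters only
lowercase = re.findall(r"[a-z]", sample)
# \w with re.ASCII behaves like [A-Za-z0-9_]
word_chars = re.findall(r"\w", sample, flags=re.ASCII)
# \d: digits
digits = re.findall(r"\d", sample)
# \s: whitespace (here, one space and one tab)
spaces = re.findall(r"\s", sample)
# [^A-Z]: any character that is not an uppercase ASCII letter
not_upper = re.findall(r"[^A-Z]", "aB-")
# . : any character except a newline
dots = re.findall(r".", "a\nb")

print(lowercase)   # ['b', 'x']
print(word_chars)  # ['A', 'b', '_', '9', 'x', '7']
print(digits)      # ['9', '7']
print(spaces)      # [' ', '\t']
print(not_upper)   # ['a', '-']
print(dots)        # ['a', 'b']
```

Without `re.ASCII`, Python's `\w` and `\d` also accept non-ASCII letters and digits, which is why the explicit classes and the shorthands are only equivalent for ASCII text.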
Combining Patterns
There are multiple ways to combine patterns together in regular expressions.
Combo | Description |
---|---|
AB | A match for A followed immediately by one for B. Example: x[.,]y matches "x.y" or "x,y". |
A|B | Matches either A or B. Example: \d+|Inf matches either a sequence containing 1 or more digits or "Inf". |
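A short sketch of both combinations, again assuming Python's `re` module; the test strings are made up for illustration.

```python
import re

# AB: "x", then one of "." or ",", then "y"
concat = re.compile(r"x[.,]y")
# A|B: one or more digits, or the literal text "Inf"
either = re.compile(r"\d+|Inf")

dot_ok = concat.fullmatch("x.y") is not None
comma_ok = concat.fullmatch("x,y") is not None
semicolon_ok = concat.fullmatch("x;y") is not None
found = either.findall("12 Inf 7")

print(dot_ok, comma_ok, semicolon_ok)  # True True False
print(found)                           # ['12', 'Inf', '7']
```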
A pattern can be followed by one of these quantifiers to specify how many instances of the pattern can occur.
Symbol | Description |
---|---|
* | 0 or more occurrences of the preceding pattern. Example: [a-z]* matches any sequence of lower-case letters or the empty string. |
+ | 1 or more occurrences of the preceding pattern. Example: \d+ matches any non-empty sequence of digits. |
? | 0 or 1 occurrences of the preceding pattern. Example: [-+]? matches an optional sign. |
{1,3} | Matches the specified quantity of the preceding pattern. {1,3} will match from 1 to 3 instances. {3} will match exactly 3 instances. {3,} will match 3 or more instances. Example: \d{5,6} matches numbers with either 5 or 6 digits. |
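The quantifiers can be checked the same way. This is a minimal sketch assuming Python's `re` module, with hypothetical inputs; `fullmatch` is used so that the whole string must fit the pattern.

```python
import re

# *: the empty string is allowed
star_empty = re.fullmatch(r"[a-z]*", "") is not None
# +: at least one digit is required
plus_empty = re.fullmatch(r"\d+", "") is not None
# ?: an optional sign in front of a run of digits
signed = re.findall(r"[-+]?\d+", "+3 -12 7")
# {5,6}: only strings of exactly 5 or 6 digits pass
candidates = ["1234", "12345", "123456", "1234567"]
five_or_six = [s for s in candidates if re.fullmatch(r"\d{5,6}", s)]

print(star_empty)   # True
print(plus_empty)   # False
print(signed)       # ['+3', '-12', '7']
print(five_or_six)  # ['12345', '123456']
```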
Groups
Parentheses are used to create groups, much like parentheses in ordinary arithmetic expressions. For example, (Mahna)+ matches strings with 1 or more "Mahna", like "MahnaMahna". Without the parentheses, Mahna+ would match strings with "Mahn" followed by 1 or more "a" characters, like "Mahnaaaa".
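The difference between the grouped and ungrouped patterns can be seen in a small sketch, assuming Python's `re` module:

```python
import re

# With the group, + repeats the whole word "Mahna"
grouped_repeat = re.fullmatch(r"(Mahna)+", "MahnaMahna") is not None
grouped_stretch = re.fullmatch(r"(Mahna)+", "Mahnaaaa") is not None

# Without the group, + applies only to the final "a"
plain_repeat = re.fullmatch(r"Mahna+", "MahnaMahna") is not None
plain_stretch = re.fullmatch(r"Mahna+", "Mahnaaaa") is not None

print(grouped_repeat, grouped_stretch)  # True False
print(plain_repeat, plain_stretch)      # False True
```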
Anchors
^ : Matches the beginning of a string. Example: ^(I|You) matches I or You at the start of a string.
$ : Normally matches the empty string at the end of a string or just before a newline at the end of a string. Example: (\.edu|\.org|\.com)$ matches .edu, .org, or .com at the end of a string.
\b : Matches a "word boundary", the beginning or end of a word. Example: s\b matches s characters at the end of words, and \bs matches s characters at the beginning of words.
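The anchors can be exercised with a few hypothetical strings. This sketch assumes Python's `re` module; the `\w*` added around `s` is only there so that `findall` returns whole words rather than single letters.

```python
import re

# ^ : the alternative must appear at the very start
starts = re.search(r"^(I|You)", "You are here")
not_start = re.search(r"^(I|You)", "Are You here")

# $ : end of string, or just before a trailing newline
domain = re.search(r"(\.edu|\.org|\.com)$", "example.org")
domain_nl = re.search(r"(\.edu|\.org|\.com)$", "example.org\n")

# \b : word boundary, used to find words that end or begin with "s"
ends_with_s = re.findall(r"\w*s\b", "cats sit dogs")
begins_with_s = re.findall(r"\bs\w*", "cats sit dogs")

print(starts.group())     # You
print(not_start)          # None
print(domain.group())     # .org
print(domain_nl.group())  # .org
print(ends_with_s)        # ['cats', 'dogs']
print(begins_with_s)      # ['sit']
```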
Special Characters
The following special characters are used above to denote types of patterns:
\ ( ) [ ] { } + * ? | $ ^ .
That means if you actually want to match one of those characters, you have to escape it using a backslash. For example, \(1\+3\) matches "(1+3)".
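A brief sketch of escaping, assuming Python's `re` module. `re.escape` is a standard-library helper that adds the backslashes for you; using it here is a suggestion, not something the notes prescribe.

```python
import re

# Escaped: parentheses and plus sign are matched literally
literal = re.fullmatch(r"\(1\+3\)", "(1+3)") is not None
# The pattern contains no spaces, so a spaced-out string does not match
spaced = re.fullmatch(r"\(1\+3\)", "(1 + 3)") is not None

# Unescaped: ( ) form a group and + is a quantifier on "1"
unescaped_literal = re.fullmatch(r"(1+3)", "(1+3)") is not None
unescaped_digits = re.fullmatch(r"(1+3)", "1113") is not None

# re.escape builds the escaped pattern from plain text
auto = re.escape("(1+3)")

print(literal, spaced)                      # True False
print(unescaped_literal, unescaped_digits)  # False True
print(auto)                                 # \(1\+3\)
```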