[Repost] The Regular Expression Object Model
Published: 2019-06-12



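This topic describes the object model used in working with .NET regular expressions. It covers the regular expression engine, the MatchCollection and Match objects, the group collection and captured groups, and the capture collection and individual captures.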

The Regular Expression Engine

The regular expression engine in .NET is represented by the Regex class. The engine is responsible for parsing and compiling a regular expression, and for performing operations that match the regular expression pattern with an input string. It is the central component in the .NET regular expression object model.

You can use the regular expression engine in either of two ways:

  • By calling the static methods of the Regex class. The method parameters include the input string and the regular expression pattern. The regular expression engine caches regular expressions that are used in static method calls, so repeated calls to static regular expression methods that use the same regular expression offer relatively good performance.

  • By instantiating a Regex object, that is, by passing a regular expression to the class constructor. In this case, the Regex object is immutable (read-only) and represents a regular expression engine that is tightly coupled with a single regular expression. Because regular expressions used by Regex instances are not cached, you should not instantiate a Regex object multiple times with the same regular expression.
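The two approaches can be contrasted in a short sketch. It assumes a console program, reuses the SSN-format pattern from the IsMatch example in this topic, and introduces a hypothetical helper named CountValid. The static call passes the pattern with every call and relies on the engine's cache, while the module-level Regex field is constructed once and reused for every input.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   ' One Regex instance, constructed once and reused; Regex objects are immutable.
   Private ReadOnly ssnFormat As New Regex("^\d{3}-\d{2}-\d{4}$")

   ' Hypothetical helper: counts the inputs that have the SSN format
   ' by calling IsMatch on the shared instance.
   Public Function CountValid(values() As String) As Integer
      Dim count As Integer = 0
      For Each value As String In values
         If ssnFormat.IsMatch(value) Then count += 1
      Next
      Return count
   End Function

   Public Sub Main()
      Dim values() As String = {"111-22-3333", "111-2-3333", "222-33-4444"}

      ' Static method: the pattern is passed with each call and cached by the engine.
      Console.WriteLine(Regex.IsMatch(values(0), "^\d{3}-\d{2}-\d{4}$"))

      ' Instance method: the same Regex object handles every input.
      Console.WriteLine(CountValid(values))
   End Sub
End Module
' The example displays the following output:
'    True
'    2
```

Keeping the instance in a field, rather than creating it inside CountValid, is what avoids constructing the same Regex repeatedly when the helper is called many times.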

You can call the methods of the Regex class to perform the following operations:

  • Determine whether a string matches a regular expression pattern.

  • Extract a single match or the first match.

  • Extract all matches.

  • Replace a matched substring.

  • Split a single string into an array of strings.

These operations are described in the following sections.

Matching a Regular Expression Pattern

The Regex.IsMatch method returns true if the string matches the pattern, or false if it does not. The IsMatch method is often used to validate string input. For example, the following code ensures that a string has the format of a United States social security number.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim values() As String = { "111-22-3333", "111-2-3333"}
      Dim pattern As String = "^\d{3}-\d{2}-\d{4}$"
      For Each value As String In values
         If Regex.IsMatch(value, pattern) Then
            Console.WriteLine("{0} is a valid SSN.", value)
         Else
            Console.WriteLine("{0}: Invalid", value)
         End If
      Next
   End Sub
End Module
' The example displays the following output:
'    111-22-3333 is a valid SSN.
'    111-2-3333: Invalid
```

The regular expression pattern ^\d{3}-\d{2}-\d{4}$ is interpreted as shown in the following table.

| Pattern | Description |
| --- | --- |
| `^` | Match the beginning of the input string. |
| `\d{3}` | Match three decimal digits. |
| `-` | Match a hyphen. |
| `\d{2}` | Match two decimal digits. |
| `-` | Match a hyphen. |
| `\d{4}` | Match four decimal digits. |
| `$` | Match the end of the input string. |

Extracting a Single Match or the First Match

The Regex.Match method returns a Match object that contains information about the first substring that matches a regular expression pattern. If the Match.Success property returns true, indicating that a match was found, you can retrieve information about subsequent matches by calling the Match.NextMatch method. These method calls can continue until the Match.Success property returns false. For example, the following code uses the Regex.Match method to find the first occurrence of a duplicated word in a string. It then calls the Match.NextMatch method to find any additional occurrences. The example examines the Match.Success property after each method call to determine whether the current match was successful and whether a call to the Match.NextMatch method should follow.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "This is a a farm that that raises dairy cattle."
      Dim pattern As String = "\b(\w+)\W+(\1)\b"
      Dim match As Match = Regex.Match(input, pattern)
      Do While match.Success
         Console.WriteLine("Duplicate '{0}' found at position {1}.", _
                           match.Groups(1).Value, match.Groups(2).Index)
         match = match.NextMatch()
      Loop
   End Sub
End Module
' The example displays the following output:
'    Duplicate 'a' found at position 10.
'    Duplicate 'that' found at position 22.
```

The regular expression pattern \b(\w+)\W+(\1)\b is interpreted as shown in the following table.

| Pattern | Description |
| --- | --- |
| `\b` | Begin the match on a word boundary. |
| `(\w+)` | Match one or more word characters. This is the first capturing group. |
| `\W+` | Match one or more non-word characters. |
| `(\1)` | Match the first captured string. This is the second capturing group. |
| `\b` | End the match on a word boundary. |

Extracting All Matches

The Regex.Matches method returns a MatchCollection object that contains information about all matches that the regular expression engine found in the input string. For example, the previous example could be rewritten to call the Matches method instead of the Match and NextMatch methods.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "This is a a farm that that raises dairy cattle."
      Dim pattern As String = "\b(\w+)\W+(\1)\b"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Duplicate '{0}' found at position {1}.", _
                           match.Groups(1).Value, match.Groups(2).Index)
      Next
   End Sub
End Module
' The example displays the following output:
'    Duplicate 'a' found at position 10.
'    Duplicate 'that' found at position 22.
```

Replacing a Matched Substring

The Regex.Replace method replaces each substring that matches the regular expression pattern with a specified replacement string, which can contain substitution patterns, and returns the entire input string with the replacements made. For example, the following code adds a U.S. currency symbol before a decimal number in a string.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b\d+\.\d{2}\b"
      Dim replacement As String = "$$$&"
      Dim input As String = "Total Cost: 103.64"
      Console.WriteLine(Regex.Replace(input, pattern, replacement))
   End Sub
End Module
' The example displays the following output:
'    Total Cost: $103.64
```

The regular expression pattern \b\d+\.\d{2}\b is interpreted as shown in the following table.

| Pattern | Description |
| --- | --- |
| `\b` | Begin the match at a word boundary. |
| `\d+` | Match one or more decimal digits. |
| `\.` | Match a period. |
| `\d{2}` | Match two decimal digits. |
| `\b` | End the match at a word boundary. |

The replacement pattern $$$& is interpreted as shown in the following table.

| Pattern | Replacement string |
| --- | --- |
| `$$` | The dollar sign ($) character. |
| `$&` | The entire matched substring. |
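Because $ introduces a substitution, a literal dollar sign has to be written as $$. The following sketch applies three replacement strings to the input and pattern of the currency example so that the effect of each token is visible; the helper name ApplyReplacement is hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   ' Hypothetical helper: runs Regex.Replace with the pattern and input
   ' from the currency example, using the given replacement string.
   Public Function ApplyReplacement(replacement As String) As String
      Return Regex.Replace("Total Cost: 103.64", "\b\d+\.\d{2}\b", replacement)
   End Function

   Public Sub Main()
      ' "$&" alone: the matched number is copied back unchanged.
      Console.WriteLine(ApplyReplacement("$&"))
      ' "$$&": "$$" becomes "$", and the "&" that follows is an ordinary character.
      Console.WriteLine(ApplyReplacement("$$&"))
      ' "$$$&": a literal "$" followed by the entire match.
      Console.WriteLine(ApplyReplacement("$$$&"))
   End Sub
End Module
' The example displays the following output:
'    Total Cost: 103.64
'    Total Cost: $&
'    Total Cost: $103.64
```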

Splitting a Single String into an Array of Strings

The Regex.Split method splits the input string at the positions defined by a regular expression match. For example, the following code places the items in a numbered list into a string array.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "1. Eggs 2. Bread 3. Milk 4. Coffee 5. Tea"
      Dim pattern As String = "\b\d{1,2}\.\s"
      For Each item As String In Regex.Split(input, pattern)
         If Not String.IsNullOrEmpty(item) Then
            Console.WriteLine(item)
         End If
      Next
   End Sub
End Module
' The example displays the following output:
'    Eggs
'    Bread
'    Milk
'    Coffee
'    Tea
```

The regular expression pattern \b\d{1,2}\.\s is interpreted as shown in the following table.

| Pattern | Description |
| --- | --- |
| `\b` | Begin the match at a word boundary. |
| `\d{1,2}` | Match one or two decimal digits. |
| `\.` | Match a period. |
| `\s` | Match a white-space character. |
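The String.IsNullOrEmpty check in the Split example is needed because the pattern matches at the very start of the input, so Regex.Split returns an empty string as its first element; every item except the last also keeps the space that precedes the next number. A small sketch (the helper name RawItems is hypothetical, and the input is a shortened version of the one above) prints the unfiltered array with brackets around each element:

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   ' Hypothetical helper: returns Regex.Split's result without filtering.
   Public Function RawItems(input As String) As String()
      Return Regex.Split(input, "\b\d{1,2}\.\s")
   End Function

   Public Sub Main()
      Dim items() As String = RawItems("1. Eggs 2. Bread 3. Milk")
      Console.WriteLine("Element count: {0}", items.Length)
      For Each item As String In items
         ' Brackets expose the empty first element and the trailing spaces.
         Console.WriteLine("[{0}]", item)
      Next
   End Sub
End Module
' The example displays the following output:
'    Element count: 4
'    []
'    [Eggs ]
'    [Bread ]
'    [Milk]
```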

The MatchCollection and Match Objects

Regex methods return two objects that are part of the regular expression object model: the MatchCollection object and the Match object.

The Match Collection

The Regex.Matches method returns a MatchCollection object that contains Match objects that represent all the matches that the regular expression engine found, in the order in which they occur in the input string. If there are no matches, the method returns a MatchCollection object with no members. The MatchCollection.Item property lets you access individual members of the collection by index, from zero to one less than the value of the MatchCollection.Count property. Item is the collection's indexer (in C#) and default property (in Visual Basic).

By default, the call to the Regex.Matches method uses lazy evaluation to populate the MatchCollection object. Access to properties that require a fully populated collection, such as the MatchCollection.Count and MatchCollection.Item properties, may involve a performance penalty. As a result, we recommend that you access the collection by using the IEnumerator object that is returned by the MatchCollection.GetEnumerator method. Individual languages provide constructs, such as For Each in Visual Basic and foreach in C#, that wrap the collection's IEnumerator interface.

The following example uses the Regex.Matches method to populate a MatchCollection object with all the matches found in an input string. The example enumerates the collection, copies the match strings to a List(Of String), and records their character positions in a List(Of Integer).

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim matches As MatchCollection
      Dim results As New List(Of String)
      Dim matchposition As New List(Of Integer)

      ' Create a new Regex object and define the regular expression.
      Dim r As New Regex("abc")
      ' Use the Matches method to find all matches in the input string.
      matches = r.Matches("123abc4abcd")
      ' Enumerate the collection to retrieve all matches and positions.
      For Each match As Match In matches
         ' Add the match string to the list of results.
         results.Add(match.Value)
         ' Record the character position where the match was found.
         matchposition.Add(match.Index)
      Next
      ' List the results.
      For ctr As Integer = 0 To results.Count - 1
         Console.WriteLine("'{0}' found at position {1}.", _
                           results(ctr), matchposition(ctr))
      Next
   End Sub
End Module
' The example displays the following output:
'    'abc' found at position 3.
'    'abc' found at position 7.
```
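The difference between enumerating a MatchCollection and indexing into it can be sketched as follows. The hypothetical helper ValuesByEnumerator walks the IEnumerator returned by MatchCollection.GetEnumerator, which is what For Each does behind the scenes; the loop in Main instead reads Count and the default Item property, which requires the whole collection to be populated.

```vb
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Example
   ' Hypothetical helper: lists "value@index" for each match by calling
   ' GetEnumerator and MoveNext directly instead of using For Each.
   Public Function ValuesByEnumerator(input As String, pattern As String) As String
      Dim matches As MatchCollection = Regex.Matches(input, pattern)
      Dim parts As New List(Of String)
      Dim e As IEnumerator = matches.GetEnumerator()
      While e.MoveNext()
         Dim m As Match = CType(e.Current, Match)
         parts.Add(m.Value & "@" & m.Index.ToString())
      End While
      Return String.Join(" ", parts.ToArray())
   End Function

   Public Sub Main()
      Console.WriteLine(ValuesByEnumerator("123abc4abcd", "abc"))

      ' Indexed access: reading Count forces every match to be found up front.
      Dim matches As MatchCollection = Regex.Matches("123abc4abcd", "abc")
      For ctr As Integer = 0 To matches.Count - 1
         Console.WriteLine("{0}: {1}", ctr, matches(ctr).Index)
      Next
   End Sub
End Module
' The example displays the following output:
'    abc@3 abc@7
'    0: 3
'    1: 7
```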

The Match

The Match class represents the result of a single regular expression match. You can access Match objects in two ways:

  • By retrieving them from the MatchCollection object that is returned by the Regex.Matches method. To retrieve individual Match objects, iterate the collection by using a foreach (in C#) or For Each...Next (in Visual Basic) construct, or use the MatchCollection.Item property to retrieve a specific Match object by index. You can also retrieve individual Match objects from the collection by iterating the collection by index, from zero to one less than the number of objects in the collection. However, this method does not take advantage of lazy evaluation, because it accesses the MatchCollection.Count property.

    The following example retrieves individual Match objects from a MatchCollection object by iterating the collection using the foreach or For Each...Next construct. The regular expression simply matches the string "abc" in the input string.

    ```vb
    Imports System.Text.RegularExpressions

    Module Example
       Public Sub Main()
          Dim pattern As String = "abc"
          Dim input As String = "abc123abc456abc789"
          For Each match As Match In Regex.Matches(input, pattern)
             Console.WriteLine("{0} found at position {1}.", _
                               match.Value, match.Index)
          Next
       End Sub
    End Module
    ' The example displays the following output:
    '    abc found at position 0.
    '    abc found at position 6.
    '    abc found at position 12.
    ```

  • By calling the Regex.Match method, which returns a Match object that represents the first match in a string or a portion of a string. You can determine whether the match has been found by retrieving the value of the Match.Success property. To retrieve Match objects that represent subsequent matches, call the Match.NextMatch method repeatedly, until the Success property of the returned Match object is false.

    The following example uses the Regex.Match and Match.NextMatch methods to match the string "abc" in the input string.

    ```vb
    Imports System.Text.RegularExpressions

    Module Example
       Public Sub Main()
          Dim pattern As String = "abc"
          Dim input As String = "abc123abc456abc789"
          Dim match As Match = Regex.Match(input, pattern)
          Do While match.Success
             Console.WriteLine("{0} found at position {1}.", _
                               match.Value, match.Index)
             match = match.NextMatch()
          Loop
       End Sub
    End Module
    ' The example displays the following output:
    '    abc found at position 0.
    '    abc found at position 6.
    '    abc found at position 12.
    ```

Two properties of the Match class return collection objects:

  • The Match.Groups property returns a GroupCollection object that contains information about the substrings that match capturing groups in the regular expression pattern.

  • The Match.Captures property returns a CaptureCollection object that is of limited use. The collection is not populated for a Match object whose Success property is false. Otherwise, it contains a single Capture object that has the same information as the Match object.

For more information about these objects, see The Group Collection and The Capture Collection sections later in this topic.
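The two cases described for Match.Captures can be checked with a short sketch (the helper name CaptureCount and the inputs are illustrative assumptions): a successful match yields exactly one Capture whose Value and Index equal those of the Match, and a failed match yields an empty collection.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   ' Hypothetical helper: the number of Capture objects in Match.Captures.
   Public Function CaptureCount(input As String, pattern As String) As Integer
      Return Regex.Match(input, pattern).Captures.Count
   End Function

   Public Sub Main()
      Dim match As Match = Regex.Match("abc123abc456", "abc")
      Dim cap As Capture = match.Captures(0)
      ' The single Capture mirrors the Match itself.
      Console.WriteLine("Count: {0}", match.Captures.Count)
      Console.WriteLine("Same value and index: {0}", _
                        cap.Value = match.Value AndAlso cap.Index = match.Index)
      ' A failed match has an empty Captures collection.
      Console.WriteLine("Count after a failed match: {0}", CaptureCount("xyz", "abc"))
   End Sub
End Module
' The example displays the following output:
'    Count: 1
'    Same value and index: True
'    Count after a failed match: 0
```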

Two additional properties of the Match class provide information about the match. The Match.Value property returns the substring in the input string that matches the regular expression pattern. The Match.Index property returns the zero-based starting position of the matched string in the input string.

The Match class also has two pattern-matching methods:

  • The Match.NextMatch method finds the match after the match represented by the current Match object, and returns a Match object that represents that match.

  • The Match.Result method performs a specified replacement operation on the matched string and returns the result.

The following example uses the Match.Result method to prepend a $ symbol and a space before every number that includes two fractional digits.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b\d+(,\d{3})*\.\d{2}\b"
      Dim input As String = "16.32" + vbCrLf + "194.03" + vbCrLf + "1,903,672.08"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Result("$$ $&"))
      Next
   End Sub
End Module
' The example displays the following output:
'    $ 16.32
'    $ 194.03
'    $ 1,903,672.08
```

The regular expression pattern \b\d+(,\d{3})*\.\d{2}\b is defined as shown in the following table.

| Pattern | Description |
| --- | --- |
| `\b` | Begin the match at a word boundary. |
| `\d+` | Match one or more decimal digits. |
| `(,\d{3})*` | Match zero or more occurrences of a comma followed by three decimal digits. |
| `\.` | Match the decimal point character. |
| `\d{2}` | Match two decimal digits. |
| `\b` | End the match at a word boundary. |

The replacement pattern $$ $& indicates that the matched substring should be replaced by a dollar sign ($) (the $$ pattern), a space, and the value of the match (the $& pattern).

The Group Collection

The Match.Groups property returns a GroupCollection object that contains Group objects that represent captured groups in a single match. The first Group object in the collection (at index 0) represents the entire match. Each object that follows represents the results of a single capturing group.

You can retrieve individual Group objects in the collection by using the GroupCollection.Item property. You can retrieve unnamed groups by their ordinal position in the collection, and retrieve named groups either by name or by ordinal position. Unnamed captures appear first in the collection, and are indexed from left to right in the order in which they appear in the regular expression pattern. Named captures are indexed after unnamed captures, from left to right in the order in which they appear in the regular expression pattern. To determine which numbered groups are available in the collection returned for a particular regular expression matching method, you can call the instance Regex.GetGroupNumbers method. To determine which named groups are available in the collection, you can call the instance Regex.GetGroupNames method. Both methods are particularly useful in general-purpose routines that analyze the matches found by any regular expression.
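The numbering rule above (unnamed groups first, then named groups) can be observed with GetGroupNumbers, GetGroupNames, and Regex.GroupNameFromNumber. In this sketch the pattern is an illustrative assumption that deliberately places a named group before an unnamed one, and the helper name GroupNames is hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   ' Hypothetical helper: the group names reported by Regex.GetGroupNames, comma-separated.
   Public Function GroupNames(pattern As String) As String
      Dim r As New Regex(pattern)
      Return String.Join(",", r.GetGroupNames())
   End Function

   Public Sub Main()
      ' The named group appears first in the pattern, the unnamed group second.
      Dim r As New Regex("(?<year>\d{4})-(\d{2})")
      For Each number As Integer In r.GetGroupNumbers()
         Console.WriteLine("Group {0} is named '{1}'", number, r.GroupNameFromNumber(number))
      Next

      Dim m As Match = r.Match("Date: 2019-06")
      ' The unnamed group is numbered before the named group, so it is group 1.
      Console.WriteLine("Groups(1): {0}", m.Groups(1).Value)
      Console.WriteLine("Groups(""year""): {0}", m.Groups("year").Value)
   End Sub
End Module
' The example displays the following output:
'    Group 0 is named '0'
'    Group 1 is named '1'
'    Group 2 is named 'year'
'    Groups(1): 06
'    Groups("year"): 2019
```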

The GroupCollection.Item property is the indexer of the collection in C# and the collection object's default property in Visual Basic. This means that individual Group objects can be accessed by index (or by name, in the case of named groups) as follows:

```vb
Dim group As Group = match.Groups(ctr)
```

The following example defines a regular expression that uses grouping constructs to capture the month, day, and year of a date.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\w+)\s(\d{1,2}),\s(\d{4})\b"
      Dim input As String = "Born: July 28, 1989"
      Dim match As Match = Regex.Match(input, pattern)
      If match.Success Then
         For ctr As Integer = 0 To match.Groups.Count - 1
            Console.WriteLine("Group {0}: {1}", ctr, match.Groups(ctr).Value)
         Next
      End If
   End Sub
End Module
' The example displays the following output:
'    Group 0: July 28, 1989
'    Group 1: July
'    Group 2: 28
'    Group 3: 1989
```

The regular expression pattern \b(\w+)\s(\d{1,2}),\s(\d{4})\b is defined as shown in the following table.

| Pattern | Description |
| --- | --- |
| `\b` | Begin the match at a word boundary. |
| `(\w+)` | Match one or more word characters. This is the first capturing group. |
| `\s` | Match a white-space character. |
| `(\d{1,2})` | Match one or two decimal digits. This is the second capturing group. |
| `,` | Match a comma. |
| `\s` | Match a white-space character. |
| `(\d{4})` | Match four decimal digits. This is the third capturing group. |
| `\b` | End the match on a word boundary. |

The Captured Group

The Group class represents the result from a single capturing group. Group objects that represent the capturing groups defined in a regular expression are returned by the Item property of the GroupCollection object returned by the Match.Groups property. The Item property is the indexer (in C#) and the default property (in Visual Basic) of the GroupCollection class. You can also retrieve individual members by iterating the collection using the foreach or For Each construct. For an example, see the previous section.

The following example uses nested grouping constructs to capture substrings into groups. The regular expression pattern (a(b))c matches the string "abc". It assigns the substring "ab" to the first capturing group, and the substring "b" to the second capturing group.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim matchposition As New List(Of Integer)
      Dim results As New List(Of String)
      ' Define substrings abc, ab, b.
      Dim r As New Regex("(a(b))c")
      Dim m As Match = r.Match("abdabc")
      ' Visit every group, including group 0 (the entire match).
      For i As Integer = 0 To m.Groups.Count - 1
         ' Add the group's value to the list.
         results.Add(m.Groups(i).Value)
         ' Record the character position.
         matchposition.Add(m.Groups(i).Index)
      Next
      ' Display the capture groups.
      For ctr As Integer = 0 To results.Count - 1
         Console.WriteLine("{0} at position {1}", _
                           results(ctr), matchposition(ctr))
      Next
   End Sub
End Module
' The example displays the following output:
'    abc at position 3
'    ab at position 3
'    b at position 4
```

The following example uses named grouping constructs to capture substrings from a string that contains data in the format "DATANAME:VALUE", which the regular expression splits at the colon (:).

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim r As New Regex("^(?<name>\w+):(?<value>\w+)")
      Dim m As Match = r.Match("Section1:119900")
      Console.WriteLine(m.Groups("name").Value)
      Console.WriteLine(m.Groups("value").Value)
   End Sub
End Module
' The example displays the following output:
'    Section1
'    119900
```

The regular expression pattern ^(?<name>\w+):(?<value>\w+) is defined as shown in the following table.

| Pattern | Description |
| --- | --- |
| `^` | Begin the match at the beginning of the input string. |
| `(?<name>\w+)` | Match one or more word characters. The name of this capturing group is name. |
| `:` | Match a colon. |
| `(?<value>\w+)` | Match one or more word characters. The name of this capturing group is value. |

The properties of the Group class provide information about the captured group: the Group.Value property contains the captured substring, the Group.Index property indicates the starting position of the captured group in the input text, the Group.Length property contains the length of the captured text, and the Group.Success property indicates whether a substring matched the pattern defined by the capturing group.
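The four Group properties can be printed side by side. This sketch reuses the name:value input from the previous example and adds an optional named group, unit, which is an illustrative assumption that does not participate in the match; the helper name Describe is hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   ' The optional "unit" group is added for illustration; it has no match in the input.
   Private ReadOnly pattern As String = "^(?<name>\w+):(?<value>\d+)(?<unit>[a-z]+)?"

   ' Hypothetical helper: formats Success|Value|Index|Length for one named group.
   Public Function Describe(groupName As String) As String
      Dim g As Group = Regex.Match("Section1:119900", pattern).Groups(groupName)
      Return String.Format("{0}|{1}|{2}|{3}", g.Success, g.Value, g.Index, g.Length)
   End Function

   Public Sub Main()
      Dim names() As String = {"name", "value", "unit"}
      For Each groupName As String In names
         Console.WriteLine("{0}: {1}", groupName, Describe(groupName))
      Next
   End Sub
End Module
```

For the name group the sketch reports True|Section1|0|8 and for the value group True|119900|9|6. For the unit group, Success is False, Value is empty, and Length is 0.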

Applying quantifiers to a group (for more information, see the documentation on quantifiers) modifies the one-capture-per-capturing-group relationship in two ways:

  • If the * or *? quantifier (which specifies zero or more matches) is applied to a group, a capturing group may not have a match in the input string. When there is no captured text, the properties of the Group object are set as shown in the following table.

    | Group property | Value |
    | --- | --- |
    | `Success` | `false` |
    | `Value` | `String.Empty` |
    | `Length` | `0` |

    The following example provides an illustration. In the regular expression pattern aaa(bbb)*ccc, the first capturing group (the substring "bbb") can be matched zero or more times. Because the input string "aaaccc" matches the pattern without containing "bbb", the capturing group does not have a match.

    ```vb
    Imports System.Text.RegularExpressions

    Module Example
       Public Sub Main()
          Dim pattern As String = "aaa(bbb)*ccc"
          Dim input As String = "aaaccc"
          Dim match As Match = Regex.Match(input, pattern)
          Console.WriteLine("Match value: {0}", match.Value)
          If match.Groups(1).Success Then
             Console.WriteLine("Group 1 value: {0}", match.Groups(1).Value)
          Else
             Console.WriteLine("The first capturing group has no match.")
          End If
       End Sub
    End Module
    ' The example displays the following output:
    '    Match value: aaaccc
    '    The first capturing group has no match.
    ```

  • Quantifiers can match multiple occurrences of a pattern that is defined by a capturing group. In this case, the Value and Length properties of a Group object contain information only about the last captured substring. For example, the following regular expression matches a single sentence that ends in a period. It uses two grouping constructs: the first captures individual words along with a white-space character; the second captures individual words. As the output from the example shows, although the regular expression succeeds in capturing an entire sentence, the second capturing group captures only the last word.

    ```vb
    Imports System.Text.RegularExpressions

    Module Example
       Public Sub Main()
          Dim pattern As String = "\b((\w+)\s?)+\."
          Dim input As String = "This is a sentence. This is another sentence."
          Dim match As Match = Regex.Match(input, pattern)
          If match.Success Then
             Console.WriteLine("Match: " + match.Value)
             Console.WriteLine("Group 2: " + match.Groups(2).Value)
          End If
       End Sub
    End Module
    ' The example displays the following output:
    '    Match: This is a sentence.
    '    Group 2: sentence
    ```

The Capture Collection

The Group object contains information only about the last capture. However, the entire set of captures made by a capturing group is still available from the CaptureCollection object that is returned by the Group.Captures property. Each member of the collection is a Capture object that represents a capture made by that capturing group, in the order in which they were captured (and, therefore, in the order in which the captured strings were matched from left to right in the input string). You can retrieve individual Capture objects from the collection in either of two ways:

  • By iterating through the collection using a construct such as foreach (in C#) or For Each (in Visual Basic).

  • By using the CaptureCollection.Item property to retrieve a specific object by index. The Item property is the CaptureCollection object's default property (in Visual Basic) or indexer (in C#).

If a quantifier is not applied to a capturing group, the CaptureCollection object contains a single Capture object that is of little interest, because it provides information about the same match as its Group object. If a quantifier is applied to a capturing group, the CaptureCollection object contains all captures made by the capturing group, and the last member of the collection represents the same capture as the Group object.

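For example, if you use the regular expression pattern ((a(b))c)+ (where the + quantifier specifies one or more matches) to capture matches from the string "abcabcabc", the CaptureCollection object of each of the three capturing groups contains three members, while group 0, which represents the entire match, contains a single capture.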

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "((a(b))c)+"
      Dim input As String = "abcabcabc"

      Dim match As Match = Regex.Match(input, pattern)
      If match.Success Then
         Console.WriteLine("Match: '{0}' at position {1}", _
                           match.Value, match.Index)
         Dim groups As GroupCollection = match.Groups
         For ctr As Integer = 0 To groups.Count - 1
            Console.WriteLine("   Group {0}: '{1}' at position {2}", _
                              ctr, groups(ctr).Value, groups(ctr).Index)
            Dim captures As CaptureCollection = groups(ctr).Captures
            For ctr2 As Integer = 0 To captures.Count - 1
               Console.WriteLine("      Capture {0}: '{1}' at position {2}", _
                                 ctr2, captures(ctr2).Value, captures(ctr2).Index)
            Next
         Next
      End If
   End Sub
End Module
' The example displays the following output:
'    Match: 'abcabcabc' at position 0
'       Group 0: 'abcabcabc' at position 0
'          Capture 0: 'abcabcabc' at position 0
'       Group 1: 'abc' at position 6
'          Capture 0: 'abc' at position 0
'          Capture 1: 'abc' at position 3
'          Capture 2: 'abc' at position 6
'       Group 2: 'ab' at position 6
'          Capture 0: 'ab' at position 0
'          Capture 1: 'ab' at position 3
'          Capture 2: 'ab' at position 6
'       Group 3: 'b' at position 7
'          Capture 0: 'b' at position 1
'          Capture 1: 'b' at position 4
'          Capture 2: 'b' at position 7
```
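The last-capture behavior shown earlier for the sentence pattern \b((\w+)\s?)+\. can also be inspected through the CaptureCollection. The following sketch reuses that pattern and input; the helper name WordCaptures is hypothetical. It iterates the captures of group 2 with For Each, copies them by index through the default Item property, and checks that the last capture describes the same substring as the Group.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Example
   Private ReadOnly pattern As String = "\b((\w+)\s?)+\."

   ' Hypothetical helper: copies every capture of group 2 into an array
   ' by index, using the CaptureCollection's default Item property.
   Public Function WordCaptures(input As String) As String()
      Dim captures As CaptureCollection = Regex.Match(input, pattern).Groups(2).Captures
      Dim words(captures.Count - 1) As String
      For ctr As Integer = 0 To captures.Count - 1
         words(ctr) = captures(ctr).Value
      Next
      Return words
   End Function

   Public Sub Main()
      Dim input As String = "This is a sentence. This is another sentence."
      Dim group2 As Group = Regex.Match(input, pattern).Groups(2)

      ' For Each visits the captures in the order they were made (left to right).
      For Each cap As Capture In group2.Captures
         Console.WriteLine("'{0}' at position {1}", cap.Value, cap.Index)
      Next

      ' The last capture describes the same substring as the Group object.
      Dim last As Capture = group2.Captures(group2.Captures.Count - 1)
      Console.WriteLine("Last capture matches Group.Value: {0}", last.Value = group2.Value)
   End Sub
End Module
' The example displays the following output:
'    'This' at position 0
'    'is' at position 5
'    'a' at position 8
'    'sentence' at position 10
'    Last capture matches Group.Value: True
```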

The following example uses the regular expression (Abc)+ to find the first run of one or more consecutive occurrences of the string "Abc" in the string "XYZAbcAbcAbcXYZAbcAb". The example illustrates the use of the Match.Groups and Group.Captures properties to return multiple groups of captured substrings.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim counter As Integer
      Dim m As Match
      Dim cc As CaptureCollection
      Dim gc As GroupCollection

      ' Look for groupings of "Abc".
      Dim r As New Regex("(Abc)+")
      ' Define the string to search.
      m = r.Match("XYZAbcAbcAbcXYZAbcAb")
      gc = m.Groups

      ' Display the number of groups.
      Console.WriteLine("Captured groups = " & gc.Count.ToString())

      ' Loop through each group.
      Dim i, ii As Integer
      For i = 0 To gc.Count - 1
         cc = gc(i).Captures
         counter = cc.Count

         ' Display the number of captures in this group.
         Console.WriteLine("Captures count = " & counter.ToString())

         ' Loop through each capture in the group.
         For ii = 0 To counter - 1
            ' Display the capture and its position.
            Console.WriteLine(cc(ii).ToString() _
                & " Starts at character " & cc(ii).Index.ToString())
         Next ii
      Next i
   End Sub
End Module
' The example displays the following output:
'    Captured groups = 2
'    Captures count = 1
'    AbcAbcAbc Starts at character 3
'    Captures count = 3
'    Abc Starts at character 3
'    Abc Starts at character 6
'    Abc Starts at character 9
```

The Individual Capture

The Capture class contains the results from a single subexpression capture. The Capture.Value property contains the matched text, and the Capture.Index property indicates the zero-based position in the input string at which the matched substring begins.

The following example parses an input string for the temperature of selected cities. A comma (",") is used to separate a city and its temperature, and a semicolon (";") is used to separate each city's data. The entire input string represents a single match. In the regular expression pattern ((\w+(\s\w+)*),(\d+);)+, which is used to parse the string, the city name is assigned to the second capturing group, and the temperature is assigned to the fourth capturing group.

```vb
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "Miami,78;Chicago,62;New York,67;San Francisco,59;Seattle,58;"
      Dim pattern As String = "((\w+(\s\w+)*),(\d+);)+"
      Dim match As Match = Regex.Match(input, pattern)
      If match.Success Then
         Console.WriteLine("Current temperatures:")
         For ctr As Integer = 0 To match.Groups(2).Captures.Count - 1
            Console.WriteLine("{0,-20} {1,3}", match.Groups(2).Captures(ctr).Value, _
                              match.Groups(4).Captures(ctr).Value)
         Next
      End If
   End Sub
End Module
' The example displays the following output:
'    Current temperatures:
'    Miami                 78
'    Chicago               62
'    New York              67
'    San Francisco         59
'    Seattle               58
```

The regular expression is defined as shown in the following table.

| Pattern | Description |
| --- | --- |
| `\w+` | Match one or more word characters. |
| `(\s\w+)*` | Match zero or more occurrences of a white-space character followed by one or more word characters. This pattern matches multi-word city names. This is the third capturing group. |
| `(\w+(\s\w+)*)` | Match one or more word characters followed by zero or more occurrences of a white-space character and one or more word characters. This is the second capturing group. |
| `,` | Match a comma. |
| `(\d+)` | Match one or more digits. This is the fourth capturing group. |
| `;` | Match a semicolon. |
| `((\w+(\s\w+)*),(\d+);)+` | Match the pattern of a word followed by any additional words followed by a comma, one or more digits, and a semicolon, one or more times. This is the first capturing group. |


Reposted from: https://www.cnblogs.com/freeliver54/p/10800357.html
