ASPN ActiveState Programmer Network
  ActiveState
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups | Web Services
SEARCH

Reference
ActiveTcl 8.5
ActiveTcl 8.5.4.0 User Guide
ActiveTcl 8.5.4.0 Documentation Index
Tutorial
Adding & Deleting members of a list
Adding new commands to Tcl - proc
Assigning values to variables
Associative Arrays
Building reusable libraries - packages and namespaces
Changing Working Directory - cd, pwd
Channel I/O: socket, fileevent, vwait
Child interpreters
Command line arguments and environment strings
Creating Commands - eval
Debugging & Errors - errorInfo errorCode catch error return
Evaluation & Substitutions 1: Grouping arguments with ""
Evaluation & Substitutions 2: Grouping arguments with {}
Evaluation & Substitutions 3: Grouping arguments with []
File Access 101
Information about Files - file, glob
Information about procs - info
Introduction
Invoking Subprocesses from Tcl - exec, open
Learning the existence of commands and variables ? - info
Leftovers - time, unset
Looping 101 - While loop
Looping 102 - For and incr
Modifying Strings - tolower, toupper, trim, format
Modularization - source
More channel I/O - fblocked & fconfigure
More command construction - format, list
More Debugging - trace
More Examples Of Regular Expressions
More list commands - lsearch, lsort, lrange
More On Arrays - Iterating and use in procedures
More Quoting Hell - Regular Expressions 102
Numeric Comparisons 101 - if
Regular Expressions 101
Results of a command - Math 101
Simple pattern matching - "globbing"
Simple Text Output
State of the interpreter - info
String comparisons - compare match first last wordend
String Subcommands - length index range
Substitution without evaluation - format, subst
Tcl Data Structures 101 - The list
Textual Comparison - switch
Time and Date - clock
Variable scope - global and upvar
Variations in proc arguments and return values

MyASPN >> Reference >> ActiveTcl 8.5 >> ActiveTcl 8.5.4.0 User Guide >> ActiveTcl 8.5.4.0 Documentation Index >> Tutorial
ActiveTcl 8.5 documentation

Regular Expressions 101

Tcl also supports string operations known as regular expressions Several commands can access these methods with a -regexp argument, see the man pages for which commands support regular expressions.

There are also two explicit commands for parsing regular expressions.

regexp ?switches? exp string ?matchVar? ?subMatch1 ... subMatchN?
Searches string for the regular expression exp. If a parameter matchVar is given, then the substring that matches the regular expression is copied to matchVar. If subMatchN variables exist, then the parenthetical parts of the matching string are copied to the subMatch variables, working from left to right.
regsub ?switches? exp string subSpec varName
Searches string for substrings that match the regular expression exp and replaces them with subSpec. The resulting string is copied into varName.

Regular expressions can be expressed in just a few rules.

^
Matches the beginning of a string
$
Matches the end of a string
.
Matches any single character
*
Matches any count (0-n) of the previous character
+
Matches any count, but at least 1 of the previous character
[...]
Matches any character of a set of characters
[^...]
Matches any character *NOT* a member of the set of characters following the ^.
(...)
Groups a set of characters into a subSpec.

Regular expressions are similar to the globbing that was discussed in lessons 16 and 18. The main difference is in the way that sets of matched characters are handled. In globbing the only way to select sets of unknown text is the * symbol. This matches to any quantity of any character.

In regular expression parsing, the * symbol matches zero or more occurrences of the character immediately proceeding the *. For example a* would match a, aaaaa, or a blank string. If the character directly before the * is a set of characters within square brackets, then the * will match any quantity of all of these characters. For example, [a-c]* would match aa, abc, aabcabc, or again, an empty string.

The + symbol behaves roughly the same as the *, except that it requires at least one character to match. For example, [a-c]+ would match a, abc, or aabcabc, but not an empty string.

Regular expression parsing is more powerful than globbing. With globbing you can use square brackets to enclose a set of characters any of which will be a match. Regular expression parsing also includes a method of selecting any character not in a set. If the first character after the [ is a caret (^), then the regular expression parser will match any character not in the set of characters between the square brackets. A caret can be included in the set of characters to match (or not) by placing it in any position other than the first.

The regexp command is similar to the string match command in that it matches an exp against a string. It is different in that it can match a portion of a string, instead of the entire string, and will place the characters matched into the matchVar variable.

If a match is found to the portion of a regular expression enclosed within parentheses, regexp will copy the subset of matching characters is to the subSpec argument. This can be used to parse simple strings.

Regsub will copy the contents of the string to a new variable, substituting the characters that match exp with the characters in subSpec. If subSpec contains a & or \0, then those characters will be replaced by the characters that matched exp. If the number following a backslash is 1-9, then that backslash sequence will be replaced by the appropriate portion of exp that is enclosed within parentheses.

Note that the exp argument to regexp or regsub is processed by the Tcl substitution pass. Therefore quite often the expression is enclosed in braces to prevent any special processing by Tcl.


Example

set sample "Where there is a will, There is a way."

#
# Match the first substring with lowercase letters only
#
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"

#
# Match the first two words, the first one allows uppercase
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2 ]
puts "Result: $result Match: $match 1: $sub1 2: $sub2"

#
# Replace a word
#
regsub "way" $sample "lawsuit" sample2
puts "New: $sample2"

#
# Use the -all option to count the number of "words"
#
puts "Number of words: [regexp -all {[^ ]} $sample]"
   

Privacy Policy | Email Opt-out | Feedback | Syndication
© ActiveState 2004 All rights reserved