236 CHAPTER 10 (Web site traffic) BATTERIES INCLUDED The Wildcard

The Wildcard

A regexp can match more than one string, and you create such a pattern by using some special characters. For example, the period character (dot) matches any character (except a newline), so the regular expression '.ython' would match both the string 'python' and the string 'jython'. It would also match strings such as 'qython', '+ython', or ' ython' (in which the first character is a single space), but not strings such as 'cpython' or 'ython', because the period matches exactly one character, neither two nor zero. Because it matches anything (any single character except a newline), the period is called a wildcard.

Escaping Special Characters

When you use special characters such as this, it's important to know that you may run into problems if you try to use them as normal characters. For example, imagine you want to match the string 'python.org'. Do you simply use the pattern 'python.org'? You could, but that would also match 'pythonzorg', for example, which you probably wouldn't want. (The dot matches any character except a newline, remember?) To make a special character behave like a normal one, you escape it, just as I demonstrated how to escape quotes in strings in Chapter 1. You place a backslash in front of it. Thus, in this example, you would use 'python\.org', which would match 'python.org', and nothing else.

Note: To get a single backslash, which is required here by the re module, you need to write two backslashes in the string to escape it from the interpreter. Thus you have two levels of escaping here: (1) from the interpreter, and (2) from the re module. (Actually, in some cases you can get away with using a single backslash and have the interpreter escape it for you automatically, but don't rely on it.) If you are tired of doubling up backslashes, use a raw string, such as r'python\.org'.

Character Sets

Matching any character can be useful, but sometimes you want more control. You can create a so-called character set by enclosing a substring in brackets. Such a character set will match any of the characters it contains, so '[pj]ython' would match both 'python' and 'jython', but nothing else. You can also use ranges, such as '[a-z]' to match any character from a to z (alphabetically), and you can combine such ranges by putting one after another, such as '[a-zA-Z0-9]' to match uppercase and lowercase letters and digits. (Note that the character set will match only one such character, though.) To invert the character set, put the character ^ first, as in '[^abc]' to match any character except a, b, or c.
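
The wildcard, escaping, and character sets described above can be tried out with a short sketch like the following. It assumes that "match" means the pattern must cover the whole string, so it uses re.fullmatch (re.search would also find 'python' inside 'cpython'). The variable names are illustrative only and do not come from the text.

```python
import re

candidates = ['python', 'jython', ' ython', 'cpython', 'ython']

# Wildcard: the dot stands for exactly one character (anything but a newline).
wildcard_hits = [s for s in candidates if re.fullmatch(r'.ython', s)]
print(wildcard_hits)  # ['python', 'jython', ' ython']

# Escaping: an unescaped dot is still a wildcard; an escaped dot is literal.
unescaped_matches_z = re.fullmatch(r'python.org', 'pythonzorg') is not None
escaped_matches_z = re.fullmatch(r'python\.org', 'pythonzorg') is not None
escaped_matches_dot = re.fullmatch(r'python\.org', 'python.org') is not None
print(unescaped_matches_z, escaped_matches_z, escaped_matches_dot)  # True False True

# Two levels of escaping: a doubled backslash in a normal string literal
# produces the same pattern text as a single backslash in a raw string.
same_pattern = 'python\\.org' == r'python\.org'
print(same_pattern)  # True

# Character sets: one character drawn from the listed set or range.
set_hits = [s for s in ['python', 'jython', 'cython'] if re.fullmatch(r'[pj]ython', s)]
print(set_hits)  # ['python', 'jython']

# A set matches only a single character, so a two-character string fails.
one_char = re.fullmatch(r'[a-zA-Z0-9]', 'x') is not None
two_chars = re.fullmatch(r'[a-zA-Z0-9]', 'xy') is not None
print(one_char, two_chars)  # True False

# Inverted set: any single character except a, b, or c.
inverted_hits = [c for c in 'abcd' if re.fullmatch(r'[^abc]', c)]
print(inverted_hits)  # ['d']
```

Raw strings are used for every pattern here so that the backslash reaches the re module unchanged.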