String Patterns: Difference between revisions

From Legacy Roblox Wiki
>Crazypotato4
I changed [^%s]+ to %S+, because to my knowledge they're the same exact thing.
Adding categories
 
(44 intermediate revisions by 7 users not shown)
Line 1: Line 1:
{{ScriptTutorial|intermediate|scripting}}
==What are String Patterns?==
==What are String Patterns?==
String patterns are, in essence, just [[String|strings]]. What makes them different from ordinary strings then, you ask? String patterns are strings that use a special combination of characters. These characters combinations are generally used with functions in the string library such as 'string.match' and 'string.gsub' to do interesting things with strings. For instance, with string patterns you can do something like this:
String patterns are, in essence, just [[String|strings]]. What makes them different from ordinary strings then, you ask? String patterns are strings that use a special combination of characters. These characters combinations are generally used with functions in the string library such as 'string.match' and 'string.gsub' to do interesting things with strings. For instance, with string patterns you can do something like this:


<pre>
{{code and output|code=
local s = "I am a string!"
local s = "I am a string!"
for i in string.gmatch(s, "%S+") do --Where "%S+" is the string pattern.
for i in string.gmatch(s, "%S+") do --Where "%S+" is the string pattern.
Line 8: Line 9:
end
end


Output:
|output=
I
I
am
am
a
a
string!
string!
</pre>
}}


But what makes the code above so cool? Perhaps you've wanted to make a list of people without using a [[Tables|table]], or maybe you need to [[Text_Parsing_Tutorial|parse]] a string. String patterns can help do this!
But what makes the code above so cool? Perhaps you've wanted to make a list of people without using a [[Tables|table]], or maybe you need to [[Text_Parsing_Tutorial|parse]] a string. String patterns can help do this!




==The Basics of String Patterns==
As said before, string patterns are strings that look a little different and are used for a different purpose than what strings are usually used for. Here we will look at the basics of just what make a string pattern up. Here we will look at just what the different parts of a string pattern mean.
As said before, string patterns are strings that look a little different and are used for a different purpose than what strings are usually used for. Here we will look at the basics of just what make a string pattern up. Here we will look at just what the different parts of a string pattern mean.


===Character Classes===
In these examples, we will use the [[Function_Dump/String_Manipulation#string.match_.28s.2C_pattern_.5B.2C_init.5D.29|string.match]] function.
Character classes in string patterns stand for a range or set of characters. Let's look at the classes listed below.


*%a
==Simple matching==
:*This character class represents all letters no matter if they're lowercase or uppercase.
Guess what? You already know some string patterns! Any string is a pattern!
:*Some examples are: 'a', 'd', 'F', and 'G'.


*%l
{{code and output|code=
:*This character class represents all lowercase letters.
local pattern = "Roblox"
:*Some examples are: 'a', 'd', 'f', and 'g'.
print( ("Welcome to Roblox"):match(pattern) )
 
print( ("Welcome to the Wiki"):match(pattern) )
*%u
|output=
:*This character class represents all uppercase letters.
Roblox
:*Some examples are: 'A', 'B', 'D', and 'Z'.
 
*%p
:*This character class represents all punctuation characters.
:*Some examples are: ".", "?", "+", and "/".
 
*%w
:*This character class represents all alphanumeric characters.
::*This means that this class encompasses both letters and numbers.
:*Some examples are 'A',  'f', '3', and '7'.
 
*%d
:*This class represents all base 10 numbers.
:*Examples are '0', '1', '2' all the way up to '9'
 
*%s
:*This character class represents all space characters.
:*Some examples are ' ', '\n', and '\r'
 
*%c
:*This character class represents all control characters.
:*Control characters are characters with an ASCII code below 32 and also ASCII code 127
:*Control characters are all non-printing meaning that they don't represent a symbol representation.
 
*%x
:*This character class represents all hexadecimal (Base 16) characters.
:*Some examples are '21' which represents '!' and '5A' which represents 'Z'
 
*%z
:*This character class represents the character '\0'.
:*This character is commonly referred to as NUL.
 
*The dot character class
:*This class is represented by a single dot '.'
:*This class represents all characters, every single one.
:*Unlike the others, it is not preceded by a '%' sign.
 
 
As you can see, each of the character classes are used to represent a set of characters. Now let's look at some of the many things we can do with just these character classes.
 
 
Classes can also be used to represent a sequence of a type of characters. For instance, %d%l would match a number that is followed by a lowercase letter. Look at the following example:
<pre>
local s = "abc123"
local Pattern = "%a%a%a%d" --Matches three letters and a digit
print( string.match( s, Pattern ) )
 
Output:
abc1
</pre>
 
 
One of the things you might notice about the character classes I mentioned above, is that they are all lowercase. Making them capitals reverses their effect. For instance, %s represents spaces, but %S represents everything except space characters. %l represents lowercase letters which %L represents its compliment, all characters except those that are lowercase letters. Let's look at this example:
<pre>
local s1 = "a4-2" --Letter, not a letter, punctuation, not a letter
local s2 = "aA-2" --Letter, letter, punctuation, not a letter
local Pattern = "%a%A%p%A" --Matches a letter, not a letter, punctuation, and not a letter
print( string.match( s1, Pattern ) )
print( string.match( s2, Pattern ) )
 
Output:
a4-2
nil
nil
</pre>
}}
Why did it print out a4-2? It's because that s1 matched the pattern while s2 did not match the pattern.
==Character Classes==
There's only so far we can go by using this kind of pattern matching. Sometimes, we want to match any of a set of characters. Here's an example:


===Pattern Items===
{{code and output|code=
Pattern items can be used to make your code simpler. Here are the pattern items and their definitions, we will explain them below.
local pattern = "%d words"
print( ("This sentence has 5 words"):match(pattern) )
print( ("This one has more than 2 words"):match(pattern) )
|output=
5 words
2 words
}}


:*a single character class, which matches a single character in the string
The following table shows the meaning of each character class:


:*a single character class followed by a '+', which matches 1 or more repetitions in the string. These repetition items will always match the longest possible sequence.
{| class="wikitable"
|-
! Pattern  !! Represents                    !! Example matches
|-
| .  || Any character                || #32kas321fslk#?@34
|-
| %a || An uppercase or lowercase letter || aBcDeFgHiJkLmNoPqRsTuVwXyZ
|-
| %l || A lowercase letter              || abcdefghijklmnopqrstuvwxyz
|-
| %u || An uppercase letter              || ABCDEFGHIJKLMNOPQRSTUVWXYZ
|-
| %p || Any punctuation character        || #^;,.
|-
| %w || An alphanumeric character - either a letter or a digit || aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789
|-
| %d || Any Digit                        || 0123456789
|-
| %s || A whitespace character          || &nbsp;, \n, and \r
|-
| %c || A [http://en.wikipedia.org/wiki/Control_character control character] ||
|-
| %x || A hexadecimal (Base 16) digit  || 0123456789ABCDEF
|-
| %z || The NULL character, '\0'       ||
|-
| %f || The [http://lua-users.org/wiki/FrontierPattern frontier pattern] (not officially documented)      ||
|-
| %bxy || The balanced capture. It matches x, y, and everything in between. It allows for the nesting of balanced captures as well. (Note: x and y must be different) || %b() captures everything between parentheses (included them).
|}


:*a single character class followed by a '*' (asterisk), which matches 0 or more repetitions in the string. These repetition items will always match the longest possible sequence.
Any non-magic character (not one of {{`|^$()%.[]*+-?}}) represents itself in a pattern. To search for a literal magic character, precede it with a percent sign. For example, to look for a percent symbol, use {{`|%%}}.


:*a single character class followed by a '-', which matches 0 or more repetitions in the string. These repetitions will always match the shortest possible sequence.
One of the things you might notice about the character classes above is that they are all lowercase. Making them uppercase reverses their effect. For instance, {{`|%s}} represents whitespace, but {{`|%S}} represents any non-whitespace character. {{`|%l}} represents a lowercase letter while {{`|%L}} represents its complement: any character except a lowercase letter. Let's look at this example, which matches a digit followed by five non-digits:


:*a single character class followed by a '?', which matches 0 or 1 occurrence of the string.
{{code and output|code=
local pattern = "%d%D%D%D%D%D"
print( ("This sentence has 5 words"):match(pattern) )
print( ("21 times 3 equals 63"):match(pattern) )
|output=
5 word
1 time
}}


==Quantifiers==
Character classes allow you to match any character. Quantifiers allow you to match any number of characters


Now let's look at how to use them. In these examples, we will use the [[Function_Dump/String_Manipulation#string.match_.28s.2C_pattern_.5B.2C_init.5D.29|string.match]] function. Lets say you have a string like this:
{| class="wikitable"
<pre>
|-
local s = "abc1234567efg"
! Pattern !! Meaning
</pre>
|-
and you want to retrieve the numbers from that string with patterns. One way you could do that is by using the pattern "%d%d%d%d%d%d%d" which would match seven digits in a row. But what happens if you don't know how many digits there are? For this, you can use pattern items, specifically the '+' pattern item for this example.
| {{`|?}} || Match 0 or 1 of the preceding character specifier
|-
| {{`|*}} || Match 0 or more of the preceding character specifier
|-
| {{`|+}} || Match 1 or more of the preceding character specifier
|-
| {{`|-}} || Match as few of the preceding character specifier as possible
|}


<pre>
=== The {{`|+}} quantifier ===
local s = "abc1234567efg"
Let's say you have a string that contains a number, such as {{`|"It costs 100 tix"}},
local Pattern = "%d+" --See how I used the '+' pattern item to make it shorter?
and you want to extract the number. If you know how many digits the number has, you could use the pattern {{`|%d%d%d}} which would match three digits in a row. But what happens if you don't know how many digits there are? For this, you can use quantifiers. In this example, the {{`|+}} quantifier is suitable.
print( string.match( s, Pattern ) )


Output:
{{code and output|code=
1234567
local pattern = "%d+"
</pre>
print( ("It costs 100 tix"):match(pattern) )
print( ("It costs OVAR 9000 tix"):match(pattern) )
|output=
100
9000
}}
Now how does this work exactly? As we know, a character class followed by a '+' matches one or more repetitions. For this example, it means that it would match the first digits it finds until it reaches the end of the string or a non-digit.
Now how does this work exactly? As we know, a character class followed by a '+' matches one or more repetitions. For this example, it means that it would match the first digits it finds until it reaches the end of the string or a non-digit.


=== The {{`|*}} quantifier ===
The difference between {{`|+}} and {{`|*}} is that {{`|+}} matches 1 or more characters, while {{`|*}} matches 0 or more. This means that if the character class that is followed by this quantifier isn't represented in the string, it doesn't matter, because no matches are required.
{{code and output|code=
local pattern = "%d%p*%d" --Matches a digit followed by 0 or more punctuation character followed by another digit.
print( ("1,!643"):match(pattern) )
print( ("12349"):match(pattern) )


Now let's take a look at the next pattern item '*'. The difference between '+' and the '*' items is that the '+' item matches 1 or more while the '*' item matches 0 or more. This means that if the character class that is followed by this pattern item isn't represented in the string, it doesn't matter because no matches are required.
|output=
<pre>
local s1 = "1,!643"
local s2 = "12349"
local Pattern = "%d%p*%d" --Matches a digit followed by 0 or more punctuation character followed by another digit.
print( string.match( s1, Pattern ) )
print( string.match( s2, Pattern ) )
 
Output:
1,!6
1,!6
12
12
</pre>
}}
As you can see, it matches a digit, punctuation characters (if there is one), and then another digit. If you had used the '+' item, the second example would have returned nil because that pattern item requires at least one match. The '*' pattern is very useful when you have something in the string that is optional.
As you can see, it matches a digit, punctuation characters (if there are any), and then another digit. If you had used {{`|+}}, the second example would have returned nil, because {{`|+}} requires at least one match. The {{`|*}} pattern is very useful when you have something in the string that is optional.


=== The {{`|-}} quantifier ===
Unlike {{`|*}} and {{`|+}}, {{`|-}} matches the shortest possible sequence. For example, if you have a path name, and you want to retrieve a part of the string between {{`|/}}s, then you can use the {{`|-}} item. This example shows you the difference you'd get if you used {{`|-}} compared to the {{`|+}} item.
{{code and output|code=
local s = "C:/Users/Telamon/Documents"
print( s:match("/.-/") )
print( s:match("/.+/") )
|output=
/Users/
/Users/Telamon/
}}
From the example, you see that the {{`|-}} found the shortest possible sequence and stopped at the second {{`|/}}, while the {{`|+}} matched the longest sequence and stopped only at the last {{`|/}} in the string.


Unlike the '*' and '+' pattern items, the '-' item matches the shortest possible sequence. For example, if you have a string that starts and ends with a digit and you want to retrieve a part of the string only up to the second digit in the string, then you can use the '-' item. This example shows you the difference you'd get if you used '-' compared to the '*' item.
=== The {{`|?}} quantifier ===
<pre>
The {{`|?}} quantifier is used to make certain characters in the string optional.
local s = "5ab2__0"
{{code and output|code=
local Pattern1 = "%d.-%d" --Matches a digit followed by any character using the shortest possible sequence followed by another digit.
local pattern = "wik?is?"
local Pattern2 = "%d.*%d" --The same as the above except it matches the longest possible sequence.
print( ("This is the wiki"):match(pattern) )
print( string.match( s, Pattern1 ) )
print( ("There are multiple wikis"):match(pattern) )
print( string.match( s, Pattern2 ) )
print( ("You do not spell it wikki"):match(pattern) )
print( ("This is not a wii"):match(pattern) )


Output:
|output=
5ab2
wiki
5ab2__0
wikis
</pre>
nil
From the example, you see that using the '-' item found the shortest possible sequence and stopped at the second digit while using the '*' item matched the longest sequence and stopped only at the last digit in the string.
wii
 
}}
 
From the example you can see that the {{`|?}} made the s and k optional, allowing the pattern to match "wii" and "wikis". However, only one k was allowed, so wikki was not matched
The '?' pattern item is much different than the others because it matches only 0 or 1 occurrence of the string. This is used to make certain characters in the string optional. This makes it a bit like the '*' item except that instead of matching 0 or more occurrences, it only matches 0 or 1.
<pre>
local s1 = "1.56"
local s2 = "7890"
local s3 = "7..890"
local Pattern = "%d%p?%d"
print( string.match( s1, Pattern ) )
print( string.match( s2, Pattern ) )
print( string.match( s3, Pattern ) )
 
Output:
1.5
78
89
</pre>
From the example, you can see, the '?' item matches either 0 or 1. In the first string, there is a single dot between the numbers which this pattern item matches. In the second string, there is no punctuation at all so the punctuation is skipped over. Finally, in the third example, the '?' item matches the first dot, but not the second. In this case, it's skipped over and a match is found immediately after the two dots.


===Sets===
==Sets==
Sets are used when a single character class cannot do the whole job. For instance, you might want to match '''both''' lowercase letters (%l) as well as punctuation characters (%p) using a single class. So how would we do this? Let's take a look at this example:
Sets are used when a single character class cannot do the whole job. For instance, you might want to match '''both''' lowercase letters (%l) as well as punctuation characters (%p) using a single class. So how would we do this? Let's take a look at this example:


<pre>
{{code and output|code=
local s = "123 Hello! I am another string."
local s = "123 Hello! I am another string."
local Pattern = "[%l%p]+"
local pattern = "[%l%p]+"
print(string.match(s, Pattern))
print( s:match(pattern) )
 
Output:
>ello!
</pre>


As you can see from the example, sets are defined by the '[' and ']' around them. You also see that the classes for lowercase letters and punctuation is contained within. This means that the set will act as a class that represents both lowercase and punctuation, unlike if you used %l%p which would match the sequence of a punctuation character following a lowercase letter.
|output=
ello!
}}


As you can see from the example, sets are defined by the '[' and ']' around them. You also see that the classes for lowercase letters and punctuation are contained within. This means that the set will act as a class that represents both lowercase and punctuation, unlike if you used {{`|%l%p}}, which would match a lowercase letter and a punctuation character following it.


You aren't restricted to using only character classes, though! You can also use normal characters to add to the set. Also, you can specify a '''range''' of characters with the '-' symbol. Let's see how this works in the following example:
You aren't restricted to using only character classes, though! You can also use normal characters to add to the set. Also, you can specify a '''range''' of characters with the '-' symbol. Let's see how this works in the following example:


<pre>
{{code and output|code=
local NormCharP = "[3_%l]+" --A set representing the number three, an underscore, and lowercase letters that matches 1 or more repetitions.
--A sequence of threes, underscores, and lowercase letters
local RangeP = "[1-4%u]+" --A set representing the range of numbers 1 to 4 as well as uppercase letters that matches 1 or more repetitions.
local pattern = "[3_%l]+"
local s1 = "Random_123"
local s2 = "37913 Sandwiches!"


for i in string.gmatch(s1, NormCharP) do
for match in ("Random_123"):gmatch(pattern) do
     print(i)
     print(match)
end
end
print("--Next--")
|output=
for i in string.gmatch(s2, RangeP) do
    print(i)
end
 
Output:
andom_
andom_
3
3
--Next--
}}
{{code and output|code=
--A sequence of the numbers 1 to 4 and uppercase letters
local pattern = "[1-4%u]+"
 
for match in ("37913 Sandwiches!"):gmatch(pattern) do
    print(match)
end
|output=
3
3
13
13
S
S
</pre>
}}


From the example, you can see how [[Function_Dump/String_Manipulation#string.gmatch_.28s.2C_pattern.29|string.gmatch]] manipulated strings s1 and s2 using the string patterns. And yet, there's still one last thing you can do. Like with character classes, sets have compliments of themselves.
{{code and output|fit=code|code=
--A sequence of characters which are neither spaces nor one of the numbers 1 to 9
local pattern = "[^%s1-9]+"


From the example, you can see how [[Function_Dump/String_Manipulation#string.gmatch_.28s.2C_pattern.29|string.gmatch]] manipulated strings s1 and s2 using the string patterns. And yet, there's still one last thing you can do. Like with character classes, sets have compliments of themselves.
local result = ""
<pre>
for match in ("He29ll0, I like strings1"):gmatch(pattern) do
local Pattern = "[^%s1-9]+" --Represents all numbers that are not spaces and are not one of the numbers 1 to 9.
     result = result .. match
local s = "He29ll0, I like strings1"
local temp = "
for i in string.gmatch(s, Pattern) do
     temp = temp .. i
end
end
print(temp)
print(result)


Output:
|output=
Hell0,Ilikestrings
Hell0,Ilikestrings
</pre>
}}
This pattern is the compliment of [%s1-9]. As seen from the example, the compliment of a set is defined by using the '^' character at the beginning of the set. All this does is reverse the meaning of the set. As you can easily see from this example, the spaces, the number 29 in the middle of 'Hello', and the 1 at the end were removed.
This pattern is the compliment of {{`|[%s1-9]}}. As seen from the example, the compliment of a set is defined by using the {{`|^}} character at the beginning of the set. All this does is reverse the meaning of the set. As you can easily see from this example, the spaces, the number 29 in the middle of 'Hello', and the 1 at the end were removed.


===Captures===
==Captures==
Captures are used to get pieces of a string that match a capture. Captures are defined by parentheses around them. For instance, (%a%s) is a capture for a letter and a space character. When a capture is matched, it is then stored for future use. Let's look at this example:
Captures are used to get pieces of a string that match a capture. Captures are defined by parentheses around them. For instance, (%a%s) is a capture for a letter and a space character. When a capture is matched, it is then stored for future use. Let's look at this example:
<pre>
{{code and output|code=
local s = "TwentyOne = 21"
local pattern = "(%a+)%s=%s(%d+)"
local Pattern = "(%a+)%s=%s(%d+)"
Start, End, key, val = string.find( s, Pattern ) --see how I used parenthesis to designate my captures? "key" is the first capture, and "val" is the second capture.


print( key, val )
key, val = ("TwentyOne = 21"):match(Pattern)
 
print( key )
Output:
print( val )
>"TwentyOne 21" --See how it only printed the captures designated by the parenthesis?
|output=
</pre>
TwentyOne
21
}}




Now what happens if you want to get a list by using captures? You can use string.gmatch to do this.
Now what happens if you want to get a list by using captures? You can use string.gmatch to do this.
<pre>
{{code and output|code=
local pattern = "(%a+)%s?=%s?(%d+)" --Captures a string of letters seperated by an optional space, an equal, and an optional space and then captures a string of numbers
local s = "TwentyOne = 21 Two=2 One =7 Four= 4"
local s = "TwentyOne = 21 Two=2 One =7 Four= 4"
local Pattern = "(%a+)%s?=%s?(%d+)" --Captures a string of letters seperated by an optional space, an equal, and an optional space and then captures a string of numbers
for key, val in s:gmatch(pattern) do --You see how gmatch returns the captures instead of the matches to the pattern here.
for key, val in string.gmatch(s, Pattern) do --You see how gmatch returns the captures instead of the matches to the pattern here.
     print( key, val )
     print( key, val )
end
end


Output:
|output=
TwentyOne 21
TwentyOne 21
Two 2
Two 2
One 7
One 7
Four 4
Four 4
</pre>
}}
Note that 'key' and 'val' are actually referring to capture 1 and capture 2. The name does not matter, but it is still a good practice to choose a relevant name.
Note that 'key' and 'val' are actually referring to capture 1 and capture 2. The name does not matter, but it is still a good practice to choose a relevant name.
As you can see, string.gmatch iterated through all the matches in the string and returned only the captures which is basically what captures are for, to capture a certain part of the string to use.
As you can see, string.gmatch iterated through all the matches in the string and returned only the captures which is basically what captures are for, to capture a certain part of the string to use.
Line 267: Line 253:


A final thing you can do with captures is that you can leave the captures empty. In these cases they will capture the current position on the string. This means that unlike the other, non-empty captures, a number is returned instead of a string. Look at this example:
A final thing you can do with captures is that you can leave the captures empty. In these cases they will capture the current position on the string. This means that unlike the other, non-empty captures, a number is returned instead of a string. Look at this example:
<pre>
{{code and output|code=
local s = "Hello!"
local pattern = "()%a+()" --Captures the location of the first character, skips over a string of letters, and then captures the next character's position.
local Pattern = "()%a+()" --Captures the location of the first character, skips over a string of letters, and then captures the next character's position.
local cap1, cap2 = ("Hello!"):match(pattern)
local Start, End, cap1, cap2 = string.find( s, Pattern ) )
print( cap1, cap2 )
print( cap1, cap2 )
 
|output=
Output:
1 6
1 6
</pre>
}}
From the example, once a match was found, string.find returned the first and second captures' positions in the string instead of returning the characters 'H' and '!'.
From the example, once a match was found, string.find returned the first and second captures' positions in the string instead of returning the characters 'H' and '!'.


==See also==
==See also==
*[[Function_Dump/String_Manipulation|String Manipulation]]
*[[Function_Dump/String_Manipulation|String Manipulation]]
*[http://www.lua.org/manual/5.1/manual.html#5.4.1 Lua 5.1 Reference Manual: String Patterns]
*[http://www.lua.org/pil/20.2.html Programming in Lua: Patterns]
[[Category:Scripting Tutorials]]

Latest revision as of 21:28, 28 April 2023

This is an intermediate, scripting related tutorial.

What are String Patterns?

String patterns are, in essence, just strings. What makes them different from ordinary strings then, you ask? String patterns are strings that use special combinations of characters. These character combinations are generally used with functions in the string library such as 'string.match' and 'string.gsub' to do interesting things with strings. For instance, with string patterns you can do something like this:

local s = "I am a string!"
for i in string.gmatch(s, "%S+") do --Where "%S+" is the string pattern.
    print(i)
end

I
am
a
string!

But what makes the code above so cool? Perhaps you've wanted to make a list of people without using a table, or maybe you need to parse a string. String patterns can help do this!


As said before, string patterns are strings that look a little different and are used for a different purpose than ordinary strings. Here we will look at what makes up a string pattern and what its different parts mean.

In these examples, we will use the string.match function.

Simple matching

Guess what? You already know some string patterns! Any string is a pattern!

local pattern = "Roblox"
print( ("Welcome to Roblox"):match(pattern) )
print( ("Welcome to the Wiki"):match(pattern) )

Roblox

nil

Character Classes

There's only so far we can go by using this kind of pattern matching. Sometimes, we want to match any of a set of characters. Here's an example:

local pattern = "%d words"
print( ("This sentence has 5 words"):match(pattern) )
print( ("This one has more than 2 words"):match(pattern) )

5 words

2 words

The following table shows the meaning of each character class:

{| class="wikitable"
|-
! Pattern !! Represents !! Example matches
|-
| . || Any character || #32kas321fslk#?@34
|-
| %a || An uppercase or lowercase letter || aBcDeFgHiJkLmNoPqRsTuVwXyZ
|-
| %l || A lowercase letter || abcdefghijklmnopqrstuvwxyz
|-
| %u || An uppercase letter || ABCDEFGHIJKLMNOPQRSTUVWXYZ
|-
| %p || Any punctuation character || #^;,.
|-
| %w || An alphanumeric character - either a letter or a digit || aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789
|-
| %d || Any digit || 0123456789
|-
| %s || A whitespace character || a space, \n, and \r
|-
| %c || A control character ||
|-
| %x || A hexadecimal (Base 16) digit || 0123456789ABCDEFabcdef
|-
| %z || The NUL character, '\0' ||
|-
| %f || The frontier pattern (not officially documented) ||
|-
| %bxy || The balanced match. It matches x, y, and everything in between, and it handles nested pairs as well. (Note: x and y must be different) || %b() matches everything between parentheses (including them).
|}
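The last two rows of the table are easier to follow with an example. The following is a minimal sketch; the sample strings and variable names are made up for illustration, and it assumes a Lua version where the frontier pattern is available (it works in Lua 5.1 even though that manual does not document it).

```lua
-- %b() matches a balanced pair of parentheses, including nested pairs.
local call = "print(max(1, 2), 3) -- done"
local args = call:match("%b()")
print(args)      --> (max(1, 2), 3)

-- %f[set] matches the empty position where the previous character is
-- not in the set and the next character is. Here it is used to find
-- "cat" only when it stands alone as a word.
local replaced = ("the cat concatenates"):gsub("%f[%a]cat%f[%A]", "dog")
print(replaced)  --> the dog concatenates
```

Note that the "cat" inside "concatenates" is left alone, because the character before it is a letter and so the frontier does not match there.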

Any non-magic character (not one of ^$()%.[]*+-?) represents itself in a pattern. To search for a literal magic character, precede it with a percent sign. For example, to look for a percent symbol, use %%.
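As a short sketch of escaping magic characters (the sample text and variable names are illustrative only), compare an escaped percent sign and an escaped dot with an unescaped dot:

```lua
local text = "Save 50% on 3.99 items"

-- "%%" matches a literal percent sign.
local percent = text:match("%d+%%")
print(percent)  --> 50%

-- "%." matches a literal dot.
local price = text:match("%d+%.%d+")
print(price)    --> 3.99

-- An unescaped dot is the "any character" class, so it also matches "x".
local loose = ("3x99"):match("%d.%d+")
print(loose)    --> 3x99
```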

One of the things you might notice about the character classes above is that they are all lowercase. Making them uppercase reverses their effect. For instance, %s represents whitespace, but %S represents any non-whitespace character. %l represents a lowercase letter while %L represents its complement: any character except a lowercase letter. Let's look at this example, which matches a digit followed by five non-digits:

local pattern = "%d%D%D%D%D%D"
print( ("This sentence has 5 words"):match(pattern) )
print( ("21 times 3 equals 63"):match(pattern) )

5 word

1 time
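Here is another small sketch of lowercase classes next to their uppercase complements, using a made-up sample string. Each call returns the first run of characters that belongs to the class:

```lua
local s = "abc123def"

local letters = s:match("%a+")     -- first run of letters
local notLetters = s:match("%A+")  -- first run of non-letters
local notDigits = s:match("%D+")   -- first run of non-digits

print(letters)     --> abc
print(notLetters)  --> 123
print(notDigits)   --> abc
```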

Quantifiers

Character classes let you match one character out of a group of characters. Quantifiers let you control how many times in a row that character may be matched.

{| class="wikitable"
|-
! Pattern !! Meaning
|-
| ? || Match 0 or 1 of the preceding character specifier
|-
| * || Match 0 or more of the preceding character specifier, as many as possible
|-
| + || Match 1 or more of the preceding character specifier, as many as possible
|-
| - || Match 0 or more of the preceding character specifier, as few as possible
|}
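To compare the four quantifiers side by side, here is a minimal sketch that applies each one to the same short string. The variable names are only for illustration:

```lua
local s = "aaa"

local optional = s:match("a?")  -- 0 or 1
local star = s:match("a*")      -- 0 or more, longest
local plus = s:match("a+")      -- 1 or more, longest
local lazy = s:match("a-")      -- 0 or more, shortest

print(optional)  --> a
print(star)      --> aaa
print(plus)      --> aaa
print(#lazy)     --> 0

-- When there is nothing to match, + fails but * still succeeds
-- with an empty string.
local noPlus = ("bbb"):match("a+")
local noStar = ("bbb"):match("a*")
print(noPlus)    --> nil
print(#noStar)   --> 0
```

On its own, the - quantifier matches an empty string, which is why it is normally followed by something else in the pattern.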

The + quantifier

Let's say you have a string that contains a number, such as "It costs 100 tix", and you want to extract the number. If you know how many digits the number has, you could use the pattern %d%d%d which would match three digits in a row. But what happens if you don't know how many digits there are? For this, you can use quantifiers. In this example, the + quantifier is suitable.

local pattern = "%d+"
print( ("It costs 100 tix"):match(pattern) ) 
print( ("It costs OVAR 9000 tix"):match(pattern) )

100

9000

Now how does this work exactly? As we know, a character class followed by a '+' matches one or more repetitions. For this example, it means that the pattern matches the first run of digits it finds and keeps going until it reaches a non-digit or the end of the string.
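The same pattern can be combined with string.gmatch to collect every number in a string, not just the first one. This is a sketch with a made-up sentence; tonumber converts each matched run of digits into a number:

```lua
local total = 0

-- Each iteration yields the next run of one or more digits.
for digits in ("3 apples, 12 pears and 100 plums"):gmatch("%d+") do
    total = total + tonumber(digits)
end

print(total)  --> 115
```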

The * quantifier

The difference between + and * is that + matches 1 or more characters, while * matches 0 or more. This means that if the character class that is followed by this quantifier isn't represented in the string, it doesn't matter, because no matches are required.

local pattern = "%d%p*%d" --Matches a digit followed by 0 or more punctuation characters followed by another digit.
print( ("1,!643"):match(pattern) )
print( ("12349"):match(pattern) )

1,!6

12

As you can see, it matches a digit, punctuation characters (if there are any), and then another digit. If you had used +, the second example would have returned nil, because + requires at least one match. The * quantifier is very useful when you have something in the string that is optional.

The - quantifier

Unlike * and +, - matches the shortest possible sequence. For example, if you have a path name and you want to retrieve the part of the string between two /s, you can use the - quantifier. This example shows the difference you'd get if you used - compared to +.

local s = "C:/Users/Telamon/Documents"
print( s:match("/.-/") )
print( s:match("/.+/") )

/Users/

/Users/Telamon/

From the example, you see that - found the shortest possible sequence and stopped at the second /, while + matched the longest sequence and stopped only at the last / in the string.
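The same difference shows up when comparing - with *. A common case is picking out a single tag from text that contains several. The string below is a made-up example:

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- Shortest match: stops at the first ">".
local lazy = html:match("<.->")
print(lazy)    --> <b>

-- Longest match: runs on to the last ">" in the string.
local greedy = html:match("<.*>")
print(greedy)  --> <b>bold</b> and <i>italic</i>
```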

The ? quantifier

The ? quantifier makes the character or character class before it optional: it matches 0 or 1 occurrence.

local pattern = "wik?is?"
print( ("This is the wiki"):match(pattern) )
print( ("There are multiple wikis"):match(pattern) )
print( ("You do not spell it wikki"):match(pattern) )
print( ("This is not a wii"):match(pattern) )

wiki

wikis

nil

wii

From the example, you can see that the ? made the k and the s optional, allowing the pattern to match "wii" and "wikis". However, at most one k is allowed, so "wikki" was not matched.
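The ? quantifier works the same way after a character class. The following sketch, with made-up sample strings, shows it on both a plain character and a class:

```lua
-- The 'u' is optional, so both spellings match.
local spelling = "colou?r"
local american = ("color"):match(spelling)
local british = ("colour"):match(spelling)
local doubled = ("colouur"):match(spelling)
print(american)  --> color
print(british)   --> colour
print(doubled)   --> nil

-- "%d%d?" matches one digit, plus a second digit if there is one.
local oneOrTwo = "%d%d?"
local p7 = ("page 7"):match(oneOrTwo)
local p42 = ("page 42"):match(oneOrTwo)
local p123 = ("page 123"):match(oneOrTwo)
print(p7)    --> 7
print(p42)   --> 42
print(p123)  --> 12
```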

Sets

Sets are used when a single character class cannot do the whole job. For instance, you might want to match both lowercase letters (%l) as well as punctuation characters (%p) using a single class. So how would we do this? Let's take a look at this example:

local s = "123 Hello! I am another string."
local pattern = "[%l%p]+"
print( s:match(pattern) )
ello!

As you can see from the example, sets are defined by the '[' and ']' around them. You also see that the classes for lowercase letters and punctuation are contained within. This means that the set will act as a class that represents both lowercase and punctuation, unlike if you used %l%p, which would match a lowercase letter and a punctuation character following it.
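To compare the set with the plain sequence %l%p on the same string, here is a small sketch (the variable names are only for illustration):

```lua
local s = "123 Hello! I am another string."

-- A set followed by '+': a run of characters that are each lowercase OR punctuation.
local run = s:match("[%l%p]+")
print(run)     --> ello!

-- Two classes in a row: one lowercase letter, then one punctuation character.
local pair = s:match("%l%p")
print(pair)    --> o!

-- A set on its own stands for exactly one character.
local single = s:match("[%l%p]")
print(single)  --> e
```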

You aren't restricted to using only character classes, though! You can also use normal characters to add to the set. Also, you can specify a range of characters with the '-' symbol. Let's see how this works in the following example:

--A sequence of threes, underscores, and lowercase letters
local pattern = "[3_%l]+"

for match in ("Random_123"):gmatch(pattern) do
    print(match)
end

andom_

3

--A sequence of the digits 1 to 4 and uppercase letters
local pattern = "[1-4%u]+"

for match in ("37913 Sandwiches!"):gmatch(pattern) do
    print(match)
end

3

13

S

From the examples, you can see how string.gmatch used each set to pick out the matching parts of the strings. And yet, there's still one last thing you can do. Like character classes, sets have complements.

--A sequence of characters which are neither spaces nor one of the digits 1 to 9
local pattern = "[^%s1-9]+" 

local result = ""
for match in ("He29ll0, I like strings1"):gmatch(pattern) do
    result = result .. match
end
print(result)
Hell0,Ilikestrings

This pattern is the complement of [%s1-9]. As seen from the example, the complement of a set is defined by placing the ^ character at the beginning of the set, which reverses the meaning of the set. As a result, the spaces, the digits 2 and 9 in the middle of 'Hello', and the 1 at the end were skipped, while the 0 was kept because it is outside the range 1-9.
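A complemented set is a handy way to split a string on several separators at once. The sketch below uses a made-up string and also checks that [^%s] behaves like the class %S:

```lua
-- Hypothetical sample string for illustration.
local s = "name: Builderman; rank: 255"

-- Runs of characters that are not whitespace, ':' or ';'.
local pieces = {}
for piece in s:gmatch("[^%s:;]+") do
    pieces[#pieces + 1] = piece
end
print(table.concat(pieces, "|"))  --> name|Builderman|rank|255

-- "[^%s]" is the complement of "%s", which is what "%S" stands for.
local viaSet = s:match("[^%s]+")
local viaClass = s:match("%S+")
print(viaSet, viaClass)  --> name:	name:
```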

Captures

Captures are used to get the pieces of a string that match part of a pattern. Captures are defined by putting parentheses around that part of the pattern. For instance, (%a%s) is a capture for a letter followed by a space character. When a pattern with captures matches, functions such as string.match return the captured pieces instead of the whole match. Let's look at this example:

local pattern = "(%a+)%s=%s(%d+)"

local key, val = ("TwentyOne = 21"):match(pattern)
print( key )
print( val )

TwentyOne

21


Now what happens if you want to get a list by using captures? You can use string.gmatch to do this.

local pattern = "(%a+)%s?=%s?(%d+)" --Captures a run of letters, then matches an optional space, an equals sign, and another optional space, and then captures a run of digits
local s = "TwentyOne = 21 Two=2 One =7 Four= 4"
for key, val in s:gmatch(pattern) do --You see how gmatch returns the captures instead of the matches to the pattern here.
    print( key, val )
end

TwentyOne 21

Two 2

One 7

Four 4

Note that 'key' and 'val' refer to capture 1 and capture 2. The names do not matter, but it is good practice to choose relevant ones. As you can see, string.gmatch iterated through all the matches in the string and returned only the captures. That is what captures are for: picking out a certain part of the match to use.
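One common use of this is collecting the captured pairs into a table. This is only a sketch; the table name 'values' is made up for the illustration, and tonumber is used because captures are returned as strings.

```lua
local pattern = "(%a+)%s?=%s?(%d+)"
local s = "TwentyOne = 21 Two=2 One =7 Four= 4"

-- 'values' is a hypothetical name for the lookup table built from the captures.
local values = {}
for key, val in s:gmatch(pattern) do
    values[key] = tonumber(val)  -- convert the captured digits to a number
end

print(values.TwentyOne + values.Two)  --> 23
print(type(values.Four))              --> number
```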


A final thing you can do with captures is leave them empty, written as (). An empty capture records the current position in the string, so unlike a non-empty capture it returns a number instead of a string. Look at this example:

local pattern = "()%a+()" --Captures the position of the first letter, skips over a run of letters, and then captures the position just after the last letter.
local cap1, cap2 = ("Hello!"):match(pattern)
print( cap1, cap2 )
1 6

From the example, once a match was found, string.match returned the positions recorded by the first and second captures instead of the characters 'H' and '!'.
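Because the second position points just past the match, it has to be reduced by one before being used as an end index. A minimal sketch with a made-up string, using string.sub and, for comparison, string.find:

```lua
-- Hypothetical sample string for illustration.
local s = "Hello, world!"

local first, after = s:match("()%a+()")
print(first, after)  --> 1	6

-- 'after' is the position following the last letter, so subtract 1.
local word = s:sub(first, after - 1)
print(word)  --> Hello

-- string.find reports the start and end of the match directly.
local startPos, endPos = s:find("%a+")
print(startPos, endPos)  --> 1	5
```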

See also