User:Merlin11188/Draft: Difference between revisions

===Prerequisites===
----
{{EmphasisBox|Weak Tables require knowledge of [[Scope]] and [[Metatables]].|red|dark=yes}}
First off, you must understand what ''references'' and ''objects'' are. Tables, functions, threads, and (full) userdata values are ''objects'': variables do not actually contain these values, only ''references'' to them. Assignment, parameter passing, and function returns always manipulate references to such values; these operations do not imply any kind of copy.<br/>
===Weak Tables===
Note: objects that are referenced from the global environment are never garbage collected while those references exist, even if they are never used again.
----
Weak tables are the mechanism that you use to tell Lua that a reference should not prevent the reclamation of an object by the garbage collector. A weak reference is a reference to an object that is not considered by the garbage collector. If all references pointing to an object are weak, the object is collected and these weak references are deleted. Lua implements weak references as weak tables: a weak table is a table where all references are weak. That means that, if a reference to an object is only held inside weak tables, Lua will collect the object eventually.

Tables have keys and values, and both may contain any kind of object. Under normal circumstances, the garbage collector does not collect objects that appear as keys or as values of an accessible table. That is, both keys and values are strong references, as they prevent the reclamation of the objects to which they refer. In a weak table, keys and values may be weak. That means that there are three kinds of weak tables: tables with weak keys, tables with weak values, and fully weak tables, where both keys and values are weak. Irrespective of the table kind, when a key or a value is collected the whole entry disappears from the table.<br/>
The weakness of a table is controlled by the __mode field of its metatable. If the __mode field is a string containing the character 'k', the keys in the table are weak. If __mode contains the character 'v', the values in the table are weak.
===Examples===
----
{{Example|<pre>
a = {}
b = {}
setmetatable(a, b)
b.__mode = "k"         -- now `a' has weak keys
key = {}               -- creates first key
a[key] = 1
key = {}               -- creates second key
a[key] = 2
collectgarbage()       -- forces a garbage collection cycle
for k, v in pairs(a) do
    print(v)
end

Output:
2
</pre>}}
What happened there? Well, the first key is created, then used as an index in table a. Then the variable key is ''overwritten'' with the second key, meaning the '''only reference''' to the first key is inside table a, with a value of 1. Table a has weak keys, so when the garbage collection cycle is forced, it collects the first key because its only reference is a weak key; however, the second key is still referenced by the global variable key, which prevents the garbage collector from reclaiming it. So, since the first key was collected, its whole pair was removed from the table, leaving only the second key and its corresponding value.

Here's another example, with the value as a reference:
{{Example|<pre>
Table={}
setmetatable(Table, {__mode="v"}) -- Set the values as weak
do -- Create a new scope
local ImAPony=newproxy(true) -- Create a new object in this scope, so the object can be garbage collected.
Table[1]=ImAPony
end
collectgarbage()
print(Table[1])

Output:
nil
</pre>}}
Alright, so we create a table and give it weak values. Then we give it an object as a value (at index/key 1). Since the table's values are weak references and no other reference to the object remains after the scope ends, the garbage collector goes ahead and collects the ImAPony userdata.<br/>
Now, what would happen if the line setting Table's metatable was commented out? This would happen:
<pre>
Output:
userdata: [hexadecimal numbers]
</pre>
The value inside Table is a strong reference, so when the garbage collector made its cycle it didn't pick up the object ImAPony, leaving it at index 1.
==See Also==
http://www.lua.org/pil/17.html

Revision as of 19:33, 11 July 2011

Patterns

Patterns require some knowledge of string manipulation.


Classes

Character Class:

A character class is used to represent a set of characters. The following are character classes and their representations:

  • x — Where x is any character other than the magic characters ^$()%.[]*+-?, x represents itself
  • . — Represents any single character (for example, any of #32kas321fslk#?@34)
  • %a — Represents all letters (aBcDeFgHiJkLmNoPqRsTuVwXyZ)
  • %c — Represents all control characters (all ascii characters below 32 and ascii character 127)
  • %d — Represents all base-10 digits (0123456789)
  • %l — Represents all lower-case letters (abcdefghijklmnopqrstuvwxyz)
  • %p — Represents all punctuation characters (such as #^;,.)
  • %s — Represents all space characters
  • %u — Represents all upper-case letters (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • %w — Represents all alpha-numeric characters (aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789)
  • %x — Represents all hexadecimal digits (0123456789abcdefABCDEF)
  • %z — Represents the character with representation 0 (the null character, "\0")
  • %x — Where x is any non-alphanumeric character, represents the character x itself. This is the standard way to escape the magic characters. Any punctuation character (even a non-magic one) can be preceded by a '%' when it is meant to represent itself in a pattern. So, a literal percent sign in a pattern is written "%%"

Here's an example:

Example
String="Ha! You'll never find any of these (323414123114452) numbers inside me!"
print(string.match(String, "%d")) -- Find a digit character

Output:
3


An upper-case version of any of these classes results in the complement of that class. For instance, %A will represent all non-letter characters. Here's another example:

Example
Martian="141341432431413415072343E234141241312"
print(Martian:match("%D")) -- Find a non-digit character

Output:
E
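
The escape rule and a couple of the classes listed above can be sketched in the same way. This is only an illustration; the variable names and the sample string are made up for the example.

```lua
-- Sample text invented for this sketch.
local price = "Save 50% on 3 items (today only)."

-- "%d+" matches one or more digits, and "%%" matches a literal percent sign.
local percent = price:match("%d+%%")

-- "%(" and "%)" match literal parentheses; the inner (.-) captures what is between them.
local inParens = price:match("%((.-)%)")

-- "%u" matches the first upper-case letter.
local firstUpper = price:match("%u")

print(percent)    --> 50%
print(inParens)   --> today only
print(firstUpper) --> S
```

Without the "%" in front of them, the parentheses would start a capture instead of matching the characters "(" and ")".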

Modifiers

In Lua, there are 4 modifiers:

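  • + — 1 or more repetitions (matches the longest possible sequence)
  • * — 0 or more repetitions (matches the longest possible sequence)
  • - — also 0 or more repetitions, but matches the shortest possible sequence
  • ? — optional (0 or 1 occurrence)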
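
The difference between the four modifiers is easiest to see on one subject string. The following is a small sketch with made-up strings and variable names, not a complete reference.

```lua
-- Sample markup invented for this sketch.
local subject = "<b>bold</b> and <i>italic</i>"

-- '*' is greedy: it runs to the last '>' in the string.
local greedy = subject:match("<.*>")

-- '-' is lazy: it stops at the first '>' it can.
local lazy = subject:match("<.->")

-- '+' needs at least one digit to match.
local digits = ("room 42"):match("%d+")

-- '?' makes the preceding "u" optional, so both spellings match.
local british = ("colour"):match("colou?r")
local american = ("color"):match("colou?r")

print(greedy)   --> <b>bold</b> and <i>italic</i>
print(lazy)     --> <b>
print(digits)   --> 42
print(british)  --> colour
print(american) --> color
```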


  • [set] represents the class which is the union of all characters in set. A range of characters may be specified by separating the end characters of the range with a '-'. All classes %x described above may also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character.

The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.

  • [^set] represents the complement of set, where set is interpreted as above.

For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.
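
Sets and complements can be sketched as follows. The sample strings and variable names are assumptions chosen for illustration.

```lua
-- A letter or underscore followed by any number of alphanumerics or underscores.
local identifier = ("  player_1 = 5"):match("[%a_][%w_]*")

-- A range inside a set: one or more octal digits.
local octal = ("mode 0755;"):match("[0-7]+")

-- "%-" inside a set stands for a literal '-' character.
local dashed = ("a-b c"):match("[%a%-]+")

-- A complemented set: everything up to (but not including) the first comma.
local beforeComma = ("red,green"):match("[^,]+")

-- An upper-case class is the complement: "%S" is any non-space character.
local firstWord = ("  hello world"):match("%S+")

print(identifier)  --> player_1
print(octal)       --> 0755
print(dashed)      --> a-b
print(beforeComma) --> red
print(firstWord)   --> hello
```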

The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.

Pattern Item:

A pattern item may be

  • a single character class, which matches any single character in the class;
  • a single character class followed by '*', which matches 0 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
  • a single character class followed by '+', which matches 1 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
  • a single character class followed by '-', which also matches 0 or more repetitions of characters in the class. Unlike '*', these repetition items will always match the shortest possible sequence;
  • a single character class followed by '?', which matches 0 or 1 occurrence of a character in the class;
  • %n, for n between 1 and 9; such item matches a substring equal to the n-th captured string (see below);
  • %bxy, where x and y are two distinct characters; such item matches strings that start with x, end with y, and where the x and y are balanced. This means that, if one reads the string from left to right, counting +1 for an x and -1 for a y, the ending y is the first y where the count reaches 0. For instance, the item %b() matches expressions with balanced parentheses.
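
The last two items, %n and %bxy, are the least obvious, so here is a short sketch of each. The strings and variable names are invented for the example.

```lua
-- %b() matches from the first "(" to the ")" that balances it.
local call = "print(max(1, (2 + 3)))"
local args = call:match("%b()")

-- %1 matches whatever the first capture matched, here the opening quote,
-- so the string must be closed by the same kind of quote that opened it.
local quote, text = ([[say "hi" or 'bye']]):match("([\"'])(.-)%1")

-- %1 can also be used to find a word that is immediately repeated.
local repeated = ("the the cat"):match("(%a+) %1")

print(args)     --> (max(1, (2 + 3)))
print(quote)    --> "
print(text)     --> hi
print(repeated) --> the
```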

Pattern:

A pattern is a sequence of pattern items. A '^' at the beginning of a pattern anchors the match at the beginning of the subject string. A '$' at the end of a pattern anchors the match at the end of the subject string. At other positions, '^' and '$' have no special meaning and represent themselves.

Captures:

A pattern may contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture (and therefore has number 1); the character matching "." is captured with number 2, and the part matching "%s*" has number 3.

As a special case, the empty capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
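
Anchors and captures are often used together, so the sketch below shows both. The sample strings and variable names are assumptions made for illustration.

```lua
-- '^' and '$' force the pattern to cover the whole string.
local key, value = ("health=100"):match("^(%a+)=(%d+)$")

-- The same pattern fails when extra text comes first, because of '^'.
local rejected = ("x health=100"):match("^(%a+)=(%d+)$")

-- A '$' that is not at the end of the pattern is an ordinary character.
local dollar = ("cost $5 now"):match("$%d")

-- Captures are numbered by their left parentheses: outer first, then inner.
local whole, year, month = ("2011-07-11"):match("((%d+)%-(%d+))")

-- Empty captures return positions instead of text.
local startPos, endPos = ("flaaap"):match("()aa()")

print(key, value)         --> health	100
print(rejected)           --> nil
print(dollar)             --> $5
print(whole, year, month) --> 2011-07	2011	07
print(startPos, endPos)   --> 3	5
```

Note that captured digits such as "100" are still strings; tonumber can convert them if a number is needed.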

A pattern cannot contain embedded zeros. Use %z instead.