<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE pagecontent SYSTEM "http://www.mitchenall.com/dtd/entities.dtd">
<document>
	<article>
		<overview>
			<title>Using Regular Expressions in 4D</title>
			<short_summary>A short discussion on regular expressions and their use within 4D.</short_summary>
			<head_section>
				<head>Using Regular Expressions in 4D</head>

				<para>Regular expressions are an extremely powerful tool for describing text.  They're often included as a power-user feature in text editors and word-processors, and many languages and development environments have built-in support for them.</para>
				<para>Regular expressions are a late entrant into the 4D family of products, and even now are only available through third-party plug-ins.  Hopefully, 4D will address this and build in native support, but for the moment the choices available for using regular expressions in 4D are fairly good.</para>
				<para>This article will not teach the reader how to write regular expressions, but rather give some ideas of where and how regular expressions can be used with 4D.  It will hopefully show readers new to regular expressions just how useful they can be and give an incentive to go and learn how to write them for themselves.  I am by no means an expert on regular expressions, but I'm learning more all the time and I encourage those 4D developers who currently don't know how to use them to learn about them, especially if any of your work involves text parsing of any kind.</para>
				<para>There is an excellent book on regular expressions entitled "Mastering Regular Expressions" by Jeffrey E. F. Friedl, which is an absolute must for anybody who wants to use and eventually master regular expressions.  It's the first book I've read on the subject that does a good job of explaining regular expressions to someone who has never used them before.  Again, I will not try to teach regular expressions in this article, but will instead provide links where you can find more information for yourself.</para>
				<para>Since discovering regular expressions (or regex patterns) a year or so ago, I have wondered how I managed to live so long without them.  I now use them regularly when programming in PHP or 4D, and also for editing HTML and other source code.  For example, regular expressions made creating the hyperlinked 4D source code on this website much easier than you would think.  The index of all the 4D commands was created with one simple regex applied to each page in the index of the manual.  The code itself was converted to XML with a series of simple regular expressions and some other find-and-replace operations.  The PHP XML processor then puts the two together and, along with the stylesheet, produces formatted 4D code with hyperlinks on the commands.  Using regular expressions, this took no time at all and is completely reusable.</para>
				<para><bold>Note:</bold> This article will remain a "work in progress" for a while as we find new ways to entice 4D developers into using regular expressions.  Expect fairly regular updates; if you would like to contribute some ideas, I'd love to hear from you.</para>
				<head_section>
					<head>Planete 4D</head>
					<para>The first part of a tutorial on using Regular Expressions aimed at 4D developers is available in Planete 4D Issue 7, the French journal for 4D developers.  Visit the <a href="http://www.planete4d.com/" title="Visit the Planete 4D Website" target="_blank">Planete 4D</a> website to order your copy.</para>
				</head_section>
			</head_section>
		</overview>
		
<!--
		========================
		Section 1 of the Article
		========================
-->
		
		<section>
			<title>When/How can Regular Expressions be used in 4D?</title>
			<short_summary>This section shows how and when Regular Expressions can be used within the 4D environment.</short_summary>
			<head_section>
				<head>When can Regular Expressions be used in 4D?</head>
				<para>Regular expressions can be used both within your applications, as an effective means of processing text, and in the Method Editor, for finding and replacing text in your code.</para>
				<head_section>
					<head>What tools are available?</head>
					<para>At the time of writing, there are two main options available to 4D developers wanting to make use of regular expressions.  QFree 2.0 contains four fantastic functions for using regular expressions within your applications, and QuickCodePro 4.2 (since build b5) adds regular-expression "Find and Replace" to your 4D methods.  Let's look at the two options in more detail.</para>
					<head_section>
						<head>Using Regular Expressions in your applications</head>
						<para>There are a huge number of situations where regular expressions can be used to great effect in your applications.  With 4D becoming ever more internet-centric, the ability to use regular expressions has become increasingly important.</para>
						<para>Regular expressions are extremely useful for text-processing operations in 4D: for example, validating user input, parsing and importing non-standard file formats, searching for all occurrences of hyperlinks in a web page, or extracting email addresses from documents.  All of this can be achieved without regular expressions, but making that code efficient usually means sacrificing readability, and each hand-written parser tends to be fairly inflexible.  How much this matters depends on the complexity of the task in hand, but with regular expressions, complex text matching can be carried out with a minimum of code and at very reasonable speeds, even when running interpreted.</para>
						<para>4D has no regular expression support built-in at this time, but there is a plug-in available from Escape which does the job admirably.  QFree 2.0 is a free, unsupported plug-in containing a number of functions, including Windows Taskbar functions, an RGB Picker dialog, some URL functions, some QuickTime Atom functions and, what we're most interested in here, Regex functions.  Although QFree is unsupported, I have so far had no problems with it, and Escape will not leave you completely out in the cold if you do have a problem.</para>
						<para>The four Regex functions are <plugincommand>QF_REMatch</plugincommand>, <plugincommand>QF_RESubstitute</plugincommand>, <plugincommand>QF_RESplit</plugincommand> and <plugincommand>QF_REExtract</plugincommand>.  I will cover each of these commands in turn below.  For the moment, I'll just say that these four commands add a huge amount of power to your 4D toolbox.</para>
						<para>QFree's demo database comes with a "Regex Lab", a simple dialog that lets you test your regular expressions with the various options to make sure they work before you start writing code around them.  A tab control gives one page per regex function, so you can try out each function and see the results instantly.  There are fields for your regex pattern, the target text, and the substitution text or submatch groups depending on the function you're using, plus checkboxes for each of the additional flags that can be set for the regex engine.  This simple dialog lets you get working with regular expressions in 4D quickly and examine the results.  It's also a good source of example code showing how everything is put together.</para>
						<para>For a free piece of software it's extremely nicely packaged.  The documentation gives a fairly thorough overview of the regular expression syntax which is supported, but I think the sample database is what makes it a really fantastic piece of kit.  Escape have done the 4D community a big favour putting so much effort into a piece of freeware.</para>
						<para>QFree uses Perl Compatible Regular Expressions (PCRE) with certain POSIX extensions, so finding documentation on the web is very straightforward.  The O'Reilly book "Mastering Regular Expressions" is a must for anyone wanting to really get into using regular expressions, and contains plenty of information on Perl regular expressions.</para>
						<para>QFree is cross-platform, so you can use these features on both platforms without any difficulties.</para>
					</head_section>
					<head_section>
						<head>Using Regular Expressions in the Method Editor</head>
						<para>A common feature in many text editors, especially those aimed at programmers, is the ability to use regular expressions for finding and replacing text within the current document.  I'm sure we're all familiar with the "Use Grep" or "Use Regular Expression/Regex" checkbox in editors such as BBEdit, HomeSite and other popular HTML editors.  This feature had been off limits to 4D developers until Automated Solutions Group added support for regex matching in QuickCodePro 4.2b5 (QCP).</para>
						<para>In the past, if a developer wanted to use regex substitutions on 4D code, he or she had to copy the code to an editor such as BBEdit, perform the substitution there, then copy the code back to 4D when complete.  Now this can be done directly in the Method Editor via QCP's Find and Replace dialog.  QCP currently lets you apply regex matching and substitutions to the entire method, or to all open method windows.</para>
						<para>Being able to apply regex substitutions across all open method windows makes it easy to perform tasks like replacing all occurrences of one function call with another while swapping the positions of its parameters at the same time.  It also helps with ensuring the correct capitalisation of a group of variables, or enforcing style guidelines on blank lines and comments.  Later we will cover some specific examples.</para>
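						<para>As a rough illustration of this kind of substitution, here is a minimal sketch in Python rather than in QCP, whose exact replacement syntax isn't covered here.  Old_Call and New_Call are hypothetical method names, and the pattern assumes neither parameter contains a semicolon or parentheses.  The \2;\1 replacement refers back to the two submatches; check the QCP documentation for the backreference syntax it expects.</para>
						<example_xml><![CDATA[
```python
import re

# Hypothetical example: Old_Call(a;b) becomes New_Call(b;a).  4D separates
# parameters with ";", and this simple pattern assumes neither parameter
# contains ";" or parentheses (so no nested calls).
SWAP_CALL = re.compile(r"\bOld_Call\(([^;()]*);([^;()]*)\)")

# Two or more consecutive blank (or whitespace-only) lines.
# Assumes "\n" line endings; 4D text traditionally uses "\r".
BLANK_RUNS = re.compile(r"\n(?:[ \t]*\n){2,}")


def swap_call_params(source):
    """Rename Old_Call to New_Call and swap its two parameters."""
    return SWAP_CALL.sub(r"New_Call(\2;\1)", source)


def collapse_blank_lines(source):
    """Reduce every run of blank lines to a single blank line."""
    return BLANK_RUNS.sub("\n\n", source)


method_text = '$x:=Old_Call($name;5)\n\n\n\nOld_Call("a";$b)\n'
print(collapse_blank_lines(swap_call_params(method_text)))
```
]]></example_xml>
						<para>Keeping each rule as its own pattern means a style guideline can be applied, or left out, independently of the others.</para>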
						<para>Like QFree, QCP uses PCRE so what you use in one should work in a similar fashion in the other.</para>
						<para>QCP is a commercial piece of software costing around $200 US per developer, but given the time savings you can gain by owning a copy and spending a short time every month flicking through the manual, it's well worth it.  It practically pays for itself on the first day you own it, even without the regular expression support.  With this added feature, it turns the 4D Method Editor into a far more serious piece of code-editing equipment.  The only problem with QCP is that it's currently only available for the Macintosh, but hopefully this will change with version 5.</para>
					</head_section>
				</head_section>
			</head_section>
		</section>
		
<!--
		========================
		Section 2 of the Article
		========================
-->

		<section>
			<title>Uses for Regular Expressions in your applications</title>
			<short_summary>Here we will look in more detail at using regular expressions in your applications and the types of tasks they're useful for.</short_summary>
			<head_section>
				<head>Uses for Regular Expressions in your applications</head>
				<para>A common feature in databases which store contact details is to store a person's email address and possibly website address.  Checking whether these details have been entered correctly is fairly straightforward, but still fiddly and a multi-line code affair.  Using a regular expression, it's an extremely simple operation.  It's even possible to find strings which look like email addresses or website addresses in a larger block of text using only a slightly modified version of the regex.</para>
				<head_section>
					<head>Parsing and Validating Email Addresses</head>
					<para>In the case of an email address, e.g. mark@mitchenall.com, we're looking for a piece of text with the following characteristics...</para>
					<unordered_list>
						<list_item>Starts with the username which can contain letters, numbers, full-stops, hyphens, and underscores.</list_item>
						<list_item>An '@' sign follows the mail username.</list_item>
						<list_item>The server address will normally consist of two or more groups of letters, numbers, hyphens or underscores (never fewer than two groups) separated by full-stops (periods).</list_item>
						<list_item>The address is not case sensitive.</list_item>
					</unordered_list>
					<para>This is only a very brief description and we could certainly still enter a non-conforming email address if we only checked for these characteristics.  But how much 4D code is necessary to write this?</para>
					<para>Using QFree 2.0, very little 4D code is necessary.  Here's an example:</para>
					<code>
						<codeline><comment>`Method: Validate_EmailAddress(emailAddress:text) --> Valid:boolean</comment></codeline>
						<codeline><comment>`Author: Mark Mitchenall</comment></codeline>

						<codeline><command>C_TEXT</command>($1;$emailAddress)</codeline>
						<codeline><command>C_BOOLEAN</command>($0;$valid)</codeline>
						<codeline></codeline>
						<codeline>$emailAddress:=$1</codeline>
						<codeline>$valid:=<command>False</command></codeline>
						<codeline></codeline>
						<codeline><command>C_BLOB</command>($regexTarget)</codeline>
						<codeline><command>TEXT TO BLOB</command>($emailAddress;$regexTarget;Text without length)</codeline>
						<codeline></codeline>
						<codeline><command>C_TEXT</command>($Regex)</codeline>
						<codeline>$Regex:="^([-a-zA-Z0-9_.]+)@(([-a-zA-Z0-9_]+[.])+[a-zA-Z]+)$"</codeline>
						<codeline></codeline>
						<codeline><command>ARRAY LONGINT</command>($matchOffsets;0)</codeline>
						<codeline><command>ARRAY LONGINT</command>($matchLengths;0)</codeline>
						<codeline><command>C_LONGINT</command>($targetOffset;$flags)</codeline>
						<codeline>$targetOffset:=0</codeline>
						<codeline>$flags:=0</codeline>
						<codeline></codeline>
						<codeline><command>C_LONGINT</command>($Err)</codeline>
						<codeline><command>C_TEXT</command>($mailUsername;$serverDomain)</codeline>
						<codeline>$Err:=<plugincommand>QF_REMatch</plugincommand>($Regex;$regexTarget;$targetOffset;$matchOffsets;$matchLengths;$flags)</codeline>
						<codeline><command>If</command>($Err=0)</codeline>
						<codeline>  <command>If</command>($matchLengths{0}=<command>Length</command>($emailAddress))  <comment>`We have a successful match</comment></codeline>
						<codeline>    <comment>`Match offsets start at 0 but Substring positions start at 1, hence the +1</comment></codeline>
						<codeline>    $mailUsername:=<command>Substring</command>($emailAddress;$matchOffsets{1}+1;$matchLengths{1})</codeline>
						<codeline>    $serverDomain:=<command>Substring</command>($emailAddress;$matchOffsets{2}+1;$matchLengths{2})</codeline>
						<codeline>    $valid:=<command>True</command></codeline>
						<codeline>  <command>End if</command></codeline>
						<codeline><command>End if</command></codeline>
						<codeline>$0:=$valid</codeline>

						</code>
					<para>The piece of code above checks the validity of an email address based on the rules we made up earlier.  Notice that the regex also gives us the positions of the mail username and the server domain.  In some applications, having these two separately may be useful; for example, to further check the validity of the address, you could look up the domain name to see if it exists.  By extending the pattern slightly, we could also parse more complex email addresses, such as those found in the headers of emails themselves, e.g.</para>
					<example_xml>
"Mark Mitchenall" &lt;mark@mitchenall.com&gt;
					</example_xml>
					<para>Writing a regex to parse the different parts of this address isn't really any more complicated than just testing the address's approximate validity, but writing a fully conforming regex is pretty hard work.  Towards the end of his book, Jeffrey Friedl presents a complete regex for matching an email address; with every possible option covered, it's over 6,000 bytes long.  For most purposes you can get away with something very much simpler.</para>
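					<para>The same rules can be tried out quickly outside 4D.  The following is a minimal Python sketch, not QFree code: validate_email reuses the pattern from Validate_EmailAddress, and parse_named_email uses a slightly extended pattern (an illustrative assumption, far from fully conforming) for the "Mark Mitchenall" &lt;mark@mitchenall.com&gt; form.  Python's re module behaves like PCRE for these particular patterns.</para>
					<example_xml><![CDATA[
```python
import re

# Pattern from Validate_EmailAddress, minus the ^ and $ anchors: fullmatch()
# requires the whole string to match, like the 4D length check.
EMAIL = re.compile(r"([-a-zA-Z0-9_.]+)@(([-a-zA-Z0-9_]+[.])+[a-zA-Z]+)")

# Illustrative extension for header-style addresses such as
# "Mark Mitchenall" <mark@mitchenall.com>; the display name is optional.
NAMED_EMAIL = re.compile(
    r'(?:"?([^"<]*?)"?\s*)?<([-a-zA-Z0-9_.]+@(?:[-a-zA-Z0-9_]+[.])+[a-zA-Z]+)>'
)


def validate_email(address):
    """Return (username, domain) if the address fits the rules, else None."""
    match = EMAIL.fullmatch(address)
    return (match.group(1), match.group(2)) if match else None


def parse_named_email(value):
    """Return (display name, address) for a header-style address, else None."""
    match = NAMED_EMAIL.fullmatch(value.strip())
    return (match.group(1) or "", match.group(2)) if match else None


print(validate_email("mark@mitchenall.com"))
print(parse_named_email('"Mark Mitchenall" <mark@mitchenall.com>'))
```
]]></example_xml>
					<para>The anchors are dropped because in Python a $ also matches just before a trailing newline, whereas fullmatch() accepts only an exact match of the whole string.</para>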
					
				</head_section>
				<head_section>
					<head>Parsing URLs</head>
					<para>The same applies when writing applications which need to deal with web addresses.  Users who are inexperienced with the internet regularly have problems entering website addresses correctly.  Using a regex, you can quickly decide whether what the user entered is correct with a minimum amount of code.  You can also extract further information from the URL which can be used to access or test the site using 4D Internet Commands.  For example, a simple HTTP client in 4D generally needs the server address and page URL separately.  This is fairly easy to do in plain 4D code, but a regex is a much more elegant way.</para>
					<para>Take the following URL as an example:</para>
					<para><a href="http://www.mitchenall.com/resources/library/4d/regular_expressions/index.phtml">http://www.mitchenall.com/resources/library/4d/regular_expressions/index.phtml</a></para>
					<para>If we split the URL into the different parts we could get:</para>
					<para>(scheme)://(host)/(path)</para>
					<para>We could go one further and split URLs completely, e.g.</para>
					<para>http://user:password@www.host.com:1234/sample/uri?query#fragment</para>
					<para><a href="http://www.escape.gr/" target="_blank">Escape</a> have used this as an example in the QFree demo database, so I won't repeat the code here (see screenshot below).  The code is fairly straightforward and can easily be reused in any project.  In their example they use two regular expressions to get each part of the URL separately.  Web developers using languages such as Perl and PHP have been using these techniques for years, and if more 4D developers start complaining about the lack of direct support in 4D, perhaps 4D SA will do something about it.  It would be especially good if they used QFree as an example of one way of achieving it.</para>
					<para><img src="images/parse_url.gif" alt="Screenshot of Parse URL" width="394" height="209" border="0" /></para>
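					<para>For comparison, here is a minimal Python sketch of a two-pass approach.  It is not Escape's code: the first pattern is the general URI-splitting regex published in RFC 3986 (Appendix B), and the second, which splits the authority into user, password, host and port, is an illustrative assumption that does not handle IPv6 addresses.</para>
					<example_xml><![CDATA[
```python
import re

# Regex from RFC 3986, Appendix B: scheme, authority, path, query and
# fragment end up in groups 2, 4, 5, 7 and 9.
URI = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Second pass (illustrative): user:password@host:port, all but host optional.
# IPv6 literals such as [::1] are not handled.
AUTHORITY = re.compile(r"^(?:([^:@]*)(?::([^@]*))?@)?([^:]*)(?::(\d+))?$")


def split_url(url):
    """Split a URL into its parts; missing parts come back as None."""
    uri = URI.match(url)  # always matches, since every group is optional
    parts = {
        "scheme": uri.group(2),
        "path": uri.group(5),
        "query": uri.group(7),
        "fragment": uri.group(9),
        "user": None, "password": None, "host": None, "port": None,
    }
    if uri.group(4) is not None:
        auth = AUTHORITY.match(uri.group(4))
        if auth:
            parts["user"], parts["password"], parts["host"], parts["port"] = auth.groups()
    return parts


print(split_url("http://user:password@www.host.com:1234/sample/uri?query#fragment"))
```
]]></example_xml>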
					<para>Another example of parsing URLs is when you want to find out what other HTML pages a particular page refers to.  This is useful in many web applications, including robots and spiders but also website management tools.  Using regular expressions this is now easy to achieve in 4D.  Again, <a href="http://www.escape.gr/" target="_blank">Escape</a> have provided a good example of this in the QFree Demo, so I strongly advise you to download it and start playing.  The example in the QFree sample is actually for parsing Bookmark files, but it works just the same with HTML pages as shown in the screenshot below.</para>
					<para><img src="images/parse_bookmarks.gif" alt="Screenshot of Parse Bookmarks" width="394" height="322" border="0" /></para>
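					<para>At its core, a link finder is a single pattern applied repeatedly to the page.  The following Python sketch is a simplification rather than Escape's code: it assumes href values are quoted, and ignores unquoted attributes, comments and scripts, which a real spider would have to deal with.</para>
					<example_xml><![CDATA[
```python
import re

# Matches the href attribute of each <a> tag; the quote character that opens
# the value (group 1) must also close it, and group 2 holds the URL itself.
LINK = re.compile(r"""<a\s[^>]*?href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def find_links(html):
    """Return the href value of every anchor tag, in document order."""
    return [match.group(2) for match in LINK.finditer(html)]


page = """<p>See <a href="http://www.escape.gr/" target="_blank">Escape</a>
and the <A HREF='index.phtml'>article index</A>.</p>"""
print(find_links(page))
```
]]></example_xml>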
				</head_section>
				<head_section>
					<head>Parsing 4D Digests into individual messages</head>
					<para>Recently, the 4D NUG changed its name.  This in turn broke the long-running Digest Reader 4, which could no longer read digest messages.  There's also been the addition of another English mailing list for 4D (DataBasics, see links page) whose digests I'd also like to parse.  It occurred to me that there are plenty of other mailing lists with digest options which don't get parsed by Outlook Express (and I don't like the way Outlook Express does it anyway), so I decided to replace the reasonably complex parsing code in Matthew Whyte's Digest Reader with a simplified version that uses a regex pattern.  The advantage is that only the regex needs to be modified to read different digests, so supporting different mailing lists with the same package becomes extremely simple.  Also, if the format of the list ever changes, Matthew would only have to supply a new regex for people to enter into their preferences.</para>
					<para>The <plugincommand>QF_REExtract</plugincommand> command is perfect for this task.  It allows you to extract all matches and submatches of a regex pattern in a BLOB into two 2D arrays in just 1 line of code.  Using these two arrays you can then easily extract the necessary text from the BLOB to do with as you please.  In the case of Digest Reader, I needed to extract each individual message, plus for each one, the subject, sender's name, sender's email address, date and message contents.  This was all achieved using one pattern and extracting only certain submatches.</para>
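					<para>To make the shape of those two arrays concrete, here is a hypothetical Python stand-in for <plugincommand>QF_REExtract</plugincommand>.  The name re_extract and its behaviour are assumptions modelled on this article's description, not QFree's implementation: one row per match, column 0 for the entire match, then one column per group listed in a space-separated string such as "1 2", with zero-based offsets as in a 4D BLOB.</para>
					<example_xml><![CDATA[
```python
import re


def re_extract(pattern, text, groups, flags=0):
    """Hypothetical stand-in for the shape of QF_REExtract's results.

    Returns (offsets, lengths): one row per match; column 0 is the entire
    match, followed by one column per group number listed in `groups`.
    A group that did not take part in a match gets offset -1, length 0.
    """
    columns = [0] + [int(g) for g in groups.split()]
    offsets, lengths = [], []
    for match in re.finditer(pattern, text, flags):
        offsets.append([match.start(g) for g in columns])
        lengths.append([match.end(g) - match.start(g) for g in columns])
    return offsets, lengths


text = "From: mark@mitchenall.com\rFrom: joe@bloggs.com\r"
offsets, lengths = re_extract(r"From: ([-a-z0-9_.]+)@([-a-z0-9_.]+)", text, "1 2")

# Like $matchOffsets{row}{column} in 4D, but zero-based rows in Python.
for row_offsets, row_lengths in zip(offsets, lengths):
    user = text[row_offsets[1]:row_offsets[1] + row_lengths[1]]
    domain = text[row_offsets[2]:row_offsets[2] + row_lengths[2]]
    print(user, domain)
```
]]></example_xml>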
					<para>The new code to extract the messages from a digest stored in a file looks something like the following:</para>
					<code>
						<codeline><comment>`Method: digest_Parse</comment></codeline>
						<codeline><comment>`Author: Mark Mitchenall</comment></codeline>
						<codeline><comment>`Created: 12/24/00 at 4:01 PM</comment></codeline>
						<codeline></codeline>
						<codeline><command>C_TIME</command>(<localvar>$docRef</localvar>)</codeline>
						<codeline><localvar>$docRef</localvar>:=<command>Open document</command>(&quot;&quot;;&quot;TEXT&quot;;<constant>Get Pathname</constant> )</codeline>
						<codeline></codeline>
						<codeline><command>If</command> (OK=1)</codeline>
						<codeline><comment>`Load the Digest document into a BLOB</comment></codeline>
						<codeline>  <command>C_BLOB</command>(<localvar>$digest</localvar>)</codeline>
						<codeline>  <command>DOCUMENT TO BLOB</command>(<processvar>document</processvar>;<localvar>$digest</localvar>)</codeline>
						<codeline></codeline>
						<codeline><comment>`We now know what kind of Digest it is, so we just need to use the correct regex </comment></codeline>
						<codeline><comment>`to extract the messages, plus extract the right groups.</comment></codeline>
						<codeline>  <command>C_TEXT</command>(<localvar>$regex</localvar>;<localvar>$groups</localvar>)</codeline>
						<codeline>  <command>C_LONGINT</command>(<localvar>$flags</localvar>)</codeline>
						<codeline>  <localvar>$regex</localvar>:=&quot;!!!!!!! INSERT REGEX HERE !!!!!!&quot;<comment>`Get the Message Regex from the mailing list record here.</comment></codeline>
						<codeline>  <localvar>$groups</localvar>:=&quot;1 4 5 7 8&quot;<comment>`Subject, name, address, date, body.  These should be in the record</comment></codeline>
						<codeline>  <localvar>$flags</localvar>:=(<constant>qf_RECaseless</constant>+<constant>qf_REDotAll</constant>+<constant>qf_REUngreedy</constant> )<comment>`This longint should be in the record</comment></codeline>
						<codeline></codeline>
						<codeline><comment>`Initialise the 2D arrays which will hold the results of the extraction</comment></codeline>
						<codeline>  <command>ARRAY LONGINT</command>(<localvar>$matchOffsets</localvar>;0;0)</codeline>
						<codeline>  <command>ARRAY LONGINT</command>(<localvar>$matchLengths</localvar>;0;0)</codeline>
						<codeline></codeline>
						<codeline><comment>`Do the message extraction using QFree regex routine</comment></codeline>
						<codeline>  <command>C_LONGINT</command>(<localvar>$err</localvar>)</codeline>
						<codeline>  <localvar>$err</localvar>:=<plugincommand>QF_REExtract</plugincommand>(<localvar>$regex</localvar>;<localvar>$digest</localvar>;<localvar>$groups</localvar>;<localvar>$matchOffsets</localvar>;<localvar>$matchLengths</localvar>;<localvar>$flags</localvar>)</codeline>
						<codeline></codeline>
						<codeline>  <command>C_LONGINT</command>(<localvar>$numMessages</localvar>;<localvar>$messages</localvar>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
						<codeline></codeline>
						<codeline><comment>`Now we just need to save those messages which the user hasn't asked to be </comment></codeline>
						<codeline><comment>`filtered out.</comment></codeline>
						<codeline>  <localvar>$numMessages</localvar>:=<command>Size of array</command>(<localvar>$matchOffsets</localvar>)</codeline>
						<codeline></codeline>
						<codeline>  <command>For</command> (<localvar>$messages</localvar>;1;<localvar>$numMessages</localvar>)</codeline>
						<codeline></codeline>
						<codeline>    <command>CREATE RECORD</command>(<table>[Articles]</table>)</codeline>
						<codeline></codeline>
						<codeline>    <localvar>$offset</localvar>:=<localvar>$matchOffsets</localvar>{<localvar>$messages</localvar>}{1}</codeline>
						<codeline>    <localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{<localvar>$messages</localvar>}{1}</codeline>
						<codeline>    <field>[Articles]Subject</field>:=<command>BLOB to text</command>(<localvar>$digest</localvar>;<constant>Text without length</constant> ;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
						<codeline></codeline>
						<codeline>    <localvar>$offset</localvar>:=<localvar>$matchOffsets</localvar>{<localvar>$messages</localvar>}{2}</codeline>
						<codeline>    <localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{<localvar>$messages</localvar>}{2}</codeline>
						<codeline>    <field>[Articles]FromName</field>:=<command>BLOB to text</command>(<localvar>$digest</localvar>;<constant>Text without length</constant> ;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
						<codeline></codeline>
						<codeline>    <localvar>$offset</localvar>:=<localvar>$matchOffsets</localvar>{<localvar>$messages</localvar>}{3}</codeline>
						<codeline>    <localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{<localvar>$messages</localvar>}{3}</codeline>
						<codeline>    <field>[Articles]FromMail</field>:=<command>BLOB to text</command>(<localvar>$digest</localvar>;<constant>Text without length</constant> ;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
						<codeline></codeline>
						<codeline>    <localvar>$offset</localvar>:=<localvar>$matchOffsets</localvar>{<localvar>$messages</localvar>}{4}</codeline>
						<codeline>    <localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{<localvar>$messages</localvar>}{4}</codeline>
						<codeline>    <field>[Articles]ArtDate</field>:=<command>BLOB to text</command>(<localvar>$digest</localvar>;<constant>Text without length</constant> ;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
						<codeline></codeline>
						<codeline>    <localvar>$offset</localvar>:=<localvar>$matchOffsets</localvar>{<localvar>$messages</localvar>}{5}</codeline>
						<codeline>    <localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{<localvar>$messages</localvar>}{5}</codeline>
						<codeline>    <command>COPY BLOB</command>(<localvar>$digest</localvar>;[Articles]Article;<localvar>$offset</localvar>;0;<localvar>$length</localvar>)</codeline>
						<codeline></codeline>
						<codeline>    <command>SAVE RECORD</command>(<table>[Articles]</table>)</codeline>
						<codeline></codeline>
						<codeline>  <command>End for</command> </codeline>
						<codeline></codeline>
						<codeline><command>End if</command></codeline>
						</code>
					<para>You may have noticed that I left out the regex pattern itself.  This is because I didn't do a complete implementation, but left it so that Matthew could decide for himself how he wanted to store the regexes.  The pattern in the demo was stored in a TEXT resource, as these are fairly easy to get at and modify in a hurry.</para>
					<para>The pattern used to extract the messages was as follows (Note: here it's split over more than one line, but in reality it's all one line.  It's just presented this way for readability):</para>
					<code>
<codeline>\r\rSubject: (.*)</codeline>
<codeline>\rFrom: (["]?(([-a-z0-9_%.\s]+)["]?)?[\s]?&lt;([-a-z0-9_%.]+@[-a-z0-9_]+([.][-a-z0-9_]+)+)&gt;)</codeline>
<codeline>\rDate: (.*)</codeline>
<codeline>\r\r(.*)</codeline>
<codeline>\r\r-{70}</codeline>
					</code>
					<para>From this we extract the groups 1, 4, 5, 7, and 8 to get the Subject, Sender's name (without the quotes), Sender's email address (without the &lt;&gt;s), the date and the message respectively.</para>
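					<para>The same pattern can be checked outside 4D.  This Python sketch assumes the digest uses carriage returns (\r) as line endings, and emulates qf_REUngreedy by making the .* groups lazy (.*?), since Python's re module has no ungreedy flag; the other quantifiers are left greedy, which gives the same submatches on this sample.  The sample digest is an abridged version of the 4D NUG digest shown in this article, and submatches 1, 4, 5, 7 and 8 are read exactly as described above.</para>
					<example_xml><![CDATA[
```python
import re

# The digest pattern from the article, adapted for Python's re module.
# Assumptions: lines end in "\r", and qf_REUngreedy is emulated with ".*?".
DIGEST_PATTERN = re.compile(
    r'\r\rSubject: (.*?)'
    r'\rFrom: (["]?(([-a-z0-9_%.\s]+)["]?)?[\s]?'
    r'<([-a-z0-9_%.]+@[-a-z0-9_]+([.][-a-z0-9_]+)+)>)'
    r'\rDate: (.*?)'
    r'\r\r(.*?)'
    r'\r\r-{70}',
    re.IGNORECASE | re.DOTALL,  # stand-ins for qf_RECaseless and qf_REDotAll
)
GROUPS = (1, 4, 5, 7, 8)  # subject, sender name, sender address, date, body


def parse_digest(text):
    """Return one (subject, name, address, date, body) tuple per message."""
    return [match.group(*GROUPS) for match in DIGEST_PATTERN.finditer(text)]


RULE = "-" * 70  # the row of hyphens separating messages
digest = "\r".join([
    "Subject: 4D Tech Digest - V11 #234 - 03/11/01",
    "Date: Sunday, March 11, 2001 8:15 PM",
    "",
    "4D Tech Digest - V11 #234 - Sunday, March 11, 2001",
    "",
    RULE,
    "",
    "Subject: QFree is a cool plugin",
    'From: "Mark Mitchenall" <mark@mitchenall.com>',
    "Date: Sat, 10 Mar 2001 22:31:30 -0800",
    "",
    "QFree is a really cool plug-in.",
    "",
    "Mark",
    "",
    RULE,
    "",
    "Subject: Re: QFree is a cool plugin",
    'From: "Joe Bloggs" <joe@bloggs.com>',
    "Date: Sun, 11 Mar 2001 00:20:12 -0800",
    "",
    "Just downloaded it and realised how cool it was.  Thanks!",
    "",
    "Joe",
    "",
    RULE,
    "End of 4D Tech Digest - V11",
])

messages = parse_digest(digest)
for subject, name, address, date, body in messages:
    print(subject, "|", name, "|", address, "|", date)
```
]]></example_xml>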
					<para>When extracting more than one submatch, QFree returns its results as two 2D Longint arrays giving the position and length of each match and submatch.  The following example of a 4D NUG digest shows how it will be parsed by <plugincommand>QF_REExtract</plugincommand> with the above regular expression.</para>
					<example_xml>
Subject: 4D Tech Digest - V11 #234 - 03/11/01
Date: Sunday, March 11, 2001 8:15 PM
From: 4D Tech Mailing List &lt;4d-tech@lists.4dnug.com&gt;
To: 4D Tech Mailing List &lt;4d-tech@lists.4dnug.com&gt;

4D Tech Digest - V11 #234 - Sunday, March 11, 2001

  QFree is a cool plugin
          by "Mark Mitchenall" &lt;mark@mitchenall.com&gt;
  Re: QFree is a cool plugin
          by "Joe Bloggs" &lt;joe@bloggs.com&gt;

----------------------------------------------------------------------

Subject: QFree is a cool plugin
From: "Mark Mitchenall" &lt;mark@mitchenall.com&gt;
Date: Sat, 10 Mar 2001 22:31:30 -0800

QFree is a really cool plug-in.  Everyone should download
themselves a copy and start getting into using the regex
functions immediately.

Best,

Mark


----------------------------------------------------------------------

Subject: Re: QFree is a cool plugin
From: "Joe Bloggs" &lt;joe@bloggs.com&gt;
Date: Sun, 11 Mar 2001 00:20:12 -0800

"Mark Mitchenall" &lt;mark@mitchenall.com&gt; wrote:

&gt; QFree is a really cool plug-in.  Everyone should download
&gt; themselves a copy and start getting into using the regex
&gt; functions immediately.

Just downloaded it and realised how cool it was.  Thanks!

Joe

----------------------------------------------------------------------
End of 4D Tech Digest - V11

**********************************************************************
4th Dimension Networked Users Group (4D NUG)
FAQ:        http://www.4dnug.com/4d_mailing_list_faq.html
Resources:  http://www.4dnug.com/resources.html
Admin:      Karen R Sabog &lt;4d-admin@4dnug.com&gt;
Unsub:      mailto:4d-tech-off@lists.4dnug.com
**********************************************************************

</example_xml>
					<para>The sample code below shows the values populated into the arrays by <plugincommand>QF_REExtract</plugincommand>.  You can think of each 2D array as a table where the first dimension is the row, in this case each match of the regular expression, and the second dimension is the column, or submatch group, e.g. <localvar>$matchOffsets</localvar>{<localvar>row</localvar>}{<localvar>column</localvar>} or <localvar>$matchOffsets</localvar>{<localvar>match</localvar>}{<localvar>submatch</localvar>}.  Columns 1 to 5 hold the submatches in the order they are listed in <localvar>$groups</localvar>, not by their group numbers.  Column zero, i.e. <localvar>$matchOffsets</localvar>{<localvar>row</localvar>}{0}, always contains the entire match: in this case the whole message, plus the headers at the top and the row of hyphens at the bottom.</para>
					<code>
<codeline><comment>`First Message in digest</comment></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{1}{1}<comment>`position of start of 1st subject text</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{1}{1}<comment>`length of 1st subject text</comment></codeline>
<codeline><localvar>$subject</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$subject</localvar> = "QFree is a cool plugin"</comment></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{1}{2}<comment>`position of start of 1st sender's name</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{1}{2}<comment>`length of 1st sender's name</comment></codeline>
<codeline><localvar>$name</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$name</localvar> = "Mark Mitchenall"</comment></codeline>
<codeline></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{1}{3}<comment>`position of start of 1st sender's email address</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{1}{3}<comment>`length of 1st sender's email address</comment></codeline>
<codeline><localvar>$address</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$address</localvar> now holds the 1st sender's email address</comment></codeline>
<codeline></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{1}{4}<comment>`position of start of 1st date</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{1}{4}<comment>`length of 1st date</comment></codeline>
<codeline><localvar>$date</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$date</localvar> = "Sat, 10 Mar 2001 22:31:30 -0800"</comment></codeline>
<codeline></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{1}{5}<comment>`position of start of 1st message body</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{1}{5}<comment>`length of 1st message body</comment></codeline>
<codeline><localvar>$body</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$body</localvar> = "QFree is a really cool plug-in.  Every...."</comment></codeline>
<codeline></codeline>
<codeline><comment>`Second Message in digest</comment></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{2}{1}<comment>`position of start of 2nd subject text</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{2}{1}<comment>`length of 2nd subject text</comment></codeline>
<codeline><localvar>$subject</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$subject</localvar> = "Re: QFree is a cool plugin"</comment></codeline>
<codeline></codeline>
<codeline><localvar>$offset</localvar>:=<localvar>$matchOffset</localvar>{2}{2}<comment>`position of start of 2nd sender's name</comment></codeline>
<codeline><localvar>$length</localvar>:=<localvar>$matchLengths</localvar>{2}{2}<comment>`length of 2nd sender's name</comment></codeline>
<codeline><localvar>$name</localvar>:=<command>BLOB to text</command>(<localvar>$blob</localvar>;<constant>Text without length</constant>;<localvar>$offset</localvar>;<localvar>$length</localvar>)</codeline>
<codeline><comment>`<localvar>$name</localvar> = "Joe Bloggs"</comment></codeline>
<codeline></codeline>
<codeline><comment>`...etc...</comment></codeline>
</code>
					<para>This method of parsing the digest is extremely fast, even when running interpreted.  It is also fairly memory efficient. The BLOB is not copied in RAM, and the result is simply two 2D longint arrays giving the offset and length of every string you're looking for in the text, all from a single call to the plug-in.  Escape's implementation is excellent, and it's a really elegant solution for many text parsing requirements.</para>
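					<para>For readers who want to experiment with the row/column idea outside 4D, here is a rough Python analogue. It is not QFree code. It builds the same kind of offset and length tables with Python's <code>re</code> module, using a made-up two-message "digest" and a made-up pattern (not the digest pattern used earlier in this article). It then slices the text with each offset/length pair, much as the 4D code above does with <command>BLOB to text</command>. The names <code>extract</code> and <code>piece</code> are hypothetical helpers.</para>

```python
import re

# Made-up two-message digest with Mac-style "\r" line endings (hypothetical format).
DIGEST = (
    "Subject: QFree is a cool plugin\r"
    "From: Mark Mitchenall\r"
    "QFree is a really cool plug-in.\r"
    "----------\r"
    "Subject: Re: QFree is a cool plugin\r"
    "From: Joe Bloggs\r"
    "I agree.\r"
    "----------\r"
)

# Made-up pattern: group 1 = subject, group 2 = sender's name, group 3 = body.
PATTERN = re.compile(r"Subject: ([^\r]*)\rFrom: ([^\r]*)\r([^\r]*)\r-+\r")


def extract(pattern, text):
    """Build offset and length tables: one row per match, one column per group.

    Column 0 holds the entire match, as described for QF_REExtract's arrays.
    """
    offsets, lengths = [], []
    for m in pattern.finditer(text):
        row_offsets, row_lengths = [], []
        for col in range(pattern.groups + 1):
            start, end = m.span(col)
            row_offsets.append(start)
            row_lengths.append(end - start)
        offsets.append(row_offsets)
        lengths.append(row_lengths)
    return offsets, lengths


offsets, lengths = extract(PATTERN, DIGEST)


def piece(row, col):
    """Slice one submatch out of the text, like BLOB to text with an offset and length."""
    start = offsets[row][col]
    return DIGEST[start:start + lengths[row][col]]


print(piece(0, 1))  # QFree is a cool plugin
print(piece(1, 2))  # Joe Bloggs
```

					<para>Python indexes rows and columns from zero, so <code>piece(0, 1)</code> corresponds to <localvar>$matchOffset</localvar>{1}{1} and <localvar>$matchLengths</localvar>{1}{1} in the 4D code above. The offsets are zero-based string positions. The sketch makes no claim about how QFree numbers its offsets.</para>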
				</head_section>
			</head_section>
		</section>
<!--
		========================
		Section 3 of the Article
		========================
-->
		
		<section>
			<title>Another potential application for Regular Expressions</title>
			<short_summary>Some more examples of regular expression usage in your applications.</short_summary>
			<head_section>
				<head>Another potential application for Regular Expressions</head>
				<para>QuickCodePro v4 is an exceptional product.  Among other things, it has turned 4D Insider comments (the ones you can now edit in the Explorer window) into an extremely useful tool.  Firstly, parameters can be auto-entered if you list them in the comments.  Secondly, the comments can be displayed as balloon help in the method editor.  Both give developers even more reason to write them.</para>
				<para>On a recent project, the system needed to be fully documented so that future developers could find their way around it without difficulty.  It had already passed through the hands of 5 different developers at different times, and documentation was sparse.  I decided that one of the easiest approaches would be to enter most of the information straight into the structure itself using 4D Insider comments. This would give me on-line help for myself and future developers.  I also wanted to produce printed documentation, so I needed to extract all these comments from 4D Insider exports.  Here regular expressions came to the rescue. They allowed me to pull the information out of the exports into a database, from which it could be exported for work in a word processor.</para>
				<para>In the end I wrote a number of regex patterns to extract various information from the 4D Insider exports.  Because the structure of the exports wasn't as strict as I'd hoped, I had to export the various parts of the structure into separate files, i.e. one file for the comments, another for the fields and tables, another for methods, another for forms, etc.  This didn't take long, and once exported, the files take almost no time to parse into the documentation database.</para>
				<head_section>
					<head>Parsing 4D Insider Exports</head>
					<para>First open your structure file in 4D Insider.  Then show only the methods in the database.  Choose "Export..." from the "File" menu and check only the "Documentation" checkbox and hit the Export button.</para>
					<para>The resulting file contains just the comments for each project method as entered in the Explorer window.  Unfortunately these won't include any of the colour or other text formatting, but you can easily extract all the comments and enter them into a database.  The following regular expression, used with <plugincommand>QF_REExtract</plugincommand>, returns the name, date, ID and size of each project method found in the BLOB. These can then be used to extract the contents of each method.</para>
					<code>
<codeline>\r+Project\sMethod\s([a-zA-Z0-9_\s]+)\s([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+</codeline>
</code>
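					<para>To see what the four groups capture, the following Python sketch runs the same pattern over a made-up fragment of an export. The header layout is inferred from the pattern itself rather than copied from a real 4D Insider file. In particular, the three characters matched by "..." between the ID and the size are assumed here to be " - ". The method names and numbers are invented.</para>

```python
import re

# The Project Method pattern from above, unchanged.
PROJECT_METHOD = re.compile(
    r"\r+Project\sMethod\s([a-zA-Z0-9_\s]+)\s([0-9]+[/][0-9]+[/][0-9]+)"
    r".\(ID.([0-9]+)...([0-9,]+).bytes\)\r+"
)

# Hypothetical export text, shaped to fit the pattern (not copied from a real export).
EXPORT = (
    "\rProject Method validate_EmailAddress 3/13/01 (ID 12 - 1,234 bytes)\r"
    "`method text here\r"
    "\rProject Method split_Text 3/14/01 (ID 13 - 560 bytes)\r"
)

methods = []
for m in PROJECT_METHOD.finditer(EXPORT):
    name, date, method_id, size = m.groups()
    # Strip the thousands separator before converting, e.g. "1,234" becomes 1234.
    methods.append((name, date, int(method_id), int(size.replace(",", ""))))

for entry in methods:
    print(entry)
```

					<para>The greedy name group backtracks to the last whitespace before the date, so method names containing spaces are still captured whole.</para>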
					<para>The same format is used when you just export the project method contents using 4D Insider, so you can use the same regular expression for that if you wish to import this too.</para>
					<para>Once you have the contents of the project methods, you can easily parse them for common items. For example, you can parse object methods to extract the code for each form event handler, or parse triggers to extract the code for each database action.  You can also search for all parameter declarations, or just the local variables, with great ease.  You need to export the form object methods, form methods and triggers separately and apply slightly different patterns to them. Even so, this is still a fairly straightforward way of collecting all the documentation from your structure file.</para>
					<para>Another potential use of regular expressions is to find all the comments in your source code so that they can be gathered and read separately from the source.  Again, once this information is in a database, you can easily export it to a variety of formats. Examples include web-based documentation for developers, or RTF for import into a word processor or documentation tool.</para>
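					<para>Here is a minimal sketch of comment gathering, written in Python rather than 4D. It assumes 4D's backquote comment marker, as used throughout this article. It skips double-quoted string literals so that a backquote inside a string is not mistaken for a comment. It further assumes that string literals never span lines or contain escaped quotes, so treat it as a starting point rather than a full 4D parser. The function name and sample method are made up.</para>

```python
import re

# Either a double-quoted string literal (matched so it can be skipped) or a
# backquote comment running to the end of the line (captured in group 1).
# Assumption: strings never contain escaped quotes and never span lines.
COMMENT_OR_STRING = re.compile(r'"[^"\r\n]*"|`([^\r\n]*)')


def extract_comments(source):
    """Return the text of every backquote comment, in order of appearance."""
    comments = []
    for m in COMMENT_OR_STRING.finditer(source):
        if m.group(1) is not None:  # group 1 only participates for comments
            comments.append(m.group(1).strip())
    return comments


METHOD = "\r".join([
    "`Method: myMethod",
    "C_TEXT($1;$textParameter)`some information about the 1st parameter",
    '$label:="Price `incl. tax`"',
    "$textParameter:=$1",
])

print(extract_comments(METHOD))
```

					<para>Matching strings and comments in one alternation lets the scan step over each string literal as a unit. A backquote that appears inside quotes is therefore never seen as the start of a comment.</para>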
					<para>The patterns used for form methods, object methods, fields and tables are:</para>
					<code>
<codeline><bold>Form Methods</bold></codeline>
<codeline>Form\sMethod\s\[([a-zA-Z_][a-zA-Z0-9_]*)\][.]([a-zA-Z0-9_]*)\s([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+</codeline>
<codeline></codeline>
<codeline>Groups: 1 - Tablename, 2 - Formname, 3 - Date, 4 - ID, 5 - Size</codeline>
<codeline></codeline>
<codeline><bold>Object Methods</bold></codeline>
<codeline>\r+Object\sMethod\s\[([a-zA-Z_][a-zA-Z0-9_]*)\][.]([a-zA-Z0-9_]*)[.]([a-zA-Z0-9_]*)\s([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+</codeline>
<codeline></codeline>
<codeline>Groups: 1 - Tablename, 2 - Formname, 3 - ObjectMethodName, 4 - Date, 5 - ID, 6 - Size</codeline>
<codeline></codeline>
<codeline><bold>Tables</bold></codeline>
<codeline>\r+Table\s\[([a-zA-Z_][a-zA-Z0-9_]*)\]\s([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+</codeline>
<codeline></codeline>
<codeline>Groups: 1 - Tablename, 2 - Date, 3 - ID, 4 - Size</codeline>
<codeline></codeline>
<codeline><bold>Fields</bold></codeline>
<codeline>Field\s\[([a-zA-Z_][a-zA-Z0-9_]*)\]([a-zA-Z_][a-zA-Z0-9_]*)</codeline>
<codeline></codeline>
<codeline>Groups: 1 - Tablename, 2 - Fieldname</codeline>
</code>
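					<para>Group numbers in a listing like this are easy to get wrong, so it is worth checking them mechanically. This Python sketch compiles the same four patterns with the <code>re</code> module and runs them against made-up header lines. The table, form, object and field names are invented, and the exact Insider header layout is an assumption inferred from the patterns. The sketch then prints each group with its number.</para>

```python
import re

# The Form Method, Object Method, Tables and Fields patterns from the listing above.
FORM_METHOD = re.compile(
    r"Form\sMethod\s\[([a-zA-Z_][a-zA-Z0-9_]*)\][.]([a-zA-Z0-9_]*)\s"
    r"([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+"
)
OBJECT_METHOD = re.compile(
    r"\r+Object\sMethod\s\[([a-zA-Z_][a-zA-Z0-9_]*)\][.]([a-zA-Z0-9_]*)[.]([a-zA-Z0-9_]*)\s"
    r"([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+"
)
TABLE = re.compile(
    r"\r+Table\s\[([a-zA-Z_][a-zA-Z0-9_]*)\]\s"
    r"([0-9]+[/][0-9]+[/][0-9]+).\(ID.([0-9]+)...([0-9,]+).bytes\)\r+"
)
FIELD = re.compile(r"Field\s\[([a-zA-Z_][a-zA-Z0-9_]*)\]([a-zA-Z_][a-zA-Z0-9_]*)")

# Hypothetical header lines, shaped to fit the patterns.
form = FORM_METHOD.search("\rForm Method [Customers].Input 3/13/01 (ID 4 - 2,048 bytes)\r")
obj = OBJECT_METHOD.search("\rObject Method [Customers].Input.bSave 3/13/01 (ID 7 - 245 bytes)\r")
table = TABLE.search("\rTable [Customers] 3/13/01 (ID 1 - 512 bytes)\r")
field = FIELD.search("Field [Customers]Name")

# Print each group with its number, to compare against the "Groups:" lines.
for label, m in [("Form", form), ("Object", obj), ("Table", table), ("Field", field)]:
    print(label, {i: g for i, g in enumerate(m.groups(), start=1)})
```

					<para>Counting opening parentheses from the left gives the same numbering in any PCRE-style engine. The sketch simply makes that count visible.</para>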
					<para>A sample application showing these in action will be produced over the next few months.  Using QFree to parse 4D Insider export files is extremely quick.  The only drawback at present is having to export each type of object, and the documentation for each type, separately. This is only a minor drawback when you consider how fast the parsing is.</para>
					<para>In all my applications I use the same format for all my object methods when it comes to form events.  I use a <a href="http://www.asgsoft.com/" target="_blank">QuickCodePro</a> macro each time I open a new object method which pastes my standard skeleton into the method editor, e.g.</para>
					<code>
<codeline><comment>`Method: myObject</comment></codeline>
<codeline><comment>`Author: Mark Mitchenall</comment></codeline>
<codeline><comment>`Created: 13/3/01 at 12:48 AM</comment></codeline>
<codeline></codeline>
<codeline><command>C_LONGINT</command>(<localvar>$formEvent</localvar>)</codeline>
<codeline><localvar>$formEvent</localvar>:=<command>Form event</command></codeline>
<codeline></codeline>
<codeline><command>Case of</command> </codeline>
<codeline>  : (<localvar>$formEvent</localvar>=<constant>On Load</constant>)</codeline>
<codeline></codeline>
<codeline>  : (<localvar>$formEvent</localvar>=<constant>On Clicked</constant>)</codeline>
<codeline></codeline>
<codeline>  : (<localvar>$formEvent</localvar>=<constant>On Data Change</constant>)</codeline>
<codeline></codeline>
<codeline>  : (<localvar>$formEvent</localvar>=<constant>On Unload</constant>)</codeline>
<codeline></codeline>
<codeline><command>End case</command> </codeline>
</code>
					<para>Because all my form methods and object methods use this format, I can easily parse them using a regular expression to find out which form events I've actually coded.  If necessary I could then parse the individual event's code.  The following regex pattern can be applied to an object or form method to retrieve which form event constants have been used.  The submatch group gives you the form event constant and the whole match gives you the line with the case.</para>
					<code>
<codeline><bold>Form Event Handlers</bold></codeline>
<codeline>\r.*[:]\s\([$]formEvent=(On\s[a-zA-Z0-9\s]*)\).*\r</codeline>
<codeline></codeline>
<codeline>Groups: 1 - Form Event Constant</codeline>
</code>
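					<para>A quick way to check the form event pattern is to run it over the skeleton shown earlier. This Python sketch does that, with one adaptation that is specific to Python. Python's <code>.</code> also matches <code>\r</code>, so each <code>.*</code> is narrowed to <code>[^\r]*</code> to keep every match on a single line. The sketch does not assume how QFree treats <code>\r</code>.</para>

```python
import re

# The form event pattern from above, with ".*" narrowed to "[^\r]*" for Python,
# where "." would otherwise run across several "\r"-separated lines.
FORM_EVENT = re.compile(r"\r[^\r]*[:]\s\([$]formEvent=(On\s[a-zA-Z0-9\s]*)\)[^\r]*\r")

SKELETON = "\r".join([
    "`Method: myObject",
    "C_LONGINT($formEvent)",
    "$formEvent:=Form event",
    "",
    "Case of ",
    "  : ($formEvent=On Load)",
    "",
    "  : ($formEvent=On Clicked)",
    "",
    "  : ($formEvent=On Data Change)",
    "",
    "  : ($formEvent=On Unload)",
    "",
    "End case ",
])

events = FORM_EVENT.findall(SKELETON)
print(events)
```

					<para>Each match consumes both its leading and its trailing <code>\r</code>. As a result, two matching case lines with nothing between them share a single <code>\r</code>, and the second one is skipped. The blank lines in the skeleton are what keep every handler visible to this pattern.</para>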
					<para>As you will read in a later section, I have a standard way of presenting parameters in methods.  This makes it easy to feed that information into the documentation, or even to generate parts of the documentation from it.  The following code sample shows the standard way I present parameters.</para>
					<code>
<codeline><comment>`Method: myMethod</comment></codeline>
<codeline><comment>`Author: Mark Mitchenall</comment></codeline>
<codeline><comment>`Created: 13/3/01 at 12:48 AM</comment></codeline>
	<codeline></codeline>
	<codeline><command>C_TEXT</command>($1;$textParameter)<comment>`some information about the 1st parameter</comment></codeline>
	<codeline><command>C_LONGINT</command>($2;$longintParameter)<comment>`some information about the 2nd parameter</comment></codeline>
	<codeline><command>C_POINTER</command>($3;$pointerParameter)<comment>`some information about the 3rd parameter</comment></codeline>
	<codeline></codeline>
	<codeline>$textParameter:=$1</codeline>
	<codeline>$longintParameter:=$2</codeline>
	<codeline>$pointerParameter:=$3</codeline>
	</code>
					<para>Once the contents of the method have been extracted from the 4D Insider Export, the following regex pattern can be applied using the <plugincommand>QF_REExtract</plugincommand> command to extract parameter information from the method contents.  This can be very useful to help create documentation and Insider comments for use with QCP's HotHelp and AutoComplete features.</para>
					<code>
<codeline><bold>Parameter Declarations</bold></codeline>
<codeline>C_([A-Z]+)\([$]([0-9]+);[$]([a-zA-Z_0-9]+)\)</codeline>
<codeline></codeline>
<codeline>Groups: 1 - Variable Type, 2 - Parameter Number, 3 - Local Variable Name</codeline>
</code>
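					<para>As a rough illustration of what the parameter pattern captures, this Python sketch applies a version of it to the sample declarations. It turns each match into a small record that could feed a documentation table. The pattern is the declaration pattern above, with the parameter number written as <code>[0-9]+</code> so that $10 and beyond also match. The record layout is made up for the example.</para>

```python
import re

# Declaration pattern: group 1 = type, group 2 = parameter number, group 3 = local name.
PARAM_DECL = re.compile(r"C_([A-Z]+)\([$]([0-9]+);[$]([a-zA-Z_0-9]+)\)")

METHOD = "\r".join([
    "C_TEXT($1;$textParameter)`some information about the 1st parameter",
    "C_LONGINT($2;$longintParameter)`some information about the 2nd parameter",
    "C_POINTER($3;$pointerParameter)`some information about the 3rd parameter",
    "C_LONGINT($formEvent)",  # not a parameter declaration, so it is not matched
])

# findall returns one (type, number, name) tuple per declaration.
params = [
    {"type": var_type, "number": int(number), "name": "$" + name}
    for var_type, number, name in PARAM_DECL.findall(METHOD)
]

for p in params:
    print(p["number"], p["type"], p["name"])
```
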
					<para>In order to use these last couple of patterns, you need to keep to a fairly strict style in your code.  Personally, I like opening methods that take parameters in this way. It lets me give each parameter's local variable a more meaningful name, which makes the code more readable.  It also gives me the added benefit of code that is easy to parse for documentation purposes later on.</para>
					<para>Using QCP macros to create method skeletons is a good way of maintaining a coding style.  As you will see in a later section, QCP also has features that save typing these parameter declarations and assignments, using AutoComplete and the new regular expression features.</para>
				</head_section>
			</head_section>
		</section>
<!--
		========================
		Section 4 of the Article
		========================
-->
		
		<section>
			<title>Using the preg_module</title>
			<short_summary>Examples of using the preg_module for regular expression matching.</short_summary>
			<head_section>
				<head>Using the preg_module</head>
				<para>I wrote the <a href="/products/freeware/4d/preg_module/index.phtml">preg_module</a> after spending a lot of time using PHP for web application development.  PHP has two kinds of regular expression: standard POSIX-compliant regular expressions and Perl-compatible regular expressions (PCRE).  PHP's Perl-compatible functions are based on a library very similar to the one behind the regex routines in QFree 2.0. The QFree routines are just as powerful, but not quite as straightforward to use.</para>
				<para>The <a href="/products/freeware/4d/preg_module/index.phtml">preg_module</a> contains 10 methods that let the developer use the QFree functions in much the same way as PHP's Perl-compatible regular expression functions, while keeping QFree's normal syntax for the regex patterns themselves.</para>
				<para>Here, I'm only going to show the routines for doing regex matching on text variables, but in the module, each of these routines also has a version for working on BLOBs.  For more information on the BLOB versions of these routines, please download the module itself where there is further documentation.</para>
				<head_section>
					<head>Function: preg_Match</head>
					<para>Earlier we looked at validating an email address using regular expressions.  Although the code was fairly straightforward, it still wasn't as easy as doing the same thing in a language such as PHP.  The same code in PHP could have been written as:
					</para>
					<example_xml>
&lt;?php

  function validate_emailaddress($emailaddress) {

    $pattern = '/^([-a-zA-Z0-9_.]+)@(([-a-zA-Z0-9_]+[.])+[a-zA-Z]+)$/i' ;
    if(preg_match($pattern, $emailaddress)) {
      return TRUE ;
    } else {
      return FALSE ;
    }
  }

?&gt;
					</example_xml>
					<para>The preg_module allows you to write your 4D version in a similar fashion to that in PHP.  The following example shows how to write the above PHP script in 4D using the preg_module.</para>
<code>		
<codeline><comment>`Method: validate_EmailAddress(emailAddress) --> TRUE if valid</comment></codeline>
<codeline></codeline>
<codeline><command>C_TEXT</command>(<local_var>$1</local_var>;<local_var>$emailAddress</local_var>)</codeline><codeline><command>C_BOOLEAN</command>(<local_var>$0</local_var>)</codeline>
<codeline></codeline>
<codeline><local_var>$emailAddress</local_var>:=<local_var>$1</local_var></codeline>
<codeline><local_var>$pattern</local_var>:="^([-a-zA-Z0-9_.]+)@(([-a-zA-Z0-9_]+[.])+[a-zA-Z]+)$(?i)"</codeline>
<codeline></codeline>
<codeline><command>If</command> (<method>preg_Match</method> (<local_var>$pattern</local_var>;<local_var>$emailAddress</local_var>))</codeline>
<codeline>  $0:=<command>True</command></codeline>
<codeline><command>Else</command> </codeline>
<codeline>  $0:=<command>False</command></codeline>
<codeline><command>End if</command> </codeline>
</code>				
					<para>As you can see, this routine is now very similar to the PHP example, and to my eyes it makes it easier to see what's going on.  There is one main difference between the two, however, and that is the regex pattern itself.  In PHP's Perl-compatible regular expressions, the pattern syntax is very similar to Perl's own. The pattern begins with a slash ("/") and ends with another slash plus any options. In this case, the letter "i" denotes that the match is to be case-insensitive ("/pattern/i").</para>
					<para>QFree's regex routines, although based on PCRE, don't use this syntax for the patterns (although I may add support for this in the preg_module at a later stage).  Instead, the pattern options can be inserted as "extension operators" in the form "(?....)".  If you look at the 4D example code, you will see that the regex pattern is identical to the one in the PHP example, except that the leading and trailing "/"s have been removed and the "i" has been replaced with "(?i)".</para>
					<para>Using extension operators in the pattern in this way is very useful for two reasons. First, the regex flags don't need to be passed to the QFree commands in a separate parameter, which makes patterns easier to store in TEXT resources. Second, the options can be applied to a submatch within the regular expression as well as to the whole pattern.  The following examples show how the case-insensitivity option can be applied to a submatch.</para>
<code>
<codeline><method>preg_Match</method> ("ab(c)";"abc")  = <command>True</command></codeline>
<codeline><method>preg_Match</method> ("ab(c)";"abC")  = <command>False</command></codeline>
<codeline><method>preg_Match</method> ("ab((?i)c)";"abc")  = <command>True</command></codeline>
<codeline><method>preg_Match</method> ("ab((?i)c)";"abC")  = <command>True</command></codeline>
<codeline><method>preg_Match</method> ("ab((?i)c)";"ABc")  = <command>False</command></codeline>
<codeline><method>preg_Match</method> ("ab(c)(?i)";"ABC")  = <command>True</command></codeline>
<codeline><method>preg_Match</method> ("ab(c)(?i)";"aBc")  = <command>True</command></codeline>
<codeline><method>preg_Match</method> ("ab(c(?i))";"abC")  = <command>False</command><comment>`extension op not at the start of the group</comment></codeline>
<codeline><method>preg_Match</method> ("(?i)ab(c)";"abc")  = <command>True</command></codeline>
<codeline><method>preg_Match</method> ("(?i)ab(c)";"ABC")  = <command>True</command></codeline>
</code>
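					<para>Not every regex engine treats (?i) the way QFree does in the table above. Python's <code>re</code> module, for instance, writes an option scoped to part of a pattern as (?i:...). Recent Python versions also reject a bare (?i) anywhere except at the very start of the pattern. The following sketch reproduces a few rows of the table in Python's syntax. It illustrates Python's behaviour only and says nothing further about QFree.</para>

```python
import re

# (pattern, text) pairs mirroring rows of the table, rewritten in Python's syntax.
CASES = [
    (r"ab(c)", "abC"),       # no option: case-sensitive throughout
    (r"ab((?i:c))", "abC"),  # option scoped to the group, like ab((?i)c)
    (r"ab((?i:c))", "ABc"),  # "ab" outside the group is still case-sensitive
    (r"(?i)ab(c)", "ABC"),   # a leading (?i) applies to the whole pattern
]

results = [bool(re.search(pattern, text)) for pattern, text in CASES]
for (pattern, text), found in zip(CASES, results):
    print(pattern, text, found)
```
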
					<para>Using <method>preg_Match</method> instead of <plugincommand>QF_REMatch</plugincommand> allows a simple regex match to be done in far less code, but at first glance, it doesn't have the same level of features as <plugincommand>QF_REMatch</plugincommand> because it doesn't return the matched groups.  Or does it?  <method>preg_Match</method> actually has an optional 3rd parameter which takes a pointer to a Text Array.  If this parameter is passed, <method>preg_Match</method> returns the match and any submatches into the array passed.  This works just the same as the PHP preg_match function.  We can extend the previous example slightly by adding an additional text array and pointer to it. e.g.</para>

<code>		
<codeline><comment>`Method: validate_EmailAddress(emailAddress) --> TRUE if valid</comment></codeline>
<codeline></codeline>
<codeline><command>C_TEXT</command>(<local_var>$1</local_var>;<local_var>$emailAddress</local_var>)</codeline><codeline><command>C_BOOLEAN</command>(<local_var>$0</local_var>)</codeline>
<codeline></codeline>
<codeline><local_var>$emailAddress</local_var>:=<local_var>$1</local_var></codeline>
<codeline><local_var>$pattern</local_var>:="^([-a-zA-Z0-9_.]+)@(([-a-zA-Z0-9_]+[.])+[a-zA-Z]+)$(?i)"</codeline>
<codeline></codeline>
<codeline><command>ARRAY TEXT</command> (aMatches;0)</codeline>
<codeline></codeline>
<codeline><command>If</command> (<method>preg_Match</method> (<local_var>$pattern</local_var>;<local_var>$emailAddress</local_var>;->aMatches))</codeline>
<codeline>  $0:=<command>True</command></codeline>
<codeline>  $mailusername:=aMatches{1}</codeline>
<codeline>  $serverdomain:=aMatches{2}</codeline>
<codeline><command>Else</command> </codeline>
<codeline>  $0:=<command>False</command></codeline>
<codeline><command>End if</command> </codeline>
</code>				

					<para>Now the example extracts the mail username and server domain from the email address, although nothing is actually done with them here. The example only highlights how easy it is to extract the individual submatches using <method>preg_Match</method>.  The text matched by the entire expression is found in <processvar>aMatches{0}</processvar>.
					</para>
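					<para>For comparison, here is the same pattern run through Python's <code>re</code> module, with <code>re.IGNORECASE</code> standing in for the trailing (?i). It shows which text each group captures for a made-up address. Group 0 is the whole match, group 1 is the mail username and group 2 is the server domain, matching the roles of aMatches{0}, aMatches{1} and aMatches{2}. Group 3, the inner repeated group, keeps only its last repetition. Whether <method>preg_Match</method> also returns it in aMatches{3} is not covered here.</para>

```python
import re

# The same pattern as the 4D example; re.IGNORECASE replaces the trailing (?i).
EMAIL = re.compile(r"^([-a-zA-Z0-9_.]+)@(([-a-zA-Z0-9_]+[.])+[a-zA-Z]+)$", re.IGNORECASE)

m = EMAIL.search("joe.bloggs@mail.example.com")
print(m.group(0))  # joe.bloggs@mail.example.com
print(m.group(1))  # joe.bloggs
print(m.group(2))  # mail.example.com
print(m.group(3))  # example.

# No dot in the domain part, so the whole match fails.
print(EMAIL.search("joe@localhost"))  # None
```
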
				</head_section>
				<head_section>
					<head>Function: preg_Match_All</head>
					<para>This function is basically an easier way of calling <plugincommand>QF_REExtract</plugincommand>.  It allows every match of a particular pattern in a piece of text to be returned in one go, into an easy to use 2D Text array.  The 2D Text Array is the perfect way of returning results from this type of function and for me, makes the resultant code easier to read and understand.</para>
					<para>
						If we look at our earlier example for reading 4D NUG digests, we can simplify the code quite a bit by using preg_Match_All instead of <plugincommand>QF_REExtract</plugincommand> directly.
					</para>
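					<para>Rather than rework the digest example here, the following Python sketch shows the general shape of a preg_Match_All-style helper. Every match comes back as text in one 2D structure, so there are no offsets or lengths to convert. The helper name <code>match_all</code> and the sample headers are hypothetical. The sketch lays the table out with one row per match and column 0 holding the whole match, as described for <plugincommand>QF_REExtract</plugincommand>. The preg_module's own layout may differ; PHP's preg_match_all, for instance, groups results by submatch by default.</para>

```python
import re


def match_all(pattern, text):
    """Hypothetical preg_Match_All-style helper.

    Returns one row per match: column 0 is the whole match, then one column per
    group. A group that did not take part in a match is returned as "".
    """
    rows = []
    for m in re.finditer(pattern, text):
        rows.append([m.group(0)] + [g if g is not None else "" for g in m.groups()])
    return rows


HEADERS = "From: Mark Mitchenall\rFrom: Joe Bloggs\r"
rows = match_all(r"From: ([^\r]*)\r", HEADERS)

for row in rows:
    print(row[1])
```
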
				</head_section>
				<head_section>
					<head>Function: preg_Replace</head>
					<para>This function allows you to do Find and Replace using a regular expression on a piece of text.  You need to pass the routine your match pattern, substitution pattern and the target text and it will return the result.  This function is extremely useful in certain circumstances and is something which I use in BBEdit all the time.  Being able to use regular expressions for Find and Replace is an extremely powerful feature.</para>
					<para>Unlike the <plugincommand>QF_REReplace</plugincommand> command in QFree, this version is non-destructive, i.e. the text you pass will not be modified in any way.  This is the same with the BLOB version of this method.
					</para>
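					<para>Python's <code>re.sub</code> works in the same non-destructive way. It returns a new string and leaves its input alone, since Python strings are immutable. The sketch below uses made-up names and data. It swaps "Surname, Forename" lines into "Forename Surname" using group references in the substitution pattern, then shows that the original text is unchanged. The \1 and \2 substitution syntax shown is Python's; check the preg_module documentation for the exact form <method>preg_Replace</method> expects.</para>

```python
import re

ORIGINAL = "Bloggs, Joe\rMitchenall, Mark"

# Group 1 = surname, group 2 = forename; the substitution swaps them.
swapped = re.sub(r"([^,\r]+), ([^\r]+)", r"\2 \1", ORIGINAL)

print(swapped.split("\r"))   # ['Joe Bloggs', 'Mark Mitchenall']
print(ORIGINAL.split("\r"))  # ['Bloggs, Joe', 'Mitchenall, Mark']
```
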
				</head_section>
				<head_section>
					<head>Function: preg_Split</head>
					<para>
						This function is similar to <method>preg_Match_All</method>, except that it returns only a one-dimensional array containing the text split by the regex.  This method uses the <plugincommand>QF_RESplit</plugincommand> command to perform the split. The only difference is that it returns a text array, rather than 2 longint arrays containing the offset and length of each segment.
					</para>
					<para>This has many applications, but for the moment I'll just use one simple example to give an idea of its potential.  The following example function splits a line of text into an array, with each element containing one word from the line.
					</para>

<code>		
<codeline><comment>`Method: split_Text(text; ->textArray)</comment></codeline>
<codeline></codeline>
<codeline><command>C_TEXT</command>(<local_var>$1</local_var>;<local_var>$text</local_var>)</codeline>
<codeline><command>C_POINTER</command>(<local_var>$2</local_var>)</codeline>
<codeline></codeline>
<codeline><local_var>$text</local_var>:=<local_var>$1</local_var></codeline>
<codeline><local_var>$pattern</local_var>:="[\s,.;:-]+"<comment>`hyphen last, so it is a literal character rather than a range</comment></codeline>
<codeline></codeline>
<codeline><command>If</command> (<method>preg_Split</method> (<local_var>$pattern</local_var>;<local_var>$text</local_var>;$2)=0)</codeline>
<codeline>     `do something with array...</codeline>
<codeline><command>Else</command> </codeline>
<codeline>  <command>ARRAY TEXT</command>($2->;0) </codeline>
<codeline><command>End if</command> </codeline>
</code>				
					<para>If we passed this current paragraph of text as the text parameter to this function, plus a pointer to the global text array, <processvar>aMatches</processvar>, we would get the following elements in <processvar>aMatches</processvar> at the end.
					</para>
<code>
<codeline>aMatches{1}:="If"</codeline>
<codeline>aMatches{2}:="we"</codeline>
<codeline>aMatches{3}:="passed"</codeline>
<codeline>aMatches{4}:="this"</codeline>
<codeline>aMatches{5}:="current"</codeline>
<codeline>aMatches{6}:="paragraph"</codeline>
<codeline>aMatches{7}:="of"</codeline>
<codeline>aMatches{8}:="text"</codeline>
<codeline>aMatches{9}:="as"</codeline>
<codeline>aMatches{10}:="the"</codeline>
<codeline>aMatches{11}:="text"</codeline>
<codeline>aMatches{12}:="parameter"</codeline>
<codeline>aMatches{13}:="to"</codeline>
<codeline>aMatches{14}:="this"</codeline>
<codeline>aMatches{15}:="function"</codeline>
<codeline>aMatches{16}:="plus"</codeline>
<codeline>aMatches{17}:="a"</codeline>
<codeline>aMatches{18}:="pointer"</codeline>
<codeline>aMatches{19}:="to"</codeline>
<codeline>aMatches{20}:="the"</codeline>
<codeline>aMatches{21}:="global"</codeline>
<codeline>aMatches{22}:="text"</codeline>
<codeline>aMatches{23}:="array"</codeline>
<codeline>aMatches{24}:="aMatches"</codeline>
<codeline>aMatches{25}:="we"</codeline>
<codeline>aMatches{26}:="would"</codeline>
<codeline>aMatches{27}:="get"</codeline>
<codeline>aMatches{28}:="the"</codeline>
<codeline>aMatches{29}:="following"</codeline>
<codeline>aMatches{30}:="elements"</codeline>
<codeline>aMatches{31}:="in"</codeline>
<codeline>aMatches{32}:="aMatches"</codeline>
<codeline>aMatches{33}:="at"</codeline>
<codeline>aMatches{34}:="the"</codeline>
<codeline>aMatches{35}:="end"</codeline>
</code>
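					<para>A separator class such as [\s,.-;:]+ contains a trap. Inside [...], a hyphen between two characters denotes a range, so .-; matches everything from "." to ";" in ASCII. That range includes "/", the digits and ":", while the hyphen itself is not matched at all. Moving the hyphen to the end of the class makes it a literal character. The sketch below uses Python's <code>re.split</code> as a stand-in for <method>preg_Split</method> to show the difference. The helper <code>split_words</code> is hypothetical and the sample sentence is made up.</para>

```python
import re

# ".-;" inside the class is a RANGE covering . / 0-9 : ; (and not "-" itself).
RANGE_BUG = r"[\s,.-;:]+"
# With the hyphen moved to the end, the class holds only the intended separators.
FIXED = r"[\s,.;:-]+"

TEXT = "Room 101, floor 3 - east wing."


def split_words(pattern, text):
    """Split text on the pattern, dropping the empty strings re.split yields at the ends."""
    return [word for word in re.split(pattern, text) if word]


print(split_words(RANGE_BUG, TEXT))  # ['Room', 'floor', '-', 'east', 'wing']
print(split_words(FIXED, TEXT))      # ['Room', '101', 'floor', '3', 'east', 'wing']
```

					<para>Filtering out empty strings is this sketch's own choice. <code>re.split</code> returns an empty string when the text starts or ends with a separator, and how <plugincommand>QF_RESplit</plugincommand> handles that case is not covered here.</para>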
				</head_section>
				<head_section>
					<head>preg_Module Summary</head>
					<para>Developers who have spent any time using the Perl-compatible regular expressions in PHP should like the syntax of these routines, as they're very similar to their PHP counterparts.  They can also cut down on coding when only a simple match is required, or when you want to work with text variables instead of the BLOBs that the QFree regex commands normally work with.  The module also contains BLOB equivalents of all these functions.</para>
				</head_section>
			</head_section>
		</section>
<!--
		========================
		Section 5 of the Article
		========================
-->
		
		<section>
			<title>Using Regular Expressions in the Method Editor</title>
			<short_summary>Some examples of useful regular expressions for use with QCP 4.2's Regex find and replace functionality.</short_summary>
			<head_section>
				<head>Using Regular Expressions in the Method Editor</head>
				<para>Automated Solutions Group answered many people's prayers when they added regex support to QuickCodePro 4.2.  The only remaining problem with QuickCodePro is that it's still not available for Windows, but hopefully this too will be realised with version 5 in the not too distant future.</para>
				<para>So what would you use regular expressions for in the method editor?  There are quite a few uses for regular expression find and replace in the 4D Method Editor.  The only thing currently missing is the ability to find and replace with a regex on the selected text only, or to replace all from the current cursor position. Hopefully this refinement will come in future versions.  At the time of writing it was possible to find individual matches forwards from the current cursor position and replace them one by one using the regex substitution, so some of the following examples can still be used.</para>
				<head_section>
					<head>Converting Parameter Declarations to Assignments</head>
					<para>In nearly all the 4D methods I write, I begin by declaring any parameters and assigning their values to local variables whose names give a better indication of each parameter's contents.  All my methods start the same way, and I've always wanted a quick way of converting the declarations into the assignments to save myself a bit of time.  QCP's regex find and replace is not yet a perfect solution, but it comes to the rescue here.  With it, I can fairly quickly copy and paste the declarations, then apply the regex to each copied line to produce my variable assignments.  The following code example shows the end result.</para>
					<code>
	<codeline><comment>`Method: Regex Parameter Assignment Creation</comment></codeline>
	<codeline></codeline>
	<codeline><command>C_TEXT</command>($1;$textParameter)<comment>`some information about the 1st parameter</comment></codeline>
	<codeline><command>C_LONGINT</command>($2;$longintParameter)<comment>`some information about the 2nd parameter</comment></codeline>
	<codeline><command>C_POINTER</command>($3;$pointerParameter)<comment>`some information about the 3rd parameter</comment></codeline>
	<codeline></codeline>
	<codeline>$textParameter:=$1</codeline>
	<codeline>$longintParameter:=$2</codeline>
	<codeline>$pointerParameter:=$3</codeline>
	</code>
					<para>In this example I want to be able to create the 3 assignments after the 3 declarations without retyping them.  Ideally I'd love it if ASG could provide some way of writing a QCP Macro for this, but in the meantime it can be achieved using the following regular expression, and a couple of steps.</para>
					<code>
	<codeline><bold>Find String:</bold></codeline>
	<codeline>^C_..*\(([$][0-9]+)[;]([$].+)\).*$</codeline>
	<codeline></codeline>
	<codeline><bold>Replace String:</bold></codeline>
	<codeline>\2:=\1</codeline>
	</code>
					<para>To use this regex, type the parameter declarations as shown, then make a copy of them a line below, e.g.</para>
					<code>
	<codeline><comment>`Method: Regex Parameter Assignment Creation</comment></codeline>
	<codeline></codeline>
	<codeline><command>C_TEXT</command>($1;$textParameter)<comment>`some information about the 1st parameter</comment></codeline>
	<codeline><command>C_LONGINT</command>($2;$longintParameter)<comment>`some information about the 2nd parameter</comment></codeline>
	<codeline><command>C_POINTER</command>($3;$pointerParameter)<comment>`some information about the 3rd parameter</comment></codeline>
	<codeline></codeline>
	<codeline><command>C_TEXT</command>($1;$textParameter)<comment>`some information about the 1st parameter</comment></codeline>
	<codeline><command>C_LONGINT</command>($2;$longintParameter)<comment>`some information about the 2nd parameter</comment></codeline>
	<codeline><command>C_POINTER</command>($3;$pointerParameter)<comment>`some information about the 3rd parameter</comment></codeline>
	</code>
					<para>Now, put the cursor in between the two groups of declarations and go to QCP's "Find and Replace" dialog.  Make sure the "Regex" checkbox is selected and enter the find and replace strings as shown above.  Now hit "Find" and the first declaration in the 2nd group should be highlighted.  Use Cmd-R to replace the string and then Cmd-G to find the next one.  Repeat until all the declarations in the 2nd group have been converted to assignments as shown above.</para>
					<para>QCP remembers recently used patterns, so you don't necessarily have to keep retyping them.  Also, if there is a regular expression you use often, you can add it to the "Pattern" menu on the QCP Find and Replace dialog for easy access in future.</para>
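					<para>To see exactly what the find string captures, the same substitution can be replayed outside QCP.  The following is a minimal sketch in Python rather than 4D, assuming Python's re module treats this PCRE-style pattern the same way QCP does for lines like these; the helper name declarations_to_assignments is hypothetical.  The re.MULTILINE flag makes ^ and $ match at the start and end of every line, so the whole copied group is converted in one call rather than one Cmd-R at a time.</para>

```python
import re

# Find string from the QCP example, unchanged.  re.MULTILINE makes ^ and $
# match at the start and end of every line, as in QCP's editor.
DECLARATION = re.compile(r"^C_..*\(([$][0-9]+)[;]([$].+)\).*$", re.MULTILINE)


def declarations_to_assignments(text):
    """Turn each 'C_TYPE($n;$local)' line into '$local:=$n' (hypothetical helper)."""
    # Group 1 is the numbered parameter, group 2 the local variable name.
    # The trailing 4D comment is matched by '.*$' and is therefore dropped.
    return DECLARATION.sub(r"\2:=\1", text)


copied = (
    "C_TEXT($1;$textParameter)`some information about the 1st parameter\n"
    "C_LONGINT($2;$longintParameter)`some information about the 2nd parameter\n"
    "C_POINTER($3;$pointerParameter)`some information about the 3rd parameter"
)
print(declarations_to_assignments(copied))
```

					<para>Running it prints $textParameter:=$1, $longintParameter:=$2 and $pointerParameter:=$3, one per line.  Note that ([$].+)\) is greedy: if a trailing comment itself contains a closing parenthesis, group 2 will run up to that last parenthesis, so lines like that are worth checking by hand.</para>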
				</head_section>
				<head_section>
					<head>Replacing calls to commands and changing parameters simultaneously</head>
					<para>Another example is replacing all occurrences of a particular command across a group of methods with a new command, while changing the order of the parameters at the same time.  This would be a painstaking task without regular expressions: you can never be sure that the variables or literals used in each call are the same, so a normal find and replace is useless.  In the following example I want to replace all calls to <command>GET LIST ITEM</command> where 4 parameters have been passed, and convert each of them into 2 separate calls to methods from my hierarchical list component.  In this case, I don't necessarily know the names of the variables I've used in the <command>GET LIST ITEM</command> calls, so a regex match is the only sure way of getting it right.  The calls to <command>GET LIST ITEM</command> could take either of the following forms:</para>
					<code>
	<codeline><command>GET LIST ITEM</command>($list;<command>Selected list item</command>($list);$itemRef;$itemText)<comment>`a comment</comment></codeline>
	<codeline><command>GET LIST ITEM</command>($list;$position;$itemRef;$itemText)</codeline>
	</code>
					<para>If a line containing the call has a trailing comment, I want to preserve the comment in the substituted text.  The replacements for these two calls would look like this:</para>
					<code>
	<codeline>$itemRef:=<method>hlist_Item_Ref_Get</method>($list;<command>Selected list item</command>($list))<comment>`a comment</comment></codeline>
	<codeline>$itemText:=<method>hlist_Item_Text_Get</method>($list;<command>Selected list item</command>($list))</codeline>
	<codeline></codeline>
	<codeline>$itemRef:=<method>hlist_Item_Ref_Get</method>($list;$position)</codeline>
	<codeline>$itemText:=<method>hlist_Item_Text_Get</method>($list;$position)</codeline>
	</code>
					<para>The following find and replace strings do this job for you in QCP.  <bold>Note:</bold> in this example, the replace string contains a line break.  It must be entered into QCP's replace string box as an actual line break, because \n and \r don't work there.  The easiest way to enter the line break is to copy and paste the string from another application.  There must also be an additional line break at the end of the replace string.</para>
					<code>
	<codeline><bold>Find String:</bold></codeline>
	<codeline>GET LIST ITEM\(([^;]+);([^;]+);([^;]+);([^;]+)\)(.*)$</codeline>
	<codeline></codeline>
	<codeline><bold>Replace String:</bold></codeline>
	<codeline>\3:=hlist_Item_Ref_Get(\1;\2)\5</codeline>
	<codeline>\4:=hlist_Item_Text_Get(\1;\2)</codeline>
	</code>
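					<para>The substitution can be checked in the same way outside QCP.  Again, this is a minimal sketch in Python, assuming Python's re module handles this pattern the same way QCP does; the helper name convert_get_list_item is hypothetical.  Unlike QCP's replace box, Python's re.sub does turn \n in the replacement into a line break, and because the pattern is applied to one line at a time, no extra trailing line break is needed.</para>

```python
import re

# Find string from the QCP example: four ';'-separated arguments with no ';'
# inside them.  Group 5 keeps whatever follows the call (e.g. a 4D comment).
GET_LIST_ITEM = re.compile(r"GET LIST ITEM\(([^;]+);([^;]+);([^;]+);([^;]+)\)(.*)$")

# Replace string from the QCP example; re.sub converts the \n escape
# in the template into a real line break.
REPLACEMENT = r"\3:=hlist_Item_Ref_Get(\1;\2)\5\n\4:=hlist_Item_Text_Get(\1;\2)"


def convert_get_list_item(text):
    """Split each 4-parameter GET LIST ITEM call into two calls (hypothetical helper)."""
    # Work line by line so the [^;] classes cannot run across line breaks.
    lines = text.split("\n")
    return "\n".join(GET_LIST_ITEM.sub(REPLACEMENT, line) for line in lines)


source = (
    "GET LIST ITEM($list;Selected list item($list);$itemRef;$itemText)`a comment\n"
    "GET LIST ITEM($list;$position;$itemRef;$itemText)"
)
print(convert_get_list_item(source))
```

					<para>It should print the same four lines as the replacements shown earlier in this section.  Applying the pattern line by line matters because the [^;] classes also match line breaks, so on a whole block of text a match could otherwise run on into the following line.  As with the previous pattern, a trailing comment containing a closing parenthesis would be pulled into group 4.</para>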
				</head_section>
				<head_section>
					<head>Additional Replace Options in QuickCodePro</head>
					<para>QCP has some replace options beyond those offered by PCRE.  These let you do things like adjust the capitalisation of the matched strings.  Options like these are useful when enforcing style guidelines, especially as 4D isn't case sensitive.  It's easy to get lazy about keeping your capitalisation consistent, and with these options QCP makes it much easier to go over many routines in one go to ensure they stick to the style guide.</para>
					<para>Previously, this kind of clean-up would have taken many separate find and replace passes, whereas it can now be carried out in just a few.</para>
					<para>One simple example is making sure that all local variables start with a lowercase character.  This may not always be practical, but it shows how the additional features can be applied.</para>
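					<para>QCP's own case-conversion syntax isn't shown here, so as an illustration only, the following Python sketch does the same job with a replacement function instead.  It assumes a local variable is a $ followed by a letter, so numbered parameters such as $1 are left alone; the helper name lowercase_locals is hypothetical.  Because it works on raw text, it would also change matching text inside string literals and comments.</para>

```python
import re

# A '$' followed by an uppercase ASCII letter: the start of a local variable
# name that breaks the lowercase convention.  Parameters like $1 don't match.
LOCAL_START = re.compile(r"\$([A-Z])")


def lowercase_locals(text):
    """Lowercase the first letter of every local variable name (hypothetical helper)."""
    # The replacement function rebuilds the match as '$' plus the lowered letter.
    return LOCAL_START.sub(lambda m: "$" + m.group(1).lower(), text)


print(lowercase_locals("$ItemRef:=hlist_Item_Ref_Get($List;$Position)"))
```

					<para>The example call prints $itemRef:=hlist_Item_Ref_Get($list;$position); the method name itself is untouched because it contains no $.</para>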
				</head_section>
			</head_section>
		</section>
<!--
		========================
		Section 6 of the Article
		Conclusion, etc.
		========================
-->
		
		<section>
			<title>Conclusion</title>
			<short_summary>The end of the article.</short_summary>
			<head_section>
				<head>Conclusion</head>
				<para>This article has shown a number of ways in which regular expressions can be applied to tasks a 4D developer may face.  There are many uses of regular expressions which I haven't yet covered and hope to add to the article over the coming months.  So far, I've only covered validating an email address, but there are a huge number of other form validation tasks which regular expressions make extremely easy.</para>
				<para>Also, the regular expression support in QCP is a very new feature, so it hasn't had time to sink in yet, with me at least.  I didn't expect to get this feature until version 5.</para>
				<para>The regular expression engines used in <a href="http://www.escape.gr/" target="_blank">QFree 2.0</a> and <a href="http://www.asgsoft.com/" target="_blank">QuickCodePro 4.2</a> are very similar, so you don't need to learn different types of regular expressions to get the most out of either product.</para>
				<para>Personally, I'm only sorry it took me so many years to sit down and learn about them.</para>
			</head_section>
			<head_section>
				<head>Disclaimer</head>
				<para>As stated earlier in the article, I am still a novice with regular expressions, but I learn a bit more about them every day.  As such, I cannot guarantee that any of the regex patterns in this tutorial will work correctly in your own environment, nor can I give any warranty on them.</para>
				<para>All the patterns and code shown in this tutorial have been tested with 4th Dimension 6.7.1, QFree 2.0 and QuickCodePro 4.2b6 on a PowerMac G4/400 with MacOS 9.0.4, and worked correctly during testing.  If you have a problem with any of the code displayed, please let us know so that we can get it fixed as soon as possible.</para>
				<para>As the patterns and code in this article are further optimised and improved, I will post the updates to the site.  If you'd like to be kept informed of changes to this page, please send me an email.</para>
				<para><bold><a href="mailto:mark@mitchenall.com">Mark Mitchenall</a></bold>, March 2001</para>
			</head_section>
		</section>
	</article>
</document>

