(alternate) Introduction to Custom markup for Beginners

(:Summary:Alternate introduction to custom markup for beginners:)

The most common kind of "plugin" (I'll call it a "recipe" from here on, because that's what they're called in the PmWiki world) establishes some kind of "markup rule". This means you define a particular "pattern" of text in your page which triggers some action and causes that text to be replaced with something else.

The simplest possible markup would be a straight replacement. Here is a markup to replace all occurrences of the letter "a" with the letter "z":

Markup('a2z', '>{$var}', '/a/', 'z');

Then your page with this text:

The alphabet begins with "abc"

will display as this:

The zlphzbet begins with "zbc"

It's not very useful, but it gives you the most basic idea of what markup text is doing.
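Stripped of the PmWiki machinery, the a-to-z rule amounts to a single preg_replace() call. The following is only a minimal standalone sketch of that idea (it does not call Markup() and does not need PmWiki installed):

```php
<?php
// Standalone sketch: the 'a2z' rule reduced to the preg_replace() call
// that its pattern and replacement arguments describe.
$text = 'The alphabet begins with "abc"';

// Argument 3 of Markup() is the pattern, argument 4 the replacement.
$out = preg_replace('/a/', 'z', $text);

echo $out, "\n";
```

Running it prints the same "zlphzbet" line shown above.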

Creating a new markup involves calling the PHP Markup() function. This is usually done by editing your config.php, but you can also put it in a custom group or custom page PHP file -- you can read about those options at LocalCustomizations and PerGroupCustomizations.

The Markup() function takes 4 arguments:

1: The arbitrary name you are going to give your new markup. It should be short but descriptive. Be careful you don't use the same name as another markup out there or that markup will no longer be active. (You can see the standard markup definitions in scripts/stdmarkup.php.)

2: An indicator of WHEN you want this to occur. PmWiki has dozens of these markup rules and the order in which they run makes a big difference. If one markup rule (#1) changes all occurrences of "a" into "b" and another markup rule (#2) changes all occurrences of "az" into "zz", the result depends on which one runs first. If #1 runs before #2 on the text "azazaz" then you will end up with "bzbzbz". But if #2 runs before #1 then you will end up with "zzzzzz". This argument is normally specified as a left-angle bracket ("before") or a right-angle bracket ("after") followed by the name of another rule. In my experience the most significant rule in terms of ordering is "{$var}", which substitutes variables -- if you say "<{$var}" then your markup will be processed before variables are substituted, whereas if you say ">{$var}" then your markup will be processed after variables are substituted. There are lots of other places in the whole order of rules -- someone else will have to go into more detail if you need it. The CustomMarkup page gives some good pointers there.
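The ordering effect described in item 2 can be checked with two plain preg_replace() calls. This is just a sketch; rule1() and rule2() are hypothetical helper names made up for the illustration, not PmWiki functions:

```php
<?php
// Hypothetical stand-ins for two markup rules (names are illustrative only).
function rule1(string $t): string { return preg_replace('/a/', 'b', $t); }   // "a"  -> "b"
function rule2(string $t): string { return preg_replace('/az/', 'zz', $t); } // "az" -> "zz"

$text = 'azazaz';

// Rule #1 first: every "a" is already gone, so rule #2 finds nothing to match.
$firstThenSecond = rule2(rule1($text));

// Rule #2 first: every "az" becomes "zz", so rule #1 finds no "a" left.
$secondThenFirst = rule1(rule2($text));

echo $firstThenSecond, "\n"; // bzbzbz
echo $secondThenFirst, "\n"; // zzzzzz
```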

Arguments 3 and 4 are simply arguments which will be passed to preg_replace. You search for argument #3 and you replace it with argument #4.

3: This is a regular expression. It can be as simple as "/a/" (match every occurrence of the character "a") or a very complicated and intricate pattern. Every time this pattern matches in your text, the match will be replaced with argument #4. Note that your pattern is surrounded by forward slashes and there can be modifiers after the closing forward slash. These modifiers are single characters; you can read more about them at http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php. The key ones are "i" (ignore case), "s" (allow dot to match newlines), "m" (allow ^ and $ to match before/after newlines as well as at the begin/end of the string), and [perhaps most importantly in this context] "e" (evaluate the replacement text as a PHP expression - this allows you to call functions to do much more complicated things than a simple search/replace). Be aware that PHP deprecated the "e" modifier in version 5.5 and removed it in 7.0, so on current PHP the same effect requires a callback function instead.

4: This is the replacement text. It can be a simple string or it can include things like $1, $2, etc if you have parenthesized groups in argument #3 (you've got to be careful to put backslashes in front of the $ or else surround it in single-quotes, etc to delay the interpolation of those variables). Or it can be a call to a PHP function if you included the /e modifier in argument #3. Once you are into PHP functions then you need to read some of the many PHP tutorials on the net to see which way to go.
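To make arguments 3 and 4 concrete, here is a small sketch outside of PmWiki. The (:shout ...:) markup is invented purely for this illustration. The first call uses a captured group ($1) in a plain replacement string; the second uses preg_replace_callback(), which is how current PHP versions get the "call a function on the match" behaviour that the /e modifier used to provide:

```php
<?php
// "(:shout ...:)" is a made-up markup used only for this sketch.
$text = 'Say (:SHOUT hello:) and (:shout bye:)';

// Single quotes stop PHP from touching $1 before preg_replace() sees it;
// the "i" modifier makes SHOUT and shout both match.
$bold = preg_replace('/\\(:shout (.*?):\\)/i', '<b>$1</b>', $text);

// Same pattern, but the replacement is computed by a function.
// $m[0] is the whole match, $m[1] the first parenthesized group.
$upper = preg_replace_callback(
  '/\\(:shout (.*?):\\)/i',
  function (array $m): string { return strtoupper($m[1]); },
  $text
);

echo $bold, "\n";  // Say <b>hello</b> and <b>bye</b>
echo $upper, "\n"; // Say HELLO and BYE
```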

Having said all that, the single best way to learn how to write your own recipe or markup is to look at examples of what other people have done.

The `(:comment ...:)` markup rule

Here is the definition of the markup rule for the (:comment ...:) markup from scripts/stdmarkup.php:

Markup('comment', 'directives', '/\\(:comment .*?:\\)/i', '');

The purpose of this markup is to allow you to put some kind of text in your source that is simply not displayed when browsing the page. So let's look at each argument:

ARG1: 'comment' -- a short, descriptive name - the ID of the rule
ARG2: 'directives' -- this is one of the 9 phases; it answers the question of when the markup rule should be processed.
ARG3: '/\\(:comment .*?:\\)/i' -- this is a regular expression that will match (:comment ANY TEXT HERE:) -- this is the pattern that will be searched for. Since the last slash is followed by an i ("/i"), (:COMMENT some text:) and (:CoMmEnT some text:) would be matched as well -- the pattern being matched is case insensitive.
ARG4: '' -- any occurrence of that pattern will be replaced with NOTHING. Thus the comment will simply disappear which is exactly what you want.
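The effect of the pattern and the empty replacement can be tried in isolation. This sketch only runs the regular expression through preg_replace(); the sample text is invented:

```php
<?php
// Invented sample text containing two comments in different letter case.
$text = 'Visible (:comment hidden note:)text and (:COMMENT another:)more';

// Pattern and replacement as in the 'comment' rule: each match is
// replaced with the empty string, so the comments vanish.
$out = preg_replace('/\\(:comment .*?:\\)/i', '', $text);

echo $out, "\n"; // Visible text and more
```

The ".*?" is non-greedy, so each comment ends at its own closing ":)" rather than swallowing everything up to the last one on the page.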

The `(:include ...:)` markup rule

Let's look at another example from stdmarkup.php, the (:include PAGENAME:) markup rule. This rule is designed to pull the text from another page into the current page. Here's how it's defined in stdmarkup.php:

Markup('include', '>if',
  '/\\(:include\\s+(\\S.*?):\\)/ei',
  "PRR(IncludeText(\$pagename, PSS('$1')))");

Each argument, in order:

ARG1: 'include' -- short and descriptive identification of what this rule does
ARG2: '>if' -- process this rule after the rule with the ID 'if'
ARG3: '/\\(:include\\s+(\\S.*?):\\)/ei' -- this regular expression matches a pattern (:include pagename:) where "pagename" begins with a non-whitespace character and runs up to the closing ":)". (Whitespace is a space or a tab or a newline character.) The important change is the /ei at the end. You already know that the "i" means to make the match case insensitive. The "e" means that the replacement text is a PHP expression that should be evaluated. (It also means that a bunch of backslashes will be put in front of certain characters that come from the search pattern when you use parentheses for regex captures.) Note that the "\\S.*?" is surrounded by parentheses which means it will be captured and available as $1 in the replacement text.
ARG4: "PRR(IncludeText(\$pagename, PSS('$1')))" -- this is what the search pattern will be replaced with. But since that "e" was present above, this text will be interpreted as a PHP expression and evaluated. Note several things about this text that are important:
- any variable names are protected from immediate substitution, either by putting a backslash in front of the dollar sign or by putting single-quotes around them.
- The function PSS('$1') "Strips Slashes" -- it gets rid of those slashes that are inserted by the /e option above.
- The function IncludeText() is a PmWiki function which accepts 2 arguments (a reference pagename and the pagename of the text that should be retrieved) and it returns the text from that page. This means that the (:include pagename:) in your source will be replaced by the text from that page.
- The surrounding PRR() just tells PmWiki to run through the markup rules again in case something came in from the included text that needs to be processed by rules that have already been processed. PRR() always returns the value of its argument, so it's kind of a transparent function that has a side effect...
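The capture-and-look-up idea behind this rule can be sketched without PmWiki. Everything PmWiki-specific is replaced by assumptions here: $pages is a hypothetical in-memory page store, includeSketch() is a made-up stand-in for IncludeText(), and a callback takes the place of the /e modifier. The re-processing done by PRR() is not modelled:

```php
<?php
// Hypothetical page store standing in for PmWiki's page files.
$pages = ['Main.Footer' => 'Footer text'];

// Made-up stand-in for IncludeText(): returns the stored text,
// or an empty string when the page does not exist.
function includeSketch(array $pages, string $name): string {
  return $pages[$name] ?? '';
}

$text = 'Body. (:include Main.Footer:)';

// Same pattern as the 'include' rule, minus the "e" modifier.
// $m[1] holds what the parentheses captured (the page name).
$out = preg_replace_callback(
  '/\\(:include\\s+(\\S.*?):\\)/i',
  function (array $m) use ($pages): string {
    return includeSketch($pages, $m[1]);
  },
  $text
);

echo $out, "\n"; // Body. Footer text
```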

The `(:nogroupheader:)` markup rule

This markup rule is for the purpose of suppressing a group header from being displayed. For this to occur the global variable $GroupHeaderFmt must be set to a blank string. Here's the markup definition:

Markup('nogroupheader', '>include',
  '/\\(:nogroupheader:\\)/ei',
  "PZZ(\$GLOBALS['GroupHeaderFmt']='')");

ARG1: 'nogroupheader' -- short and descriptive identification of what this rule does
ARG2: '>include' -- process this rule after the rule with the ID 'include' (yes, that's the rule we just looked at above)
ARG3: '/\\(:nogroupheader:\\)/ei' -- a very simple regular expression with the same /ei options we saw above
ARG4: "PZZ(\$GLOBALS['GroupHeaderFmt']='')"
- Once again the variables are protected by escaping the dollar sign (placing a backslash in front of it)
- The global variable $GroupHeaderFmt is accessed by the PHP super-global $GLOBALS[]
- The core of this rule has this PHP text: $GLOBALS['GroupHeaderFmt']='' -- it sets that value to blank
- The surrounding PZZ() simply says to get rid of any return value -- the search pattern will be replaced with a blank string
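The "side effect, then replace with nothing" pattern can also be sketched on its own. In this illustration dropReturn() is a hypothetical helper playing the role of PZZ(), the initial value given to GroupHeaderFmt is only a placeholder, and a callback is used in place of the /e modifier:

```php
<?php
// Placeholder starting value (an assumption, for illustration only).
$GLOBALS['GroupHeaderFmt'] = 'some header format';

// Hypothetical helper playing the role of PZZ(): ignore the value, return ''.
function dropReturn($value): string { return ''; }

$text = '(:nogroupheader:)Page body';

$out = preg_replace_callback(
  '/\\(:nogroupheader:\\)/i',
  function (array $m): string {
    // The assignment is the side effect; its result is thrown away,
    // so the directive itself is replaced with an empty string.
    return dropReturn($GLOBALS['GroupHeaderFmt'] = '');
  },
  $text
);

echo $out, "\n";                                  // Page body
var_dump($GLOBALS['GroupHeaderFmt'] === '');      // bool(true)
```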
