APL in 20 Minutes

This article is currently under construction

Which flavour of APL?

All code in this article is meant to work with APL2, APLX, and Dyalog APL. There are minor differences, which are pointed out where they matter.

Run APL and use the Session Manager

As a starting point let us assume that an imaginary user has just started APL by selecting the appropriate command from the Windows "Start" menu.

What you get is APL's development environment, the so-called session manager. Since APL is an interpreted language, you can type something into the session window and then press <Enter>. APL will try to evaluate your expression and display the result, or it will tell you that something is wrong. The symbol ⍝, for obvious reasons called "lamp", indicates a comment: anything to the right of a lamp character is ignored by the interpreter.

Here you can see some simple examples. Input lines are indented by 6 characters, the interpreter's response starts on the left:
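A session of this kind might look like the following sketch. The expressions are illustrative assumptions, not the author's original examples; they use only primitives common to APL2, APLX and Dyalog APL.

```apl
      2+3               ⍝ addition
5
      2×3 4 5           ⍝ a scalar times a vector
6 8 10
      name←'APL'        ⍝ assign a character vector to a variable
      name
APL
```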

Okay, we have a first impression.

APL: Powerful, Short, Concise

It is often claimed that with APL even complex problems can be solved in a few lines, if not in a single one. Is this really true?

I suggest that we solve a small (but not too small) real problem: let us take the source code of a web page and remove all the HTML tags from it. As a result we expect to see the real content of the page and nothing else. How fast can this be done?

The task

To make life a bit easier we are going to deal with well-formed code only. So let's take code from a website which is definitely supposed to provide well-formed code, although most HTML pages on the web are still syntactically incorrect:

http://www.w3.org/

You should see something similar to this in your browser window:

Right-click on the page and select "View Source" from the context menu. In a separate window you will now see something similar to this:

source.gif

Put the focus on the text, then press Ctrl+A to select the entire HTML code and finally press Ctrl+C to copy it to the clipboard. Now we return to APL's session window. We need to create a new variable to hold the HTML code we have just copied. The following statement does the trick:

Dyalog:
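Judging from the description that follows (the system command ")ed", the ∊ prefix and the variable name "myHtml"), the Dyalog statement is presumably the one below. This is a reconstruction from the surrounding text, not a verbatim copy of the original listing.

```apl
      )ed ∊myHtml
```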

While "myHtml" is the name of the variable we are going to create, ")ed" is a bit special: the closing parenthesis tells APL that this statement is a system command. The following two characters (ed) then tell APL to invoke the editor.

The ∊ character, which is in fact the Greek letter epsilon, is a prefix which tells APL to create a special type of variable: one which can hold a vector of strings. After pressing <Enter>, a new empty window pops up.

Now we can insert the HTML code from the clipboard into the edit window by selecting "Edit>Paste" from the menubar of the edit window.

We will see something like this:

ed_1.gif

Examine the HTML Code

After selecting "Exit" from the "File" menu APL will establish the variable in what is called a "workspace". Let's examine the length of the variable we have just created. For this purpose there is a function called "shape", represented by the "⍴" symbol, which is in fact the Greek "rho" character. Generally, in APL a function may take one argument or two arguments or no argument at all. In our case it is exactly one argument, the variable. The variable then must be specified to the right of the function:
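In the session this would presumably look as follows. The result 648 is the figure quoted in the text; the variable name myHtml is the one created above.

```apl
      ⍴myHtml           ⍝ shape: how many items does myHtml have?
648
```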

If you try this yourself, the result may not be exactly the same, because the page might have changed in the meantime.

The 648 represents the number of lines (or records) in the file the source code was saved in.

To find out the length of each of the strings in the 648 items of myHtml, we need to introduce APL operators, a concept that is radically different from anything called "operator" elsewhere. In APL, an operator takes at least one function as an operand. It then creates a so-called "derived function" by applying operator-specific rules to that function or functions.

Sounds impressive and means nothing to you? Well, let's try to work out what that means in practice. APL comes with an operator "Each", represented by the ¨ symbol (diaeresis). This operator takes its operand and applies it to every element of the array provided to the right.

To find out the length of every single string in myHtml we have to apply the function ⍴ to every single item in that variable:
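A minimal sketch of this step follows. The short vector "lines" is a hypothetical stand-in so that the effect can be checked by hand; the last expression is what the text describes for the real variable. No output is shown because the way nested results are displayed differs between interpreters.

```apl
      lines←'<html>' '<body>' ''     ⍝ hypothetical stand-in for myHtml
      ⍴¨lines                        ⍝ lengths 6, 6 and 0, each as a one-item vector
      ⍴¨myHtml                       ⍝ the same for all 648 lines
```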

In fact this is a loop, executed exactly 648 times, but we do not need to know this, or to care about it. Every programmer must be getting excited right now!

Operators

Let us deviate from the main point for a moment and introduce another operator.

The operator "reduce", represented by the / character, is defined as follows: take its operand (the function) and put it between all the items of the array passed to the derived function. The expression:

therefore means that according to the rules we have just defined, APL will build up this:
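As an illustration, assume the expression in question is a plus-reduction over four numbers (the numbers are an assumption chosen for this sketch). The derived function and the expression APL builds from it then give the same result, and any other suitable function can take the place of +:

```apl
      +/1 2 3 4         ⍝ plus-reduce
10
      1+2+3+4           ⍝ what APL builds up from it
10
      ×/1 2 3 4         ⍝ the same idea with multiplication: 1×2×3×4
24
```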

You might have an idea how extraordinarily powerful this concept is, since you can specify any function here which fits, including self-defined ones.

Extract content from Code

The Strategy

Now we want to get the content from the code. For this we need a strategy:

So the strategy is easy: build up a mask of Booleans which allows us to get rid of all the HTML stuff. Let's work out how we can do this in APL. In the first step, we create a small string we can play with:
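A small test string could be created like this. Both the name "s" and its content are assumptions made for this sketch; the content was chosen so that its first seven characters fit the step-by-step walk-through further down.

```apl
      s←'<div>A</div>B' ⍝ hypothetical sample: two tags, content A and B
      ⍴s
13
```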

Find Start Points and End Points

First, we need to know where the < and the > characters are. For this, we have to master the "membership" function, represented by the Greek character ∊.

It does exactly what the name suggests: for every element of the left argument it checks whether that element is contained in the right argument. If that is the case, a 1 is returned, otherwise a 0:
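Two small examples of membership, with arguments invented for this sketch:

```apl
      'abc'∊'cxa'       ⍝ a and c occur on the right, b does not
1 0 1
      3 7 9∊1 2 3       ⍝ only the 3 occurs on the right
1 0 0
```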

As you can see, Booleans are represented by 1 (true) and 0 (false) in APL.

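For the next step we need another operator which is very close to the "reduce" operator we already met: it is represented by the \ character and called "expand" in this article (with a function as operand, most APL references call it "scan"). Let us start with some expressions to get familiar with this new operator: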

Expand returns a result for every single step. Let's follow the interpreter step by step:

  1. The first item (the 1) is taken and printed
  2. The first and the second item are added up and the result (3) is printed
  3. The result we just got (3) is taken and added to the third element, which results in 6
  4. The result we just got (6) is taken and added to the fourth element, which results in 10
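The four steps above correspond to the following expression. The argument 1 2 3 4 is inferred from the intermediate results 1, 3, 6 and 10 named in the list.

```apl
      +\1 2 3 4         ⍝ running sums: 1, 1+2, 1+2+3, 1+2+3+4
1 3 6 10
      +/1 2 3 4         ⍝ reduce, for comparison, returns only the final value
10
```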

So far so good. Let's try the same thing with the ≠ function. This is a very simple function that takes a left and a right argument and checks whether they are different. The result is a Boolean:
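A few illustrative calls of ≠, with arguments made up for this sketch:

```apl
      3≠4               ⍝ different, so true
1
      3≠3               ⍝ equal, so false
0
      1 0 1≠1 1 0       ⍝ works item by item on vectors
0 1 1
```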

In APL, you can use Booleans in arithmetic operations:

First 9=9 is processed (right-to-left!) which returns a 1 for true and then the 1 is added to 3.
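The expression described here is presumably the following; the second line is an additional illustration of the same idea, not taken from the original.

```apl
      3+9=9             ⍝ 9=9 gives 1, then 3+1
4
      +/1 0 1 1         ⍝ summing Booleans counts the 1s
3
```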

Let's use the membership function to find out where the < and > are located:
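Applied to a sample string, this could look as follows. The string "s" is a hypothetical example chosen for this sketch.

```apl
      s←'<div>A</div>B' ⍝ hypothetical sample string
      s∊'<>'            ⍝ 1 wherever s holds a < or a >
1 0 0 0 1 0 1 0 0 0 0 1 0
```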

And now we put some magic in place:

That is a big step forward. Let's look into the details:

According to the rule we just worked out, the "expand" operator \ performs the following steps:

  1. Take the first item (1) and print it
  2. Take the first and second item and pass them as left and right argument to the function, here ≠. That leads to 1 ≠ 0, which is true, so a 1 is returned.
  3. Take the last result (1) and take the third item: 1 ≠ 0 results in 1
  4. Take the last result (1) and take the fourth item: 1 ≠ 0 results in 1
  5. Take the last result (1) and take the fifth item: 1 ≠ 1 results in 0
  6. Take the last result (0) and take the sixth item: 0 ≠ 0 results in 0
  7. Take the last result (0) and take the seventh item: 0 ≠ 1 results in 1
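The steps above can be reproduced with a sample string whose first seven characters yield the Booleans 1 0 0 0 1 0 1. The string "s" is a hypothetical example chosen to match the walk-through, not the author's original.

```apl
      s←'<div>A</div>B' ⍝ hypothetical sample string
      ≠\s∊'<>'          ⍝ 1 from each < up to, but not including, the next >
1 1 1 1 0 0 1 1 1 1 1 0 0
```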

Now we can use the vector of Booleans we got to mask the HTML code. For this we again use "reduce" (/), but this time we provide not a function as left operand but the vector of Booleans (used this way, / is usually called "compress" or "replicate"). An easy example:
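A simple case might look like this, with values invented for this sketch:

```apl
      1 0 1 0 1/'abcde' ⍝ keep the 1st, 3rd and 5th character
ace
```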

As you can see, items associated with a 1 are still represented in the result while those associated with a zero are not. We can use this to do:
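With a hypothetical sample string "s", applying the mask computed earlier would give:

```apl
      s←'<div>A</div>B' ⍝ hypothetical sample string
      (≠\s∊'<>')/s      ⍝ keeps exactly the parts we want to get rid of
<div</div
```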

Oops. We were looking for the content, not the HTML code. We have to negate the Booleans. This can be done with the function ~:
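A sketch of the negated mask, again using a hypothetical sample string "s":

```apl
      s←'<div>A</div>B' ⍝ hypothetical sample string
      (~≠\s∊'<>')/s     ⍝ the content survives, but so does every >
>A>B
```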

We are almost there. The only remaining problem is that the closing > of each piece of HTML code has survived. In the next step the Boolean "or" function (∨) is used to solve this problem:
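One way to write this step in Dyalog APL is sketched below. The exact expression is an assumption pieced together from the surrounding description, and the sample string "s" is hypothetical. The braces use Dyalog's direct-function syntax, which APL2 and APLX do not share.

```apl
      s←'<div>A</div>B' ⍝ hypothetical sample string
      {⍺←'<>' ⋄ mask←{⍵∨≠\⍵}⍵∊⍺ ⋄ (~mask)/⍵}s
AB
```

The inner function combines the positions of the delimiters with the running mask via ∨, so both < and > end up marked for removal.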

The curly braces define an anonymous function. No reason to go into the details here; let's only note that inside the braces the ⍺ character stands for an optional left argument, while the ⍵ character stands for the mandatory right argument. If ⍺ is set in the first line, this is taken as a default which is overridden if a left argument is actually provided. If we assign the braces to a name, we can use that name to invoke the function later:
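A possible one-line definition in Dyalog APL is shown below. The names "Strip" and "mask" come from the text; the body is a reconstruction under that description, not necessarily the author's exact code.

```apl
      ⍝ ⍺: delimiters (default '<>'), ⍵: the string to clean
      Strip←{⍺←'<>' ⋄ mask←{⍵∨≠\⍵}⍵∊⍺ ⋄ (~mask)/⍵}
```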

Note that we have defined a function "Strip" which in turn defines an anonymous function. However, this function is not visible to the outside world. The vector of Booleans is assigned to the variable "mask" which is by default a local one. That means that this variable exists only inside the "Strip" function.

Problem solved

Let us test our newly created function:
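A quick check might look like this. The definition of Strip is repeated so that the sketch stands on its own; it is a reconstruction, and the test string is hypothetical.

```apl
      Strip←{⍺←'<>' ⋄ mask←{⍵∨≠\⍵}⍵∊⍺ ⋄ (~mask)/⍵}
      Strip '<div>A</div>B'
AB
```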

Looks good. Now let's apply this function to all items from the W3C's website with the help of the "each" operator and assign the result to a variable "content". We will then display the result in an editor window:
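Assuming Strip and myHtml exist as described above, the two statements would presumably be:

```apl
      content←Strip¨myHtml   ⍝ apply Strip to each of the lines
      )ed content            ⍝ open the result in the editor
```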

The result obviously contains many empty lines. If you are surprised: any line in the source code which contains nothing but HTML tags is now empty. This is true for all the lines containing <meta> tags, for example.

So let's remove all empty lines in a last step without looking into the details:
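One way to do this is sketched below on a small hypothetical vector; the original expression is not known and may well have been different. Each item is compared with an empty string, and only the items that do not match are kept.

```apl
      content←'W3C' '' 'Standards' ''      ⍝ hypothetical stand-in for the real content
      content←(~content≡¨⊂'')/content      ⍝ drop every item that matches ''
      ⍴content
2
```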

Having executed that expression, the contents of the editor window have changed:

Make it General

A last remark: the function "Strip" is able to do more than simply remove HTML code. Thanks to the fact that we can provide a left argument which will override the default (<>) we can use the function for other purposes as well:
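For example, passing a different pair of delimiters as the left argument removes bracketed remarks instead of HTML tags. The definition of Strip is the reconstruction used in this sketch, and the sample text is invented.

```apl
      Strip←{⍺←'<>' ⋄ mask←{⍵∨≠\⍵}⍵∊⍺ ⋄ (~mask)/⍵}
      '[]' Strip 'Hello[note] world'
Hello world
```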

Created by KaiJaeger

