Data
APL owes a considerable amount of its power and conciseness to the way it handles data. Many lines of code in non-APL programs are devoted to 'dimensioning' the data the program will use and to setting up loops and counts to control data structure. With APL you can create variables dynamically as you need them, and the structure you give a data item when you create it determines how it will be treated when it is processed.
Data is an important subject in APL. The rest of this chapter is a survey of its main characteristics.
Variables
As in most programming languages, data can be directly quoted in a statement, for example:
234.98 × 3409÷12.4
or it can be 'assigned' to a name by the symbol ←, in which case it's called a variable:
VAR ← 183.6
We concentrate on variables in this chapter, but the comments on data type, size and shape are equally applicable to directly quoted numbers and characters.
Names
Variables, user-defined functions and user-defined operators have names which are composed of letters and digits. The full rules vary from one APL to another, but here are some examples:
PRICE A albert A999 ITEM∆1 THIS_ONE That¯One
APL uses upper-case and lower-case characters. Many APLs also allow the symbols Delta (∆), underlined Delta (⍙), the underline (_), and the high minus (¯).
Types of data
Data can be numbers, characters or a mixture of the two. Characters are enclosed in single quotes and include any letter, number or symbol you can type on the keyboard, plus other, non-printing characters. The space counts as a character:
this item is 7 characters long: '1 ABC. '
this is a single number: 84724.869
this is a number and a character: 12.3 'E'
Numeric digits, if enclosed in quotes, have no numeric significance and can't be involved in arithmetic.
this is a numeric value: 2876
this variable is composed of 3 characters: '749'
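The distinction between a numeric value and a string of digit characters can be checked in a session. The following is a minimal sketch; the names N and C are hypothetical, and the one-argument form of ⍴ (which reports size) is described later in this chapter.

```apl
N ← 2876        ⍝ a number
C ← '749'       ⍝ three characters, not a number
N + 1           ⍝ → 2877
⍴ C             ⍝ → 3      (C is a three-element character vector)
C = '7'         ⍝ → 1 0 0  (characters can be compared, but not added)
```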
In addition to these basic types, some versions of APL also support complex numbers and some support APL classes and objects.
Size, shape and depth
An array in APL can be anything from a single letter or number to an N-dimensional array. Elements within the item may themselves be arrays. Here are some examples of data items:
a single number or a single character, formally known as a Scalar
e.g.
294
or
'A'
a list of numbers or characters, formally known as a Vector
e.g.
23 8 0 12 3
or
'A B C'
or
28 3 'A' 'BC'
a table of numbers or characters, formally known as a Matrix
e.g.
 7 45  2 89
16 15 10 21
 8  0 13 99
83 19  4 27
or
WILSO  393
ADAMS 7183
CAIRN   87
SAMSO 8467
As you'll have gathered, data is considered to have dimensions.
A single number or character scalar (like a point) has no dimensions. A vector has one dimension, length. A matrix has two dimensions, height and length. The word 'array' is a general term applicable to a data structure of any dimension. Arrays of many dimensions are possible in APL.
An array which contains other arrays is called nested. An array which does not is called simple.
This is how APL displays a three-dimensional array:
23 30 11  8
30 22 23 20
 3 19 27  9

14 23 15  8
 9 11  5 15
27 28  2 28

16 16 10 30
15  8  3 29
 3 16 12  9
Each of the three blocks of numbers has two dimensions represented by the rows and columns. The three blocks form three planes which constitute another dimension, depth. You will notice that the array is displayed on the screen in such a way that you can identify the different dimensions. No spaces are left between the rows of each plane. One blank line is left between adjacent planes. A four-dimensional array would be displayed with two blank lines between each set of planes.
More complicated arrays, where some of the elements are themselves arrays, will also have a 'depth' which measures the degree of complexity of the structure. Thus a simple scalar has a depth of 0 and a structure whose elements are purely simple scalars (such as the array shown above) has a depth of 1. If any element of an array is itself an array, the array has a depth of 2. The depth will go on increasing with the complexity of the structure. An array which has an element which in turn has a non-scalar element has a depth of 3, and so on.
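The depth rule can be tried out directly. The sketch below assumes the one-argument form of ≡ (Depth), which this chapter does not describe but which nested-array APLs generally provide. The examples are deliberately uniform in structure, because interpreters differ in how they report the depth of unevenly nested arrays.

```apl
≡ 294                           ⍝ → 0  (simple scalar)
≡ 23 8 0 12 3                   ⍝ → 1  (vector of simple scalars)
≡ 'ABC'                         ⍝ → 1
≡ (1 2 3) (4 5 6)               ⍝ → 2  (each element is itself a vector)
≡ ((1 2) (3 4)) ((5 6) (7 8))   ⍝ → 3  (elements whose elements are vectors)
```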
Setting up data structures
It isn't always necessary to explicitly define the size or shape of data:
X ← 23 9 144 12 5 0
In the case above, X is a six-element vector, by virtue of the fact that six elements are assigned to it. Vectors which contain both characters and numbers may be set up by enclosing the characters in ' (quote) characters. Here is another six-element vector, this time containing four numbers and two characters.
X ← 1 2 'A' 'B' 3 4
Explicit instructions would be necessary if we wanted the six elements to be rearranged as rows and columns. The two-argument form of the function ⍴ (Rho) is used to give such instructions:
      2 3 ⍴ 23 9 144 12 5 0
23 9 144
12 5   0
The left argument specifies the number of rows (in this case 2) and the number of columns (in this case 3). The right argument defines the data to be arranged in rows and columns.
Notice that the dimensions are always specified in this order, that is:
- columns are the last dimension
- rows precede columns and, if there are only two dimensions, are the first dimension.
In the case of data with more than two dimensions, the highest dimension comes first. So in the three-dimensional example used earlier, the plane dimension is the first dimension followed by the rows, then the columns. (The ordering of dimensions is an important point and will be discussed again later in this chapter.)
To return to the ⍴ function, if the data in the right argument is insufficient to fill the matrix as specified, APL simply goes back to the beginning of the data and uses it again. If too much data is supplied, APL uses it in the order given and ignores superfluous data.
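The reuse and truncation rules described above can be seen with small, arbitrary data. This is an illustrative sketch only; the expected displays are shown as comments.

```apl
2 3 ⍴ 1 2 3 4 5 6       ⍝ exactly enough data
⍝ 1 2 3
⍝ 4 5 6
2 3 ⍴ 1 2               ⍝ too little: the data is reused from the start
⍝ 1 2 1
⍝ 2 1 2
2 3 ⍴ 1 2 3 4 5 6 7 8   ⍝ too much: 7 and 8 are ignored
⍝ 1 2 3
⍝ 4 5 6
```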
Arrays of three or more dimensions are set up in a similar way to matrices. The following statement specifies that the data in a variable called NUMS is to be arranged in three planes, each consisting of three rows and four columns:
3 3 4⍴NUMS
The result would look like the three-dimensional array shown in the previous section.
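As a sketch of the same statement with concrete data, the example below fills NUMS with the integers 1 to 36 using ⍳ (introduced later in this chapter) and stores the result under the hypothetical name CUBE. The contents of NUMS are an assumption made for illustration; Index Origin 1 is also assumed.

```apl
NUMS ← ⍳ 36             ⍝ assumed data: the integers 1 to 36
CUBE ← 3 3 4 ⍴ NUMS     ⍝ 3 planes, each of 3 rows and 4 columns
⍴ CUBE                  ⍝ → 3 3 4  (planes, rows, columns)
⍴ ⍴ CUBE                ⍝ → 3      (the number of dimensions)
```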
The ⍴ function can also be used to set up vectors. This statement specifies that the number 9 is to be used to form a six-element vector:
      6⍴9
9 9 9 9 9 9
Arrays of arrays (or 'nested arrays') may be set up by a combination of these rules. Here we set up another vector, some of whose elements are themselves vectors or matrices. Note the use of parentheses to indicate those elements which are actually arrays.
VAR ← (2 3⍴9) (1 2 3) 'A' 'ABCD' 88 16.1
The variable VAR is another six-element vector, but its first element is a 2 by 3 matrix, the second a three-element vector, the third a single character, and so on.
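The structure of VAR can be inspected element by element. The sketch below assumes the two-argument form of ⊃ (Pick), which is not covered in this chapter, to extract an element as an array in its own right, and it assumes Index Origin 1.

```apl
VAR ← (2 3⍴9) (1 2 3) 'A' 'ABCD' 88 16.1
⍴ VAR                   ⍝ → 6      (six elements in all)
⍴ 1 ⊃ VAR               ⍝ → 2 3    (the first element is a matrix)
2 ⊃ VAR                 ⍝ → 1 2 3
⍴ 4 ⊃ VAR               ⍝ → 4      (the character vector 'ABCD')
```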
Data structure versus data value
A data structure has certain attributes, regardless of the specific data it contains. For example, a vector has one dimension while a single number has no dimensions.
You can take advantage of this fact.
If you intend to use a single number for certain purposes, it may be convenient to set it up as a one-element vector. In this next example X is defined as a one-element vector containing the value 22:
X ← 1 ⍴ 22
For contrast, here 22 is assigned to Y as a single number:
Y ← 22
The difference between X and Y will be seen if we apply the one-argument form of ⍴ to each of them. (This form of ⍴ tells you the size of each dimension of a data item.)
      ⍴X
1
      ⍴Y
(empty response)
Both variables contain the value 22. But X is a vector and has the dimension of length, so the ⍴ enquiry produces the answer 1 indicating that X is one-element long. On the other hand, Y is a single number with no dimensions. The answer 1 would be inappropriate since it would suggest that it had the dimension of length. So an empty answer is displayed.
The result of the ⍴ enquiry can itself be used as data in an APL statement. It might, for example be the basis of a decision about what to do next. For this reason, it may suit you to define a value sometimes as a one-element vector and sometimes as a single number.
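One way to turn the ⍴ enquiry into a value a program can test is to apply ⍴ twice, which gives the number of dimensions. The following is a sketch of that idea using the X and Y defined above; it is one possible approach, not the only one.

```apl
X ← 1 ⍴ 22              ⍝ one-element vector
Y ← 22                  ⍝ single number
⍴ ⍴ X                   ⍝ → 1  (one dimension)
⍴ ⍴ Y                   ⍝ → 0  (no dimensions)
0 = ⍴ ⍴ Y               ⍝ → 1  (true: Y is a single number)
0 = ⍴ ⍴ X               ⍝ → 0  (false: X is a vector)
```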
Similarly, it may be convenient in certain situations to define a vector as a one-row matrix. Here Z is defined as a matrix of one row and five columns:
Z ← 1 5 ⍴ 12 5 38 3 6
It looks like a vector when displayed:
      Z
12 5 38 3 6
But an enquiry about its size returns information about both its dimensions:
      ⍴Z
1 5
Empty data structures
Variables which have a structure but no content may also be useful, for example as predefined storage areas to which elements can be added. An 'empty vector' is a variable which has been defined as a vector, but which has no elements. Similarly, an 'empty matrix' has the appropriate structure, but no elements.
There are many ways of creating empty data structures. To take one example, the function ⍳ (Iota) produces a vector of consecutive integers, with as many elements as its right-hand argument specifies. So ⍳0 produces a vector of no numbers, that is, a vector in which there are no elements:
X ← ⍳0
X contains no elements, as can be demonstrated by displaying its contents (nothing is displayed):
X
But it is a vector (albeit an empty one) and does have the dimension of length. If the one-argument form of ⍴ is used to enquire about the size of its dimensions, the answer 0 is returned:
      ⍴X
0
This indicates that its length is zero elements. Contrast this with the answer returned if you apply ⍴ to a single number (which has no dimensions):
⍴ 45
An empty answer is displayed since the item has no dimensions.
An empty matrix can be created in the same way as an empty vector. In the following example, an empty matrix is created consisting of 3 rows and no columns:
TAB ← 3 0⍴⍳0
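The sketch below shows an empty vector being used as a storage area that grows as elements are added, and confirms the shape of the empty matrix. It assumes the , (Catenate) function, which is not described in this chapter; the appended values are arbitrary.

```apl
X ← ⍳ 0                 ⍝ empty vector
⍴ X                     ⍝ → 0
X ← X , 5               ⍝ append one element
X ← X , 7 9             ⍝ append two more
X                       ⍝ → 5 7 9
⍴ X                     ⍝ → 3
TAB ← 3 0 ⍴ ⍳ 0         ⍝ empty matrix: 3 rows, no columns
⍴ TAB                   ⍝ → 3 0
```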
Dimension ordering
When a function is applied to an item with more than one dimension, you need to know which dimension the function will operate on. If you apply an add operation to a matrix, for example, will it produce the sums of the rows or the sums of the columns?
       COL 1   COL 2   COL 3   COL 4
ROW 1     1   +   2   +   3   +   4   = 10
ROW 2     5   +   6   +   7   +   8   = 26
ROW 3     9   +  10   +  11   +  12   = 42
         ==      ==      ==      ==
         15      18      21      24
The rule is that unless you specify otherwise, operations take place on the last dimension.
The 'last' dimension is the one specified last in the size statement:
TABLE ← 3 4⍴DATA
The 4 above is the last of the two dimensions specified. It represents the number of columns.
An add operation 'on' the columns adds each element in column 1 to the corresponding element in columns 2, 3 and 4.
COL 1   COL 2   COL 3   COL 4
  1   →   2   →   3   →   4   = 10
  5   →   6   →   7   →   8   = 26
  9   →  10   →  11   →  12   = 42
So, as can be seen, an add operation 'on' the columns produces the sum of the elements in each row.
Similarly, if you were to apply the add operation to the first dimension of the matrix, that is to the rows, it would add all the items in row 1 to the corresponding items in rows 2 and 3:
ROW 1 |   1    2    3    4
      |   ↓    ↓    ↓    ↓
ROW 2 |   5    6    7    8
      |   ↓    ↓    ↓    ↓
ROW 3 |   9   10   11   12
          ↓    ↓    ↓    ↓
         15   18   21   24
So an add operation applied to the rows produces the sum of each column.
As already described, by default operations are applied to the last dimension (the columns). If you want to specify a different dimension, you can do so by using the axis ([]) operator which is discussed later in the Operators section.
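The two diagrams above can be reproduced in a session. The sketch below assumes the +/ form (addition applied along a dimension with the Reduce operator), which belongs to the Operators section rather than this chapter, and it assumes Index Origin 1 so that [1] names the first dimension. The ⌿ form is a widely available shorthand for the first dimension.

```apl
TABLE ← 3 4 ⍴ ⍳ 12      ⍝ the 3 by 4 matrix used in the diagrams
+/ TABLE                ⍝ → 10 26 42      (last dimension: sum of each row)
+/[1] TABLE             ⍝ → 15 18 21 24   (first dimension: sum of each column)
+⌿ TABLE                ⍝ → 15 18 21 24   (same result as +/[1])
```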
Indexing
Warning: This section discusses indexing, the selection of one or more elements from a vector or matrix. It leads to the question: what should the first element of a vector be called? Is it item 0 (as used in some other computer languages) or item 1? APL lets you choose either Index Origin using the system variable ⎕IO. All the examples in this Tutorial use Index Origin 1. If you get different answers when trying these examples, try setting ⎕IO ← 1.
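The effect of the Index Origin can be seen in a short sketch. It uses ⍳ and the bracket indexing described in the rest of this section; the vector 10 20 30 is arbitrary. The last line restores the origin assumed throughout this tutorial.

```apl
⎕IO ← 1
⍳ 3                     ⍝ → 1 2 3
(10 20 30)[1]           ⍝ → 10
⎕IO ← 0
⍳ 3                     ⍝ → 0 1 2
(10 20 30)[0]           ⍝ → 10
⎕IO ← 1                 ⍝ restore Index Origin 1
```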
To select elements from a vector or matrix a technique called indexing is used. For example, if you have a ten-element vector like this:
X ← 1 45 6 3 9 33 6 0 1 22
the following expression selects the fourth element and adds it to the tenth element:
X[4] + X[10]
Note that square brackets are used to enclose the index.
To index a matrix, two numbers are necessary, the row number and the column number:
      TABLE
12 34 27
 9 28 14
66  0 31
      TABLE[3;2]
0
In the last example the index selected the element in row 3, column 2. Note the semicolon used as a separator between the rows and columns. Note also the order in which dimensions are specified. This corresponds to the order used in the ⍴ statement.
Items can be selected from data with three or more dimensions in exactly the same way:
DATA[2;1;4]
selects the item in plane 2, row 1, column 4 of a three-dimensional data structure.
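To make the three-dimensional case concrete, the sketch below gives DATA some assumed contents: the integers 1 to 24 arranged as 2 planes of 3 rows and 4 columns. The contents are hypothetical, and Index Origin 1 is assumed.

```apl
DATA ← 2 3 4 ⍴ ⍳ 24     ⍝ assumed contents: 1 to 24
DATA[2;1;4]             ⍝ → 16  (plane 2, row 1, column 4)
DATA[1;3;2]             ⍝ → 10  (plane 1, row 3, column 2)
```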
To select an entire row from the matrix above you could type:
TABLE[1;1 2 3]
That is, you could specify all three columns in row 1. A shorter way of specifying this is:
TABLE[1;]
Similarly, to select a column, say column 2, you would enter:
TABLE[;2]
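Using the TABLE shown earlier in this section, these selections can be sketched as follows. The last two lines go slightly beyond the text by selecting more than one row at a time, and are included only to show that the same notation extends naturally.

```apl
TABLE ← 3 3 ⍴ 12 34 27 9 28 14 66 0 31
TABLE[1;]               ⍝ → 12 34 27   (row 1)
TABLE[;2]               ⍝ → 34 28 0    (column 2, returned as a vector)
TABLE[1 3;2]            ⍝ → 34 0       (rows 1 and 3 of column 2)
⍴ TABLE[1 3;]           ⍝ → 2 3        (two whole rows form a 2 by 3 matrix)
```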
The expression you put in square brackets doesn't have to be a direct reference to the item you want to select. It can be a variable name which contains the number which identifies the item. Or it can be an expression which when evaluated yields the number of the item:
      (3 8 4)[1+2]
4
The above statement selects item 3. The item selected by the following statement depends on the value of P when the statement is obeyed. If P contains 2, say, then the letter B is selected:
      'ABCDE'[P]
B
You can also use indexing to re-arrange elements of a vector or matrix:
      'ABCDE'[4 5 1 4]
DEAD
Finally note that the data or variables used within an indexing expression may be of a higher dimension than the object being indexed. Thus:
      'ABCDE'[2 2⍴4 5 1 4]
DE
AD
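Put another way, the result of indexing a vector takes the shape of the index. A brief sketch using the one-argument form of ⍴ to confirm this:

```apl
⍴ 'ABCDE'[4 5 1 4]        ⍝ → 4    (a vector index gives a vector)
⍴ 'ABCDE'[2 2⍴4 5 1 4]    ⍝ → 2 2  (a matrix index gives a matrix)
⍴ ⍴ 'ABCDE'[3]            ⍝ → 0    (a single-number index gives a single character)
```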
For more details on this point, check the entry for [] in the Operators section.
In addition to the [] (bracket) symbols, the ⌷ (squad) function can be used for indexing. The left argument to ⌷ indicates the element or elements to be indexed.
2⌷ 'ABCD'
selects the second element from 'ABCD'.
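For a matrix, the sketch below assumes the common behaviour in which the left argument of ⌷ supplies one index per dimension, in the same order as the ⍴ statement. The details may differ between APL versions, so treat this as illustrative. TABLE is the matrix used earlier in this section, and Index Origin 1 is assumed.

```apl
2 ⌷ 'ABCD'              ⍝ → B
TABLE ← 3 3 ⍴ 12 34 27 9 28 14 66 0 31
3 2 ⌷ TABLE             ⍝ → 0  (row 3, column 2: the same element as TABLE[3;2])
```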