Error Trapping in Dyalog APL
The Problem
It is impossible to write bug-free code in a complex system or to foresee every error. Therefore, some kind of error trapping is a must for any application that is supposed to run in a production environment. But implementing it in a general and efficient manner is not easy. This article discusses techniques to solve this problem.
Introduction
Versions covered
The techniques we are going to discuss have been available in Dyalog APL/W for a long time. Even with version 8.0, most examples should work. They will also work in version 11. Be very careful when you implement error trapping: it is easy to create a non-interruptible loop by accident. If this happens you have to kill the process in the Task Manager and the workspace is lost, so it is a good idea to save the workspace before you execute code after a change.
Considerations
When an application is executed in a production environment, error trapping can be used to achieve the following goals:
- Save a workspace as a snapshot reflecting the state of the application when the error appeared. This makes it easy to analyze a problem, and sometimes it is the only way to do so.
- After having saved a snapshot it might be possible to restart the application. This might enable the user to run the application with different data, or to run different parts of the application on the same data.
- Prevent the user from interrupting the application by pressing the keys associated with the weak and the strong interrupt.
- Continue execution in case a developer has forgotten to remove a stop vector.
When an application is started, it needs to be initialized. If an error occurs at this early stage, there is normally no way to recover. Once an application is fully initialized, it might be a good idea to try to restart it after an error. However, if the restart procedure itself crashes, we must prevent an endless loop from generating tons of useless snapshot workspaces.
Preparation
It is good programming style to avoid literal numbers in code. Instead of writing 1001, for example, we should use a meaningful name:
⎕cs 'Events' #.⎕NS ''
⎕FX 'r←StopVector' 'r←1001'
⎕FX 'r←WeakInterrupt' 'r←1002'
⎕FX 'r←StrongInterrupt' 'r←1003'
In a large system you want those to be constants, so a user cannot change them. That’s why they are niladic functions.
We also need a user-defined event for restarting the application. Its purpose is explained below:
#.Events.RestartAppl←501
According to the help file, users should use the range from 500 to 999 to define their own events.
Setting []TRAP
[]TRAP allows us to implement a general mechanism on a global level. For discussion purposes let’s assume the following:
- []LX is set to run function “Run”
- This function calls 3 sub-functions: “Initial“, “Work“ and “Shutdown“
- “Initial“ initializes the application: it opens files, interprets an INI file, takes the Windows registry into account, builds the GUI and so forth.
- “Work“ simply runs []DQ
- “Shutdown“ cleans up: it closes files and says goodbye.
Solving the stop vector problem
Let’s start with solving the stop vector problem:
⎕TRAP←⊂(#.Events.StopVector 'E' '→⎕LC')
#.Events.StopVector returns 1001, which is the event number associated with a stop vector. As soon as APL stops on a stop vector, []EN is set to 1001. This event can be caught with []TRAP, so we can tell APL to execute (‘E’) the expression given as the third item. In this case the expression tells APL to simply ignore stop vectors by resuming execution at the line where it stopped.
Preventing users from interrupting an application
The same technique can be used to prevent the user from interrupting the application, accidentally or purposely. Depending on the type of the application it might be a good idea to allow the user to interrupt the application by pressing either the key for the weak or the strong interrupt, to ask the user for confirmation and then to restart the application. This would allow the user to quit a lengthy operation that needs more time than expected.
Here, however, we will use this simple approach:
events←#.Events.WeakInterrupt #.Events.StrongInterrupt
⎕TRAP,←⊂(events 'E' '→⎕LC')
Restarting the application
For reasons explained in the section “Catching Errors”, we have to define the “restart the application” procedure now. For the first time we use the action code ‘C’ instead of ‘E’. “C” stands for “Cut back”: it instructs APL to cut the status indicator back to the level where []TRAP is localized (which is not necessarily where it was set) and to execute the expression given as the third item there. However, if []TRAP is not localized at all, i.e. it is global, the status indicator is cut back completely and the expression is executed in the workspace.
⎕TRAP,←⊂(#.Events.RestartAppl 'C' '→∆Restart')
To make this work, the function in which []TRAP is localized must of course contain a label ∆Restart, or ∆Restart must be a function that returns a valid line number to branch to.
Catching Errors
If an unexpected error occurs, we want to execute a particular function to do the hard work.
⎕TRAP,←⊂((0 1000) 'E' '#.HandleError')
The 0 stands for all the events from 1 to 999, while the 1000 stands for the events above 1000, such as the stop vector and the two interrupts.
[]TRAP may contain more than one error catching group. Since the contents of []TRAP are scanned from left to right, a statement will ONLY be executed for an event not processed earlier. That is the reason why we must define the restart event first: otherwise event 501 would be caught by the 0 of the general group.
For example, in the following statement:
⎕TRAP←(333 'E' 'expA') (0 'E' 'expB')
event 333 will be caught by the 1st group and NOT by the 2nd even though 0 stands for “events from 1 to 999”. Only the expression ‘expA’ will be executed.
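Putting the trap definitions from the previous sections together, the start-up function might look roughly like the sketch below. It uses the function names assumed earlier (Run, Initial, Work, Shutdown, #.HandleError) and the constants in #.Events. The exact layout, including the position of the ∆Restart label, is an illustration and not necessarily the code of the attached workspace.

```apl
∇ Run;⎕TRAP;events
⍝ Sketch only. ⎕TRAP is localized here, so the 'C' trap cuts the stack back to Run.
 ⎕TRAP←⊂(#.Events.StopVector 'E' '→⎕LC')          ⍝ ignore forgotten stop vectors
 events←#.Events.WeakInterrupt #.Events.StrongInterrupt
 ⎕TRAP,←⊂(events 'E' '→⎕LC')                       ⍝ ignore weak and strong interrupts
 ⎕TRAP,←⊂(#.Events.RestartAppl 'C' '→∆Restart')    ⍝ must come before the general group
 ⎕TRAP,←⊂((0 1000) 'E' '#.HandleError')            ⍝ everything not handled above
 Initial
∆Restart:
 Work
 Shutdown
∇
```

Placing the label after the call to Initial means that a restart re-enters Work without initializing again. Whether that is appropriate depends on how much state Initial sets up.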
The #.HandleError function
The HandleError function should do at least the following:
- Perhaps neutralize []LX
- Save the []DM and []EN settings
- Save the snapshot
- Maybe create an HTML page with general information about the error
- Ask the user about a restart
- Either try a restart or shut the application down
Misc
Developers and others
Of course the error trapping mechanism must distinguish between developers and others. Often it is good enough to check the APL version: use error trapping in case of runtime, otherwise not. If this is not possible, because some or all of the users are running the development version too, you can specify a parameter to tell the application that you are a developer. By default the application can then use error trapping.
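A minimal sketch of such a check follows. It assumes that the fourth item of the APLVersion property of the root object is either 'Development' or 'Runtime'. The names devFlag and useTraps are hypothetical: devFlag stands for a Boolean taken from a command line parameter or an INI file.

```apl
⍝ Assumption: item 4 of APLVersion is 'Development' or 'Runtime'
isRuntime←'Runtime'≡4⊃'.' ⎕WG 'APLVersion'
⍝ devFlag (hypothetical) is 1 if the user has declared to be a developer
useTraps←isRuntime∨~devFlag     ⍝ trap by default, and always under the runtime
```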
Testing Error Trapping
Keep in mind that you want an easy way to test the system with error trapping switched on. So you may need another parameter that tells the system that error trapping has to be used. Last but not least, there should be an easy way to let the application crash on purpose. I prefer to have a “developers menu”, which is displayed only to developers. Among other useful commands it offers a “Let’s crash” option.
Control Structure :Trap
If you use :Trap, keep in mind that []TRAP and :Trap are both taken into account. That means that in case of
:Trap 0
    -'a'
:Else
    .
:EndTrap
the error caused by the “-'a'” statement is caught by the :Else, while the error caused by the “.” is caught by the []TRAP setting.
When using :Trap try to be as specific as possible. For example, this code is faulty:
:Trap 0
    filename ⎕FTIE 0
:Else
    filename ⎕FCREATE 0
:EndTrap
because it tries to create a file not only if this file does not already exist but also if the current user cannot tie it, for example because somebody else has already tied it exclusively. Therefore, it is better to be specific:
:Trap 22
    filename ⎕FTIE 0
:Else
    filename ⎕FCREATE 0
:EndTrap
The best idea, however, is to check whether the file already exists. In general it is a good idea to use error trapping only for extraordinary problems.
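As an illustration of checking first, the fragment below uses ⎕NEXISTS, which is only available in recent versions of Dyalog and not in the versions this article was written for. It also assumes that filename holds the complete file name including its extension.

```apl
⍝ Sketch only: requires a Dyalog version that offers ⎕NEXISTS
:If ⎕NEXISTS filename          ⍝ 1 if a file with exactly this name exists
    filename ⎕FTIE 0
:Else
    filename ⎕FCREATE 0
:EndIf
```

With this approach a failing ⎕FTIE (for example because of an exclusive tie held by somebody else) is no longer hidden and reaches the general error handling.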
[]SIGNAL
Note that an event which is []SIGNALled can be intercepted with []TRAP but not with :Trap. If you execute this function:
∇test
⎕TRAP←501 'E' '⎕←''caught by ⎕TRAP'''
:Trap 501
    ⎕SIGNAL 501
:Else
    ⎕←'Caught in :Else'
:EndTrap
∇
you get this:
caught by ⎕TRAP
Ensure future trouble
A very easy way to create problems in the future is to do this:
:Trap 0
    DoSomethingHere
:EndTrap
This technique is called “silent trapping”. If something goes wrong, nobody takes care of it and nobody is told about it!
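A less dangerous variant at least reports what happened. The sketch below simply writes to the session; in a real application the message would more likely go to a log file. It relies on []EN and []DM being set for the trapped error.

```apl
:Trap 0
    DoSomethingHere
:Else
    ⍝ ⎕EN holds the event number, ⎕DM the diagnostic message of the trapped error
    ⎕←'DoSomethingHere failed with event ',(⍕⎕EN),': ',⊃⎕DM
:EndTrap
```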
Switching Error Trapping on or off
When you use error trapping, make sure that you can switch off error trapping on a general level. The easiest way to implement this idea is something like this:
:Trap #.ErrorTrapFlag/0
    DoSomething
:Else
    TakeCare
:EndTrap
If the flag is true, error trapping is active; if not, the “DoSomething” statement will fail if an error occurs. This makes it much easier to debug an application.
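The reason this works is that replicating 0 with a Boolean flag yields either a one-item vector or an empty vector, which can be checked in the session:

```apl
      ⍴1/0    ⍝ flag on: a one-item vector containing 0, so every error is trapped
1
      ⍴0/0    ⍝ flag off: an empty vector, so no event is trapped
0
```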
You might need a more sophisticated mechanism for this, because under some circumstances you want to switch off most but not all error trapping statements. For example, if you use a logging mechanism that records every user action for later analysis, the logging code may itself cause an error, for example because the disk that holds the log files is full. In such a case it might be inappropriate for the logging code to break the application. Therefore, you might control this code with :Trap statements.
In such a case it might be a good idea to control the behavior of the application on different levels for code which is really essential in terms of business logic, for example, and for code which is not essential.
But even in such a case the problem should be communicated. I found the idea of a watchdog application very useful, which, among other tasks, listens for UDP telegrams on a particular port. An application in trouble can then send a telegram to the watchdog, telling it about the problem. Using a type of error class, the client can tell the watchdog how serious the problem is, and the watchdog can then decide to simply display it on its GUI or to send an SMS message and/or an email to the admin.
Code
The attached application contains all the code needed to implement the ideas we have discussed. You can run it and follow the behavior with the Tracer. It is a workspace created with Dyalog APL/W Version 10.0.