Experiencing K4 First Hand
K4 is an esoteric language. That does not mean it is difficult. In our case - it matches the uses we put it to very well.
Let’s go off on a short tangent…
I hear many critiques of various computer languages from across the spectrum of experience, needs and quality control. “Language A is superior to Language B because” etc etc. Let’s be careful here: separate the language from its eco-system and then talk about each - you will get a better evaluation. Firstly the merits of the actual language relate to:
How it expresses the kinds of problems you need to solve. People doing numeric vector processing, building user interfaces or working on communications all have very different “domains” of logic and thinking they work in.
How “correct” and/or “maintainable” your source code needs to be. Web UI developers have very different needs in this respect to, say, embedded avionics developers.
The performance of the generated code in time, energy and memory.
And lastly - the need to speak the language of your local tribe!
The eco-system provides you with something different. Anyone working in data-science or science in general will greatly appreciate powerful frameworks and libraries that do complex numeric work without them resorting to programming the basics. If they move to a different language, that power might not be with them. The same applies to embedded software, or database processing or web interfaces. There are frameworks everywhere, but often they are part of a particular language.
The conclusion here is simple: horses-for-courses. So we come back to K4: it’s our pony, used to inexpensively run modest embedded hardware. We believe it provides us with:
A low-cost (to develop and to deploy) language.
It requires little memory and is fast for what it does.
It is very extensible through its dictionary-based defining words (functions).
And we get the freedom to shape the language as we develop it.
So let’s get into it.
K4 is like FORTH (https://en.wikipedia.org/wiki/Forth_(programming_language)). To be clear though: it’s not an implementation of FORTH, nor of its cousin STOIC or for that matter any other particular stack based language. K4 is Bored Owl’s own language, borrowing heavily from this heritage, but deviating to suit the needs of our systems (and to some extent my own whims and interests in computer languages).
So what is a “stack-based language”?
A stack is a type of data structure in programming that implements a last-in, first-out operation. Imagine a stack of plates in a busy restaurant. The dishwasher puts plates onto the stack, and the servers remove them from the top. The last plate onto the stack is the first plate off the stack. In stack based languages we substitute pieces of data (numbers) for plates and do all our operations in this order.
A conventional programming language (or even your desktop calculator) understands ‘1 + 2 =’ and returns ‘3’. This is ok until you say ‘1 + 2 * 3 =’ and you argue endlessly on social media whether the answer is 9 or 7. (It’s 7, no arguments!) The rules of precedence apply so that * and / happen before + and -. We can override those rules with parentheses: ‘(1 + 2) * 3 =’ and get 9 if we wish. Precedence complicates the life of the computer: it has to take in the whole expression and then work through it in the order of precedence. So what, though? Computers are fast and have tons of memory, so it’s not a big deal. We need to elaborate some more.
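To make the contrast concrete, here is a rough sketch in Python (not K4, and not how K4 is implemented) of how a stack handles the same two sums once each operator is written after its operands. The function name `evaluate` is made up for the illustration. The order of the tokens alone decides the answer, so no precedence rules or parentheses are needed.

```python
# Illustrative Python sketch of stack evaluation; this is not K4's implementation.
def evaluate(tokens):
    """Evaluate space-separated tokens where each operator follows its operands."""
    stack = []
    for token in tokens.split():
        if token == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif token == "*":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            stack.append(int(token))  # anything else is treated as a decimal number
    return stack.pop()

print(evaluate("1 2 3 * +"))  # multiply first, then add: prints 7
print(evaluate("1 2 + 3 *"))  # add first, then multiply: prints 9
```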
K4 is neither an interpreting language nor a compiling one. To straighten out the jargon:
An Interpreter keeps its source code at hand (generally in memory) and when the program is run, the source code is interpreted (understood) and its instructions are executed.
A Compiler reads its source code once (often from a file on disk) and produces instructions for a hardware machine or a virtual machine to execute in memory.
This is a very great simplification as there are combinations of both in the myriad of available languages. The distinction has become less clear and less important with time. The main things to note are that interpreters provide the convenience of being available to run all the time, but are slower and more memory bound, while compilers require machines with editors and storage and the ability to run the binary after it has been compiled. Compilers and operating systems are good friends.
K4 (like all Forth type systems) does things another way.
It presents the user with a command line - like an old school command line operating system would. What you type at the command line is interpreted and executed. (If you are a modern language user this is like the REPL in Python or Scala.) However, with a simple change to what we type, the input is “compiled” into a “word” and stored in memory.
What gets typed on this command line is a mixture of constants and words. There is no other concept for K4. The very simplest example of a K4 command line might be:
>1 1 + .
The output from this line would be:
2>
(I’ve included the prompt character ‘>’ to illustrate that K4 tells you it’s ready for input).
What happened here? The line was scanned and executed as each ‘token’ was recognised. All parts of a K4 line are separated by a space and each part between spaces is called a token. Tokens are evaluated - firstly to see if they represent a constant (decimal or hexadecimal number, or character constant) and if not to see if they are a “word”. Step-by-step the line was broken into tokens:
| 1 | One is a decimal number, a constant, and it is “pushed” onto the stack. |
| 1 | The same: it also goes on the stack. (Now the stack contains two numbers, 1 and 1.) |
| + | This is not a number, so the dictionary is searched. It happens there is a built-in word for ‘+’, so K4 proceeds by executing the word. The built-in code for ‘+’ removes two values from the stack, adds them together and pushes the result ‘2’ back onto the stack. |
| . | This also is not a number, so the same process of finding the word in the dictionary is followed. The word ‘.’ has a built-in executable that is called. The function of the ‘dot’ executable is to remove a number from the stack and print it on the console. Note that the print function does not automatically add a new-line, so the ‘>’ prompt appears immediately after the ‘2’ output. |
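The token-by-token process above can be sketched in a few lines of Python. This is only an illustration of the idea, not K4's code: the function `run_line` and the little `builtins` table are hypothetical stand-ins for the real scanner and dictionary.

```python
# Illustrative Python sketch of the token loop; names here are hypothetical, not K4's.
import io

def run_line(line, stack, out):
    """Scan a line token by token, pushing constants and executing words."""
    # Hypothetical built-in dictionary: identifier -> function acting on the stack.
    builtins = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        ".": lambda: out.write(str(stack.pop())),  # no new-line is added
    }
    for token in line.split():
        try:
            stack.append(int(token))   # first: is the token a constant?
        except ValueError:
            builtins[token]()          # otherwise: look the word up and execute it

out = io.StringIO()
stack = []
run_line("1 1 + .", stack, out)
print(out.getvalue() + ">")  # the prompt follows the output directly
```

Running it prints ‘2>’, matching the console session shown earlier, and the stack is empty again afterwards.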
Another possibility is that we can emit a character using a character constant and the ‘emit’ word:
>\A emit
Output:
A>
In this case, the ‘\A’ token is interpreted as a character constant for the letter ‘A’ and this is pushed onto the stack. The next token ‘emit’ is found as a built-in word. The function of ‘emit’ is to print the single character from the top of the stack to the console. Hence the output ‘A’.
Note that the character constant in this interpretation is not a C-style escaped character: ‘\n’ means the letter ‘n’. To emit a new-line we need to know the character code for a new-line, which is 0x0A or 10. So the expression ‘10 emit’ outputs a new-line. We’ll use this a bit later.
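As a rough Python sketch of how a token might be tested for being a constant: the function name `parse_constant` is hypothetical, and the ‘0x’ prefix for hexadecimal is an assumption made for the example (the text only says that hexadecimal constants exist).

```python
# Illustrative Python sketch; the "0x" hexadecimal prefix is an assumption.
def parse_constant(token):
    """Return the numeric value of a constant token, or None if it is not one."""
    if len(token) == 2 and token[0] == "\\":
        return ord(token[1])              # character constant: the character's code
    try:
        if token.lower().startswith("0x"):
            return int(token, 16)         # assumed hexadecimal form
        return int(token, 10)             # decimal number
    except ValueError:
        return None                       # not a constant, so it must be a word

print(parse_constant("\\A"))   # code for 'A': 65
print(parse_constant("\\n"))   # code for the letter 'n' (110), not a new-line
print(parse_constant("0x0A"))  # the new-line code written in hexadecimal: 10
print(parse_constant("emit"))  # not a constant: None
```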
The concept of a “word” is central to K4. A word is an executable unit and can be “built-in” or “defined”. Built-in words are compiled with K4 and they perform the fundamental operations like stack manipulation, arithmetic and I/O. Defined words are built upon the fundamentals using “colon”, “variable” and “constant” definitions.
Either built-in or defined, words exist in the “dictionary”. A word has a structure in memory like this:
| WORD: | Purpose: |
| Identifier | The name of the word |
| Flags | Special information about the word. There are eight available flags but presently only two are used. (eg: can its output be treated as a string?) |
| Previous Word | Points to the address of the previous word in the dictionary. (See the explanation of the dictionary below) |
| Executor | Points to a function that can execute this word. |
| Parameter | A single number for any useful purpose to the word |
| Extensions... | A word can continue on for as long as required and how this extended data is treated is up to the function that executes the word. (Only the first 14 characters of a word are significant.) |
The dictionary pointer in K4 points to the last word that was defined. A search of the dictionary simply starts at this pointer and if the identifier we are looking for does not match, we use the previous word pointer and move backwards through the dictionary until we find what we need. When the previous pointer refers to nothing our search ends unsuccessfully and we have to deal with an identifier we can’t treat as a word.
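The backwards search can be sketched in Python as a simple linked list. The field names follow the table above; the class and method names (`Word`, `Dictionary`, `define`, `find`) are hypothetical, and the sketch assumes a new definition is always appended at the end.

```python
# Illustrative Python sketch of a linked dictionary; class and method names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Word:
    identifier: str
    executor: Callable
    previous: Optional["Word"] = None   # link to the word defined before this one
    flags: int = 0
    parameter: int = 0

class Dictionary:
    def __init__(self):
        self.last = None                # the dictionary pointer: last word defined

    def define(self, identifier, executor):
        self.last = Word(identifier, executor, previous=self.last)
        return self.last

    def find(self, identifier):
        word = self.last
        while word is not None:         # walk backwards through the previous links
            if word.identifier == identifier:
                return word
            word = word.previous
        return None                     # previous pointer refers to nothing: not found

d = Dictionary()
d.define("+", lambda: None)
d.define("dup", lambda: None)
first = d.define("add10", lambda: None)
second = d.define("add10", lambda: None)   # a later definition with the same identifier

print(d.find("dup").identifier)    # dup
print(d.find("add10") is second)   # True: the search meets the newest definition first
print(d.find("nope"))              # None
```

One consequence of searching from the newest word backwards, at least in this sketch, is that a later definition with the same identifier is the one that gets found.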
The current implementation of K4 has about 150 built in words. These provide:
Arithmetic and logical operations on the stack
Console I/O (printing etc.)
Stack manipulations
Dictionary and compilation management
Constant and variable definition
Control structures (if/then, while/until etc.)
String creation, alteration and control
File operations (when supported)
Interrupts (when supported)
System functions - specifically supported by the environment
All the built-ins are registered in the dictionary at start-up.
The words that are built-in cannot do everything though. Defining new words based on the basic functions is at the heart of K4.
The Colon Definition
The console operations that we showed earlier are executed immediately. This isn’t much help towards writing actual programs. We need a mechanism for defining new functions that we can use over and over. For this we have a “special” word known as “the colon”. It is literally the colon character, which is executed as a word (from the built-in dictionary).
When ‘:’ is executed, the K4 interpreter starts compiling a new word. This means space is found at the end of the dictionary and each token is ‘compiled’ into this space. Before we discuss these technicalities, let’s look at a typical colon definition:
>: add10 10 + dup . ;
>
>4 add10
14>
Colon definitions (that don’t have errors) produce no console output. They just compile their definition.
Looking at this expression, we see that it ends with the token ‘;’. This is another special word that ends compilation. Immediately after the colon, we see the token ‘add10’ which is the identifier for our new word. This is the identifier used in the dictionary definition. Following that are the words that are executed later when the word is invoked.
The first token of the body is a ‘literal’, meaning the value is loaded onto the stack, so 10 is pushed onto the stack. The next is ‘+’, which pulls the top two values from the stack, adds them and pushes the result back onto the stack. The word ‘dup’ is a built-in that takes the value on the top of the stack and pushes another copy. So if the stack was [ 7 ] and we execute ‘dup’, the stack becomes [ 7 7 ]. Finally we use the dot word again to print the value on the top of the stack. Dot removes one value from the stack, so the previous ‘dup’ leaves the result of our addition on the stack for us to keep using.
The word is used by typing ‘4 add10’. The definition is consulted, 10 is pushed, added to the 4 already on the stack and 14 is pushed back, duplicated and printed. If we typed:
>.
14>
The dot prints the top of stack which is still 14 because we duplicated the previous result. See that the stack retains what it contained from one word to the next regardless of whether we are simply interpreting or compiling. The stack will be cleared if an error occurs though (and we have the word ‘clear’ if we want to clear the stack manually). If we type (continuing on from the last):
14>.
0
Stack Underflow: - ?
? (. )
>
This execution of dot is a problem! There is nothing else on the stack, so the attempt to pop a value off the stack fails: the pop operation returns 0 and sets a stack exception (for an underflow). The consequential output is ‘0 \ Stack Underflow: - ?’, followed by ‘? (. )’. The first bit is the result of the failed pop and the second prints (between the parentheses) the point in the original input where the error occurred (at the ‘.’).
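The behaviour of the failed pop can be sketched in Python like this. It is only an illustration: the class `Stack` and its `underflow` flag are hypothetical stand-ins for however K4 records its stack exception.

```python
# Illustrative Python sketch of a pop that reports underflow; not K4's actual code.
class Stack:
    def __init__(self):
        self.items = []
        self.underflow = False     # hypothetical flag standing in for the stack exception

    def push(self, value):
        self.items.append(value)

    def pop(self):
        if not self.items:
            self.underflow = True  # record the exception...
            return 0               # ...and still hand back a value
        return self.items.pop()

    def clear(self):
        self.items.clear()
        self.underflow = False

s = Stack()
s.push(14)
print(s.pop(), s.underflow)   # a normal pop: 14 False
print(s.pop(), s.underflow)   # popping an empty stack: 0 True
```

Returning 0 rather than stopping dead is why the ‘0’ still appears on the console before the error message.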
Let’s define something different but useful:
>: .r . 10 emit ;
>
This defines a new word ‘.r’ that prints the top of stack and then emits a new line. This helps us see our output a bit more easily. We can redefine ‘add10’ thus:
>: add10 10 + dup .r ;
>4 add10
14
>
The subtle difference being the use of our newly defined ‘.r’ word that means the answer 14 is nicely separated from the next ‘>’ prompt. Now try:
>4 add10 2 * .r
14
28
>
And voilà! We used the answer from add10 left on the stack, multiplied it by two and printed the extra answer ‘28’. Note also that without the use of ‘.r’ the output would have been the confusing string ‘1428>’!
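For readers who like to see the moving parts, here is a small Python sketch of colon definitions that build on each other. It mirrors the ‘.r’ and ‘add10’ words from this section, with ‘add10’ pushing 10 before the addition. It is a simplification and not how K4 works inside: the class `Interp` is hypothetical, and it stores the body of a word as a list of tokens and looks them up when the word runs, whereas K4 compiles each token into dictionary space.

```python
# Illustrative Python sketch of colon definitions; not K4's actual implementation.
import io

class Interp:
    def __init__(self):
        self.stack = []
        self.out = io.StringIO()
        # Built-in words: identifier -> Python function.
        self.words = {
            "+": lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
            "*": lambda: self.stack.append(self.stack.pop() * self.stack.pop()),
            "dup": lambda: self.stack.append(self.stack[-1]),
            ".": lambda: self.out.write(str(self.stack.pop())),
            "emit": lambda: self.out.write(chr(self.stack.pop())),
        }

    def execute(self, tokens):
        for token in tokens:
            if token.isdigit():
                self.stack.append(int(token))   # literal: push its value
            else:
                self.words[token]()             # word: look it up and run it

    def run(self, line):
        tokens = line.split()
        if tokens and tokens[0] == ":":
            # ": name body... ;" stores the body under the new identifier.
            name, body = tokens[1], tokens[2:tokens.index(";")]
            self.words[name] = lambda: self.execute(body)
        else:
            self.execute(tokens)

k = Interp()
k.run(": .r . 10 emit ;")
k.run(": add10 10 + dup .r ;")
k.run("4 add10 2 * .r")
print(k.out.getvalue(), end="")   # prints 14 and 28 on separate lines
print(k.stack)                    # [] : both results were consumed by '.r'
```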
Concluding (for now)
This has been a very basic introduction to K4. Enough to start understanding expressions and words and the dictionary. In future posts we will talk about how K4 is implemented and we’ll start to talk about more of the built in functions and how we can handle strings - look forward to it.