The Patch vs. 8.2.0

The Patch vs. 8.4.0

Well, it happened again. Someone asks a perfectly innocent question, and the next thing I know I'm in touch with a deep vein of hostility and several paragraphs into a jeremiad on the evils of Tcl's **expr** command. It isn't Tcl's problem, really, that I hate writing code that uses **expr**. It's my problem. So here's my problem and here's my solution.

**Expr Before**

The **expr** command is the Tcl user's window into the world of computations with numbers. It implements a variety of unary and binary arithmetic operators, using the same infix syntax as one finds in C, FORTRAN, and dozens of other programming languages in common use. It also implements a collection of math functions, using the same prefix function call syntax as one finds in C, FORTRAN, and dozens of other programming languages in common use. Thus, we can write mathematical expressions like `2*atan2(-1,0)` or `sqrt(1+1)`, give them to **expr** to evaluate, and we get the result which our experience as programmers has led us to expect.

But the story doesn't end there, because the **expr** command is embedded in a larger programming language called Tcl which doesn't much resemble other programming languages at all. So, when it came time to implement variable reference and procedure call in **expr**, Tcl's creator decided to use the Tcl syntax for variable reference, which is **$variable**, and to use the Tcl syntax for procedure call, which is **[command arg1 ...]**, neither of which has anything to do with C, FORTRAN, or dozens of other programming languages in common use. Thus, if we want to write the recursive **factorial** function in Tcl, we write:

proc factorial {n} { expr { $n > 1 ? $n * [factorial [expr {$n-1}]] : 1 } }

This is a result that only a Tcl programmer's experience would lead one to expect.

There were some reasons for this choice of hybrid expression syntax. Originally, the variable reference and function call syntax was handled by the main Tcl evaluation code, and all **expr** had to do was to sort out the result of the expression with these substitutions already done. However, this didn't work too well with the conditional expression operators, **?:**, **&&**, and **||**. In expressions involving these operators, only some parts of the expressions should be evaluated. So **expr** had to be taught to parse variable references and function calls and recursively call the Tcl evaluation routines, and expressions involving the conditional operators had to be quoted, as in our **factorial** example above, to prevent premature evaluation.
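A minimal sketch of that premature-evaluation problem, runnable in a stock tclsh. The `noisy` proc and the `calls` counter are made up for the illustration:

```tcl
# Hypothetical helper: counts how many times it is invoked.
set ::calls 0
proc noisy {} { incr ::calls; return 1 }

# Braced: expr parses [noisy] itself and skips it, because 1 || ... is already true.
expr {1 || [noisy]}
set braced $::calls

# Unbraced: Tcl runs [noisy] while building the arguments, before expr sees anything.
expr 1 || [noisy]
set unbraced $::calls

puts "braced: $braced call(s), unbraced: $unbraced call(s)"
```

The braced form leaves the counter at 0; the unbraced form bumps it to 1 even though the result of the expression never depended on the right-hand side.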

I suppose that there might be some argument made that consistency had a part in choosing the hybrid expression syntax, but it's a funny sort of consistency. One would have expected consistency to produce either a fully prefix, Lisp-like syntax for **expr**, or a fully C-like syntax for **expr**. Either of those choices would have been fully consistent with some subset of the design decisions made for **expr**. But the hybrid syntax is consistent neither with C-like expressions nor with Tcl's own syntax.

The distinctive function call syntax given to mathematical functions had a different motivation. Originally, all Tcl values were stored as strings. Distinguishing the mathematical functions allowed **expr** to maintain their operand and result values as numeric values, rather than converting the intermediate results back and forth between string and numeric representations. This meant that the mathematical functions were more efficient than Tcl commands implementing the same operations.

The original reason for using **$variable** and **[command arg1 ...]** as expression syntax, to reuse the main Tcl parser and evaluator, actually went out the window as soon as the conditional operators were properly implemented. At that point, the syntax of expressions could have reverted to the syntax used in C, and the expression parser, the expression evaluator, and the main Tcl parser all would have become simpler.

The original reason for distinguishing mathematical functions from general functions became moot when Tcl converted to the Tcl_Obj representation for values. In a Tcl_Obj, a value may be a string, a numeric, or some other type. These days **expr** keeps all its intermediate values as Tcl_Obj values with a numeric representation, and there is no implementation advantage for a mathematical function over a builtin Tcl command.

Meanwhile, Tcl has also become a compiled-on-the-fly language, and there's a new reason for **expr** operands to be quoted against evaluation by the main Tcl evaluator. The expression compiler wants its operands quoted because then it can see the syntactic boundaries in expressions and compile the expression for most efficient evaluation. If its operand is unquoted and contains variable references, like:

expr $a+$b

then evaluation of those variable references by the Tcl evaluator might produce something like:

expr 6-4+9/2

which would need to be reparsed at runtime in order to get the right answer. Quoting the operands of **expr** tells the expression compiler that it's okay to generate code that assumes that the results of variable and command substitution will make sense in the expression.
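As a small sketch of the difference in a stock tclsh, assume `a` and `b` hold the expression fragments from the example above (the variable names `unbraced`, `failed`, and `msg` are just for illustration):

```tcl
set a 6-4
set b 9/2

# Unbraced: Tcl substitutes first, so expr receives the string 6-4+9/2
# and has to parse it at run time.
set unbraced [expr $a+$b]

# Braced: expr treats $a and $b as single operands, and "6-4" is not a number,
# so the evaluation raises an error.
set failed [catch {expr {$a+$b}} msg]

puts "unbraced: $unbraced"
puts "braced failed: $failed"
```

The unbraced call yields 6 (integer division makes 9/2 equal 4), while the braced call fails because each variable is expected to hold one operand rather than a piece of expression text.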

Tcl's **expr** implements a hybrid syntax, half of which is taken from programming languages like C, and half of which is taken from Tcl itself. But the half that is taken from Tcl must be quoted, at all costs, to protect it from evaluation by Tcl itself; otherwise the semantics of the conditional operators will be violated and the compiler will generate suboptimal code.

**Expr After**

So, let's enhance Tcl's **expr** syntax so that it's more consistent with the C-like languages that started it on the path to inconsistency with Tcl so long ago. What's it take?

Well, it would be nice to reference Tcl variables without the dollar signs, so that

expr a

returns the value of the variable **a** if one exists, and an unknown variable error otherwise. That expression currently yields a "syntax error" which comes from **ParsePrimaryExpr** in **generic/tclParseExpr.c**. Instead of the error, we'll just stuff two tokens into the parsed token stream that result in a variable reference, and unread the token that wasn't an open parenthesis.

It would also be nice to call a Tcl procedure with the same syntax that the mathematical functions use, so that

expr min(a,b)

returns the result of calling the Tcl command **min** if one exists, and an unknown command error otherwise. That **expr** currently yields an "undefined math function" error which comes from **CompileMathFuncCall** in **generic/tclParseExpr.c**. Instead of the error, we'll just compile a generic Tcl command call. Then the **unknown** function can catch any undefined functions. Hmm, I guess that implements autoloaded math functions, too, or something very like them.
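To make the target of such a call concrete, here is a sketch with a hypothetical two-argument `min` written as an ordinary Tcl proc. It is invoked with stock syntax, since the patched form only works in a patched interpreter:

```tcl
# Hypothetical min, defined as a plain Tcl command (not part of the patch).
proc min {x y} { expr {$x < $y ? $x : $y} }

set a 3
set b 7

# Stock syntax: command substitution with explicit variable references.
set result [min $a $b]
puts $result

# Under the patch, the text above says the same call could be spelled:
#   expr min(a,b)
```

With these values the stock call returns 3.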

That, and a few lines to make the **expr** parser accept **namespace** qualifiers in identifiers, is all there is to the Tcl Expr Patch. The original **expr** syntax is fully supported, so code using the original syntax will continue to work just as it does.

Our **factorial** example,

```
proc factorial {n} {
expr {
$n > 1 ?
$n * [factorial [expr {$n-1}]] :
1
}
}
```

can now be rewritten as,

proc factorial {n} { expr n > 1 ? n * factorial(n-1) : 1 }

What has happened? The quotes went away, because there are no variable or command substitutions to be protected from premature evaluation. The dollar signs went away. The square brackets turned into a conventional function call. Oh, look, the nested call to **expr** went away, too. Since the call to **factorial** is parsed as a math function, expressions in its argument get evaluated without explicitly calling **expr**. That wasn't part of the original spec, but it's reason enough by itself to adopt the enhancement. And the whole definition is now short enough to write on one line without messing up my HTML layout.

So, 294 lines of patch and we have an **expr** command with expressions that:

- look like C expressions,
- work like C expressions,
- don't need to be quoted to evaluate correctly,
- don't need to be quoted to compile efficiently,
- autoload math functions,
- evaluate expressions in function call argument lists without explicit calls to **expr**.

And the changes have no effect on existing code. That's a pretty good yield for a fairly limited patch.

The only gotcha that I've discovered thus far is a new variation on *quoting hell*. If you call one of the Tcl commands which takes a variable name as a parameter using the new syntax, then you will need to quote the variable name. So,

set a 1; expr set(a,2)

will set the variable named "1" to the value "2", while

expr set({a},2)

will set the variable named "a" to the value "2".
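The same trap can be reproduced in stock Tcl, on the assumption that the patched **expr** evaluates a bare argument the way `$a` is substituted. This is an analogy in ordinary syntax, not the patched syntax itself:

```tcl
set a 1

# Counterpart of set(a,2): the argument is evaluated first,
# so the name handed to set is "1".
set $a 2
set viaValue [set 1]     ;# the variable named "1" now holds 2

# Counterpart of set({a},2): the braces pass the literal name "a".
set {a} 2
set viaName $a           ;# the variable named "a" now holds 2

puts "variable 1 = $viaValue, variable a = $viaName"
```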

**The real reasons**

I've given all sorts of reasons and half reasons for hacking on **expr**'s syntax, but the real reasons I did it are quite simple. Firstly, the thought of writing anything else in **expr**'s existing syntax makes me cringe. I can try to rationalize that by saying all sorts of nasty things about the existing syntax, but at root it's a personal, visceral reaction. Secondly, the more I thought about what needed to be done, the surer I was that it would be simple. And thirdly, it was simple.

**A fellow traveler**

John, who apparently wishes to remain anonymous, has contributed a version of tcl-expr-patch for the 8.4.0 release of Tcl. Happy Thanksgiving, 2002.