Tcl Expr Patch

The Patch vs. 8.2.0

The Patch vs. 8.4.0

Well, it happened again. Someone asks a perfectly innocent question, and the next thing I know I'm in touch with a deep vein of hostility and several paragraphs into a jeremiad on the evils of Tcl's expr command. It isn't Tcl's problem, really, that I hate writing code that uses expr. It's my problem. So here's my problem and here's my solution.

Expr Before

The expr command is the Tcl user's window into the world of computations with numbers. It implements a variety of unary and binary arithmetic operators, using the same infix syntax as one finds in C, FORTRAN, and dozens of other programming languages in common use. It also implements a collection of math functions, using the same prefix function call syntax as one finds in C, FORTRAN, and dozens of other programming languages in common use. Thus, we can write mathematical expressions like 2*atan2(-1,0) or sqrt(1+1), give them to expr to evaluate, and we get the result which our experience as programmers has led us to expect.

But the story doesn't end there, because the expr command is embedded in a larger programming language called Tcl which doesn't much resemble other programming languages at all. So, when it came time to implement variable reference and procedure call in expr, Tcl's creator decided to use the Tcl syntax for variable reference, which is $variable, and to use the Tcl syntax for procedure call, which is [command arg1 ...], neither of which has anything to do with C, FORTRAN, or dozens of other programming languages in common use. Thus, if we want to write the recursive factorial function in Tcl, we write:

proc factorial {n} {
  expr {
    $n > 1 ?
     $n * [factorial [expr {$n-1}]] :
     1
  }
}

This is a result that only a Tcl programmer's experience would lead one to expect.

There were some reasons for this choice of hybrid expression syntax. Originally, the variable reference and function call syntax were handled by the main Tcl evaluation code, and all expr had to do was to sort out the result of the expression with these substitutions already done. However, this didn't work too well with the conditional expression operators, ?:, &&, and ||. In expressions involving these operators, only some parts of the expressions should be evaluated. So expr had to be taught to parse variable references and function calls and recursively call the Tcl evaluation routines, and expressions involving the conditional operators had to be quoted, as in our factorial example above, to prevent premature evaluation.
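The premature evaluation problem is easy to see in stock Tcl. Here is a minimal sketch, assuming a hypothetical helper named bump (not part of Tcl or of the patch) that counts how often it gets called:

```tcl
# Hypothetical helper: counts its own calls and returns 1.
set calls 0
proc bump {} {
    global calls
    incr calls
    return 1
}

# Braced: expr parses [bump] itself and never runs it,
# because the left operand of && is already false.
set r1 [expr {0 && [bump]}]
set braced_calls $calls

# Unbraced: the Tcl evaluator runs [bump] before expr is even called,
# so the side effect happens although the result is the same.
set r2 [expr 0 && [bump]]
set unbraced_calls $calls
```

Both expressions yield 0, but only the braced one leaves the counter untouched.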

I suppose that there might be some argument made that consistency had a part in choosing the hybrid expression syntax, but it's a funny sort of consistency. One would have expected consistency to produce either a fully prefix, lisp-like syntax for expr, or a fully C-like syntax for expr. Either of those choices would have been fully consistent with some subset of the design decisions made for expr. But the hybrid syntax is neither consistent with C-like expressions, nor is it consistent with Tcl's syntax.

The distinctive function call syntax given to mathematical functions had a different motivation. Originally, all Tcl values were stored as strings. Distinguishing the mathematical functions allowed expr to maintain their operand and result values as numeric values, rather than converting the intermediate results back and forth between string and numeric representations. This meant that the mathematical functions were more efficient than Tcl commands implementing the same operations.

The original reason for using $variable and [command arg1 ...] as expression syntax, to reuse the main Tcl parser and evaluator, actually went out the window as soon as the conditional operators were properly implemented. At that point, the syntax of expressions could have reverted to the syntax used in C and the expression parser, the expression evaluator, and the main Tcl parser all would have become simpler.

The original reason for distinguishing mathematical functions from general functions became moot when Tcl converted to the Tcl_Obj representation for values. In a Tcl_Obj, a value may be a string, a numeric, or some other type. These days expr keeps all its intermediate values as Tcl_Objs with a numeric representation, and there is no implementation advantage for a mathematical function over a builtin Tcl command.

Meanwhile, Tcl has also become a compiled-on-the-fly language, and there's a new reason for expr operands to be quoted against evaluation by the main Tcl evaluator. The expression compiler wants its operands quoted because then it can see the syntactic boundaries in expressions and compile the expression for most efficient evaluation. If its operand is unquoted and contains variable references, like:

expr $a+$b

then evaluation of those variable references by the Tcl evaluator might produce something like:

expr 6-4+9/2

which would need to be reparsed at runtime in order to get the right answer. Quoting the operands of expr tells the expression compiler that it's okay to generate code that assumes that the results of variable and command substitution will make sense in the expression.
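The reparsing case can be reproduced in stock Tcl. This is a small sketch, with the values of a and b chosen to match the substituted expression shown above:

```tcl
set a 6-4
set b 9/2

# Unbraced: Tcl substitutes first, so expr receives the string 6-4+9/2
# and has to parse it from scratch at runtime.
set unbraced [expr $a+$b]

# Braced: expr treats $a and $b as single operands. The string "6-4"
# is not a number, so the addition fails instead of being reparsed.
set failed [catch {expr {$a+$b}} message]
```

With integer division, the unbraced form works out to (6-4)+(9/2) = 2+4 = 6, while the braced form raises an error about a non-numeric operand.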

Tcl's expr implements a hybrid syntax, half of which is taken from programming languages like C, and half of which is taken from Tcl itself. But the half that is taken from Tcl must be quoted, at all costs, to protect it from evaluation by Tcl itself; otherwise the semantics of the conditional operators will be violated and the compiler will generate suboptimal code.

Expr After

So, let's enhance Tcl's expr syntax so that it's more consistent with the C-like languages that started it on the path to inconsistency with Tcl so long ago. What's it take?

Well, it would be nice to reference Tcl variables without the dollar signs, so that

expr a

returns the value of the variable a if one exists, and an unknown variable error otherwise. That expression currently yields a "syntax error" which comes from ParsePrimaryExpr in generic/tclParseExpr.c. Instead of the error, we'll just stuff two tokens into the parsed token stream that result in a variable reference, and unread the token that wasn't an open parenthesis.

It would also be nice to call a Tcl procedure with the same syntax that the mathematical functions use, so that

expr min(a,b)

returns the result of calling the Tcl command min if one exists, and an unknown command error otherwise. That expr currently yields an "undefined math function" error which comes from CompileMathFuncCall in generic/tclParseExpr.c. Instead of the error, we'll just compile a generic Tcl command call. Then the unknown command can catch any undefined functions. Hmm, I guess that implements autoloaded math functions, too, or something very like them.

That, and a few lines to make the expr parser accept namespace qualifiers in identifiers, is all there is to the Tcl Expr Patch. The original expr syntax is fully supported, so code using the original syntax will continue to work just as it does.

Our factorial example,

proc factorial {n} {
  expr {
    $n > 1 ?
     $n * [factorial [expr {$n-1}]] :
     1
  }
}

can now be rewritten as,

proc factorial {n} {
  expr n > 1 ? n * factorial(n-1) : 1
}

What has happened? The quotes went away, because there are no variable or command substitutions to be protected from premature evaluation. The dollar signs went away. The square brackets turned into a conventional function call. Oh, look, the nested call to expr went away, too. Since the call to factorial is parsed as a math function, expressions in its argument get evaluated without explicitly calling expr. That wasn't part of the original spec, but it's reason enough by itself to adopt the enhancement. And the whole definition is now short enough to write on one line without messing up my HTML layout.
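If you don't have a patched interpreter handy, you can get a taste of the function call half of this in stock Tcl 8.5 or later, where expr looks up func(...) as a command in the tcl::mathfunc namespace. The sketch below relies on that stock mechanism, not on the patch, so the dollar signs and the braces are still required:

```tcl
# Stock Tcl 8.5+ only: a command in ::tcl::mathfunc becomes callable
# with function syntax inside expr. This is not how the patch works.
proc ::tcl::mathfunc::factorial {n} {
    # The argument n-1 is evaluated by expr itself, so no nested expr call.
    expr {$n > 1 ? $n * factorial($n - 1) : 1}
}

set result [expr {factorial(5)}]
```

The nested call to expr disappears here for the same reason it does with the patch: the argument list of a function call is already an expression.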

So, 294 lines of patch and we have an expr command with expressions that:

look like C expressions,
work like C expressions,
don't need to be quoted to evaluate correctly,
don't need to be quoted to compile efficiently,
autoload math functions, and
evaluate expressions in function call argument lists without explicit calls to expr.

And the changes have no effect on existing code. That's a pretty good yield for a fairly limited patch.

The only gotcha that I've discovered thus far is a new variation on quoting hell. If you call one of the Tcl commands which takes a variable name as a parameter using the new syntax, then you will need to quote the variable name. So,

set a 1; expr set(a,2)

will set the variable named "1" to the value "2", while

expr set({a},2)

will set the variable named "a" to the value "2".
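The same trap can be imitated in stock Tcl 8.5 or later. This sketch assumes a hypothetical wrapper named setvar, defined in tcl::mathfunc so that it can be called with function syntax. Stock expr still needs the dollar sign, so $a here plays the role that the bare a plays under the patch:

```tcl
# Hypothetical wrapper (not part of Tcl or of the patch): lets set be
# called with function syntax by living in ::tcl::mathfunc.
proc ::tcl::mathfunc::setvar {name value} {
    # Run set in the scope that invoked expr.
    uplevel 1 [list set $name $value]
}

set a 1

# The argument is evaluated first and yields 1, so the variable
# named "1" is the one that gets set.
expr {setvar($a, 2)}

# The braced argument is a literal string, so the variable a is set.
expr {setvar({a}, 2)}
```

Either way the rule is the same: whatever appears in the argument list is an expression, and a variable name has to be quoted to survive as a name.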

The real reasons

I've given all sorts of reasons and half reasons for hacking on expr's syntax, but the real reasons I did it are quite simple. Firstly, the thought of writing anything else in expr's existing syntax makes me cringe. I can try to rationalize that by saying all sorts of nasty things about the existing syntax, but at root it's a personal, visceral reaction. Secondly, the more I thought about what needed to be done, the surer I was that it would be simple. And thirdly, it was simple.

A fellow traveler

John, who apparently wishes to remain anonymous, has contributed a version of tcl-expr-patch for the 8.4.0 release of Tcl. Happy Thanksgiving, 2002.

the entropy liberation front
