Dear colleagues,
I am seeing data-dependent bugs in TIP457, both in the design as well as
in the current implementation, and I want to explain why I think this
should be fixed.
Sorry for the long text, but I have the feeling that my previous
concerns regarding this topic were not fully understood.
tl;dr: Please do not switch the interpretation of arguments based on the
string content of the actual argument.
1. *====== INTRODUCTION: Data Dependent Bugs ======================*
By data-dependent or data-driven bugs I mean a bug that is not exposed
during regular testing, but suddenly surfaces when a special input is
given to the program, be it accidentally or maliciously constructed.
Good (or rather bad) examples of this kind of bug are SQL injections,
unbraced exprs, or treating user input as a list. They are among the
worst possible bugs, because they remain undiscovered during regular
testing, even at 100% coverage.
Before coming to the concrete problem with TIP 457, let's have a look at
another example inside of Tcl.
2. *=========== Example: Data Dependent Bug in file & open ============*
The following code is a cut-down version of some real code that appeared
within the Tk file open dialog box:
foreach f [glob *] {
puts [file exists $f]
}
This code contains a data-dependent bug which is non-obvious at first
glance. Regular testing does not reveal the bug:
(test) 55 % ls
/Users/chris/Sources/test:
a b
(test) 56 % foreach f [glob *] { puts "$f [file exists $f]" }
a 1
b 1
This code fails, however, if an MS Word file is opened in the directory.
MS Word creates lock files with a name similar to this:
(test) 57 % touch ~mist
(test) 58 % foreach f [glob *] { puts "$f [file exists $f]" }
a 1
b 1
~mist 0
So glob enumerates a file which the file command ensemble is unable to
work on. Also "open" does not open this file returned by glob. To
correct the bug, the code has to be changed into:
foreach f [glob *] { puts "$f [file exists ./$f]" }
The reason for this behaviour is a (mis-)feature, not a bug, in the file
handling of Tcl. The leading tilde ~ of the file ~mist is interpreted as
the specifier for "home directory", and the "./" is the quoting
mechanism to prevent this expansion. This means that ALL Tcl programs
which open a file from a variable name are buggy:
(test) 98 % foreach f [glob *] { puts [open $f] }
file16
file17
user "mist" doesn't exist
This code
set fd [open $fn r]
is generally wrong, it should be
set fd [open ./$fn r]
unless $fn is an absolute path starting with a slash, or a Windows drive
letter, in which case $fn would be correct. So the final correct way to
open a file, given the file name in $fn, is
if {[string index $fn 0] eq "~"} {
set fd [open ./$fn r]
} else
set fd [open $fn r]
}
How often do you see this code, and how often do you see simply
open $fd r
?
3. *============ Data dependent bug in TIP 457 =====================*
We have seen in the previous paragraph, that a data dependent behaviour
in core commands, in that case all file handling commands, leads to a
data dependent bug in the code that people write building upon these
commands. The programmer using "open" is completely *unaware* that there
is an invisible switch inside the core file commands, and simply relies
on them, that they open just the file with the given name. The root
problem is an invisible switch inside "open" & friends.
The problem with TIP 457 is that it contains an invisible switch, which
decides if a given actual argument is a name or an argument, depending
on the string of the actual argument. If it starts with "-" it is
(almost) unconditionally treated as a name, whereas if not, it is an
argument.
This gives rise to problems when named arguments are mixed with
positional arguments. Consider the following attempt to simulate a
subset of lsort:
proc mylsort {{ordering -default ascii -switch {ascii dictionary
integer real}} list} {
puts "Sort mode $ordering"
lsort -$ordering $list
}
It works with usual input:
(test) 117 % set list {value some other thing}
value some other thing
(test) 118 % mylsort -dictionary $list
Sort mode dictionary
other some thing value
but fails when a dash is present [1]:
(test) 119 % set list {-value some other thing}
-value some other thing
(test) 120 % mylsort -dictionary $list
wrong # args: should be "mylsort
?|-ascii|-dictionary|-integer|-real|? list"
whereas the original lsort has no problem:
(test) 121 % lsort -dictionary $list
-value other some thing
It even breaks EIAS [1]:
(test) 122 % lindex $list 0
-value
(test) 123 % mylsort -dictionary $list
Sort mode dictionary
-value other some thing
The remedy to this as per TIP 457 is the -- switch:
(test) 124 % mylsort -dictionary -- $list
Sort mode dictionary
-value other some thing
This always works, but it puts the burden on the programmer to add -- at
all invocations. The current core "lsort" shows that the arguments are
never ambiguous, the assignment can be resolved by counting. The last
arg is the list, everything before is treated as an option. This
strategy always works if a fixed number of positional arguments is at
the beginning and/or end of the argument list. If core "lsort" were to
be implemented in TIP457, then ALL Tcl programs using lsort without the
-- switch would become buggy with a data dependent bug.
4. *=========== Possible solutions to the issue ====================*
In some cases, the assignment of an actual argument to the named or
positional parameters can be based on counting only. If there is a fixed
number of non-optional positional arguments at the beginning or the end
of the argument list, this can be assigned first and then the rest is
processed as options. Many core commands follow this strategy, e.g.
"lsort", "lsearch" or "canvas create object coordlist ?-options?". In
other cases it cannot be decided where to split between options and
values; -- is needed to separate the two. This is the case when both
parts, the named args and the positional args, are of variable length.
A possible solution would be first to detect, if the argument list can
be parsed unambiguously. In that case, create an interface like lsort.
Second, if it is determined, that there is ambiguity, this could raise
an error. Either the new proc-457 *disallows* mixing variable number of
arguments with named arguments, or it throws a runtime error *upon
invocation*, if "--" was not seen in the argument list. This would catch
the ambiguity early on, while the code is developed, not some time in
the future when accidentally a value with a leading dash is passed.
It might also be possible to come up with a set of rules, how to assign
the arguments in a way that the common cases do not bear a surprise.
While it might be tricky to derive these rules or prove that they work
as intended, it could be a way to formalise the sensible expectation. I
am still hoping that Kevin Kenny can find the rules he formulated some
time back.
In addition to making the code safer, the static binding of arguments
can also make the code faster. In general, string representations should
not be generated for data which is passed into a proc. The TIP code
already inspects the value to see if it is a list, but e.g. a huge
VecTcl array can not even have a strig representation, because it would
not fit into memory, but the current code would force it to compute the
string in many cases.
5. *================= Conclusion =================================
I hope that this long tractate will be read and contributes to the
development of TIP 457; all in al this TIP is not bad and there hase
been a lot of work going into the amalgamation of a syntax which seems
very Tclish and a good overall compromise. All I'm asking for is a way
to implement this in a way which does not tempt people into writing
buggy code.
Christian
[1] Note: this one is tricky to reproduce, because of the history
caching / literal caching. It only works in a fresh tclsh.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
Tcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tcl-core