Re: [TCLCORE] Variable access (was Re: [Tcl9-cloverfield] Parser)
by Neil Madden
May 8 2008 3:32PM
Hi Frédéric,
I'm not entirely sure of the context of this discussion, or whether
it is supposed to apply to Tcl itself, or just to Cloverfield (with
which I have only a passing familiarity). Therefore, please take my
comments with the appropriate grain of salt -- I may be missing the
point entirely!
On 7 May 2008, at 14:04, Frédéric Bonnet wrote:
> ...
> Variable access semantics.
> ==========================
>
> Variables can be accessed either by value or by reference. Regarding
> the latter, there are two main kinds of references:
>
> - Weak references, e.g. access-by-name.
> - Strong references, such as pointers.
I'm not sure whether I understand the distinction being made here. In
any system of reference there are 3 parts: (i) the reference (name,
pointer), (ii) the thing referred to, and (iii) some means of mapping
from (i)s to (ii)s. For instance, in a hashtable the three parts are:
(i) the hashtable keys, (ii) the objects contained in the hashtable
entries, (iii) the hashtable itself. Pointers are not special in this
regard, where the three parts are (i) the pointer (an integer,
basically), (ii) some object at some location in memory, (iii) the
(virtual) memory system which is conceptually like a very large
array. So from this point of view there is nothing strong or weak
about names vs pointers -- a pointer is just one kind of name in one
kind of naming system. The problems are basically the same for each
kind of system: e.g. having references that don't correspond to a
valid object (invalid pointer, nonexistent variable, etc.), or
looking up a valid reference but in the wrong context (hashtable,
namespace, etc). One difference is that the "map" from pointers to
objects (i.e. the memory system) is usually fixed and global:
dereferencing a pointer doesn't usually depend on context, whereas a
string variable name obviously does. I'm not sure if this is the
distinction you are making between strong and weak references
however. To me, the distinction seems better applied to referencing
systems as a whole. For instance, some statically-typed languages
have a very strong notion of references: as with pointers, the thing
referred to is fixed and not dependent on context, but further it is
usually impossible to manufacture a reference independently of the
reference system (the type system prevents this), and it is possible
to accurately track what refers to what in the system and manage
lifecycles effectively. String names as references are obviously
"weaker" in this sense: it is possible to manufacture strings at
will, and perform all sorts of manipulations on them, which makes it
very hard to track what refers to what, or e.g. if there are still
some references left to some object.
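To make the context-dependence of string names concrete, here is a
minimal sketch in plain Tcl. The names x, demo and lookup are made up
for illustration; the point is only that one and the same string
resolves to three different variables depending on where it is looked
up.

```tcl
# The same string "x" names a different variable in each context.
set ::x "global value"

namespace eval demo {
    variable x "namespace value"
}

proc lookup {name} {
    # A local "x" lives in this proc's own call frame.
    set x "local value"
    # [set $name] resolves the string in the current (local) context.
    return [set $name]
}

set results [list [lookup x] [set ::x] [set ::demo::x]]
```

Nothing about the string "x" itself says which of the three variables
is meant; only the lookup context decides.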
[...]
> Strong references are bound to the internal value held by the variable
> at the time it is defined, and may be shared and passed around
> contexts. So a referenced value remains valid as long as a strong
> reference points to it. When the last one disappears the underlying
> value is garbage collected. Variables can be seen as named strong
> references.
OK, just to be clear: a variable (Var structure in terms of
implementation) holds a "strong" reference to a value (Tcl_Obj). That
variable is then named in one (or more) contexts/namespaces at the
script level -- i.e., there may exist several "weak" references to
the variable. Thus there are two levels of reference here: the
Var->Tcl_Obj level and the Name (string)->Var level.
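The Name->Var level can be observed from a script. The following is a
rough sketch (twoNames is a hypothetical name) in which [upvar 0]
gives a second name to an existing variable in the same frame, so
that two strings map to one underlying variable.

```tcl
proc twoNames {} {
    set a 1
    # Link a second name "b" to the same variable that "a" names.
    upvar 0 a b
    # A write through "b" is visible through "a".
    set b 2
    set viaA $a
    # Unsetting through a linked name unsets the variable it is linked to.
    unset b
    return [list $viaA [info exists a]]
}
```

Here [twoNames] returns the list {2 0}: the write through b was seen
through a, and after [unset b] the variable is gone under both names.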
> Tcl
> ===
>
> At the script level, Tcl only provides weak references using variable
> names. Commands such as [set], [lappend] or [incr] access variable by
> name, and may create the variable if it doesn't exist (this is a
> recent feature of [incr]). Moreover, [global] and [upvar] can access
> variables by name from other contexts.
Clarification: [set] etc may create *a* variable if the name given
doesn't map to an existing variable in the current context (the
intended variable may exist in a different context).
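A small sketch of what I mean, with made-up names (counter,
setLocal, setGlobal): the first proc silently creates a new local
variable because the name does not map to anything in its own frame,
even though a global of that name exists.

```tcl
set ::counter 10

proc setLocal {} {
    # No [global] here, so "counter" is not visible in this frame.
    # [set] therefore creates a fresh local variable.
    set counter 99
    return $counter
}

proc setGlobal {} {
    # [global] links the local name to the global variable first.
    global counter
    incr counter
    return $counter
}

set localResult  [setLocal]    ;# the global is left untouched
set afterLocal   $::counter
set globalResult [setGlobal]   ;# the global is updated
```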
>
> Tcl provides no way to create strong references at the script level,
> however it uses strong references internally using refcounted Tcl_Obj
> structures. Exposing strong references at the script level involves
> hacks with object types that are prone to failure because of
> shimmering.
Yes. "Everything is a string" is fundamentally incompatible with the
idea of strong references, as far as I can tell (i.e. it is
impossible to create a properly abstracted system of reference using
strings as references). Tcl_Obj internal rep hacks are not just prone
to failure, but also break EIAS.
> What Cloverfield needs
> ======================
>
> Cloverfield needs both kinds of references. The original
> Tridekalogue introduces $& as a syntax for strong references. But it
> must also keep existing Tcl semantics regarding variable names, as
> they are an essential part of its philosophy. Variable names have no
> alternative when using introspection or designing mini-languages,
> a field where Tcl shines.
I need some clarification about what $& notation introduces exactly.
As mentioned earlier, there are two levels to Tcl (and presumably
Cloverfield) variable references: the "strong" reference of a Var to
the value it refers to, and the "weak" references of any string names
that refer to that var in various contexts. From this, my expectation
would be that $&foo syntax is a way of denoting the var structure
rather than the value that it refers to. So e.g. while [foo $bar]
means 'call foo passing the value contained in the variable referred
to by the name "bar" in the current context', the new syntax [foo
$&bar] means something like 'call foo passing the variable referred to
by the name "bar" in the current context' -- where "passing" a
variable would presumably mean linking the first parameter of foo to
the same variable that bar refers to, i.e. that:
    proc foo v { ... }
    foo $&bar
is roughly equivalent to:
    proc foo vName { upvar 1 $vName v; ... }
    foo bar
Is that correct?
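For reference, here is a runnable version of the [upvar] form in
today's Tcl. The names double and caller are mine, chosen only for
illustration, and the $& form is not shown because I can only guess
at its semantics.

```tcl
proc double {vName} {
    # Link local "v" to the caller's variable whose name was passed in.
    upvar 1 $vName v
    set v [expr {$v * 2}]
}

proc caller {} {
    set bar 21
    # Pass the name "bar", not its value.
    double bar
    return $bar
}
```

Calling [caller] returns 42: the proc modified the caller's variable
through the link, which is what I would expect [foo $&bar] to allow.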
> [...]
> Proposals
> =========
>
> Andy Goth proposed to allow references to non-existing variables. This
> has the effect of delaying the resolution of the variable until the
> first access. This means that they are weak references. Alternatively,
> Andy also proposed a new syntax for weak references using the @ prefix.
What then is the difference between these weak references and just
names? Are they looked up in a different context/namespace?
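To show why I ask: a plain name in current Tcl already behaves this
way, as far as I can see. In this sketch (deferred and pending are
hypothetical names) the "reference" is held before any variable
exists, and is only resolved on first access.

```tcl
proc deferred {} {
    # Just a string; no variable named "pending" exists yet.
    set ref pending
    set before [info exists $ref]
    # The first write through the name creates the variable.
    set $ref "now defined"
    return [list $before [set $ref]]
}
```

[deferred] returns {0 {now defined}}, i.e. resolution was delayed
until the first access without any new syntax.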
> I proposed to keep references strong, and extend grouping rules of {}
> and () so that variable syntax doesn't clash with other rules. But
> this raises other questions, notably on the order of substitution, so
> it doesn't solve the whole issue.
>
> Alexandre Ferrieux raised a good point about implementation
> considerations, namely: "explicit reference syntax is a good thing ...
> for the bytecode (or whatever) compiler". Indeed, access-by-name is a
> performance killer in the sense that the compiler has no clue whether
> a given argument can be used as a variable name by the called
> command. So this requires ugly hardwiring of commands such as [set]
> into the bytecode.
For this particular application, the best solution then would seem to
be having the syntax be part of the declaration of the proc itself so
that the byte-code compiler knows the intention by just looking at
that rather than looking at usage. i.e.:
    proc foo {&bar ..} { ... }
I believe there is code on the wiki that implements this as sugar for
[upvar]. If you adopted this convention in Cloverfield instead of
explicit upvar (or perhaps in addition to), then that would seem to
solve the compiler optimisation issue, wouldn't it?
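I don't have the wiki code to hand, so what follows is my own rough
sketch of such sugar and not that implementation. The command name
refproc and the __ref_ parameter prefix are invented here; the
wrapper simply rewrites each &name parameter into an ordinary
parameter plus an [upvar 1] line at the top of the body.

```tcl
proc refproc {name params body} {
    set prelude ""
    set realParams {}
    foreach p $params {
        set pname [lindex $p 0]
        if {[string index $pname 0] eq "&"} {
            # "&foo" becomes a hidden parameter receiving the caller's
            # variable name, plus an upvar line that links "foo" to it.
            set v [string range $pname 1 end]
            lappend realParams __ref_$v
            append prelude "upvar 1 \[set [list __ref_$v]\] [list $v]\n"
        } else {
            # Ordinary parameters (with or without defaults) pass through.
            lappend realParams $p
        }
    }
    # Define the real proc in the caller's namespace context.
    uplevel 1 [list proc $name $realParams $prelude$body]
}

refproc addTo {&total amount} {
    incr total $amount
}

proc demo {} {
    set sum 5
    addTo sum 10
    return $sum
}
```

With this, [demo] returns 15. The declaration alone tells a reader
(and could tell a compiler) that the first argument is a variable
name, which is the property I am after.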
[...]
To summarise, my point is basically that Tcl already has "strong"
references in the form of the Var structures that underlie variables,
and it has script-accessible means of manipulating these and linking
them via [set], [upvar] etc. I don't therefore think that adding a
new notion of reference at the script level would clarify things, but
rather just complicate them. Some things I think could be done:
1.) Provide some syntactic support for linking variables such as that
outlined above, which would perhaps make a common idiom (upvar 1 ..)
slightly easier to grasp and use, and possibly allow for further
optimisations.
2.) Generalise the notion of what can be referred to by a Var to that
of a general "resource" -- i.e., not just Tcl_Objs, but also "opaque"
entities like commands, channels, objects, etc. This would allow for
finer control over the lifetime/scope of these resources (e.g. proc-
local commands or channels), and could be extended to encompass
things like general reference-counting of resources and perhaps even
a generalised serialisation framework (i.e. $foo means "attempt to
serialise the resource referred to by the var named foo").
Anyway, I hope some of that is of interest or help to you. I'm not
sure if this is particularly on-topic for TCLCORE, but I'm not a member
of the Cloverfield list -- feel free to follow up to my personal
email if you want to discuss anything further.
-- Neil