[TCLCORE] (FYI, prev emails: Shimmering and large Tcl_Obj's)
by Jean-Claude Wippler
Jun 19 2006 6:44AM
From: Jean-Claude Wippler <jcw@[...].com>
Date: June 10, 2006 17:11:36 GMT+02:00
To: Jeff Hobbs <JeffH@[...].com> , Miguel Sofer <mig@[...].edu>,
Don Porter <dgp@[...].net> , "Donal K. Fellows"
<donal.k.fellows@[...].uk>
Subject: Q about string representation
Hi,
I've got a question, hoping one of you wizzes could answer it.
Q: Am I allowed to *change* the string rep when a Tcl_Obj loses its
internal rep?
Semantically, I think it should be allowable: it's like having a
string " 123 ", turning it into an int, and then (suppose the string
rep ever got lost, bit of a long shot) converting it back to "123"
without the leading space. Even "expr" does this, apparently.
In my case, I have two different representations for an object, one
happens to be real short while the internal rep exists, the other is
potentially huge when it doesn't. I'm trying to show the short one
as much as possible, because otherwise interactive use and testing
becomes very hard, i.e. even
"set a [myproc-returning-that-big-object]" would be trouble in an
interactive session which wants to print out the string value.
My thought would be to alter bytes & length in my
FreeBlobInternalRep, just before freeing the internal rep.
Haven't tried it out yet, as it affects quite a bit of code.
Is there an easy YES or NO to this question?
-jcw
========================================================================
From: "Donal K. Fellows" <donal.k.fellows@[...].uk>
Date: June 11, 2006 17:34:29 GMT+02:00
To: Jean-Claude Wippler <jcw@[...].com>
Cc: Jeff Hobbs <jeffh@[...].com> , Miguel Sofer
<msofer@[...].net> , Don Porter <dgp@[...].net>
Subject: Re: Q about string representation
Jean-Claude Wippler wrote:
> Q: Am I allowed to *change* the string rep when a Tcl_Obj loses
> its internal rep?
If the object is shared, no. If the object is unshared (or, I suppose,
if you know you own all the references, but that's probably a tricky
condition to establish) yes. This rule is particularly applicable to any
object created as a literal (it happens a lot, and changing any rep
there could result in literals all over the thread getting changed
mysteriously) but since you don't really know what's using things for
what, it's a general rule.
> In my case, I have two different representations for an object,
> one happens to be real short while the internal rep exists, the
> other is potentially huge when it doesn't. I'm trying to show the
> short one as much as possible, because otherwise interactive use
> and testing becomes very hard, i.e. even
> "set a [myproc-returning-that-big-object]" would be trouble in an
> interactive session which wants to print out the string value.
As long as you have ownership of the only reference(s), you've got a
free hand to do whatever you want. The core object types equate
ownership with a refcount of less than 2.
> Haven't tried it out yet, as it affects quite a bit of code.
> Is there an easy YES or NO to this question?
To my mind, the easy answer is YES provided the side condition is
established. I don't know if you think that is an easy answer. ;-)
Donal.
========================================================================
From: miguel sofer <mig@[...].edu>
Date: June 10, 2006 18:58:41 GMT+02:00
To: Jean-Claude Wippler <jcw@[...].com>
Cc: Jeff Hobbs <JeffH@[...].com> , Don Porter
<dgp@[...].net> , "Donal K. Fellows"
<donal.k.fellows@[...].uk>
Subject: Re: Q about string representation
Reply-To: mig@[...].edu
Jean-Claude Wippler wrote:
> Hi,
> I've got a question, hoping one of you wizzes could answer it.
> Q: Am I allowed to *change* the string rep when a Tcl_Obj loses its
> internal rep?
> Semantically, I think it should be allowable: it's like having a
> string " 123 ", turning it into an int, and then (suppose the
> string rep ever got lost, bit of a long shot) converting it back to
> "123" without the leading space. Even "expr" does this, apparently.
If the string rep got lost, you have a pure int with no string rep.
If you then request a string rep and there is none, TclGetString()
calls the corresponding UpdateString() and creates a "canonical"
string rep. That canonical form is independent of the existence of an
intrep.
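
A toy C model of the point above may help; it is only a sketch, not Tcl's actual code. ToyObj and the toy_* functions are hypothetical names standing in for Tcl_Obj, the int type's conversion routine, and the string-update step. It shows that once the original string rep is gone, asking for a string yields the canonical form generated from the internal rep.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a Tcl_Obj holding an integer; not the real struct. */
typedef struct {
    char *bytes;   /* string rep, or NULL when there is none */
    int   hasInt;  /* nonzero when the internal rep is valid */
    long  value;   /* internal rep */
} ToyObj;

/* Create an object that has only a string rep. */
static ToyObj toy_new(const char *s) {
    ToyObj o;
    o.bytes = malloc(strlen(s) + 1);
    assert(o.bytes != NULL);
    strcpy(o.bytes, s);
    o.hasInt = 0;
    o.value = 0;
    return o;
}

/* Parse the string rep into the internal rep; the string rep is kept. */
static void toy_set_int_from_any(ToyObj *o) {
    assert(o->bytes != NULL);
    o->value = strtol(o->bytes, NULL, 10);
    o->hasInt = 1;
}

/* Drop the string rep, leaving a "pure" int. */
static void toy_invalidate_string(ToyObj *o) {
    free(o->bytes);
    o->bytes = NULL;
}

/* Return the string rep, generating the canonical form if it is missing. */
static const char *toy_get_string(ToyObj *o) {
    if (o->bytes == NULL) {
        assert(o->hasInt);
        o->bytes = malloc(32);
        assert(o->bytes != NULL);
        snprintf(o->bytes, 32, "%ld", o->value);
    }
    return o->bytes;
}
```

In this model, an object created from " 123 " keeps that exact string while the string rep exists; only after toy_invalidate_string() does toy_get_string() produce "123".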
> In my case, I have two different representations for an object, one
> happens to be real short while the internal rep exists, the other
> is potentially huge when it doesn't. I'm trying to show the short
> one as much as possible, because otherwise interactive use and
> testing becomes very hard, i.e. even
> "set a [myproc-returning-that-big-object]" would be trouble in an
> interactive session which wants to print out the string value.
> My thought would be to alter bytes & length in my
> FreeBlobInternalRep, just before freeing the internal rep.
> Haven't tried it out yet, as it affects quite a bit of code.
> Is there an easy YES or NO to this question?
Ahhh - but this is different: you do not want to recreate a missing
stringrep, you want to change an existing one. Mmhhh ... I guess the
core could be changed to accommodate something like that, but right
now it is not a good idea.
Take a look at eg SetListFromAny in tclListObj.c, assume we pass it
one of your objs that has an internal rep and a short stringrep.
It first does TclGetString (and gets the short version); parses that
to get the list struct, and creates an internal List struct. It then
frees the old internal rep, causing the string rep to be rewritten to
long form, and finally hooks the new List intrep to the obj.
IIUC what you are trying to do, a simple [llength $yourObj] will
create an inconsistent listobj - it has both an intrep and a
stringrep, but they do not coincide. I am not sure what kind of
nastiness could ensue, but it is definitely ominous.
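
The sequence described above can be imitated with a small self-contained C model. This is a sketch under stated assumptions, not the code in tclListObj.c: ToyObj, count_words and the other names are hypothetical, the "list" internal rep is reduced to an element count, and the long string form is a made-up three-word string.

```c
#include <assert.h>
#include <string.h>

typedef struct ToyObj ToyObj;
typedef void (ToyFreeProc)(ToyObj *);

struct ToyObj {
    char bytes[64];         /* string rep (fixed buffer for brevity) */
    ToyFreeProc *freeProc;  /* frees the current internal rep, may be NULL */
    int isList;             /* nonzero once the "list" rep is installed */
    int listLength;         /* toy list rep: just an element count */
};

/* Stand-in for list parsing: count space-separated words. */
static int count_words(const char *s) {
    int n = 0, inWord = 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ') {
            inWord = 0;
        } else if (!inWord) {
            inWord = 1;
            n++;
        }
    }
    return n;
}

/* Free proc that does what the thread warns about: it rewrites the
 * short string rep to a (made-up) long form while the rep is freed. */
static void blob_free_rewriting_string(ToyObj *o) {
    const char *longForm = "row1 row2 row3";
    assert(strlen(longForm) < sizeof o->bytes);
    strcpy(o->bytes, longForm);
}

/* An object with a blob internal rep and a short handle-like string. */
static ToyObj toy_new_blob(void) {
    ToyObj o;
    strcpy(o.bytes, "obj<1234>");
    o.freeProc = blob_free_rewriting_string;
    o.isList = 0;
    o.listLength = 0;
    return o;
}

/* Same order of steps as described above: parse the string,
 * free the old internal rep, then install the list rep. */
static void toy_set_list_from_any(ToyObj *o) {
    int n = count_words(o->bytes);  /* parses the short string */
    if (o->freeProc != NULL) {
        o->freeProc(o);             /* string rep changes here */
    }
    o->freeProc = NULL;
    o->listLength = n;
    o->isList = 1;
}
```

After toy_set_list_from_any() the list rep says one element, while the string rep now holds three words: the two representations no longer describe the same value.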
-- OT? --
I think that a better solution for the immediate problem might be to
change the interactive loop, so that it prints out a shortened
stringrep of the result object. Something like
% set a [string repeat a 10]
aaaaaaaaaa
% tcl::shortInteractiveOutput 5
5
% set a
aaaaa...
% set b aaaaa...
aaaaa...
% tcl::shortInteractiveOutput
5
% tcl::shortInteractiveOutput -1
-1
% set a
aaaaaaaaaa
% set b
aaaaa...
Of course, this could lead to some confusion: I *see* $a and $b as
equal, but are they? But this is already there:
% set a "123"
123
% set b "123 "
123
Worth a tip? I know I hate myself when I initialise a long list and
forget to add ';puts {}' at the end ;)
% set a [lrepeat 100000 0]; puts {}
HTH
Miguel
========================================================================
From: Jean-Claude Wippler <jcw@[...].com>
Date: June 10, 2006 21:27:29 GMT+02:00
To: mig@[...].edu
Cc: Jeff Hobbs <JeffH@[...].com> , Don Porter
<dgp@[...].net> , "Donal K. Fellows"
<donal.k.fellows@[...].uk> , Steve Landers
<steve@[...].com> , Mark Roseman <mark@[...].com>
Subject: Re: Q about string representation
miguel sofer wrote:
> If the string rep got lost, you have a pure int with no string rep.
> If you then request a string rep and there is none, TclGetString()
> calls the corresponding UpdateString() and creates a "canonical"
> string rep. That canonical form is independent of the existence of
> an intrep.
Understood, thanks.
> Ahhh - but this is different: you do not want to recreate a missing
> stringrep, you want to change an existing one. Mmhhh ... I guess
> the core could be changed to accommodate something like that, but
> right now it is not a good idea.
[...]
> IIUC what you are trying to do, a simple [llength $yourObj] will
> create an inconsistent listobj - it has both an intrep and a
> stringrep, but they do not coincide. I am not sure what kind of
> nastiness could ensue, but it is definitely ominous.
Yep, that's the nastiness I'm afraid of ...
I'm hoping that the "but it hurts when" can be countered by a "then
don't do that". Applying a list op to these big objects, in fact
doing anything other than passing them around and applying only
specific commands to them, is going to be flagged as a big no-no.
Not unlike "opaque object pointers" in C, except that Tcl won't
protect you as a missing struct can.
Hm, maybe there is a way to flag certain Tcl_Obj types as not being
allowed to shimmer away their intrep? I know people have been
thinking about various tricks in the past.
> I think that a better solution for the immediate problem might be
> to change the interactive loop, so that it prints out a shortened
> stringrep of the result object. Something like
>
> % set a [string repeat a 10]
> aaaaaaaaaa
> % tcl::shortInteractiveOutput 5
> 5
> % set a
> aaaaa...
> % set b aaaaa...
> aaaaa...
> % tcl::shortInteractiveOutput
> 5
[...]
Yes, I've been thinking along those lines as well. In fact, it's my
fallback to hack tkcon and detect these big objects and do something
special.
But *showing* the string rep is actually the lesser problem. I need
to avoid even constructing one. I have objects which use a very
compact form in their internal representation, but could explode to
*any* size when converted to a string.
Obvious thought: use a handle naming system. But that brings back
referencing and garbage collection issues which I've been able to
completely get rid of by treating everything as true "values". I am
getting a lot of mileage in the best possible Tcl way by taking full
advantage of copy-on-write and dual-representations with my latest
database experiments. There are some very interesting options once
you fully align with Tcl_Obj's, even for massive database uses.
Side-effect-free coding (i.e. values and COW) has interesting connections
with database transactions and ACID properties.
The problem is that in my case a string rep is so expensive with
large "values" that most of the time you really don't want them at all.
> Worth a tip? I know I hate myself when I initialise a long list
> and forget to add ';puts {}' at the end ;)
> % set a [lrepeat 100000 0]; puts {}
Maybe. As I said, a modified tkcon might be workable as well.
I'm appending a note I sent to Jeff recently, about other ideas about
how to avoid string reps and shimmering from becoming a problem with
large objects. It's related in the sense that it would help me avoid
that awful-big-string-in-the-middle (ABSITM?) aspect of using Tcl's
"hydra" model, in many practical cases.
None of this is crucial for what I do, btw. I can get everything
done with Tcl as is. It's just that the extension I'm working on
might be harder to play with and debug than necessary if the ABSITM
is not improved somehow.
Right now, I'm tempted to try having a short *or* a big string rep,
and switching on loss of internal rep. Unexpected switchovers are
detectable, because these two reps can never be exactly the same.
The list inconsistency you described is still nasty. There are some
known intrep-loss cases, such as events firing commands (and maybe
traces?), but I've got a workaround which appears to work well (wrap
the big object in a cmd alias when needed).
-jcw
PS. Am adding Mark and Steve to the list, because they've been
following my work.
========================================================================
From: Jean-Claude Wippler <jcw@[...].com>
Date: June 11, 2006 18:05:27 GMT+02:00
To: "Donal K. Fellows" <donal.k.fellows@[...].uk>
Cc: Jeff Hobbs <jeffh@[...].com> , Miguel Sofer
<msofer@[...].net> , Don Porter <dgp@[...].net>
Subject: Re: Q about string representation
Donal K. Fellows wrote:
> > Q: Am I allowed to *change* the string rep when a Tcl_Obj loses
> > its internal rep?
>
> If the object is shared, no. If the object is unshared (or, I suppose,
> if you know you own all the references, but that's probably a tricky
> condition to establish) yes.
[...]
> To my mind, the easy answer is YES provided the side condition is
> established. I don't know if you think that is an easy answer. ;-)
The "if" in your answer is the gotcha.
I don't know who holds a reference to the object. I'm returning an
object, and will give it a string rep that is in fact a "my string
rep is none of your business" placeholder. So the object can be
passed around at will. Using it in a puts will not be useful - in a
way, the object can't be passed out of the interp ("object" is a bit
overloaded here, a Tcl_Obj is a value).
But with my own code, I can look inside and get at the real stuff.
With two major benefits:
* the internal rep can be used if there, and re-constructed if not
* things clean up automatically
In a way, this is the opposite of EIAS: for these Tcl_Obj's,
everything is the internal rep. The drawback of not being able to
trivially pass it in or out of an interp is moot here - this is
database stuff, with its own efficient mechanisms for that (for
files, channels, or Tcl binary data).
I know I'm stretching things. Am trying to stay on the right side of
the cliff's edge.
-jcw
========================================================================
From: "Jeff Hobbs" <jeffh@[...].com>
Date: June 12, 2006 23:46:41 GMT+02:00
To: "'Jean-Claude Wippler'" <jcw@[...].com> , <mig@[...].edu>
Cc: "'Don Porter'" <dgp@[...].net> , "'Donal K. Fellows'"
<donal.k.fellows@[...].uk> , "'Steve Landers'"
<steve@[...].com> , "'Mark Roseman'" <mark@[...].com>
Subject: RE: Q about string representation
Jean-Claude Wippler wrote:
> Hm, maybe there is a way to flag certain Tcl_Obj types as not being
> allowed to shimmer away their intrep? I know people have been
> thinking about various tricks in the past.
Are you thinking you want one? Because currently there is no way to
make an immutable object, although others have asked.
> > I think that a better solution for the immediate problem might be
> > to change the interactive loop, so that it prints out a shortened
> > stringrep of the result object. Something like
...
> Yes, I've been thinking along those lines as well. In fact, it's my
> fallback to hack tkcon and detect these big objects and do something
> special.
Ah, well, for tkcon you have:
tkcon linelength ?maxLength?
to auto-truncate long lines. Yet another of a gazillion hidden options ...
It doesn't matter that we do string ops on the output, as you are about to
print them anyways. BTW, there is also the magic $_ var for the last result
in tkcon, which will contain the untruncated value for any next operation.
> But *showing* the string rep is actually the lesser problem. I need
> to avoid even constructing one. I have objects which use a very
> compact form in their internal representation, but could explode to
> *any* size when converted to a string.
Yes, that's a separate issue. This is what the opaque handle is all about
though, no? Or do you perhaps want to show shorter output?
> Maybe. As I said, a modified tkcon might be workable as well.
See above.
Jeff
========================================================================
From: Jean-Claude Wippler <jcw@[...].com>
Date: June 16, 2006 17:42:22 GMT+02:00
To: Jeff Hobbs <jeffh@[...].com>
Cc: <mig@[...].edu> , "'Don Porter'" <dgp@[...].net>,
"'Donal K. Fellows'" <donal.k.fellows@[...].uk> , "'Steve
Landers'" <steve@[...].com> , "'Mark Roseman'"
<mark@[...].com>
Subject: Re: Q about string representation
Jeff Hobbs wrote:
> > Hm, maybe there is a way to flag certain Tcl_Obj types as not being
> > allowed to shimmer away their intrep? I know people have been
> > thinking about various tricks in the past.
>
> Are you thinking you want one? Because currently there is no way
> to make an immutable object, although others have asked.
Immutable, yes - my obj's have a "fixed" meaning, i.e. values -
though the internal rep can change substantially due to caching and
such.
As I said, I am ok with losing the internal rep provided I get a
chance to adjust the string to be a loss-less representation. But I
don't want to do this potentially expensive conversion when people just
want some handle-like string for interactive use.
> Ah, well, for tkcon you have:
> tkcon linelength ?maxLength?
> to auto-truncate long lines.
Yes, but:
> > But *showing* the string rep is actually the lesser problem. I need
> > to avoid even constructing one. I have objects which use a very
> > compact form in their internal representation, but could explode to
> > *any* size when converted to a string.
>
> Yes, that's a separate issue. This is what the opaque handle is
> all about though, no? Or do you perhaps want to show shorter output?
I'm trying to avoid traditional handles like "file12" because then I
lose the automatic cleanup. Which is very important because my
objects get constructed and returned all the time, in constructs such
as:
set x [foo [foo [foo ...] ... ] ...]
Then again, in a way I am treating these objects as handles. They
just happen to carry their real value as internal rep, not in some
hash table.
To summarize, say I have a var $v with a ref to my weirdo Tcl_Obj:
- it has an internal rep (otherwise it'd be a plain string)
- "puts $v" prints "obj<1234>", i.e. a short handle
- "string length $v" prints 9
- "llength $v" prints 1, BUT...
it also destroys the internal rep
and it leads to str <-> list inconsistency
- so now, "string length $v" might return 1000000
- when used in my commands, the internal rep would be reconstructed
- after that, "puts $v" might print "obj<56789>"
So yes, these tricks are scary and perhaps dubious.
But the point is that these objects are only supposed to be used as
arguments for my own commands. The string rep is used for two things:
1) as a short readable handle, useful in interactive sessions and
for application level debugging
2) as loss-less string equivalent when the internal rep is lost
(a huge, totally unreadable, binary-data, thing)
I think all I need is the permission from you Tcl guys to alter a
string rep in the FreeInternalRep() code of my objects. Well, what
I'm really trying to establish is how much trouble I may be getting
myself into...
As Miguel pointed out, there can easily be an inconsistency between
the string rep and some *other* internal rep, such as lists. In the
case of lists, I could perhaps pick string reps which are never valid
as list (like "}blah{"). Then "llength" would fail instead of losing
my rep.
Is that the only big issue, or is this the "tip of a nightmare"?
-jcw
PS. I wouldn't be asking all this if it were not crucial. This
really is part of a new data manipulation extension which fits
extraordinarily well into Tcl in every other respect. The conclusion
of *years* of research! ;)
========================================================================
From: miguel sofer <mig@[...].edu>
Date: June 16, 2006 17:52:27 GMT+02:00
To: Jean-Claude Wippler <jcw@[...].com>
Cc: Jeff Hobbs <jeffh@[...].com> , "'Don Porter'"
<dgp@[...].net> , "'Donal K. Fellows'"
<donal.k.fellows@[...].uk> , "'Steve Landers'"
<steve@[...].com> , "'Mark Roseman'" <mark@[...].com>
Subject: Re: Q about string representation
Reply-To: mig@[...].edu
Jean-Claude Wippler wrote:
> Jeff Hobbs wrote:
> >> Hm, maybe there is a way to flag certain Tcl_Obj types as not being
> >> allowed to shimmer away their intrep? I know people have been
> >> thinking about various tricks in the past.
> >
> > Are you thinking you want one? Because currently there is no way
> > to make an immutable object, although others have asked.
> Immutable, yes - my obj's have a "fixed" meaning, i.e. values -
> though the internal rep can change substantially due to caching and
> such.
> As I said, I am ok with losing the internal rep provided I get a
> chance to adjust the string to be a loss-less representation. But
> I don't want to do this potentially expensive conversion when people
> just want some handle-like string for interactive use.
> > Ah, well, for tkcon you have:
> > tkcon linelength ?maxLength?
> > to auto-truncate long lines.
> Yes, but:
> >> But *showing* the string rep is actually the lesser problem. I need
> >> to avoid even constructing one. I have objects which use a very
> >> compact form in their internal representation, but could explode to
> >> *any* size when converted to a string.
> >
> > Yes, that's a separate issue. This is what the opaque handle is
> > all about though, no? Or do you perhaps want to show shorter output?
> I'm trying to avoid traditional handles like "file12" because then
> I lose the automatic cleanup. Which is very important because my
> objects get constructed and
[...]
> Is that the only big issue, or is this the "tip of a nightmare"?
Tough to say.
Weird idea: how about giving your objs a weirdo string rep that has:
- in the string rep, your short handle + \0 + pointer to long rep
- in the length, just the length of your string rep
I think (but am not sure) that this kind of no-no is more likely to
avoid making big trouble. You'd then create the long part after the
\0 only when in risk of losing the intrep, and modify the string rep
accordingly.
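
One way to picture the layout suggested above is the following C sketch. It is only an illustration under assumptions: toy_make_rep and toy_hidden_pointer are hypothetical helpers, the buffer is plain malloc memory rather than a real Tcl string rep, and nothing here has been tried against the Tcl core. The reported length covers the short handle only, so string operations stop at the NUL, while the pointer bytes sit just behind it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a buffer laid out as: handle, NUL, raw bytes of a pointer.
 * *length receives the length of the handle only. */
static char *toy_make_rep(const char *handle, void *longRep, int *length) {
    size_t n = strlen(handle);
    char *bytes = malloc(n + 1 + sizeof longRep);
    assert(bytes != NULL);
    memcpy(bytes, handle, n + 1);                    /* handle plus NUL */
    memcpy(bytes + n + 1, &longRep, sizeof longRep); /* hidden pointer */
    *length = (int) n;
    return bytes;
}

/* Recover the pointer stored behind the terminating NUL. */
static void *toy_hidden_pointer(const char *bytes, int length) {
    void *p;
    memcpy(&p, bytes + length + 1, sizeof p);
    return p;
}
```

Note that in this sketch any code that copies the string using only bytes and length (length + 1 bytes) would silently drop the hidden pointer, which is one of the implications that would need checking.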
I'll have to think some more about the implications, but chose to
throw the idea out there just in case I miss something in the analysis.
Cheers
Miguel
========================================================================
From: "Donal K. Fellows" <donal.k.fellows@[...].uk>
Date: June 19, 2006 10:35:25 GMT+02:00
To: Jean-Claude Wippler <jcw@[...].com>
Cc: Jeff Hobbs <jeffh@[...].com> , mig@[...].edu, "'Don Porter'"
<dgp@[...].net> , "'Steve Landers'"
<steve@[...].com> , "'Mark Roseman'" <mark@[...].com>
Subject: Re: Q about string representation
Jean-Claude Wippler wrote:
> But the point is that these objects are only supposed to be used
> as arguments for my own commands. The string rep is used for two
> things:
> 1) as a short readable handle, useful in interactive sessions and
> for application level debugging
> 2) as loss-less string equivalent when the internal rep is lost
> (a huge, totally unreadable, binary-data, thing)
> I think all I need is the permission from you Tcl guys to alter a
> string rep in the FreeInternalRep() code of my objects. Well,
> what I'm really trying to establish is how much trouble I may be
> getting myself into...
I've just studied the code and I can now describe the situation under
which it is OK to allocate a new string rep inside an internal rep
deletion function. Alas, the answer is "quite possibly never" because we
do not guarantee the order in which internal and string reps are deleted
when objects are cleaned up (because that allows us to tear down large
complex graphs of objects fairly easily), and there are times when the
internal rep destructor is called when the string rep is already invalid
(the field is reused to hold a linked list of objects to delete). But it
might be possible to get around this...
To find out if you're in the nasty recursive-delete case, you'll need to
use ObjInitDeletionContext() to get a PendingObjData* and look at the
deletionCount field of that structure. (That's all declared in tclInt.h
to be the correct thing for the threading configuration). If that's
zero, all that's happening is that the object is being transmuted (and
you can probably fiddle with the string rep), but if that's one or more,
the object itself is being deleted (and you should leave the string rep
well alone; it might not be valid at all). Messy, but I don't think
there's a practical alternative. This probably means that code using
this won't build into a DLL on Windows; I don't think that that platform
likes references to variables to be exported. (Maybe that in turn would
indicate a good RFE for a new "internal" API...)
That is the answer for 8.5. In 8.4, the string rep (if present) is
always deleted *after* the internal rep, but there's no way to find out
if the object is mutating or deleting (well, without doing non-portable
C stack hackery to find out what function is calling the internal rep
destructor).
Sorry for the complicated answer, but Obj cleanup *is* complicated. :-)
I hope this provides enough info for you to figure out if what you want
to do is possible. I want to support you, but what you're doing is
deeply complex!
(For what it is worth, the complexity of object deletion does mean that
8.5 can always destroy arbitrarily deeply nested lists, dicts, and other
"user" objects in a fixed amount of stack space. That got rid of one of
the trickier ways of crashing a Tcl interpreter!)
Donal (Should this discussion be on tcl-core? There are issues here of
real interest to other people interested in Tcl soothsaying...)
========================================================================
From: Jean-Claude Wippler <jcw@[...].com>
Date: June 19, 2006 11:28:23 GMT+02:00
To: Donal K. Fellows <donal.k.fellows@[...].uk>
Cc: Jeff Hobbs <jeffh@[...].com> , mig@[...].edu, "'Don Porter'"
<dgp@[...].net> , "'Steve Landers'"
<steve@[...].com> , "'Mark Roseman'" <mark@[...].com>
Subject: Re: Q about string representation
Donal,
Thanks for your detailed comment.
> > But the point is that these objects are only supposed to be used
> > as arguments for my own commands. The string rep is used for two
> > things:
> > 1) as a short readable handle, useful in interactive sessions and
> > for application level debugging
> > 2) as loss-less string equivalent when the internal rep is lost
> > (a huge, totally unreadable, binary-data, thing)
> > I think all I need is the permission from you Tcl guys to alter a
> > string rep in the FreeInternalRep() code of my objects. Well,
> > what I'm really trying to establish is how much trouble I may be
> > getting myself into...
>
> I've just studied the code and I can now describe the situation under
> which it is OK to allocate a new string rep inside an internal rep
> deletion function. Alas, the answer is "quite possibly never" because we
> do not guarantee the order in which internal and string reps are deleted
> when objects are cleaned up (because that allows us to tear down large
> complex graphs of objects fairly easily), and there are times when the
> internal rep destructor is called when the string rep is already invalid
> (the field is reused to hold a linked list of objects to delete). But it
> might be possible to get around this...
Hm. Might that be a useful convention? If the string rep does not
exist (perhaps length < 0 when bytes is re-used?) and a
FreeInternalRep proc is called, then this signals that it is being
called for object deletion and not because a new internal rep is
about to be set up?
> [...] This probably means that code using this won't build into a
> DLL on Windows; I don't think that that platform likes references
> to variables to be exported. (Maybe that in turn would indicate a
> good RFE for a new "internal" API...)
Is this about exporting variables across dll boundaries? Python does
that for several objects, most notably its "Py_None" nil object (which
you need the moment you want to return None in a C proc, so it's
heavily used).
> That is the answer for 8.5. In 8.4, the string rep (if present) is
> always deleted *after* the internal rep, but there's no way to find out
> if the object is mutating or deleting (well, without doing non-portable
> C stack hackery to find out what function is calling the internal rep
> destructor).
Ah - I hadn't even thought of deletion as being an important issue,
but of course it is: if the goal is to avoid constructing string
reps, then deletion might well be the most common case.
> Sorry for the complicated answer, but Obj cleanup *is* complicated. :-)
> I hope this provides enough info for you to figure out if what you want
> to do is possible. I want to support you, but what you're doing is
> deeply complex!
Much appreciated. If 8.5 is better suited for this, I could
experiment with such tricks in 8.5, and leave the current "normal"
behavior for 8.4 - it's workable as long as you avoid all string reps
(or at least have modestly sized datasets). I'm ok with having two
different extensions for 8.4 and 8.5, although I may regret the
deployment issues later ;)
> (For what it is worth, the complexity of object deletion does mean that
> 8.5 can always destroy arbitrarily deeply nested lists, dicts, and other
> "user" objects in a fixed amount of stack space. That got rid of one of
> the trickier ways of crashing a Tcl interpreter!)
There too, Python has had similar issues - I happened to be involved
when Christian Tismer tackled stack cleanup in very deeply nested
stackless exception tracebacks. What he ended up doing there was
that every 1000 frees or so (which nests C calls), the code would put
any remaining frees on a pending list which gets cleaned up in the
main interp loop (nesting again for 1000 frees, etc).
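
The deferral trick described above can be sketched in C as follows. This is a loose, self-contained illustration and neither Python's nor Tcl's actual code: Node, node_free, node_release and the MAX_DEPTH limit are all hypothetical, and the sketch bounds the C recursion depth rather than counting frees. Once the limit is reached, the remaining object is parked on a pending list that an outer loop drains, so stack use stays bounded no matter how deep the nesting is.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *child;        /* one nested child, enough for deep chains */
    struct Node *nextPending;  /* link used only while parked as pending */
} Node;

enum { MAX_DEPTH = 1000 };     /* illustrative limit on nested C calls */

static Node *pending = NULL;   /* nodes whose release was deferred */
static int depth = 0;          /* current recursion depth */
static int maxDepthSeen = 0;   /* deepest recursion observed */
static long freed = 0;         /* number of nodes actually freed */

/* Free a node and its children, deferring once the recursion is too deep. */
static void node_free(Node *n) {
    if (n == NULL) {
        return;
    }
    if (depth >= MAX_DEPTH) {
        n->nextPending = pending;  /* park it instead of recursing further */
        pending = n;
        return;
    }
    depth++;
    if (depth > maxDepthSeen) {
        maxDepthSeen = depth;
    }
    node_free(n->child);
    depth--;
    free(n);
    freed++;
}

/* Outer loop: release a structure, then drain whatever was deferred. */
static void node_release(Node *root) {
    node_free(root);
    while (pending != NULL) {
        Node *n = pending;
        pending = n->nextPending;
        node_free(n);
    }
}

/* Build a chain of count nodes, each nested inside the previous one. */
static Node *node_chain(long count) {
    Node *head = NULL;
    long i;
    for (i = 0; i < count; i++) {
        Node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->child = head;
        n->nextPending = NULL;
        head = n;
    }
    return head;
}
```

Releasing a chain of 100000 nodes with node_release() frees all of them while the recursion never goes deeper than MAX_DEPTH calls.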
> Donal (Should this discussion be on tcl-core? There are issues here of
> real interest to other people interested in Tcl soothsaying...)
Good point, sorry for starting this off without considering that.
What would be the most convenient way to do so at this stage? Wrap
the entire thread so far into a big post for reference, summarize,
and continue there?
-jcw
========================================================================
From: "Donal K. Fellows" <donal.k.fellows@[...].uk>
Date: June 19, 2006 11:36:19 GMT+02:00
To: Jean-Claude Wippler <jcw@[...].com>
Cc: Jeff Hobbs <jeffh@[...].com> , mig@[...].edu, "'Don Porter'"
<dgp@[...].net> , "'Steve Landers'"
<steve@[...].com> , "'Mark Roseman'" <mark@[...].com>
Subject: Re: Q about string representation
Jean-Claude Wippler wrote:
> > Donal (Should this discussion be on tcl-core? There are issues here of
> > real interest to other people interested in Tcl soothsaying...)
> Good point, sorry for starting this off without considering that.
> What would be the most convenient way to do so at this stage?
> Wrap the entire thread so far into a big post for reference,
> summarize, and continue there?
Yes, that sounds fine.
Donal.
========================================================================
From: miguel sofer <mig@[...].edu>
Date: June 19, 2006 15:03:19 GMT+02:00
To: Jean-Claude Wippler <jcw@[...].com>
Cc: "Donal K. Fellows" <donal.k.fellows@[...].uk> , Jeff Hobbs
<jeffh@[...].com> , "'Don Porter'" <dgp@[...].net>,
"'Steve Landers'" <steve@[...].com> , "'Mark Roseman'"
<mark@[...].com>
Subject: Re: Q about string representation
Reply-To: mig@[...].edu
Jean-Claude Wippler wrote:
> Donal,
> Thanks for your detailed comment.
> >> But the point is that these objects are only supposed to be used
> >> as arguments for my own commands. The string rep is used for
> >> two things:
> >> 1) as a short readable handle, useful in interactive sessions and
> >> for application level debugging
> >> 2) as loss-less string equivalent when the internal rep is lost
> >> (a huge, totally unreadable, binary-data, thing)
Case (2) is shimmering, right? If the obj is unshared, the rep may be
lost forever anyway - the new owner may modify it in place, or
discard the obj after getting a new value (assuming the new owner is
a variable).
> >> I think all I need is the permission from you Tcl guys to alter
> >> a string rep in the FreeInternalRep() code of my objects. Well,
> >> what I'm really trying to establish is how much trouble I may be
> >> getting myself into...
> >
> > I've just studied the code and I can now describe the situation under
> > which it is OK to allocate a new string rep inside an internal rep
> > deletion function. Alas, the answer is "quite possibly never" because we
> > do not guarantee the order in which internal and string reps are deleted
> > when objects are cleaned up (because that allows us to tear down large
> > complex graphs of objects fairly easily), and there are times when the
> > internal rep destructor is called when the string rep is already invalid
> > (the field is reused to hold a linked list of objects to delete). But it
> > might be possible to get around this...
> Hm. Might that be a useful convention? If the string rep does not
> exist (perhaps length < 0 when bytes is re-used?) and a
> FreeInternalRep proc is called, then this signals that it is being
> called for object deletion and not because a new internal rep is
> about to be set up?
Right now, there is no way to tell from the obj fields which case
applies. When the object is being deleted and other deletions are
pending, 'bytes' holds a pointer to the next obj in the
deletionStack, but 'length' holds the original length. Of course we
could change that in the core (see below), but right now it is not
the case.
> > [...] This probably means that code using this won't build into a
> > DLL on Windows; I don't think that that platform likes references
> > to variables to be exported. (Maybe that in turn would indicate a
> > good RFE for a new "internal" API...)
> Is this about exporting variables across dll boundaries? Python
> does that for several objects, most notably its "PyNone" nil object
> (which you need the moment you want to return None in a C proc, so
> it's heavily used).
> > That is the answer for 8.5. In 8.4, the string rep (if present) is
> > always deleted *after* the internal rep, but there's no way to find
> > out if the object is mutating or deleting (well, without doing
> > non-portable C stack hackery to find out what function is calling the
> > internal rep destructor).
> Ah - I hadn't even thought of deletion as being an important issue,
> but of course it is: if the goal is to avoid constructing string
> reps, then deletion might well be the most common case.
> > Sorry for the complicated answer, but Obj cleanup *is* complicated. :-)
> > I hope this provides enough info for you to figure out if what you
> > want to do is possible. I want to support you, but what you're doing
> > is deeply complex!
> Much appreciated. If 8.5 is better suited for this, I could
> experiment with such tricks in 8.5, and leave the current "normal"
> behavior for 8.4 - it's workable as long as you avoid all string
> reps (or at least have modestly sized datasets). I'm ok with
> having two different extensions for 8.4 and 8.5, although I may
> regret the deployment issues later ;)
The way I see this, the core needs to be modified somewhat. Such
changes to the core seem unwarranted and dangerous in the 8.4 branch.
IMHO, that is. The needed changes are:
(1) DELETION:
* In 8.5, signal an invalid string rep with a negative 'length'; and
ensure that the stringRep is always deleted first. This is adding one
line in the sources, and changing the position of another one.
AFAICT, safe too - but at least dkf should verify that. Note that
this would provide a fail-safe signal: intrep deletion with an
invalid stringrep is always an obj deletion (true now), and deletion
with a valid stringrep is always a shimmer (not true now).
* In 8.4 the intRep is always freed before the stringRep: on intRep
deletion, the stringRep is still valid (possibly NULL). Determining
whether this is indeed a deletion is tough to impossible, even with
C-stack hackery: often the deletion is done from macros and not
functions.
(2) SHIMMERING: introduce the convention that all SetFromAny procs
need to Tcl_GetString() again *after* deleting the intRep, and before
attempting the conversion (note that this is easy enough to do in the
core, but will cause trouble with extensions that implement their own
obj types):
    name = objPtr->bytes;
    if (name == NULL) {
        name = Tcl_GetString(objPtr);
    }
    TclFreeIntRep(objPtr);
    name = objPtr->bytes; /* AGAIN, in case it changed. */
    /* Now generate the new intRep from the stringRep in 'name' */
    ....
instead of the currently standard
    name = objPtr->bytes;
    if (name == NULL) {
        name = Tcl_GetString(objPtr);
    }
    /* Now generate the new intRep from the stringRep in 'name' */
    ....
    TclFreeIntRep(objPtr);
This would allow safe shimmering. But once it did shimmer, you do
risk losing the stringRep too - for instance, on [append] to an
unshared obj.
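To make the reason for the second read of 'bytes' concrete, here is a toy model in plain C. It is only a sketch and not Tcl source: ToyObj, DupString, BlobFreeIntRep and ToySetFromAny are made-up names, ToyObj merely imitates the 'bytes' and 'length' fields of Tcl_Obj, and the toy assumes a string rep always exists, so the Tcl_GetString() step is left out. The hypothetical free proc swaps the short handle for the full serialization, so any pointer cached before the free would be dangling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for Tcl_Obj; this is NOT the real struct from tcl.h. */
typedef struct ToyObj {
    char *bytes;     /* string rep */
    int length;      /* strlen(bytes) */
    int hasIntRep;   /* the toy only tracks whether an intrep exists */
} ToyObj;

/* Heap-allocated copy of a C string. */
static char *DupString(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/* Hypothetical free proc: replaces the short handle by the full,
 * loss-less string rep just before the intrep goes away. */
static void BlobFreeIntRep(ToyObj *objPtr)
{
    static const char full[] = "full-serialized-data";

    free(objPtr->bytes);   /* a pointer cached earlier now dangles */
    objPtr->bytes = DupString(full);
    objPtr->length = (int) strlen(full);
    objPtr->hasIntRep = 0;
}

/* Conversion in the proposed order: drop the old intrep first, and
 * only then read 'bytes'. Returns the string it would convert from. */
static const char *ToySetFromAny(ToyObj *objPtr)
{
    const char *name;

    if (objPtr->hasIntRep) {
        BlobFreeIntRep(objPtr);
    }
    name = objPtr->bytes;  /* read AFTER the free, in case it changed */
    return name;
}
```

With the currently standard order, the conversion would have parsed the short handle and then kept using a pointer that the free proc had already released.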
OTOH, IIUC, your objs would not shimmer in correct code - nobody
should attempt to add 1 to them, or take the llength, or append to
them, or open them as a file, or whatever else. Maybe a reasonable
pseudo (non?)-solution would be to let the intrep deletion panic on
shimmering attempts (ie, intrep deletions that are not caused by obj
deletions; easy to detect with the first modif only). In that case,
you never need anything but the short stringrep. It'd be nice to have
a way to signal that error without a panic, but I do not see how it
can be done. I guess you'd have to take some measures to ensure that
the data is not lost before panicking - saving it to file, and writing
out the file name to a log and/or the panic message. Or something
like that. Possibly moot, as IIUC persistence is what all this is
about so that this aspect is probably already taken care of.
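For illustration, the detection that change (1) would allow can be sketched as a toy model in plain C. Again this is not Tcl source: ToyObj, ToyNewBlob, ToyShimmer, ToyDeleteObj and BlobFreeIntRep are made-up names, and the negative 'length' is the proposed convention, not something the core does today. The free proc only records why it was called; a real one could panic in the shimmer branch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for Tcl_Obj; this is NOT the real struct from tcl.h. */
typedef struct ToyObj {
    char *bytes;                          /* string rep, or NULL */
    int length;                           /* < 0: string rep invalid (proposed) */
    void *intRep;                         /* internal rep, or NULL */
    void (*freeIntRep)(struct ToyObj *);  /* type-specific destructor */
} ToyObj;

typedef enum { REASON_NONE, REASON_DELETE, REASON_SHIMMER } FreeReason;
static FreeReason lastReason = REASON_NONE;

/* Hypothetical free proc relying on the proposed signal. */
static void BlobFreeIntRep(ToyObj *objPtr)
{
    /* Invalid string rep: obj deletion. Valid string rep: shimmer. */
    lastReason = (objPtr->length < 0) ? REASON_DELETE : REASON_SHIMMER;
    free(objPtr->intRep);
    objPtr->intRep = NULL;
}

static ToyObj *ToyNewBlob(const char *handle)
{
    ToyObj *objPtr = malloc(sizeof *objPtr);
    assert(objPtr != NULL);
    objPtr->length = (int) strlen(handle);
    objPtr->bytes = malloc((size_t) objPtr->length + 1);
    assert(objPtr->bytes != NULL);
    memcpy(objPtr->bytes, handle, (size_t) objPtr->length + 1);
    objPtr->intRep = malloc(16);
    assert(objPtr->intRep != NULL);
    objPtr->freeIntRep = BlobFreeIntRep;
    return objPtr;
}

/* Shimmer: the intrep is dropped while the string rep stays valid. */
static void ToyShimmer(ToyObj *objPtr)
{
    if (objPtr->intRep != NULL) {
        objPtr->freeIntRep(objPtr);
    }
}

/* Deletion in the proposed order: string rep first, flag it as
 * invalid, and only then call the intrep destructor. */
static void ToyDeleteObj(ToyObj *objPtr)
{
    free(objPtr->bytes);
    objPtr->bytes = NULL;
    objPtr->length = -1;
    if (objPtr->intRep != NULL) {
        objPtr->freeIntRep(objPtr);
    }
    free(objPtr);
}
```

In this toy, ToyShimmer() leaves lastReason at REASON_SHIMMER and ToyDeleteObj() on an obj that still has its intrep leaves it at REASON_DELETE, which is the fail-safe signal described under (1).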
Miguel
==========================================================================
_______________________________________________
Tcl-Core mailing list
Tcl-Core@[...].net
https://lists.sourceforge.net/lists/listinfo/tcl-core