Re: [Activetcl] split doing two different things
by Michael Bahr
Apr 8 2008 8:53AM
Thank you Jeff for the explanation. It makes a little better sense. :-)
Mike...
> -----Original Message-----
> From: Dinsmore, Jeff [mailto:Jeff.Dinsmore@[...].org]
> Sent: Tuesday, April 08, 2008 11:37 AM
> To: Bahr, Michael; Enrico Herzke; activetcl@[...].com
> Cc: activetcl@[...].com
> Subject: RE: [Activetcl] split doing two different things
>
> Gentlemen, the Tcl community has long been regarded as
> helpful AND friendly...
>
>
> A trivial example of "why" would be a CSV file (although we
> don't usually want to handle a CSV by blindly splitting on
> commas). Excel produces CSV output without commas to
> terminate each line. Any program that's going to read CSV has
> to recognize this. Tcl's split behaves exactly the same way.
> The split character is handled as a separator, not a terminator.
>
> Example - a three-column, three-row CSV spreadsheet generated by Excel:
> 1a,1b,
> 2a,2b,2c
> ,3b,
>
> In this case, we need to interpret the trailing comma as
> indicative of the third, but perhaps empty, field in order to
> keep the number of columns consistent.
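
A minimal Tcl sketch of the example above, assuming the three rows are already available as strings (the variable names are illustrative, not from the thread). Splitting each row on a comma gives three fields every time, because a trailing comma is read as a separator followed by an empty field:

```tcl
# The three rows from the example, as they would appear in the CSV file.
set rows [list "1a,1b," "2a,2b,2c" ",3b,"]

set counts {}
foreach row $rows {
    # split treats "," as a separator, so leading and trailing commas
    # produce empty list elements rather than being dropped.
    set fields [split $row ,]
    lappend counts [llength $fields]
    puts "[llength $fields] fields: $fields"
}
```

Each row reports 3 fields; the empty ones show up as {} in the printed list.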
>
>
> From perldoc.perl.org SPLIT documentation:
> "Splits the string EXPR into a list of strings and returns
> that list. By default, empty leading fields are preserved,
> and empty trailing ones are deleted. (If all fields are
> empty, they are considered to be trailing.)"
>
>
> This sounds like the possible existence of a trailing,
> zero-length field is recognized, but disregarded by default.
> Is there a switch to tell it that the final field should be
> preserved? I've not used Perl, but this (apparently) default
> behavior would require extra gyrations if your data source
> assumed Tcl's (or Excel's) model rather than Perl's.
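
On the question of a switch: Perl's split documentation describes an optional LIMIT argument, and a negative LIMIT keeps trailing empty fields. Going the other way, the sketch below shows one possible way to get Perl's default behavior in Tcl, by discarding trailing empty elements after the split. The proc name splitDropTrailing is hypothetical and not part of Tcl.

```tcl
# Hypothetical helper: split as Tcl does, then discard trailing empty
# elements, which mirrors what Perl's split does by default.
proc splitDropTrailing {str splitChars} {
    set fields [split $str $splitChars]
    while {[llength $fields] > 0 && [lindex $fields end] eq ""} {
        set fields [lrange $fields 0 end-1]
    }
    return $fields
}

puts [llength [split "a,b,," ,]]               ;# 4: a b {} {}
puts [llength [splitDropTrailing "a,b,," ,]]   ;# 2: a b
puts [llength [splitDropTrailing ",a," ,]]     ;# 2: {} a (leading empty field kept)
```

Leading empty fields are kept and only trailing ones are removed, matching the perldoc description quoted above.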
>
> Ultimately, whether a trailing separator ought to imply an
> empty field or not is entirely dependent on the data you're
> parsing. We just have to, as always, understand our data and
> how our chosen tool behaves.
>
> I can see both behaviors being "better", easier, or more
> useful in certain situations - just like lots of other
> differences. Neither right nor wrong - just different based
> on your point of view.
>
> Thanks,
>
> Jeff Dinsmore
> Interfaces
> Ridgeview Medical Center
>
> -----Original Message-----
> From: activetcl-bounces@[...].com
> [mailto:activetcl-bounces@[...].com] On Behalf
> Of Bahr, Michael
> Sent: Tuesday, April 08, 2008 9:48 AM
> To: Enrico Herzke; activetcl@[...].com
> Cc: activetcl@[...].com
> Subject: Re: [Activetcl] split doing two different things
>
> Thank you for that--I already knew it. Just because the FM
> says so does not make it right. For us who program in
> multiple languages split behaves one way and then behaves
> another way in Tcl. It is frustrating to come to Tcl and
> find that, in this case, split adds an empty string to the
> list. What is the purpose of that empty string? Why do some
> of the other languages not show it? Is it an oversight on
> Tcl's part? In the example there are 8 delimiters and split
> will capture the characters to the left of the delimiter up
> to the previous delimiter or the start of the line. So as
> the string is shrinking and the delimiters are gone what is
> left? This has bugged me for over 6 years in Tcl.
>
> Mike...
>
> > -----Original Message-----
> > From: activetcl-bounces@[...].com
> > [mailto:activetcl-bounces@[...].com] On Behalf Of
> > Enrico Herzke
> > Sent: Tuesday, April 08, 2008 10:08 AM
> > To: activetcl@[...].com
> > Cc: activetcl@[...].com
> > Subject: Re: [Activetcl] split doing two different things
> >
> > RTFM:
> >
> > NAME
> >     split - Split a string into a proper Tcl list
> > SYNOPSIS
> >     split string ?splitChars?
> > DESCRIPTION
> >     Returns a list created by splitting string at each character that is
> >     in the splitChars argument. Each element of the result list will
> >     consist of the characters from string that lie between instances of
> >     the characters in splitChars. Empty list elements will be generated if
> >     string contains adjacent characters in splitChars, or if the first or
> >     last character of string is in splitChars. If splitChars is an empty
> >     string then each character of string becomes a separate element of the
> >     result list. SplitChars defaults to the standard white-space
> >     characters.
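
The cases listed in the manual text above can be tried directly. This is a small sketch with made-up input strings; the labels and the result array are illustrative only.

```tcl
# Each triple is: label, input string, splitChars.
foreach {label str chars} {
    adjacent  "a,,b"  ","
    leading   ",a"    ","
    trailing  "a,"    ","
    each-char "abc"   ""
} {
    set result($label) [split $str $chars]
    puts [format "%-10s %s" $label $result($label)]
}
```

The adjacent, leading and trailing cases each contain an empty element ({}), and the empty splitChars case yields the three elements a, b and c.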
> >
> >
> >
> > Enrico
> >
> > -------- Original Message --------
> > > Date: Tue, 8 Apr 2008 09:32:26 -0400
> > > From: "Bahr, Michael" <mbahr@[...].gov>
> > > To: "Flavio Salgueiro" <flavio.salgueiro@[...].com>, "Jeff Hobbs"
> > > <jeffh@[...].com>, "Gene Osteen" <gosteen@[...].com>
> > > CC: activetcl@[...].com
> > > Subject: Re: [Activetcl] split doing two different things
> >
> > > Is the empty string normal behavior?? I think Gene is visualizing the
> > > string:
> > >
> > > 314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?
> > >
> > > as this
> > > 314185798Ã?
> > > 59858Ã?
> > > 2004-11-19Ã?
> > > 2004-11-19 23:08:00Ã?
> > > KPFS-IN NCAL-INTERFACEÃ?
> > > 121314Ã?
> > > 121314Ã?
> > > 121314Ã?
> > >
> > > So there are 8 strings and 8 delimiters
> > >
> > > Now Perl has been around for over 20 years and I have been programming
> > > in Perl for over 13 years and use it as a standard for these things.
> > > Perl's split returns 8 strings, as expected and as shown above. So can
> > > someone please explain why Tcl's split "sees" an empty string at the
> > > end of the line? I do not think this is normal behavior.
> > >
> > > Perl code:
> > > $s = '314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?';
> > > @items = split (/Ã?/, $s);
> > > foreach $item (@items) { print "$item\n"; }
> > > $size = scalar(@items);
> > > print "Size of list = $size\n";
> > >
> > > Mike...
> > >
> > > > -----Original Message-----
> > > > From: activetcl-bounces@[...].com
> > > > [mailto:activetcl-bounces@[...].com] On Behalf Of
> > > > Flavio Salgueiro
> > > > Sent: Tuesday, April 08, 2008 12:15 AM
> > > > To: Jeff Hobbs; Gene Osteen
> > > > Cc: activetcl@[...].com
> > > > Subject: Re: [Activetcl] split doing two different things
> > > >
> > > > I think that the extra field he is seeing is due to the normal
> > > > behavior of split. The trailing Ã? at the end of the line will give
> > > > you an empty string.
> > > > My suggestion is to use 'string trim $myString Ã?' to get rid of the
> > > > trailing delimiter which is giving the extra blank field at the end.
> > > > You would get the same result if you tried 'llength [split a: :]';
> > > > this would result in a count of two because split gives you "a" and
> > > > {}, not just "a".
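
A short sketch of the trimming suggestion above and how it compares with two alternatives. The sample line and the proc name splitTerminated are hypothetical, and the helper assumes a single-character terminator. Note that string trim strips the delimiter from both ends, and string trimright strips every trailing delimiter, so either can remove fields that were meant to be empty.

```tcl
# Hypothetical helper: remove exactly one trailing terminator, if present,
# so that empty fields elsewhere in the line survive the split.
proc splitTerminated {line term} {
    if {[string index $line end] eq $term} {
        set line [string range $line 0 end-1]
    }
    return [split $line $term]
}

# Three terminated fields: empty, "a", empty.
set line ":a::"

puts [llength [split $line :]]                        ;# 4: extra empty element at the end
puts [llength [split [string trim $line :] :]]        ;# 1: leading empty field lost too
puts [llength [split [string trimright $line :] :]]   ;# 2: both trailing colons removed
puts [llength [splitTerminated $line :]]              ;# 3: one element per terminated field
```

When every field in the data is non-empty, all three approaches give the same count; they differ only when empty fields sit at the start or end of the line.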
> > > >
> > > > Cheers,
> > > >
> > > > Flavio
> > > >
> > > > @-> -----Original Message-----
> > > > @-> From: activetcl-bounces@[...].com
> > > > @-> [mailto:activetcl-bounces@[...].com] On Behalf Of Jeff Hobbs
> > > > @-> Sent: Monday, April 07, 2008 6:10 PM
> > > > @-> To: Gene Osteen
> > > > @-> Cc: activetcl@[...].com
> > > > @-> Subject: Re: [Activetcl] split doing two different things
> > > > @->
> > > > @-> Gene Osteen wrote:
> > > > @-> > I use the split command to tell me how many fields are in a
> > > > @-> > line of text sent to me. The data is character delimited with a Ã? as the
> > > > @-> > delimiter. Each line of text ends with a Ã?. Normally if I do a split I
> > > > @-> > get one more list element than there are fields. I have a case where
> > > > @-> > this is not happening. I am including 2 examples. The first is not
> > > > @-> > working as expected the line of data is:
> > > > @-> >
> > > > @-> > 314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?
> > > > @-> >
> > > > @-> > For some reason the list length here is 8 which is also the number of Ã?s
> > > > @-> > in the line.
> > > > @->
> > > > @-> I don't have the same issue, getting the expected correct results with
> > > > @-> 8.4.18 and 8.5.2:
> > > > @->
> > > > @-> (Tcl) 49 % set str {314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?}
> > > > @-> 314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?
> > > > @-> (Tcl) 50 % regexp -all Ã? $str
> > > > @-> 8
> > > > @-> (Tcl) 51 % llength [split $str Ã?]
> > > > @-> 9
> > > > @-> (Tcl) 52 % split $str Ã?
> > > > @-> 314185798 59858 2004-11-19 {2004-11-19 23:08:00} { KPFS-IN NCAL-INTERFACE} 121314 121314 121314 {}
> > > > @->
> > > > @-> Jeff
> > > > @-> _______________________________________________
> > > > @-> ActiveTcl mailing list
> > > > @-> ActiveTcl@[...].com
> > > > @-> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs