activetcl
Re: [Activetcl] split doing two different things
by Jeff Dinsmore
Apr 8 2008 8:35AM
Gentlemen, the Tcl community has long been regarded as helpful AND friendly...


A trivial example of "why" would be a CSV file (although we don't usually want to handle a CSV by blindly splitting on commas). Excel produces CSV output without commas to terminate each line. Any program that's going to read CSV has to recognize this. Tcl's split behaves exactly the same way. The split character is handled as a separator, not a terminator.

Example - a three-column, three-row CSV spreadsheet generated by Excel:
1a,1b,
2a,2b,2c
,3b,

In this case, we need to interpret the trailing comma as indicative of the third, but perhaps empty, field in order to keep the number of columns consistent.
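
A minimal Tcl sketch of that point, using the three rows from the example. The variable names are illustrative only, and, as noted above, real CSV (quoted fields, embedded commas) needs more than a bare split.

```tcl
# Illustrative only: the three example rows, split naively on commas.
# Real CSV with quoted fields needs a proper parser, not a bare split.
set rows [list "1a,1b," "2a,2b,2c" ",3b,"]

set counts {}
foreach row $rows {
    # split treats "," as a separator, so a leading or trailing comma
    # yields an empty element rather than being dropped.
    set fields [split $row ","]
    lappend counts [llength $fields]
    puts "[llength $fields] fields: $fields"
}
```

Each row comes back with three elements, with {} standing in for the empty cells, so the column count stays consistent.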


From perldoc.perl.org SPLIT documentation:
"Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.)"


This sounds like the possible existence of a trailing, zero-length field is recognized, but disregarded by default. Is there a switch to tell it that the final field should be preserved? I've not used Perl, but this (apparently) default behavior would require extra gyrations if your data source assumed Tcl's (or Excel's) model rather than Perl's.
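
For what it's worth, the same perldoc page describes a LIMIT argument to split, and a negative LIMIT is documented as keeping the trailing empty fields. Going the other way, a Tcl script that wants Perl's default has to drop the trailing empty elements itself. A rough sketch, where splitDropTrailing is a made-up helper name and not a built-in command:

```tcl
# Hypothetical helper (not part of Tcl): split like Tcl's split, then
# discard empty elements at the end of the list, roughly mimicking the
# default behavior the Perl documentation describes.
proc splitDropTrailing {str {splitChars ","}} {
    set fields [split $str $splitChars]
    # Remove empty elements from the end only; leading and interior
    # empty fields are kept.
    while {[llength $fields] > 0 && [lindex $fields end] eq ""} {
        set fields [lrange $fields 0 end-1]
    }
    return $fields
}

puts [llength [split "a:" ":"]]              ;# 2
puts [llength [splitDropTrailing "a:" ":"]]  ;# 1
puts [splitDropTrailing ",3b," ","]          ;# {} 3b
```

Note that this throws away the column-count information the CSV example relies on, which is exactly the trade-off being discussed.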

Ultimately, whether a trailing separator ought to imply an empty field or not is entirely dependent on the data you're parsing. We just have to, as always, understand our data and how our chosen tool behaves.

I can see both behaviors being "better", easier, or more useful in certain situations - just like lots of other differences. Neither right nor wrong - just different based on your point of view.

Thanks,

Jeff Dinsmore
Interfaces
Ridgeview Medical Center

-----Original Message-----
From: activetcl-bounces@[...].com [mailto:activetcl-bounces@[...].com] On Behalf Of Bahr, Michael
Sent: Tuesday, April 08, 2008 9:48 AM
To: Enrico Herzke; activetcl@[...].com
Cc: activetcl@[...].com
Subject: Re: [Activetcl] split doing two different things

Thank you for that--I already knew it.  Just because the FM says so does not make it right.  For us who program in multiple languages split behaves one way and then behaves another way in Tcl.  It is frustrating to come to Tcl and find that, in this case, split adds an empty string to the list.  What is the purpose of that empty string?  Why do some of the other languages not show it?  Is it an oversight on Tcl's part?  In the example there are 8 delimiters and split will capture the characters to the left of the delimiter up to the previous delimiter or the start of the line.  So as the string is shrinking and the delimiters are gone what is left?  This has bugged me for over 6 years in Tcl.

Mike...

>  -----Original Message-----
>  From: activetcl-bounces@[...].com [mailto:activetcl-bounces@[...].com] On Behalf Of Enrico Herzke
>  Sent: Tuesday, April 08, 2008 10:08 AM
>  To: activetcl@[...].com
>  Cc: activetcl@[...].com
>  Subject: Re: [Activetcl] split doing two different things
>  
>  RTFM:
>  
>  NAME
>  split - Split a string into a proper Tcl list
>  SYNOPSIS
>  split string ?splitChars?
>  DESCRIPTION
>  Returns a list created by splitting string at each character that is 
>  in the splitChars argument. Each element of the result list will 
>  consist of the characters from string that lie between instances of 
>  the characters in splitChars. Empty list elements will be generated if 
>  string contains adjacent characters in splitChars, or if the first or 
>  last character of string is in splitChars. If splitChars is an empty 
>  string then each character of string becomes a separate element of the 
>  result list. SplitChars defaults to the standard white-space 
>  characters.
>  
>  
>  
>  Enrico
>  
>  -------- Original Message --------
>  > Date: Tue, 8 Apr 2008 09:32:26 -0400
>  > From: "Bahr, Michael" <mbahr@[...].gov>
>  > To: "Flavio Salgueiro" <flavio.salgueiro@[...].com>, "Jeff Hobbs" <jeffh@[...].com>, "Gene Osteen" <gosteen@[...].com>
>  > CC: activetcl@[...].com
>  > Subject: Re: [Activetcl] split doing two different things
>  
>  > Is the empty string normal behavior??  I think Gene is visualizing the string:
>  > 
>  > 314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?
>  > 
>  > as this
>  > 314185798Ã?
>  > 59858Ã?
>  > 2004-11-19Ã?
>  > 2004-11-19 23:08:00Ã?
>  >  KPFS-IN NCAL-INTERFACEÃ?
>  > 121314Ã?
>  > 121314Ã?
>  > 121314Ã?
>  > 
>  > So there are 8 strings and 8 delimiters
>  > 
>  > Now Perl has been around for over 30 years and I have been programming in Perl for over 13 years and use it as a standard for these things.  Perl's split returns 8 strings as expected as above.  So can someone please explain why Tcl's split "sees" an empty string at the end of the line?  I do not think this is normal behavior.
>  > 
>  > Perl code:
>  > $s = '314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?';
>  > @items = split (/Ã?/, $s);
>  > foreach $item (@items) { print "$item\n"; }
>  > $size = scalar(@items);
>  > print "Size of list = $size\n";
>  > 
>  > Mike...
>  > 
>  > > -----Original Message-----
>  > > From: activetcl-bounces@[...].com [mailto:activetcl-bounces@[...].com] On Behalf Of Flavio Salgueiro
>  > > Sent: Tuesday, April 08, 2008 12:15 AM
>  > > To: Jeff Hobbs; Gene Osteen
>  > > Cc: activetcl@[...].com
>  > > Subject: Re: [Activetcl] split doing two different things
>  > > 
>  > > I think that the extra field he is seeing is due to the normal behavior of split. The trailing Ã? at the end of the list will give you an empty string.
>  > > My suggestion is to use 'string trim $myString Ã?' to get rid of the trailing delimiter which is giving the extra blank field at the end. You would get the same result if you tried 'llength [split a: :]' this would result in a count of two because split gives you "a and {}" not just "a".
>  > > 
>  > > Cheers,
>  > > 
>  > > Flavio
>  > > 
>  > > @-> -----Original Message-----
>  > > @-> From: activetcl-bounces@[...].com [mailto:activetcl-bounces@[...].com] On Behalf Of Jeff Hobbs
>  > > @-> Sent: Monday, April 07, 2008 6:10 PM
>  > > @-> To: Gene Osteen
>  > > @-> Cc: activetcl@[...].com
>  > > @-> Subject: Re: [Activetcl] split doing two different things
>  > > @->
>  > > @-> Gene Osteen wrote:
>  > > @-> > I use the split command to tell me how many fields are in a line of text sent to me. The data is character delimited with a Ã? as the delimiter. Each line of text ends with a Ã?. Normally if I do a split I get one more list element than there are fields. I have a case where this is not happening. I am including 2 examples. The first is not working as expected the line of data is:
>  > > @-> >
>  > > @-> > 314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?
>  > > @-> >
>  > > @-> > For some reason the list length here is 8 which is also the number of Ã?s in the line.
>  > > @->
>  > > @-> I don't have the same issue, getting the expected correct results with 8.4.18 and 8.5.2:
>  > > @->
>  > > @-> (Tcl) 49 % set str {314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?}
>  > > @-> 314185798Ã?59858Ã?2004-11-19Ã?2004-11-19 23:08:00Ã? KPFS-IN NCAL-INTERFACEÃ?121314Ã?121314Ã?121314Ã?
>  > > @-> (Tcl) 50 % regexp -all Ã? $str
>  > > @-> 8
>  > > @-> (Tcl) 51 % llength [split $str Ã?]
>  > > @-> 9
>  > > @-> (Tcl) 52 % split $str Ã?
>  > > @-> 314185798 59858 2004-11-19 {2004-11-19 23:08:00} { KPFS-IN NCAL-INTERFACE} 121314 121314 121314 {}
>  > > @->
>  > > @-> Jeff
>  > > @->
>  > > _______________________________________________
>  > > @-> ActiveTcl mailing list
>  > > @-> ActiveTcl@[...].com
>  > > @-> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Ridgeview Medical Center Confidentiality Notice: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.