ASPN ActiveState Programmer Network
python-tutor
Re: [Tutor] can anyone help me in solving this problem this is urgent
by Emile van Sebille
Nov 7 2009 10:05AM
On 11/6/2009 4:24 PM surjit khakh said...
>  Write a python program to read a text file named “text.txt” and show the
>  number of times each article is found in the file. Articles in the English
>  language are the words “a”, “an”, and “the”.

Sounds like you're taking a python class.  Great!  It's probably the 
best programming language to start with.

First, it helps when asking questions if you mention what version of the 
language you're using.  Some features and options are newer.  In 
particular, there's a string method 'count' that isn't available in 
older pythons, while the replace method has been around at least ten years.

If you haven't already, the tutorial at 
http://docs.python.org/tutorial/index.html is a great place to start. 
Pay particular attention to section 3's string introduction at 
http://docs.python.org/tutorial/introduction.html#strings and section 7 
starting with 
http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
on files.

Implicit in this problem is identifying words in the text file.  This is 
tough because you need to take punctuation into account.  There's a neat 
tool in newer pythons: assuming you've read the file contents into a 
variable txt, set(txt) gives you all the distinct letters, numbers, 
punctuation marks, and whitespace characters that appear in the content.  
You'll need to know these so that you can recognize a word regardless of 
adjacent punctuation.  In this specific case, as articles in English 
always precede nouns, you'll always find whitespace following an article.  
It would be a space except, of course, when the article ends the line and 
a line-break character follows it in the text file.
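
As a rough illustration of the set(txt) idea, here is a minimal sketch.  
It assumes Python 3, and the txt value is a made-up sample standing in 
for the contents of text.txt; the name separators is just for 
illustration.

```python
# Hypothetical sample standing in for the contents of text.txt.
txt = "The cat sat on a mat.\nAn owl, the wise one, watched.\n"

# set() of a string yields its distinct characters.
chars = set(txt)

# Keep whatever is neither a letter nor a digit: these are the
# punctuation and whitespace characters that can sit next to a word.
separators = sorted(c for c in chars if not c.isalnum())

print(separators)  # ['\n', ' ', ',', '.']
```

Whatever shows up in that list is what needs to be treated as a word 
boundary before counting.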

For example, consider the following text:

"""
SECTION 1.4. COUNTY PLANNING COMMISSION.

a. The County Planning Commission shall consist of five members. Each 
member of the Board of Supervisors shall recommend that a resident of 
his district be appointed to the Commission; provided, however, the 
appointments to the Commission shall require the affirmative vote of not 
less than a majority of the entire membership of the Board.
"""

Any a's, an's or the's in the paragraph body can be easily counted with 
the string count method once you've properly prepared the text.

I expect the an's and the's are the easy ones to count.  Consider 
however the paragraph identifier -- "a." -- this is not an article but 
would likely be counted as one in most solutions.  There may also be a 
subsequent reference to this section (e.g., see a above) or range of 
sections (e.g., see a-n above) that make this a harder problem still. 
One possible approach may involve confirming that a noun follows the 
article.  There are dictionaries you can access, or word lists that can 
help.  The WordNet database from Princeton appears fairly complete with 
117k entries, but even there it's easy to find exceptions where the word 
right after the article isn't a noun: "A 20's style approach"; "a late 
bus"; or "a fallen hero".
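
To make the "a." pitfall concrete, here is a minimal sketch of the naive 
approach applied to the sample paragraph.  It assumes Python 3, and the 
preparation step (lowercase, turn every non-alphanumeric character into 
a space, pad both ends) is just one possible way to prepare the text.

```python
sample = """\
SECTION 1.4. COUNTY PLANNING COMMISSION.

a. The County Planning Commission shall consist of five members. Each
member of the Board of Supervisors shall recommend that a resident of
his district be appointed to the Commission; provided, however, the
appointments to the Commission shall require the affirmative vote of not
less than a majority of the entire membership of the Board.
"""

# Lowercase the text and replace anything that is not a letter or digit
# with a space, then pad both ends so every word has a space on each side.
prepared = " " + "".join(c.lower() if c.isalnum() else " " for c in sample) + " "

counts = {}
for article in ("a", "an", "the"):
    # Count the article only where it stands alone between two spaces.
    counts[article] = prepared.count(" " + article + " ")

print(counts)  # {'a': 3, 'an': 0, 'the': 8}
```

Only two of the three a's are real articles; the third is the paragraph 
identifier.  Note also that count skips overlapping matches, so two 
identical articles separated by a single space would be counted once.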

So, frankly, I expect that solutions to this problem will range from the 
naive through the reasonably complete to the impossible without human 
confirmation of complex structure and context.

For your homework, showing you can read in the file, strip out any 
punctuation, count the resulting occurrences, and report the results 
should do it.
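
Those four steps might be sketched as follows.  This assumes Python 3; 
count_articles is a hypothetical helper name, and the demo writes its 
own small text.txt into a temporary directory so it runs without 
touching any real file.  It is the naive version, so it will still 
count things like the "a." paragraph identifier.

```python
import os
import string
import tempfile
from collections import Counter

ARTICLES = ("a", "an", "the")

def count_articles(path):
    """Return a dict mapping each article to its count in the file at path."""
    # Step 1: read in the file.
    with open(path, encoding="utf-8") as f:
        txt = f.read()
    # Step 2: strip punctuation by mapping each ASCII punctuation
    # character to a space (curly quotes are not covered by this).
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = txt.translate(table).lower().split()
    # Step 3: count the resulting occurrences.
    tally = Counter(words)
    return {article: tally[article] for article in ARTICLES}

# Demo with a made-up file; point count_articles at the real text.txt instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("The owl saw a mouse.\nAn hour later, the owl saw another.\n")
    result = count_articles(path)

# Step 4: report the results.
for article in ARTICLES:
    print(article, result[article])
```

Splitting on whitespace handles both spaces and line breaks, so an 
article at the end of a line is counted like any other.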

Emile
