Blog-o! Notes from latte.ca

Thu, 10 Mar 2005
I'd like to apologize right now for the length of this post, but there's something I really enjoy about watching someone learn a new tool that immediately helps them get something done.
Amy: I think I have a python problem.

Blake:	Ooh, I should be able to help. 

Amy:	Well, it's a problem that could be fixed by python.

Blake:	Close enough. 

Amy:	Ah, it goes beyond help.  I still have to figure out where to
start.  Like, do I even have python on this machine?  And how do you
read in something from a file?

Blake:	"python -v" 

Blake:	and:
 myFile = open( "filename.txt" )
 for line in myFile:
   print line 
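A modern aside: the snippet above is Python 2 (note the "print line" statement). What follows is a minimal Python 3 sketch of the same loop. The sample file and its contents are made up so that the example runs on its own. Each line read this way still ends in a newline, which is why "rstrip" appears.

```python
import os
import tempfile

# Create a small sample file so the example runs anywhere (the contents are made up).
path = os.path.join(tempfile.mkdtemp(), "filename.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

lines = []
# "with" closes the file automatically when the block ends.
with open(path) as my_file:
    for line in my_file:
        # Each line still ends in "\n"; strip it before using the text.
        text = line.rstrip("\n")
        lines.append(text)
        print(text)
```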
 
Amy:	Holy, it comes up with a million lines of... stuff.
Filenames?

Amy:	But then it seems to be 2.3.

Blake:	Perhaps "python -V" 

Amy:	That's better!

Blake:	What did you want to do with the lines in a file? 

Amy:	Well, I want to take a bunch of pieces of data, like company
names and phone numbers and stuff, and stick it into a specific HTML
format.

Amy:	So I want to take in a file of data and output HTML.

Amy:	Do I want the HTML format hardcoded into the python or should
that be another file?

Blake:	Do you have a python prompt up? 

Blake:	Try typing :
 x = "Amy"
 print "Hello %s" % x 

Amy:	Ah hah.

Amy:	That's nice.

Blake:	So, I would do something like :
 myBigTemplate = """abc %s
 def %s
 ghi %s"""
 print myBigTemplate % ('1','2','3') 

Blake:	(triple-quoted strings can span more than one line.) 
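Here is the same positional template as a Python 3 sketch, with the result captured so you can see exactly what the % operator produces. The values go in a tuple, in the same order as the %s placeholders.

```python
# A multi-line template with three positional %s placeholders.
my_big_template = """abc %s
def %s
ghi %s"""

# The tuple's items fill the placeholders left to right.
result = my_big_template % ("1", "2", "3")
print(result)
```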

Amy:	Oh, I see.  So I set up the formatting and then use 'print' to
spit out the HTML.

Blake:	Yup.  Oh, the other thing you can do is name the variables
you're replacing.  So this works:
 x = { 'name':'Amy', 'food':'apple'}
 print  "Hi %(name)s, do you want a %(food)s" % x 

Blake:	(Just don't forget the 's' after the closing parenthesis.) 

Amy:	So you think I should define the HTML format in the python
script itself?  It seems easier but somehow less clean.

Blake:	Yeah, for now.  You can always change it later.  :) 

Amy:	True.

Amy:	What's wrong with this:

Amy:	myTable = " %(coname)s "

coname = "Big Developer"

print myTable % coname

Blake:	You need to pass a dictionary in if you use the (name)
feature.  So, make a dictionary of variables, like x = {'coname':"Big
Developer"} 

Blake:	Then pass that in.  You could call your dictionary "values",
or something meaningful. 
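Amy's broken example and Blake's fix, sketched in Python 3. When a template uses named placeholders like %(coname)s, the right-hand side of % has to be a mapping. A plain string raises a TypeError.

```python
my_table = " %(coname)s "

# Passing a plain string fails: named placeholders need a mapping to look names up in.
try:
    my_table % "Big Developer"
    error = None
except TypeError as exc:
    error = exc

# A dictionary whose keys match the placeholder names works.
values = {"coname": "Big Developer"}
output = my_table % values
print(output)
```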

Amy:	Ooh, that worked.

Amy:	Now I have to figure out how to get the dictionary from a
different file.

Blake:	What's the format of the file? 

Amy:	Well, I guess it could be something like "Company: Big Developer".

Blake:	It could be? 

Amy:	Well, it's going to be exported from Access so I guess I could
define the format?

Amy:	It sounds like you can anyway.

Amy:	I'm going to work on the assumption you can export text in a
format like that, for now.

Blake:	Okay, although it might be easier to just assume that the
first thing on the line is the company... 

Amy:	The problem is that each company could have a varying number
of employees.

Blake:	What do you want to do in that case? 

Amy:	I want to iterate through all the names, adding a new row to
my table for each one.  I wonder if it would be possible to include
the number of names in the dataset.

Amy:	Although if they are the last thing on the line I guess you
could just go through them until you get to the EOL.

Blake:	You could, or you could repeat the company name for each employee. 

Amy:	But I don't want to repeat the company name in my HTML.

Blake:	Ahhh...  Okay, I understand.  How about:
 Big Co, "employee1, employee2, employee3", BooYeah 

Amy:	I'm not clear on the function of the "BooYeah".

Blake:	Neither am I.  It's just whatever other data you need in
there. 

Amy:	Ah hah.  The confusion is because the employees are the end of
the data.

Blake:	Oh, okay. 

Amy:	Yeah, I think if I could get a comma delimited file with
company information and employees on each line, I could parse it.
Assuming it was always correctly formatted. :P

Blake:	But if you're generating it from another program, it should be
correctly formatted. 

Amy:	It should.

Amy:	"In theory..."  but let's assume it will be.

Blake:	So try "import csv" at the Python prompt. 

Amy:	It didn't do anything.

Blake:	Sure it did.  Type "dir( csv )" or "help( csv )" to see what
it did. 

Blake:	(There's a webpage at
http://www.python.org/doc/2.3.2/lib/csv-contents.html that has more
readable contents of the help. ) 

Blake:	(And as another hint, you probably want to use the DictReader
class with a restkey of 'employees') 

Amy:	I will copy that and paste it somewhere and hopefully soon it
will mean something.

Blake:	Feel free to ask me questions about whatever doesn't make
sense. 

Amy:	Hah.  Part of the problem is that you're not working for me,
and part of the problem is that I don't even know how to begin asking
the questions.

Amy:	What's a sequence?

Amy:	As in "remaining data is added as a sequence keyed by the
value of restkey"?

Blake:	It's just a list. 

Amy:	Okay.

Blake:	x = [1,2,3] is a sequence. 

Amy:	Alright.  So I can iterate through it pretty easily?

Blake:	Yup. 

Blake:	I think I was wrong in my last explanation. 

Blake:	I think what they mean there is that you'll have a dictionary
with keys of "employee1", "employee2", etc... 

Amy:	Hm.

Amy:	I guess I could work with that.

Blake:	But a good way to find out would be to try running it on a
file, and printing it out. 

Amy:	How do I call DictReader?

Blake:	No, I take it back again.  I think my first explanation is
correct.  You'd have an entry in your dictionary with a key of
'employees', and a value of ['Bill', 'Jane', 'Ted']. 

Amy:	Do I have to define something else to be a DictReader?

Blake:	First, you create one.:
 myReader = csv.DictReader( filename, ['company','whateverelse'], 'employees' ) 
 
Blake:	Then, you use it :
 for values in myReader:
     print template % values 
 
Amy:	It's too easy!

Blake:	http://www.python.org/doc/2.3.2/lib/node549.html 

Amy:	I give it the fishy eye.

Blake:	That's the beauty of Python.  If you think it's too easy,
you're on the right track.  :) 

Amy:	Ah hah, it's giving me an error!

Blake:	What's the error? 

Amy:	NameError: name 'data' is not defined

Amy:	Where 'data.csv' is the name of my file.

Blake:	What's the line you used? 

Amy:	myReader = csv.DictReader( data.csv, ['company','address','phone'], 'employees' )

Blake:	You need to put data.csv in quotes, too. 

Amy:	It doesn't say that in the manual!

Blake:	No, that's a syntax thing.

Blake:	Hey, can I post this to the weblog? 

Amy:	Uh, sure.

Amy:	How do I get the values in myReader to just output willy
nilly?  (I don't have a template yet, I just want to see if they're
reading in right).

Blake:	print values 

Amy:	I did that but it gave me another ... prompt.  I guess my
question is actually how do I end a for?

Blake:	Just hit return. 

Amy:	Oh, that really didn't work at all!

Amy:	Here is what I got:

Amy:	{'phone': None, 'company': 'd', 'address': None}
{'phone': None, 'company': 'a', 'address': None}
{'phone': None, 'company': 't', 'address': None}
{'phone': None, 'company': 'a', 'address': None}
{'phone': None, 'company': '.', 'address': None}
{'phone': None, 'company': 'c', 'address': None}
{'phone': None, 'company': 's', 'address': None}
{'phone': None, 'company': 'v', 'address': None}

Amy:	It's kind of funny how wrong it is.

Blake:	Oh, hah!  Yes.  Read the examples, and see what's different. 

Amy:	Yes, sensei.

Blake:	(Alternately, see what the companies spell if you read them
going down.) 

Amy:	Yeah, the filename.  That's the funny part.

Blake:	So you need to get it to read your file, instead of reading
the name of your file. 

Blake:	You can do that one of two ways.  Either use "open( filename
)", or "file( filename )".  They're the same, under the hood. 
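The filename mix-up is easy to reproduce. csv.DictReader accepts any iterable of lines, and iterating over a string yields its characters, so each character becomes a one-field "row" and the missing fields are filled with None. Below is a Python 3 sketch in which io.StringIO stands in for a real open file and the sample data is invented. Python 3 dropped the file() builtin, so there open() is the one to use.

```python
import csv
import io

fields = ["company", "address", "phone"]

# Wrong: a string is itself iterable, so each character is read as a separate line.
wrong = list(csv.DictReader("data.csv", fields))
print([row["company"] for row in wrong])

# Right: hand DictReader a file object. io.StringIO plays the part of open("data.csv").
sample = io.StringIO("Huge Builder,2002 Yonge St,416-574-8372\n")
right = list(csv.DictReader(sample, fields))
print(right[0])
```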

Amy:	Oh, it worked!

Blake:	It did? 

Amy:	Yeah, when I asked it to print values it gave me this: 

Amy:	{'phone': ' 416-574-8372', 'company': 'Huge Builder', 'employees': [' Bob Smith', ' President', ' Joan Simpson', ' Vice-President Public Relations', ' Huw Thompson', ' Vice President Technology'], 'address': ' 2002 Yonge St'}
{'phone': ' 416-938-2837', 'company': 'Big Buildco', 'employees': [' Joanne Jones', ' CEO'], 'address': ' 19 King St'}
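To see what restkey does without an Access export handy, here is a Python 3 sketch that uses made-up rows in an io.StringIO (standing in for an open file). Any fields beyond the three named ones are gathered into a list under the restkey. This sample has no spaces after the commas, so its values come out without the leading spaces visible above.

```python
import csv
import io

# Invented sample rows: three fixed fields, then name/title pairs of varying length.
data = io.StringIO(
    "Huge Builder,2002 Yonge St,416-574-8372,Bob Smith,President,Joan Simpson,VP PR\n"
    "Big Buildco,19 King St,416-938-2837,Joanne Jones,CEO\n"
)

# The third positional argument is restkey, as in Blake's call.
reader = csv.DictReader(data, ["company", "address", "phone"], "employees")
rows = list(reader)
for values in rows:
    # Extra fields land in a list under the "employees" key.
    print(values["company"], values["employees"])
```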


Amy:	Except for some reason it reordered the variables, but I don't
think that matters.

Blake:	No, cause you'll use them in whatever order you want in your
HTML template. 

Amy:	Yup.

Amy:	Cool!

Amy:	I could get this working before your dad gets back from his
golf game!

Blake:	The only other trick will be to get the employee data out.
For that I'd use a separate template. 

Blake:	i.e. format the employees into a table first, and then add the
'employeeTable' to your dictionary. 

Amy:	A table?

Blake:	(To do that, assuming you've got the employees formatted into
the variable "temp", you would write:
 values[ "employeeTable" ] = temp
 ) 

Blake:	An html table.  Or however else you want to add the employees. 

Amy:	Couldn't I format them after I format the rest of the stuff?

Blake:	You could, but formatting them before makes it easier to
insert them into the rest of the stuff. 
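Blake's "format the employees first" suggestion amounts to a two-level template. The templates, key names, and sample data below are illustrative assumptions rather than Amy's actual HTML, and the sketch is in Python 3.

```python
# Hypothetical templates: one row per employee, plus an outer template per company.
row_template = "<tr><td>%s</td><td>%s</td></tr>\n"
company_template = "<h2>%(company)s</h2>\n<table>\n%(employeeTable)s</table>"

values = {"company": "Huge Builder"}
employees = [("Bob Smith", "President"), ("Joan Simpson", "Vice-President Public Relations")]

# 1. Format the employees into their own chunk of HTML first...
temp = ""
for name, title in employees:
    temp += row_template % (name, title)

# 2. ...then store it in the dictionary so the outer template can drop it in.
values["employeeTable"] = temp
page = company_template % values
print(page)
```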

Amy:	Okay...

Amy:	The whole thing is going to have to be inside the "for values
in myReader", right?

Blake:	Mostly.  You could define your templates outside, but the
rest, yeah. 

Amy:	Okay.

Amy:	Why didn't this work:
for values in myReader:
...   print " %(company)s "


Amy:	It didn't return anything.

Blake:	Because you didn't tell it where to get the company from.
(You need the " % values" at the end of the print.) 

Amy:	Oh.  So "values" is a real thing.

Blake:	At this point, I think you want to switch to a script. 

Amy:	Yeah, just a second. :)

Blake:	So that you can run it over and over again. 

Blake:	Yup, everything is a real thing.  There's very little magic in
Python. 

Amy:	That's going to take some getting used to.

Blake:	Hopefully it won't be too bad. 

Amy:	Oh, I can tell I'm serious now, I have two shell windows open.
:P

Blake:	Heh. 

Amy:	How do I do comments?

Blake:	# Like this. 

Amy:	Can I do line breaks wherever?

Blake:	Almost. 

Blake:	For now, let's say "Yes", and if you run into a problem,
you'll find out. 

Amy:	Okay.

Blake:	(And I can help you figure out where to put the break
instead.) 

Amy:	Oh my god.

Amy:	It worked.

Amy:	Just like that.

Blake:	Heh.  Now I'm definitely posting this to the weblog.  :) 

Blake:	What did you do for the employee names and titles? 

Amy:	I didn't do that part yet. :P

Blake:	Oh, okay. 

Amy:	I'm just excited I got the company to work.

Amy:	Now I must eat more.

Amy:	I'm running out of food.

Blake:	Heh.  I'll have you pulling your data from the live database
any second now. 

Amy:	Aiy!  Don't even say that!  Your dad would be so excited.

Blake:	It's really quite easy...  :) 

Amy:	Okay, now I'm stuck on the employees thing.  It's a list
called "employees"...  can I just do "for values in employees"?

Blake:	You can, but it wouldn't be quite what you wanted. 

Amy:	Ah.

Blake:	The quickest way I've found to get a useful list out of it is
the following line (assuming you've put the employees list into a
variable named "x"): zip( [y for (i,y) in enumerate(x) if i%2==0], [y
for (i,y) in enumerate(x) if i%2==1] ) 

Blake:	But that's nigh-unreadable, so perhaps we should try to do it
an easier way, huh? 

Amy:	Holy wha?!

Blake:	See what I mean? 

Amy:	Yeah.  

Blake:	Ooh, how about this:
 names = [y for (i,y) in enumerate(x) if i%2==0]
 titles = [y for (i,y) in enumerate(x) if i%2==1] 
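Here are the two comprehensions as a Python 3 sketch on invented sample data. For comparison, it also uses extended slicing (x[0::2] and x[1::2]). That is not from the conversation, but it picks out the same even- and odd-position items more directly.

```python
# Invented sample list alternating name, title, name, title...
x = ["Bob Smith", "President", "Joan Simpson", "VP PR", "Huw Thompson", "VP Tech"]

# Blake's comprehensions: keep the items at even or odd positions.
names = [y for (i, y) in enumerate(x) if i % 2 == 0]
titles = [y for (i, y) in enumerate(x) if i % 2 == 1]

# Extended slicing with a step of 2 selects the same items.
assert names == x[0::2]
assert titles == x[1::2]
print(names)
print(titles)
```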

Amy:	First, isn't my employees list in a variable called
"employees"?

Blake:	Just a sec. 

Blake:	Yes, so replace 'x' with "values['employees']" 

Blake:	Or add the line:
 x = values['employees']
 before those other two bits of code. 

Amy:	And then what do "names" and "titles" end up as?  Lists?

Blake:	Yup. 

Amy:	Hm.  That's not really useful because I want to use them in
pairs, the name then the title.

Amy:	I guess I can use an index to refer to the nth item in each
list, and they should match up.

Blake:	Yes, but you could then write something like:
 for name,title in zip( names, titles ):
   print name, title 

Amy:	Should I look up zip or just ask you what it is?

Blake:	(zip takes two lists "[a1,a2,a3]" and "[b1,b2,b3]", and makes
a new list that pairs them up: "[ (a1,b1), (a2,b2), (a3,b3) ]".)
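Here is a quick Python 3 illustration of zip, using made-up names. One difference from the Python 2 in this conversation is that zip now returns a lazy iterator, so list() is needed to see all the pairs at once. zip also stops at the end of the shorter input.

```python
names = ["Bob Smith", "Joan Simpson"]
titles = ["President", "VP PR"]

# Wrap zip in list() to see every pair at once.
pairs = list(zip(names, titles))
print(pairs)  # [('Bob Smith', 'President'), ('Joan Simpson', 'VP PR')]

# Unpacking each pair straight into two loop variables.
for name, title in zip(names, titles):
    print(name, title)
```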

Amy:	Oh, okay.

Blake:	enumerate (while I'm here), returns the items in a list, along
with their indices.  So you could have written:
 for i, name in enumerate( names ):
   print name, names[i], titles[i] 
 
Blake:	and "name" and "names[i]" should have the same value. 
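The next example shows enumerate in the same style (Python 3, invented data). enumerate yields (index, item) pairs, so the loop item and the list looked up by that index always agree.

```python
names = ["Bob Smith", "Joan Simpson"]
titles = ["President", "VP PR"]

checked = []
for i, name in enumerate(names):
    # name is always the same value as names[i].
    assert name == names[i]
    checked.append((i, name, titles[i]))
    print(i, name, titles[i])
```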

Amy:	So basically I'm taking the original employee list, stripping
it into two lists, and then folding it back into a new list with a
slightly different format.

Blake:	Yeah.  An easier to use format. 

Blake:	I suppose you could do it all in one go, if you wanted...
Something like:
 for i,name in enumerate( values['employees'] ):
   if i%2 == 1:
     continue
   print "name =", values['employees'][i], " title =", values['employees'][i+1] 
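This is a variation on the all-in-one loop. It assumes the employees list always holds complete name/title pairs and steps through the even indices with range(0, len(...), 2), so no "continue" is needed. It is a Python 3 sketch, not the code from the conversation, and an odd-length list would raise an IndexError.

```python
employees = ["Bob Smith", "President", "Joan Simpson", "VP PR"]

lines = []
# Visit only the even indices: 0, 2, 4, ...
for i in range(0, len(employees), 2):
    lines.append("name = %s  title = %s" % (employees[i], employees[i + 1]))

print("\n".join(lines))
```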

Blake:	Which makes more sense to you? 

Amy:	No, I don't like doing things all in one go!

Amy:	I like doing things slowly and methodically.

Amy:	Hm.  It doesn't like "x = values['employees']"  

Amy:	It says values is not defined.

Blake:	What's your whole script look like? 

Blake:	(That line, in particular, should be inside the
 for values in myReader:
 block.) 

Amy:	Right.

Amy:	Well, it did something that time!

Blake:	Excellent.  Not what you wanted, I'm guessing. 

Amy:	Nope.

Amy:	But it did what I told it to do.

Blake:	Heh. 

Amy:	I have this: 
employeeRows = " %(name)s  %(title)s "

and then
  for name, title in zip( names, titles ):
    print employeeRows % name, title

Amy:	But I'm not passing in the name, title values right.

Blake:	Yes, since you're not using a dictionary, you can't use the
%(name)s format. 

Amy:	Do I just use %s?

Blake:	So, you can do one of two things.  Stick with the %(name)s
format and switch to a dictionary, or switch to %s and pass them in in
the correct order. 

Blake:	Switching to a dictionary, by the way, is as easy as changing
the "% name, title" to "% locals()" 

Amy:	locals()?

Blake:	It returns a dictionary of the local variables. 

Blake:	Try putting a "print locals()" at various points in your
script. 
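Both options Blake describes are sketched below in Python 3 with hypothetical function names. Inside a function, locals() is a dictionary of that function's local variables, which is what lets the named placeholders work. Amy's "employeeRows % name, title" is read as "(employeeRows % name), title", so the template receives only the name string, not the dictionary that %(name)s needs.

```python
def employee_row(name, title):
    employee_rows = " %(name)s  %(title)s "
    # locals() here is a dict holding name, title and employee_rows,
    # so %(name)s and %(title)s can be looked up in it.
    return employee_rows % locals()


def employee_row_positional(name, title):
    # The other option: plain %s placeholders, with a tuple in the right order.
    return " %s  %s " % (name, title)


row_a = employee_row("Bob Smith", "President")
row_b = employee_row_positional("Bob Smith", "President")
print(row_a)
print(row_b)
```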

Amy:	So the local variables are just whatever it's working with
right now?

Blake:	Pretty much, yeah. 

Amy:	Hm.

Amy:	Now I have to figure out how to stick the employee HTML into a
variable so I can put it in the rest of the HTML later.

Blake:	What's the format of the html you want to stick it into? 

Blake:	(As a hint, instead of printing it, use += to append it to a
string...) 

Amy:	Pretty much what I had there, rows in a table.

Blake:	Let me know if you need any help with that, m'kay? 

Amy:	Do I have to define variables?

Blake:	Nope. 

Amy:	Hah, that was a trick question.

Blake:	(Well, kinda nope.) 

Amy:	Traceback (most recent call last):
  File "first.py", line 26, in ?
    employeeTable += employeeRows % locals()
NameError: name 'employeeTable' is not defined

Blake:	You can't just append to something that isn't there. 

Blake:	So start it with:
 employeeTable = "<table>" 
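The NameError happens because += needs an existing value to append to. In the Python 3 sketch below, which uses invented rows, the string is started first, each row is appended with "\n" as its line break, and the table is closed at the end.

```python
employees = [("Bob Smith", "President"), ("Joanne Jones", "CEO")]

# The variable must exist before += can append to it.
employee_table = "<table>\n"
for name, title in employees:
    employee_table += "<tr><td>%s</td><td>%s</td></tr>\n" % (name, title)
employee_table += "</table>"

print(employee_table)
```

For long tables, a common alternative is to collect the rows in a list and join them once with "".join(rows) instead of using repeated +=.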

Amy:	That's better.

Amy:	I wonder what is wrong with my brain that I never remember to
put the close quote in.

Amy:	How do I tell it to put in a newline?

Blake:	"\n" 

Amy:	Or should I just triple-quote and put it in myself?

Blake:	That would work too. 

Blake:	Whatever looks nicer to you. 

Amy:	\n looks nicer

Amy:	Okay, I think I have the bones of it working.  Now I need to
put in the real formatting.

Blake:	Cool.  Could you show me some sample output before you do? 

Amy:	Sure.

Amy:	 Huge Builder 
  Bob Smith   President 
  Joan Simpson   Vice-President Public Relations 
  Huw Thompson   Vice President Technology 

 Big Buildco 
  Joanne Jones   CEO 

Blake:	No phone number? 

Amy:	I didn't do that yet.  I just assumed it would be about the
same as the company.

Blake:	(Just making sure it's not being overwritten by something
else...) 

Blake:	Yup.  It will be. 

Amy:	Actually I think I will do the """ thing for the HTML
templates, so it looks like regular HTML.

Amy:	Uhoh.

Blake:	What? 

Amy:	If a value is empty I want to leave out a row in my table.

Amy:	I will have to do that in an if in my "for values in
myReader", right?

Blake:	What do you mean by "if a value is empty"? 

Blake:	Oh, if you don't have the title for someone? 

Amy:	Well, more specifically, if the company doesn't have a suite
number.

Amy:	If it does I want a row with the suite number, if it doesn't I
don't want that row at all.

Blake:	Yeah.  Or you could build up a sub-template, like the
employees. 

Blake:	Have a line that looks like:
 values['suiteNumber'] = "<tr><td>%(suiteNumber)s</td></tr>" % values 

Amy:	Either way I will have to break everything else up into
"before Suite" and "after Suite" templates, though.

Blake:	Not really.  If you added the above line, then you could just
use "%(suiteNumber)s", and it would output the whole <tr><td> for you. 

Amy:	Oh, I see.

Amy:	What if suiteNumber is empty, though?

Blake:	Ah, yes, so you would have something like:
 if values['suiteNumber']:
   values['suiteNumber'] = "<tr><td>%(suiteNumber)s</td></tr>" % values  

Blake:	So, if it was empty, there would be no row, but if it wasn't
empty, it would get a row of its own. 
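The suite-number idea is sketched below in Python 3. The add_suite_row helper and the template are hypothetical. An empty suite number stays an empty string, so the outer template simply gets no row for it.

```python
def add_suite_row(values):
    # Turn a non-empty suite number into a whole table row; leave an empty one as "".
    if values["suiteNumber"]:
        values["suiteNumber"] = "<tr><td>%(suiteNumber)s</td></tr>" % values
    return values


template = "<table><tr><td>%(company)s</td></tr>%(suiteNumber)s</table>"
with_suite = template % add_suite_row({"company": "Big Co", "suiteNumber": "400"})
without_suite = template % add_suite_row({"company": "Small Co", "suiteNumber": ""})
print(with_suite)
print(without_suite)
```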

Amy:	Ah.  Okay.

Amy:	This is going to be really swell if it works.

Blake:	It will.  One way or another. 

Amy:	Uh oh.

Amy:	One of my data fields has commas in it.

Blake:	A-ha!  Did it mess up? 

Amy:	I didn't try it yet.  Should I quote the data with commas?

Blake:	You shouldn't have to.  The export thing should do it for you. 

Amy:	Shut up!

Amy:	Wait, what export thing?

Amy:	From access or whatever?

Blake:	Yeah. 

Amy:	Okay.  I'm not using real data yet, I'm just making up
fake(ish) data.

Blake:	Ah, right.  I would just assume that your real data is
correctly formatted. 

Blake:	(The rules for CSV quoting are kind of odd.) 

Amy:	If I have single quotes within a triple-quoted section, is
that okay?

Amy:	Or do I have to escape them or something?

Blake:	Yup. 

Blake:	You can also have single-quotes in a double-quoted section, or
double-quotes in a single-quoted section. 

Blake:	And you can triple-single or triple-double quote stuff, if you
needed a triple-whatever-the-other-quote-was in it. 

Amy:	Ah, wait.  I meant single-double-quote, not single quote.

Blake:	Whatever. 

Blake:	It all works. 

Amy:	Hm.

Amy:	It's whining about something.

Blake:	What's the complaint? 

Amy:	Traceback (most recent call last):
  File "second.py", line 71, in ?
    print myTable % values
ValueError: unsupported format character '"' (0x22) at index 29

Amy:	Perhaps it is the %?

Blake:	It's probably the %.  To get a % in the output, you need to
type %%. 
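Here is a Python 3 sketch of the %% rule, using a made-up table-cell template. A literal percent sign in a %-formatted string has to be doubled. A lone % followed by a quote raises the same kind of ValueError as the traceback above.

```python
template = '<td width="%(width)s%%">%(company)s</td>'
html = template % {"width": 50, "company": "Big Co"}
print(html)  # <td width="50%">Big Co</td>

# A single % before a quote character is treated as a bad format specifier.
try:
    '<td width="50%">%(company)s</td>' % {"company": "Big Co"}
    error = None
except ValueError as exc:
    error = exc
```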

Amy:	Hah!

Amy:	Ta da!

Blake:	It all works? 

Amy:	Kind of, except some values aren't right. 

Amy:	But it's formatting mostly right.

Blake:	Hmm.  Cool. 

Blake:	Back in a sec. 

Amy:	It's not reading the CSV properly -- it's the problem with
commas inside fields I was talking about before.

Amy:	According to this it should work.
http://www.python.org/doc/2.3.2/lib/csv-fmt-params.html#csv-fmt-params

Blake:	No? 

Blake:	What's the line it's failing on? 

Blake:	And is this actual data, or hand-created data? 

Amy:	It's my fake data.

Amy:	Ah.  It didn't like my spaces after my commas.

Amy:	When I got rid of them it worked.
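The spaces matter because, with the default settings, the csv module only treats a quote as special when it is the first character of a field. If you'd rather keep the spaces, the skipinitialspace formatting parameter (described on the page Amy linked) tells the reader to ignore whitespace right after each delimiter. Below is a Python 3 sketch with an invented line of data. DictReader passes the same keyword through to the underlying reader.

```python
import csv

line = 'Big Co, "Suite 5, Floor 2", 416-555-0100'

# Default dialect: the space before the quote hides it, so the comma inside splits the field.
plain = next(csv.reader([line]))
# skipinitialspace=True skips whitespace after each comma, so the quoted field parses whole.
fixed = next(csv.reader([line], skipinitialspace=True))

print(plain)
print(fixed)
```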

Amy:	Whoo!

Blake:	Hurray! 

Amy:	I CAN'T BELIEVE IT WAS SO EASY.

Amy:	You can put that in the blog.

Blake:	Oh, I will. 

Amy:	I'm sure I would have spent way more time looking for and
downloading and installing and testing a million graphical things, if
they even exist.

Amy:	Scripting is the shit.
[Posted at 17:14 by Blake Winton]
Thu, 23 Dec 2004

I tried replying to this post from Peter Bowyer, but the comment submit form was behind an httpd-authenticated wall, so I figured I'd post the reply here instead.

Have you tried posting to the Python Tutor list (tutor@python.org) and asking them why your code is so slow? You'll probably get some interesting responses. A couple of things I noticed right off the top: you could replace this
 x = 0
 bins = []
 for x in range(MAXSTEPS): bins.append(0)
with this
 bins = [0 for x in xrange(MAXSTEPS)]
which should be faster for a couple of reasons. First, list comprehensions are (I believe) faster than repeated calls to append. Second, xrange should be faster than range, because it produces the numbers one at a time instead of creating the whole list at once.
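For readers on Python 3, where xrange is gone and range is already lazy, here is a sketch comparing the three ways of building the list, including the [0] * n form mentioned below. MAXSTEPS is given an arbitrary value here because the original value isn't quoted.

```python
MAXSTEPS = 300  # arbitrary value for illustration

# 1. The original loop with repeated append calls.
bins_loop = []
for x in range(MAXSTEPS):
    bins_loop.append(0)

# 2. A list comprehension.
bins_comp = [0 for x in range(MAXSTEPS)]

# 3. List repetition, which builds the whole list in one step.
bins_mult = [0] * MAXSTEPS

assert bins_loop == bins_comp == bins_mult
```

[0] * n is safe here because integers are immutable. With a mutable element, as in [[]] * n, every entry would be the same list object.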

Here's some code showing how much faster that one change is:
>>> import timeit
>>> t1="""x=0
... bins=[]
... for x in xrange(20): bins.append(0)"""
>>> t2 = """bins = [0 for x in xrange(20)]"""
>>> time1 = timeit.Timer(t1)
>>> time2 = timeit.Timer(t2)
>>> time1.timeit()
10.322476353976072
>>> time2.timeit()
7.6572002255583129

As a side note, I ran:
>>> t3 = """bins = [0] * 300"""
>>> time3 = timeit.Timer(t3)
>>> time3.timeit()
3.0881361995940324
which takes less than half the time of t2 while creating 15 times as many entries... Interesting. I'll update this with the results of the other tests as they finish running...

Okay, another thought. You calculate the distance every time through the inner loop, which seems really slow. Perhaps you could keep track of the distance, and update it in the call to walk?
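One way to do that is sketched below in Python 3 for a hypothetical 2D lattice walk. The walk function, the step set, and the variable names are assumptions, not Peter's code. The idea is to track the squared distance and update it from the step just taken, since (x+dx)^2 + (y+dy)^2 = x^2 + y^2 + 2(x*dx + y*dy) + dx^2 + dy^2. The square root, if it's needed at all, can then be taken once when binning.

```python
import random

# Hypothetical unit steps on a 2D grid.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def walk(x, y, r2, rng):
    """Take one step and update the squared distance from the origin incrementally."""
    dx, dy = rng.choice(STEPS)
    # Expand (x+dx)^2 + (y+dy)^2 and add only the change to the old r2.
    r2 += 2 * (x * dx + y * dy) + dx * dx + dy * dy
    return x + dx, y + dy, r2


rng = random.Random(42)  # seeded so runs are repeatable
x = y = r2 = 0
for _ in range(1000):
    x, y, r2 = walk(x, y, r2, rng)
    # The running value always matches a from-scratch calculation.
    assert r2 == x * x + y * y
```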

Update:
Here are the results from running all three with 300 entries each:
>>> t1="""x=0
... bins=[]
... for x in xrange(300): bins.append(0)"""
>>> t2 = """bins = [0 for x in xrange(300)]"""
>>> t3 = """bins = [0] * 300"""
>>> time1 = timeit.Timer(t1)
>>> time2 = timeit.Timer(t2)
>>> time3 = timeit.Timer(t3)
>>> time1.timeit()
144.48788944637977
>>> time2.timeit()
105.76589055161526
>>> time3.timeit()
3.0881361995940324
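Note that calling timeit() with no argument runs the statement 1,000,000 times and returns the total seconds. The figures above are therefore totals for a million runs, not per-run times. Below is a Python 3 sketch of the same comparison with an explicit, smaller run count. Actual timings will vary by machine.

```python
import timeit

statements = {
    "append loop": "bins = []\nfor x in range(300): bins.append(0)",
    "comprehension": "bins = [0 for x in range(300)]",
    "repetition": "bins = [0] * 300",
}

results = {}
for label, stmt in statements.items():
    # number= sets how many times stmt runs; the default is 1,000,000.
    results[label] = timeit.timeit(stmt, number=10000)
    print("%-14s %.4f s" % (label, results[label]))
```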

[Posted at 10:44 by Blake Winton]