Learning Python
I'd like to apologize right now for the length of this post, but
there's something about someone learning a new tool that immediately
helps them do something that I really enjoy.
Amy: I think I have a python problem.
Blake: Ooh, I should be able to help.
Amy: Well, it's a problem that could be fixed by python.
Blake: Close enough.
Amy: Ah, it goes beyond help. I still have to figure out where to start. Like, do I even have python on this machine? And how do you read in something from a file?
Blake: "python -v"
Blake: and:
    myFile = open( "filename.txt" )
    for line in myFile:
        print line
Amy: Holy, it comes up with a million lines of... stuff. Filenames?
Amy: But then it seems to be 2.3.
Blake: Perhaps "python -V"
Amy: That's better!
Blake: What did you want to do with the lines in a file?
Amy: Well, I want to take a bunch of pieces of data, like company names and phone numbers and stuff, and stick it into a specific HTML format.
Amy: So I want to take in a file of data and output HTML.
Amy: Do I want the HTML format hardcoded into the python or should that be another file?
Blake: Do you have a python prompt up?
Blake: Try typing :
    x = "Amy"
    print "Hello %s" % x
Amy: Ah hah.
Amy: That's nice.
Blake: So, I would do something like :
    myBigTemplate = """abc %s
    def %s
    ghi %s"""
    print myBigTemplate % ('1','2','3')
Blake: (triple-quoted strings can span more than one line.)
Amy: Oh, I see. So I set up the formatting and then use 'print' to spit out the HTML.
Blake: Yup. Oh, the other thing you can do is name the variables you're replacing. So this works:
    x = { 'name':'Amy', 'food':'apple'}
    print "Hi %(name)s, do you want a %(food)s" % x
Blake: (Just don't forget the 's' after the closing bracket.)
Amy: So you think I should define the HTML format in the python script itself? It seems easier but somehow less clean.
Blake: Yeah, for now. You can always change it later. :)
Amy: True.
Amy: What's wrong with this:
Amy:
    myTable = " %(coname)s "
    coname = "Big Developer"
    print myTable % coname
Blake: You need to pass a dictionary in if you use the (name) feature. So, make a dictionary of variables, like x = {'coname':"Big Developer"}
Blake: Then pass that in. You could call your dictionary "values", or something meaningful.
Amy: Ooh, that worked.
Amy: Now I have to figure out how to get the dictionary from a different file.
Blake: What's the format of the file?
Amy: Well, I guess it could be something like "Company: Big Developer
Blake: It could be?
Amy: Well, it's going to be exported from Access so I guess I could define the format?
Amy: It sounds like you can anyway.
Amy: I'm going to work on the assumption you can export text in a format like that, for now.
Blake: Okay, although it might be easier to just assume that the first thing on the line is the company...
Amy: The problem is that each company could have a varying number of employees.
Blake: What do you want to do in that case?
Amy: I want to iterate through all the names, adding a new row to my table for each one. I wonder if it would be possible to include the number of names in the dataset.
Amy: Although if they are the last thing on the line I guess you could just go through them until you get to the EOL.
Blake: You could, or you could repeat the company name for each employee.
Amy: But I don't want to repeat the company name in my HTML.
Blake: Ahhh... Okay, I understand. How about:
    Big Co, "employee1, employee2, employee3", BooYeah
Amy: I'm not clear on the function of the "BooYeah".
Blake: Neither am I. It's just whatever other data you need in there.
Amy: Ah hah. The confusion is because the employees are the end of the data.
Blake: Oh, okay.
Amy: Yeah, I think if I could get a comma delimited file with company information and employees on each line, I could parse it. Assuming it was always correctly formatted. :P
Blake: But if you're generating it from another program, it should be correctly formatted.
Amy: It should.
Amy: "In theory..." but let's assume it will be.
Blake: So try "import csv" at the Python prompt.
Amy: It didn't do anything.
Blake: Sure it did. Type "dir( csv )" or "help( csv )" to see what it did.
Blake: (There's a webpage at http://www.python.org/doc/2.3.2/lib/csv-contents.html that has more readable contents of the help. )
Blake: (And as another hint, you probably want to use the DictReader class with a restkey of 'employee')
Amy: I will copy that and paste it somewhere and hopefully soon it will mean something.
Blake: Feel free to ask me questions about whatever doesn't make sense.
Amy: Hah. Part of the problem is that you're not working for me, and part of the problem is that I don't even know how to begin asking the questions.
Amy: What's a sequence?
Amy: As in "remaining data is added as a sequence keyed by the value of restkey"?
Blake: It's just a list.
Amy: Okay.
Blake: x = [1,2,3] is a sequence.
Amy: Alright. So I can iterate through it pretty easily?
Blake: Yup.
Blake: I think I was wrong in my last explanation.
Blake: I think what they mean there is that you'll have a dictionary with keys of "employee1", "employee2", etc...
Amy: Hm.
Amy: I guess I could work with that.
Blake: But a good way to find out would be to try running it on a file, and printing it out.
Amy: How do I call DictReader?
Blake: No, I take it back again. I think my first explanation is correct. You'd have an entry in your dictionary with a key of 'employees', and a value of ['Bill', 'Jane', 'Ted'].
Amy: Do I have to define something else to be a DictReader?
Blake: First, you create one.:
    myReader = csv.DictReader( filename, ['company','whateverelse'], 'employees' )
Blake: Then, you use it :
    for values in myReader:
        print template % values
Amy: It's too easy!
Blake: http://www.python.org/doc/2.3.2/lib/node549.html
Amy: I give it the fishy eye.
Blake: That's the beauty of Python. If you think it's too easy, you're on the right track. :)
Amy: Ah hah, it's giving me an error!
Blake: What's the error?
Amy: NameError: name 'data' is not defined
Amy: Where 'data.csv' is the name of my file.
Blake: What's the line you used?
Amy:
    myReader = csv.DictReader( data.csv, ['company','address','phone'], 'employees' )
Blake: You need to put data.csv in quotes, too.
Amy: It doesn't say that in the manual!
Blake: No, that's a syntax thing.
Blake: Hey, can I post this to the weblog?
Amy: Uh, sure.
Amy: How do I get the values in myReader to just output willy nilly? (I don't have a template yet, I just want to see if they're reading in right).
Blake: print values
Amy: I did that but it gave me another ... prompt. I guess my question is actually how do I end a for?
Blake: Just hit return.
Amy: Oh, that really didn't work at all!
Amy: Here is what I got:
Amy:
    {'phone': None, 'company': 'd', 'address': None}
    {'phone': None, 'company': 'a', 'address': None}
    {'phone': None, 'company': 't', 'address': None}
    {'phone': None, 'company': 'a', 'address': None}
    {'phone': None, 'company': '.', 'address': None}
    {'phone': None, 'company': 'c', 'address': None}
    {'phone': None, 'company': 's', 'address': None}
    {'phone': None, 'company': 'v', 'address': None}
Amy: It's kind of funny how wrong it is.
Blake: Oh, hah! Yes. Read the examples, and see what's different.
Amy: Yes, sensei.
Blake: (Alternately, see what the companies spell if you read them going down.)
Amy: Yeah, the filename. That's the funny part.
Blake: So you need to get it to read your file, instead of reading the name of your file.
Blake: You can do that one of two ways. Either use "open( filename )", or "file( filename )". They're the same, under the hood.
Amy: Oh, it worked!
Blake: It did?
Amy: Yeah, when I asked it to print values it gave me this:
Amy:
    {'phone': ' 416-574-8372', 'company': 'Huge Builder', 'employees': [' Bob Smith', ' President', ' Joan Simpson', ' Vice-President Public Relations', ' Huw Thompson', ' Vice President Technology'], 'address': ' 2002 Yonge St'}
    {'phone': ' 416-938-2837', 'company': 'Big Buildco', 'employees': [' Joanne Jones', ' CEO'], 'address': ' 19 King St'}
Amy: Except for some reason it reordered the variables, but I don't think that matters.
Blake: No, cause you'll use them in whatever order you want in your HTML template.
Amy: Yup.
Amy: Cool!
Amy: I could get this working before your dad gets back from his golf game!
Blake: The only other trick will be to get the employee data out. For that I'd use a separate template.
Blake: i.e. format the employees into a table first, and then add the 'employeeTable' to your dictionary.
Amy: A table?
Blake: (To do that, assuming you've got the employees formatted into the variable "temp", you would write: values[ "employeeTable" ] = temp )
Blake: An html table. Or however else you want to add the employees.
Amy: Couldn't I format them after I format the rest of the stuff?
Blake: You could, but formatting them before makes it easier to insert them into the rest of the stuff.
Amy: Okay...
Amy: The whole thing is going to have to be inside the "for values in myReader", right?
Blake: Mostly. You could define your templates outside, but the rest, yeah.
Amy: Okay.
Amy: Why didn't this work:
    for values in myReader:
    ...     print " %(company)s "
Amy: It didn't return anything.
Blake: Because you didn't tell it where to get the company from. (You need the " % values" at the end of the print.)
Amy: Oh. So "values" is a real thing.
Blake: At this point, I think you want to switch to a script.
Amy: Yeah, just a second. :)
Blake: So that you can run it over and over again.
Blake: Yup, everything is a real thing. There's very little magic in Python.
Amy: That's going to take some getting used to.
Blake: Hopefully it won't be too bad.
Amy: Oh, I can tell I'm serious now, I have two shell windows open. :P
Blake: Heh.
Amy: How do I do comments?
Blake: # Like this.
Amy: Can I do line breaks wherever?
Blake: Almost.
Blake: For now, let's say "Yes", and if you run into a problem, you'll find out.
Amy: Okay.
Blake: (And I can help you figure out where to put the break instead.)
Amy: Oh my god.
Amy: It worked.
Amy: Just like that.
Blake: Heh. Now I'm definitely posting this to the weblog. :)
Blake: What did you do for the employee names and titles?
Amy: I didn't do that part yet. :P
Blake: Oh, okay.
Amy: I'm just excited I got the company to work.
Amy: Now I must eat more.
Amy: I'm running out of food.
Blake: Heh. I'll have you pulling your data from the live database any second now.
Amy: Aiy! Don't even say that! Your dad would be so excited.
Blake: It's really quite easy... :)
Amy: Okay, now I'm stuck on the employees thing. It's a list called "employees"... can I just do "for values in employees"?
Blake: You can, but it wouldn't be quite what you wanted.
Amy: Ah.
Blake: The quickest way I've found to get a useful list out of it is the following line (assuming you've put the employees list into a variable named "x"):
    zip( [y for (i,y) in enumerate(x) if i%2==0], [y for (i,y) in enumerate(x) if i%2==1] )
Blake: But that's nigh-unreadable, so perhaps we should try to do it an easier way, huh?
Amy: Holy wha?!
Blake: See what I mean?
Amy: Yeah.
Blake: Ooh, how about this:
    names = [y for (i,y) in enumerate(x) if i%2==0]
    titles = [y for (i,y) in enumerate(x) if i%2==1]
Amy: First, isn't my employees list in a variable called "employees"?
Blake: Just a sec.
Blake: Yes, so replace 'x' with "values['employees']"
Blake: Or add the line:
    x = values['employees']
before those other two bits of code.
Amy: And then what do "names" and "titles" end up as? Lists?
Blake: Yup.
Amy: Hm. That's not really useful because I want to use them in pairs, the name then the title.
Amy: I guess I can use an index to refer to the nth item in each list, and they should match up.
Blake: Yes, but you could then write something like:
    for name,title in zip( names, titles ):
        print name, title
Amy: Should I look up zip or just ask you what it is?
Blake: (zip takes two lists "[a1,a2,a3]" and "[b1,b2,b3]", and makes a new list with both "[ (a1,b1), (a2,b2), (a3,b3) ]")
Amy: Oh, okay.
Blake: enumerate (while I'm here), returns the items in a list, along with their indices. So you could have written:
    for i, name in enumerate( names ):
        print name, names[i], titles[i]
Blake: and "name" and "names[i]" should have the same value.
Amy: So basically I'm taking the original employee list, stripping it into two lists, and then folding it back into a new list with a slightly different format.
Blake: Yeah. An easier to use format.
Blake: I suppose you could do it all in one go, if you wanted... Something like:
    for i,name in enumerate( values['employees'] ):
        if i%2 == 1: continue
        print "name =", values['employees'][i], " title =", values['employees'][i+1]
Blake: Which makes more sense to you?
Amy: No, I don't like doing things all in one go!
Amy: I like doing things slowly and methodically.
Amy: Hm. It doesn't like "x = values['employees']"
Amy: It says values is not defined.
Blake: What's your whole script look like?
Blake: (That line, in specific, should be in the : for values in myReader: block.)
Amy: Right.
Amy: Well, it did something that time!
Blake: Excellent. Not what you wanted, I'm guessing.
Amy: Nope.
Amy: But it did what I told it to do.
Blake: Heh.
Amy: I have this:
    employeeRows = " %(name)s %(title)s "
and then
    for name, title in zip( names, titles ):
        print employeeRows % name, title
Amy: But I'm not passing in the name, title values right.
Blake: Yes, since you're not using a dictionary, you can't use the %(name)s format.
Amy: Do I just use %s>
Amy: ?
Blake: So, you can do one of two things. Stick with the %(name)s format and switch to a dictionary, or switch to %s and pass them in in the correct order.
Blake: Switching to a dictionary, by the way, is as easy as changing the "% name, title" to "% locals()"
Amy: locals()?
Blake: It's a link to the local variables.
Blake: Try putting a "print locals()" at various points in your script.
Amy: So the local variables are just whatever it's working with right now?
Blake: Pretty much, yeah.
Amy: Hm.
Amy: Now I have to figure out how to stick the employee HTML into a variable so I can put it in the rest of the HTML later.
Blake: What's the format of the html you want to stick it into?
Blake: (As a hint, instead of printing it, use += to append it to a string...)
Amy: Pretty much what I had there, rows in a table.
Blake: Let me know if you need any help with that, m'kay?
Amy: Do I have to define variables?
Blake: Nope.
Amy: Hah, that was a trick question.
Blake: (Well, kinda nope.)
Amy:
    Traceback (most recent call last):
      File "first.py", line 26, in ?
        employeeTable += employeeRows % locals()
    NameError: name 'employeeTable' is not defined
Blake: You can't just append to something that isn't there.
Blake: So start it with: employeeTable = "<table>"
Amy: That's better.
Amy: I wonder what is wrong with my brain that I never remember to put the close quote in.
Amy: How do I tell it to put in a newline?
Blake: "\n"
Amy: Or should I just triple-quote and put it in myself?
Blake: That would work too.
Blake: Whatever looks nicer to you.
Amy: \n looks nicer
Amy: Okay, I think I have the bones of it working. Now I need to put in the real formatting.
Blake: Cool. Could you show me some sample output before you do?
Amy: Sure.
Amy: Huge Builder Bob Smith President Joan Simpson Vice-President Public Relations Huw Thompson Vice President Technology Big Buildco Joanne Jones CEO
Blake: No phone number?
Amy: I didn't do that yet. I just assumed it would be about the same as the company.
Blake: (Just making sure it's not being overwritten by something else...)
Blake: Yup. It will be.
Amy: Actually I think I will do the """ thing for the HTML templates, so it looks like regular HTML.
Amy: Uhoh.
Blake: What?
Amy: If a value is empty I want to leave out a row in my table.
Amy: I will have to do that in an if in my "for values in myReader", right?
Blake: What do you mean by "if a value is empty"?
Blake: Oh, if you don't have the title for someone?
Amy: Well, more specifically, if the company doesn't have a suite number.
Amy: If it does I want a row with the suite number, if it doesn't I don't want that row at all.
Blake: Yeah. Or you could build up a sub-template, like the employees.
Blake: Have a line that looks like:
    values['suiteNumber'] = "<tr><td>%(suiteNumber)s</td></tr>" % values
Amy: Either way I will have to break everything else up into "before Suite" and "after Suite" templates, though.
Blake: Not really. If you added the above line, then you could just use "%(suiteNumber)s", and it would output the whole <tr><td> for you.
Amy: Oh, I see.
Amy: What if suiteNumber is empty, though?
Blake: Ah, yes, so you would have something like:
    if values['suiteNumber']:
        values['suiteNumber'] = "<tr><td>%(suiteNumber)s</td></tr>" % values
Blake: So, if it was empty, there would be no row, but if it wasn't empty, it would get a row of its own.
Amy: Ah. Okay.
Amy: This is going to be really swell if it works.
Blake: It will. One way or another.
Amy: Uh oh.
Amy: One of my data fields has commas in it.
Blake: A-ha! Did it mess up?
Amy: I didn't try it yet. Should I quote the data with commas?
Blake: You shouldn't have to. The export thing should do it for you.
Amy: Shut up!
Amy: Wait, what export thing?
Amy: From Access or whatever?
Blake: Yeah.
Amy: Okay. I'm not using real data yet, I'm just making up fake(ish) data.
Blake: Ah, right. I would just assume that your real data is correctly formatted.
Blake: (The rules for CSV quoting are kind of odd.)
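
An aside on those quoting rules. The sketch below is written for Python 3 rather than the 2.3 used in the chat, and the suite number in the address is made up for illustration. It shows roughly what a well-behaved export does with a field that contains a comma: the csv module's writer wraps that one field in double quotes, and the reader undoes it again.

```python
import csv
import io

# Hypothetical row: the address field contains a comma.
row = ["Huge Builder", "2002 Yonge St, Suite 5", "416-574-8372"]

# Write the row the way a CSV export would.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
print(line)

# Read it back; the quoted field comes out as a single value.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```

Only the field that needs quoting gets quoted, which is part of why hand-written test data is easy to get subtly wrong.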
Amy: If I have single quotes within a triple-quoted section, is that okay?
Amy: Or do I have to escape them or something?
Blake: Yup.
Blake: You can also have single-quotes in a double-quoted section, or double-quotes in a single-quoted section.
Blake: And you can triple-single or triple-double quote stuff, if you needed a triple-whatever-the-other-quote-was in it.
Amy: Ah, wait. I meant single-double-quote, not single quote.
Blake: Whatever.
Blake: It all works.
Amy: Hm.
Amy: It's whining about something.
Blake: What's the complaint?
Amy:
    Traceback (most recent call last):
      File "second.py", line 71, in ?
        print myTable % values
    ValueError: unsupported format character '"' (0x22) at index 29
Amy: Perhaps it is the %?
Blake: It's probably the %. To get a % in the output, you need to type %%.
Amy: Hah!
Amy: Ta da!
Blake: It all works?
Amy: Kind of, except some values aren't right.
Amy: But it's formatting mostly right.
Blake: Hmm. Cool.
Blake: Back in a sec.
Amy: It's not reading the CSV properly -- it's the problem with commas inside fields I was talking about before.
Amy: According to this it should work. http://www.python.org/doc/2.3.2/lib/csv-fmt-params.html#csv-fmt-params
Blake: No?
Blake: What's the line it's failing on?
Blake: And is this actual data, or hand-created data?
Amy: It's my fake data.
Amy: Ah. It didn't like my spaces after my commas.
Amy: When I got rid of them it worked.
Amy: Whoo!
Blake: Hurray!
Amy: I CAN'T BELIEVE IT WAS SO EASY.
Amy: You can put that in the blog.
Blake: Oh, I will.
Amy: I'm sure I would have spent way more time looking for and downloading and installing and testing a million graphical things, if they even exist.
Amy: Scripting is the shit.
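
For anyone who wants to see the pieces in one place, here is a rough sketch of where a script like this could end up. It is not Amy's actual script, which isn't reproduced in the chat. It is written for Python 3 (the chat used 2.3, where print is a statement), the sample data and HTML templates are made up, and the names SAMPLE, COMPANY_TEMPLATE, EMPLOYEE_ROW and render are hypothetical. Two things differ from the chat: the names and titles are split with slices instead of the enumerate trick, and the reader is given the csv module's skipinitialspace option, which is one way to cope with spaces after the commas.

```python
import csv
import io

# Hypothetical sample data in the shape discussed in the chat:
# company, address, phone, then alternating employee name / title.
SAMPLE = '''Huge Builder, "2002 Yonge St, Suite 5", 416-574-8372, Bob Smith, President, Joan Simpson, Vice-President Public Relations
Big Buildco, 19 King St, 416-938-2837, Joanne Jones, CEO
'''

# Made-up templates; the real HTML would be whatever the page needs.
COMPANY_TEMPLATE = """<table>
<tr><td>%(company)s</td></tr>
<tr><td>%(address)s</td></tr>
<tr><td>%(phone)s</td></tr>
%(employeeTable)s</table>
"""
EMPLOYEE_ROW = "<tr><td>%(name)s</td><td>%(title)s</td></tr>\n"


def render(source):
    """Return one HTML table per line of CSV read from `source`."""
    reader = csv.DictReader(
        source,
        fieldnames=["company", "address", "phone"],
        restkey="employees",      # leftover fields land in a list under this key
        skipinitialspace=True,    # tolerate a space after each comma
    )
    pages = []
    for values in reader:
        extras = values.get("employees", [])
        names = extras[0::2]      # even positions: names
        titles = extras[1::2]     # odd positions: titles
        employee_table = ""
        for name, title in zip(names, titles):
            employee_table += EMPLOYEE_ROW % {"name": name, "title": title}
        # Format the employees first, then hand them to the main template.
        values["employeeTable"] = employee_table
        pages.append(COMPANY_TEMPLATE % values)
    return pages


pages = render(io.StringIO(SAMPLE))
print("".join(pages))
```

To run it against a real file, something like render(open("data.csv", newline="")) should work in place of the io.StringIO call. Note that the values go into the HTML unescaped, so this assumes the data contains no stray < or & characters.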