Archive for category Code

Week 127

(idea cribbed from BERG, week count done by Wolfram Alpha because I am too lazy)

"[T]he beauty of reading a page of de Selby is that it leads one inescapably to the happy conclusion that one is not, of all nincompoops, the greatest."
The Third Policeman*

Mix of a week. Started with some edits to the XML parser I wrote for Financial Institution Client as part of a project that decrypts (GPG) a large set of XML documents, skims them for the parts we're interested in, dumps the relevant bit into an Expression Engine database and then transforms all that via XSL to display a customizable dashboard (jQuery UI) on the front end. How's that for a keyword-rich blog post? Tangentially speaking, this week reminded me that of all the languages, technologies, whatever you like that I work in, XSL is probably the easiest one to make a great mess in. It looks just like XML and HTML, how hard could it be? A little bit of XPath knowledge and you're good to go. To make a complete mess. When you write an infinite loop in a normal language, it's pretty obvious: you sit there a while, the computer starts to get noisy, the lights go dim. After a few dozen times, you realize what you've done. XSL is (like) a functional programming language. And with all that recursion, it's easy to make a computer do something Big Number of times. Just harder to spot.

Also did some final-mile edits on a social network application from Slim Kiwi built on top of Pinax (and Django).  Hoping it's truly "final-mile" as the project is really cool and I'm quite proud of it. There's a ton of geolocation and mapping going on and a fair number of other bright things happening (the bright ideas being supplied by other team members and the bright implementation by Django, obviously).  I was able to roll some of the geo search and Google Maps integration right into a site for Community Trust Bank, a project from Lightfin Studios (you can see the location stuff at the branch & ATM finder). That's the second bank site I've built with Lightfin, the other being the much-closer-to-home (but harder to spell) Piscataqua Bank (built in .NET) which got some updates this week as well.

At the end of last week I started to ramp up on my second Django site with Lightfin. Not much to say about it yet except that it integrates with a third-party API which reminds me of one thing: I hate SOAP (the overly-verbose web service format, not the cleaning product so beloved by my ancestors they enslaved a leprechaun to endorse it). Please don't ever use it. I'd rather parse faxes by hand. If you're stuck dealing with SOAP in Python, Suds seems to be the best client library out there. I'm sure there was other stuff going on, but the only thing I can think of is some cleanup I did of an old ASP site, and the less said, the better.


* Indie/hipster required disclaimer: I am re-reading The Third Policeman; I first read it well before it showed up in Lost, so cram it.


On Optimization

It's strange what you can get used to: the current social network site I'm working on has a page with 216 database queries on it. Used to be I'd get the hives if I hit a dozen queries on a page.

"216! Did you know databases let you bring back more than one row at a time nowadays?"

Yes. The project is in Django (and built on top of Pinax), so it's the ORM making all those queries, not me. It's one of those social network site pages that aggregates activity from everyone you follow. It also shows details about them, how far away they are from you and any comments on the item, so there's only so small I can make it while coloring inside the lines of the mapping system. I've already fallen back to raw SQL for one of the elements: there are a couple of places (and sure to be more in the future) where we return a list of the database ids of all your friends so we can use them as part of "AND id IN (x, y, z)" queries. Doing that through Django resulted in one query to the database for every friend you have. Given this was causing a slowdown when I'm the only user of the site and I only have 3 friends (one is another tester and the other two are dogs I know, so it's kind of a "Bob" situation, specifically the dog part and not the rest), I had a suspicion that wasn't going to scale. I modified that, added some caching and got smarter about some lookups (I had assumed I'd only hit the db once no matter how many times I referred to a model's property in a function, which is not always so) and things are back to running smoothly.
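For the curious, here is a rough sketch of the friend-id trick, written against Python's built-in sqlite3 rather than Django so it runs on its own. The table names (friendship, activity) and the function names are invented for illustration; the real project's schema and raw SQL look different. The point is only the shape: one query for the ids, one query with an IN clause, instead of one query per friend.

```python
import sqlite3


def friend_ids(conn, user_id):
    """Fetch every friend id for a user in a single query."""
    rows = conn.execute(
        "SELECT friend_id FROM friendship WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


def activity_for_friends(conn, ids):
    """Fetch activity for all friends at once with an IN (...) clause."""
    if not ids:
        return []  # an empty IN () is not portable SQL, so bail out early
    # One "?" per id keeps the values parameterized rather than pasted in.
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT user_id, item FROM activity "
        f"WHERE user_id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, ids).fetchall()


def demo():
    """Build a throwaway in-memory database and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER);
        CREATE TABLE activity (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
        INSERT INTO friendship VALUES (1, 2), (1, 3), (1, 4);
        INSERT INTO activity (user_id, item) VALUES
            (2, 'posted a photo'), (3, 'barked'), (9, 'unrelated'), (4, 'barked back');
        """
    )
    ids = friend_ids(conn, 1)
    return ids, activity_for_friends(conn, ids)
```

However many friends you have, that is two round trips to the database, which is the whole appeal.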

"216!"

Hey, it was 1066 when I started a day ago. Or something close to that. I've got 1066 on the brain because I've been thinking about William of Orange and before you say--

"Write code for a job and think about William of Orange in your spare time. You must be a hit with the ladies."

--that, let me point out it was in reference to a Dutch Oven joke. That has to count for something.

"Undoubtedly. Perhaps 'lady killer' is more literal than figurative in your case."

Regardless, given the nature of the screen, aggregating a dozen types of activities from an arbitrary number of users, I don't think the current solution is the long-term answer, so I buttoned it up as best I could.

"As best you could? Implement the long-term solution now."

That would be solving a problem I don't have (cf. "premature optimization", "YAGNI"). Given the data for this screen is derived from other objects in the system anyway, I think the long-term solution is to move this data into a NoSQL store (here's an example of using CouchDB in Django now, and future updates to Django should improve support for this kind of thing). It's important to remember traffic issues fall under the title Good Problems to Have. While I'd love to spend a couple of days implementing this rightnowyespleasecani, if the overall project never takes off, it would be unfair to ask the client to pay for something they didn't ask for and never needed.

"216!"

I'm already obsessing over it on my own. Why do you think you're here?


Microformat Proposal: Coding Experience

When I'm working, even in a language I know well, I often search for how to do something, either because I don't know or because I feel there's a better way (as @ed_atwell says, "I don't know, but I bet my friends Larry and Sergey do"). My personal system for filtering code search results looks something like:

  1. Blogs I trust
  2. Personal blogs
  3. Development sites (e.g., 4guysfromrolla.com, etc.)
  4. Mailing lists and newsgroups [1]
  5. Forums
  6. Expert Sexchange

Regardless of where it comes from, there's no way to know if it's right. It's human nature to use the first thing that works (if under deadline, even the first thing that kinda works will do). As Jeff Atwood has pointed out (twice), the danger is you might be copying off the paper of someone dumber than you [2]. Because of this, I'd like to propose a microformat (assuming one doesn't already exist, given I didn't bother to check with Larry and Sergey) to indicate an author's experience with a language.

Immediate disclaimer: I realize this is a programming solution to a human nature problem and those never work, but bear with me, because my hope isn't to fix the problem, but to provide some metadata that will let machines do the work for us so we can stay lazy. Given that is in line with Newton's First Law, this will obviously be a huge success.

The format doesn't need to be very complicated. In fact, I'd prefer if it just provided a few bits of raw data that could be remixed by search engines however they see best. The data provided would stay the same but the algorithms could be tweaked for better results (though that would require feedback), providing an incentive for search engines to consume the format. Make the data something rough, broad and quick to fill out, like years of experience with the language and a simple measure of number of lines written (e.g., none, 10, 100, 1,000, 10,000, a whole bunch). There are any number of issues with using Lines of Code (LoC) as a metric (mainly that an idiot can say in 1,000 lines what a smarter person can say in 10), but if the ranges are broad enough, it should dampen the effect.

Bolt this format onto syntax highlighting engines. This blog, for example, uses WP-Syntax to format the few, poor code samples I provide. One more panel in the plugin admin that let me store a hash of [language name, years, lines of code] would let the plugin attach that information to any page using those languages and output a visible box on the page, so inexperienced users who come to the page and see my code could know it was terrible without knowing it was terrible. Add it to the syntax formatters for popular forum software (and allow users to specify their experience) and every code argument in a forum post becomes a little easier to follow.
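To make that less hand-wavy, here is a minimal sketch of what a highlighter plugin might emit. Everything in it is made up for the sake of argument: the class names (coding-experience, language, years, lines-of-code) are not part of any existing microformat, and the bucket boundaries are just the ones I rattled off above.

```python
from html import escape


def loc_bucket(lines_written):
    """Round a line count up to one of the broad buckets from the proposal."""
    if lines_written <= 0:
        return "none"
    for ceiling in (10, 100, 1_000, 10_000):
        if lines_written <= ceiling:
            return f"{ceiling:,}"
    return "a whole bunch"


def experience_box(language, years, lines_written):
    """Render the [language name, years, lines of code] hash as an HTML box."""
    return (
        '<div class="coding-experience">'
        f'<span class="language">{escape(language)}</span> '
        f'<span class="years">{int(years)}</span> '
        f'<span class="lines-of-code">{escape(loc_bucket(lines_written))}</span>'
        "</div>"
    )
```

Calling experience_box("PHP", 8, 25000) yields a div whose spans a crawler could pick apart and weigh however it likes, which is the "raw data, remix it yourself" idea in miniature.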

The format doesn't tell you if a snippet is correct, it just gives you some background information (assuming the author is honest in their self-reporting). The danger would be users trusting a snippet blindly because the author has 10 (bad) years of experience (a sort of "Appeal to authority") while better code from "newer" users goes ignored. That's a human nature problem and obviously you can't solve those with programming (/broad wink).

1. I'd rank these higher, especially official groups for languages and systems, except for two reasons:

  1. They tend to be ill-formatted, and the ability to follow threads varies wildly from site to site
  2. The advice can be good but dated: it's easy to find perfectly legitimate Python answers from 2000 or so. While the answer is fine, it's possible there's a newer idiom and in a language like Python, where there's "one right way", the right way will be the way that the language has been optimized to work.

2. Basically unrelated story that I've crammed in because I always tell it because it cracks me up: in high school, we had to go to the local public high to take the SATs. The person sitting next to me scribbled furiously throughout the test and was always the first one finished (which frustrated me to no end). When we were walking out, he turned to us and said, "Dude, I just made pretty pictures with the bubbles."

Django/ Pinax: Problems With Login() in Unit Tests

This is the first in what promises to be a number of "Stupid Django Tricks" where the "stupid" is me and not Django. I was having a good deal of trouble creating unit tests for authenticated views (i.e., pages that require a user to be logged in) for the Pinax project I've been working on. I dug up two problems, one of which is on Pinax and one that's entirely on me:

  1. Pinax's settings.py file does not provide a setting for AUTHENTICATION_BACKENDS, so the test client's login method doesn't know how to log your user in. Specify "AUTHENTICATION_BACKENDS = ('django.contrib.auth.backends.ModelBackend',)" in your settings file. Actually, I lied. That's the default value for the setting; having gone back and re-run my tests without it specified, everything works, which means the only idiot here is the guy who . . .
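  2. Don't create users by specifying the password directly in the declaration (e.g., user = User(username='Dummy', password='goodluck')). That stores the raw string in a field where Django expects a hash, so the test client's login can never match it. Use the set_password() User method to properly set the password.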
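If it helps to see why the second mistake bites, here is a toy illustration in plain Python. To be loud about it: this is not Django's hashing code, and make_password and check_password here are my own stand-ins that only mimic the idea of "store salt plus hash, compare hashes at login". The salt format and iteration count are arbitrary.

```python
import hashlib
import hmac
import os


def make_password(raw, salt=None):
    """Return 'salt$hash' for a raw password (a stand-in for what a proper setter stores)."""
    salt = salt or os.urandom(8).hex()
    digest = hashlib.pbkdf2_hmac("sha256", raw.encode(), salt.encode(), 10_000).hex()
    return f"{salt}${digest}"


def check_password(raw, stored):
    """Hash the login attempt with the stored salt and compare it to the stored hash."""
    salt, _, digest = stored.partition("$")
    attempt = hashlib.pbkdf2_hmac("sha256", raw.encode(), salt.encode(), 10_000).hex()
    return hmac.compare_digest(attempt, digest)


# What I did: the raw string went straight into the password field.
stored_wrong = "goodluck"
# What I should have done: store the hashed form.
stored_right = make_password("goodluck")
```

check_password("goodluck", stored_right) succeeds and check_password("goodluck", stored_wrong) fails, because the raw string is not a hash of anything. In real Django the fix is just user.set_password('goodluck') followed by user.save().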

I've run into a fair number of issues working in Django where Google wasn't helpful. I think 90% of those issues were because no one else was dumb enough to make such an obvious mistake. The other 10% were typos.

Expression Engine if Clauses

This is the kind of thing that's not worth a blog post except some day it might save one person hours of frustration. Expression Engine apparently doesn't like it when if statements either span multiple lines or when the trailing curly brace is pushed to a new line. I can't quite run down which it is, but it's not all that important: if your if clause isn't behaving as expected, make sure it's all on one line without any extraneous whitespace.

PHP Excel Exporter

A few times a year a client needs to export something from a database table to Excel. There's a simple hack to do it in most any language. There are actually a few, but having come up as a web developer, my preferred trick is to just build an HTML table and serve it as Excel by setting the MIME type header. Having done this dozens of times, I finally formalized this into a simple PHP class tonight to save myself some time and figured I might as well share it.
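The trick itself is language-agnostic, so here is a bare-bones sketch of it in Python rather than PHP. This is not the class I'm sharing; the function names and the default filename are invented for the example, and you would hand the headers and body to whatever web framework you happen to be using.

```python
from html import escape


def html_table(column_heads, rows):
    """Build a plain HTML table: one header row, then one row per record."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in column_heads)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def excel_response(column_heads, rows, filename="export.xls"):
    """Return (headers, body) that make a browser hand the table to Excel."""
    headers = {
        # The MIME type is what convinces the browser this is a spreadsheet.
        "Content-Type": "application/vnd.ms-excel",
        # Offer the body as a download with a spreadsheet-looking name.
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    return headers, html_table(column_heads, rows)
```

Excel opens the HTML table as if it were a worksheet. It is a hack, not a real .xls file, but for a quick client export that is usually all anyone wanted.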

The bad news: because I am lazy, it relies on an old data connection class I wrote years ago when I was even less bright than I am now. The thing's so ugly I posted it somewhere else because I am too ashamed to host it here. You can rip that out and use whatever you prefer by just changing the logic in _get_table() below. If you do choose to use my old data-class.php, be aware it expects 4 constants, DB_SERVER, DB_USER, DB_USER_PASS, DB_NAME to create a connection to the database.

Here's the exporter code itself. Update: I moved the code to snipplr because this WordPress plugin doesn't handle newline characters very well.

The simplest use is to instantiate an object, tell the exporter what you want to appear in the header row in the spreadsheet (by setting column_heads to an array of values) and then call export(), passing it the SQL query that gets the data. If the number of fields in your query doesn't match the number of heads in column_heads, the resulting HTML will be a mess. You will understand if the code assumes you never make such mistakes. Here's a code example:

    $e = new ExcelExporter();
    $e->column_heads = array("First Name", "Last Name");
    echo $e->export("SELECT first_name, last_name FROM table");

Quick notes:

  • Control the Excel filename in my example by setting $e->filename("something-else.xls")
  • Add a timestamp to every file (useful for making sure the filename is always unique) by setting $e->timestamp_file = true
  • When you're trying to implement this and it's not working and having to say yes to the popup and let the file open in Excel is driving you crazy, set $e->debug = true and it will skip the Excel headers, sending the output to the browser

The big gotcha that works well for me but might not for you: there's a hook in the code that passes every data column through _format_field(). In my current class, this looks for any field with "_date" in the column name, assumes that field is a Unix timestamp and transforms the value into an m/d/y date. If you live in the other 99% of the world where people format their dates un-Americanly, well, you can do that like this: $e->date_format("d/m/y") or whatever other crazy date/time format you like.
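Roughly, the hook behaves like the sketch below. It is a Python paraphrase, not the PHP from the class, and it makes one assumption the original does not: it pins the conversion to UTC so the output is predictable, where PHP's date() would use the server's timezone. The format strings are Python's strftime codes rather than PHP's.

```python
from datetime import datetime, timezone


def format_field(column_name, value, date_format="%m/%d/%y"):
    """Format Unix timestamps in "_date" columns; pass everything else through."""
    if "_date" in column_name and value is not None:
        # Assumption for this sketch: treat the timestamp as UTC.
        moment = datetime.fromtimestamp(int(value), tz=timezone.utc)
        return moment.strftime(date_format)
    return value
```

So a created_date column holding 0 comes out as 01/01/70, and passing "%d/%m/%y" flips it to the day-first order the rest of the planet prefers. Any column without "_date" in its name is returned untouched.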

If you think that behavior stinks, rip it out. Alternatively, you can modify it or subclass this code (like "ClientXyzExporter extends ExcelExporter" for every client who lives in Excel) and change _format_field() to do whatever you want in a one-off sort of way. This is not high art, it's just a faster way of making someone happy (if you can imagine the kind of person whose life is improved by additional spreadsheets).

What a Database Can Say About You

When I started at my previous job, we were small-time. Sites were ASP or (horrors) ColdFusion talking to an Access database, and even that was only for fancy clients who wanted a record of the Contact Us form submissions from their site. We got a little bigger and a little better and we started rolling out small tools to manage pieces of sites (typically press releases or job postings). Again, ASP talking to Access. Like Mike Mulligan and his steam shovel, the more we worked, the better we got and then, like Mike, we dug ourselves into trouble. We built a site for a client with enough web traffic that Access couldn't keep up (hard to imagine, I know).

Enter SQL Server. We tried to keep up the pretense this expensive piece of software was a one-time thing, but success here bred more work and attracted more clients who could justify the expense of a machine and a SQL Server web connector license or whatever the hell it's called. This left reusable bits of code all over, but we didn't have that many clients who needed their own server. We sucked it up and added a shared SQL Server for mid-sized clients so we could keep using the same code. At this point, some of the most inventive work being done in development was justifying why a new site wouldn't work with Access. It wasn't long before that second shared SQL Server came on line. Our codebase was driving our hardware and platform decisions.

This ossifies a company: if you can't take on jobs smaller than six figures because you can't hide the software licensing costs, you lose out on smaller jobs. That doesn't look like a problem to a company in this state because they've developed a mindset that says, "We only work on projects that are worthy of our time and platform" (at The Daily WTF, this would be called "enterprise-y"). There was no reason we couldn't have put MySQL or Postgres on Windows and gotten 90-100% of the same performance for free.  Small projects don't add a lot to cash flow, but they can be portfolio pieces, they can turn into bigger jobs, they can create connections that lead to bigger jobs. If they do none of those things, they do wonders for development teams. They're calisthenics. Clients with small budgets don't have small plans; they want everything the guy at the next table is having, they just don't want to pay his tab. "Do more with less" was the derisive slogan of the final season of The Wire; when it comes down from management, it is worthless. But when it's baked into a (well-managed) project, it can force developers to step back and figure out how they can recreate the code they've been cutting and pasting in a smaller space. That's the kind of challenge that not only keeps developers learning, but keeps them interested and in fighting trim.

Design Reminder

Dearest Self,

When designing and building a system, don't just treat the base-level objects as black boxes ("as dumb as they can be, but no dumber"); treat the modules they roll up into as black boxes as well. That way, when you get pulled off progress on one of the modules, you can just tie things off and leave them. If they don't support the few places they interact with others, they're not done yet. Also, documentation never hurts.

God, What Won't You Do?

I'm coming to (near?) the end of what can only charitably be called a "difficult" project. Unfortunately, I don't think I will be working with any of the parties involved in the future. For future readers and my present sanity, this feels like a good time to define what I don't do for a living; sometimes that negative space tells more about a thing than the thing itself.

I don't work without specifications (requirements). Real specs, not three pages of bullet points in Microsoft Word. You don't have to know it all when we start and it's part of my job to get those requirements right, but it is a bad idea to put hands to keyboard before I know exactly what I'm doing. Every development project changes in definition from when it starts. That's the beauty and pain of requirements. You sit down thinking it's stupid to have to tell me what to do and then we start talking about what to do and the conversation chases dozens of tangents. Ratholes appear. Better to find them up front and deal with them (or pour concrete down them) than find out "Later".

I don't put together the requirements without talking to you. I don't know your business, you do. Even if you just got hired last week and this is your first assignment, you still know better. You know who to ask (and who the office gossip says to avoid), you know where to go for answers outside your company and you know how to ask relevant questions that get helpful answers (compare and contrast with my: "So how do you guys ship stuff to people?").

I don't build the system for you. I build it for your customers, your users. It's rare that you're going to be a perfect example of your customer and even then, no. You can't serve two masters (the application vs. career success). This doesn't mean every project has to involve one-way glass, video cameras, white lab coats and a testing facility, but it does mean putting together some kind of test, even if it's just simple hand drawings of screens to show to random folks in the hallway to make sure we haven't missed something glaringly obvious. Oh, and about those paper prototypes: I don't work without some kind of screen mockup. It doesn't need to be the Sistine Chapel. It doesn't even need to look like the final screen. We just need some kind of reference so when I hand in my work, there's an agreed-upon set of things that should be there. This also saves me bugging you in the middle of the night, so invest in your sleep up front.

I don't know anyone that handles scope creep well, but let me define that in my terms: changes or additions to the requirements after they've been fully thrashed out and after work has begun. Even as an hourly contractor, I get skittish. "Hey, it's just more work for you." Sure. But if it affects things that have been finished and vetted, it's A Bad Idea. However, things come up and needs change. My rule of thumb is that if the change would alter how data is stored, then let's suck it up and do it now. Otherwise that means the work has to get done later and data migration will have to be done. Better to do one difficult thing now than one difficult thing and one risky thing later. If it doesn't change the data, we should talk about creating a "Things for the next version" list.

I don't work without a QA person or team. This doesn't have to be anyone with a degree and 100 years of experience in Quality Assurance. It just needs to be someone other than me who can run through the system as a user, spot things that don't match the requirements and tell me about them in a meaningful way, e.g., "Here's what I did, here's what should have happened, here's the big explosion I got instead" as opposed to "COMPUTER BAD!" I've never met a developer who could QA their own work, myself included. After spending weeks or months or years designing the flow of things, we have a bad habit of "testing" the system by using it in the exact fashion it was designed ("Click on this button, then give it a value between 1 and 10 but never a decimal and then wait five minutes") instead of beating on the thing until it can withstand all challenges.

After all that, surprisingly, there are some things I do do*. Email me at tclancy@gmail.com if you want me to do them for you.

* I know it comes out like "doo-doo" and I know that was a poor way to phrase the fact I know it, which was even more fun.

Getting Started on Rails, Again

This post exists as a note to myself so I can remember the pain. So far:

7:30am - My old copy of Agile Web Development with Rails may be more of a hindrance than a help. It's from the days of yore (Rails 1.2 or so) and I'd like to work with the current 2.0 version of Rails, but this means some of the book's command-line instructions have to be ignored and the "modern" equivalent found online. Turns out database tables get created in a "user-friendly" migration instead of the older convention of raw SQL files. I'll need to find the way to add indexes and other db tuning info to the migrations. It's all irrelevant if I can't figure out why my model doesn't show any fields when I try to create it in the admin. I either fouled something up using the old conventions or there's some miscommunication between Rails and MySQL (it groused about not having the mysql gem installed a while back, but I installed it-- of course, that barfed when trying to install its documentation, but it said it installed correctly and I don't see the log warnings anymore). I feel like Aptana is more of a hindrance than a help right now as I'm drowning in options and ways to investigate what's going on.

7:45am - Going back and reading the link above in a more linear fashion, it does clearly say "no dynamic scaffolding", which may be my problem, that I'm trying to rebuild the scaffolding over what I initially generated. All I see when I go to the scaffold page is a create button with no fields. Pressing the create button adds a record to the database, so it's not a db communication problem (whatever it is, it's the same problem described by James here with no solution provided). I just went back, deleted all my databases and all the files/folders I could find related to the one model I've been trying to create (User). And that . . . did not work. Awesome. I can see why people love this so. It really is like magic.

7:50am - Ok, so I'm a little dim. Apparently "no dynamic scaffolding" doesn't mean "There's a new way to do this". It means "Gone". Either I'm really stupid or the tutorial linked above dances over some bits that need doing. Moving on to this tutorial since it seems to address this problem. If nothing else, I learned that I could do the Model & Controller destruction via script instead of deleting the relevant files by hand. Really helpful when starting on a framework or language, which is usually a set of fits and starts. That worked, but it took two tries. The first time I did it without specifying the fields at the command line, e.g., "ruby script/generate scaffold User", then added my fields to the migration and ran the db:migrate task. Nothing doing. So I destroyed the model & controller, re-ran the generate with all my fields listed in the line and then restarted the server without bothering to run db:migrate (the new 001_create_users.rb matched the older one exactly). It would appear the key difference is that the second run offered to overwrite the user views. Not sure if they weren't there the first time around or if I did something different. Either way, it works. And I got about 3% of where I wanted to get in the first 2 hours.

8:10am - A discussion that explains things a little more clearly. It makes sense that the scaffolds are just a starting point, but the one thing that puts me off Rails every time I start up again is the attitude of the community. It's endemic. I tried to get help in IRC once and anyone with a question is treated like an ass for not being part of the club. I do this for a living, I just don't do it in your language and framework of choice. The community feels so closed-minded, but closed in a really interesting way, because I'm either getting too old or there's a stink to the Rails Borg that makes it an Insiders Club no one but misanthropes would actually want to be accepted into. I have a Powerbook and a copy of Textmate. Neither of them changed my life. If they were gone tomorrow, I could cope. My simplistic observation of how to "belong":

  1. Mac laptop
  2. Textmate
  3. Glasses
  4. Short haircut

Actually, I have all of that. Still not interested. I should point out your development machine needs to be a laptop so you can develop on the road at all the places no one is ever going to ask you to go. From that discussion: "dynamic scaffolding was a crutch that kept people from really getting Rails from the start". See? It's not about being able to get whatever you want done. It's about learning Ruby and Rails and falling deeply in love with them. Bad enough when developers lose sight of what an application is supposed to do for people and start gold-plating features because the app becomes an end goal in itself. This guy wants you to move another step of abstraction past that: you need to spend some time learning the framework upfront. He's not so much turning the value proposition of Rails upside-down as he's throwing it out. I'm guessing he has full-time employment with no real deadlines.
