When a client needs to manipulate data, it’s easiest to give them an Excel spreadsheet. Creating a spreadsheet is easy with Python, and making it available for web download is just as easy with the Django web framework. After a brief tangent into evolutionary history, you’ll find out how to do both.
Sharks haven’t evolved much in millions of years, mostly because they’re wildly efficient. They haven’t had to adapt, or change, because they’re really good at their job. How could you possibly improve on a shark? They’re perfectly terrifying fish eating machines.
Much like sharks, spreadsheets haven’t evolved much in the past few decades. Following the shark pattern, the reason for their near-stagnation is because they’re really good at their job. Take a bunch of data, present it in a table, allow non-programmers to sort and manipulate the numbers to achieve new views of the data. They’re perfectly terrifying data eating machines.
Given the efficiency of spreadsheets, sometimes it pays to acknowledge their power, and not waste your time trying to improve on it. A huge swath of web applications can be described as “taking a database and putting it on the web.” In an iterative development environment, the first few cycles are taken by “getting the data into the database.” Once that’s completed, product owners start asking for views of the data in the web app. This is where it gets sticky.
It’s rare to find a client that knows exactly how they want to manipulate the data you’re going to be presenting to them. At times, they might think they know what they want, but as soon as you complete the work, it’s most likely not really what they wanted.
This isn’t some flaw in a product owner’s mind; rather, it’s a direct result of their probably never having actually seen all the data in one place before. The fact is, once they start playing with this new source of information, patterns they hadn’t considered emerge.
Given the above, whenever possible, I like to give a product owner an Excel representation of their data. I usually allow them to download it through a web application. When they’ve played around with the data in Excel, they’ll have a better idea of how they use the data, and then know how they ultimately want to see the data presented. It saves me time, and leads to a better end product.
In many cases, a “download data as Excel” button is also a great way to make data portable. As the application matures away from needing the Excel view, the app will keep its Excel export function as a way for a user to take data offline. Any way you look at it, having the ability to download to Excel is an attractive feature that solves a lot of headaches.
Now that I’ve established that spreadsheets are great, how do you create them and allow them to be downloaded on the fly from a web application? I’ll start with creation, and then explain how to make them available for download.
The following example uses Django as the web application framework, but the general concept is portable to any toolkit you like.
Creating an Excel Spreadsheet with Python
Excel is (obviously) a proprietary format, but it tries its best to be open (sort of). It’s pretty easy to open a comma-separated file from Excel and treat it as if it’s a spreadsheet. At the same time, it’s pretty easy to create a comma-separated file from Python. Clearly, Python and Excel are made for each other, their common language being csv files.
The Python standard library comes with a nice csv writer class. This brief example shows its use:
import csv w = csv.writer(open('output.csv','w')) for i in range(10): w.writerow(range(10))
The contents of output.csv look like:
0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9
When the file is opened in Excel, you’re presented with a dialog that asks for the separator character. Tell it comma and you’re all set.
That said, the extra step of working with a csv file might not be attractive to an end user. Luckily, Roman Kiseliov has created an excellent library for writing binary Excel files with Python. The library is called pyExcelerator and is available from its project page on Sourceforge. The writer doesn’t require Windows, or Excel, which makes it even easier to run.
Similar to the previous csv example, you can write a 10 by 10 grid with pyExcelerator:
from pyExcelerator import * wb = Workbook() ws0 = wb.add_sheet('0') for x in range(10): for y in range(10): # writing to a specific x,y ws0.write(x,y,"this is cell %s, %s" % (x,y)) wb.save('output.xls')
The scripts are similar, but there are a few key differences. With pyExcelerator, you have to explicitly create a Workbook object and add a Worksheet object to it. When you write to the Worksheet, you write to a specific x,y location. This gives you much more flexibility, but also requires you to visualize the grid as you write to it.
Although pyExcelerator is trickier to use, it’s much more powerful. In addition to writing simple grids of data, you can format cells, insert Excel formulas, and generally take advantage of all the features you’d expect from Excel. With the csv writer, you’re limited to simple grids.
Downloading Excel from Django
Now that you can create Excel-compatible files with Python, how do you allow them to be downloaded with Django? It’s pretty straightforward, and shows us how to generate and provide just about any kind of binary file. If you can create it, your users can download it.
The secret is in the ‘content-type’ HTTP header. When your browser requests a file from a server, the transaction looks like this:
GET /wp-content/uploads/2007/10/cropped-dsc_0020.jpg HTTP/1.1 Host: weblog.lonelylion.com User-Agent:Mozilla/5.0 (X11; U; Linux i686; en-US; rv:18.104.22.168) Gecko/20071204 Ubuntu/7.10 (gutsy) Firefox/22.214.171.124 Accept: image/png,*/*;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: http://lonelylion.com/ HTTP/1.x 200 OK Date: Sat, 02 Feb 2008 17:53:58 GMT Server: Apache/1.3.37 (Unix) mod_throttle/3.1.2 DAV/1.0.3 mod_fastcgi/2.4.2 mod_gzip/126.96.36.199a PHP/4.4.8 mod_ssl/2.8.22 OpenSSL/0.9.7e Last-Modified: Thu, 01 Nov 2007 03:22:12 GMT Etag: "798f213-5c88-47294664" Accept-Ranges: bytes Content-Length: 23688 Keep-Alive: timeout=2, max=97 Connection: Keep-Alive Content-Type: image/jpeg
The part that you’re interested in is that last line, ‘Content-Type: image/jpeg’. That line tells the browser what the data that follows means. The browser then decides how to display the data, or what external application to open to display it.
The above example is a jpeg image, which is why the Content-Type is image/jpeg. If you change that header to ‘application/ms-excel’, the file will be opened with Excel. Changing header types in Django is very easy; here is an example of a Django view that sends the browser a csv file and tells the browser to open it in Excel:
import csv from StringIO import StringIO from django.http import HttpResponse def show_excel(request): # use a StringIO buffer rather than opening a file output = StringIO() w = csv.writer(output) for i in range(10): w.writerow(range(10)) # rewind the virtual file output.seek(0) return HttpResponse(output.read(), mimetype='application/ms-excel')
I made a few modifications to your original csv example; it’s using a file-like StringIO object rather than actually opening a file, and it’s wrapping the response in the Django HttpResponse object, which is the standard return value object type of a Django view. You’re also setting the Content-Type header to ‘application/ms-excel’ by passing HttpResponse the content_type option.
The above pattern lets you return just about any kind of binary data through the web browser. You can use libraries that generate PDFs, images, sound files, video, whatever…as long as you know the appropriate Content-Type header and can generate the binary data.
Now, apply your technique to outputting an actual Excel file generated by pyExcelerator:
from pyExcelerator import * from django.http import HttpResponse def show_excel(request): wb = Workbook() ws0 = wb.add_sheet('0') for x in range(10): for y in range(10): # writing to a specific x,y ws0.write(x,y,"this is cell %s, %s" % (x,y)) wb.save('output.xls') return HttpResponse(open('output.xls','r').read(), mimetype='application/ms-excel')
Again, it’s pretty similar to the previous example of creating an Excel file with pyExcelerator. There’s one bit that isn’t ideal, though—creating a temporary file to write the data to, then opening it and reading from it. This could cause problems on a high traffic site. The file could become corrupted as one user hits it while another is reading from it. You could attach time stamps to the file name to help with this, but really it would be ideal if you could pass the Workbook save method a file-like object to save to. It isn’t currently an option in pyExcelerator, but I’m sure a patch could be offered.
More often than not, I just send the browser a csv file with the Content-Type set to ‘application/ms-excel’. The use of StringIO makes for a clean implementation, and most end users don’t mind the bit of oddness because the amount of time it takes to build a downloadable spreadsheet is typically less than five minutes. It keeps users happy, and lets them play with their data very quickly. The more they play with it, the better able they’ll be to request HTML views of it that they’ll actually use. Everyone wins.