I’m already off to a rocky start with a blog title that contains three techie terms: .htaccess, PHP and PEAR (and that’s not a fruit; if it were it wouldn’t be in all caps) and PDFs. But those are the ingredients that, when combined with a little web-development ingenuity, solved a very real problem and saved a lot of real dollars. Not a tech geek? This post might not be for you. Unless you happen to want to stop printing and folding form letters, stuffing and addressing envelopes, and paying for postage, and have a web server at your disposal.
Background
Before the onslaught of techno-babble begins, a bit about the problem I was solving. Each quarter, the University of Washington (UW) notifies students whose academic performance earned them a spot on the Dean’s list. This notification, called a Dean’s letter, was sent by mail to qualifying students’ permanent addresses.
Due to the deep cuts to the UW’s budget, the Office of the University Registrar (in which I work) wanted to switch Dean’s letters to an e-mail process. Doing so would save the roughly fifty cents in material and postage cost for each of the 5,000 to 8,000 Dean’s letters sent each quarter—not to mention the happy trees that wouldn’t have to sacrifice themselves to carry the ink.
Seems simple enough: just e-mail the letters to students, right? Sure, except for:
- the capabilities of various e-mail clients in use by students;
- many students’ (and their parents’!) desire to keep or frame Dean’s letters as a sort of academic trophy; and
- FERPA, the Family Educational Rights and Privacy Act of 1974, that regulates student educational records. (Get all the details here or here, but in a sentence: you can’t display any part of a student’s educational record in a manner that would allow others to see it—and Dean’s letters are definitely a student record.)
First attempt: Access, Word, Acrobat and Exchange
Before I got involved, the good folks in our Data Management office first attempted the switch to e-mail using Microsoft tools: student data stored in Access (database), fed to Dean’s letter templates in Word (word processing) via its mail merge function, exported to PDF with Acrobat 9 (an Adobe product, not Microsoft’s), and passed to Outlook to be e-mailed via Exchange. Sound like a lot of software? It was. This approach was tried unsuccessfully for many weeks.
Let me state that I have no grudge against Microsoft tools, nor do I think a solution couldn’t be found using them (Adobe’s Acrobat 9, however, has an issue I did rant about). But after hearing often that messages were mysteriously choking the Exchange mail server, I offered to help. I don’t know a thing about Exchange, so rather than trying to fix it, I suggested a different approach. Once again, I don’t mean to imply these tools can’t accomplish this task, only that the difficulties experienced couldn’t be resolved given the time available by our IT staff.
Second attempt: PHP, PDFs, and e-mailing with PEAR
My approach was to move the solution to the UW’s unix-based servers, which run PHP, an open-source web programming language. I suggested we:
- Generate PDF files of the Dean’s letters as before (using Word’s mail merge and Acrobat) and upload them to the web server.
- Export a list of student data from Access into a comma-separated values (CSV) list.
- Write a PHP script to:
- read in student data from the CSV file (name, student number, e-mail address, etc.),
- rename the correct PDF file to the student’s unique student ID number,
- attach that file to an e-mail object using PHP’s PEAR code library, and
- e-mail the message with attachment to the student’s primary e-mail address.
When I started this project, it seemed pretty straightforward. “I’m just switching out PHP’s mail function for Exchange,” I thought, since that was were the problem lay. I didn’t realize the number of obstacles that would get in the way.
The first issue to overcome was the volume of messages and their (relatively) large size due to the PDF attachment for each. Summer quarter traditionally has fewer students, yet we still needed to e-mail roughly 2,500 messages. Obviously, sending that many messages, each about 80k in file size, couldn’t be done all at once. Doing so would raise some flags with the university’s technology folks who run the mail servers.
This problem was solved with PEAR’s Mail_Queue package. It not only provides code to store outgoing messages in a database (MySQL or others), from which they can be sent at a throttled rate, but it actually provides the SQL statement to set up the necessary fields in the database. Very nice! Once set up, it was a simple matter to queue each message as it was generated—rather than send them immediately. Then I created a cron job (a server tool to automatically perform processes at defined intervals) to run a simple script to release 25 of the queued messages every five minutes.
The next obstacle? Recipients e-mail clients. While I took all the precautions I could to ensure the PDF attachment was sent properly (including setting the proper MIME-type encoding and tests on all major e-mail clients: Gmail, Hotmail, Outlook 2003/2007, and AOL), some things were just beyond my control. As I fielded student complaints (which, thankfully, were only about 3% of the total recipients), it became clear that those who did not receive their attachment all used some version of the UW’s own Alpine software. Turns out there is an open bug in Alpine that “sometimes just doesn’t show an attachment.” (That’s all was told, and digging up more info on it seemed pointless.) And a handful of students received a PDF, but it was blank. Quite odd, but I did see a blank PDF one student forwarded me, and confirmed that what was sent to him did contain data—I’m chalking that one up to gremlins. For all these students, I re-sent them their letters individually, and all reported that they received them.
So, while this approach worked, it was not without its problems; problems that would only get worse as the volume of letters increased. Plus, there’s another (though minor) downside: not all students use their UW-provided e-mail address as their primary e-mail. Therefore, this process does send an educational record outside the university, which is less secure.
Third attempt: PHP, PDFs, and secure downloads with .htaccess
As I tried to reconcile the issues with e-mailing these Dean’s letters, a much better solution hit me. To paraphrase (and otherwise mangle) a metaphor: “if you can’t move the mountain to the student, we’ll bring the student to the mountain!” In other words, we’d send a notification message via e-mail containing a link to the PDF. Excellent, problem solved… almost. This approach had its own set of problems, the most glaring of which is security. To comply with FERPA, we have to ensure that the existence of a Dean’s letter—or more importantly, the lack of one—cannot be discovered by anyone other than its recipient. Put in practical terms, a URL structure like:
http://university.edu/registrar/deansletters/quarter/studentnumber.pdf
isn’t acceptable. Anyone with an iota of smarts and half a desire to snoop into their classmates’ educational prowess could type in their student number and (a) see their letter, if it exists, or (b) know that they didn’t make the list simply by the absence of that letter.
Password-protect the PDFs themselves? Might work, but PDF passwords aren’t perfectly secure, and how would we communicate the password with the recipient? If sent in the e-mail along with the link… well, that’s not much security at all, is it? And anything relatively obvious (student number, birth year, etc.) would be readily guessable by others, assuming they had also received one and knew the file’s password was their student number, birth year, etc. And this approach wouldn’t solve the problem of knowing whether another student made the Dean’s list merely by the existence of the PDF. No, password-protecting the files won’t work… we need something else.
The solution to this issue seemed clear: require the students to log in. Luckily, the University of Washington has for years now issued NetIDs to all students. Each unique UW NetID serves as both the student, staff, or faculty member’s UW-provided e-mail address (when appended with @u.washington.edu) and their authentication token for all university systems. Since it grants access to personal records, course scheduling, e-mail, and so much else, students are good about keeping their NetID passwords to themselves. Yes, NetIDs were a perfect solution, but how to implement it such that only the intended recipient can log in and see their letter?
Restricting access to a user (or group of users) is easy with UW NetIDs and .htaccess files. An .htaccess file (found on Unix-based systems like the UW’s) is a set of directives that control access to a directory, a file, or even set of files that meet certain criteria. It’s a sophisticated yet simple system that would do the trick: restrict access to a file to specific UW NetID. The only remaining issue: setting up .htaccess-based controls for the files. Remember, we’re dealing with many thousands of Dean’s letters each quarter, far too many to create manually.
Once again, PHP comes to the rescue. My script already opened and read in a CSV data file containing the student recipients’ information, and one of the fields in each of those records was the student’s UW NetID. I realized I had everything I needed to solve the puzzle! As the script looped through each record in the data file to create and queue up an e-mail message to the student, I could open and write another line to an existing .htaccess file, like this:
$handle = fopen( $dir . "/.htaccess", "a+");
$htaccessCode = "" .
"require user " . $studentNetID .
""
$writeSuccess = fwrite( $handle, $htaccessCode );
For security’s sake, I won’t explain what each variable is (they’re designated by the $) but you can see that for each student, a new FilesMatch directive is added to the directory’s .htaccess file. It specifies a unique identifier (that’s the name of the PDF file) and that access to that file is limited to only one user: the recipient student, as identified by their UW NetID.
Sure, this setup results in a single .htaccess file consisting of thousands of lines, but it’s still only a few hundred kilobytes in size and it causes no noticeable performance hit. And it allows me to generate a unique URL for each student (as shown above) that, when clicked, requests the specific PDF file from the server. Thanks to the .htaccess file’s entry for that file, the user is prompted for their UW NetID and (presumably quite secret) password.
It’s a great solution. First, it saves us having to send thousands of ~80kb e-mails with error-prone attachments. Second, it keeps all student educational records secure on UW servers—no sent to Hotmail or other third-party e-mail systems. Third, access is limited to just the intended recipient and protected by an existing, secure password. But there’s one thing this solution doesn’t yet do: close the “file existence” loophole.
Suppose a mischievous recipient decided to check whether his friend got a letter by entering in her student number in place of his own in the URL. If he was greeted with a UW NetID log-in prompt, he could presume she did receive a letter—he just couldn’t view it. If he saw the regular 404 “file not found” error, he’d know she didn’t make the Dean’s list because no letter existed for her. Yes, he’d have to know her student number but they are not all that secret. And yes, it’s not the worst security breach, but its still a breach. Closing this security loophole wasn’t hard: just another entry in the .htaccess file, like this:
require user xxx
This entry relies on the web site’s handling of 404 errors, which is to direct the user to a specific page whenever a requested file is not found. By requiring a specific user (here shown as xxx, though in reality a valid staff person’s UW NetID is used), even when a non-existent Dean’s letter is requested—actually, any non-existent file, like blahbjhfadf.html, etc.—the user is still prompted for a UW NetID. It’s a slick way to close the “file existence” loophole: it’s impossible to determine whether a given student number received a Dean’s letter because every file request within that directory prompts for a log in.
Summary
This was a long post, but the process of analyzing a set of business needs, trying different approaches to meet them, and outlining the best tools (PHP functions, .htaccess files, and some creative thinking) is worth it. The resulting website resolves the issues and allows the UW to inform students of their Dean’s list status in a secure manner while saving many thousands of dollars in printing and postage cost. I’d call that a successful project!
Who’s saying what