Tag Archives: PHP

Saving time with PHP and date math

18 Jan

You may have cringed when reading the term “date math,” which sounds both cumbersome and difficult. It brings to mind the addition and subtraction of tricky numbers like 7, 29, and 31, and concepts like “two weeks from now,” “yesterday,” and “next month.” And if you’ve ever tried to build a system that tracks relative dates, you know such things can be a hassle.

So why bother? Because date math is very useful! For example, suppose you manage a website that includes due dates, submission deadlines, and priority registration periods that recur. If you’re me, you manage a site full of such dates, recurring every academic quarter. Keeping these dates in sync is exactly the kind of task PHP’s date math functions were written to do.
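This excerpt stops before the post’s own examples, but as a small illustration of the kind of relative date math described above, here is a sketch using PHP’s built-in DateTimeImmutable and DateInterval classes. The quarter start date is a made-up example, not a real UW date.

```php
<?php
// Illustrative only: the quarter start date below is hypothetical.
date_default_timezone_set('UTC');

$quarterStart = new DateTimeImmutable('2010-01-04');

// "Two weeks from now": add a two-week interval (P2W = period of 2 weeks)
$deadline = $quarterStart->add(new DateInterval('P2W'));

// "Yesterday" relative to the deadline, using a relative-format string
$reminder = $deadline->modify('-1 day');

// "Next month": 'first day of next month' sidesteps month-length surprises
$nextMonth = $quarterStart->modify('first day of next month');

// A naive '+1 month' from January 31 overflows past February's 28 days
$overflow = (new DateTimeImmutable('2010-01-31'))->modify('+1 month');

echo $deadline->format('Y-m-d'), "\n";  // 2010-01-18
echo $reminder->format('Y-m-d'), "\n";  // 2010-01-17
echo $nextMonth->format('Y-m-d'), "\n"; // 2010-02-01
echo $overflow->format('Y-m-d'), "\n";  // 2010-03-03
```

Immutable objects are used so every calculation leaves $quarterStart unchanged; with the mutable DateTime class, modify() would change the original object in place.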


How .htaccess, PHP & PEAR, and PDFs saved the day (and thousands of dollars)

18 Dec


I’m already off to a rocky start with a blog title that contains three techie terms: .htaccess, PHP and PEAR (and that’s not a fruit; if it were, it wouldn’t be in all caps), and PDFs. But those are the ingredients that, combined with a little web-development ingenuity, solved a very real problem and saved a lot of real dollars. Not a tech geek? This post might not be for you, unless you happen to want to stop printing and folding form letters, stuffing and addressing envelopes, and paying for postage, and you have a web server at your disposal.

Background

Before the onslaught of techno-babble begins, a bit about the problem I was solving. Each quarter, the University of Washington (UW) notifies students whose academic performance earned them a spot on the Dean’s list. This notification, called a Dean’s letter, was sent by mail to qualifying students’ permanent addresses.

Because of deep cuts to the UW’s budget, the Office of the University Registrar (in which I work) wanted to switch Dean’s letters to an e-mail process. Doing so would save roughly fifty cents in material and postage for each of the 5,000 to 8,000 Dean’s letters sent each quarter, or about $2,500 to $4,000 per quarter, not to mention the happy trees that wouldn’t have to sacrifice themselves to carry the ink.

Seems simple enough: just e-mail the letters to students, right? Sure, except for:

  • the capabilities of the various e-mail clients students use;
  • many students’ (and their parents’!) desire to keep or frame Dean’s letters as a sort of academic trophy; and
  • FERPA, the Family Educational Rights and Privacy Act of 1974, which regulates student educational records. (In a sentence: you can’t display any part of a student’s educational record in a manner that would allow others to see it, and Dean’s letters are definitely a student record.)

First attempt: Access, Word, Acrobat and Exchange

Before I got involved, the good folks in our Data Management office attempted the switch to e-mail using Microsoft tools: student data stored in Access (database), fed to Dean’s letter templates in Word (word processing) via its mail merge function, exported to PDF with Acrobat 9 (an Adobe product, not Microsoft’s), and passed to Outlook to be e-mailed via Exchange. Sound like a lot of software? It was. This approach was tried unsuccessfully for many weeks.

Let me state that I have no grudge against Microsoft tools, nor do I think a solution couldn’t be found using them (Adobe’s Acrobat 9, however, has an issue I did rant about). But after hearing repeatedly that messages were mysteriously choking the Exchange mail server, I offered to help. I don’t know a thing about Exchange, so rather than trying to fix it, I suggested a different approach. Again, I don’t mean to imply these tools can’t accomplish this task, only that the difficulties couldn’t be resolved in the time our IT staff had available.

Second attempt: PHP, PDFs, and e-mailing with PEAR

My approach was to move the solution to the UW’s Unix-based servers, which run PHP, an open-source web programming language. I suggested we:

  1. Generate PDF files of the Dean’s letters as before (using Word’s mail merge and Acrobat) and upload them to the web server.
  2. Export a list of student data from Access into a comma-separated values (CSV) file.
  3. Write a PHP script to:
    • read in student data from the CSV file (name, student number, e-mail address, etc.),
    • rename the correct PDF file to the student’s unique student ID number,
    • attach that file to an e-mail object using PHP’s PEAR code library, and
    • e-mail the message with attachment to the student’s primary e-mail address.
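The reading and renaming steps in item 3 might look something like the sketch below. The post doesn’t give the CSV layout, so the column names (including a pdf_file column mapping each row to its mail-merged PDF) and the function names are hypothetical.

```php
<?php
// Hypothetical CSV layout: name,student_number,email,netid,pdf_file
function readStudents(string $csvPath): array
{
    $students = [];
    $handle = fopen($csvPath, 'r');
    // First row holds the column names; all fgetcsv() options passed explicitly
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $students[] = array_combine($header, $row);
    }
    fclose($handle);
    return $students;
}

// Rename each merged PDF to <student_number>.pdf; returns the new paths
function renameLetters(array $students, string $pdfDir): array
{
    $renamed = [];
    foreach ($students as $s) {
        $from = $pdfDir . '/' . $s['pdf_file'];
        $to   = $pdfDir . '/' . $s['student_number'] . '.pdf';
        if (is_file($from) && rename($from, $to)) {
            $renamed[] = $to;
        }
    }
    return $renamed;
}

// Demo with throwaway files in a temporary directory
$dir = sys_get_temp_dir() . '/deans_letters_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents("$dir/students.csv",
    "name,student_number,email,netid,pdf_file\n" .
    "Jane Doe,1234567,jdoe@example.edu,jdoe,letter_001.pdf\n");
file_put_contents("$dir/letter_001.pdf", '%PDF-1.4 placeholder');

$students = readStudents("$dir/students.csv");
$renamed  = renameLetters($students, $dir);
```

Each student record then carries everything the later steps need (e-mail address, NetID, and the renamed PDF path).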

When I started this project, it seemed pretty straightforward. “I’m just replacing Exchange with PHP’s mail function,” I thought, since that was where the problem lay. I didn’t realize how many obstacles would get in the way.

The first issue to overcome was the volume of messages and their (relatively) large size, since each carried a PDF attachment. Summer quarter traditionally has fewer students, yet we still needed to e-mail roughly 2,500 messages. Sending that many messages at about 80 KB each (roughly 200 MB in total) couldn’t be done all at once; doing so would raise some flags with the university’s technology folks who run the mail servers.

This problem was solved with PEAR’s Mail_Queue package. It not only provides code to store outgoing messages in a database (MySQL or others), from which they can be sent at a throttled rate, but it actually provides the SQL statement to create the necessary table and fields in the database. Very nice! Once that was set up, it was a simple matter to queue each message as it was generated rather than send it immediately. Then I created a cron job (a server tool that automatically runs processes at defined intervals) to run a simple script that released 25 of the queued messages every five minutes. At that rate, 2,500 messages take 100 runs, or a little over eight hours, to send.
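Mail_Queue handles the storage and batching itself; the sketch below only illustrates the throttling idea, with a plain PHP array standing in for the database table and a callback standing in for the actual send. The sendBatch() name and the cron path are hypothetical, and this is not Mail_Queue’s API.

```php
<?php
// Illustration of throttled sending. A real setup would read from the
// Mail_Queue database table; here an array stands in for it.
// A crontab line like this (hypothetical path) runs the batch script
// every five minutes:
//   */5 * * * * php /path/to/send_batch.php

// Send at most $limit messages from the front of $queue. Messages whose
// send fails go to the back of the queue to be retried on a later run.
function sendBatch(array &$queue, int $limit, callable $send): int
{
    $batch = array_splice($queue, 0, $limit); // removes them from $queue
    $sent = 0;
    foreach ($batch as $message) {
        if ($send($message)) {
            $sent++;
        } else {
            $queue[] = $message;
        }
    }
    return $sent;
}

// Simulate 2,500 queued letters released 25 at a time
$queue = range(1, 2500);
$runs = 0;
while ($queue) {
    // Pretend every send succeeds
    sendBatch($queue, 25, function ($message) { return true; });
    $runs++;
}
echo "$runs runs, about ", $runs * 5, " minutes\n"; // 100 runs, about 500 minutes
```

Keeping each run small means a single failed or slow batch never floods the mail server, and anything that fails simply waits for the next scheduled run.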

The next obstacle? Recipients’ e-mail clients. While I took every precaution I could to ensure the PDF attachment was sent properly (including setting the proper MIME type and encoding, and testing on all the major e-mail clients: Gmail, Hotmail, Outlook 2003/2007, and AOL), some things were simply beyond my control. As I fielded student complaints (which, thankfully, came from only about 3% of recipients), it became clear that those who did not receive their attachment all used some version of the UW’s own Alpine software. It turns out there is an open bug in Alpine that “sometimes just doesn’t show an attachment.” (That’s all I was told, and digging up more information seemed pointless.) A handful of other students received a PDF, but it was blank. Quite odd: I did see a blank PDF that one student forwarded to me, and I confirmed that the file sent to him did contain data. I’m chalking that one up to gremlins. I re-sent all of these students their letters individually, and all reported receiving them.
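For reference, a properly typed PDF attachment means a multipart/mixed message whose attachment part is labeled application/pdf and base64-encoded. The post used PEAR for this; the sketch below instead builds the message with core PHP functions only, so it is not PEAR’s API, and buildPdfMessage() is a hypothetical name.

```php
<?php
// Build headers and body for a message with one PDF attachment, using core
// PHP only. (Hypothetical helper; PEAR's Mail_Mime package does this for you.)
function buildPdfMessage(string $bodyText, string $pdfBytes, string $fileName): array
{
    // A random boundary string separates the MIME parts
    $boundary = 'part_' . bin2hex(random_bytes(12));

    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/mixed; boundary=\"$boundary\"";

    $body = "--$boundary\r\n"
          . "Content-Type: text/plain; charset=UTF-8\r\n"
          . "Content-Transfer-Encoding: quoted-printable\r\n\r\n"
          . quoted_printable_encode($bodyText) . "\r\n"
          . "--$boundary\r\n"
          . "Content-Type: application/pdf; name=\"$fileName\"\r\n"
          . "Content-Transfer-Encoding: base64\r\n"
          . "Content-Disposition: attachment; filename=\"$fileName\"\r\n\r\n"
          // chunk_split() wraps the base64 text at 76 characters per line
          . chunk_split(base64_encode($pdfBytes))
          . "--$boundary--\r\n";

    return [$headers, $body];
}

[$headers, $body] = buildPdfMessage('Congratulations!', '%PDF-1.4 sample', '1234567.pdf');
// mail($to, $subject, $body, $headers) could then send it
```

Even a correctly built message like this can still be mishandled by individual clients, which is the problem the students above ran into.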

So, while this approach worked, it was not without problems, and those problems would only get worse as the volume of letters increased. Plus, there’s another (though minor) downside: not all students use their UW-provided e-mail address as their primary e-mail, so this process sends some educational records outside the university, which is less secure.

Third attempt: PHP, PDFs, and secure downloads with .htaccess

As I tried to reconcile the issues with e-mailing these Dean’s letters, a much better solution hit me. To paraphrase (and otherwise mangle) a metaphor: “if you can’t move the mountain to the student, we’ll bring the student to the mountain!” In other words, we’d send a notification via e-mail containing a link to the PDF. Excellent, problem solved… almost. This approach had its own set of problems, the most glaring of which was security. To comply with FERPA, we had to ensure that the existence of a Dean’s letter (or, more importantly, the lack of one) could not be discovered by anyone other than its recipient. Put in practical terms, a URL structure like:

http://university.edu/registrar/deansletters/quarter/studentnumber.pdf

isn’t acceptable. Anyone with an iota of smarts and half a desire to snoop into their classmates’ educational prowess could type in a classmate’s student number and (a) see that classmate’s letter, if it exists, or (b) know that the classmate didn’t make the list simply from the absence of a letter.

Password-protect the PDFs themselves? That might work, but PDF passwords aren’t perfectly secure, and how would we communicate the password to the recipient? If it were sent in the e-mail along with the link… well, that’s not much security at all, is it? And anything relatively obvious (student number, birth year, etc.) would be readily guessable by others who had received a letter themselves and knew its password was their own student number, birth year, etc. Nor would this approach solve the problem of revealing whether another student made the Dean’s list merely through the existence of the PDF. No, password-protecting the files won’t work… we need something else.

The solution to this issue seemed clear: require the students to log in. Luckily, the University of Washington has for years issued NetIDs to all students. Each unique UW NetID serves as both the student’s, staff member’s, or faculty member’s UW-provided e-mail address (when appended with @u.washington.edu) and their authentication token for all university systems. Since it grants access to personal records, course scheduling, e-mail, and much else, students are good about keeping their NetID passwords to themselves. Yes, NetIDs were a perfect solution, but how could we implement them so that only the intended recipient can log in and see their letter?

Restricting access to a user (or group of users) is easy with UW NetIDs and .htaccess files. An .htaccess file (a per-directory configuration file read by Apache-style web servers like the UW’s) is a set of directives that control access to a directory, a file, or even a set of files that meet certain criteria. It’s a sophisticated yet simple system that would do the trick: restrict access to each file to a specific UW NetID. The only remaining issue was setting up .htaccess-based controls for the files. Remember, we’re dealing with many thousands of Dean’s letters each quarter, far too many to configure by hand.

Once again, PHP comes to the rescue. My script already opened and read in a CSV data file containing the student recipients’ information, and one of the fields in each of those records was the student’s UW NetID. I realized I had everything I needed to solve the puzzle! As the script looped through each record in the data file to create and queue up an e-mail message to the student, it could also append another entry to an existing .htaccess file, like this:

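// Append a FilesMatch block for this student to the directory's .htaccess.
// ($pdfFileName is a placeholder; the original variable name isn't shown.)
$handle = fopen( $dir . "/.htaccess", "a+" );
$htaccessCode = "<FilesMatch \"^" . preg_quote( $pdfFileName ) . "$\">\n" .
  "  require user " . $studentNetID . "\n" .
  "</FilesMatch>\n";
$writeSuccess = fwrite( $handle, $htaccessCode );
fclose( $handle );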

For security’s sake, I won’t explain what each variable is (they’re the names prefixed with $), but you can see that for each student, a new FilesMatch directive is added to the directory’s .htaccess file. It specifies a unique identifier (the name of the PDF file) and limits access to that file to only one user: the recipient student, as identified by their UW NetID.
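A standalone version of that idea might look like the sketch below. The function names and the NetID format check are assumptions (the post doesn’t describe NetID syntax); the check is there so a malformed value can’t inject extra lines into the server configuration. Collecting all entries and appending them in one locked write is a design choice for the sketch, not necessarily what the original script did.

```php
<?php
// Build one FilesMatch block restricting a single file to a single user.
// Assumed NetID format: lowercase letters and digits only.
function htaccessEntry(string $pdfFileName, string $netId): string
{
    if (!preg_match('/^[a-z0-9]+$/', $netId)) {
        throw new InvalidArgumentException("Unexpected NetID: $netId");
    }
    // FilesMatch takes a regex, so escape the file name and anchor it
    $pattern = '^' . preg_quote($pdfFileName) . '$';
    return "<FilesMatch \"$pattern\">\n"
         . "  require user $netId\n"
         . "</FilesMatch>\n";
}

// $students maps PDF file name => NetID (hypothetical structure)
function appendRules(string $htaccessPath, array $students): void
{
    $rules = '';
    foreach ($students as $pdfFileName => $netId) {
        $rules .= htaccessEntry((string) $pdfFileName, $netId);
    }
    // One locked append instead of thousands of separate writes
    file_put_contents($htaccessPath, $rules, FILE_APPEND | LOCK_EX);
}
```

Usage: appendRules($dir . '/.htaccess', ['1234567.pdf' => 'jdoe']) appends a block that only the user jdoe can pass.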

Sure, this setup results in a single .htaccess file consisting of thousands of lines, but it’s still only a few hundred kilobytes in size and causes no noticeable performance hit. And it allows me to generate a unique URL for each student (following the structure shown above) that, when clicked, requests that student’s PDF file from the server. Thanks to the .htaccess entry for that file, the user is prompted for their UW NetID and (presumably quite secret) password.

It’s a great solution. First, it saves us from sending thousands of ~80 KB e-mails with error-prone attachments. Second, it keeps all student educational records secure on UW servers; none are sent to Hotmail or other third-party e-mail systems. Third, access is limited to just the intended recipient and protected by an existing, secure password. But there’s one thing this solution doesn’t yet do: close the “file existence” loophole.

Suppose a mischievous recipient decided to check whether his friend got a letter by entering her student number in place of his own in the URL. If he was greeted with a UW NetID login prompt, he could presume she did receive a letter; he just couldn’t view it. If he saw the regular 404 “file not found” error, he’d know she didn’t make the Dean’s list, because no letter existed for her. Yes, he’d have to know her student number, but student numbers are not all that secret. And yes, it’s not the worst security breach, but it’s still a breach. Closing this loophole wasn’t hard: just another entry in the .htaccess file, like this:


  require user xxx

This entry relies on the website’s handling of 404 errors, which is to direct the user to a specific page whenever a requested file is not found. By requiring a specific user (shown here as xxx, though in reality a valid staff person’s UW NetID is used), the user is prompted for a UW NetID even when a non-existent Dean’s letter is requested (actually, any non-existent file, like blahbjhfadf.html). It’s a slick way to close the “file existence” loophole: it’s impossible to determine whether a given student number received a Dean’s letter, because every file request within that directory prompts for a login.

Summary

This was a long post, but walking through the process of analyzing a set of business needs, trying different approaches to meet them, and settling on the best tools (PHP functions, .htaccess files, and some creative thinking) is worth it. The resulting website resolves the issues and allows the UW to inform students of their Dean’s list status securely while saving many thousands of dollars in printing and postage costs. I’d call that a successful project!

Tricking an HTML form to POST with unselected radio inputs

10 Dec


This web-development situation confused me: why wasn’t my simple form, consisting of a single set of radio buttons, throwing the expected error when submitted without a selection? Spoiler alert: HTML forms don’t include values for unselected radio buttons (or, it turns out, checkboxes).

The problem

I stumbled across this issue while building a larger web application, when I created the first step in the process using my standard form setup. PHP is my preferred development environment, and I typically reuse known-good code when starting a new project. For forms, I start with a form that submits to itself via POST:

<form action="<?= htmlspecialchars( $_SERVER['PHP_SELF'] ) ?>" method="post">
<!-- form stuff -->
</form>

and then process the input with PHP before the rest of the page content, like this:

$variableName = (array_key_exists("inputName", $_POST)) ? $_POST["inputName"] : "";

Thus the input’s value (or an empty string, “”, if the input was not set) is stored in $variableName. I can then perform server-side validation, such as making sure required fields are filled in and e-mail addresses are valid, and display error messages back to the user, in PHP code that runs only when the form has been submitted. I detect a submission by testing the $_POST array, which PHP fills with the posted values and which is empty (and therefore false) when nothing was posted:

if( $_POST) {
// check the user's inputs, create error messages, process form, etc.
}
// other PHP and HTML displayed the first time, before any form submission

This approach has always worked, so I was confused when the simple form shown here failed to display the “Please choose an option” message I expected when it was submitted without a choice. After a lot of trial and error, I discovered why: because neither radio button was selected, and they were the only inputs in the form, the browser sent no form data at all. PHP’s $_POST array was therefore empty, which evaluates as false. My if statement failed, none of the user messages were generated, and the form appeared to do nothing. Technically, that’s not horrible; the user simply has to realize they didn’t click anything and resubmit. But from a user-experience standpoint, it’s a big fail.
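The root cause is easy to reproduce without a browser. In the sketch below, parse_str() stands in for PHP’s decoding of a URL-encoded POST body (it parses the same name=value format); the field name inputName matches the snippets above.

```php
<?php
// Body sent when no radio button is checked and the form has no other named inputs
parse_str('', $emptyPost);
// Body sent when the user picks a radio button
parse_str('inputName=yes', $chosenPost);

var_dump((bool) $emptyPost);  // bool(false): an if( $_POST ) block is skipped
var_dump((bool) $chosenPost); // bool(true)
```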

The solution

There are (at least) two ways to solve this:

  • Hidden input: Adding a hidden input with some arbitrary value guarantees that $_POST is never empty on submission. Giving the hidden input the same name attribute as the radio buttons also ensures that the variable assignment above always receives a value. You can then check whether the variable holds the hidden input’s value; if it does, the user didn’t select a radio button. (Put the hidden input before the radio buttons in the markup: when two inputs share a name, PHP keeps the last value submitted, so a selected radio button overrides the hidden default.)
  • Checking for the specific name: Rather than relying on $_POST being non-empty, PHP’s isset() can determine whether a value for the radio buttons was submitted:
    if( isset( $_POST["inputName"] ) ) {
    	// a radio button choice was made
    }
    

What I don’t like about the second approach is that it’s not as clean as the hidden input, because it requires an extra isset() check. That check is hardly a performance hit (isset() is a language construct, not even a true function call), but why add complexity if you can avoid it? And checking for a specific element in the array lacks the elegance of just using if( $_POST ). Checking for the specific name in $_POST does work for server-side validation of those radio buttons, though. But even then I prefer the first approach, because collecting the variables as described above works (the hidden input provides the value) even when no radio button was selected. Slick.
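Here is a sketch of both approaches side by side, again using parse_str() to mimic how PHP decodes the POST body. The form markup in the comments, the hidden value none, and the readChoice() helper are illustrative, not taken from the original form.

```php
<?php
// Hypothetical form: the hidden input comes first so a checked radio
// button (sent later in the request body) overwrites its value.
//   <input type="hidden" name="inputName" value="none">
//   <input type="radio" name="inputName" value="yes"> Yes
//   <input type="radio" name="inputName" value="no"> No

// Approach 1: the same assignment idiom as above; the hidden input
// guarantees a value is always posted
function readChoice(array $post): string
{
    return (array_key_exists("inputName", $post)) ? $post["inputName"] : "";
}

parse_str('inputName=none', $noChoice);             // radio left unselected
parse_str('inputName=none&inputName=yes', $choice); // "yes" selected; last value wins

var_dump((bool) $noChoice);      // bool(true): the if( $_POST ) test now runs
var_dump(readChoice($noChoice)); // string(4) "none"
var_dump(readChoice($choice));   // string(3) "yes"

// Approach 2: without the hidden input, test for the key itself
parse_str('', $bare);
var_dump(isset($bare["inputName"])); // bool(false)
```

If readChoice() returns "none", the form can show its “Please choose an option” message.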