Chris Cohen's blog: 2009

Tuesday, 30 June 2009

Drupal for Publishers, London, 30th June 2009

I've just returned from the Drupal for Publishers event held at Sun in London. 100 attendees, including a good mix of Drupalites and potential Drupal users, were presented with a series of talks on a range of issues relating to using Drupal to create websites (and other solutions) for newspapers and magazines.

Sun were great hosts and despite the event being free to attend, a lavish array of sandwiches, fruit and cake was laid on, washed down with a selection of tea, coffee and fruit juice. The venue was great too, with a very large, clear projector, and a nice cool room in spite of the scorching weather in the City.

For me, the most useful talk of the day was Stewart Robinson's, on some of the development and management techniques used by the team redeveloping the Economist in Drupal. It was good to discover that even on large-scale projects the usual Drupal problems rear their ugly heads (so much configuration, such as views, lives in the database that testing changes before making them live can be problematic), and it was good to hear about some of the proposed solutions, such as putting everything in code (views, content types, CCK field definitions, etc.), which is then kept under version control, and using a contributed module to generate the database structure or records from the code.

For a lot of smaller Drupal developments, corners are cut to save time or money, but the Economist's approach obviously has to be extremely comprehensive, including full unit testing and browser testing. These terms are often thrown about in the abstract, but Stewart presented some very tangible ways of implementing them (SimpleTest and Selenium, in this case). These are valuable insights of the kind usually found only by scouring the personal blogs of Drupal developers.

The other talks were entertaining but most were of less value to me personally, serving as an introduction to Drupal and what its capabilities are in the online publishing world. No doubt a large proportion of the audience would have found this relevant, and I think it was good to provide a little bit that everyone could take away with them.

The penultimate talk was on the development of a solution for IPC Media. Although it was a Drupal solution, it showcased how, in this case, Drupal was being used for its administration and data entry interface, completely ignoring the front-end side of it. Although it was interesting to learn that this kind of thing can be achieved, the audience was then shown custom modules that could provide a graphical image uploader, selector and cropper, but told that these modules were not available to the wider community.

Unfortunately, I found this was like having to watch a man guzzle a nice cold beer after I've just crossed a desert on foot. These are certainly modules that would be very useful additions to the Drupal community, and it seems odd to bring to light the existence of such modules at an open source event and then snatch them away again behind lock and key.

Overall, I'd like to thank and commend Sun for hosting such a useful event and the individual speakers and organisers for their hard work. I think smaller events like this, as well as the twice-yearly DrupalCon, are key to the continuity and expansion of the Drupal community.

Wednesday, 10 June 2009

Drupal 5: Automatically assign a role on user profile edit

Using Drupal 5, I recently had cause to create a system whereby when a user updates checkboxes in his or her profile, roles would automatically be assigned or unassigned. No problem, I thought, I would just use hook_user() to achieve this. According to the API, I would need to handle two values of $op: insert and update.

Writing the one for update was easy. The $account contained all the user's profile fields (the ones starting with profile_) and the roles could be assigned based on these (1 for add role, 0 for delete role).

I ran into a problem with insert. The $account contained the keys for the profile fields, but the values were all blank! I still don't really know why this is. The solution, as cumbersome as it might be, is to wait until the new user has a uid, then call user_load() on the user, at which point the profile fields will have their proper values. Then, exactly the same method can be used as in the update case.

As a footnote, we don't actively develop in Drupal 5 any more; all of our development occurs in Drupal 6, but we still support Drupal 5 sites. Here is the finished code in case anyone finds it useful:


/**
 * Implementation of hook_user().
 */
function mymodule_autorole_user($op, &$edit, &$account, $category = NULL) {
  // On insert the profile values in $account are blank, so both cases pass
  // only the uid and let the helper reload the user with user_load().
  if ($op == 'insert' || $op == 'update') {
    mymodule_autorole_apply_roles($account->uid);
  }
}

/**
 * Loads a user by uid and updates the user's roles from the profile fields.
 *
 * @param $uid
 *   The uid of the user whose roles should be updated.
 */
function mymodule_autorole_apply_roles($uid) {
  $account = user_load(array('uid' => $uid));
  
  // Filter out the profile fields from the account information.
  $profile_fields = array();
  
  foreach ($account as $fieldname => $field) {
    // Split up the field name by the underscore character. The field names we
    // are looking for are named like profile_something, but they could be
    // profile_something_something, so join back together after the split.
    $pieces = explode('_', $fieldname);

    if (array_shift($pieces) == 'profile') {
      // Key each value by the role it controls: the profile field
      // profile_something maps to a role named "something club member".
      $profile_fields[implode('_', $pieces) . ' club member'] = $field;
    }
  }
  
  // $account was already loaded above, so there is no need to load it again.
  $roles = user_roles();

  foreach ($profile_fields as $field => $value) {
    if ($value) {
      // The checkbox was checked, or the textfield had something in it. Add a
      // new role corresponding to this, if there is one.
      foreach ($roles as $key => $role) {
        if ($role == $field) {
          $account->roles[$key] = $role;
        }
      }
    }
    else {
      // The checkbox was unchecked, or the textfield was empty. Unset the user
      // role corresponding to this, if there is one.
      foreach ($roles as $key => $role) {
        if ($role == $field) {
          unset($account->roles[$key]);
        }
      }
    }
  }

  // Update the user with the new role assignments.
  db_query("DELETE FROM {users_roles} WHERE uid = %d", $account->uid);

  foreach ($account->roles as $rid => $role) {
    // The built-in anonymous and authenticated roles are implicit and are not
    // stored in {users_roles}, so skip them as user_save() does.
    if (!in_array($rid, array(DRUPAL_ANONYMOUS_RID, DRUPAL_AUTHENTICATED_RID))) {
      db_query("INSERT INTO {users_roles} (uid, rid) VALUES (%d, %d)", $account->uid, $rid);
    }
  }
}
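
The name-matching step is the part most likely to go wrong, so here it is isolated as plain PHP with no Drupal calls. This is only a sketch: it assumes, as the code above does, that a profile field called profile_gold controls a role called "gold club member". The helper name example_field_to_role_name() and the sample role list are made up for illustration and are not part of Drupal.

```php
<?php
// Hypothetical helper, not part of Drupal: maps a profile field name to the
// role name the module above expects, or returns NULL for other fields.
function example_field_to_role_name($fieldname) {
  $pieces = explode('_', $fieldname);
  if (array_shift($pieces) !== 'profile' || empty($pieces)) {
    return NULL;
  }
  return implode('_', $pieces) . ' club member';
}

// Sample data in the shape returned by user_roles(): rid => role name.
$roles = array(
  1 => 'anonymous user',
  2 => 'authenticated user',
  3 => 'gold club member',
);

// Look up the role id that the profile_gold checkbox would control.
$role_name = example_field_to_role_name('profile_gold');
$rid = array_search($role_name, $roles);
```

If no role with the expected name exists, array_search() returns FALSE, which is the case the loops above silently ignore.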

Wednesday, 20 May 2009

Drupal 5: Problems with unserialize in bootstrap.inc

Today I had a problem where Drupal 5 kept reporting a problem with unserialize in bootstrap.inc on line 428.

On closer inspection, bootstrap.inc, which is in the includes directory in the root of Drupal 5, contains a number of functions that are used when Drupal 'boots up'. The function in question here was variable_init(), and this is where all variables are drawn from the database. These variables are in serialized form, allowing Drupal to store anything, from objects to arrays, in string format.

If the serialized item got corrupted somehow, it wouldn't be able to unserialize properly in variable_init(), leading to this error. My problem was that I was not able to see which variables were causing the problem; only that the problem existed. With over 200 variables on the site, manually checking each one for valid serialization was not a viable option!

My solution was to change the core bootstrap.inc to print the names of the offending variables, thereby enabling me to find them in the database and fix them. Here's the original snippet from line 427 of bootstrap.inc:


while ($variable = db_fetch_object($result)) {
  $variables[$variable->name] = unserialize($variable->value);
}

Here's what I changed it to, temporarily:


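while ($variable = db_fetch_object($result)) {
  // Temporary debugging aid: print the name of any variable that fails to
  // unserialize. A variable legitimately set to FALSE will also show up here.
  if (($variables[$variable->name] = unserialize($variable->value)) === FALSE) {
    print $variable->name . "\n";
  }
}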

Once I had my variable names, I was able to find them in the database, using phpMyAdmin, and edit them. I found that I had something like s:5:" in there. This had been truncated, and should have been something like s:5:"hello";. The first letter indicates that the variable is a string. The number indicates the length of the string in bytes, and the value of the string is enclosed in double quotes, followed by a semicolon.
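
To make the format concrete, here is a small standalone sketch in plain PHP (no Drupal needed) showing what a valid serialized string looks like and how a truncated one behaves. The function name example_is_corrupt() is my own invention for illustration. It also covers one catch with checking for FALSE: a variable that really is FALSE unserializes to FALSE too.

```php
<?php
// A valid serialized string: type, byte length, then the quoted value.
$good = serialize('hello');  // s:5:"hello";

// The length counts bytes, not characters: "é" in UTF-8 is two bytes.
$two_bytes = serialize("\xC3\xA9");

// A truncated value like the one described above. The @ suppresses the
// notice or warning PHP raises, leaving just the FALSE return value.
$broken = @unserialize('s:5:"');

// Hypothetical helper: a stored boolean FALSE also unserializes to FALSE,
// so compare against serialize(FALSE) to tell "corrupt" from "really FALSE".
function example_is_corrupt($raw) {
  return @unserialize($raw) === FALSE && $raw !== serialize(FALSE);
}
```

With this check, example_is_corrupt('s:5:"') is TRUE, while both a valid string and a stored FALSE pass.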

Afterwards, I just changed bootstrap.inc back to the way it was before, and my unserialize problems vanished!

Monday, 16 February 2009

When will we be free from Internet Explorer 6?

Another standards-compliant site completed, another day or so consumed developing non-standard workarounds to make it work in Internet Explorer 6. I found myself asking, “why do I need to spend all this extra time deviating from standards just to accommodate one poorly-designed browser?” The unfortunate truth is always, “because over one third of all the site's visitors will be using it.”

When will IE6 finally die, and what will be the straw that breaks its back? With XP no longer sold with new computers, and Windows Update installing IE7 even on XP, IE6's market share fell rapidly for a while, but it now seems to have levelled off. Although its user base is decreasing, it's not decreasing at any kind of rate that would make me confident that I can stop supporting it when I develop sites.

For anyone who isn't aware of the issue, this browser, released in 2001, fails to respect countless web standards. Of course, some standards came along after IE6 was developed, but this is just no excuse, because every other browser has been updated to accommodate them.

What's very clear is that in the late 20th or early 21st century, a lot of businesses and corporations developed their own internal computer systems using IE5, 5.5 or 6 as a web interface. For example, a bank might have written a customer relationship system that only works with IE6 as a front end. Back then, standards were far less important to web design (because fewer browsers actively supported them all) and were far less… well… standard.

When IE7 came along, a lot of these corporations found that their internal software, in which they had invested thousands or millions, would no longer work. Other corporations were unwilling to upgrade from Windows NT, due to the cost, and therefore could not install IE7 at all. Many workplaces are stuck with IE6: IE7 is not available and other browsers cannot be installed at all.

It might well cost a lot of money to upgrade from NT to XP, Vista or the forthcoming Windows 7, it's true. However, consider how many (wo)man hours, and therefore how much money, have been wasted unnecessarily on developing sites that work in IE6. Take Microsoft's own 5-year support policy as the cutoff for IE6, which would be about 2006. In the three years since then, just how much time could have been saved by developing all websites without IE6 support? This would easily outweigh the cost to businesses and corporations of updating their internal systems or operating systems.

What would it take to finally get rid of IE6? I don't think it would take too much. One large-scale site launches a banner on every page, for IE6 users, declaring that they should upgrade their browser or the site will no longer support it in 3 months. Another follows suit. Pretty soon, enough home users are persuaded to upgrade, and enough employees, disgruntled at not being able to check their eBay auctions or Facebook profile at work, pester their employers into upgrading.

The main problem here is one of competition. Putting a banner on your site claiming that you no longer support a browser used by a third of your customers will no doubt send a large proportion of your customers to another site. No large site is going to want to commit commercial suicide like that, because the amount of money they would lose in the short term would outweigh the amount of money they waste on IE6-specific development in the long term.

Instead, the task should be put to the non-commercial websites: social sites like digg.com or online tools like Google Documents. The problem here is that while these sites aren't actively selling things, they do benefit from revenue brought in by large amounts of traffic, and certainly wouldn't want to take a hit in the number of visitors. I don't believe that the immediate loss of traffic would be that great, especially for sites whose visitors have a higher average level of technical awareness (and are therefore less likely to be using IE6), so I think it is a feasible idea. Digg would still be popular after dropping IE6. Facebook would still be visited by millions using IE7 and other browsers (and in fact, maybe more work would actually get done at work if people couldn't pointlessly update their status every 5 minutes while there).

Supposing one or more larger sites would actually agree to act for the greater good and commit to the outlawing of IE6, how should it be done? The idea would be to gently encourage the user to upgrade, rather than tell the user how stupid he or she is by not using a modern browser. People will respond far better if they think they are upgrading in order to get more out of their favourite site, rather than if they are insulted into upgrading. With that in mind, it could be beneficial to apply IE6-only drawbacks to using a particular site. Perhaps an eBay listing would have fewer pictures and no AJAXy interface in IE6, or perhaps the BBC would be unable to show more advanced, prettier interface elements. Whatever the case, users should be given a carrot and not a stick.

We need to start somewhere, at some point in time, to resolve to rid the world of IE6 for good. Not because it's the root of all evil, or because Microsoft sucks and should die, but because it's genuinely sucking up millions of hours of extra development time and holding back some really creative development techniques. We've already ganged together to create some fantastic things on the Internet. Wikipedia, a completely user-created free encyclopedia, is one example. Almost everybody in web design dislikes having to create special styles or rules for IE6, yet we seem to just accept that we're powerless to do anything about it. That's just not the case. We should be putting on the pressure to get rid of an 8-year-old browsing relic. Are you still using the computer you bought in 2001? Would it even work nowadays, with today's Internet? It's about time we pressed a bit harder for change, and then got really creative with the web.

Friday, 13 February 2009

Drupal on Amazon web hosting

Cloud computing has been around for a while, but only recently have we, the general populace, had access to it. Amazon offer one such manifestation: an environment where it's possible to set up any number of virtual dedicated servers and use them for hosting, in our case, Drupal sites! I wanted to share some of my experiences (both good and bad) using the Amazon cloud, so you can make a better decision about whether it's right for your Drupal sites.

This is attractive compared to paying for a shared hosting account, where you can never quite be sure what else is running on the system you're sharing, leading to potential performance woes. The Amazon EC2 (Elastic Compute Cloud), as they call it, appears more attractive than smaller companies offering VPSs (Virtual Private Servers) because of the sheer scale of Amazon. It is unlikely to disappear tomorrow and is complemented by the excellent S3 (Simple Storage Service) for backing up data.

Having said that, Amazon solutions can be costly. At the time of writing, Amazon's most modest offering, a reasonably specced dedicated machine with 1.8GB of memory, goes for 11 cents ($0.11) per hour. This doesn't sound like much, but there are 168 hours in a week, and based on the 4.3-week month, that's 722 hours per month, or about $80 per month. Writing from the UK, with the exchange rates as they are at the time of writing, this is about £55.
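
For anyone who wants to plug in their own figures, the sum above can be reproduced with a few lines of PHP. The rate, hours per week and weeks per month are the numbers from the paragraph above; the variable names are just for illustration, and the exchange rate is left out because it changes daily.

```php
<?php
// Figures taken from the paragraph above.
$rate_per_hour   = 0.11;    // USD per instance-hour
$hours_per_week  = 24 * 7;  // 168
$weeks_per_month = 4.3;

$hours_per_month = $hours_per_week * $weeks_per_month;
$cost_per_month  = $rate_per_hour * $hours_per_month;

// Prints: 722.4 hours, about $79.46 per month
printf('%.1f hours, about $%.2f per month' . PHP_EOL, $hours_per_month, $cost_per_month);
```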

Still, the alternative for us was to purchase a new machine and buy some rack space (which is billable in advance, often for several months). Compared to Amazon, who bill for usage at the end of each calendar month, with no initial hardware cost, the choice seemed clear.

Starting out, it's striking how little documentation there is. Concepts like elastic IPs, keypairs and elastic block stores are very alien, even to the average techie, and whilst there is introductory material, it feels incomplete. Since the Amazon system is fairly new, and is quite pioneering in its approach, this is understandable, but doesn't make the task any easier.

One of the biggest surprises is that, at the time of writing, Amazon's own web interface does not allow the management of EU-based instances (virtual machines), despite allowing control over US-based ones. If there's one thing that really winds people up on this side of the Atlantic, it's that Americans (and often Canadians) are given preferential treatment with this type of thing. Nevertheless, we were soon able to locate the excellent ElasticFox Firefox extension, which allows management of European instances, and we had our very own fresh copy of Ubuntu installed and running at the touch of a button.

This is incredibly powerful stuff, especially because in theory, you can launch as many virtual servers as you like with the click of a button. In practice, Amazon impose sensible limits (although you can apply for more if you genuinely need them) and after all, you're paying per hour for all these machines. We found Alestic's site very valuable indeed. It allows you to quickly find the right Amazon Machine Image (AMI) to use when starting your system, rather than poring over a huge list. A lot of these systems come pre-installed with all the things you're likely to need. There weren't any with specific Drupal installs, but since this is only a 5-minute job (download, unzip), it wasn't too much of an issue.

When you turn off (terminate) an instance, or the power fails at Amazon (very rare but can happen), the instance disappears completely, and so does its local storage. This is a concept alien to those unaccustomed to a VPS environment. After all, if I turn off my laptop right now, the data will be saved to the hard disk and when I start it up again, it's all there. This is not so with Amazon's hosting. The virtual machine powers down and all data stored locally, including program settings and even the operating system itself, is gone forever.

With this in mind, it's clear that a backup solution is needed. Luckily, there are some decent tools in the Amazon EC2 AMI tools package, which is pre-installed on many of Alestic's images. The idea is that you regularly take an image of the entire machine and copy it to S3 where it can be stored safely, and permanently.

Writing a simple script to do this proved more difficult, however. Firstly, it wasn't clear from the documentation that we needed to explicitly state that we wanted to back up the data to an EU S3 bucket. Without this option, the files were sent to a US bucket, taking a very long time indeed and costing $0.17 per gigabyte (a machine image is usually at least 1GB, if not more). Secondly, the bundling process, as Amazon calls it, is less than reliable. Sometimes it would just bail for no apparent reason. Sometimes it would bundle the image and fail during the upload process, again, for no apparent reason. I personally still don't trust the automated backup script I wrote because of these shortcomings, so I find myself checking manually a lot of the time, which diminishes the value of an automated system.

The next concept that was slightly alien was the Elastic Block Store (EBS), which is a system whereby it's possible to create virtual hard disks and mount them to your instances. This is much better than storing files on the instance itself, because if the instance dies, your data are safe. It's possible to take what Amazon calls snapshots of the volumes, enabling a simple backup system, and again, this process can be automated, but you will need to know your way around a shell script, since this is not a point-and-click affair.

The EBS makes it easy to split data into different volumes (database volume, websites volume, miscellaneous volume, etc). Initially we wanted to run MySQL and Apache on the same low-traffic system to see how good Amazon really was, but we always wanted the ability to migrate MySQL to a dedicated machine at a later date. It's a doddle with Amazon: you can simply unmount the database EBS volume from one machine and mount it to another.

We have used the EBS to store some of our more persistent configuration settings too, such as Apache configuration, Apache log files and configurations for the awesome Nagios monitoring system.

To administer these Amazon systems, shell access is needed at a minimum. Unlike other systems where it's possible to simply connect on port 22 with a username and password, Amazon uses a keypair system. Each instance must be created with a specific keypair, and then a key must be downloaded and used with the terminal application (such as PuTTY) before it will allow you to connect. A terminal is nice and all, but sometimes it's useful to do more with the system, like have multiple terminals open or use a GUI tool (like MySQL Administrator). For this, we set up NX, which is similar to VNC in that it provides an interface to the remote machine's desktop that you can use exactly as though it were your own desktop. We found a Google Groups article by Eric Hammond very useful in setting up NX, and thought it was preferable to VNC because of the default encryption method and insistence on avoiding the root user.

Performance-wise, our Drupal machine has been running Apache 2, MySQL 5, and a host of monitoring software (Munin, Nagios and AWStats) so that we can keep an eye on things, and it has been running for around a month so far with no outages, crashes or other problems at all. The learning curve is pretty steep and the documentation is fairly sparse, but there is a very active community out there on the AWS forums and places like Google Groups. Overall we are very impressed with Amazon as a Drupal hosting environment and although not entirely convinced at the current time, will be looking towards moving more and more sites over there in the future.
