Tuesday 9 December 2008

Drupal 6: Core hacking for the simplest of things

I was recently faced with a problem concerning changing something in Drupal's core user module. This wasn't even a big change; it was as simple as changing the line of text that appears underneath the username field on the registration form. You know, the one that helps the user to choose an appropriate username.

The core has no option for changing this text, so the most obvious, and worst, thing to do is to open up user.module in the core and edit the string. This is bad because the first rule of Drupal club is: do not hack the core. Actually, the second rule of Drupal club is... you get the idea. It would also mean that the change would disappear the next time Drupal is upgraded.

After ruling out the above method, I was left with a number of options for accomplishing what I needed, but they all came with downsides.

Override the core


This involves making a copy of the core user module inside my site's modules folder. Drupal will then use the copy instead of the one in the core and changes can be made safely to this copy, because it is not going to be upgraded when a new version of Drupal is released, unlike the core itself.

My main concern here is that technically it's a core hack in disguise. In my case, I would be making a copy of Drupal 6.6's user module and then changing it. But, what happens when Drupal gets upgraded to 6.7? The core user module will be 6.7, but the site won't be using it. Any new or changed functionality in the user module will be missing from the site, and worse, this could introduce incompatibilities.

Suppose something has changed in 6.7's user module, and the other core modules expect this change to be in place? This would break Drupal, because it is not expecting to encounter the 6.6 user module there. How likely is this to happen? Very unlikely in an upgrade from 6.6 to 6.7, I'd say, but still possible, and the risk only grows as 6.7 becomes 6.8, 6.9 and so on.


Locale


Drupal's translation system makes it possible to provide a translation string for the text underneath the username field, as though it were being translated into French or Italian, but this feels wrong. It's not a ‘real’ translation because we just want to change the string itself, not change it into another language.

String Overrides


There is a contributed module called String Overrides. It provides a way of changing any text that passes through the t() function, by saving the overridden strings in the variables table. This seemed like an ideal solution at first, but on closer examination, there are a few drawbacks.

Firstly, since the overrides are stored in the database, we lose version control, which is never a good thing. Secondly, the more modules you add to a site, the more overhead there is, both in memory and in processing time.

Custom module


It would be possible to write a custom module specifically for the site in question. The module would mimic the functionality of the string overrides module, but store the overrides in a file instead of the database, thereby providing version control. The only slight downside to this is that it requires a lot more time than the other approaches. It seems like extreme overkill to have to write an entire module simply to change one string.
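
For what it's worth, here is a minimal sketch of what such a site-specific module might boil down to. Rather than intercepting t(), this variant uses hook_form_alter() to change the description directly, so the string lives in version-controlled code; the module name (mysite_tweaks) and the wording are just placeholders.

/**
 * Implementation of hook_form_alter().
 *
 * Replace the help text underneath the username field on the
 * registration form. Module name and wording are placeholders.
 */
function mysite_tweaks_form_alter(&$form, $form_state, $form_id) {
  if ($form_id == 'user_register') {
    // The name field may sit at the top level or inside the 'account'
    // fieldset depending on how the form was built; cover both.
    if (isset($form['account']['name'])) {
      $form['account']['name']['#description'] = t('Choose a short, memorable username.');
    }
    elseif (isset($form['name'])) {
      $form['name']['#description'] = t('Choose a short, memorable username.');
    }
  }
}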

To conclude


In the end we decided to go with the core override method. I am not entirely happy with this, but I am sure I would not have been entirely happy with any of the solutions I've presented here. I acknowledge that there are many compromises to be made on the way to a completely custom Drupal site, but there is always room for improving Drupal to accommodate the more common requests. The questions are: is this one of the more common requests, and what is the best way forward here?

Wednesday 12 November 2008

Drupal: Pitfalls when converting CCK field modules from Drupal 5 to Drupal 6

I recently needed to convert a CCK field module from Drupal 5 to Drupal 6. There is a lot to take in, but I was faced with a particular problem:

warning: array_shift() [function.array-shift]: The argument should be an array in D:\wamp\www\drupal6\includes\form.inc on line 1320.

Oh dear. It's always difficult to debug this kind of thing, because the problem lies in the code that called the function on line 1320 of form.inc, rather than in form.inc itself. I used debug_backtrace() to see what parameters the functions further up the call stack were passing, and noticed that only the first two parameters of _form_set_value() were populated; the other two were NULL. This was the immediate source of the error; the following line was failing because $parents was NULL:

$parent = array_shift($parents);

Obviously you can't array_shift() NULL. I then went back further in the backtrace, to the form_set_value() function (note that this one is not prefixed with an underscore, unlike the last one). In this function, the second parameter ($value) was NULL, and this was the source of the NULL that ended up in _form_set_value().
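
For reference, something along these lines is enough to see what each caller passed in; it is a throwaway debugging aid dropped in temporarily and removed straight afterwards, not production code.

// Temporary debugging aid: dump each caller's function name and arguments
// so the origin of the NULL can be traced. Expect a lot of output for
// large form arrays.
foreach (debug_backtrace() as $call) {
  drupal_set_message('<pre>' . check_plain(print_r(array(
    'function' => $call['function'],
    'args' => isset($call['args']) ? $call['args'] : array(),
  ), TRUE)) . '</pre>');
}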

The solution
It turns out that I was using this code to handle the form widget's processing in my CCK field module:

/**
 * Process the mcimage element.
 */
function mcimage_mcimage_process($element, $edit, $form_state, $form) {
  $field_name = $element['#field_name'];
  $field = $form['#field_info'][$field_name];
  $field_key = $element['#columns'][0];
  $value = isset($element['#value'][$field_key]) ? $element['#value'][$field_key] : '';

  $element[$field_key] = array(
    '#type' => 'hidden',
    '#default_value' => $value,
    // The following values were set by the content module and need
    // to be passed down to the nested element.
    '#title' => $element['#title'],
    '#description' => $element['#description'],
    '#required' => $element['#required'],
    '#field_name' => $element['#field_name'],
    '#type_name' => $element['#type_name'],
    '#delta' => $element['#delta'],
    '#columns' => $element['#columns'],
  );
}

My mistake? The function does not return anything! It was as simple as adding a return statement at the end of the function, so that the form element could be processed correctly:

/**
 * Process the mcimage element.
 */
function mcimage_mcimage_process($element, $edit, $form_state, $form) {
  $field_name = $element['#field_name'];
  $field = $form['#field_info'][$field_name];
  $field_key = $element['#columns'][0];
  $value = isset($element['#value'][$field_key]) ? $element['#value'][$field_key] : '';

  $element[$field_key] = array(
    '#type' => 'hidden',
    '#default_value' => $value,
    // The following values were set by the content module and need
    // to be passed down to the nested element.
    '#title' => $element['#title'],
    '#description' => $element['#description'],
    '#required' => $element['#required'],
    '#field_name' => $element['#field_name'],
    '#type_name' => $element['#type_name'],
    '#delta' => $element['#delta'],
    '#columns' => $element['#columns'],
  );

  return $element;
}

Thursday 6 November 2008

Drupal: Creating the perfect production environment

For small-time developments, it's simple: buy an account with a Drupal hosting provider, set up your single install, then create the site. This is a live server setup and is probably the simplest way to do things. While this works fine for simple sites that can be thrown together in hours, what is the best way to create an environment in which many sites can be independently designed, created, tested and hosted, while keeping track of large-scale developments? I want to share the method I use. It's not perfect by any means, but hopefully it will be useful to someone, and maybe you'll let me in on how you do things, too.

Version Control
Version control is essential for anything beyond a small project, and for any project where multiple individuals collaborate. Most people have experienced a situation where hours have been spent making a particular change, only to find that it's a mistake and things need to be put back the way they were. A living nightmare under normal circumstances; a two-minute job with version control.

Also, I think a significant number of people are familiar with a situation where multiple individuals have edited the same file, overwriting one another's changes and generally becoming frustrated at the lost time and effort. Difficult to avoid under normal circumstances; a breeze with version control.

I use Subversion for version control, with the excellent TortoiseSVN for Windows as a client. There is a learning curve involved: terms like checkout, merge, branch, and tag will be less than obvious at first, but the ability to easily go back to any version of the file you're working on soon becomes indispensable, and I often wonder how I ever did without it. I wish more things came with version control. Maybe I've decided I don't like the redecoration I've done in the kitchen. It would be great if I could just revert it at the touch of a button!

Beyond this, version control allows you to keep logs for each commit (each time something is changed) so you know why things were altered, and even to blame each line in the file on the person who committed it, so when you see that line 134 in style.css is messing up your entire page layout, you know who wrote it, when it was written, and why.
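
For anyone new to Subversion, the two commands in question are blame and log, run from inside a working copy:

# Who last changed each line of the file, and in which revision?
svn blame style.css

# The commit messages explaining why those revisions were made.
svn log style.css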

The Subversion server I use is its own entity: it is an Ubuntu box that exists purely to serve version-controlled files from the repositories. The repositories themselves are not even on the Subversion server; instead, they're on a NAS (network-attached storage) device, so if the Subversion server dies, the repositories can be retrieved and used with a replacement server. Of course, the repositories are backed up too, because if the NAS device fails and there is no backup, it would be catastrophic.

Individual Development
Each individual working on Drupal projects has a WAMP server on his or her local machine. Drupal is installed here and runs under Apache 2. One important decision here is that there is only a single Drupal installation running multiple sites. These sites reside in subdirectories in the sites folder, and each site has its own repository.

Site-specific modules are installed in the site's own modules folder, whereas modules that will be used on all or most sites (such as cck, webform, etc) are kept in the sites/all/modules folder, which means there is just one codebase for each of these modules. This folder is kept in its own repository, so when modules are updated, changes are committed and a tag is created.

This means that only one member of the production team handles module updates. The others simply update their working copy of the sites/all/modules repository and everything is up to date.
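
To make that concrete, the tree on each machine ends up looking roughly like this (the site names are examples):

drupal/
  modules/                  core modules, never touched
  sites/
    all/
      modules/              shared contrib modules, in their own repository
        cck/
        webform/
    foo.example.com/        one repository per site
      modules/
      themes/
      settings.php
    bar.example.com/
      modules/
      themes/
      settings.php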

Potential Problems with a Shared Environment
The alternative to having developers work locally on their own machines is to have a shared development server with something like a Samba share that the production team use to access and edit the files. The main drawback with this approach is that it's all too easy to have two individuals editing the same file at the same time, potentially overwriting one another's changes. The style.css and template.php files in Drupal themes are especially vulnerable.

Using local copies and version control all but eliminates this kind of thing. Let's say two people have edited style.css. Person A makes her changes, updates her working copy (this should always be done before committing) and sees that there are no changes, so she commits. Person B makes his changes, and updates his working copy. Assuming that A and B have edited different lines in the file, person B will see that person A's changes have been merged into his changes, so when he commits, the combined efforts of both will be saved.

If A and B have edited the same lines in the file, when person B updates his copy, he will see that there has been a conflict, but will be offered both versions of the file and shown the exact lines that are conflicting. He can then select which side of the conflict ‘wins’, or if neither will be correct, he can alter the conflicting line to incorporate both edits. Nothing will get overwritten unless B wants it to be.

Development and Testing Server
When changes are ready to show to a client, a dedicated development and testing server is used. The working copy on this system is simply updated, and the client is given a special development URL on which to view the work that has been carried out.

Shared Development Database
When I said that the production team work locally on their own machines using WAMP, this is true for the site's files, but not for the database. It's not practical to keep a database under version control in the same way, so consider this:

Person A is working locally with a local database and creates a node. There were 10 nodes already, so this is node 11. Person B is working locally with a local database too, and is working on a different aspect of the site. B creates a node too, but in B's database, this is node 11. Therefore, node 11 exists in two places, and is actually two separate nodes. Now imagine this kind of thing happening quite often as various team members work on different bits of the site.

When it comes time to merge all these databases together to create a final testing version of the site, all hell breaks loose. It simply isn't possible to automatically handle situations where the same ID in two different databases refers to two different things.

For this reason, I use a MySQL server on the development and testing system as the ‘central’ development system. Each individual is running his or her own copy of Drupal and has his or her own copy of the files, but utilises the same database. That way, when person A adds a new node, view, user or whatever else, it is instantly visible on person B's copy of the site too.
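
In settings.php terms this just means that every developer's copy of a given site points at the central database rather than a local one; a minimal sketch, with made-up credentials:

// sites/foo.example.com/settings.php on each developer's machine.
// Everyone shares the central development database on the dev/test box.
$db_url = 'mysql://foo_dev:secret@devserver.example.com/foo_dev';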

There are problems with this approach, for sure. Suppose person A has created a view with some templates. The view exists in the database, so it is visible on all of the production team's copies. Person A has not yet committed the templates to the Subversion repository, so they don't exist on person B's copy of the site. The view will almost certainly look very odd to person B until the templates are committed.

Another situation can arise where modules differ. Suppose person A is using a special module on the site to import some content. Once it's in the database, the module can be removed, and nobody else on the production team needs to do anything, because they're all sharing the same database. The module implements hook_menu() to provide an import page where settings can be chosen. When the module is installed on person A's copy of the site, the menu cache is updated to include the path to this import page.

All is well so far, but suppose that person B clears the site's cache for some reason. The import module does not exist on B's copy of the site, so the regenerated menu cache will not include the path to the import page any more. Person A will suddenly find that the menu option for the import page has disappeared! Luckily this is easily resolved (person A simply needs to clear the cache again) but this shows the type of unexpected event that might arise as a result of sharing a Drupal database across multiple copies of the site.

With all that in mind, the benefits of sharing a single database far outweigh the drawbacks, so this is my preferred method, rather than having multiple copies of the database and attempting to merge them together.

Multisite Hostname Problems
One big hurdle in getting to grips with the multisite approach, combined with each member of the production team running their own copy of the site, was overcoming the problems caused by the way Drupal decides which site to serve in a multisite environment.

When using a single Drupal install and single copy of Apache for multiple sites, all of Apache's virtual hosts use the Drupal install as their document root, and Drupal selects the appropriate site (and therefore the database, via settings.php) by examining the hostname.

For example, suppose we want to run two sites on the same copy of Drupal: www.foo.com and www.bar.com. In Drupal's sites folder, we create one folder called foo.com and one called bar.com. When a user's browser makes a request for www.foo.com, Apache handles this request, serves the document from the Drupal install folder (with Drupal, it's almost always index.php), and Drupal knows that because the request asked for www.foo.com, it should use the foo.com site and not the bar.com one.

If we apply this to our development environment, our main development and testing server (example.com, for argument's sake) can be reached at either foo.example.com or bar.example.com to see each site's development and testing copy. Great, but how does person A access her local copy of the site? She could add an entry to her hosts file to point foo.example.com at 127.0.0.1, but then how would she see the copy on the development and testing server? She would have to remove or comment out the hosts file entry, which quickly becomes confusing, because it is easy to forget which version of the site she is looking at at any given moment.

Luckily, there's a better way. The folder sites/foo.example.com exists on the development and testing server and also on person A's local copy. Person A alters the Apache virtual host entry for her local machine from foo.example.com to local.foo.example.com. She then adds an entry into the hosts file, directing local.foo.example.com to 127.0.0.1. Now, she can access her local copy by putting local.foo.example.com into her browser, and the development and testing copy by using foo.example.com, all the while keeping the structure of the sites folder the same in both places.
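
On person A's machine the whole trick boils down to a hosts entry and a virtual host; the hostnames and paths below are examples. It works because Drupal's conf_path() strips sub-domains one at a time when looking for a matching folder under sites, so a request for local.foo.example.com still lands on sites/foo.example.com.

# hosts file entry (C:\Windows\System32\drivers\etc\hosts on Windows)
127.0.0.1    local.foo.example.com

# Apache virtual host on the local WAMP install
<VirtualHost *:80>
    ServerName   local.foo.example.com
    DocumentRoot "D:/wamp/www/drupal"
</VirtualHost>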

Keeping up with Site Files
Quite often, clients will want to populate their new sites with stories and articles before the site launches. Sometimes, I will want to give them this privilege before all areas of the site are finished. A situation arises where the client has uploaded files such as thumbnails or PDFs. These exist on the development and testing server, but not on any of the local copies used by the production team.

Luckily, SyncToy can help with this. Simply create a folder pair that echoes the development and testing server's files to the local system. Each time any member of the production team notices that their local copy is missing some thumbnails or other files, he or she can just synchronise the folder pair to receive all of the latest files. Alternatively, a schedule can be created to do this automatically at regular intervals.

Of course, this will not help in situations where the production team has uploaded files to their local machines that need copying to the development and testing server, but this is probably going to happen much less frequently, and there is always the option of synchronising both ways for this eventuality.

Live Server Setup
The live sites run on a single Drupal install on a dedicated Apache server, while the live databases run on a separate system that does nothing but run databases (and has MySQL in dedicated mode for best performance). I would always recommend separating these functions out on medium loads upwards, so that Apache does not slow MySQL and MySQL does not slow Apache.

I have not seen any viable clustering solutions for hosting Drupal sites, but luckily I am in a position where all the sites can be run from one web server and one database server. Even so, I would be interested to hear about any successful clusters out there.

Amending the Live Site
Unfortunately, making amendments to the live site is probably the biggest hurdle left in this particular production environment. If there is a rogue CSS class, that's fine. The change is made on a production team member's local system, and tested. The change is committed, then the development and testing server is updated, so that it receives this change too. The change is tested there. Assuming it is ready to be made live, the live server is updated and everyone is happy.

The real problem comes when, for example, a new node needs to be added. Suppose that the client has asked for a new webform. First, the live database is copied down to the development and testing server. A member of the production team produces the webform on her local system. This shares the database with the development and testing server, so unless template or CSS changes are required, the job is done and it can be tested.

The problem: suppose there were 100 nodes before the work began. On the development and testing database, the webform is node 101. Now suppose that in the time taken to create and test the webform, 5 nodes have been created on the live site. The development and testing database cannot simply be copied back over the live one, because this would wipe those 5 nodes!

One solution is to put the live site into maintenance mode while the webform is created. This way, the database will be locked off to the general public, so it can be copied down to development and testing, worked upon, then copied back up without the risk of wiping content.

This might be fine for a 5-minute job, but what if the work requires 4 hours? The site cannot very well be put into maintenance mode for 4 hours during the working day just so a small change can be made!

The answer at the moment seems to be to write everything down. Write down the settings you use to create the new webform. Once it has been tested and is ready to go live, just create the webform again on the live server. This is not sophisticated or classy in any way, but appears to be the best option.

Your Input
I would be interested to hear thoughts on the way this particular production environment is set up and how it might be improved, especially where I have noted problems. I'd also love to hear how you do things and why, because I think I learn a lot from other people's experiences.

Drupal 5: user_save and profile fields

I was recently required to import a large number of users into a Drupal 5 site, so I wrote a simple import module to take rows from a CSV file and pass them to user_save(). In addition to the basic user information in the {users} table, I needed to create several profile fields too. This turned out to be far more complicated than it should have been.

The first thing I noticed is that the documentation for user_save() is not exactly stellar.

$account The $user object for the user to modify or add. If $user->uid is omitted, a new user will be added.

Fine, but is this where I should be putting my new user's information, or perhaps I should use the next parameter?

$array An array of fields and values to save. For example array('name' => 'My name'); Setting a field to NULL deletes it from the data column.

Ok, fine. Maybe I should use this one for adding my new user data. This doesn't mention anything about the profile fields though, or explain what I should be doing with the $account parameter. Maybe there's something else?

$category (optional) The category for storing profile information in.

What? Category? Now I'm really confused. There's nothing obvious in user_save() that suggests how the profile fields get saved, or even where to put the profile fields. My only real clue is the call to user_module_invoke() towards the end of the function. This calls hook_user() in all the active modules on the site, and the one I'm interested in is the profile module, so my next stop was profile_user(). In turn, this calls profile_save_profile() with the details from the original call to user_save().

It was at this stage that I noticed that $category must refer to the various groups that profile information can be put in. For example, you can create a category for personal information, and one for notification preferences, and doing so will split the fields onto different tabs when the user edits his or her profile. Unfortunately, $category is a string, not an array, so for each call to profile_save_profile(), only one category can be changed.

Because profile_save_profile() is only called once per user_save(), it appears that when creating a user, it is only possible to populate profile fields in a single category! This caused a problem for me, because I needed to import lots of profile fields across several categories.

My solution was to temporarily move all the profile fields into a single group. Once I had done that, I could populate $array with the information destined for the {users} table and the profile fields (this was not documented anywhere). It turns out I could just use NULL for $account (again, this was not documented).
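
For the record, the working call ended up looking roughly like this. The field and column names are made up, and all the profile fields had been moved temporarily into a single category, here called 'Personal'.

// One CSV row at a time. $account is NULL because a brand new user is
// being created rather than an existing one updated.
$fields = array(
  'name'   => $row['username'],
  'mail'   => $row['email'],
  'pass'   => user_password(),
  'status' => 1,
  // Profile fields go in the same array, keyed by field name.
  'profile_full_name' => $row['full_name'],
  'profile_phone'     => $row['phone'],
);
$account = user_save(NULL, $fields, 'Personal');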

Surely this is not the ideal way of creating new users programmatically. My solution does work, but it is annoying and time-consuming. Is there another way to create users with profile fields in multiple categories?

Tuesday 4 November 2008

Drupal 5: Don't rely on the node's path attribute

I recently noticed that some of the nodes on a site I was working on were not linking properly from the teaser to the full node view. It turns out that the hyperlink tag's href was empty. Checking the template for the node teaser, I found that it was using this:


print l($node->title, $node->path);


This appears to work fine for nodes with a path alias set up (via the path or pathauto modules), but not for other nodes, because $node->path is empty for them. I believe the following code should have been used instead, so that the link always starts from the node's internal path and the l() function can substitute the most appropriate alias:


print l($node->title, 'node/' . $node->nid);

Tuesday 28 October 2008

Drupal: Broken RSS feeds

In Drupal 5, the path www.example.com/rss.xml produces an RSS feed of the most recent nodes (the number of items can be customised in the RSS publishing options; the default is ten), and other core modules, such as taxonomy, provide feeds of their own, for example www.example.com/taxonomy/term/1/0/feed. I had a problem recently where this functionality appeared to stop working. RSS feeds would not be displayed, and a ‘page not found’ (404) error would appear instead.

At first, I suspected a module might have been to blame. After all, the RSS functionality worked on a simple site, but not the one I was working on, which had been customised with lots of modules. I disabled all the modules that could have possibly affected the RSS feeds, but this did not help at all.

It turns out that URL aliasing was to blame. I had inserted some rows manually into the {url_alias} table, but I had put the values into the wrong columns. The two columns in question are src and dst. The dst column holds the aliased URL that the user visits, and the src column holds the internal path it is an alias for, i.e. the ‘real’ URL. I had put the values in the wrong way around, so when the RSS feed's URL was requested, Drupal mapped it onto a gibberish internal path that did not exist!
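
In hindsight, using the path API instead of hand-written SQL would have avoided the mix-up entirely, because the arguments make the direction explicit. The paths below are examples.

// The first argument is the internal path (src); the second is the alias
// the visitor sees (dst).
path_set_alias('node/123', 'articles/my-article');
path_set_alias('taxonomy/term/1/0/feed', 'news/feed');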

Friday 24 October 2008

Drupal: Allowing users to edit roles

In Drupal 5, I needed some way of giving users with a specific role permission to set the roles of others. It turns out that by default, the administrator is the only one who can assign roles to users. I also found the user_selectable_roles module by Bacteria Man, which allows users to assign themselves roles, but this was not quite what I wanted.

After some digging around in the core, specifically user.module, I found that when the user edit page is displayed, the system checks whether the user (the user who is seeing the edit page, not the user being edited) has permission to administer access control, and if so, grants them the ability to edit roles.

It turns out that in order to assign roles, the user must be given permission to edit the access rights of all the roles, which is not really the same thing! In my opinion, assigning roles to users and deciding what permissions those roles carry should be separate, more granular permissions. Perhaps this is something for a future Drupal release? It might even exist in Drupal 6 or Drupal 7, but not having played with them yet, I don't know.

Wednesday 22 October 2008

Drupal: Search indexing manually imported nodes

Recently I was required to import about 7000 nodes into Drupal 5. I was not entirely sure how to achieve this, so I decided to use SQL scripting. As it turns out, a better approach would have been to use a PHP script to build node objects and pass them to node_save().

That aside, I ran into a problem where none of the manually imported nodes were appearing in search results. Nodes created within Drupal worked fine, appearing in search results as expected. My manually imported nodes worked fine on the site itself: by navigating to node/1234 it was possible to view the node as expected, and they worked fine with Views, but there was nothing in the search results.

My first stop was to tell Drupal to regenerate the search index, by visiting admin/settings/search and using the re-index option there. The correct number of nodes was reported on this page, but after clicking the re-index button and running cron.php, it reported that 97% of the nodes had been indexed, far too quickly for any real indexing to have taken place, and a quick search showed that the manually imported nodes were still not appearing in search results.

I began to speculate that because the manually imported nodes were last modified (the changed column in the database) before the last time that the indexer was run, they weren't being ‘seen’ by the indexer, since it only pays attention to things that have changed since last time.

I did try backing up the {node} table, setting the changed timestamp for all nodes to something after the last time the indexer was run, then reindexing everything again, but this seemed to make no difference. If I didn't know better, I would have said that Drupal was taking one look at the sheer amount of stuff it was going to have to index, and freaking out!

In the end, my solution was to create a simple module that took advantage of the _node_index_node() function from Drupal 6. This seemed to work with no compatibility issues (I was using Drupal 5 for this), and my simple module checked to see if the node's nid existed in the {search_dataset} table (as a sid, not a nid), and if it didn't, called _node_index_node() to forcibly index it. It was a very slow process but it appears to have done the trick.
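
In outline, the approach looks something like this. The module name is a placeholder, and _node_index_node() is the function copied across from Drupal 6's node.module.

/**
 * Implementation of hook_cron().
 *
 * Index nodes that have no entry in {search_dataset} yet, a batch at a time.
 */
function mysearchfix_cron() {
  $sql = "SELECT n.nid FROM {node} n
          LEFT JOIN {search_dataset} d ON d.sid = n.nid AND d.type = 'node'
          WHERE d.sid IS NULL AND n.status = 1";
  $result = db_query_range($sql, 0, 50);
  while ($row = db_fetch_object($result)) {
    $node = node_load($row->nid);
    // Borrowed from Drupal 6's node.module.
    _node_index_node($node);
  }
}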

I suppose the question I am left with after this project is: why didn't Drupal reindex these nodes? Was my suspicion about the indexer not ‘seeing’ them correct? Should one always import nodes using PHP and Drupal's node_save()? While my method works, it is cumbersome and I would always seek the most efficient way of doing this, so please leave a comment if you can elaborate on this!

Download the completed module (2K).