Wednesday 12 November 2008

Drupal: Pitfalls when converting CCK field modules from Drupal 5 to Drupal 6

I recently needed to convert a CCK field module from Drupal 5 to Drupal 6. There is a lot to take in, and one problem in particular had me stumped:

warning: array_shift() [function.array-shift]: The argument should be an array in D:\wamp\www\drupal6\includes\form.inc on line 1320.

Oh dear. This kind of warning is always awkward to debug, because the fault lies in the code that called the function on line 1320 of form.inc, not in form.inc itself. I used debug_backtrace() to see which parameters each function in the call stack had received, and noticed that only the first two parameters of _form_set_value() were populated; the other two were NULL. This was the immediate source of the error: the following line was failing because $parents was NULL:

$parent = array_shift($parents);

array_shift() expects an array, so it can't do anything with NULL. I then went further back in the backtrace, to form_set_value() (note the lack of a leading underscore: this is a different function from the last one). In this function, the second parameter ($value) was NULL, and that was what caused the NULL in _form_set_value().
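If you need to repeat this kind of detective work, a small helper can do the stack-walking for you. The sketch below is not part of Drupal: report_null_args() and the two example_* functions are hypothetical names, and the helper relies only on PHP's built-in debug_backtrace() to list, for each calling function, the positions of any arguments that were NULL.

```php
<?php
// Hypothetical debugging helper (not part of Drupal): report, for each
// function further up the call stack, which positional arguments were NULL.
function report_null_args() {
  $report = array();
  // Frame 0 describes the call to report_null_args() itself, so skip it.
  foreach (array_slice(debug_backtrace(), 1) as $frame) {
    $null_positions = array();
    $args = isset($frame['args']) ? $frame['args'] : array();
    foreach ($args as $position => $arg) {
      if ($arg === NULL) {
        $null_positions[] = $position;
      }
    }
    $report[] = array(
      'function' => $frame['function'],
      'null_args' => $null_positions,
    );
  }
  return $report;
}

// Hypothetical example: a caller passes NULLs down to a helper.
function example_inner($values, $item, $parents, $value) {
  return report_null_args();
}

function example_outer($item, $value) {
  return example_inner(array(), $item, NULL, $value);
}

print_r(example_outer(array('#name' => 'x'), NULL));
```

Called as above, it reports positions 2 and 3 (counting from zero) for example_inner() and position 1 for example_outer(), the same pattern I saw with _form_set_value() and form_set_value().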

The solution
It turns out that I was using this code to handle the form widget's processing in my CCK field module:

/**
 * Process the mcimage element.
 */
function mcimage_mcimage_process($element, $edit, $form_state, $form) {
  $field_name = $element['#field_name'];
  $field = $form['#field_info'][$field_name];
  $field_key = $element['#columns'][0];
  $value = isset($element['#value'][$field_key]) ? $element['#value'][$field_key] : '';

  $element[$field_key] = array(
    '#type' => 'hidden',
    '#default_value' => $value,
    // The following values were set by the content module and need
    // to be passed down to the nested element.
    '#title' => $element['#title'],
    '#description' => $element['#description'],
    '#required' => $element['#required'],
    '#field_name' => $element['#field_name'],
    '#type_name' => $element['#type_name'],
    '#delta' => $element['#delta'],
    '#columns' => $element['#columns'],
  );
}

My mistake? The function does not return anything! The fix was as simple as adding a return statement at the end of the function, so that the processed element is handed back to the form API:

/**
 * Process the mcimage element.
 */
function mcimage_mcimage_process($element, $edit, $form_state, $form) {
  $field_name = $element['#field_name'];
  $field = $form['#field_info'][$field_name];
  $field_key = $element['#columns'][0];
  $value = isset($element['#value'][$field_key]) ? $element['#value'][$field_key] : '';

  $element[$field_key] = array(
    '#type' => 'hidden',
    '#default_value' => $value,
    // The following values were set by the content module and need
    // to be passed down to the nested element.
    '#title' => $element['#title'],
    '#description' => $element['#description'],
    '#required' => $element['#required'],
    '#field_name' => $element['#field_name'],
    '#type_name' => $element['#type_name'],
    '#delta' => $element['#delta'],
    '#columns' => $element['#columns'],
  );

  // A #process callback must return the element it was given.
  return $element;
}
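To see why the missing return matters, here is a stripped-down, standalone sketch. It assumes, as I understand Drupal 6's form builder, that the return value of each #process callback replaces the element being built; apply_process_callbacks() and the two example callbacks are hypothetical names, not Drupal functions.

```php
<?php
// Hypothetical stand-in for a form builder applying '#process' callbacks:
// whatever each callback returns becomes the element for the next step.
function apply_process_callbacks($element, array $callbacks) {
  foreach ($callbacks as $callback) {
    $element = $callback($element);
  }
  return $element;
}

// Like the original mcimage_mcimage_process(): it modifies its own copy of
// the element but never returns it, so PHP implicitly returns NULL.
function example_process_broken($element) {
  $element['value'] = array('#type' => 'hidden');
}

// The fixed version hands the modified element back to the caller.
function example_process_fixed($element) {
  $element['value'] = array('#type' => 'hidden');
  return $element;
}

$start = array('#field_name' => 'field_image');
var_dump(apply_process_callbacks($start, array('example_process_broken'))); // NULL
print_r(apply_process_callbacks($start, array('example_process_fixed')));
```

With the broken callback the element comes back as NULL, so any later code that expects an array, such as the array_shift() call in form.inc, fails.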

Thursday 6 November 2008

Drupal: Creating the perfect production environment

For small projects, it's simple: buy an account with a Drupal hosting provider, set up a single install, then create the site. This is a live server setup and is probably the simplest way to do things. While this works fine for simple sites that can be thrown together in hours, what is the best way to create an environment in which many sites can be independently designed, created, tested and hosted, while keeping track of large-scale developments? I want to share the method I use. It's not perfect by any means, but hopefully it will be useful to someone, and maybe you'll let me in on how you do things, too.

Version Control
Version control is essential for anything from medium-scale projects upwards, and for any project where several people collaborate. Most people have spent hours making a particular change, only to find that it's a mistake and everything needs to go back to the way it was. That's a living nightmare under normal circumstances; with version control, it's a two-minute job.

Many people will also be familiar with the situation where several people have edited the same file, overwriting one another's changes and generally becoming frustrated at the lost time and effort. That's difficult to avoid under normal circumstances, and a breeze with version control.

I use Subversion for version control, with the excellent TortoiseSVN for Windows as a client. There is a learning curve involved: terms like checkout, merge, branch and tag will be less than obvious at first, but the ability to easily go back to any version of the file you're working on soon becomes indispensable, and I often wonder how I ever did without it. I wish more things came with version control. Maybe I've decided I don't like the redecoration I've done in the kitchen. It would be great if I could just revert it at the touch of a button!

Beyond this, version control lets you keep a log message for each commit (each time something is changed), so you know why things were altered. You can even ‘blame’ each line in a file (svn blame) on the person who last committed a change to it, so when you see that line 134 in style.css is messing up your entire page layout, you know who wrote it, when it was written, and why.

The Subversion server I use is a dedicated machine: an Ubuntu box that exists purely to serve version-controlled files from the repositories. The repositories themselves are not even on the Subversion server; they're on a NAS (network-attached storage) device, so if the Subversion server dies, the repositories can be retrieved and used with a replacement server. Of course, the repositories are backed up too, because a NAS failure without a backup would be catastrophic.

Individual Development
Each individual working on Drupal projects has a WAMP server on his or her local machine. Drupal is installed here and runs under Apache 2. One important decision here is that there is only a single Drupal installation running multiple sites. These sites reside in subdirectories in the sites folder, and each site has its own repository.

Site-specific modules are installed in the site's own modules folder, whereas modules that will be used on all or most sites (such as CCK and Webform) are kept in the sites/all/modules folder, which means there is just one codebase for each of these modules. This folder is kept in its own repository, so when modules are updated, the changes are committed and a tag is created.

This means that only one member of the production team handles module updates. The others simply update their working copy of the sites/all/modules repository and everything is up to date.
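A simplified sketch of why a site's own modules folder and the shared sites/all/modules folder can coexist. It assumes my understanding of Drupal's module discovery (drupal_system_listing() in Drupal 6): core's modules folder, then sites/all/modules, then the site's own modules folder are scanned in that order, and a later copy of a module with the same name overrides an earlier one. Install profiles are ignored, resolve_module_paths() is a hypothetical name, and the $available array stands in for scanning the disk.

```php
<?php
// Simplified sketch of module lookup precedence in a multisite install.
// $available maps a directory to the module names found in it (a
// hypothetical stand-in for scanning the file system).
function resolve_module_paths($site_dir, array $available) {
  // Scanned in this order; later matches override earlier ones.
  $search_dirs = array('modules', 'sites/all/modules', "sites/$site_dir/modules");
  $resolved = array();
  foreach ($search_dirs as $dir) {
    if (isset($available[$dir])) {
      foreach ($available[$dir] as $module) {
        $resolved[$module] = "$dir/$module";
      }
    }
  }
  return $resolved;
}

$available = array(
  'modules' => array('node', 'user'),
  'sites/all/modules' => array('cck', 'webform'),
  'sites/foo.example.com/modules' => array('mcimage', 'webform'),
);
print_r(resolve_module_paths('foo.example.com', $available));
```

One consequence is that a single site can carry its own patched copy of a shared module without affecting the others.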

Potential Problems with a Shared Environment
The alternative to having developers work locally on their own machines is to have a shared development server with something like a Samba share that the production team use to access and edit the files. The main drawback with this approach is that it's all too easy to have two individuals editing the same file at the same time, potentially overwriting one another's changes. The style.css and template.php files in Drupal themes are especially vulnerable.

Using local copies and version control all but eliminates this kind of thing. Let's say two people have edited style.css. Person A makes her changes, updates her working copy (this should always be done before committing) and sees that there are no changes, so she commits. Person B makes his changes, and updates his working copy. Assuming that A and B have edited different lines in the file, person B will see that person A's changes have been merged into his changes, so when he commits, the combined efforts of both will be saved.

If A and B have edited the same lines in the file, when person B updates his copy, he will see that there has been a conflict, but will be offered both versions of the file and shown the exact lines that are conflicting. He can then select which side of the conflict ‘wins’ or, if neither is correct, edit the conflicting line to incorporate both changes. Nothing will get overwritten unless B wants it to be.
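The merge behaviour described above can be illustrated with a toy three-way merge. This is not how Subversion implements merging (it works from diffs and copes with added and removed lines); the sketch assumes all three versions have the same number of lines, and merge_lines() is a hypothetical name. For each line it compares the common base with both edited copies: a line changed on only one side takes that side's version, and a line changed differently on both sides is flagged as a conflict.

```php
<?php
// Toy three-way merge, for illustration only. Assumes the base copy, my
// working copy and the repository copy all have the same number of lines
// (lines edited in place, none added or removed).
function merge_lines(array $base, array $mine, array $theirs) {
  $merged = array();
  $conflicts = array();
  foreach ($base as $i => $original) {
    if ($mine[$i] === $theirs[$i]) {
      $merged[] = $mine[$i];      // Both agree (possibly both unchanged).
    }
    elseif ($mine[$i] === $original) {
      $merged[] = $theirs[$i];    // Only the other person changed this line.
    }
    elseif ($theirs[$i] === $original) {
      $merged[] = $mine[$i];      // Only I changed this line.
    }
    else {
      $merged[] = $mine[$i];      // Both changed it: keep mine, flag a conflict.
      $conflicts[] = $i;
    }
  }
  return array('lines' => $merged, 'conflicts' => $conflicts);
}

$base = array('h1 { color: black; }', 'p { margin: 0; }', 'a { color: blue; }');
$person_a = array('h1 { color: navy; }', 'p { margin: 0; }', 'a { color: blue; }');
$person_b = array('h1 { color: black; }', 'p { margin: 0; }', 'a { color: red; }');

// Person B updates: his edit and A's committed edit touch different lines.
print_r(merge_lines($base, $person_b, $person_a));
```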

Development and Testing Server
When changes are ready to show to a client, a dedicated development and testing server is used. The working copy on this system is simply updated, and the client is given a special development URL on which to view the work that has been carried out.

Shared Development Database
When I said that the production team work locally on their own machines using WAMP, this is true for the site's files, but not for the database. A database can't practically be kept under version control the way files can, so consider this:

Person A is working locally with a local database and creates a node. There were 10 nodes already, so this is node 11. Person B is working locally with a local database too, and is working on a different aspect of the site. B creates a node too, but in B's database, this is node 11. Therefore, node 11 exists in two places, and is actually two separate nodes. Now imagine this kind of thing happening quite often as various team members work on different bits of the site.

When it comes time to merge all these databases together to create a final testing version of the site, all hell breaks loose. It simply isn't possible to automatically handle situations where the same ID in two different databases refers to two different things.
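A tiny simulation makes the problem concrete. The arrays below are hypothetical stand-ins for the node tables in two local databases, keyed by node ID, with a rough auto-increment-style ID generator; they are not Drupal's real schema.

```php
<?php
// Roughly mimic an auto-increment column: the next ID is one more than the
// highest existing ID.
function example_create_node(array &$nodes, $title) {
  $nid = empty($nodes) ? 1 : max(array_keys($nodes)) + 1;
  $nodes[$nid] = $title;
  return $nid;
}

// Both people start from the same 10 nodes.
$shared_start = array();
for ($i = 1; $i <= 10; $i++) {
  $shared_start[$i] = "Existing node $i";
}

$db_a = $shared_start;
$db_b = $shared_start;
$nid_a = example_create_node($db_a, 'Person A: About us');
$nid_b = example_create_node($db_b, 'Person B: Contact');

// Both now have a node 11, and they are different nodes. A naive merge
// keeps only one: PHP's + keeps the left-hand value for duplicate keys.
$merged = $db_a + $db_b;
echo count($merged), "\n"; // 11, not 12: Person B's node is lost.
echo $merged[11], "\n";    // Person A: About us
```

A smarter merge would still have to decide which node 11 is the real one and renumber the other, along with everything that refers to it.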

For this reason, I use a MySQL server on the development and testing system as the ‘central’ development system. Each individual is running his or her own copy of Drupal and has his or her own copy of the files, but utilises the same database. That way, when person A adds a new node, view, user or whatever else, it is instantly visible on person B's copy of the site too.
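In practice, sharing the database just means that each developer's local settings.php points at the MySQL server on the development and testing machine instead of at localhost. A minimal sketch for Drupal 5 or 6, where the connection is set with $db_url; the host name, user, password and database name are all hypothetical.

```php
<?php
// sites/<site folder>/settings.php on a developer's local copy.
// Hypothetical credentials and host: the database lives on the shared
// development and testing server, not on the developer's own machine.
$db_url = 'mysql://foo_dev:secret@devserver.example.com/foo_dev';
$db_prefix = '';
```

The MySQL account will also need permission to connect from the developers' machines, not only from the server itself.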

There are problems with this approach, for sure. Suppose person A has created a view with some templates. The view exists in the database, so it is visible on all of the production team's copies. Person A has not yet committed the templates to the Subversion repository, so they don't exist on person B's copy of the site. The view will almost certainly look very odd to person B until the templates are committed.

Another situation can arise where modules differ. Suppose person A is using a special module on the site to import some content. Once it's in the database, the module can be removed, and nobody else on the production team needs to do anything, because they're all sharing the same database. The module implements hook_menu() to provide an import page where settings can be chosen. When the module is installed on person A's copy of the site, the menu cache is updated to include the path to this import page.

All is well so far, but suppose that person B clears the site's cache for some reason. The import module does not exist on B's copy of the site, so the regenerated menu cache will not include the path to the import page any more. Person A will suddenly find that the menu option for the import page has disappeared! Luckily this is easily resolved (person A simply needs to clear the cache again) but this shows the type of unexpected event that might arise as a result of sharing a Drupal database across multiple copies of the site.

Even so, the benefits of sharing a single database far outweigh the drawbacks, so this is my preferred method, rather than having multiple copies of the database and attempting to merge them together.

Multisite Hostname Problems
Combining the multisite approach with each member of the production team running their own copy of every site raised one big hurdle: the way Drupal decides which site to serve in a multisite environment.

When using a single Drupal install and single copy of Apache for multiple sites, all of Apache's virtual hosts use the Drupal install as their document root, and Drupal selects the appropriate site (and therefore the database, via settings.php) by examining the hostname.

For example, suppose we want to run two sites on the same copy of Drupal: www.foo.com and www.bar.com. In Drupal's sites folder, we create one folder called foo.com and one called bar.com. When a user's browser makes a request for www.foo.com, Apache handles this request, serves the document from the Drupal install folder (with Drupal, it's almost always index.php), and Drupal knows that because the request asked for www.foo.com, it should use the foo.com site and not the bar.com one.

If we apply this to our development environment, our main development and testing server (example.com, for argument's sake) can be accessed from either foo.example.com or bar.example.com, to see each site's development and testing copy. Great, but how does person A access her local copy of the site? She could add an entry to her hosts file to redirect foo.example.com to 127.0.0.1, but then how would she see the copy on the development and testing server? She would have to remove or comment out the hosts file entry, which quickly gets confusing, because it is easy to forget which version of the site she is looking at at any given moment.

Luckily, there's a better way. The folder sites/foo.example.com exists on the development and testing server and also on person A's local copy. Person A alters the Apache virtual host entry for her local machine from foo.example.com to local.foo.example.com. She then adds an entry into the hosts file, directing local.foo.example.com to 127.0.0.1. Now, she can access her local copy by putting local.foo.example.com into her browser, and the development and testing copy by using foo.example.com, all the while keeping the structure of the sites folder the same in both places.
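The reason local.foo.example.com still picks up sites/foo.example.com lies in how Drupal maps a hostname to a folder under sites. As I understand conf_path() in Drupal 5 and 6, it tries the full hostname first, then strips labels off the left one at a time until it finds a matching folder, and falls back to sites/default. The sketch below simplifies this: it ignores port numbers and subdirectory installs, find_site_dir() is a hypothetical name, and an array stands in for the folders on disk.

```php
<?php
// Simplified sketch of multisite folder selection: try the full hostname,
// then drop one label at a time from the left, then fall back to 'default'.
function find_site_dir($host, array $site_dirs) {
  $labels = explode('.', strtolower(rtrim($host, '.')));
  for ($j = count($labels); $j > 0; $j--) {
    $candidate = implode('.', array_slice($labels, -$j));
    if (in_array($candidate, $site_dirs, TRUE)) {
      return $candidate;
    }
  }
  return 'default';
}

$dirs = array('foo.com', 'bar.com', 'foo.example.com', 'bar.example.com');
echo find_site_dir('www.foo.com', $dirs), "\n";           // foo.com
echo find_site_dir('foo.example.com', $dirs), "\n";       // foo.example.com
echo find_site_dir('local.foo.example.com', $dirs), "\n"; // foo.example.com
```

The same rule explains the earlier example, where a request for www.foo.com is served by sites/foo.com.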

Keeping up with Site Files
Quite often, clients will want to populate their new sites with stories and articles before the site launches. Sometimes, I will want to give them this privilege before all areas of the site are finished. A situation arises where the client has uploaded files such as thumbnails or PDFs. These exist on the development and testing server, but not on any of the local copies used by the production team.

Luckily, SyncToy can help with this. Simply create a folder pair that echoes the development and testing server's files to the local system. Each time any member of the production team notices that their local copy is missing some thumbnails or other files, he or she can just synchronise the folder pair to receive all of the latest files. Alternatively, a schedule can be created to do this automatically at regular intervals.

Of course, this will not help in situations where the production team has uploaded files to their local machines that need copying to the development and testing server, but this is probably going to happen much less frequently, and there is always the option of synchronising both ways for this eventuality.
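The direction of travel here is easy to get wrong, so it may help to spell it out. This hypothetical sketch is not how SyncToy works internally; it just compares two listings (relative path => content hash) and works out which files need copying down from the development and testing server, and which exist only locally and so would need copying up. It never deletes anything, so files uploaded locally are left alone.

```php
<?php
// Hypothetical one-way "copy down" plan. Both listings map a relative path
// to a content hash; nothing is ever deleted.
function sync_down_plan(array $server, array $local) {
  $copy_down = array();
  $local_only = array();
  foreach ($server as $path => $hash) {
    if (!isset($local[$path]) || $local[$path] !== $hash) {
      $copy_down[] = $path;   // New on the server, or changed since last sync.
    }
  }
  foreach ($local as $path => $hash) {
    if (!isset($server[$path])) {
      $local_only[] = $path;  // Uploaded locally; would need copying up.
    }
  }
  return array('copy_down' => $copy_down, 'local_only' => $local_only);
}

$server = array('files/a.pdf' => 'h1', 'files/thumb.jpg' => 'h2', 'files/new.png' => 'h3');
$local = array('files/a.pdf' => 'h1', 'files/thumb.jpg' => 'old', 'files/mine.gif' => 'h4');
print_r(sync_down_plan($server, $local));
```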

Live Server Setup
The live sites run on a single Drupal install on a dedicated Apache server, while the live databases run on a separate system that does nothing but run databases (and has MySQL in dedicated mode for best performance). For medium loads and above, I would always recommend separating these functions, so that Apache does not slow MySQL and MySQL does not slow Apache.

I have not seen any viable clustering solutions for hosting Drupal sites, but luckily I am in a position where all the sites can be run from one web server and one database server. Even so, I would be interested to hear about any successful clusters out there.

Amending the Live Site
Unfortunately, making amendments to the live site is probably the biggest hurdle left in this particular production environment. Fixing a rogue CSS class is fine: the change is made on a production team member's local system and tested. The change is committed, then the development and testing server is updated, so that it receives this change too. The change is tested there. Assuming it is ready to be made live, the live server is updated and everyone is happy.

The real problem comes when, for example, a new node needs to be added. Suppose that the client has asked for a new webform. First, the live database is copied down to the development and testing server. A member of the production team produces the webform on her local system. This shares the database with the development and testing server, so unless template or CSS changes are required, the job is done and it can be tested.

Suppose there were 100 nodes before the work began. On the development and testing database, the webform is node 101. Now suppose that in the time taken to create and test the webform, 5 nodes have been created on the live site. The development and testing database cannot simply be copied back over, because this would wipe those 5 nodes!
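Before copying a database back up, it is at least possible to detect this situation. A minimal, hypothetical check (the function name is mine, and fetching the node IDs from the live database is left out): any live node ID higher than the highest ID in the snapshot that was copied down means content was created on live in the meantime.

```php
<?php
// Hypothetical pre-flight check: return the live node IDs created after the
// snapshot was taken. A non-empty result means copying back would lose them.
function nodes_created_since_snapshot(array $live_nids, $snapshot_max_nid) {
  $new = array();
  foreach ($live_nids as $nid) {
    if ($nid > $snapshot_max_nid) {
      $new[] = $nid;
    }
  }
  return $new;
}

$live_nids = range(1, 105);  // 100 nodes at snapshot time, plus 5 since.
$at_risk = nodes_created_since_snapshot($live_nids, 100);
if ($at_risk) {
  echo 'Do not copy back: live nodes ', implode(', ', $at_risk), " would be lost.\n";
}
```

This only catches new nodes. Edits to existing nodes, or new comments and users on live, would slip through, so it is a warning light rather than a guarantee.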

One solution is to put the live site into maintenance mode while the webform is created. This way, the database will be locked off to the general public, so it can be copied down to development and testing, worked upon, then copied back up without the risk of wiping content.

This might be fine for a 5-minute job, but what if the work requires 4 hours? The site cannot very well be put into maintenance mode for 4 hours during the working day just so a small change can be made!

The answer at the moment seems to be to write everything down. Write down the settings you use to create the new webform. Once it has been tested and is ready to go live, just create the webform again on the live server. This is not sophisticated or classy in any way, but appears to be the best option.

Your Input
I would be interested to hear thoughts on the way this particular production environment is set up and how it might be improved, especially where I have noted problems. I'd also love to hear how you do things and why, because I think I learn a lot from other people's experiences.

Drupal 5: user_save and profile fields

I was recently required to import a large number of users into a Drupal 5 site, so I wrote a simple import module to take rows from a CSV file and pass them to user_save(). In addition to the basic user information in the {users} table, I needed to create several profile fields too. This turned out to be far more complicated than it should have been.

The first thing I noticed is that the documentation for user_save() is not exactly stellar.

$account The $user object for the user to modify or add. If $user->uid is omitted, a new user will be added.

Fine, but is this where I should be putting my new user's information, or perhaps I should use the next parameter?

$array An array of fields and values to save. For example array('name' => 'My name'); Setting a field to NULL deletes it from the data column.

Ok, fine. Maybe I should use this one for adding my new user data. This doesn't mention anything about the profile fields though, or explain what I should be doing with the $account parameter. Maybe there's something else?

$category (optional) The category for storing profile information in.

What? Category? Now I'm really confused. There's nothing obvious in user_save() that suggests how the profile fields get saved, or even where to put the profile fields. My only real clue is the call to user_module_invoke() towards the end of the function. This calls hook_user() in all the active modules on the site, and the one I'm interested in is the profile module, so my next stop was profile_user(). In turn, this calls profile_save_profile() with the details from the original call to user_save().

It was at this stage that I noticed that $category must refer to the various groups that profile information can be put in. For example, you can create a category for personal information, and one for notification preferences, and doing so will split the fields onto different tabs when the user edits his or her profile. Unfortunately, $category is a string, not an array, so for each call to profile_save_profile(), only one category can be changed.

Because profile_save_profile() is only called once per user_save(), it appears that when creating a user, it is only possible to create profile fields in one group! This causes a problem for me because I needed to import lots of profile fields in several groups.

My solution was to temporarily move all the profile fields into a single group. Once I had done that, I could populate $array with the information destined for the {users} table and the profile fields (this was not documented anywhere). It turns out I could just use NULL for $account (again, this was not documented).
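A sketch of the data side of that approach: turning one CSV row into the single $array passed to user_save(), with the profile fields (all temporarily moved into one category) alongside the {users} fields. The CSV column order, the profile_* field names and the 'Imported' category are hypothetical, and the user_save() call is left as a comment because it needs a bootstrapped Drupal 5 site.

```php
<?php
// Hypothetical CSV layout: username, email, password, first name, company.
// The profile field names are examples; use the names your fields have.
function build_user_edit($csv_line) {
  list($name, $mail, $pass, $first_name, $company) = str_getcsv($csv_line, ',', '"', '\\');
  return array(
    // Columns destined for the {users} table.
    'name' => $name,
    'mail' => $mail,
    'pass' => $pass,
    'status' => 1,
    // Profile fields, all in the single category passed to user_save().
    'profile_first_name' => $first_name,
    'profile_company' => $company,
  );
}

$edit = build_user_edit('jsmith,jsmith@example.com,s3cret,John,"Acme, Inc."');
// Inside Drupal 5, with every profile field temporarily in one category:
// $account = user_save(NULL, $edit, 'Imported');
print_r($edit);
```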

Surely this is not the ideal way of creating new users programmatically. My solution does work, but it is annoying and time-consuming. Is there another way to create users with profile fields in multiple categories?

Tuesday 4 November 2008

Drupal 5: Don't rely on the node's path attribute

I recently noticed that some of the nodes on a site I was working on were not linking properly from the teaser to the full node view. It turns out that the hyperlink tag's href was empty. Checking the template for the node teaser, I found that it was using this:


print l($node->title, $node->path);


This appears to work fine for nodes with a path alias (set up via the path or pathauto modules), but not for other nodes, because $node->path was empty. I believe the following code should have been used instead: it always starts from the node's system path (node/NID) and lets l() substitute the most appropriate alias:


print l($node->title, 'node/' . $node->nid);
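To see why the second form is more robust, here is a standalone sketch of the idea: always build the system path from the nid, then swap in an alias only if one exists. The $aliases array stands in for the path module's alias table and example_link_href() is a hypothetical name; inside Drupal, l() does this lookup for you by way of url().

```php
<?php
// Hypothetical stand-in for the alias lookup: return the alias for a system
// path if one exists, otherwise the system path itself.
function example_link_href($system_path, array $aliases) {
  return isset($aliases[$system_path]) ? $aliases[$system_path] : $system_path;
}

$aliases = array('node/5' => 'about-us');     // Only node 5 has an alias.
$with_alias = (object) array('nid' => 5, 'path' => 'about-us');
$without_alias = (object) array('nid' => 7);  // No path attribute at all.

// Fragile: relies on $node->path, which is missing for unaliased nodes.
$fragile = isset($without_alias->path) ? $without_alias->path : '';

// Robust: always start from node/NID and let the lookup choose.
echo example_link_href('node/' . $with_alias->nid, $aliases), "\n";    // about-us
echo example_link_href('node/' . $without_alias->nid, $aliases), "\n"; // node/7
```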