Recently I was required to import about 7000 nodes into Drupal 5. I was not entirely sure how to achieve this, so I decided to use SQL scripting. As it turns out, a more appropriate method would have been to write a PHP script that builds each node and calls node_save() to perform the import.
That aside, I ran into a problem where none of the manually imported nodes were appearing in search results. Nodes created within Drupal worked fine, appearing in search results as expected. My manually imported nodes also worked fine on the site itself: navigating to node/1234 displayed the node as expected, and they worked fine with Views, but there was nothing in the search results.
My first stop was to tell Drupal to regenerate the search index, by visiting admin/settings/search and using the functionality there. The correct number of nodes was reported on this page, but after clicking the reindex button and running cron.php once, it reported that 97% of the nodes had been indexed. That was far too quick for roughly 7000 nodes to have been processed, and a quick search showed that the manually imported nodes were still not appearing in search results.
I began to speculate that because the manually imported nodes were last modified (the changed column in the database) before the last time the indexer was run, they weren't being ‘seen’ by the indexer, since it only pays attention to things that have changed since its previous run.
I did try backing up the {node} table, setting the changed timestamp for all nodes to something after the last time the indexer was run, then reindexing everything again, but this seemed to make no difference. If I didn't know better, I would have said that Drupal was taking one look at the sheer amount of stuff it was going to have to index, and freaking out!
In the end, my solution was to create a simple module that took advantage of the _node_index_node() function from Drupal 6. This seemed to work with no compatibility issues (I was using Drupal 5 for this). For each node, the module checked whether the node's nid existed in the {search_dataset} table (as a sid, not a nid), and if it didn't, called _node_index_node() to forcibly index it. It was a very slow process, but it appears to have done the trick.
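The "is this node missing from the index?" check boils down to a single query, so here is a rough, standalone sketch of it. This is not the module itself: it is written in Python against an in-memory SQLite database so it can be run without a Drupal install. The table layouts are simplified, the table names are shown without Drupal's {} prefix handling, and the function names are made up for illustration. Filtering on status = 1 (published nodes) and on type = 'node' are my assumptions about what you would want, not something the module necessarily did.

```python
import sqlite3


def find_unindexed_nids(conn):
    """Return nids of published nodes that have no 'node' row in search_dataset.

    The search index stores the node ID in the sid column, so the join is
    on search_dataset.sid = node.nid.
    """
    rows = conn.execute(
        """
        SELECT n.nid
        FROM node n
        LEFT JOIN search_dataset d
               ON d.sid = n.nid AND d.type = 'node'
        WHERE n.status = 1      -- assumption: only published nodes matter
          AND d.sid IS NULL     -- no matching index entry
        ORDER BY n.nid
        """
    ).fetchall()
    return [nid for (nid,) in rows]


def build_demo_db():
    """Create a tiny in-memory stand-in for the two tables (simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (nid INTEGER PRIMARY KEY, status INTEGER, changed INTEGER);
        CREATE TABLE search_dataset (sid INTEGER, type TEXT, data TEXT);
        INSERT INTO node VALUES (1, 1, 100), (2, 1, 100), (3, 0, 100), (4, 1, 100);
        -- Node 1 is indexed; sid 2 belongs to a non-node entry, so node 2 is not.
        INSERT INTO search_dataset VALUES (1, 'node', 'indexed text');
        INSERT INTO search_dataset VALUES (2, 'user', 'not a node entry');
        """
    )
    return conn


if __name__ == "__main__":
    # Nodes 2 and 4 are published but missing from the index; node 3 is unpublished.
    print(find_unindexed_nids(build_demo_db()))
```

In the real module, each nid returned by a query like this would be passed on to _node_index_node(). Doing that in small batches per cron run, rather than all at once, would probably have made the process less painful.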
I suppose the question I am left with after this project is: why didn't Drupal reindex these nodes? Was my suspicion about the indexer not ‘seeing’ them correct? Should one always import nodes using PHP and Drupal's node_save()? While my method works, it is cumbersome and I would always seek the most efficient way of doing this, so please leave a comment if you can elaborate on this!