The other day a friend and I were discussing blog and forum software, and whether there were any good combination blog/forum products out there that were reasonably priced.
We'd both tried CommunityServer and found it was not really appropriate for our needs, and neither of us knew of any other prominent alternatives. The reason for our discussion was that we'd both noticed a potential for dual-use web sites offering both blog and forum features; sort of a "here's what I think...discuss amongst yourselves" kind of approach.
He is off now investigating the hazards of writing his own forum software, but while we were looking into ways to potentially allow single sign-on between BlogEngine.Net and [insert your favorite forum software here], we noticed something interesting...
Before proceeding, let me first say I really like BlogEngine.Net a lot. I have a great deal of respect for the team that developed (and continues to develop) it, and I think it's about the best open source blog software available in managed code. It's well thought out, with easy integration of extensions and widgets, not to mention theming instructions (even videos), and an avid community of users and contributors. I recommend it to anyone looking for a personal blog that they can also tweak to better suit their own needs. It was just this extensibility that led to today's post.
As I mentioned, we initially began looking at the code to see whether single sign-on was feasible, but during our investigation my friend pointed out that the search feature appeared to be aggregating what could turn out to be *very* large strings (the text from all posts, pages and comments in some cases) in memory, then using regular expressions to search the content and rank the results. To me, that seemed like it would work well for small blogs, but could run into scaling problems for a user who was either very prolific or very verbose (or anyone who had kept an active blog for a number of years without archiving their content).
I started looking around for freely available search engines, and ran across Lucene.Net. It appeared to have a fairly simple API, ported directly from its Java progenitor, so I started looking at BlogEngine.Net's widget framework for ways to supplant the existing search functionality with a true indexed approach. As it turned out, it wasn't bad at all. In total, I added seven (7) files - including 2 files for the widget editor, which is not strictly required, and 2 other code-behind files that could be done away with if I were a glutton for punishment - and had myself a working version of 'Search' using Lucene.Net.
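To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is neither BlogEngine.Net nor Lucene.Net code; the sample posts and the function names (`scan_search`, `build_index`, `index_search`) are made up for illustration. The first function rescans all the text with a regular expression on every query, while the second approach tokenizes the content once and answers queries with a lookup.

```python
import re
from collections import defaultdict

# Hypothetical sample content, invented for this illustration.
POSTS = {
    "post-1": "Lucene makes full text search fast",
    "post-2": "Regular expressions scan every post on every search",
    "post-3": "An index makes search fast even for large blogs",
}

TOKEN = re.compile(r"[a-z0-9]+")


def scan_search(term, posts):
    """Scan approach: run a regex over the text of every post, per query."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(pid for pid, text in posts.items() if pattern.search(text))


def build_index(posts):
    """Indexed approach: tokenize once, map each term to the posts containing it."""
    index = defaultdict(set)
    for pid, text in posts.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(pid)
    return index


def index_search(term, index):
    """Answer a query with a dictionary lookup instead of rescanning the text."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_index(POSTS)
```

Both functions return the same posts for a single-word query; the difference is that the scan does work proportional to the total amount of text on every search, while the index pays that cost once up front and again only when content changes.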
The widget - Lucene Search - relies heavily on the existing application architecture (and the existing concatenated text search feature) for its structure. In many cases, I simply copied a page or control (e.g. Search.aspx) and gutted the logic, using the same function names and arguments as the original. Like the original search feature, Lucene Search hooks static system-wide events for determining when documents are added, removed or changed, and then builds and maintains a running index of the blog's contents.
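The event-driven upkeep described above can be sketched roughly as follows. This is a Python stand-in for the pattern only: the `Event` class, the `post_saved` and `post_removed` events, and the `SearchIndex` class are all hypothetical names, not the actual BlogEngine.Net events or the widget's code.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


class Event:
    """Tiny stand-in for a static, system-wide event that handlers subscribe to."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self, *args):
        for handler in self._handlers:
            handler(*args)


# Hypothetical events; the real BlogEngine.Net event names are not shown here.
post_saved = Event()
post_removed = Event()


class SearchIndex:
    """Keeps a running index up to date by listening to the events above."""

    def __init__(self):
        self.terms = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                 # id -> terms currently indexed for it
        post_saved.subscribe(self.on_saved)
        post_removed.subscribe(self.on_removed)

    def on_removed(self, doc_id):
        # Drop every posting that points at the removed document.
        for term in self.docs.pop(doc_id, set()):
            self.terms[term].discard(doc_id)
            if not self.terms[term]:
                del self.terms[term]

    def on_saved(self, doc_id, text):
        # Treat a change as "remove the old version, then add the new one".
        self.on_removed(doc_id)
        tokens = set(TOKEN.findall(text.lower()))
        self.docs[doc_id] = tokens
        for token in tokens:
            self.terms[token].add(doc_id)

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))
```

Handling an edit as a delete followed by an add keeps the sketch simple: stale terms from the old version of a post never linger in the index.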
Lucene turned out to be just as flexible and easy to work with as it had first appeared. I started off with a basic console application to test my plans, and moved on from there. The only real trouble I had was the relative scarcity of comprehensive documentation of the engine itself. Many of the issues I faced completing my search widget were rooted in my inability to find an example similar to what I was attempting to do (note: complete API documentation is available, but where parsers and analyzers are concerned - of which there are many in Lucene - examples are really handy). There is only one outstanding issue that I believe may be attributable to an actual bug in Lucene.Net, and that relates to the apparent case-sensitivity of un-tokenized query terms. I'll try to write more about that in a later article once I have time to read through the Lucene.Net source code. It's possible that my own ignorance is at the bottom of that issue, as well.
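One plausible explanation for the case-sensitivity, offered here as an assumption rather than a confirmed diagnosis: an un-tokenized field is indexed verbatim as a single term, while query text that passes through a lowercasing analyzer no longer matches that stored term exactly. The toy index below is plain Python, not the Lucene.Net API; `ToyIndex`, `analyze`, and the field names are hypothetical and exist only to show how such a mismatch can arise.

```python
import re


def analyze(text):
    """Stand-in for an analyzer: split into words and lowercase them."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    def __init__(self):
        self.postings = {}  # (field, term) -> set of document ids

    def add(self, doc_id, field, value, tokenized=True):
        # Tokenized fields go through the analyzer; un-tokenized fields are
        # stored as one term, exactly as written (case included).
        terms = analyze(value) if tokenized else [value]
        for term in terms:
            self.postings.setdefault((field, term), set()).add(doc_id)

    def term_query(self, field, term):
        """Exact term lookup; the query text is not analyzed."""
        return sorted(self.postings.get((field, term), set()))

    def parsed_query(self, field, text):
        """Analyze the query text first, then require every term to match."""
        terms = analyze(text)
        sets = [self.postings.get((field, t), set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []
```

In this sketch, a value such as `BlogEngine` stored un-tokenized is found by `term_query` with the exact same casing, but not by `parsed_query`, because the analyzed query becomes `blogengine`. Whether that is what actually happens inside the widget would need checking against the Lucene.Net source.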
The widget can be downloaded here:
LuceneSearchWidget.zip 1.0.0 (126.94 kb)
The installation instructions are really pretty simple. I kept the folder hierarchy in the ZIP archive the same as it is in BlogEngine.Net 1.4.x.x, so all you have to do is extract the archive into the blog's root directory. No existing BlogEngine.Net files are mutilated in the course of the installation (all the files in the archive are "new"), and the only other item to note is that the NETWORK SERVICE account will need write access to the App_Data folder, which is the default location of the index files. Of course, if you don't already have that set up, your blog isn't working too well in the first place.
A few words about the widget's edit feature:
It works great, but doesn't actually do much at present. I was playing around with it, building in features to allow admins to keep several different indices on the same server. You can use it to store index locations (folder names), but the 'Set as Current' feature in the list of index folder locations won't actually change anything yet. That will be in the next release. The one feature you may find useful is the button labeled '(Re)Create Search Index'. Pressing that button will clear the existing search index for your blog (if any), and regenerate a new index of all your content (note: a new index will also be created the first time the widget is loaded if one does not already exist).
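The create-if-missing and rebuild-on-demand behaviour can be sketched like this. It is a simplified Python illustration that stores the index as a single JSON file; the function names and the file format are assumptions made for the example and have nothing to do with how Lucene.Net lays out its index files.

```python
import json
import os
import re
import tempfile

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each lowercased term to the sorted ids of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for token in set(TOKEN.findall(text.lower())):
            index.setdefault(token, []).append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def recreate_index(index_path, documents):
    """Discard any existing index file and write a fresh one from all content."""
    index = build_index(documents)
    directory = os.path.dirname(index_path) or "."
    # Write to a temporary file first, then swap it into place, so a failed
    # rebuild does not leave a half-written index behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(index, handle)
    os.replace(tmp_path, index_path)
    return index


def load_or_create_index(index_path, documents):
    """Build the index on first use; afterwards just load what is on disk."""
    if not os.path.exists(index_path):
        return recreate_index(index_path, documents)
    with open(index_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Note that both paths need write access to the index folder, which is the same requirement mentioned in the installation notes.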
As with virtually all freeware, this download is offered with no guarantees or warranties. Use at your own risk, but I hope you enjoy it. Let me know if you find any bugs.