But we're not quite ready to go live. In this and the next post, we'll be installing Sphinx as your back-end search engine and doing some additional configuration work.
People will constantly be searching your wiki for that exact bit of information they need, so you'll need good search functionality. The basic search included in your wiki installation doesn't provide it.
Sphinx is a stand-alone, lightning-fast indexing and search engine. We'll install it, then give it access to the MediaWiki MySQL database. Finally, we'll install the SphinxSearch MediaWiki extension, which passes search queries from wiki users to Sphinx and returns Sphinx's results to your wiki.
I realize this post is a bit extensive, but rest assured, it's worth it.
SphinxSearch installation
Using Firefox, download your preferred version from http://sphinxsearch.com/downloads/. I've chosen the most recent beta. Pick the Ubuntu “.deb” package (12.04 LTS at the time of writing): i386 for a 32-bit Ubuntu installation, x86_64 for a 64-bit one. Despite the name, the x86_64 package runs on 64-bit Intel processors as well as AMD ones.
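If you're unsure which of the two packages you need, the system can tell you. The sketch below is a hypothetical helper (the name sphinx_package_arch is made up for this example). It assumes the download page labels its Ubuntu packages i386 and x86_64 as described above, and it asks dpkg for the architecture of the installed Ubuntu system. Inside a virtual machine, that is the guest system, which is what counts.

```shell
# Hypothetical helper: map this system's architecture to the package label
# used on the Sphinx download page (i386 or x86_64).
sphinx_package_arch() {
  # Default to dpkg's answer; inside a virtual machine this describes the guest.
  arch="${1:-$(dpkg --print-architecture)}"
  case "$arch" in
    amd64|x86_64) echo x86_64 ;;
    i386|i486|i586|i686) echo i386 ;;
    *) echo "no Ubuntu package listed for: $arch" >&2; return 1 ;;
  esac
}
```

Running sphinx_package_arch without arguments on a 64-bit Ubuntu installation prints x86_64.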
After clicking the package link, choose to open it with the Ubuntu Software Center rather than just saving it. Once the file is downloaded, the Software Center will analyze it and offer to install it. Click the Install button and let Ubuntu do the work.
You might wonder if there isn't a sudo apt-get install something instead; after all, it’s been going great that way. Well, there is. But we're not using it this time, because the apt-get version is badly outdated and rarely updated.
Alright, the install is done. Now we need to provide a configuration file that makes Sphinx work nicely with MediaWiki. The SphinxSearch extension provides one that is preconfigured for MediaWiki (we'll get to that in the next section). Sphinx also ships with its own configuration file, a general one you can use as a starting point when you're configuring Sphinx for other purposes. That file is loaded automatically and will mess things up for us, since we'll keep our preconfigured one in a different location. So the first thing you need to do is delete it.
Open the terminal and type:
gksu nautilus
This starts the GUI file manager with superuser permissions. Use it to navigate to File System > etc > sphinxsearch, where you'll find a file called sphinx.conf. Delete this file; we'll set up a different configuration file in a minute.
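If you'd rather stay in the terminal, this step is a single command. The sketch below is a hypothetical helper that renames the sample file to sphinx.conf.orig instead of deleting it, so Sphinx no longer picks it up but you can still consult it later. It assumes the file sits in /etc/sphinxsearch as described above.

```shell
# Hypothetical helper: move Sphinx's general-purpose sample config out of the way
# instead of deleting it, keeping a renamed copy for reference.
retire_default_sphinx_conf() {
  dir="${1:-/etc/sphinxsearch}"   # folder named in the step above
  # Only act if the sample file is still there; re-running is harmless.
  if [ -f "$dir/sphinx.conf" ]; then
    mv "$dir/sphinx.conf" "$dir/sphinx.conf.orig"
  fi
}
```

Without the helper, sudo rm /etc/sphinxsearch/sphinx.conf deletes the file as described, and sudo mv /etc/sphinxsearch/sphinx.conf /etc/sphinxsearch/sphinx.conf.orig keeps a renamed copy.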
Configuration - MediaWiki extension
Using Firefox inside your virtual machine, download the extension from http://www.mediawiki.org/wiki/Special:ExtensionDistributor/SphinxSearch
When the download is complete, open the terminal and enter the following commands:
cd Downloads
tar xvzf wikimedia-mediawiki-extensions-SphinxSearch*.tar.gz
Move the extracted folder. Check the output of the extraction for the exact folder name of your version (the f6f56dd part may be different):
sudo mv wikimedia-mediawiki-extensions-SphinxSearch-f6f56dd /etc/w/extensions/SphinxSearch
(note that this is one long line)
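Because the folder name contains a version-specific part, it's easy to mistype. The following is a sketch of a hypothetical helper (not part of the extension) that reads the top-level folder name from the archive itself and then moves that folder to extensions/SphinxSearch. It assumes exactly one matching archive sits in the download folder and that no SphinxSearch folder exists yet.

```shell
# Hypothetical helper: unpack the SphinxSearch download and move it into place
# without typing the version-specific folder name (e.g. ...-f6f56dd).
install_sphinx_extension() {
  downloads="$1"     # e.g. /home/yourname/Downloads
  extensions="$2"    # e.g. /etc/w/extensions
  # Refuse to run if a SphinxSearch folder is already there (mv would nest it inside).
  if [ -e "$extensions/SphinxSearch" ]; then
    echo "$extensions/SphinxSearch already exists" >&2
    return 1
  fi
  # Expand the same pattern as the tar command above; with no match, $1 stays literal.
  set -- "$downloads"/wikimedia-mediawiki-extensions-SphinxSearch*.tar.gz
  if [ ! -f "$1" ]; then
    echo "no SphinxSearch archive found in $downloads" >&2
    return 1
  fi
  tarball="$1"       # the first match; assumes only one archive was downloaded
  # The first entry listed in the archive is its top-level folder.
  topdir=$(tar tzf "$tarball" | head -n 1 | cut -d/ -f1)
  tar xzf "$tarball" -C "$downloads" || return 1
  mv "$downloads/$topdir" "$extensions/SphinxSearch"
}
```

Since /etc/w/extensions belongs to root, call it from a root shell (for example after sudo -s) as install_sphinx_extension /home/yourname/Downloads /etc/w/extensions, with yourname replaced by your own username.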
Now let's move the sphinx.conf file and prepare a folder structure for Sphinx. In the terminal you still have open, type the following series of commands:
cd /etc/w/extensions/SphinxSearch
sudo mkdir /usr/local/var/
sudo mkdir /usr/local/var/data
sudo mkdir /usr/local/var/data/sphinx
sudo mkdir /usr/local/var/data/sphinx/wiki_main
sudo mkdir /usr/local/var/data/sphinx/wiki_incremental
sudo mkdir /usr/local/var/log
sudo mkdir /usr/local/var/log/sphinx
sudo mv sphinx.conf /usr/local/var/data/sphinx
cd /usr/local/var/data/sphinx
Update the default configuration using nano:
nano sphinx.conf
Enter your database name (mediawiki), username (mediawiki) and password (you set this earlier).
You also need to change all the paths to match the folder structure we've created above. This means you need to:
- replace all occurrences of /var/data/... with /usr/local/var/data/...
- replace all occurrences of /var/log/... with /usr/local/var/log/...
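The folder creation and path changes above can also be scripted. The sketch below is a hypothetical helper that creates the same folders with mkdir -p (which creates missing parent folders and doesn't complain if they already exist), moves sphinx.conf into place, and rewrites the /var/data/... and /var/log/... paths with sed. It assumes those paths appear after whitespace or an equals sign, as in path = /var/data/..., and that the base path contains no # or & characters. You still enter the database name, username and password in nano.

```shell
# Hypothetical helper: the folder, move and path steps from this section in one go.
# Run it as root; database name, username and password are still edited in nano.
prepare_sphinx_layout() {
  conf="$1"                     # the extension's sphinx.conf
  base="${2:-/usr/local/var}"   # base folder used throughout this post
  # mkdir -p creates missing parent folders and does not fail if they exist.
  mkdir -p "$base/data/sphinx/wiki_main" \
           "$base/data/sphinx/wiki_incremental" \
           "$base/log/sphinx" || return 1
  mv "$conf" "$base/data/sphinx/sphinx.conf" || return 1
  # Rewrite /var/data/... and /var/log/... only where the path follows whitespace
  # or '=', so an already-changed /usr/local/var/... path is left alone.
  # The unedited file is kept as sphinx.conf.bak; $base must not contain # or &.
  sed -i.bak \
      -e "s#\([[:space:]=]\)/var/data/#\1$base/data/#g" \
      -e "s#\([[:space:]=]\)/var/log/#\1$base/log/#g" \
      "$base/data/sphinx/sphinx.conf"
}
```

From a root shell, prepare_sphinx_layout /etc/w/extensions/SphinxSearch/sphinx.conf uses /usr/local/var as the base folder, matching the paths in this post. Anchoring the match on the character before the path is what keeps the rewrite from turning /usr/local/var/data into /usr/local/usr/local/var/data if you already changed some paths by hand.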
Indexer & Folder Access
The indexer is a process that reads the articles in your wiki (using the queries in the sphinx.conf file) and builds an index of this textual data. The index allows very quick searching.
A little in-between theory on indexers: once you have tens of thousands of articles, indexing all of them takes some time. On the other hand, you want new articles added to your index very quickly, because they can't be found until they've been indexed. To meet both requirements, two indexes are used:
- The main index, which indexes all your articles but runs only once a day.
- The incremental index, which indexes only the articles added or updated since the last run of the main index, so it finishes quickly. We'll configure it to run every few minutes.
Enough theory. Let's run the indexer to check if it’s working (and at the same time create the initial index):
cd /usr/bin
sudo indexer --config /usr/local/var/data/sphinx/sphinx.conf --all
If it doesn't give you any errors, you're good. Otherwise, read the error message and try to fix whatever went wrong.
Sphinx needs to read and write in /usr/local/var/data/sphinx and /usr/local/var/log/sphinx, but because we created those folders with sudo, only root can access them so far. When Sphinx runs without sudo to serve MediaWiki (for example the scheduled indexer runs we'll set up below), it won't have access. So we need to change this. At the terminal, type:
gksu nautilus
This starts the GUI file manager in superuser mode again. Use it to navigate to the folders mentioned above: first click File System in the left pane of the file manager, then open the usr folder in the right pane. Continue through local and var to find the data and log folders.
Now right-click the sphinx folder (do this in both the data and the log folder), choose Properties and go to the Permissions tab. Set the group to the user group named after your username (the one you chose when installing Ubuntu). Then set the group's access so it can create and delete files in the folder, and read and write existing files. Don't forget to apply the permissions to enclosed files.
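The same permission change can be made from the terminal. This sketch is a hypothetical helper that gives a group ownership of the folders and everything in them, and grants that group read and write access plus the right to create and delete files in the folders. It assumes, as on a default Ubuntu install, that your user's group has the same name as your username.

```shell
# Hypothetical helper: command-line equivalent of the permission dialog above.
grant_sphinx_group_access() {
  group="$1"; shift                 # e.g. your username (Ubuntu's default group name)
  # Hand the folders, and everything in them, to that group.
  chgrp -R "$group" "$@" || return 1
  # Let the group read and write. Capital X adds execute (needed to enter folders
  # and to create/delete files in them) only on folders and already-executable files.
  chmod -R g+rwX "$@"
}
```

Without the helper, the equivalent commands are sudo chgrp -R "$USER" /usr/local/var/data/sphinx /usr/local/var/log/sphinx followed by sudo chmod -R g+rwX /usr/local/var/data/sphinx /usr/local/var/log/sphinx.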
SearchDaemon & Cron
The search daemon (searchd) is a process that listens continuously on a certain port and accepts queries from other processes (in our case, the SphinxSearch MediaWiki extension). It then runs the search against the indexes built by the indexer (which we ran once in the previous section).
To make the search daemon start automatically when your server starts up, you need to add it to the startup script (rc.local):
cd /etc
sudo nano rc.local
Before the line with “exit 0”, add:
/usr/bin/searchd --config /usr/local/var/data/sphinx/sphinx.conf >> /usr/local/var/log/sphinx/sphinx-startup.log 2>&1(one long line, space before and after the >>)
End with CTRL+o, enter, CTRL+x.
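If you'd rather not edit rc.local by hand, the insertion can be scripted. The sketch below is a hypothetical helper, assuming rc.local ends with an exit 0 line on its own, as the instructions above describe. It inserts the command right before that line (or appends it if there is none) and does nothing if the exact line is already there.

```shell
# Hypothetical helper: insert a command into an rc.local-style script just before
# its "exit 0" line (or at the end if there is none). Re-running adds nothing.
add_before_exit0() {
  file="$1"; cmd="$2"
  # Leave the file alone if the exact command line is already present.
  if grep -qxF -- "$cmd" "$file"; then
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Note: awk -v interprets backslash escapes; the searchd line contains none.
  # The result is copied back with cat (not mv) so rc.local keeps its owner and
  # execute permission.
  awk -v cmd="$cmd" '
    $0 == "exit 0" && !done { print cmd; done = 1 }
    { print }
    END { if (!done) print cmd }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}
```

From a root shell, call it with /etc/rc.local as the first argument and the searchd line shown above, in quotes, as the second.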
Now, to set up an automated task for the indexers:
crontab -e
This opens your crontab in nano. Add the following below the comments:
0 7 * * * /usr/bin/indexer --quiet --config /usr/local/var/data/sphinx/sphinx.conf wiki_main --rotate >/dev/null 2>&1; /usr/bin/indexer --quiet --config /usr/local/var/data/sphinx/sphinx.conf wiki_incremental --rotate >/dev/null 2>&1(one long line)
This runs the main index, followed by the incremental index, at 7 AM every day. To run the incremental indexer more often during the day, add another line below the previous one:
*/5 * * * * /usr/bin/indexer --quiet --config /usr/local/var/data/sphinx/sphinx.conf wiki_incremental --rotate >/dev/null 2>&1(again, one long line)
This will make the incremental indexer work every 5 minutes. You may want to lower the frequency if you’re having performance problems (e.g. */10 = every 10 minutes, */30 = every 30 minutes).
Again CTRL+o, enter, CTRL+x.
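The two crontab lines can also be added without opening an editor. The sketch below is a hypothetical filter (the name sphinx_cron_merge is made up) that reads your current crontab, prints it unchanged, and appends each of the two entries above only if an identical line isn't already present, so running it twice doesn't create duplicate jobs.

```shell
# Hypothetical helper: read a crontab on stdin and print it with the two Sphinx
# indexer entries from this post appended, unless they are already there.
sphinx_cron_merge() {
  conf=/usr/local/var/data/sphinx/sphinx.conf
  idx="/usr/bin/indexer --quiet --config $conf"
  daily="0 7 * * * $idx wiki_main --rotate >/dev/null 2>&1; $idx wiki_incremental --rotate >/dev/null 2>&1"
  often="*/5 * * * * $idx wiki_incremental --rotate >/dev/null 2>&1"
  current=$(cat)
  # Keep whatever was already in the crontab.
  if [ -n "$current" ]; then printf '%s\n' "$current"; fi
  # Append each entry only if an identical line is not present yet.
  printf '%s\n' "$current" | grep -qxF -- "$daily" || printf '%s\n' "$daily"
  printf '%s\n' "$current" | grep -qxF -- "$often" || printf '%s\n' "$often"
}
```

Install the result with crontab -l 2>/dev/null | sphinx_cron_merge | crontab -, run as the same user you'd use for crontab -e. The 2>/dev/null hides the "no crontab" message on a first run.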
MediaWiki extension
We need to make the Sphinx PHP API available to the extension. We do this by copying sphinxapi.php from /usr/share/sphinxsearch/api to /etc/w/extensions/SphinxSearch. Do this using the file manager.
Then we need to tell our wiki to use Sphinx for searching. Using the file manager, go to /etc/w and open the LocalSettings.php file. At the bottom of the file, add:
$wgSearchType = 'SphinxMWSearch';
require_once( "$IP/extensions/SphinxSearch/SphinxSearch.php" );
$wgEnableMWSuggest = true;
$wgEnableSphinxPrefixSearch = true;
Save and close.
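These settings can also be appended from the terminal. The sketch below is a hypothetical helper that appends the four lines above to LocalSettings.php unless the file already loads the SphinxSearch extension. The quoted 'EOF' marker stops the shell from expanding $IP and the $wg variables, so they reach PHP unchanged. It assumes the file has no closing ?> tag at the end; if yours does, add the lines before that tag by hand instead.

```shell
# Hypothetical helper: append the SphinxSearch settings to LocalSettings.php once.
enable_sphinx_search() {
  settings="$1"    # e.g. /etc/w/LocalSettings.php
  # Do nothing if the extension is already loaded, so re-running is harmless.
  if grep -qF 'extensions/SphinxSearch/SphinxSearch.php' "$settings"; then
    return 0
  fi
  # If the file doesn't end with a newline, add one so our lines start fresh.
  if [ -n "$(tail -c 1 "$settings")" ]; then echo >> "$settings"; fi
  # The quoted 'EOF' keeps the shell from expanding $IP and the $wg... variables.
  cat >> "$settings" <<'EOF'
$wgSearchType = 'SphinxMWSearch';
require_once( "$IP/extensions/SphinxSearch/SphinxSearch.php" );
$wgEnableMWSuggest = true;
$wgEnableSphinxPrefixSearch = true;
EOF
}
```

From a root shell, run enable_sphinx_search /etc/w/LocalSettings.php.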
Done!
Reboot the server so the search daemon starts up (as configured in rc.local). After the reboot, Sphinx will be serving your wiki users their search results.
In the following article, we'll be installing and configuring some other nice extras; we're not done pimping your workplace yet! But what you have so far is good to go, so at this point you can start adding information to your wiki and letting your colleagues/employees know about it.
Question about the Sphinxsearch installation package.
Does it matter which processor you are using if you install it in a virtual machine?
Pick the Ubuntu “.deb” package (12.04 LTS at time of writing, i386 for Intel, x86_64 for AMD processor).
Yes. It depends on which processor the environment (your virtual machine "player") is offering your virtual machine. This will probably be the same as the one you're using on your actual underlying machine.