Sunday, September 18, 2011

Search Engine Installation and Configaration

Introduction
htdig is a webpage search engine licensed under the GNU Public License. It uses a very simple configuration file to allow it to search only the webpages you specify. For example, you can exclude the cgi-bin or a testing directory from the search engine. In addition to installing it on a webserver, some programs use it as a search engine plugin such as Glade, the GTK+ User Interface Builder. In addition, it will create a searchable database of any website. You just supply to URL.
Installing htdig
  1. Download the latest version from the htdig ftp server.
  2. tar -xvfz htdig-3.1.5.tar.gz
  3. cd htdig-3.1.5
  4. ./configure
  5. make
  6. make install

Configuring htdig

Once you have htdig installed, you must make a few changes to the configuration file and the HTML templates into which the search results are embedded.

Configuration File

The configuration file for htdig is located at /opt/www/htdig/conf/htdig.conf. It is pretty self-explanitory. The main attributes you need to configure are as follows. It will work if you leave the defaults for the other options or change them if you wish.
Attribute Value Example
start_url URL of your site http://www.mywebsite.com
exclude_urls Directories you do not want searched separated by white spaces /cgi-bin/ /testing/
adminstrator Email address of administrator admin@mywebsite.com
search_results_header HTML file to be used as header of search results. Only use this if you don’t want to use the default location for the header file: /opt/www/htdig/common/header.html /home/httpd/search/header.html
search_results_footer HTML file to be used as footer of search results. Only use this if you don’t want to use the default location for the header file: /opt/www/htdig/common/footer.html /home/httpd/search/footer.html
nothing_found_file HTML file to be displayed if there is no match to search string entered. Only use this if you don’t want to use the default location for the header file: /opt/www/htdig/common/nomatch.html /home/httpd/search/nomatch.html
syntax_error_file HTML file to be displayed if there is a syntax error in the search string entered. Only use this if you don’t want to use the default location for the header file: /opt/www/htdig/common/syntax.html /home/httpd/search/syntax.html
HTML Templates

If you don’t want to use the default look-and-feel of htdig, you can edit the following files to use the look-and-feel of your website. The paths may be different if you choose to change the paths of them in your configuration file.
  • /opt/www/htdig/common/header.html
  • /opt/www/htdig/common/footer.html
  • /opt/www/htdig/common/nomatch.html
  • /opt/www/htdig/common/syntax.html
Post-installation and configuration
  1. Next, you must setup the search database by running the script /opt/www/htdig/bin/rundig.
  2. Copy the default search.html and images from /opt/www/htdocs/htdig to a directory named htdig off of your webRoot. If the images are not in this directory, they will not appear unless you configure it otherwise it htdig.conf.
  3. Copy /opt/www/cgi-bin/htsearch to the cgi-bin for your webserver.
  4. Test the search engine by opening search.html in your browser and entering a search string.
  5. Because the search engine uses a database to return results, the database must be rebuilt with the rundig command used in step 1 every time any pages are added to the website.
  6. If you want to configure anything else, refer the the htdig website. Pretty much everything is configurable with htdig.
Post a Comment