OnSearch Installation

Contents

  1. Introduction
  2. System Requirements
  3. Automatic Installation
  4. Manual Installation
  5. Appendices
    1. Perl Configuration
    2. Apache Installation
      1. Apache 1.3
      2. Apache 2.0
    3. Configuring Apache
      1. Apache 1.3
        1. Load Modules
        2. Enable Handlers
        3. Add Permissions
        4. Create Password and Group Files
        5. Restart the Server
      2. Apache 2.0
    4. Acknowledgements

Introduction

This document describes how to install OnSearch. If the system already has the features that OnSearch needs (described in "System Requirements"), you should be able to install OnSearch by entering the following commands.
  $ su            # Log in as the system administrator.
  Password:
  # perl install  # Follow the prompts.
The install script tells you if the World Wide Web server is missing a feature that OnSearch needs. The section, "Configuring Apache," describes the changes to httpd.conf that OnSearch needs.

The section, "Automatic Installation," describes each step of the automatic installation process in detail.

System Requirements

  1. You must be able to log in as the superuser ("root") or set permissions using the Web server's user ID.

  2. The file systems where OnSearch is installed (/usr/local and the Apache DocumentRoot directory) must be writable; that is, they must not be mounted read-only, either locally or over a network. If the system does not have a /usr/local file system, the installation process will create one.

  3. The system must have sufficient disk space for the indexes. They can add significantly to a Web site's storage.

  4. An Apache Web server, version 1.3 or later. If you don't have Apache, you can find it at http://www.apache.org.

    The Web server must have:

    1. Handlers to run Common Gateway Interface scripts outside ScriptAlias directories.
    2. Handlers to execute CGI scripts using Server Side Include shtml files.
    3. The ability to write to the document directories.
    4. Unique user and group IDs for the Web server in order to index from within OnSearch.

    The section, "Configuring Apache," describes how to add these features.

    If these requirements present security problems, you can limit OnSearch to only a portion of the Web site, or create a symlink to OnSearch's directories. The Apache document manual/misc/security_tips.xml discusses these issues.

  5. Perl, version 5.6.1 or later. OnSearch can work with earlier versions, but its HTML filtering relies on the Unicode support of later versions. See Perl Configuration.

    If you need to install or upgrade Perl, you can find it at: http://www.perl.org/.

    The Perl interpreter, or a symlink to the interpreter, must be /usr/bin/perl.

    Note: OnSearch does not yet use the Perl I/O abstraction layer. One symptom of I/O layer incompatibility is excessive memory usage. To determine if Perl was built with the abstraction layer, and how to cope with it, see Perl Configuration.

  6. A C compiler. The C part of OnSearch was built with the GNU C compiler, GCC 2.95. Other ANSI C compilers should also be able to build the binary programs with minimal effort. The GCC Web site is http://www.gnu.org, but most operating systems offer GCC as an optional free software package.

  7. Web browser(s) that can handle cookies and CSS 2 style sheets; e.g., Mozilla and Netscape, versions 5 and later, and recent versions of Internet Explorer. Although OnSearch is compatible with earlier browsers and text mode programs like lynx, the pages are much more readable when viewed with recent browsers.

  8. To index and search documents other than text files, OnSearch's plugins rely on helper applications. The Postscript plugin uses GNU Ghostscript to translate Postscript to text. The PDF plugin uses the pdftotext utility. The gzip plugin requires the gzip utility. You can find ghostscript at http://mirror.cs.wisc.edu/pub/mirrors/ghost. Pdftotext is part of the xpdf distribution which is downloadable from http://www.foolabs.com/xpdf. The main gzip site is http://www.gnu.org. Many operating systems also provide ghostscript, xpdf, and gzip packages.
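
The tool requirements above can be checked before running the install script. The following is a minimal sketch and is not part of OnSearch: the function name missing_tools is hypothetical, and the command names you pass to it (for example gcc, gs, pdftotext, and gzip) are only the usual names of these programs, which may differ on your system.

```shell
# Print the name of each command in the argument list that is not
# found in PATH.  Returns 0 if everything was found, 1 otherwise.
# (missing_tools is a hypothetical helper, not part of OnSearch.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}
```

For example, missing_tools perl gcc gs pdftotext gzip prints whichever of those commands could not be found. The interpreter path itself can be checked separately with test -x /usr/bin/perl.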

Automatic Installation

If for some reason the automatic installation doesn't work, refer to "Manual Installation."

The examples here show an Apache server installed as described in "Apache Installation" and "Configuring Apache." The host name, directories, and user and group names on your system may be different.

  1. Log in as the superuser and start the install script.
    $ su
    Password:
    # perl install
    
  2. The install script will search for the Web server and its configuration. If the system has more than one Web server, the install script will ask which to use. This step can take several minutes.
    OnSearch installation.  If this program can't determine the site
    configuration, read INSTALL.html.
    
    NOTE: If you've already installed OnSearch, make a backup copy.
    Press [Ctrl-C] to exit now if necessary.
    
    The Web server is /usr/local/apache2/bin/httpd.  Is this correct (y/n)?
    y
    The Web server's configuration file is /usr/local/apache2/conf/httpd.conf.  
    Is this correct (y/n)?
    y
    Configuring for /usr/local/apache2/bin/httpd and /usr/local/apache2/conf/httpd.conf.
    
  3. The install script finds the settings it needs from the Web server's configuration. If you're not certain that the values it reports are correct, check them against the settings in httpd.conf.

    Note: The install script will also warn you if the Web server is not configured to handle CGI scripts and Server Side Includes. You must make the necessary changes to httpd.conf before using OnSearch.

    The following values were found:
    The Web site's name is owl.local.net.
    The Web site's document directory is /usr/local/apache2/htdocs.
    The Web server's version is 2.0.54.
    The Web server's directory is /usr/local/apache2.
    The Web server's process owner is apache. 
    The Web server's port is 8090.
    
  4. The install script then verifies the directory where it should install OnSearch. Refer to Manual Installation if you want to install OnSearch elsewhere.
    I'm going to install OnSearch in /usr/local/apache2/htdocs/onsearch.  Is this okay (y/n)?
    y
    
  5. The install script installs the OnSearch program, documentation, and libraries, and verifies the location of the top-level document directory.
    OnSearch will use /usr/local/apache2/htdocs as its top-level document directory.  
    Is this okay (y/n)?
    y
    
  6. The install script then checks for Perl and the GNU C Compiler. They must be present or the script will not continue.
  7. The installation script then creates the necessary directories, configures and builds the program files, and sets the application's file and directory permissions.
    Creating directories.
    Copying files.
    Building indexing daemon.
    Setting permissions.
    
  8. Then the install script asks if you want to add directory wrappers for the OnSearch directories. They are necessary to set the correct permissions for OnSearch and to enable passwords for the, "Admin," page.

    Note: The installation's administrator user name is, "onsearch," password, "onsearch." You should change them to your own user name and password by editing /usr/local/etc/onpasswd and /usr/local/etc/ongroup. Refer to the manual page for Apache's htpasswd program for instructions to set the password.

    I'm going to add OnSearch's configuration to /usr/local/apache2/conf/httpd.conf.
    I'll copy the original httpd.conf file to httpd.conf.onsearch-save.
    
    You can skip this step, and then configure the Web server
    manually, using the template in doc/conf.tmpl.
    Should I continue (y/n)?
    y
    
    I'm going to add password authentication information to your Web
    server configuration and then try to restart the server.
    
    With authentication enabled, users must enter a user name and password
    to perform administrative tasks.  For passwords to work the Web server
    must include mod_auth in its configuration.
    
    During the process, the installation will copy your old 
    /usr/local/apache2/conf/httpd.conf to /usr/local/apache2/conf/httpd.conf.onsearch-save, 
    so you can restore your previous configuration if necessary.
    
    You can skip this step, but then you will not have password protection
    for administrative tasks.
    
    Should I continue (y/n)?
    y
    
    
    .
    .
    .
    
    Done. If the installation was correct, you should be able to browse to
    
    http://owl.local.net:8090/onsearch/index.shtml
    
    The OnSearch User Guide, 
    
    http://owl.local.net/onsearch/doc/userguide.html
    
    describes how to index files and start using OnSearch.
    
    
  9. You should now be able to browse the URL given by the install script (which uses your Web server's name instead of the server given in the example above). If the Web server's configuration is correct, you should see OnSearch's main Web page. If you encounter an error, refer to Manual Installation and Configuring Apache.
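
The final check in step 9 can also be made from the command line. This is a minimal sketch under stated assumptions: onsearch_url is a hypothetical helper, and the path onsearch/index.shtml assumes the default installation directory shown in the example session.

```shell
# Build the OnSearch start page URL from the host name and port
# reported by the install script.  The port is left out when it is
# 80, the HTTP default.  (onsearch_url is a hypothetical helper.)
onsearch_url() {
    host=$1
    port=${2:-80}
    if [ "$port" = "80" ]; then
        echo "http://$host/onsearch/index.shtml"
    else
        echo "http://$host:$port/onsearch/index.shtml"
    fi
}
```

If curl is installed, a command such as curl -s -o /dev/null -w '%{http_code}\n' "$(onsearch_url owl.local.net 8090)" prints the HTTP status code of the request; a status of 200 suggests that the page was served.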

Refer to the OnSearch User Guide, which you can open from the, "About," page for instructions on how to configure OnSearch and index documents.

Manual Installation

  1. Find the settings of the following variables in the Apache httpd.conf file.
    ServerName
    DocumentRoot
    User
    
    Note: Some Web servers set the site name using Listen instead of ServerName.

    If you can't find httpd.conf, try this command.

    # find / -name "httpd.conf"
    
    If find can't locate the file, then check the Apache installation before continuing. To find the settings mentioned above, either view the file with a text editor or more, or use a shell command like this.
    # egrep '^(ServerName|DocumentRoot|User|Listen)' /path/of/httpd.conf
    
    Substitute the actual path of httpd.conf in the example.

  2. Decide where you want to install OnSearch. OnSearch's working directory should be a subdirectory of DocumentRoot, and the name of the subdirectory is the value of OnSearchDir in onsearch.cfg, normally onsearch. The complete path to the OnSearch directory is (DocumentRoot)/onsearch.
    For example, if DocumentRoot is /usr/local/apache/htdocs, then OnSearch's working directory will be /usr/local/apache/htdocs/onsearch. The following instructions use this example.
    If you want to install OnSearch in a different subdirectory, open onsearch.cfg in a text editor and edit the value of OnSearchDir with the name of the subdirectory.

  3. Using a text editor, edit onsearch.cfg's values for SearchRoot, ServerName, and User with the values you found in step 1.

  4. Edit WebLogDir, BinDir, and DataDir in onsearch.cfg if you need to change them from the defaults, which are /usr/local/spool/onsearch, /usr/local/sbin, and /usr/local/var/run/onsearch, respectively.

  5. Create the OnSearch directories.
    # mkdir /usr/local/apache/htdocs/onsearch
    # mkdir /usr/local/apache/htdocs/onsearch/admin
    # mkdir /usr/local/apache/htdocs/onsearch/cache
    # mkdir /usr/local/apache/htdocs/onsearch/doc
    # mkdir /usr/local/apache/htdocs/onsearch/images
    # mkdir /usr/local/apache/htdocs/onsearch/plugins
    # mkdir /usr/local/apache/htdocs/onsearch/webpages
    # mkdir /usr/local/apache/htdocs/onsearch/uploads
    # mkdir -p /usr/local/spool/onsearch
    # mkdir -p /usr/local/var/run/onsearch
    

  6. Copy the Web pages and CGI scripts to OnSearch's working directories.
    # cp onsearch.cfg /usr/local/apache/htdocs/onsearch
    # cp -R html/* /usr/local/apache/htdocs/onsearch
    # cp images/* /usr/local/apache/htdocs/onsearch/images
    # cp -R cgi/* /usr/local/apache/htdocs/onsearch
    # cp doc/* /usr/local/apache/htdocs/onsearch/doc
    # cp plugins/* /usr/local/apache/htdocs/onsearch/plugins
    

  7. Set the file and directory ownership and permissions.
    # chown -R nobody /usr/local/apache/htdocs/onsearch
    # chown -R nobody /usr/local/spool/onsearch
    # chown -R nobody /usr/local/var/run/onsearch
    # chmod 0755 /usr/local/apache/htdocs/onsearch/*.cgi
    # chmod 0755 /usr/local/apache/htdocs/onsearch/admin/adminform.cgi
    # chmod 0755 /usr/local/apache/htdocs/onsearch/plugins/*
    # chmod 0700 /usr/local/var/run/onsearch
    

  8. Build and install the Perl libraries.
    # cd lib && (perl Makefile.PL LIB="/usr/local/apache/htdocs/onsearch/lib" && make install)
    # make clean
    # cd ..
    

  9. Copy src/Makefile.tmpl to src/Makefile, and set BINDIR and DATADIR to the values of BinDir and DataDir in onsearch.cfg. Set ONSEARCHDIR to OnSearch's installation directory.

    Also edit the values of CC and CFLAGS if necessary for compilers other than GCC.

    Then build and install onindex.

    # cd src && make install
    # make clean
    # cd ..
    

  10. Copy doc/Makefile.tmpl to doc/Makefile. Set APPDIR to the directory where OnSearch is installed, and set WEBLOGDIR and DATADIR to the values of WebLogDir and DataDir in onsearch.cfg.

    Also edit the TROFF variable if necessary.

    Then install the man page for onindex.

    # cd doc && make all
    # make clean
    # cd ..
    

  11. Create the OnSearch authorization files onpasswd and ongroup.
    # /usr/local/apache/bin/htpasswd -nb onsearch onsearch >/usr/local/etc/onpasswd
    # echo "onsearch: onsearch" >/usr/local/etc/ongroup
    
    This creates a default administrator with the user name onsearch and the password onsearch. You can change the password with htpasswd, or add users by creating them with htpasswd and then listing them in ongroup.

    Refer to the htpasswd manual page for further information.

  12. Edit doc/conf.tmpl with the name of OnSearch's working directory, and add its contents to httpd.conf.
    <Directory /usr/local/apache/htdocs/onsearch>
        Options ExecCGI Includes
        AddHandler cgi-script .cgi
        AddType text/html .shtml
        AddHandler server-parsed .shtml
    </Directory>
    

  13. Edit doc/auth.tmpl with the name of OnSearch's admin subdirectory, and add its contents to httpd.conf.
    <Directory /usr/local/apache/htdocs/onsearch/admin>
        AuthType Basic
        AuthName OnSearch
        AuthUserFile /usr/local/etc/onpasswd
        AuthGroupFile /usr/local/etc/ongroup
        require group onsearch
    </Directory>
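
Step 1 above, finding the ServerName, DocumentRoot, and User settings, can be sketched as a small shell function. This is an illustration only: conf_value is a hypothetical helper, it matches directive names exactly as written (Apache itself ignores their case), and it returns only the first word of a value, so a path that contains spaces is not handled.

```shell
# Print the value of the first uncommented occurrence of a directive
# in an Apache configuration file.  Double quotes around the value,
# as in DocumentRoot "/path", are removed.
# usage: conf_value DIRECTIVE FILE   (conf_value is a hypothetical helper)
conf_value() {
    awk -v name="$1" '
        $1 == name {
            value = $2
            gsub(/"/, "", value)
            print value
            exit
        }' "$2"
}
```

For example, conf_value DocumentRoot /path/of/httpd.conf prints the document directory, which you can then copy into onsearch.cfg as described in step 3.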
    

Refer to the OnSearch User Guide, which you can open from the, "About," page, for information about configuring OnSearch and indexing documents.
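
If you install OnSearch by hand more than once, the directory creation in step 5 can be written as a loop. This is a minimal sketch: make_onsearch_dirs is a hypothetical helper, and the subdirectory names are the ones listed in step 5. The log and data directories (WebLogDir and DataDir) are outside the working directory and still need to be created separately.

```shell
# Create the OnSearch working directory and its subdirectories under
# the directory given as the first argument.
# (make_onsearch_dirs is a hypothetical helper.)
make_onsearch_dirs() {
    root=$1
    for sub in admin cache doc images plugins webpages uploads; do
        mkdir -p "$root/$sub" || return 1
    done
}
```

For the example used in this document, the call would be make_onsearch_dirs /usr/local/apache/htdocs/onsearch.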

Appendices

Perl Configuration

Unicode Support
OnSearch's HTML filtering relies on Unicode support which was first built into Perl in version 5.6.1. If you have an earlier version of Perl, change the Plugin entries in onsearch.cfg for the text/html and text/xml MIME types from, plugins/html, to, plugins/text. You will be able to index HTML and XML documents, but OnSearch will not recognize international characters.
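
To decide whether the HTML plugin can be used, compare the installed Perl version with 5.6.1. The function below is a sketch of such a comparison, not something OnSearch provides: version_at_least is a hypothetical helper that compares dotted version numbers field by field.

```shell
# Succeed if dotted version HAVE is at least dotted version WANT.
# Up to three numeric fields are compared; missing fields count as 0.
# usage: version_at_least HAVE WANT   (hypothetical helper)
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(want, w, ".")
        for (i = 1; i <= 3; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}
```

On Perl 5.6 and later, perl -e 'printf "%vd\n", $^V' prints the version in dotted form, so the check can be written as version_at_least "$(perl -e 'printf "%vd", $^V')" 5.6.1. Older releases do not support that format, which is itself a sign that the plugins/text setting is needed.
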
Perl I/O Abstraction Layer
In some early releases of Perl version 5.8, OnSearch's background processing uses more memory and searches run more slowly if Perl is configured with the I/O abstraction layer. If you cannot upgrade Perl, the following workarounds help avoid the abstraction layer incompatibilities.

To determine if Perl uses the I/O abstraction layer, use the following shell command.

# perl -V | grep 'useperlio'
The value of, "useperlio," should be, "undef." If the value is, "define," then you can get Perl to use normal (Unix) I/O with either of the following methods.

Apache Installation

Apache 1.3
This section describes how to build and install Apache 1.3.19 from the source code distribution, available from http://www.apache.org, and how to configure it to work with OnSearch.

Following the instructions in README.configure, build and install Apache with support for the standard modules and Dynamic Shared Object support, and install the server in /usr/local/apache:

# ./configure --prefix=/usr/local/apache \
> --enable-module=most \
> --enable-shared=max
# make
# make install
README.configure also describes many other configurations. To use OnSearch, you must enable at least mod_cgi, mod_auth, and mod_include. Building them should be enabled by default, but this example shows how to enable them explicitly.
# ./configure --prefix=/usr/local/apache \
> --enable-module=cgi \
> --enable-module=include \
> --enable-module=auth
# make
# make install
Start the Apache server:
# /usr/local/apache/bin/apachectl start
NOTE: This configuration omits many standard features and is recommended only if you plan to use the Web server with OnSearch.

If Apache is working correctly, you can view the manual at the following URL:

http://<your-host-name>/manual/
You should then be able to view the Apache home page at the following URL:
http://<your-host-name>/
Apache 2.0
OnSearch can use the default installation of Apache versions 2.0.x. To build and install the Web server, use the following shell commands.
# ./configure
# make
# make install
Then edit the Listen and ServerName lines as described in httpd.conf, and start the server.
# /usr/local/apache2/bin/apachectl start
Browsing to the Web server's address should display the Apache default home page.

The section, Apache 2.0 Configuration, describes configuring the Web server for OnSearch.

Configuring Apache

Note: In all versions of Apache, it may be necessary to add a unique user and group ID for the Web server in order to index from within OnSearch. To do so, create the Web server's user and group with adduser and addgroup (or useradd and groupadd).
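
Before creating an account, it helps to check whether it already exists. The following is a small sketch; user_exists is a hypothetical helper, and the account name apache used below is only the process owner from the installation example.

```shell
# Succeed if an account with the given name exists on this system.
# (user_exists is a hypothetical helper.)
user_exists() {
    id -u "$1" >/dev/null 2>&1
}
```

If user_exists apache fails, on many Linux systems the group and user can be created with groupadd apache followed by useradd -g apache apache. The command names and options vary between operating systems, so check the local manual pages first. The User and Group directives in httpd.conf must then name the new account.
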
Apache 1.3
Configuring Apache requires several steps:
  1. Determine that the Web server has loaded the necessary modules.
  2. Enable the handlers for CGI scripts and server side includes.
  3. Add the permissions that OnSearch needs.
  4. Create the password and group files.
  5. Restart the server.
Load Modules
If you installed Apache with the configuration described in "Apache Installation," you will need to load modules by using LoadModule statements in httpd.conf, as described below. However, if the Web server does not have Dynamic Shared Object support enabled, you can determine which modules are compiled into it with the following shell command.
# /usr/local/apache/bin/httpd -l
Compiled-in modules:
  http_core.c
  mod_env.c
  mod_log_config.c
  mod_mime.c
  mod_negotiation.c
  mod_status.c
  mod_include.c
  mod_autoindex.c
  mod_dir.c
  mod_cgi.c
  mod_asis.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_access.c
  mod_auth.c
  mod_setenvif.c
suexec: disabled; invalid wrapper /usr/local/apache/bin/suexec
The modules that OnSearch needs are: mod_cgi, mod_include, and mod_auth. If any of these are not compiled into the Web server, you'll need to load them dynamically by adding the following lines to httpd.conf if they are not already present.
LoadModule auth_module /usr/local/apache/libexec/mod_auth.so
LoadModule cgi_module /usr/local/apache/libexec/mod_cgi.so
LoadModule includes_module /usr/local/apache/libexec/mod_include.so
The file paths in the examples are those of a standard Apache directory layout used in Apache Installation. You may need to adjust the paths depending on the Web server's configuration.

If any of these modules are missing, consult INSTALL in the Apache source code archive.
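
The comparison between the httpd -l listing and the three modules that OnSearch needs can be automated. This is a minimal sketch, assuming the listing has the format shown above; missing_modules is a hypothetical helper.

```shell
# Read the output of "httpd -l" on standard input and print the name
# of each module OnSearch needs that is not compiled into the server.
# (missing_modules is a hypothetical helper.)
missing_modules() {
    compiled=$(cat)
    for mod in mod_cgi mod_include mod_auth; do
        if ! printf '%s\n' "$compiled" | grep -q "^ *$mod\.c\$"; then
            echo "$mod"
        fi
    done
}
```

For example, /usr/local/apache/bin/httpd -l | missing_modules prints nothing when all three modules are compiled in. Modules loaded with LoadModule do not appear in the httpd -l listing, so a name printed here means only that the module must be loaded dynamically, not that it is unavailable.
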

Enable Handlers
To enable the handlers for CGI scripts and server side includes, uncomment the relevant lines in httpd.conf.

Change the lines:

    #AddHandler cgi-script .cgi
    .
    .
    .
    #AddType text/html .shtml
    #AddHandler server-parsed .shtml
So that they look like the following:
    AddHandler cgi-script .cgi
    .
    .
    .
    AddType text/html .shtml
    AddHandler server-parsed .shtml
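
The same change can be made with sed instead of a text editor. The sketch below assumes that the commented lines in your httpd.conf match the three lines shown above; enable_handlers is a hypothetical helper, and it writes the result to standard output rather than changing the file.

```shell
# Print FILE with the three handler lines uncommented.  The file
# itself is not changed.  (enable_handlers is a hypothetical helper.)
enable_handlers() {
    sed -e 's/^[[:space:]]*#[[:space:]]*\(AddHandler cgi-script \.cgi\)/\1/' \
        -e 's/^[[:space:]]*#[[:space:]]*\(AddType text\/html \.shtml\)/\1/' \
        -e 's/^[[:space:]]*#[[:space:]]*\(AddHandler server-parsed \.shtml\)/\1/' \
        "$1"
}
```

A cautious way to use it is to run enable_handlers httpd.conf > httpd.conf.new, compare the two files with diff, and replace httpd.conf only if the differences are the ones you expect.
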
Add Permissions
The easiest way to add permissions for OnSearch is to add the following sections to httpd.conf.
<Directory /usr/local/apache/htdocs/onsearch>
    Options ExecCGI Includes
    AddHandler cgi-script .cgi
    AddType text/html .shtml
    AddHandler server-parsed .shtml
</Directory>
This template is contained in the file doc/conf.tmpl. You must edit the <Directory ...> line with the name of the directory where OnSearch is installed.

You can also add these options for the entire Web site. The comments in httpd.conf and the Apache manual provide further information.

The following section configures the OnSearch admin page to use passwords.

<Directory /usr/local/apache/htdocs/onsearch/admin>
    AuthType Basic
    AuthName OnSearch
    AuthUserFile /usr/local/etc/onpasswd
    AuthGroupFile /usr/local/etc/ongroup
    require group onsearch
</Directory>
This template is contained in the file doc/auth.tmpl. It must be added in this format if you want to use passwords with OnSearch. Again, you must edit the <Directory...> line with the name of the OnSearch admin subdirectory.
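
If OnSearch is installed somewhere other than the example directory, both sections can be generated for the actual path. This is a sketch only: onsearch_conf is a hypothetical helper, and the section contents are copied from the examples above rather than read from doc/conf.tmpl and doc/auth.tmpl.

```shell
# Print both <Directory> sections for the OnSearch directory given
# as the first argument.  (onsearch_conf is a hypothetical helper; the
# section contents are copied from the examples in this document.)
onsearch_conf() {
    cat <<EOF
<Directory $1>
    Options ExecCGI Includes
    AddHandler cgi-script .cgi
    AddType text/html .shtml
    AddHandler server-parsed .shtml
</Directory>
<Directory $1/admin>
    AuthType Basic
    AuthName OnSearch
    AuthUserFile /usr/local/etc/onpasswd
    AuthGroupFile /usr/local/etc/ongroup
    require group onsearch
</Directory>
EOF
}
```

After saving a backup copy of httpd.conf, the output of onsearch_conf /usr/local/apache/htdocs/onsearch can be appended to it. Running apachectl configtest afterward checks the syntax of the configuration before you restart the server.
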
Create onpasswd and ongroup Files
This step requires the htpasswd utility, which should be installed with the Apache server (in /usr/local/apache/bin in these examples).

The install script creates onpasswd with the user name, "onsearch," password, "onsearch," but you can select any user name and password you like. To create the onpasswd file, use the following shell command.

# htpasswd -nb onsearch onsearch >/usr/local/etc/onpasswd
Create the ongroup file with the following command.
# echo 'onsearch: onsearch' > /usr/local/etc/ongroup
To change the user name or password, or add and delete users, consult the Apache documentation.
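
As an illustration of the group file format, the following sketch builds an ongroup line for several users. The function group_line is a hypothetical helper, and the user name alice in the example is made up.

```shell
# Print a group file line of the form "GROUP: USER USER ...".
# usage: group_line GROUP USER...   (group_line is a hypothetical helper)
group_line() {
    group=$1
    shift
    printf '%s: %s\n' "$group" "$*"
}
```

For example, group_line onsearch onsearch alice > /usr/local/etc/ongroup writes the line "onsearch: onsearch alice". Each user named in the group also needs an entry in onpasswd; running htpasswd /usr/local/etc/onpasswd alice prompts for a password and adds or updates that entry.
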
Restart the Server
The final step is to restart the server.
# /usr/local/apache/bin/apachectl restart
Apache 2.0 Configuration
Install Apache as described in "Apache Installation," and make certain the Web server is operating correctly.

Then uncomment the following lines in httpd.conf.

AddHandler cgi-script .cgi

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
Also add doc/auth.tmpl and doc/conf.tmpl to httpd.conf and create the password and group files as described in the Apache 1.3 configuration section.

$Id: INSTALL.html,v 1.26 2005/08/13 22:46:40 kiesling Exp $