= Recoll WebUI Apache installation from scratch

The https://github.com/koniu/recoll-webui[Recoll WebUI] offers an
alternative, web-based interface for querying a Recoll index.

It can be quite useful for extending the use of a shared index to multiple
workstations, without the need for a local Recoll installation or for
access to the shared data storage on each of them.

The Recoll WebUI is based on the
http://bottlepy.org/docs/dev/index.html[Bottle Python framework], which has
a built-in web server, and the simplest deployment approach is to run it
standalone. However, the built-in server can only handle one request at a
time, which is problematic in multi-user situations, especially because
some requests, like extracting a result list into a CSV file, can take a
significant amount of time.

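As far as I know, Bottle's standalone mode uses the Python standard library 'wsgiref' server by default. The following sketch is a stand-alone illustration, not code from the WebUI ('app' is a hypothetical toy application): it checks that this server class has no threading mixed in, which is why it finishes one request before accepting the next, and shows how a WSGI application can be called directly, without any server, for testing.

```python
import socketserver
from wsgiref.simple_server import WSGIServer, make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Toy WSGI application standing in for the WebUI (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello from a single-threaded server\n"]


def is_threaded(server_class):
    """True if the server class runs each request in its own thread."""
    return issubclass(server_class, socketserver.ThreadingMixIn)


def call_app(application):
    """Call a WSGI app directly with a minimal fake request environment."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body


def serve(port=8080):
    """Serve app with the stock wsgiref server: strictly one request at a time."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```
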
The Bottle framework can work with several multi-threaded Python HTTP
server libraries but, given the limitations of the Recoll Python module
and of the Python interpreter itself (the global interpreter lock), this
will not yield optimal performance and, in particular, cannot efficiently
leverage the now ubiquitous multiprocessors.

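For comparison, this is one way of getting a multi-threaded server out of the same standard-library pieces, by mixing 'socketserver.ThreadingMixIn' into the 'wsgiref' server class. It is a generic illustration, not one of the server adapters Bottle actually supports. Requests then overlap, but in CPython all the threads still share one interpreter lock, so CPU-bound work such as building a large result list does not spread over several processors.

```python
import socketserver
from wsgiref.simple_server import WSGIServer, make_server


class ThreadingWSGIServer(socketserver.ThreadingMixIn, WSGIServer):
    """wsgiref server that handles each request in a new thread."""

    # Do not wait for in-flight request threads when the server exits.
    daemon_threads = True


def make_threaded_server(host, port, application):
    """Create (but do not start) a thread-per-request WSGI server."""
    return make_server(host, port, application, server_class=ThreadingWSGIServer)
```

Given any WSGI callable 'app', 'make_threaded_server("", 8080, app).serve_forever()' would start it.
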
In multi-user situations, you can get better performance and ease of use
from the Recoll WebUI by running it under Apache rather than as a
standalone process. With this approach, a few requests per second can
easily be handled, even in the presence of long-running ones.

Neither Recoll nor the WebUI is optimized for high multi-user load, and it
would be very unwise to use them as the search interface of a busy web
site.

The instructions for using the WebUI under Apache given in the repository
README are a bit terse and miss a few details, especially ones which
impact performance.

What follows is a synopsis of two WebUI installations, on initially
Apache-less Ubuntu (14.04) and DragonFly BSD systems. The first should
extend easily to other Debian-based systems, the second at least to
FreeBSD. RPM-based systems are left as an exercise for the reader, at least
for now...

CAUTION: THE CONFIGURATIONS DESCRIBED HAVE NO ACCESS CONTROL. ANYONE WITH
ACCESS TO THE NETWORK WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY
DOCUMENT.

== On a Debian/Ubuntu system

=== Install recoll

----
sudo apt-get install recoll python-recoll
----

Configure the indexing and check that the normal search works. (I spent
quite a lot of time trying to understand why the WebUI did not work, when
in fact the normal Recoll configuration was broken and the regular search
did not work either.)

Take care to be logged in as the user you want to run the web search as
while you do this.

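Since the WebUI reaches the index through the Recoll Python module, it is also worth checking that the module works for that user. The sketch below assumes the module interface described in the Recoll manual ('from recoll import recoll', 'recoll.connect()', 'Db.query()', and 'Query.execute()' returning the number of hits); please check it against your installed version. The 'connect' parameter is a hypothetical hook so that the function can be exercised without Recoll installed.

```python
def count_hits(query_string, connect=None):
    """Return the number of Recoll results for query_string.

    By default uses the python-recoll module (assumed API: recoll.connect(),
    Db.query(), Query.execute() returning the hit count). The connect
    argument lets callers substitute a stand-in for testing.
    """
    if connect is None:
        from recoll import recoll  # provided by the python-recoll package
        connect = recoll.connect
    db = connect()  # current user's default configuration directory
    query = db.query()
    return query.execute(query_string)
```

Run as the indexing user, 'count_hits("someword")' with a word you know is indexed should return a non-zero count; an import error or zero points at the module or configuration rather than at the WebUI.
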
=== Install the WebUI

Clone the GitHub repository, or extract the master tar archive, and move
it to '/var/www/recoll-webui-master/'. Make sure that your user has
read/execute access to it.

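A quick way to verify the permissions is to run a small script as that user which walks the tree and lists anything unreadable or, for directories, not searchable. This is a generic sketch based on 'os.access()'; for the Apache setup below, the account that matters is the one given as 'user=' on the 'WSGIDaemonProcess' line.

```python
import os


def inaccessible_paths(root):
    """List paths under root that the current user cannot use.

    Files need read access; directories need read and execute (search)
    access, otherwise os.walk() cannot descend into them.
    """
    if not os.access(root, os.R_OK | os.X_OK):
        return [root]
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems
```

For example, 'print(inaccessible_paths("/var/www/recoll-webui-master"))' should print an empty list. Run it as the target user, not as root, for which almost everything is accessible.
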
=== Install Apache and mod_wsgi

----
sudo apt-get install apache2 libapache2-mod-wsgi
----

I then got the following message:

----
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
----

To clear it, I added a ServerName directive to the Apache configuration;
you may not need it. Things work without this fix anyway: it only
suppresses the message. Edit '/etc/apache2/sites-available/000-default.conf'
and add the following at the top of the file, outside of the
'VirtualHost' section, so that it applies globally. You will probably need
to adjust the address, or use a real host name:

----
ServerName 192.168.4.6
----

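Before picking a 'ServerName' value, it can help to see what the system reports as its own name. The sketch below uses Python's 'socket.getfqdn()'; Apache performs its own lookup, and the dot test in 'looks_fully_qualified()' is only a rough heuristic, assumed here, not Apache's exact rule. On Debian-based systems the host name is commonly mapped to 127.0.1.1 in '/etc/hosts', which would explain the address in the message.

```python
import socket


def looks_fully_qualified(name):
    """Rough heuristic: a dotted name that is not a bare IPv4 address."""
    return "." in name.strip(".") and not name.replace(".", "").isdigit()


def server_name_hint():
    """Return (hostname, fqdn, looks_ok) as resolved on this machine."""
    host = socket.gethostname()
    fqdn = socket.getfqdn()  # may trigger a DNS lookup
    return host, fqdn, looks_fully_qualified(fqdn)
```

If 'server_name_hint()' reports 'looks_ok' as False, setting 'ServerName' explicitly, as above, is the simple way out.
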
Edit '/etc/apache2/mods-enabled/wsgi.conf' and add the following at the end
of the 'IfModule' section.

Change the user ('dockes' in the example), making sure that it is the one
who owns the index ('.recoll' is in his home directory).

----
WSGIDaemonProcess recoll user=dockes group=dockes \
    threads=1 processes=5 display-name=%{GROUP} \
    python-path=/var/www/recoll-webui-master
WSGIScriptAlias /recoll /var/www/recoll-webui-master/webui-wsgi.py
<Directory /var/www/recoll-webui-master>
    WSGIProcessGroup recoll
    Order allow,deny
    allow from all
</Directory>
----

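The same stanza has to be adapted (user, install directory, number of processes) for each machine, and it appears again in the BSD variant with the Apache 2.4 'Require all granted' form. A small generator can keep the variants consistent. The 'wsgi_stanza()' helper is hypothetical, not part of the WebUI, and only reproduces the directives shown in this document.

```python
def wsgi_stanza(user, webui_dir, processes=5, group=None, apache24_require=False):
    """Render the mod_wsgi directives used in this document.

    user should own the Recoll index ('.recoll' in its home directory).
    threads stays at 1 because the WebUI is mostly single-threaded.
    """
    group = group or user
    webui_dir = webui_dir.rstrip("/")
    if apache24_require:
        access = ["    Require all granted"]  # Apache 2.4 syntax
    else:
        access = ["    Order allow,deny", "    allow from all"]
    lines = [
        f"WSGIDaemonProcess recoll user={user} group={group} \\",
        f"    threads=1 processes={processes} display-name=%{{GROUP}} \\",
        f"    python-path={webui_dir}",
        f"WSGIScriptAlias /recoll {webui_dir}/webui-wsgi.py",
        f"<Directory {webui_dir}>",
        "    WSGIProcessGroup recoll",
        *access,
        "</Directory>",
    ]
    return "\n".join(lines) + "\n"
```

For example, 'print(wsgi_stanza("dockes", "/var/www/recoll-webui-master"))' prints the Debian directives above.
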
NOTE: The Recoll WebUI application is mostly single-threaded, so it is of
little use (and may actually be counter-productive in some cases) to
specify multiple threads on the WSGIDaemonProcess line. Specify multiple
processes instead, to put multiple CPUs to work on simultaneous requests.

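A toy queueing model makes the processes-versus-threads advice concrete. It treats each daemon process as a single-threaded worker ('threads=1') and hands requests, in arrival order, to whichever worker becomes free first. The load is made up (one 30-second CSV export followed by three half-second searches), and CPU contention and the interpreter lock are deliberately ignored: this is an illustration, not a model of mod_wsgi internals.

```python
import heapq


def completion_times(requests, workers):
    """FIFO queue served by `workers` single-threaded workers.

    requests: (arrival_time, duration) pairs, sorted by arrival time.
    Returns the completion time of each request, in input order.
    """
    free_at = [0.0] * workers  # time at which each worker next becomes idle
    heapq.heapify(free_at)
    finished = []
    for arrival, duration in requests:
        idle = heapq.heappop(free_at)  # earliest available worker
        start = max(arrival, idle)     # queue if every worker is busy
        end = start + duration
        heapq.heappush(free_at, end)
        finished.append(end)
    return finished


if __name__ == "__main__":
    # Made-up load: a 30 s CSV export, then three 0.5 s searches.
    load = [(0.0, 30.0), (1.0, 0.5), (1.5, 0.5), (2.0, 0.5)]
    print(completion_times(load, 1))  # [30.0, 30.5, 31.0, 31.5]
    print(completion_times(load, 5))  # [30.0, 1.5, 2.0, 2.5]
```

With one worker, as with the standalone server, the short searches wait for the export to finish; with 'processes=5' each completes half a second after it arrives.
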
Then run the following to restart Apache:

----
sudo apachectl restart
----

The Recoll WebUI should now be accessible on 'http://my.server.com/recoll/'.

NOTE: You need a '/' at the end of the URL used to access the search (use
'http://my.server.com/recoll/', not 'http://my.server.com/recoll'), else
files other than the script itself are not found (the page looks weird and
the search does not work).

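This is presumably standard relative URL resolution (RFC 3986) at work: a relative reference replaces the last path segment of the page URL. Assuming the page refers to its other files with relative URLs ('static/style.css' below is only an illustrative name), Python's 'urllib.parse.urljoin()' shows the difference; 'with_trailing_slash()' is a hypothetical helper for building correct links.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def with_trailing_slash(url):
    """Return url with '/' appended to its path if missing (query kept)."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    asset = "static/style.css"  # hypothetical relative asset URL
    # Without the slash, "recoll" is replaced: the file is looked up outside /recoll.
    print(urljoin("http://my.server.com/recoll", asset))
    # With the slash, the file is looked up under /recoll/.
    print(urljoin("http://my.server.com/recoll/", asset))
```
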
CAUTION: THERE IS NO ACCESS CONTROL. ANYONE WITH ACCESS TO THE NETWORK
WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY DOCUMENT.

== Variant for BSD/ports

=== Packages

As root:

----
pkg install recoll
----

Configure the indexing as needed, and check that the normal search works.

Take care to be logged in as the user you want to run the web search as
while you do this.

Then, as root again:

----
pkg install apache24
----

Add 'apache24_enable="YES"' to '/etc/rc.conf', then:

----
pkg install ap24-mod_wsgi4
pkg install git
----

=== Clone the WebUI repository

----
cd /usr/local/www/apache24/
git clone https://github.com/koniu/recoll-webui.git recoll-webui-master
----

IMPORTANT: Most input handler helper applications (e.g. 'pdftotext') are
installed in '/usr/local/bin', which is not in the PATH as seen by Apache
(at least on DragonFly). The simplest way to fix this is to modify the
launcher module of the WebUI application so that it fixes the PATH.

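To check which helpers a given PATH can actually find, 'shutil.which()' accepts an explicit 'path' argument. The sketch below compares a minimal PATH, assumed here to resemble the one Apache passes on (check your own system), with the same PATH extended by '/usr/local/bin'. The helper list is only an example; the real one depends on the document types you index.

```python
import os
import shutil

# Example helpers; adjust to the document types you index.
HELPERS = ["pdftotext", "antiword", "unrtf"]

# Assumption: a minimal PATH similar to the one inherited by Apache.
APACHE_LIKE_PATH = os.pathsep.join(["/usr/bin", "/bin", "/usr/sbin", "/sbin"])


def missing_helpers(helpers, path):
    """Return the helpers that cannot be found on the given PATH string."""
    return [name for name in helpers if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    extended = APACHE_LIKE_PATH + os.pathsep + "/usr/local/bin"
    print("missing with Apache-like PATH:", missing_helpers(HELPERS, APACHE_LIKE_PATH))
    print("missing with /usr/local/bin added:", missing_helpers(HELPERS, extended))
```
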
Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after
the 'import os' line:

----
os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'
----

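A slightly more defensive variant of that line is sketched below. It is not what the WebUI ships, and the 'add_to_path()' name is hypothetical. It uses 'os.pathsep', copes with an unset PATH, and does not append the directory twice if the module is loaded again. Assigning to 'os.environ' also updates the process environment, so helper programs started afterwards inherit the new PATH.

```python
import os


def add_to_path(directory, environ=os.environ):
    """Append directory to environ['PATH'] unless it is already listed.

    Empty entries (which would mean the current directory) are dropped
    when the variable is rewritten.
    """
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        entries.append(directory)
        environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]
```

In 'webui-wsgi.py', the call would then be 'add_to_path("/usr/local/bin")' after the imports.
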
=== Configure Apache

Edit '/usr/local/etc/apache24/modules.d/270_mod_wsgi.conf'.

Uncomment the LoadModule line, and add the directives which alias '/recoll/'
to the WebUI script.

Change the user ('dockes' in the example), making sure that it is the one
who owns the index ('.recoll' is in his home directory).

Contents of the file:

----
## $FreeBSD$
## vim: set filetype=apache:
##
## module file for mod_wsgi
##
## PROVIDE: mod_wsgi
## REQUIRE:

LoadModule wsgi_module libexec/apache24/mod_wsgi.so

WSGIDaemonProcess recoll user=dockes group=dockes \
    threads=1 processes=5 display-name=%{GROUP} \
    python-path=/usr/local/www/apache24/recoll-webui-master/
WSGIScriptAlias /recoll /usr/local/www/apache24/recoll-webui-master/webui-wsgi.py

<Directory /usr/local/www/apache24/recoll-webui-master>
    WSGIProcessGroup recoll
    Require all granted
</Directory>
----

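Before restarting, it may be worth checking that the account named in 'user=' really owns the Recoll configuration directory. The sketch below assumes the default location, '.recoll' in that user's home directory (an explicit 'confdir' can be passed for other setups), and uses only the standard 'pwd' and 'os' modules, so it is Unix-only.

```python
import os
import pwd


def owns_recoll_dir(username, confdir=None):
    """Check that username owns confdir (default: '.recoll' in its home).

    Returns (ok, path); ok is False if the directory is missing or owned
    by another account. Raises KeyError if the user does not exist.
    """
    account = pwd.getpwnam(username)
    path = confdir or os.path.join(account.pw_dir, ".recoll")
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False, path
    return st.st_uid == account.pw_uid, path
```

On the server, 'owns_recoll_dir("dockes")' should return True together with the path of the index configuration.
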
=== Restart Apache

As root:

----
apachectl restart
----