solr #4

Description

Enterprise search server based on Lucene
.
Solr is an open source enterprise search server based on the Lucene
Java search library, with XML/HTTP and JSON APIs, hit highlighting,
faceted search, caching, replication, and a web administration interface.


Overview

Enterprise search server based on Lucene.

Solr is an open source enterprise search server based on the Lucene
Java search library, with XML/HTTP and JSON APIs, hit highlighting,
faceted search, caching, replication, and a web administration interface.

http://lucene.apache.org/solr/

Usage

General

A Solr deployment consists of a Solr service:

juju deploy --repository . local:solr solr

Once deployed, you can access its admin interface over a ssh tunnel:

ssh -L 8080:localhost:8080
Open web browser at http://localhost:8080/solr/admin/

(NOTE: For LXC based deployments, a ssh tunnel isn't required and you can simply
access 'http://:8080/solr/admin/')

A Solr instance isn't security hardened enough to safely expose to the
Internet, even with password protection (see below). Use on an internal
network only. Should you need to expose an instance for development purposes,
you can still issue:

juju expose solr

This will expose solr on port 8080.

The Solr server is protected by HTTP digest password protection which isn't
enabled by default, to enable set one or many passwords related to the admin,
update and search roles:

juju set solr "admin-password="
juju set solr "update-password="
juju set solr "search-password="

(NOTE: You could also do this at deployment with the '--config' switch)

Each role has its own password and set of abilities. The abilities are
hierarchical such that the update role includes the abilities of the search
role and the admin role includes all abilities. The roles would typically be
used as:

(Username)
solr-admin - all access
solr-update - POST'ing new documents to index
solr-search - search querying only

Setting an empty password disables protection for that role.

You can search a Solr instance with the following example queries:

# show all documents
http://.../solr/select?q=:

# output in json
http://.../solr/select?q=:&wt=json

The XML schema can be set as follows:

juju set solr "schema=$(base64 < )"

Uploading documents can be done from the commandline using curl (installed on
unit), see 'http://wiki.apache.org/solr/' for details.

Indexing from Database

You can automatically index the content of a RDBMS such as MySQL or PostgreSQL
using Solr's Data Import Handler (DIH). Database configuration is controlled
using the 'db-config' option. This is a comma separated list of core, database
service and database colon separated values:

::,::

if '' is omitted then 'core0' (initial core) is assumed.

(Note: You can use multiple databases from the same MySQL service. When a
database isn't defined for a MySQL service, the database 'solr' is used.
PostgreSQL doesn't support multiple databases from the same service, so database
is ignored.)

For example:

# core0 uses mysql database solr1
# core1 uses mysql database solr2
juju set solr "db-config=core0:mysql:solr1,core1:mysql:solr2"

# core0 uses postgresql
# core1 uses mysql2 database solr
juju set solr "db-config=core0:postgresql,core1:mysql2:solr"

Having deployed the appropriate charm, add the relation:

# mysql
juju add-relation solr:db-mysql mysql:db

# postgresql
juju add-relation solr:db-pgsql postgresql:db

# upload data-config document
juju set solr "dih-config=$(base64 < )"

# core1 data-config document
juju set solr "core1-dih-config=$(base64 < )"

Then use the DIH web interface to perform the importing:

# full import
http://.../solr/dataimport?command=full-import

# delta import
http://.../solr/dataimport?command=delta-import

# status
http://.../solr/dataimport?command=status

See 'http://wiki.apache.org/solr/DataImportHandler' for more details.

JMX

JMX monitoring is disabled by default, but you can enable it with the
following commands:

juju set solr jmx-enabled=True
juju set solr jmx-control-password=
juju set solr jmx-monitor-password=

The control role (username - 'controlRole') allows read/write access, the
monitor role (username - 'monitorRole') read only access. If a password is
empty, it disables access for that role.

There is a final JMX option, 'jmx-localhost'. This determines what hostname is
given to the JMX client to connect to. If false, the internal hostname or IP
address of the unit will be used, meaning connection is suited to either LXC
based deployments or cloud deployments where you have VPN access:

JConsole or VisualVM connect to :10001 with
username/password

For cloud deployments, setting this to true uses 'localhost' hostname, which
allows you to connect over a ssh tunnel:

ssh -L 10001:localhost:10001 -L 10002:localhost:10002
JConsole or VisualVM connect to localhost:10001 with username/password

The latter is much more typical of out of the box deployment, so 'jmx-localhost'
defaults to True.

Multiple Cores

A Solr service by default has a single index ('core'). However with this charm,
you can enable up to 4 extra indexes, each of which may be queried, updated,
etc. individually. Such an arrangement is typical for large
applications which require querying multiple sets of data.

To enable an extra core:

# enable core
juju set solr core-enabled=True

# set core schema
juju set solr "core-schema=$(base64 < )"

where '' is the core number, currently 1-4.

URLs have a '/core/' component to determine the core being accessed. For
example 'http://.../solr/core1/select?q=:'. Excluding the core
component defaults to the first index (or 'core0').

Replication

Solr supports the ability to replicate an index from a master unit to one or
more slave units:

# deploy second service
juju deploy --repository . local:solr solr-slave

# add master/slave relationship
juju add-relation solr:master solr-slave:slave

# add further slave(s)
juju add-unit solr-slave

Each slave will poll and replicate the master's index and schema. Slaves should
be treated as search only. Updates should be applied to the master only. All
enabled cores will be replicated.

Replication allows you to spread the querying load across multiple slave
servers. A slave can also be used to take a backup of the index:

http://.../solr/[core/]replication?command=backup

A concurrency safe snapshot of the index will be created under
'/var/lib/solr/core'. (You can use this backup URL on a non-replicated unit
too).

Masters and slaves relations can be combined freely, so you can set a unit to be
both a master and slave, called a 'Solr Repeater', acting as a local proxy for
the master.

See 'http://wiki.apache.org/solr/SolrReplication' for more details.

Distributed Search

Solr supports the ability to divide an index across multiple units. Each unit
holds and queries its own complete index which requires you to provide your own
mechanism for distributing (ideally randomly) documents to each unit. To create
such a deployment:

# add 2 more units for a 3 unit index
juju add-unit -n 2 solr

Then query across 3 machines:

# show all documents
http://:8080/solr/select?shards=:8080/solr[/core],:8080/solr[/core],:8080/solr[/core]&q=:

(NOTE: The unit addresses need to be contactable from the initial unit you
query. So if you are using a ssh tunnel, these addresses will be the private
LAN addresses instead of localhost.)

If you need to update the index schema, updating this via juju (see above) will
apply the schema across all units providing they are deployed under a single
service.

(NOTE: Distributed search doesn't use a shared Inverse Document Frequency (IDF),
so irregularities in the scoring will occur compared to a combined index on a
single machine. It's therefore important that you distribute documents randomly
across units.)

See 'http://wiki.apache.org/solr/DistributedSearch' for more details.

Distributed Search + Replication

Distributed search features can be combined with replication to do a so called
'deep scale' deployment. Using the notes above, you can create multiple
replication clusters each of which forms a node of a distributed search. This
is a somewhat exotic deployment so is left to the user to explore!

Configuration

jmx-enabled
(boolean) Enable Solr/Tomcat JMX monitoring.
core3-enabled
(boolean) "Enable core 3."
core2-enabled
(boolean) "Enable core 2."
core4-dih-config
(string) Data Import Handler XML configuration (base64 encoded). . "Note: 'dataSource' element will be ignored."
search-password
(string) "Solr search ('solr-search') password."
core4-schema
(string) Solr XML schema (base64 encoded).
lucene-version
(string) Lucene index version to use.
LUCENE_36
jmx-control-password
(string) JMX control password.
core2-schema
(string) Solr XML schema (base64 encoded).
core3-schema
(string) Solr XML schema (base64 encoded).
core4-enabled
(boolean) "Enable core 4."
core1-schema
(string) Solr XML schema (base64 encoded).
dist-url
(string) Distribution URL.
http://mirror.catn.com/pub/apache/lucene/solr/3.6.0/apache-solr-3.6.0.tgz
db-config
(string) Database relation configuration.
core1-enabled
(boolean) "Enable core 1."
admin-password
(string) "Solr admin ('solr-admin') password."
schema
(string) Solr XML schema (base64 encoded).
jmx-localhost
(boolean) Use localhost over LAN hostname in connections. Useful for SSH tunnels.
True
core1-dih-config
(string) Data Import Handler XML configuration (base64 encoded). . "Note: 'dataSource' element will be ignored."
dih-config
(string) Data Import Handler XML configuration (base64 encoded). . "Note: 'dataSource' element will be ignored."
jmx-monitor-password
(string) JMX monitor password.
java-opts
(string) Java options for Solr/Tomcat JVM.
-Xmx1024M
core2-dih-config
(string) Data Import Handler XML configuration (base64 encoded). . "Note: 'dataSource' element will be ignored."
update-password
(string) "Solr update ('solr-update') password."
dist-md5
(string) MD5 distribution hash.
ac11ef4408bb015aa3a5eefcb1047aec
core3-dih-config
(string) Data Import Handler XML configuration (base64 encoded). . "Note: 'dataSource' element will be ignored."