Please note that "haupia" has been renamed to "SmartSearch" with version 2.4.0. Wherever discrepancies or confusion arise, "SmartSearch" is the correct and updated term. This may not be reflected in old scripts and examples; we apologize for any inconvenience and expect the instances requiring correction to be minimal.
1. Introduction
The SmartSearch bundles the requirements that customers place on the search function of an online presence: an intuitive, high-performance search solution that can be used on extensive websites and delivers relevant results. It offers both high hit quality and optimum search convenience, thus retaining customers on the website.
At the same time, it provides editors with a web interface through the integrated SmartSearch cockpit that can be used without IT knowledge. Editors from specialist and marketing departments are thus enabled to control and monitor search results on the web interface. For this purpose, the cockpit provides statistics, filters and analysis functions and allows the indexing of various data types (for example XML, audio, video, media) from different data sources. With the help of individualized hit lists, editors can prioritize and weight search results in the back end and display selected content for predefined search queries.
This document is intended for administrators and therefore only describes the required installation and configuration steps. All functionalities and use cases provided by the SmartSearch are included in the SmartSearch documentation.
1.1. Architecture
The functionalities of the SmartSearch are realized by an architecture made up of a range of different components (see figure Architecture).
These components are:
- ZooKeeper
- Solr
- SmartSearch
The individual components always interact according to the following schema:
- Prior to creating the search index, the SmartSearch must collect the required data. For this, it accesses the information to be collected on the customer side, which can exist in the form of websites, portals or databases. In addition, a REST interface provides the option of filling the search index with further data from outside.
- After that, the SmartSearch server normalizes the data and transfers it to the Solr server, which receives the data and persists it in an index.
- The query for data works equivalently: The SmartSearch server receives the request, modifies it and forwards it to the Solr server. The Solr server responds with a search result, which the SmartSearch server returns to the customer's end application via the REST interface.
- The SmartSearch cockpit is to be seen as detached from the other components. It serves the administration of the SmartSearch server and offers a simple, web-based administration interface. Among other things, search solutions can be created and configured in this interface.
- Configurations made in the SmartSearch cockpit are saved on the ZooKeeper server together with the Solr configuration data.
Communication to the outside is protected by HTTPS; communication between the components takes place via HTTP.
1.2. Technical Requirements
To deploy SmartSearch, the following technical requirements must be met:
- Java 11 as the Java Development Kit (JDK) for running ZooKeeper and Solr
- ZooKeeper version 3.4.10
- Solr version 8.11.2 in cloud mode
- SmartSearch in the latest version, which requires Java 21 for execution
Although Java 11 is the default requirement for ZooKeeper and Solr, SmartSearch runs on Java 21. To accommodate this, both Java 11 and Java 21 must be present on the system; however, only SmartSearch requires Java 21.
To run SmartSearch with Java 21, execute it using the Java 21 executable. Ensure that Java 11 remains the system’s default JDK for all other operations. You can start the SmartSearch server by specifying the path to the Java 21 executable as follows:
/path/to/java21/bin/java -jar server.jar -server -Dhaupia.master.profile=STANDALONE -Dfile.encoding=UTF-8
This approach allows you to use Java 21 for running the SmartSearch server without altering the default Java environment configured for Java 11, ensuring compatibility with other dependencies.
2. Installation and Configuration
In order to use the functionalities of the SmartSearch, the individual components must first be installed; they can be distributed as required and are freely scalable. It is essential that they are installed in the following order so that the basic configurations can be made correctly:
- ZooKeeper
- Solr
- SmartSearch
ZooKeeper and Solr are not included in the delivery. They must therefore be downloaded before installation in the version specified in the technical requirements.
Performing the installation of the components as root is not recommended; use a technical user with the appropriate access rights instead. The name of the technical user is expected to match the name of the component in each case. Solr creates this user automatically, so manual creation is not necessary in this case.
Since most server systems are based on Linux, the following subchapters concentrate exclusively on installation under Linux. The specified commands refer to the following system:
- Ubuntu 18.04 LTS (Bionic Beaver)
- OpenJDK 11
Furthermore, the scenario of simple redundancy is assumed for the description of the installation (see figure Simple redundancy). The two nodes N1 and N2 shown in the following figure correspond to two physical or virtual systems, on each of which a ZooKeeper, Solr, and SmartSearch instance is to be installed. Such a cluster operation of at least two nodes ensures basic failover and data redundancy.
If the aspects of failover and data redundancy are not relevant, the entire stack may also be installed on a single node. The chapter on the operation of a single node describes the differences to be considered for this.
The described scenario can be considered as the basis for a cluster operation with any number of nodes.
2.1. Ports used
Various ports are mentioned in the context of the following chapters on the installation of the various components. The following table shows an overview of these ports and explains their meaning.
Port | Meaning
---|---
8181 | Access to the SmartSearch cockpit and the REST API.
8983 | Access to the Solr web interface and the API.
2181 | Client communication of a ZooKeeper instance.
2888 | Communication between the ZooKeeper instances (connections of the followers to the leader).
3888 | Communication between the ZooKeeper instances (leader election).
2.2. ZooKeeper
To back up the configurations of Solr and the SmartSearch, it is necessary to install and configure two ZooKeeper instances, which are not included in the delivery. To do this, download ZooKeeper in the version specified in the technical requirements and install it on both systems in the ZooKeeper directory to be created. This directory can be located anywhere in the respective system.
By default, the assumption is that the installation directory to be created is located in the home directory of the technical user (~/zookeeper).
Then create the necessary configuration file under conf/zoo.cfg for each system with the following command.
cp ~/zookeeper/conf/zoo_sample.cfg ~/zookeeper/conf/zoo.cfg
In the same directory, create a file named java.env to adjust the amount of memory available to ZooKeeper. In this file, specify the parameters that change the memory settings for ZooKeeper.
touch ~/zookeeper/conf/java.env
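The concrete values depend on the available resources. A minimal sketch of such a java.env file, assuming one gigabyte of heap, could look like this:
# java.env: memory settings for ZooKeeper (values are examples)
export JVMFLAGS="-Xms1g -Xmx1g"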
By default, ZooKeeper saves its data in the tmp directory. However, since this directory is only intended for temporary storage, the ZooKeeper-data directory to be created in the installation directory of both systems is needed instead. To ensure ZooKeeper uses it for persistence, the path of the ZooKeeper-data directory must be specified in the zoo.cfg file as the value of the dataDir parameter on both systems. In the same file, add the hostnames or static IP addresses of all ZooKeeper instances to be included. In this case, this specification corresponds to the hostnames hostname-n1 and hostname-n2. By default, the ZooKeeper instances use the ports 2888 and 3888 to communicate with each other. The port specification after the semicolon refers to the port on which ZooKeeper listens for client connections. Additionally, the so-called four-letter-word commands must be permitted, since Solr needs them.
In the described server configuration, each server is given an ID: for example, in the server.1 specification, the ID is 1. The ID must be a number between 1 and 255. The server ID is defined in the myid file, which is to be created in the data directory and has no content beyond that.
server.1=hostname-n1:2888:3888;2181
server.2=hostname-n2:2888:3888;2181
4lw.commands.whitelist=*
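Assuming the ZooKeeper-data directory created above, the myid file can be created, for example, as follows (on the second node, the content is 2 accordingly):
echo "1" > ~/zookeeper/ZooKeeper-data/myid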
After that, both ZooKeeper servers can be started using the following command.
~/zookeeper/bin/zkServer.sh start
In addition to the SmartSearch configurations, the Solr server settings are also stored in ZooKeeper. To avoid conflicts, it is necessary to separate the data. This is achieved by creating an empty Solr node that ZooKeeper uses to persist the Solr files. The node can only be created with a running ZooKeeper client, which must be terminated afterwards.
Since ZooKeeper keeps the data synchronized, creation of the Solr node is only necessary on one of the two instances. The SmartSearch node, which is also required, is automatically created during the installation of the SmartSearch server and therefore does not need to be created manually.
~/zookeeper/bin/zkCli.sh
create /solr ""
quit
Finally, copy the ZooKeeper.service file, which is included in the systemd-samples folder of the delivery, to the /etc/systemd/system directory on both systems. This allows ZooKeeper to be used as a service and controlled via systemctl.
It must be ensured that both the user and the path of the installation directory are specified correctly in each case in the service file. By default, the user zookeeper is assumed.
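The delivered file may differ in detail; a minimal sketch of such a unit file, assuming the user zookeeper and an installation in that user's home directory, could look like this:
[Unit]
Description=Apache ZooKeeper
After=network.target

[Service]
# forking: zkServer.sh start daemonizes the process
Type=forking
User=zookeeper
ExecStart=/home/zookeeper/zookeeper/bin/zkServer.sh start
ExecStop=/home/zookeeper/zookeeper/bin/zkServer.sh stop
Restart=on-failure

[Install]
WantedBy=multi-user.target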
The configurations from the backend of the SmartSearch are also stored in ZooKeeper. The name of the root node is haupia by default; this ensures a separation from the Solr data. Within this node, the configuration data is stored in readable form. The names assigned in the configuration are also the names of the nodes.
2.3. Solr
Persistence of the custom data collected by the SmartSearch server is done using two Solr servers. These are not included in the delivery and must therefore be installed manually. To do this, download Solr in the version specified in the technical requirements and install it on both systems in the Solr directory to be created. The directory can be located anywhere in the respective system.
By default, the assumption is that the installation directory to be created is /opt/solr.
Solr provides the install_solr_service.sh script for the installation, which first has to be extracted from the downloaded file and then executed on both systems. The target directory for persisting the collected data can be chosen freely in each case. The script installs both Solr servers as a service, creates the solr user on each system and also creates all the required files and directories.
For the pending configuration, it is mandatory that both Solr servers are in a non-running state.
./install_solr_service.sh solr-8.11.2.tgz -d <VARIABLE_PATH> -n
The Solr servers require various Java variables, for example for memory usage. These are to be defined per system in the configuration file /etc/default/solr.in.sh. Furthermore, the use of the Solr cloud must be enabled in this file, and the Solr node previously created during the ZooKeeper installation must be specified.
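For example, the heap size can be adjusted in /etc/default/solr.in.sh via the SOLR_JAVA_MEM variable. The values shown here are only a sketch and depend on the available hardware:
# Example heap settings for Solr (adjust to the available memory)
SOLR_JAVA_MEM="-Xms2g -Xmx2g"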
Before executing the script, make sure that the Solr node was created during the ZooKeeper installation and actually exists there.
sed -i 's/#ZK_HOST=.*/ZK_HOST=hostname-n1:2181,hostname-n2:2181\/solr/' /etc/default/solr.in.sh
Since Solr versions up to and including 8.6.3 are affected by the Log4j vulnerability CVE-2021-44228, the recommendation is to adjust the configuration accordingly if such a version is being used on older installations.
More information on this topic can be found at https://solr.apache.org/news.html#apache-solr-affected-by-apache-log4j-cve-2021-44228.
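For affected versions, the mitigation recommended in the linked announcement is to disable message lookups by adding the following line to /etc/default/solr.in.sh:
# Log4j CVE-2021-44228 mitigation for affected Solr versions
SOLR_OPTS="$SOLR_OPTS -Dlog4j2.formatMsgNoLookups=true"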
Finally, copy the solr.service file, which is included in the systemd-samples folder of the delivery, to the /etc/systemd/system directory on both systems. This allows Solr to be used as a service and controlled via systemctl.
It must be ensured that both the user and the path of the installation directory are specified correctly in each case in the service file. By default, an installation under /opt/solr with the user solr is assumed.
The SmartSearch provides its own schema to be installed on the Solr server. For this, after the Solr installation but before the first start, the following jar files must also be present in the classpath. They are part of the Solr delivery and are contained in the contrib directory.
- morfologik-stemming-X.Y.Z.jar
- morfologik-fsa-X.Y.Z.jar
- morfologik-polish-X.Y.Z.jar
- lucene-analyzers-morfologik-X.Y.Z.jar
- lucene-analyzers-smartcn-X.Y.Z.jar
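One possible way to make these files available, assuming an installation under /opt/solr and the default directory layout of the Solr delivery, is to copy them from contrib into the lib directory of the Solr web application:
# Copy the listed jars from contrib into the Solr webapp classpath (paths are assumptions)
for jar in morfologik-stemming morfologik-fsa morfologik-polish lucene-analyzers-morfologik lucene-analyzers-smartcn; do
  find /opt/solr/contrib -name "${jar}-*.jar" -exec cp {} /opt/solr/server/solr-webapp/webapp/WEB-INF/lib/ \;
done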
2.4. SmartSearch
The SmartSearch is a search solution based on Spring Boot and enables the control, filtering and analysis of individualized result lists. In order to use the functionalities of the SmartSearch stack, it has to be installed on both systems in the SmartSearch directory to be created. It is provided in the form of a zip file. The installation directory created on each system must also contain the license, which can be requested from Technical Support. The directory can be created at any location in the respective system.
By default, the assumption is that the installation directory to be created is /opt/SmartSearch.
After installation of the SmartSearch stack, it is necessary to perform the following configuration steps on both systems:
- Creation of the directory data
The SmartSearch stack needs the directory data for data storage. It must be created manually within the respective installation directory.
- Adaptation of the application.yml file
The application.yml file included in the delivery enables the configuration of the respective SmartSearch server. In particular, the following points must be observed (see the sketch after this list):
  - The SmartSearch server uses port 8181 by default. However, you can optionally customize this per system.
  - The communication of the server to the outside is protected by SSL. For this purpose, a self-signed certificate is included in the delivery. It is intended exclusively for local development and must be exchanged for an officially signed certificate for productive use. The SSL configuration included in the application.yml file may be modified, but not removed, since a valid SSL configuration is required for the operation of the SmartSearch.
  - For the storage of configuration data, the SmartSearch server needs a connection to the ZooKeeper of its system. For this, the address of the corresponding ZooKeeper server must be made known to it under the connection parameter of the zookeeper key.
  - The previously created data directory must also be made known to the SmartSearch stack. For this purpose, the root parameter of the haupia key must be adapted accordingly in the associated application.yml file.
  - The host name of each individual SmartSearch instance must be known to the ZooKeeper instances. It must therefore be specified as the value of the server.address parameter. The host name to be specified is the host name of the node on which the application.yml is located; the specification is made without protocol and without port. The host name is used to forward from a non-leader instance to the cockpit of the leader in redundant/cluster operation. If no specification is made, usage of the SmartSearch cockpit is not possible.
  - User data is determined by accessing a cache, which requires a certain amount of time to update in case of changes. The haupia.zookeeper.users.cache.await parameter defines the timeout in seconds to wait for such an update. The default value is one second.
  - The SmartSearch determines the URL scheme to be used from the Solr cluster property urlScheme. If this value is not set, http is used by default. The solr.url.scheme parameter allows overriding this behavior and must have the value http or https.
  - It is possible to secure the Solr server using basic authentication. The required credentials are specified via the solr.auth.username and solr.auth.password parameters.
  - When using the FirstSpirit module SmartSearch Connect, the haupia.connect.datagenerator.pool.size parameter configures how many FirstSpirit projects can be served with a separate data generator each. If the value is smaller than the number of connected FirstSpirit projects, data cannot be received from all projects. If this value is not set, a maximum of number of cores x 500 applies.
  - The timeout for automatic user logout from the SmartSearch cockpit is defined by the server.servlet.session.timeout parameter. Possible values are, for example, 60s or 1h. The default timeout is 15 minutes.
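The following fragment illustrates several of these settings in context. It is only a sketch: the authoritative key layout is given by the application.yml included in the delivery, and the host names and paths shown here are placeholders.
server:
  port: 8181
  address: hostname-n1
  servlet:
    session:
      timeout: 15m
zookeeper:
  connection: hostname-n1:2181
haupia:
  root: /opt/SmartSearch/data
solr:
  url:
    scheme: https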
The master admin is automatically created at the initial start of the first SmartSearch server and must therefore have the same data for all instances. It can neither be disabled nor deleted, thus preventing lockout from the SmartSearch cockpit. The master admin password can be changed in the SmartSearch cockpit. Starting the SmartSearch server with the optional resetAdminPassword parameter resets the password to the value from the application.yml.
- Start parameters, encoding and JVM mode
Both SmartSearch servers need the parameter -Dhaupia.master.profile=STANDALONE to start. By default, it is contained in the supplied server.conf file and must not be changed. In addition, the encoding and the mode of the Java Virtual Machine can be set in this file, among other things. These are also already set and can be adjusted if necessary. Additionally, the file offers the possibility to define the memory usage of the respective SmartSearch server.
Start the SmartSearch stack of the first system via the server.jar contained in the zip file. The jar contains a launch script that allows it to be used as a Unix service. Only when the log file of the first SmartSearch instance contains the following message about the successful start may the second instance be started analogously:
less /opt/SmartSearch/logs/SmartSearch.log
(...) Started Server in 60 seconds (JVM running for 61.2)
When starting server.jar with Java 21, it is important to ensure that the server runs using the correct Java version while still functioning seamlessly as a service. If the startup script is executed from the etc/init.d directory, it recognizes this context and starts the server as a service. To run server.jar as a service using Java 21, explicitly set the MODE variable to service and use the path to the Java 21 executable in the command. SmartSearch can be started in service mode as follows:
MODE=service /path/to/java21/bin/java -jar server.jar -server -Dhaupia.master.profile=STANDALONE -Dfile.encoding=UTF-8 start
Finally, copy and adapt the smart-search.service file, which is included in the systemd-samples folder of the delivery, to the /etc/systemd/system directory on both systems. This allows the SmartSearch to be used as a service and controlled via systemctl.
Ensure that both the user and the path of the installation directory are correctly specified in the service file in each case. In the revised setup, the service must explicitly reference the Java 21 executable path.
The installed and configured SmartSearch instances communicate with each other and perform a so-called leader election, which determines the leading instance. While the REST service behaves identically on all running instances, the SmartSearch cockpit can only be accessed on the leader. As a result, both the configuration and the automatic or manual execution of the data generators are also only possible on the leading instance.
If an attempt is made to address the cockpit via an instance other than the leader, automatic forwarding takes place. This forwarding is realized by a leader query addressed to ZooKeeper. For this, the ZooKeeper must know the host name of each instance.
In addition, the SmartSearch license must be stored in a text file on the SmartSearch server. The license can be requested via the Technical Support of Crownpeak Technology GmbH. The following key allows referring to this license file in the application.yml file:
haupia:
licence:
path: ./license.txt
2.5. HTTPS Proxy Configuration
If an http(s) proxy is necessary for the Web/XML crawlers to access their crawl targets, it can be configured via JAVA_OPTS. The relevant configuration parameters are the following:
Parameter | Description | Value
---|---|---
http.proxyHost / https.proxyHost | Host for the http(s) proxy | 12.13.14.15
http.proxyPort / https.proxyPort | Port for the http(s) proxy | 8080
http.nonProxyHosts | Exceptions for which the proxy is not to be used | localhost
JAVA_OPTS="-server
-Dhaupia.master.profile=STANDALONE
-Dfile.encoding=UTF-8
-Dhttp.proxyHost=12.13.14.15
-Dhttp.proxyPort=8080
-Dhttps.proxyHost=12.13.14.15
-Dhttps.proxyPort=8080
-Dhttp.nonProxyHosts=localhost|127.0.0.1"
In the updated setup, where the SmartSearch runs as a service, these parameters might also need to be set in the server.conf file.
3. Further configuration
The SmartSearch is based on Spring Boot and can be configured via the application.yml file, which is located in the SmartSearch directory next to the server.jar file. The application.yml file provides a variety of configuration options, which are described in the Spring Boot documentation. For example, it offers the possibility to configure logging and security settings. The following example explains the configuration of SSL, which is a common use case.
Example: SSL configuration
By default, the SmartSearch uses SSL to enable secure data transfer, using the Jetty server embedded via Spring Boot. This works in most cases. However, problems can arise with the choice of protocol or encryption. The Spring Boot documentation describes the configuration keys server.ssl.enabled-protocols and server.ssl.ciphers, which can be used to solve such problems. First, to get a list of possible parameters, logging for Jetty must be set to DEBUG in the right place. To do this, add the following configuration to the application.yml file:
logging:
level:
org:
eclipse:
jetty:
util:
ssl: DEBUG
After configuring logging and restarting, the log contains a DEBUG output at the end of the startup process, which looks approximately as follows:
May 22 11:14:50 smart-search-cluster-1 server.jar[1208]: 2017-05-22 11:14:50.621 DEBUG 1230 --- [ main] o.e.jetty.util.ssl.SslContextFactory : Selected Protocols [TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
May 22 11:14:50 smart-search-cluster-1 server.jar[1208]: 2017-05-22 11:14:50.622 DEBUG 1230 --- [ main] o.e.jetty.util.ssl.SslContextFactory : Selected Ciphers [TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, [...]
May 22 11:14:50 smart-search-cluster-1 server.jar[1208]: A, SSL_DH_anon_WITH_DES_CBC_SHA, [...]
These two log outputs show the current values for the protocol and encryption as well as the possible values. To limit the encryption to two encryption methods if required, the application.yml file has to be adapted as follows:
server:
ssl:
ciphers:
TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256
The same applies to the protocol used. After a successful configuration, it can be found in the logs during a restart.
3.1. Configuration of the Solr replicas
Finally, after installing ZooKeeper, Solr and the SmartSearch, the following steps must be performed in the Solr web interface. On both Solr instances, they enable the automatic replication of the data needed by the SmartSearch to answer search queries.
The steps outlined refer only to the described cluster operation with at least two nodes. For a single node operation they can be ignored.
Use the URL http://hostname-<INSTANCE>:8983 to open the Solr web interface on one of the two systems and select the menu item Collections. The web interface then displays a list of all existing collections. In this list, select the collection that has the name of the client and click on the Shard: shard1 item.
The Add replica button enables the creation of another replica. Leave the dropdown in the state No specified node and create the replica using the Create Replica button. Solr selects a free node automatically.
Reload the web interface after creating the replica and verify the existence of the second replica by clicking on the Shard: shard1 item.
Next, repeat the described steps with the collection that has the suffix _signals.
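Alternatively, the same replicas can presumably be created via the Solr Collections API instead of the web interface. In the following sketch, the collection name is a placeholder:
curl "http://hostname-n1:8983/solr/admin/collections?action=ADDREPLICA&collection=<CLIENT>&shard=shard1"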
3.2. Single node operation
If the aspects of failover and data redundancy are not relevant for the operation of the SmartSearch, the entire stack may also be installed on a single node.
The installation of the individual instances is performed in the same way as described in the previous chapters. However, the following differences should be noted:
- Specification of ZooKeeper instances
In cluster mode, the hostnames or static IP addresses of all ZooKeeper instances to be included must be specified in the zoo.cfg files of all systems. This specification is omitted in the case of a single node.
- Sed script to adjust the Solr configuration
The adjustment of the Solr configuration is done via the solr.in.sh script. In this case, the command for its execution is as follows:
Adaptation of the Solr configuration for single node operation
sed -i 's/#ZK_HOST=.*/ZK_HOST=localhost:2181\/solr/' /etc/default/solr.in.sh
- Solr replicas
The creation of replicas refers only to cluster mode. For single node operation, their creation can therefore be ignored.
3.3. LDAP
By default, the management of the users and groups necessary for the use of the SmartSearch, as well as the definition of the permissions, takes place within the SmartSearch cockpit. The storage of the users and groups is done by the previously installed ZooKeeper server.
Alternatively, however, it is possible to implement user and group management based on LDAP. It should be noted that this option only refers to authentication (users and groups), but not to authorization (ACLs).
The LDAP connection only establishes read access to the LDAP server. Management of users and groups within the SmartSearch cockpit is no longer possible afterwards.
To use LDAP, the following prerequisites must be met on the LDAP server:
- On the LDAP side, the users must be assigned to the groups.
- If the administration of the users and groups is done via the SmartSearch cockpit, it provides a master admin for the very first login and the initial assignment of permissions. Equivalently, such a master admin is also required on the LDAP side. This admin must be assigned to the likewise required group ADMIN.
Figure 5. User management on the LDAP side
It should be noted that any user included in the admin group is allowed to edit permissions within the SmartSearch cockpit without restriction.
- The successful login of a user requires that the SmartSearch cockpit knows the password encoder used by LDAP. It can therefore be specified in the password field in the form {id}hash. The id corresponds to the id of the hashing algorithm and must be one of the following values:
  - bcrypt
  - sha, SHA or SHA-1
  - md4 or MD4
  - md5 or MD5
  - noop or NOOP
  - pbkdf2
  - SHA-256, SHA256 or sha256
- Alternatively, it is possible to define the password encoder in the application.yml file using the haupia.ldap.default-password-encoder parameter (see below).
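The following LDIF fragment sketches what such entries might look like; the object classes and attribute names are examples and depend on the LDAP schema actually used:
# Hypothetical master admin user with a bcrypt-encoded password
dn: uid=masteradmin,ou=people,dc=example,dc=org
objectClass: inetOrgPerson
uid: masteradmin
cn: Master Admin
sn: Admin
userPassword: {bcrypt}<HASH>

# Required ADMIN group containing the master admin
dn: cn=ADMIN,ou=groups,dc=example,dc=org
objectClass: groupOfNames
cn: ADMIN
member: uid=masteradmin,ou=people,dc=example,dc=org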
In addition to the adaptations on the LDAP side, the following mandatory parameters must be specified in the application.yml file of the SmartSearch server:
- haupia.ldap.enable
This parameter enables the use of LDAP. Its value must be set to true.
- spring.ldap.username and spring.ldap.password
The connection to the LDAP server requires a technical user to be made known to the SmartSearch server. For this reason, the Distinguished Name (DN) and the password of the technical user must be specified for the spring.ldap.username and spring.ldap.password parameters.
- haupia.ldap.user-search-base and haupia.ldap.group-search-base
For searching the user or group objects, the corresponding Distinguished Name (DN) must be specified as the value of the haupia.ldap.user-search-base or haupia.ldap.group-search-base parameter (for example ou=people,dc=example,dc=org or ou=groups,dc=example,dc=org).
The search for user or group objects refers exclusively to the specified level. Subtrees are excluded from this search.
In addition to these mandatory details, further optional adaptations can also be made:
- spring.ldap.urls
This parameter contains a list of URLs of the available LDAP servers. By default, it has the value ldap://localhost:389.
- spring.ldap.base
This parameter can be used to define a DN suffix for all operations performed against the LDAP server.
- haupia.ldap.user-filter
To find a user object in the LDAP server tree, a filter can be specified with this parameter. By default it has the value uid={0}. The placeholder {0} is replaced by the entered user name.
- haupia.ldap.group-filter
Equivalent to the user filter, this parameter allows specifying a group filter that finds all group objects belonging to a user. By default it has the value (member={0}). In this case, the {0} placeholder is replaced by the Distinguished Name (DN) of the corresponding user.
- haupia.ldap.default-password-encoder
If the password encoder used is not added to the password field on the LDAP server, it can be specified using this parameter instead. By default it has the value bcrypt.
- haupia.ldap.user-attributes.uid and haupia.ldap.user-attributes.password
The values of these parameters correspond to the fields in which the name and password of a user are stored on the LDAP side. By default they have the values uid and userPassword.
- haupia.ldap.user-attributes.language and haupia.ldap.user-attributes.default-language
The language parameter can be used to control the language in which the SmartSearch cockpit starts. If it is missing, the value of the default-language parameter is used instead; this parameter initially defines English as the default language.
- haupia.ldap.group-attributes.name and haupia.ldap.group-attributes.user
Equivalent to the user attributes, the values of these parameters correspond to the name of a group and the list of users contained in a group. By default, they have the values cn and member.
The group names are always displayed in uppercase in the SmartSearch cockpit, even if they have a different spelling on the LDAP side.
spring:
[...]
ldap:
urls: ldap://localhost:389
password: admin
username: cn=admin,dc=example,dc=org
base: dc=example,dc=org
[...]
haupia:
[...]
ldap:
enable: false
user-search-base: ou=people,dc=example,dc=org
user-filter: uid={0}
group-search-base: ou=groups,dc=example,dc=org
group-filter: (member={0})
default-password-encoder: bcrypt
user-attributes:
uid: uid
password: userPassword
language: language
default-language: en
group-attributes:
name: cn
user: member
4. SSL
The processing of the data collected by the SmartSearch requires communication between the individual components and the customer’s end application. The communication of the SmartSearch stack to the outside is thereby protected by SSL. By default, the SmartSearch server uses a self-signed certificate for this purpose, which is included in the delivery.
The self-signed certificate included in the delivery is only designed for use in local development environments. For the use of the SmartSearch stack in productive operation, the use of an officially signed certificate is therefore strongly recommended.
The execution of the following steps assumes that an officially signed certificate has already been requested and that the files server.crt, server.key and ca.crt used below are available.
To use the SmartSearch stack in the production system, a new keystore must first be created. This must be made known to the server to replace the existing certificates. The following command shows an example for the creation of the new keystore:
openssl pkcs12 -export -in server.crt -inkey server.key -out server.p12 -name server -CAfile ca.crt -caname root
The command converts the x509 certificate and the associated key into a pkcs12 file. The password assigned in the process must be specified in the application.yml file under the key-store-password parameter of the ssl key. In the same place, the newly created keystore must be defined for the key-store parameter, as well as the keyAlias server.
The operation of the SmartSearch requires a valid SSL configuration. Due to this, the SSL configuration may be changed, but not removed.
server:
ssl:
key-store: server.p12
key-store-password: PASSWORD
keyStoreType: PKCS12
keyAlias: server
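To verify that the keystore was created correctly, its contents can be listed with keytool, for example:
keytool -list -v -keystore server.p12 -storetype PKCS12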
4.1. Unsigned SSL certificates
When crawling an https server, the error shown below may occur. It states that no valid certificate is entered in the keystore.
Caused by: sun.security.validator.ValidatorException:
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
To fix the error, the SSL certificate of the crawled https server must be included in the keystore of the SmartSearch server. To do so, download the certificate file and add it to the JRE with the keytool command.
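If the certificate file is not yet available locally, it can be retrieved with openssl, for example; the host name is a placeholder:
# Fetch the server certificate and store it in PEM format
openssl s_client -connect crawled-server:443 -servername crawled-server </dev/null | openssl x509 -outform PEM > certificate.crt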
An example call for Mac OS looks like this:
sudo keytool -import -alias ANY_ALIAS -keystore /PATH_TO_JAVA_JRE/lib/security/cacerts -file /PATH/TO/THE/CERTIFICATE/certificates.file
After successful integration of the certificate, the described error no longer occurs and the page is crawlable as desired.
5. Monitoring
Monitoring is used to observe processes, procedures and states. Its main purpose is to ensure that the SmartSearch server continues to run properly and that no failures occur.
The SmartSearch also offers this possibility for its own server. A simple connection to Prometheus allows the collection and evaluation of the server data. For the communication between Prometheus and the SmartSearch, minor configurations are required at both ends.
On the SmartSearch side, the key management.metrics.export.prometheus.enabled must be set to true in the application.yml.
management:
metrics:
export:
prometheus:
enabled: true
After that, the Prometheus service is available via the following URL:
smart-search-server:port/actuator/prometheus
Finally, the SmartSearch server must be made known to Prometheus. This is done in the configuration file prometheus.yml and requires the following settings:
scrape_configs:
- job_name: smart-search
metrics_path: /actuator/prometheus
scheme: https
basic_auth:
username: USERNAME
password: PASSWORD
static_configs:
- targets:
- smart-search-server:PORT
5.1. Metrics
To monitor the activity of a SmartSearch server, Prometheus collects the so-called metrics. These are available after performing the configuration for monitoring on the Prometheus server. Metrics are parameters that describe certain measured variables of the server. They allow deriving various information about the behavior of the SmartSearch server and are periodically collected and stored by Prometheus.
The JVM parameters, such as memory usage and the number of threads, are of particular interest. Other important metrics include CPU utilization or the performance of HTTP requests.
5.2. Grafana
The additional configuration of a Grafana instance allows a comprehensive visualization with dashboards. Grafana is a software that displays values from metrics in graphs. The visualization of values offers the advantage of better comparability. Information on how to install and start a Grafana server can be found on the official website.
The addition of the configured Prometheus server as a data source in the Grafana user interface allows access to the metrics provided by the SmartSearch.
5.3. Solr Prometheus exporter
With version 8, Solr also provides a Prometheus exporter. It is located in the contrib subfolder of the Solr installation folder. More information about the operation and configuration of the exporter can be found in the official documentation.
6. Update
The use of the functions, possibilities and performance adjustments of new SmartSearch versions requires regular version updates. These potentially require changes both within the software of the SmartSearch itself and in the connected ZooKeeper and Solr instances. They are primarily minimal adjustments to the Solr schema or changes to the ZooKeeper structure in which the SmartSearch persists data. The update startup parameter allows these changes to be applied automatically during a version update by triggering the execution of the required modifications within ZooKeeper and Solr. Its use is necessary only if an exit code 1 occurs during the update process.
A SmartSearch update will never perform an automatic version update for ZooKeeper or Solr. Furthermore, SmartSearch updates are generally backwards compatible and do not affect the functionality of the SmartSearch APIs as far as possible. If in exceptional cases there are deviations from these two points, the release notes explicitly point this out and are to be considered in each case before the execution of an update. This ensures that connected systems are prepared in advance for possible adaptations. SmartSearch update packages can be requested at Crownpeak | e-Spirit Support.
Performing a version update is always done using the steps below:
- The first step is to download the new SmartSearch archive and unpack it in any (empty) folder of the target system. The archive contains various directories and files, of which the following are potentially relevant for the update:
  - application.yml
  - server.conf
  - server.jar
- Depending on the deployment scenario, shut down either the single node (single-node operation) or one SmartSearch node at a time (cluster operation).
- Before performing any further steps, first check the release notes for any update notes.
- If the release notes contain any update notes, make the changes to application.yml or server.conf as described in the release notes.
- In the execution directory of the stopped instance, both the server.jar file and the shared_resources directory must then be replaced with the file or directory from the unpacked archive. Directories like taglib or documentation do not have to be replaced to ensure functionality, but they may also be replaced without risk.
- In the last step, the previously stopped SmartSearch node must be restarted. The software automatically checks at each start whether all necessary configurations exist. If this is not the case, or if ZooKeeper or Solr need additional changes, it issues a corresponding message and exits with exit code 1. In that case, the information about the missing adjustments is contained in the log of the SmartSearch process, and these adjustments are to be carried out accordingly: restart the SmartSearch node with the update parameter (for example ./server.jar update). The software performs a recheck and automatically makes the necessary adjustments to the ZooKeeper or Solr data structure. These adjustments are also recorded in the SmartSearch log. After successfully making all necessary changes, the software starts as usual.
If the SmartSearch does not start successfully after being called with the update parameter, the log of the SmartSearch process contains information about the cause.
In cluster mode, steps 3-6 must then be performed for each SmartSearch node. The use of the update parameter is only necessary in these cases if the release notes explicitly refer to it.
7. Operation
This chapter contains hints that are necessary for the operation of the SmartSearch as well as the connected systems.
7.1. Resetting the master admin password
At the initial start of the SmartSearch server, the master admin is automatically created with the data from the application.yml. The admin's password can be changed in the SmartSearch cockpit at any time afterwards. Starting the SmartSearch server with the optional resetAdminPassword parameter resets the master admin password to the value from the application.yml. This way, lockout from the SmartSearch cockpit is prevented.
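A sketch of such a call, assuming the start command from chapter 2.4 and that the parameter is appended as the last argument (analogous to the update parameter):
/path/to/java21/bin/java -jar server.jar -server -Dhaupia.master.profile=STANDALONE -Dfile.encoding=UTF-8 resetAdminPassword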
8. Troubleshooting
Data processing by the SmartSearch can only be done if the individual components are working properly. Therefore, if disruptions occur or an update is required, both the SmartSearch server and all ZooKeeper and Solr servers must always be considered. The following sections describe some possible solutions to known challenges.
- Activation of Spring Boot actuators
Spring Boot provides a set of actuators that are enabled in the default configuration of the SmartSearch. With these it is possible to get an insight into the server at runtime. The actuators provide, for example, information about the current environment or about the configurations of the different log levels. Detailed background information about the actuators can be found in the Spring Boot actuators documentation.
The actuators provide their information via REST services. These are secured via http basic authentication and require valid credentials of an admin user.
The following configuration enables the default actuator endpoints, but explicitly disables the heapdump endpoint. The heapdump endpoint enables a download of gigabyte size, which may not be desired in a production environment.
Configuration for default endpoints
management:
  endpoint:
    health:
      show-details: when-authorized
    metrics:
      enabled: true
    prometheus:
      enabled: true
  endpoints:
    web:
      exposure:
        include: "*"
        exclude:
          - "heapdump"
  metrics:
    export:
      prometheus:
        enabled: false
        descriptions: false
The following URL provides an overview of the current actuator endpoints.
- Method: GET
- URL: /actuator
The call performs a basic authentication with an admin user. For this purpose, the user from the application.yml file can be used.
Possible response to a query of the actuators
{
  "_links": {
    "self": {
      "href": "https://smart-search-server:8181/actuator",
      "templated": false
    },
    "auditevents": {
      "href": "https://smart-search-server:8181/actuator/auditevents",
      "templated": false
    },
    "beans": {
      "href": "https://smart-search-server:8181/actuator/beans",
      "templated": false
    },
    "health": {
      "href": "https://smart-search-server:8181/actuator/health",
      "templated": false
    },
    "conditions": {
      "href": "https://smart-search-server:8181/actuator/conditions",
      "templated": false
    },
    [...]
    "mappings": {
      "href": "https://smart-search-server:8181/actuator/mappings",
      "templated": false
    }
  }
}
- Use of the logging API to adjust the log levels
A special Spring Boot actuator exists for handling the log settings. It allows the log levels to be adjusted at runtime. Calling the following relative URL on a SmartSearch instance generates the output of all currently configured log levels:
- Method: GET
- URL: /actuator/loggers
It is also possible to output information about a specific logger:
- Method: GET
- URL: /actuator/loggers/<Logger>
Example URL to query the logger de.arithnea.haupia.Server:
- URL: /actuator/loggers/de.arithnea.haupia.Server
The following code snippet shows a possible response from the SmartSearch server to this query.
Example response
{
  "configuredLevel": "INFO",
  "effectiveLevel": "INFO"
}
The adjustment of a log level is executed by a POST request against a specific logger URL. In this request, the new log level is transmitted in JSON format.
- Method: POST
- URL: /actuator/loggers/<Logger>
The body is a JSON object that has the desired log level as the value for the configuredLevel key. Example:
Curl call
$ curl 'https://smart-search-server:8181/actuator/loggers/de.arithnea.haupia.Server' -i -u 'user:password' -X POST \
  -H 'Content-Type: application/json' \
  -d '{ "configuredLevel" : "DEBUG" }'
The returned HTTP status code 204 confirms the successful setting of the log level.
- Prometheus endpoint
Prometheus is a tool for monitoring processes. At regular intervals, the tool records the state of a process and allows an evaluation of the data over time. This makes it possible, for example, to observe the memory consumption in relation to the requests to the REST services. The SmartSearch provides an already preconfigured Prometheus endpoint, which is to be activated via the application.yml file. For activation, the following key has to be set to true:
Example of activating the Prometheus endpoint
management:
  metrics:
    export:
      prometheus:
        enabled: true
An endpoint is then available via the following URL:
- URL: /actuator/prometheus
For example, the endpoint can be included in Prometheus as follows:
Example of embedding the endpoint in Prometheus
scrape_configs:
  - job_name: 'smart-search'
    metrics_path: '/actuator/prometheus'
    scheme: https
    basic_auth:
      username: admin@localhost.de
      password: admin
    static_configs:
      - targets: ['smart-search-server:8181']
9. GDPR
The General Data Protection Regulation (GDPR) is an EU regulation that protects the fundamental right of European citizens to privacy and regulates the handling of personal data. Simplified, all persons from whom data is collected have the following options, among others, via the GDPR:
- to learn which of their personal data is stored and how it is processed (duty to inform and right to information),
- to restrict the collection of data (right to restriction of processing),
- to influence the data collected (right to rectification), and
- to delete the data collected (right to be forgotten).
- What is personal data?
Personal data is any information by which a natural person is directly or indirectly identifiable. This includes any potential identifiers:
- direct identifiers, such as names, email addresses or telephone numbers
- indirect identifiers, such as location data, customer numbers or staff numbers
- online identifiers, such as IP addresses, cookies or tracking pixels
Detailed information on the GDPR can be found in the blogpost The Ultimate Resource for GDPR Readiness.
9.1. GDPR and the SmartSearch
The search engine SmartSearch stores data as documents that can be made available on various platforms. The type and scope of the data, hereinafter referred to as "collected data", depends on the purpose of the product.
The manufacturer Crownpeak Technology GmbH expressly points out that it is the responsibility of the customer to check collected data to determine whether it contains personal data and to ensure that appropriate measures are taken.
In addition to the editorial data, the SmartSearch stores personal data (essentially the email address and, if LDAP is used, the user's username), which is used for logging in to the system and for auditing configurations, in order to be able to contact an editor of an element if necessary. Parts of this data are kept in log files. In the following, this data is referred to as "personal system data" (see below).
9.2. Personal system data in the SmartSearch
Crownpeak Technology GmbH takes the protection and security of your data very seriously. Of course, we comply with the legal data protection regulations and treat personal as well as non-personal data of our users with appropriate care. We only collect personal data if it is necessary for the security and functionality of SmartSearch.
The following subchapters provide information about the collection and handling of personal data in the SmartSearch.
9.2.1. Data for authorization and authentication of users in the SmartSearch
The SmartSearch works with a consistent user and rights system. New users are created and managed via the user management. After creation the user is known on the server and can log in (with a valid login) via the SmartSearch cockpit. Access to the configuration elements is granted via group rights/roles.
- Why is the data needed?
Authorization and authentication ensure that only authenticated users can access the SmartSearch and that these users can edit elements only according to the rights granted to them. Thus, this data is mandatory for the security of the information.
- Where is the data used or displayed?
Information about the user is displayed in various places, for example:
- when logging in to the cockpit
- when granting group rights
- when changing an object via auditing
- and many more
- Where is the data stored?
The credentials of the individual users are always stored in the configuration component. In the case of LDAP, the personal system data is loaded from the customer's LDAP in read-only mode.
- How long is the data stored?
When a user is removed via the user management, the user's credentials are immediately removed from the configuration component.
Deactivating a user in the user management does not delete their data.
9.2.2. Data for error analysis and troubleshooting in the SmartSearch (logging)
The SmartSearch uses log files to track actions and events on the SmartSearch server. Log files are collected to maintain secure operations. They can be used to analyze and troubleshoot error states.
- Why is the data needed?
Some of the log files used by SmartSearch include IP addresses, login names, dates, times, requests, etc. and thus contain personal data.
- Where is the data stored?
Basically, log files are written to the logs subdirectory of the SmartSearch server.
- How long is the data stored?
Default behavior: When a fixed file size of 100 MB is reached, the current log file is archived. Up to nine archived files are kept. If a new file is then archived, the oldest one is deleted. This behavior can be customized via the configuration file logback-spring.xml.
9.2.3. Data for auditing configuration items
Each time an editor makes a change to configuration items (for example, a Prepared Search), it is noted on the item who made the last change and when. This overwrites any existing last-change information.
When a new configuration item is created, it is noted who created the item and when.
- Why is the data needed?
One objective of data storage in the SmartSearch is the traceability of the last changes, but also information about the creation of configuration items. For this purpose, the SmartSearch stores auditing data.
- Where is the data used or displayed?
The auditing data (which user made a change and when?) is displayed in list views for the configuration items.
- Where is the data stored?
The data is stored at the configuration elements in the configuration component.
- How long is the data stored?
Default behavior: When a user is deleted, references to that user are anonymized. It is also possible to anonymize references of already deleted users afterwards, in case the default behavior was not applied during the deletion. After anonymization, a report shows in which elements the user was anonymized. No information is logged for this purpose; the report is the only way to get information about the anonymization.
9.2.4. Usage of cookies in the cockpit
Cookies are information that can be stored in the browser of the viewer about a visited website.
- Why is the data needed?
The SmartSearch uses cookies in the cockpit to save the user's session and for protection against XSRF attacks.
- Where is the data used or displayed?
The cookies are stored in the browser and sent with every interaction with the cockpit from the moment of logging in.
- How long is the data stored?
The lifetime of the cookies is set to session.
10. Legal hints
The SmartSearch is a product of Crownpeak Technology GmbH, Dortmund, Germany. Only a license agreed upon with Crownpeak Technology GmbH is valid for using the SmartSearch.
11. Help
The Technical Support of Crownpeak Technology GmbH provides expert technical support covering any topic related to the FirstSpirit™ product. You can find more help concerning relevant topics in our community.
12. Disclaimer
This document is provided for information purposes only. Crownpeak Technology GmbH may change the contents hereof without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. Crownpeak Technology GmbH specifically disclaims any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. The technologies, functionality, services, and processes described herein are subject to change without notice.