SmartSearch

e-Spirit GmbH

20.07.2022
Table of Contents

1. Introduction

The SmartSearch bundles requirements that customers place on the search function of an online presence: An intuitive, high-performance search solution that can be used on extensive websites and delivers relevant results. It offers both high result quality and optimum search convenience, thus retaining customers on the website.

At the same time, it provides editors with a web interface through the integrated SmartSearch cockpit that can be used without IT knowledge. Editors from specialist and marketing departments are thus enabled to control and monitor search results on the web interface. For this purpose, the cockpit provides statistics, filters and analysis functions and allows the indexing of various data types (for example XML, audio, video, media) from different data sources. With the help of individualized result lists, editors can prioritize and weight search results in the back end and display selected content for predefined search queries.

1.1. Architecture

The functionalities of the SmartSearch are realized by an architecture made up of a range of different components (see figure Architecture).

These components are:

  • ZooKeeper
  • Solr
  • SmartSearch
Architecture
Figure 1. Architecture


The individual components always interact according to the following schema:

  • Prior to creating the search index, the SmartSearch must collect the required data. For this, it accesses the information to be collected on the customer side, which can exist in the form of websites, portals or databases. In addition, a REST interface provides the option of filling the search index with further data from outside.
  • After that, the SmartSearch server normalizes the data and transfers it to the Solr server. The server receives the data and persists it in an index.
  • The query for data is done equivalently: The SmartSearch server receives the request, modifies it, and then forwards it to the Solr server. This responds with a search result, which the SmartSearch server returns to the customer’s end application via the REST interface.
  • The SmartSearch cockpit is to be seen detached from the other components. It serves the administration of the SmartSearch server and therefore offers a simple, web-based administration interface. Among other things, search solutions can be created and configured in this interface.
  • Configurations made in the SmartSearch cockpit are saved on the ZooKeeper server together with the Solr configuration data.

The communication to the outside is protected by HTTPS, between the components it is done via HTTP.

1.2. Technical requirements

To use the SmartSearch, the following technical requirements must be met:

  • Java 11 or higher
  • ZooKeeper in the version 3.4.10
  • Solr in the version 8.6.3 in cloud mode
  • the SmartSearch in the latest version

ZooKeeper and Solr are not included in the delivery. They must therefore be downloaded before installation in the version specified.

2. SmartSearch cockpit

The SmartSearch cockpit is a component of the SmartSearch. It enables the backend-side administration of the data collected by the SmartSearch and offers a simple, web-based interface for this purpose. This is divided into the areas Configuration, Analysis, Data and System, which can be reached via the menu. The button with the globe icon also provides a language switcher for German and English.

By default, the SmartSearch cockpit is accessible via the following URL:

http://<Servername>:8181

The first start of the cockpit must be done with the master admin. It is created automatically with the data from the application.yml at the initial start of the SmartSearch server.

If the user and group management is implemented via an LDAP server, the credentials may differ.

After valid login, the user is automatically redirected to the dashboard of the cockpit. Re-authentication is only required after an explicit logout or after the session has expired.

SmartSearch dashboard
Figure 2. SmartSearch dashboard


2.1. Configuration

The Configuration area is divided into the submenus Prepared Search, Stopwords and Synonyms. These allow the configuration of the output of the data collected by the SmartSearch.

The following subsections describe the submenus and the functions provided by them.

2.1.1. Prepared Search

The customer-side gathering of the required data is done by the so-called data generators, which are a part of the Data area. For their management, the SmartSearch provides the Prepared Searches. These allow optimizing the search results by prioritizing individual data.

The creation and administration of the Prepared Searches is done in the interface of the same name, which can be called via the menu entry ConfigurationPrepared Search.

The area shows a list of all already existing Prepared Searches and is initially empty.

In cloud mode, the list also displays the accessibility of each Prepared Search.

Prepared Searches
Figure 3. Prepared Searches


New Prepared Search

For the creation of a new Prepared Search there is a separate view, which can be called by clicking on the button New Prepared Search and is divided into the three tabs General, Facets and Preview.

Creating a Prepared Search
Figure 4. Creating a Prepared Search


General

The first thing to do within the tab General is to specify a name for the new Prepared Search. In cloud mode, the additional checkbox publicly accessible is located next to the input field for the name. With it the accessibility of a Prepared Search can be defined. Activating the checkbox enables the Prepared Search to be queried via the internet (API gateway). Otherwise the Prepared Search is only accessible as it is the cockpit.

The following selection of any number of data generators in the selection list of the same name shows their available fields. The initially activated checkbox Verbose shows or hides all technical fields. The button provided together with the checkbox enables the emphasis of the selected fields.

The list of field names per data generator is cached. When creating a new data generator and running it for the first time, it may take several minutes for the field names to appear in the list.

The selected fields can be transferred via button to the list of fields relevant for a search, which by default contains the fields content, link and title. A previously defined emphasis is automatically assigned to each of these fields.

The list provides the following configuration options per field:

  • Highlight: By activating this checkbox, a search word is highlighted within a text segment in the search result. The length of the text section is freely configurable (see below).
  • Output: This option defines whether the field is visible in the search result. For example, in the case of the link that only refers to the entire document, this may not be desired.
  • Search: In order for the associated field to be taken into account by the search, this checkbox must be activated.

    Deactivating the Search option hides the subsequent Partial match and Emphasis options for the corresponding field.

  • Partial match: This option enables partial matches to be taken into account. If the checkbox is selected, the search for electricity, for example, also finds the match eco-electricity provider. The search word must have a length of three up to twenty characters.
  • Emphasis: The emphasis offers the possibility to set a prioritization for matches of the selected fields and thus to influence the search result.

    The button with the trash can icon available for each field allows deleting the corresponding field from the list.

    The tab additionally contains the following general configuration options:

  • Hits per page: According to its name, this button specifies the maximum number of hits displayed on a search results page. In combination with the URL parameter page it is also possible to split the search results into multiple pages.
  • Length of highlight (in characters): Using this button, the length of the text segment in which a search term is visually highlighted can be defined, as mentioned before.
  • Sort by: By default, the search results are sorted in descending order of their score. If a different sorting is desired, this text input field allows a corresponding adjustment. For this, any field is to be specified and supplemented by the expression ASC for an ascending or DESC for a descending sorting.
  • Spellcheck (hits less than): If the number of search results is smaller than the value configured at this point, a spelling check is performed and the search is performed for search words of similar spelling.
  • Must Match: For searching multiple terms, the entry of this text input field determines how they are to be linked:

    • The value 0 corresponds to an OR reference between the search terms used.
    • The value 100% corresponds to an AND reference between the search terms used.
    • An absolute value defines the number of terms that must be contained within a search hit. For example, the value 2 for five given search terms means that two of the five terms must be contained within a search result. Furthermore, the values 2 and 50% are equivalent to each other for four search terms.
  • Groovy Script: In addition, Prepared Searches allow the inclusion of a self-implemented Groovy script. Such a script enables additional modifications of the dataset. For example, additional documents can be added to the dataset or the existing dataset can be edited.
Facets

Facets provide the possibility to restrict result lists according to fields that are included in a document. Since facets always refer to the data generators selected in the tab General, the tab Facets is initially empty.