1. Introduction

The SmartSearch bundles the requirements that customers place on the search function of an online presence: an intuitive, high-performance search solution that can be used on extensive websites and delivers relevant results. It offers both high result quality and convenient searching, which helps to retain customers on the website.

At the same time, the integrated SmartSearch cockpit provides editors with a web interface that can be used without IT knowledge. It enables editors from specialist and marketing departments to control and monitor search results. For this purpose, the cockpit provides statistics, filters and analysis functions and allows the indexing of various data types (for example XML, audio, video, media) from different data sources. With the help of individualized result lists, editors can prioritize and weight search results in the back end and display selected content for predefined search queries.

1.1. Architecture

The functionality of the SmartSearch is implemented by an architecture consisting of several components (see figure Architecture).

These components are:

  • ZooKeeper

  • Solr

  • SmartSearch

Figure 1. Architecture

The individual components always interact according to the following schema:

  • Prior to creating the search index, the SmartSearch must collect the required data. For this, it accesses the information to be collected on the customer side, which can exist in the form of websites, portals or databases. In addition, a REST interface provides the option of filling the search index with further data from outside.

  • After that, the SmartSearch server normalizes the data and transfers it to the Solr server. The Solr server receives the data and persists it in an index.

  • Querying the data works analogously: The SmartSearch server receives the request, modifies it, and then forwards it to the Solr server. The Solr server responds with a search result, which the SmartSearch server returns to the customer’s end application via the REST interface.

  • The SmartSearch cockpit is to be considered separately from the other components. It is used to administer the SmartSearch server and offers a simple, web-based administration interface for this purpose. Among other things, search solutions can be created and configured in this interface.

  • Configurations made in the SmartSearch cockpit are saved on the ZooKeeper server together with the Solr configuration data.

Communication with the outside is protected by HTTPS, whereas communication between the components takes place via HTTP.
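The interaction described above can be pictured with a small Python sketch. It is a toy model only: the classes SolrStandIn and SmartSearchStandIn, their methods and the normalization rule (trimming and lowercasing) are assumptions made for illustration and do not reflect the actual SmartSearch or Solr interfaces.

```python
class SolrStandIn:
    """Hypothetical stand-in for the Solr server: persists documents in a list."""

    def __init__(self):
        self.index = []

    def add(self, document):
        self.index.append(document)

    def query(self, term):
        # Naive substring lookup instead of a real Solr query
        return [doc for doc in self.index if term in doc["content"]]


class SmartSearchStandIn:
    """Hypothetical stand-in for the SmartSearch server."""

    def __init__(self, solr):
        self.solr = solr

    def ingest(self, raw_documents):
        # Normalize the collected data (assumed rule: trim, lowercase the content),
        # then transfer it to the Solr stand-in
        for raw in raw_documents:
            self.solr.add({
                "title": raw["title"].strip(),
                "content": raw["content"].strip().lower(),
            })

    def search(self, request):
        # Modify the incoming request, forward it, and return the answer
        return self.solr.query(request.strip().lower())


server = SmartSearchStandIn(SolrStandIn())
server.ingest([{"title": " Tariffs ", "content": " Green Electricity for homes "}])
results = server.search("  ELECTRICITY ")
```

In this sketch the calling application only ever talks to the SmartSearch stand-in, which mirrors the role of the REST interface in the schema above.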

1.2. Technical requirements

To use the SmartSearch, the following technical requirements must be met:

  • Java 11 or higher

  • ZooKeeper version 3.4.10

  • Solr version 8.6.3 in cloud mode

  • the latest version of the SmartSearch

ZooKeeper and Solr are not included in the delivery. They must therefore be downloaded in the specified versions before installation.

2. SmartSearch cockpit

The SmartSearch cockpit is a component of the SmartSearch. It is used to administer the data collected by the SmartSearch in the back end and offers a simple, web-based interface for this purpose. The interface is divided into the areas Configuration, Analysis, Data and System, which can be reached via the menu. The button with the globe icon also provides a language switcher for German and English.

By default, the SmartSearch cockpit is accessible via the following URL:

The cockpit must be started for the first time with the master admin. This user is created automatically with the data from the application.yml at the initial start of the SmartSearch server.

If the user and group management is implemented via an LDAP server, the credentials may differ.

After a valid login, the user is automatically redirected to the dashboard of the cockpit. Re-authentication is only required after an explicit logout or after the session has expired.

Figure 2. SmartSearch dashboard

2.1. Configuration

The Configuration area is divided into the submenus Prepared Search, Stopwords and Synonyms. They are used to configure the output of the data collected by the SmartSearch.

The following subsections describe the submenus and the functions provided by them.

2.1.1. Prepared Search

The required data is gathered on the customer side by so-called data generators, which are part of the Data area. To manage them, the SmartSearch provides Prepared Searches, which allow the search results to be optimized by prioritizing individual data.

Prepared Searches are created and administered in the interface of the same name, which can be opened via the menu entry Configuration → Prepared Search.

The area shows a list of all already existing Prepared Searches and is initially empty.

In cloud mode, the list also displays the accessibility of each Prepared Search.

Figure 3. Prepared Searches

New Prepared Search

A separate view is available for creating a new Prepared Search. It can be opened by clicking the button New Prepared Search and is divided into the three tabs General, Facets and Preview.

Figure 4. Creating a Prepared Search

General

The first step in the General tab is to specify a name for the new Prepared Search. In cloud mode, the additional checkbox publicly accessible is located next to the input field for the name. It defines the accessibility of the Prepared Search: activating the checkbox enables the Prepared Search to be queried via the internet (API gateway). Otherwise, the Prepared Search is only accessible in the same way as the cockpit.

Next, any number of data generators can be selected in the selection list of the same name, which then shows their available fields. The checkbox Verbose, which is activated initially, shows or hides all technical fields. The button provided together with the checkbox allows an emphasis to be set for the selected fields.

The list of field names per data generator is cached. When creating a new data generator and running it for the first time, it may take several minutes for the field names to appear in the list.

The selected fields can be transferred via a button to the list of fields relevant for a search, which by default contains the fields content, link and title. A previously defined emphasis is automatically assigned to each of these fields.

The list provides the following configuration options per field:

  • Highlight: By activating this checkbox, a search word is highlighted within a text segment in the search result. The length of the text segment is freely configurable (see below).

  • Output: This option defines whether the field is visible in the search result. For example, in the case of the link that only refers to the entire document, this may not be desired.

  • Search: In order for the associated field to be taken into account by the search, this checkbox must be activated.

    Deactivating the Search option hides the subsequent Partial match and Emphasis options for the corresponding field.

  • Partial match: This option enables partial matches to be taken into account. If the checkbox is selected, the search for electricity, for example, also finds the match eco-electricity provider. The search word must be between three and twenty characters long.

  • Emphasis: The emphasis sets a prioritization for matches in the selected fields and thus influences the search result.

The button with the trash can icon available for each field allows deleting the corresponding field from the list.
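How the per-field options Search, Partial match and Emphasis can work together is sketched below in Python. This is a simplified toy model and not the scoring that Solr actually performs: the FieldConfig class, the treatment of the emphasis as a numeric weight and the fallback to whole-word matching are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldConfig:
    search: bool = True          # Search: field is taken into account
    partial_match: bool = False  # Partial match: term may be part of a longer word
    emphasis: float = 1.0        # Emphasis: assumed to be a numeric weight


def field_matches(text, term, partial_match):
    words = text.lower().split()
    term = term.lower()
    # Partial matching only applies to terms of three to twenty characters
    if partial_match and 3 <= len(term) <= 20:
        return any(term in word for word in words)
    # Assumption: without partial matching only whole words count
    return term in words


def score(document, term, fields):
    """Add up the emphasis of every searchable field that matches the term."""
    return sum(
        config.emphasis
        for name, config in fields.items()
        if config.search
        and field_matches(document.get(name, ""), term, config.partial_match)
    )


document = {"title": "Eco-electricity provider", "content": "Green power for your home"}
fields = {
    "title": FieldConfig(partial_match=True, emphasis=2.0),
    "content": FieldConfig(),
    "link": FieldConfig(search=False),
}
print(score(document, "electricity", fields))  # 2.0
```

With partial matching switched off for the title field, the same query would no longer match eco-electricity, and the document would receive no score in this model.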

The tab additionally contains the following general configuration options:

  • Hits per page: As its name suggests, this button specifies the maximum number of hits displayed on a search results page. In combination with the URL parameter page, the search results can also be split across multiple pages.

  • Length of highlight (in characters): This button defines the length of the text segment in which a search term is visually highlighted, as mentioned above.

  • Sort by: By default, the search results are sorted in descending order of their score. If a different sorting is desired, this text input field allows a corresponding adjustment. To do so, specify any field followed by the expression ASC for ascending or DESC for descending sorting.

  • Spellcheck (hits less than): If the number of search results is smaller than the value configured at this point, a spelling check is performed and the search is then carried out for search words with a similar spelling.

  • Must Match: When searching for multiple terms, the entry in this text input field determines how they are to be linked:

    • The value 0 corresponds to an OR operation between the search terms used.

    • The value 100% corresponds to an AND operation between the search terms used.

    • An absolute value defines the number of terms that must be contained within a search hit. For example, the value 2 for five given search terms means that two of the five terms must be contained within a search result. Furthermore, the values 2 and 50% are equivalent to each other for four search terms.

  • Groovy Script: In addition, Prepared Searches allow the inclusion of a self-implemented Groovy script. Such a script enables additional modifications of the dataset. For example, additional documents can be added to the dataset or the existing dataset can be edited.
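Three of the general options above (Must Match, Sort by and Hits per page) can be illustrated with a short Python sketch. The function names are hypothetical, and two details are assumptions that the text does not state: percentages are rounded down to whole terms, and the URL parameter page is counted from 1.

```python
def required_matches(must_match, term_count):
    """Number of search terms a hit must contain for a given Must Match value."""
    value = must_match.strip()
    if value.endswith("%"):
        # Assumption: percentages are rounded down to whole terms
        needed = term_count * int(value[:-1]) // 100
    else:
        needed = int(value)
    # 0 behaves like OR (at least one term), 100% like AND (all terms)
    return max(1, min(needed, term_count))


def sort_hits(hits, sort_by):
    """Sort hits by an expression such as 'title ASC' or 'date DESC'."""
    field, direction = sort_by.split()
    return sorted(hits, key=lambda hit: hit[field], reverse=direction.upper() == "DESC")


def page_of(hits, hits_per_page, page):
    """Return the hits of one result page (assumption: pages are counted from 1)."""
    start = (page - 1) * hits_per_page
    return hits[start:start + hits_per_page]


print(required_matches("0", 5))     # 1
print(required_matches("100%", 5))  # 5
print(required_matches("2", 5))     # 2
print(required_matches("50%", 4))   # 2
```

The last two calls reproduce the examples from the Must Match description: two of five terms for the absolute value 2, and the equivalence of 2 and 50% for four search terms.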

Facets

Facets provide the possibility to restrict result lists according to fields that are included in a document. Since facets always refer to the data generators selected in the tab General, the tab Facets is initially empty.

Figure 5. Facets

New facet

A separate view is available for creating a new facet. It can be opened by clicking the button New Facet. The button is only active if at least one data generator is defined in the General tab.

Within the tab, a field must first be selected in the dropdown of the same name. The available fields are taken from the selected data generators. With the selection of the field a list of the values belonging to it appears, for which the following configuration options are available:

  • Sort by number of matches | alphabetical: This option defines whether the values are displayed in the facet list in descending order by their number of matches or alphabetically by their name.

  • Display values on number of matches greater than: This option can be used to exclude values from the facet list whose number of matches is less than the specified threshold.

  • Multiple selection possible: This option allows filtering the search results list according to different aspects. For example, the search for a specific object can subsequently be restricted to a specific size as well as to a specific color. This makes the search more specific and minimizes the search results list.

  • Exclude own filter: In order to provide different filter options for the search, the selection of a filter may only refer to the search result list, but not to the filter options. Otherwise, the other options would also be hidden by the search.

    For example, if the filter options German, English and French exist for the facet Language, the search will only return English documents when the option English is selected. If the filter English is not excluded in this case, the list of available filters will also only show English. In this case it is no longer possible to switch to another filter or to make a multiple selection in connection with the previous configuration option.
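The effect of Exclude own filter can be reproduced with a small Python sketch based on the Language example. The functions and the in-memory documents are hypothetical and only model the behavior described above; they say nothing about how the SmartSearch or Solr implements it.

```python
from collections import Counter


def matching_documents(documents, filters):
    """Documents that satisfy every active facet filter (field -> allowed values)."""
    return [
        doc for doc in documents
        if all(doc.get(field) in allowed for field, allowed in filters.items())
    ]


def facet_options(documents, field, filters, exclude_own_filter):
    """Count the values offered as filter options for one facet."""
    if exclude_own_filter:
        # Ignore the facet's own selection when computing its options
        filters = {name: allowed for name, allowed in filters.items() if name != field}
    hits = matching_documents(documents, filters)
    return dict(Counter(doc[field] for doc in hits if field in doc))


documents = [
    {"language": "German"},
    {"language": "English"},
    {"language": "English"},
    {"language": "French"},
]
selected = {"language": {"English"}}

print(len(matching_documents(documents, selected)))           # 2
print(facet_options(documents, "language", selected, False))  # {'English': 2}
print(facet_options(documents, "language", selected, True))   # {'German': 1, 'English': 2, 'French': 1}
```

In both cases the result list contains only the English documents. Only with the own filter excluded do German and French remain available as options, so that switching or multiple selection stays possible.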

Preview

The Preview tab provides the option to test a Prepared Search configuration on a preview page. For this purpose, it is not necessary to save the Prepared Search. The settings from the other tabs are directly applied to the search queries. On the preview page there is an input field for search terms. The entered term is then looked up in the current Prepared Search and the results are displayed. Next to each search result there is a button marked with an arrow pointing upwards. Clicking this button displays all of the fields that are selected in this Prepared Search below the result.

In order to transfer a facet to the preview, it is sufficient to confirm the configuration by clicking OK. After switching back to the Preview tab, the last search term is used again automatically and new filter options appear in the column next to the search results. These receive their names and values from the previously created facet. The preview feature can also be used to test weightings or Groovy scripts.

Previously created Adaptable Results will appear in the specified order in the preview.