Please note that "haupia" has been renamed to "SmartSearch" with version 2.4.0. In all instances where there may be discrepancies or confusion, "SmartSearch" should be considered the correct and updated term. This might not be reflected in old scripts and examples, so we apologize for any inconvenience and hope that the instances requiring correction will be minimal.

1. Introduction

Searching a website is especially important as more and more information is available online these days and websites become larger and more complex. An effective search function allows users to quickly and easily find the information they need without having to click through countless pages and links. It is also an efficient way to save time and improve the user experience, as it allows users to go directly to the relevant content without having to deal with unnecessary information. In addition, a good search function can also help a website be perceived as user-friendly and professional, which in turn increases user confidence and can help them stay longer on the website and come back more often. Overall, effective search on a website is an important factor in a positive user experience and the success of a website.

The search function on the website can also be used as a marketing tool to provide visitors with personalized offers and recommendations. For example, if a visitor is searching for a specific product, the company can recommend similar or complementary products or make special offers to increase their willingness to buy. In addition, certain hits can be better placed in search results to draw the attention of website visitors to specific content.

Search gives the company valuable insights into the needs and interests of visitors. By analyzing search terms and results, companies can identify trends and patterns and adapt their website content accordingly to better meet the needs of the target group. In this way, the website can also serve as a market research tool.

Overall, a well-designed search function on the website can not only help improve user experience and increase conversion rates, but also provide valuable insights into the needs of the target audience and be used as a marketing tool for customer retention and acquisition.

The SmartSearch bundles requirements that customers place on the search function of an online presence: An intuitive, high-performance search solution that can be used on extensive websites and delivers relevant results. It offers both high result quality and optimum search convenience, thus retaining customers on the website.

At the same time, it provides editors with a web interface through the integrated SmartSearch cockpit that can be used without IT knowledge. Editors from specialist and marketing departments are thus enabled to control and monitor search results on the web interface. For this purpose, the cockpit provides statistics, filters and analysis functions and allows the indexing of various data types (for example XML, audio, video, media) from different data sources. With the help of individualized result lists, editors can prioritize and weight search results in the back end and display selected content for predefined search queries.

1.1. Concept

The SmartSearch functionality, such as the settings for user and group permissions, can be managed in the browser-based SmartSearch cockpit without any prior technical knowledge.

1.1.1. Data Generator

In the SmartSearch cockpit, you can create data generators to capture searchable data.

There are three types of data generators:

  • Web: The web crawler enhances the searchability of an existing web site.

  • XML: The XML file crawler speeds up the processing of data when it exists as XML files.

  • API: Assists in passing data from a self-developed application or the FirstSpirit SmartSearch Connect module to an API.

1.1.2. Prepared Search

Using a Prepared Search, you can configure the search result, for example by bundling content from multiple data generators into one Prepared Search.

1.1.3. Adaptable Result

In an Adaptable Result, you can actively modify the search results, for example to display certain search results higher in the list, or to exclude search results from the list.
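The effect of an Adaptable Result can be pictured with a small sketch. It is only a conceptual illustration, not the SmartSearch implementation: the function name `apply_adaptable_result`, the `pinned` and `excluded` parameters and the dictionary layout of a hit are assumptions made for this example.

```python
def apply_adaptable_result(hits, pinned=(), excluded=()):
    """Return hits with pinned links first and excluded links removed.

    Hypothetical helper: `hits` is a list of dicts with a "link" key,
    already sorted by relevance.
    """
    excluded = set(excluded)
    by_link = {hit["link"]: hit for hit in hits if hit["link"] not in excluded}
    # Pinned entries come first, in the order the editor specified.
    top = [by_link[link] for link in pinned if link in by_link]
    pinned_set = set(pinned)
    # All remaining hits keep their original relevance order.
    rest = [hit for hit in hits
            if hit["link"] in by_link and hit["link"] not in pinned_set]
    return top + rest


hits = [
    {"link": "/a", "title": "A"},
    {"link": "/b", "title": "B"},
    {"link": "/c", "title": "C"},
]
result = apply_adaptable_result(hits, pinned=["/c"], excluded=["/a"])
print([hit["link"] for hit in result])  # ['/c', '/b']
```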

1.1.4. Synonyms and Stopwords

You can define a synonym to make a term searchable that does not appear in the website text but is related in meaning to a term that does. Synonyms can be derived from common search queries, for example.

You can use stopwords to ignore certain terms in the search query. A list of typical stopwords for each language is provided and can be extended.
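As a rough illustration of how stopwords and synonyms act on a search query, consider the following sketch. It is a simplified model under stated assumptions, not how SmartSearch or Solr process queries internally: the function `prepare_query` and the example word lists are hypothetical.

```python
def prepare_query(query, stopwords, synonyms):
    """Drop stopwords, then expand each remaining term with its synonyms."""
    terms = [t for t in query.lower().split() if t not in stopwords]
    expanded = []
    for term in terms:
        # Each term becomes a group of alternatives: the term plus its synonyms.
        expanded.append([term] + sorted(synonyms.get(term, [])))
    return expanded


stopwords = {"the", "for", "a"}          # example list, assumed
synonyms = {"notebook": {"laptop"}}      # example mapping, assumed
print(prepare_query("The notebook for students", stopwords, synonyms))
# [['notebook', 'laptop'], ['students']]
```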

1.2. Architecture

The functionalities of the SmartSearch are realized by an architecture made up of a range of different components (see figure Architecture).

These components are:

  • ZooKeeper

  • Solr

  • SmartSearch

Figure 1. Architecture

The individual components always interact according to the following schema:

  • Prior to creating the search index, the SmartSearch must collect the required data. For this, it accesses the information to be collected on the customer side, which can exist in the form of websites, portals or databases. In addition, a REST interface provides the option of filling the search index with further data from outside.

  • After that, the SmartSearch server normalizes the data and transfers it to the Solr server. The server receives the data and persists it in an index.

  • Querying the data works analogously: The SmartSearch server receives the request, modifies it, and then forwards it to the Solr server. The Solr server responds with a search result, which the SmartSearch server returns to the customer’s end application via the REST interface.

  • The SmartSearch cockpit is separate from the other components. It is used to administer the SmartSearch server and offers a simple, web-based administration interface for this purpose. Among other things, search solutions can be created and configured in this interface.

  • Configurations made in the SmartSearch cockpit are saved on the ZooKeeper server together with the Solr configuration data.

Communication with the outside is protected by HTTPS; communication between the components takes place via HTTP.
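The indexing and query flow described in this section can be sketched with in-memory stand-ins. This is a toy model for orientation only: the class `InMemoryIndex` merely plays the role of the Solr index, and the functions `normalize`, `index_documents` and `handle_query` are hypothetical names, not part of the SmartSearch or Solr APIs.

```python
class InMemoryIndex:
    """Stand-in for the Solr index: persists normalized documents in a list."""

    def __init__(self):
        self.documents = []

    def add(self, document):
        self.documents.append(document)

    def search(self, term):
        return [d for d in self.documents if term in d["content"]]


def normalize(raw):
    """Bring collected data into a uniform shape before indexing."""
    return {
        "title": raw["title"].strip(),
        "link": raw["link"],
        "content": " ".join(raw["content"].lower().split()),
    }


def index_documents(raw_documents, index):
    """Collect -> normalize -> transfer to the index."""
    for raw in raw_documents:
        index.add(normalize(raw))


def handle_query(query, index):
    """Receive the request, modify it, forward it, and return the result."""
    term = query.strip().lower()
    return index.search(term)


index = InMemoryIndex()
index_documents(
    [{"title": " Tariffs ", "link": "/tariffs", "content": "Green   Electricity tariffs"}],
    index,
)
print([d["link"] for d in handle_query(" Electricity ", index)])  # ['/tariffs']
```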

1.3. Technical Requirements

To deploy SmartSearch, the following technical requirements must be met:

  • Java 11 as the Java Development Kit (JDK) for running ZooKeeper and Solr

  • ZooKeeper version 3.4.10

  • Solr version 8.11.2 in cloud mode

  • SmartSearch in the latest version, specifically requiring Java 21 for execution

ZooKeeper and Solr are not included in the delivery. They must therefore be downloaded before installation in the version specified.

Although Java 11 is the default requirement for ZooKeeper and Solr, SmartSearch itself runs on Java 21. Both Java 11 and Java 21 must therefore be present on the system, but only SmartSearch requires Java 21.

To run SmartSearch with Java 21, execute it using the Java 21 executable. Ensure that Java 11 remains the system’s default JDK for all other operations. You can start the SmartSearch server by specifying the path to the Java 21 executable as follows:

/path/to/java21/bin/java -jar Server.jar -server -Dhaupia.master.profile=STANDALONE -Dfile.encoding=UTF-8

This approach allows you to use Java 21 for running the SmartSearch server without altering the default Java environment configured for Java 11, ensuring compatibility with other dependencies.
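If the server is started from a script, the same call can be assembled programmatically. The following sketch only rebuilds the command shown above; the helper name `build_server_command` and the directory `/path/to/java21` are placeholders, and the default Java installation on the system is left untouched.

```python
from pathlib import Path


def build_server_command(java21_home, jar="Server.jar"):
    """Build the start command using an explicit Java 21 executable."""
    java = Path(java21_home) / "bin" / "java"
    return [
        str(java),
        "-jar", jar,
        "-server",
        "-Dhaupia.master.profile=STANDALONE",
        "-Dfile.encoding=UTF-8",
    ]


command = build_server_command("/path/to/java21")
print(" ".join(command))
# To actually start the server, pass the list to subprocess.run(command, check=True).
```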

2. SmartSearch cockpit

The SmartSearch cockpit is a component of the SmartSearch. It enables the backend-side administration of the data collected by the SmartSearch and offers a simple, web-based interface for this purpose. The interface is divided into the areas Configuration, Analysis, Data and System, which can be reached via the menu. The button with the globe icon also provides a language switcher for German and English.

By default, the SmartSearch cockpit is accessible via the following URL:

The first login to the cockpit must be made as the master admin. This user is created automatically with the data from the application.yml at the initial start of the SmartSearch server.

If the user and group management is implemented via an LDAP server, the credentials may differ.

After a valid login, the user is automatically redirected to the dashboard of the cockpit. Re-authentication is only required after an explicit logout or after the session has expired.

Figure 2. SmartSearch dashboard

2.1. Configuration

The Configuration area is divided into the submenus Prepared Search, Stopwords and Synonyms. These allow the configuration of the output of the data collected by the SmartSearch.

The following subsections describe the submenus and the functions provided by them.

2.1.1. Prepared Search

The customer-side gathering of the required data is done by the so-called data generators, which are a part of the Data area. For their management, the SmartSearch provides the Prepared Searches. These allow optimizing the search results by prioritizing individual data.

The creation and administration of the Prepared Searches is done in the interface of the same name, which can be called via the menu entry Configuration → Prepared Search.

The area shows a list of all already existing Prepared Searches and is initially empty.

In cloud mode, the list also displays the accessibility of each Prepared Search.

Figure 3. Prepared Searches

New Prepared Search

For the creation of a new Prepared Search there is a separate view, which can be called by clicking on the button New Prepared Search and is divided into the three tabs General, Facets and Preview.

Figure 4. Creating a Prepared Search

General

The first thing to do within the tab General is to specify a name for the new Prepared Search. In cloud mode, the additional checkbox publicly accessible is located next to the input field for the name. It defines the accessibility of a Prepared Search: activating the checkbox enables the Prepared Search to be queried via the internet (API gateway). Otherwise, the Prepared Search is only accessible within the cockpit.

Next, any number of data generators can be selected in the selection list of the same name, which then shows their available fields. The initially activated checkbox Verbose shows or hides all technical fields. The button provided together with the checkbox allows an emphasis to be set for the selected fields.

The list of field names per data generator is cached. When creating a new data generator and running it for the first time, it may take several minutes for the field names to appear in the list.

The selected fields can be transferred via button to the list of fields relevant for a search, which by default contains the fields content, link and title. A previously defined emphasis is automatically assigned to each of these fields.

The list provides the following configuration options per field:

  • Highlight: By activating this checkbox, a search word is highlighted within a text segment in the search result. The length of the text section is freely configurable (see below).

  • Output: This option defines whether the field is visible in the search result. For example, in the case of the link that only refers to the entire document, this may not be desired.

  • Search: In order for the associated field to be taken into account by the search, this checkbox must be activated.

    Deactivating the Search option hides the subsequent Partial match and Emphasis options for the corresponding field.

  • Partial match: This option enables partial matches to be taken into account. If the checkbox is selected, the search for electricity, for example, also finds the match eco-electricity provider. The search word must be between three and twenty characters long.

  • Emphasis: The emphasis offers the possibility to set a prioritization for matches of the selected fields and thus to influence the search result.

The button with the trash can icon available for each field allows deleting the corresponding field from the list.
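The Partial match option can be illustrated with a simple substring check. This is a conceptual sketch only: the search engine may implement partial matching differently, and the function name `partial_match` is hypothetical. The text above does not state what happens to search words outside the length range; the sketch assumes that they simply produce no partial match.

```python
MIN_LENGTH = 3   # minimum search word length named in the text
MAX_LENGTH = 20  # maximum search word length named in the text


def partial_match(search_word, field_value):
    """Return True if the search word occurs anywhere inside the field value."""
    if not MIN_LENGTH <= len(search_word) <= MAX_LENGTH:
        # Assumption: words outside the length range yield no partial match.
        return False
    return search_word.lower() in field_value.lower()


print(partial_match("electricity", "eco-electricity provider"))  # True
print(partial_match("el", "eco-electricity provider"))           # False
```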

The tab additionally contains the following general configuration options:

  • Hits per page: This setting specifies the maximum number of hits displayed on a search results page. In combination with the URL parameter page, the search results can also be split across multiple pages.

  • Length of highlight (in characters): This setting defines the length of the text segment in which a search term is visually highlighted, as mentioned above.

  • Sort by: By default, the search results are sorted in descending order of their score. If a different sorting is desired, it can be configured in this text input field. To do so, specify any field followed by the expression ASC for ascending or DESC for descending sorting.

  • Spellcheck (hits less than): If the number of search results is smaller than the value configured here, a spelling check is performed and the search is repeated for search words of similar spelling.

  • Must Match: For searching multiple terms, the entry of this text input field determines how they are to be linked:

    • The value 0 corresponds to an OR operation between the search terms used.

    • The value 100% corresponds to an AND operation between the search terms used.

    • An absolute value defines the number of terms that must be contained within a search hit. For example, the value 2 for five given search terms means that two of the five terms must be contained within a search result. Furthermore, the values 2 and 50% are equivalent to each other for four search terms.

  • Groovy Script: In addition, Prepared Searches allow the inclusion of a self-implemented Groovy script. Such a script enables additional modifications of the dataset. For example, additional documents can be added to the dataset or the existing dataset can be edited.
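The arithmetic behind the Must Match values listed above can be sketched as follows. The helper names `required_matches` and `is_hit` are hypothetical. Rounding percentages down is an assumption taken from the documented behavior of Solr's minimum-should-match parameter; whether SmartSearch passes the value through unchanged is not stated here.

```python
def required_matches(must_match, term_count):
    """Translate a Must Match value into a number of required terms."""
    value = str(must_match).strip()
    if value.endswith("%"):
        # Assumption: percentages are rounded down, as Solr does for mm.
        required = int(value[:-1]) * term_count // 100
    else:
        required = int(value)
    # Never require fewer than zero or more than all terms.
    return max(0, min(required, term_count))


def is_hit(document_terms, search_terms, must_match):
    """Check whether a document contains enough of the search terms."""
    matched = sum(1 for term in search_terms if term in document_terms)
    # A hit always needs at least one matching term, so 0 behaves like OR.
    return matched >= max(1, required_matches(must_match, len(search_terms)))


print(required_matches(2, 5))       # 2 of five terms
print(required_matches("50%", 4))   # 2, equivalent to the value 2 for four terms
print(required_matches("100%", 3))  # 3, all terms (AND)
print(is_hit({"green"}, ["green", "electricity"], 0))       # True (OR)
print(is_hit({"green"}, ["green", "electricity"], "100%"))  # False (AND)
```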

Facets

Facets provide the possibility to restrict result lists according to fields that are included in a document. Since facets always refer to the data generators selected in the tab General, the tab Facets is initially empty.

Figure 5. Facets

New facet

For the creation of a new facet there is a separate view, which can be called by clicking on the button New Facet. This button is only active if at least one data generator is defined in the General tab.

Within the tab, a field must first be selected in the dropdown of the same name. The available fields are taken from the selected data generators. With the selection of a field, a list of the values associated with this field appears, for which the following configuration options are available:

  • Filter: This input field can be used to search for facet values.

  • Show weighted values only: By selecting this checkbox, only facet values that have been weighted will be displayed.

  • Weight: By clicking on the Weight field, a weighting can be assigned to a facet value. This number represents a multiplier of the score and can be between 0.00 and 9.99. A value between 0.00 and 2.00 is recommended. Results that belong to this facet value receive a weighting and are ranked accordingly higher or lower in the search results.

  • Display values on number of matches greater than: This option can be used to exclude values from the facet list whose number of matches is less than the specified threshold.

  • Multiple selection possible: This option allows filtering the search results list according to different aspects. For example, the search for a specific object can subsequently be restricted to a specific size as well as to a specific color. This makes the search more specific and minimizes the search results list.

  • Exclude own filter: In order to provide different filter options for the search, the selection of a filter may only refer to the search result list, but not to the filter options. Otherwise, the other options would also be hidden by the search.

    For example, if the filter options German, English and French exist for the facet Language, the search will only return English documents when the option English is selected. If the filter English is not excluded in this case, the list of available filters will also only show English. In this case it is no longer possible to switch to another filter or to make a multiple selection in connection with the previous configuration option.
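The Language example for Exclude own filter can be reproduced with a small counting sketch. It models the behavior described above and is not the SmartSearch implementation: the function `facet_counts`, its parameters and the sample documents are assumptions made for this illustration.

```python
from collections import Counter


def facet_counts(documents, facet, selected, exclude_own_filter):
    """Count the values of `facet` among documents matching the active filters.

    `selected` maps a facet field to the set of values chosen by the user.
    """
    def matches(doc):
        for field, values in selected.items():
            if exclude_own_filter and field == facet:
                # Ignore the facet's own selection when counting its options.
                continue
            if values and doc.get(field) not in values:
                return False
        return True

    return Counter(doc[facet] for doc in documents if facet in doc and matches(doc))


documents = [
    {"language": "German"},
    {"language": "English"},
    {"language": "English"},
    {"language": "French"},
]
selected = {"language": {"English"}}

# With the own filter excluded, all three options remain selectable.
print(dict(facet_counts(documents, "language", selected, exclude_own_filter=True)))
# {'German': 1, 'English': 2, 'French': 1}

# Without the exclusion, only the selected option is left in the filter list.
print(dict(facet_counts(documents, "language", selected, exclude_own_filter=False)))
# {'English': 2}
```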

Preview

The Preview tab provides the option to test a Prepared Search configuration on a preview page. For this purpose it is not necessary to save the Prepared Search. The settings from the other tabs are directly applied to the search queries. On the preview page there is an input field for search terms. The entered term is then looked up in the current Prepared Search and the results are displayed. Next to each search result there is a button marked with an arrow pointing upwards. Clicking this button displays all of the fields that are selected in this Prepared Search below the result.

In order to transfer a facet to the preview, it is sufficient to confirm the configuration by clicking OK. After switching back to the Preview tab, the last search term is used again automatically and new filter options appear in the column next to the search results. These receive their names and values from the previously created facet. The preview feature can also be used to test weightings or Groovy scripts.

Previously created Adaptable Results will appear in the specified order in the preview.