Please note that "haupia" was renamed to "SmartSearch" with version 2.4.0. Older scripts and examples may still use the previous name; wherever the two differ, "SmartSearch" is the correct and current term. We apologize for any inconvenience.

1. Introduction

In the SmartSearch context, the search index is filled by so-called datagenerators. In addition to the provided ones, you can add your own datagenerators. These are called external datagenerators and can be implemented on the basis of the external datagenerator REST API.

Two requirements have to be fulfilled before the external datagenerator REST API can be used. First, an external datagenerator has to be created in the SmartSearch cockpit; it is later identified by its name. Second, a technical user with the right to execute a datageneration has to exist.

All calls to the REST API have to authenticate by using an Authorization header (see section 4.2 of RFC 7235) of type "Basic". If the header is missing or the credentials are not valid, the REST services return the HTTP status code 401 (Unauthorized).

The REST service works exclusively with JSON data, except for the call to the datagenerator status. For all other calls, the Accept header is expected to be set to "application/json". Calls that send data in JSON format also require a Content-Type header with the value "application/json".

Using the API is similar to using a transaction. First, a begin has to be issued. Then documents are added to the datagenerator. Once all documents have been submitted, the session is concluded by calling commit. The session can be aborted at any time. After the session is over, either by commit or by abort, the resources held on the server are freed.
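The header requirements described above can be sketched in Python as follows. This is a minimal illustration, not part of the product: the helper names `basic_auth_header` and `json_headers` are hypothetical, and the credentials `user` / `password` are simply the ones encoded in the example calls of this document.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def json_headers(user: str, password: str, has_body: bool = True) -> dict:
    """Build the headers expected by the JSON-based services."""
    headers = {
        "Accept": "application/json",
        "Authorization": basic_auth_header(user, password),
    }
    if has_body:
        # Content-Type is only required when JSON data is sent.
        headers["Content-Type"] = "application/json"
    return headers


if __name__ == "__main__":
    print(json_headers("user", "password"))
```

For the status call, the Accept header would be "text/plain" instead, as described in the section on the datagenerator status.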

Before the data is synced, the configured enhancers are applied. These are configured in the SmartSearch cockpit.

Most services allow an additional message to be sent. This message is logged at INFO level and is also sent to the cockpit, where it is shown on the datagenerator list page.

The SmartSearch REST API for external datagenerators has been available since SmartSearch version 2.0.0.59.

The SmartSearch REST API for external datagenerators supersedes the SmartSearch 1 Java frontend API (external datagenerator part).

2. Begin a datageneration

Every datageneration is based on a storage. The storage persists the documents during the datageneration until they are synced with the index. When beginning a datageneration, you can choose between a fresh storage and a storage pre-filled with the documents from the datagenerator's last datageneration. To begin a datageneration, use the following REST call:

  • Method: POST

  • URL: /rest/api/v1/datagenerator/external/{name}/begin

Table 1. /rest/api/v1/datagenerator/external/{name}/begin

Parameter  Description
name       The configured datagenerator name

The body of the REST call may be a JSON object containing the desired storage creation type. The body is optional; if no body is sent, a fresh storage is applied. A full example of a call to the REST service:

Example call
POST /rest/api/v1/datagenerator/external/eocpyfmhzr/begin HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 64

{
  "message" : "Started DG",
  "creationType" : "COPY_LATEST"
}

The JSON object of the request body is as follows:

Table 2. JSON object

Field         Type    Description
creationType  String  OPTIONAL The storage creation type. May be one of: NEW, COPY_LATEST
message       String  OPTIONAL Message used for logging and sent to the cockpit.

If the JSON object does not contain the storage creation type, the default one is applied.
The response has no content.
Only one datageneration may be active at a time. If a datageneration is started while another one is running, the response code is 412 (PRECONDITION FAILED).

The following table contains the main HTTP response codes:

Table 3. Response codes

Code  Status                  Description
204   NO CONTENT              The datageneration was successfully started.
400   BAD REQUEST             The given JSON data was either an array or not a valid JSON string.
404   NOT FOUND               A datagenerator with the given name does not exist.
412   PRECONDITION FAILED     There is already a datageneration running.
415   UNSUPPORTED MEDIA TYPE  Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'.
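As an illustration of the begin call, the following Python sketch builds the request without sending it. The function name `build_begin_request` and the `BASE_URL` constant are assumptions made for this example; host and port are taken from the example call above and will differ per installation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host and port as used in the example call of this document.
BASE_URL = "http://localhost:8181"
VALID_CREATION_TYPES = {"NEW", "COPY_LATEST"}


def build_begin_request(name, auth_header, creation_type=None, message=None):
    """Build (but do not send) the POST request that begins a datageneration."""
    if creation_type is not None and creation_type not in VALID_CREATION_TYPES:
        raise ValueError(f"unknown creationType: {creation_type}")
    body = {}
    if creation_type is not None:
        body["creationType"] = creation_type
    if message is not None:
        body["message"] = message

    headers = {"Accept": "application/json", "Authorization": auth_header}
    data = None
    if body:
        # The body is optional; without it a fresh storage is applied.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"

    quoted = urllib.parse.quote(name, safe="")
    url = f"{BASE_URL}/rest/api/v1/datagenerator/external/{quoted}/begin"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

The request could then be sent with `urllib.request.urlopen(request)`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so a 412 (another datageneration is running) would surface as an exception whose `code` attribute is 412.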

3. Add a document

Once a datageneration has been started, one or more documents need to be added. A document is represented as a JSON object with the key "data" and the document data as its value. The data itself is a JSON object. Each entry of this object is mapped to the document by using the key as the field name and the value as the field value. The value may be represented as a simple string, or as an array if there is more than one value. It is also possible to represent a single value as an array containing one element, which is equivalent to the simple string. These possibilities are all illustrated in this example:

Example
{
   "uid": "abc123",
   "data": {
      "key1": [
         "val1",
         "val2",
         "val3"
      ],
      "key2": "val4",
      "key3": [ "val5" ]
   }
}

The uid of the document is a required field. The uid is expected to be truly unique across the datageneration. If a document with the same uid already exists, the "old" document is replaced.
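The document structure described above can be assembled with a small helper. The following Python sketch is only an illustration: `build_document` is a hypothetical name, and converting non-string values with `str()` is an assumption of this example, since the text only describes strings and arrays of strings.

```python
def build_document(uid, fields, message=None):
    """Build the JSON-serializable document for the add call."""
    if not uid:
        raise ValueError("uid is required")
    data = {}
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            # Multiple values are sent as an array.
            data[key] = [str(v) for v in value]
        else:
            # A single value may be sent as a simple string.
            data[key] = str(value)
    document = {"uid": uid, "data": data}
    if message is not None:
        document["message"] = message
    return document


if __name__ == "__main__":
    print(build_document("abc123", {"key2": "val4", "key3": ["val5"]}))
```

Because a document with an already used uid replaces the earlier one, callers may want to check their uids for duplicates before submitting.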

The REST service is defined as follows:

  • Method: PUT

  • URL: /rest/api/v1/datagenerator/external/{name}

Table 4. /rest/api/v1/datagenerator/external/{name}

Parameter  Description
name       The configured datagenerator name

The body of the REST call is required and contains the document to add.

Example
PUT /rest/api/v1/datagenerator/external/ciozgcbkvh HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 101

{
  "message" : "",
  "data" : {
    "toldt" : [ "rvhhf", "wdzun", "perda" ]
  },
  "uid" : "mdzaj"
}

In case of success a status code 204 (NO CONTENT) is returned.

The following table contains the main HTTP response codes:

Table 5. Response codes

Code  Status                  Description
204   NO CONTENT              The document was successfully added.
400   BAD REQUEST             The given JSON data was either an array or not a valid JSON string. This status code is also returned if the uid is missing in the document.
404   NOT FOUND               The datagenerator name does not exist.
415   UNSUPPORTED MEDIA TYPE  Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'.
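A request for adding a document might be built as in the following Python sketch. The names `build_add_request` and `BASE_URL` are assumptions for illustration; host and port come from the example call, and the client-side uid check merely anticipates the 400 response described in the table above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host and port as used in the example call of this document.
BASE_URL = "http://localhost:8181"


def build_add_request(name, auth_header, document):
    """Build (but do not send) the PUT request that adds one document."""
    if not isinstance(document, dict) or not document.get("uid"):
        # The server would answer such a request with 400 (BAD REQUEST).
        raise ValueError("document must be a JSON object with a uid")
    quoted = urllib.parse.quote(name, safe="")
    url = f"{BASE_URL}/rest/api/v1/datagenerator/external/{quoted}"
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": auth_header,
    }
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

A successful call returns 204 (NO CONTENT), so there is no response body to parse.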

4. Commit datageneration

As soon as all documents have been added, the datageneration is committed. During this stage, the enhancers iterate over all documents, the storage is synced, and the resources are cleaned up. The REST call itself returns immediately, so its response neither reflects problems that occur during the sync nor indicates when the process has finished.

  • Method: POST

  • URL: /rest/api/v1/datagenerator/external/{name}/commit

Table 6. /rest/api/v1/datagenerator/external/{name}/commit

Parameter  Description
name       The configured datagenerator name

The body of the REST call is optional and may contain the message which should be submitted with the commit.

Example
POST /rest/api/v1/datagenerator/external/vlrqjvjhhn/commit HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 29

{
  "message" : "Commit DG"
}

In case of success a status code 204 (NO CONTENT) is returned.

The following table contains the main HTTP response codes:

Table 7. Response codes

Code  Status                  Description
204   NO CONTENT              The commit was successfully started.
404   NOT FOUND               The datagenerator name does not exist.
415   UNSUPPORTED MEDIA TYPE  Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'.

5. Abort datageneration

It is possible to abort a running datageneration at any time. The resources reserved on the server side are then freed. The REST call itself returns immediately, so its response neither reflects problems that occur afterwards nor indicates when the process has finished.

  • Method: POST

  • URL: /rest/api/v1/datagenerator/external/{name}/abort

Table 8. /rest/api/v1/datagenerator/external/{name}/abort

Parameter  Description
name       The configured datagenerator name

The body of the REST call is optional and may contain the message which should be submitted with the abort.

Example
POST /rest/api/v1/datagenerator/external/ysxiplvrtd/abort HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 28

{
  "message" : "Abort DG"
}

In case of success a status code 204 (NO CONTENT) is returned.

The following table contains the main HTTP response codes:

Table 9. Response codes

Code  Status                  Description
204   NO CONTENT              The abort was successfully started.
404   NOT FOUND               The datagenerator name does not exist.
415   UNSUPPORTED MEDIA TYPE  Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'.
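The transaction-like flow of begin, add, commit, and abort can be sketched as follows. This Python example is a rough outline under stated assumptions: `run_datageneration` is a hypothetical function, and `send` is a hypothetical transport hook that performs one HTTP call and returns its status code, so that the flow itself stays independent of any HTTP library.

```python
from typing import Callable, Iterable, Optional

# Hypothetical transport hook: send(method, path, body) -> HTTP status code.
Send = Callable[[str, str, Optional[dict]], int]

PREFIX = "/rest/api/v1/datagenerator/external"


def run_datageneration(send: Send, name: str, documents: Iterable[dict]) -> None:
    """Begin a datageneration, add all documents, then commit.

    If adding a document fails, the datageneration is aborted so that
    the server frees the resources held by the session.
    """
    base = f"{PREFIX}/{name}"
    status = send("POST", f"{base}/begin", {"creationType": "NEW"})
    if status != 204:
        raise RuntimeError(f"begin failed with HTTP {status}")
    try:
        for document in documents:
            status = send("PUT", base, document)
            if status != 204:
                uid = document.get("uid")
                raise RuntimeError(f"adding {uid!r} failed with HTTP {status}")
    except Exception:
        send("POST", f"{base}/abort", {"message": "Aborted after error"})
        raise
    status = send("POST", f"{base}/commit", {"message": "Commit DG"})
    if status != 204:
        raise RuntimeError(f"commit failed with HTTP {status}")
```

Since commit and abort return immediately, a 204 from either call only confirms that the operation was started, not that it completed without problems.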

6. Get datagenerator status

The datagenerator has a specific state on the server. The states are shown, for example, on the datagenerator list page. An external datagenerator may only be started if its current state is IDLE or ERROR; otherwise the call to begin returns 412 (PRECONDITION FAILED). To get the current state of a datagenerator, use this REST service:

  • Method: GET

  • URL: /rest/api/v1/datagenerator/external/{name}/status

Table 10. /rest/api/v1/datagenerator/external/{name}/status

Parameter  Description
name       The configured datagenerator name

The response is a string representing the status of the datagenerator. The result content type is text/plain and should be requested accordingly:

Example
GET /rest/api/v1/datagenerator/external/xigtbewetb/status HTTP/1.1
Accept: text/plain
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181

An example response looks like this: CRAWLING

In case of success a status code 200 (OK) is returned.

The following table contains the main HTTP response codes:

Table 11. Response codes

Code  Status                  Description
200   OK                      The request was successful.
404   NOT FOUND               The datagenerator name does not exist.
415   UNSUPPORTED MEDIA TYPE  Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'.
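Because begin is only accepted in the states IDLE and ERROR, a client may want to check the status first. The following Python sketch shows one possible polling approach; `can_begin`, `wait_until_startable`, and the `get_status` callable (which would perform the GET call and return the plain-text body) are hypothetical names for this example.

```python
import time
from typing import Callable

# States in which a begin call is accepted, according to the text above.
STARTABLE_STATES = {"IDLE", "ERROR"}


def can_begin(status_text: str) -> bool:
    """Return True if a datageneration may be started in this state."""
    return status_text.strip() in STARTABLE_STATES


def wait_until_startable(
    get_status: Callable[[], str],
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> bool:
    """Poll the status until begin is allowed or the attempts are used up."""
    for attempt in range(attempts):
        if can_begin(get_status()):
            return True
        if attempt < attempts - 1:
            # Wait before asking the server again.
            time.sleep(delay_seconds)
    return False
```

Even after a positive check, begin can still return 412 if another client starts a datageneration in the meantime, so the response code of begin remains the authoritative signal.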

SmartSearch is a product of Crownpeak Technology GmbH, Dortmund, Germany.
Only a license agreed upon with Crownpeak Technology GmbH is valid with respect to the user for using the module.

8. Help

The Technical Support of the Crownpeak Technology GmbH provides expert technical support covering any topic related to the FirstSpirit™ product. You can get and find more help concerning relevant topics in our community.

9. Disclaimer

This document is provided for information purposes only. Crownpeak Technology GmbH may change the contents hereof without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. Crownpeak Technology GmbH specifically disclaims any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. The technologies, functionality, services, and processes described herein are subject to change without notice.