haupia

REST API for external datagenerators

e-Spirit AG

06.05.2021

1. Introduction

In the haupia context, the search index is filled by so-called datagenerators. It is possible to add custom datagenerators in addition to the provided ones. These are called external datagenerators and may be implemented based on the external datagenerator REST API.

Two requirements have to be fulfilled before the external datagenerator REST API can be used. First, an external datagenerator has to be created in the haupia cockpit; it is later identified by its name. Second, a technical user with the right to execute a datageneration has to exist.

All calls to the REST API have to authenticate with an Authorization header (see section 4.2 of RFC 7235) of type “Basic”. If the header is missing or the credentials are not valid, the REST services return the HTTP status code 401 (Unauthorized).

The REST services work exclusively with JSON data; the only exception is the call to the datagenerator status. Apart from that call, the Accept header is expected to be set to "application/json". Calls that send data in JSON format additionally require a Content-Type header with the value "application/json".

Using the API is similar to using a transaction. First, a begin has to be issued. Then documents are added to the datagenerator. Once all documents have been submitted, the session is concluded by calling commit. The session can be aborted at any time. After the session is over, either by commit or by abort, the resources held on the server are freed.
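The authentication and header requirements described above can be sketched in a few lines of Python. This is only an illustration: the helper names `basic_auth_header` and `json_headers` are hypothetical, and the credentials `user` / `password` are placeholders that happen to produce the header value used in the example calls of this document.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def json_headers(user: str, password: str, has_body: bool = True) -> dict:
    """Headers for the JSON-based calls (hypothetical helper).

    Content-Type is only added when a JSON body is actually sent.
    """
    headers = {
        "Accept": "application/json",
        "Authorization": basic_auth_header(user, password),
    }
    if has_body:
        headers["Content-Type"] = "application/json"
    return headers


print(basic_auth_header("user", "password"))  # Basic dXNlcjpwYXNzd29yZA==
```

In a real client the credentials of the technical user would be read from configuration rather than hard-coded.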

Before the data is synced the configured enhancers are applied. These are configured in the haupia cockpit.

Most services allow an additional message to be sent. This message is logged at INFO level and also sent to the cockpit, where it is shown on the datagenerator list page.

The haupia REST API for external datagenerators has been available since haupia version 2.0.0.59.

The haupia REST API for external datagenerators supersedes the haupia 1 Java frontend API (external datagenerator part).

2. Begin a datageneration

Every datageneration is based on a storage. The storage persists the documents during the datageneration until they are synced with the index. When beginning a datageneration, either a fresh storage or a storage pre-filled with the documents from the datagenerator's previous run can be chosen. To begin a datageneration, the following REST call may be used:

  • Method: POST
  • URL: /rest/api/v1/datagenerator/external/{name}/begin

Table 1. Path parameters of /rest/api/v1/datagenerator/external/{name}/begin

  Parameter   Description
  name        The configured datagenerator name



The body of the REST call may be a JSON object containing the desired storage creation type. The body is optional; if no body is sent, a fresh storage is applied. A full example of a call to the REST service:

Example call. 

POST /rest/api/v1/datagenerator/external/eocpyfmhzr/begin HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 64

{
   "message" : "Started DG",
   "creationType" : "COPY_LATEST"
}

The JSON object of the request body is as follows:

Table 2. JSON object

  Field          Type     Description
  creationType   String   OPTIONAL. The storage creation type. May be one of: NEW, COPY_LATEST
  message        String   OPTIONAL. Message used for logging and sent to the cockpit.



If the JSON object does not contain the storage creation type, the default one is applied.
The response has no content.
Only one datageneration may be active at a time. If a datageneration is started while another one is running, the response code is 412 (PRECONDITION FAILED).
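As an illustration of the begin call, the following sketch builds the request with Python's standard library. The function name `build_begin_request` and the `BASE_URL` constant are assumptions made for this example (the host and port are taken from the example call); the path, method, headers and JSON fields follow the description above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8181"  # assumption: host and port of the example call


def build_begin_request(name, auth_header, creation_type=None, message=None):
    """Build (but do not send) the POST request that begins a datageneration."""
    body = {}
    if creation_type is not None:
        if creation_type not in ("NEW", "COPY_LATEST"):
            raise ValueError("creation_type must be NEW or COPY_LATEST")
        body["creationType"] = creation_type
    if message is not None:
        body["message"] = message

    headers = {"Accept": "application/json", "Authorization": auth_header}
    data = None
    if body:
        # The body is optional; Content-Type is only needed when it is sent.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"

    url = (f"{BASE_URL}/rest/api/v1/datagenerator/external/"
           f"{urllib.parse.quote(name, safe='')}/begin")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen`; with that function, 4xx responses such as 412 are raised as `urllib.error.HTTPError`.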

The following table contains the main HTTP response codes:

Table 3. Response codes

  Code   Status                   Description
  204    NO CONTENT               The datageneration was successfully started.
  400    BAD REQUEST              The given JSON data was either an array or not a valid JSON string.
  404    NOT FOUND                A datagenerator with the given name does not exist.
  412    PRECONDITION FAILED      There is already a datageneration running.
  415    UNSUPPORTED MEDIA TYPE   The Accept header (or, if content is sent, the Content-Type header) is missing or not set to 'application/json'.
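A client will usually want to turn these response codes into something it can act on. The sketch below is one possible way to do that; the names `BEGIN_RESPONSES`, `BeginFailed` and `check_begin_status` are hypothetical, and treating 412 as the only code worth retrying later is an assumption of this example, not a documented rule.

```python
# Meanings taken from the response code table of the begin call.
BEGIN_RESPONSES = {
    204: "The datageneration was successfully started.",
    400: "The given JSON data was either an array or not a valid JSON string.",
    404: "A datagenerator with the given name does not exist.",
    412: "There is already a datageneration running.",
    415: "Accept or Content-Type header is missing or not 'application/json'.",
}


class BeginFailed(Exception):
    """Hypothetical exception type for a rejected begin call."""

    def __init__(self, status, reason):
        super().__init__(f"begin failed with HTTP {status}: {reason}")
        self.status = status
        # Assumption: only 412 may succeed later, once the running
        # datageneration has finished.
        self.retry_later = status == 412


def check_begin_status(status):
    """Return silently on 204, raise BeginFailed for anything else."""
    if status == 204:
        return
    raise BeginFailed(status, BEGIN_RESPONSES.get(status, "Unexpected response code."))
```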



3. Add a document

Once a datageneration has been started, one or more documents need to be added. A document is represented as a JSON object with the key "data" and the document data as its value. The data itself is a JSON object whose entries are mapped to the document by using each key as the field name and each value as the field value. A value may be given as a simple string or, if there is more than one value, as an array. An array containing a single value is also possible and is equivalent to the simple string. These possibilities are all illustrated in this example:

Example. 

{
   "uid": "abc123",
   "data": {
      "key1": [
         "val1",
         "val2",
         "val3"
      ],
      "key2": "val4",
      "key3": [ "val5" ]
   }
}

The uid of the document is a required field. The uid is expected to be unique across the datageneration. If a document with the same uid already exists, the existing document is replaced.
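The document structure can be assembled programmatically, for instance as in the following sketch. The helper `build_document` is hypothetical; rejecting values that are neither a string nor a list of strings is an assumption of this example, based on the two value forms described above.

```python
import json


def build_document(uid, fields):
    """Build the JSON-serializable body for one document.

    `fields` maps each field name to a string or to a list of strings.
    """
    if not uid:
        raise ValueError("uid is required")
    data = {}
    for key, value in fields.items():
        if isinstance(value, str):
            data[key] = value
        elif isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
            data[key] = list(value)
        else:
            raise TypeError(f"field {key!r} must be a string or a list of strings")
    return {"uid": uid, "data": data}


document = build_document(
    "abc123",
    {"key1": ["val1", "val2", "val3"], "key2": "val4", "key3": ["val5"]},
)
print(json.dumps(document, indent=3))
```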

The REST service is defined as follows:

  • Method: PUT
  • URL: /rest/api/v1/datagenerator/external/{name}

Table 4. Path parameters of /rest/api/v1/datagenerator/external/{name}

  Parameter   Description
  name        The configured datagenerator name



The body of the REST call is required and contains the document to add.

Example. 

PUT /rest/api/v1/datagenerator/external/ciozgcbkvh HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 101

{
   "message" : "",
   "data" : {
      "toldt" : [ "rvhhf", "wdzun", "perda" ]
   },
   "uid" : "mdzaj"
}

In case of success a status code 204 (NO CONTENT) is returned.
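Sending a document could look like the following sketch, which uses only the Python standard library. The function `add_document`, the `BASE_URL` constant and the injectable `opener` parameter are assumptions of this example; the opener exists only so that the function can be tried out without a running server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8181"  # assumption: host and port of the example call


def add_document(name, document, auth_header, opener=urllib.request.urlopen):
    """PUT one document; return True if the server answers 204 (NO CONTENT).

    With the default opener, 4xx responses raise urllib.error.HTTPError.
    """
    if "uid" not in document:
        # The server would reject such a document with 400 (BAD REQUEST).
        raise ValueError("document needs a 'uid'")
    url = (f"{BASE_URL}/rest/api/v1/datagenerator/external/"
           f"{urllib.parse.quote(name, safe='')}")
    request = urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
        method="PUT",
    )
    with opener(request) as response:
        return response.status == 204
```

A call such as `add_document("ciozgcbkvh", {"uid": "mdzaj", "data": {"toldt": ["rvhhf"]}}, auth_header)` would then correspond to the example request above.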

The following table contains the main HTTP response codes:

Table 5. Response codes

  Code   Status                   Description
  204    NO CONTENT               The document was successfully added.
  400    BAD REQUEST              The given JSON data was either an array or not a valid JSON string. This status code is also returned if the uid is missing in the document.
  404    NOT FOUND                The datagenerator name does not exist.
  415    UNSUPPORTED MEDIA TYPE   The Accept header (or, if content is sent, the Content-Type header) is missing or not set to 'application/json'.



4. Commit datageneration

As soon as all documents have been added, the datageneration is committed. During this stage the enhancers iterate over all documents, the storage is synced and the resources are cleaned up. The REST call itself returns immediately; its response therefore neither reflects problems that occur during the sync nor indicates when the process has finished.

  • Method: POST
  • URL: /rest/api/v1/datagenerator/external/{name}/commit

Table 6. Path parameters of /rest/api/v1/datagenerator/external/{name}/commit

  Parameter   Description
  name        The configured datagenerator name



The body of the REST call is optional and may contain the message which should be submitted with the commit.

Example. 

POST /rest/api/v1/datagenerator/external/vlrqjvjhhn/commit HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 29

{
   "message" : "Commit DG"
}

In case of success a status code 204 (NO CONTENT) is returned.
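Because the commit call returns before the sync has finished, a client that needs to know the outcome has to find out separately. One option, sketched below, is to poll the status service described in section 6. This is an assumption-laden illustration: the function `wait_until_finished` is hypothetical, and interpreting IDLE and ERROR as "no longer running" is inferred from the rule that a datageneration may only be started in these two states.

```python
import time

# Assumption: a datagenerator in one of these states is no longer running.
FINISHED_STATES = {"IDLE", "ERROR"}


def wait_until_finished(get_status, timeout_s=600.0, poll_s=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until it reports IDLE or ERROR.

    `get_status` is a hypothetical callable returning the plain-text status.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status().strip()
        if status in FINISHED_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"datagenerator still in state {status!r}")
        sleep(poll_s)
```

A returned ERROR would indicate that the commit did not complete successfully; the details would then have to be looked up in the cockpit or the server log.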

The following table contains the main HTTP response codes:

Table 7. Response codes

  Code   Status                   Description
  204    NO CONTENT               The commit was successfully started.
  404    NOT FOUND                The datagenerator name does not exist.
  415    UNSUPPORTED MEDIA TYPE   The Accept header (or, if content is sent, the Content-Type header) is missing or not set to 'application/json'.



5. Abort datageneration

It is possible to abort a running datageneration at any time. The resources reserved on the server side are then freed. The REST call itself returns immediately; its response therefore neither reflects problems that occur while the abort is processed nor indicates when the process has finished.

  • Method: POST
  • URL: /rest/api/v1/datagenerator/external/{name}/abort

Table 8. Path parameters of /rest/api/v1/datagenerator/external/{name}/abort

  Parameter   Description
  name        The configured datagenerator name



The body of the REST call is optional and may contain the message which should be submitted with the abort.

Example. 

POST /rest/api/v1/datagenerator/external/ysxiplvrtd/abort HTTP/1.1
Content-Type: application/json
Accept: application/json
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181
Content-Length: 28

{
   "message" : "Abort DG"
}

In case of success a status code 204 (NO CONTENT) is returned.
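Taken together, begin, add, commit and abort form a transaction-like session. The following sketch shows one way to structure such a session so that a failure while adding documents leads to an abort. The function `run_datageneration` and the `send(method, path, body)` callable are hypothetical; `send` stands for whatever code performs the authenticated HTTP call and is assumed to raise an exception on any error response.

```python
import json

PREFIX = "/rest/api/v1/datagenerator/external"


def run_datageneration(send, name, documents, creation_type="NEW"):
    """Begin a datageneration, add all documents and commit.

    If adding or committing fails, the session is aborted so that the
    server can free its resources, and the original error is re-raised.
    """
    base = f"{PREFIX}/{name}"
    send("POST", f"{base}/begin", json.dumps({"creationType": creation_type}))
    try:
        for document in documents:
            send("PUT", base, json.dumps(document))
        send("POST", f"{base}/commit", json.dumps({"message": "Commit DG"}))
    except Exception:
        send("POST", f"{base}/abort", json.dumps({"message": "Abort DG"}))
        raise
```

Keeping the transport behind a single callable makes the control flow easy to test with a stub and leaves the choice of HTTP library open.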

The following table contains the main HTTP response codes:

Table 9. Response codes

  Code   Status                   Description
  204    NO CONTENT               The abort was successfully started.
  404    NOT FOUND                The datagenerator name does not exist.
  415    UNSUPPORTED MEDIA TYPE   The Accept header (or, if content is sent, the Content-Type header) is missing or not set to 'application/json'.



6. Get datagenerator status

Each datagenerator has a specific status on the server. The status is shown, for example, on the datagenerator list page. An external datagenerator may only be started if its current state is IDLE or ERROR; otherwise the call to begin returns PRECONDITION FAILED. To get the current state of a datagenerator, this REST service may be used:

  • Method: GET
  • URL: /rest/api/v1/datagenerator/external/{name}/status

Table 10. Path parameters of /rest/api/v1/datagenerator/external/{name}/status

  Parameter   Description
  name        The configured datagenerator name



The response is a string representing the status of the datagenerator. The result content type is text/plain and should be requested accordingly:

Example. 

GET /rest/api/v1/datagenerator/external/xigtbewetb/status HTTP/1.1
Accept: text/plain
Authorization: Basic dXNlcjpwYXNzd29yZA==
Host: localhost:8181

An example response looks like this: CRAWLING

In case of success a status code 200 (OK) is returned.
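The status call and the rule for starting a datageneration can be combined in a small pre-flight check, sketched here with the Python standard library. The names `build_status_request`, `can_begin` and `BASE_URL` are assumptions of this example; the states IDLE and ERROR come from the description above.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8181"  # assumption: host and port of the example call

# A begin call is only accepted in these states.
STARTABLE_STATES = {"IDLE", "ERROR"}


def build_status_request(name, auth_header):
    """Build (but do not send) the GET request for the datagenerator status."""
    url = (f"{BASE_URL}/rest/api/v1/datagenerator/external/"
           f"{urllib.parse.quote(name, safe='')}/status")
    headers = {"Accept": "text/plain", "Authorization": auth_header}
    return urllib.request.Request(url, headers=headers, method="GET")


def can_begin(status_text):
    """Return True if a begin call is expected to be accepted."""
    return status_text.strip() in STARTABLE_STATES
```

With a running server, the status text could be read with `urllib.request.urlopen(request).read().decode("utf-8")` and passed to `can_begin` before issuing the begin call.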

The following table contains the main HTTP response codes:

Table 11. Response codes

  Code   Status                   Description
  200    OK                       The request was successful.
  404    NOT FOUND                The datagenerator name does not exist.
  415    UNSUPPORTED MEDIA TYPE   The Accept header (or, if content is sent, the Content-Type header) is missing or not set to 'application/json'.