In the haupia context, the search index is filled by so-called datagenerators. In addition to the provided datagenerators, you can add your own. These are called external datagenerators and are implemented against the external datagenerator REST API.

Two requirements have to be fulfilled before the external datagenerator REST API can be used. First, an external datagenerator has to be created in the haupia cockpit; it is later identified by its name. Second, a technical user with the right to execute a datageneration has to exist.

All calls to the REST API have to authenticate with an Authorization header (see section 4.2 of RFC 7235) of type "Basic". If the header is missing or the credentials are not valid, the REST services return HTTP status code 401 (Unauthorized).

The REST service works exclusively with JSON data, except for the call to the datagenerator status. For the JSON-based calls, the Accept header is expected to be set to "application/json". Calls that send data in JSON format additionally require a Content-Type header with the value "application/json".

Using the API is similar to using a transaction. First, a begin has to be issued. Then documents are added to the datagenerator. Once all documents have been submitted, the session is concluded by calling commit. The session can be aborted at any time. After the session is over, either by commit or by abort, the resources held on the server are freed.
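The authentication and header requirements described above can be sketched in Python as follows. This is a minimal illustration only: the helper name `build_headers` and the credentials are assumptions and not part of the haupia API.

```python
import base64


def build_headers(user: str, password: str, sends_json: bool = False) -> dict:
    """Return the request headers for one call (hypothetical helper)."""
    # Basic scheme: base64 encoding of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    }
    if sends_json:
        # Only needed for calls that send a JSON body.
        headers["Content-Type"] = "application/json"
    return headers


print(build_headers("user", "password", sends_json=True))
```

For the status call, which returns plain text, the Accept header would have to be adjusted accordingly.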
Before the data is synced, the configured enhancers are applied. These are configured in the haupia cockpit.
Most services allow an additional message to be sent. This message is logged at INFO level and also sent to the cockpit, where it is shown on the datagenerator list page.
The haupia REST API for external datagenerators is available since haupia version 2.0.0.59.
The haupia REST API for external datagenerators supersedes the haupia 1 Java frontend API (external datagenerator part).
Every datageneration is based on a storage. The storage persists the documents during the datageneration until they are synced with the index. When a datageneration is begun, it can use either a fresh storage or a storage pre-filled with the documents from the datagenerator's latest storage. To begin a datageneration, the following REST call may be used:
POST
/rest/api/v1/datagenerator/external/{name}/begin
name | The configured datagenerator name |
The body of the REST call may be a JSON object containing the desired storage creation type. The body is optional; if no body is sent, a fresh storage is applied. A full example of a call to the REST service:
Example call.
    POST /rest/api/v1/datagenerator/external/eocpyfmhzr/begin HTTP/1.1
    Content-Type: application/json
    Accept: application/json
    Authorization: Basic dXNlcjpwYXNzd29yZA==
    Host: localhost:8181
    Content-Length: 64

    {
      "message" : "Started DG",
      "creationType" : "COPY_LATEST"
    }
The JSON object of the request body is as follows:
creationType | String | OPTIONAL The storage creation type. May be one of: NEW, COPY_LATEST |
message | String | OPTIONAL Message used for logging and sent to the cockpit. |
If the JSON object does not contain the storage creation type, the default one is applied.
The response has no content.
Only one datageneration may be active at a time.
If a datageneration is started while another one is running, the response code is 412 (PRECONDITION FAILED).
The following table contains the main HTTP response codes:
204 | NO CONTENT | The datageneration was successfully started. |
400 | BAD REQUEST | The given JSON data was either an array or not a valid JSON string. |
404 | NOT FOUND | A datagenerator with the given name does not exist. |
412 | PRECONDITION FAILED | There is already a datageneration running. |
415 | UNSUPPORTED MEDIA TYPE | Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'. |
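The begin call described above could be prepared in Python roughly as follows. This is a sketch under assumptions: the function name `build_begin_request` is hypothetical, the base URL is taken from the example call, and the datagenerator name is assumed to be URL-safe. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8181"  # assumption: host and port from the example call


def build_begin_request(name, auth_header, creation_type=None, message=None):
    """Build (but do not send) the begin call for the datagenerator `name`."""
    url = f"{BASE_URL}/rest/api/v1/datagenerator/external/{name}/begin"
    headers = {"Accept": "application/json", "Authorization": auth_header}
    body = {}
    if creation_type is not None:
        if creation_type not in ("NEW", "COPY_LATEST"):
            raise ValueError("creationType must be NEW or COPY_LATEST")
        body["creationType"] = creation_type
    if message is not None:
        body["message"] = message
    data = None
    if body:
        # The body is optional; Content-Type is only set when content is sent.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_begin_request(
    "eocpyfmhzr", "Basic dXNlcjpwYXNzd29yZA==", "COPY_LATEST", "Started DG"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. A response with a status code of 400 or higher, such as 412, is raised by `urlopen` as `urllib.error.HTTPError`.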
Once a datageneration has been started, one or more documents need to be added. A document is represented as a JSON object that holds the document data under the key "data". The data itself is a JSON object. Its entries are mapped to the document by using each key as field name and each value as field value. A value may be given as a simple string or, if there is more than one value, as an array. A value may also be given as an array containing a single value, which is exactly the same as the simple string. These possibilities are all illustrated in this example:
Example.
    {
      "uid": "abc123",
      "data": {
        "key1": [ "val1", "val2", "val3" ],
        "key2": "val4",
        "key3": [ "val5" ]
      }
    }
The uid of the document is a required field. The uid is expected to be truly unique across the datageneration. If a document with the same uid already exists, the "old" document is replaced.
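As an illustration of the document structure described above, the following Python sketch assembles such a JSON object. The helper name `make_document` is an assumption made for this example; it only mirrors the rules stated in the text (required uid, values as string or array).

```python
import json


def make_document(uid, fields):
    """Assemble a document object from a uid and a field mapping (hypothetical helper)."""
    if not uid:
        raise ValueError("the uid is a required field")
    data = {}
    for key, value in fields.items():
        # Several values are sent as an array, a single value as a simple string.
        if isinstance(value, (list, tuple)):
            data[key] = list(value)
        else:
            data[key] = value
    return {"uid": uid, "data": data}


document = make_document(
    "abc123",
    {"key1": ["val1", "val2", "val3"], "key2": "val4", "key3": ["val5"]},
)
print(json.dumps(document, indent=2))
```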
The REST service is defined as follows:
PUT
/rest/api/v1/datagenerator/external/{name}
name | The configured datagenerator name |
The body of the REST call is required and contains the document to add.
Example.
    PUT /rest/api/v1/datagenerator/external/ciozgcbkvh HTTP/1.1
    Content-Type: application/json
    Accept: application/json
    Authorization: Basic dXNlcjpwYXNzd29yZA==
    Host: localhost:8181
    Content-Length: 101

    {
      "message" : "",
      "data" : {
        "toldt" : [ "rvhhf", "wdzun", "perda" ]
      },
      "uid" : "mdzaj"
    }
In case of success a status code 204 (NO CONTENT) is returned.
The following table contains the main HTTP response codes:
204 | NO CONTENT | The document was successfully added. |
400 | BAD REQUEST | The given JSON data was either an array or not a valid JSON string. This status code is also returned if the uid is missing from the document. |
404 | NOT FOUND | The datagenerator name does not exist. |
415 | UNSUPPORTED MEDIA TYPE | Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'. |
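A client can catch the two causes of a 400 response named above before sending anything. The following Python sketch builds the PUT call and performs these checks locally. The function name `build_put_request` and the default base URL are assumptions for illustration, and the datagenerator name is assumed to be URL-safe.

```python
import json
import urllib.request


def build_put_request(name, auth_header, document, base_url="http://localhost:8181"):
    """Build (but do not send) the call that adds one document."""
    if not isinstance(document, dict):
        # An array instead of an object would be answered with 400.
        raise TypeError("the document must be a JSON object")
    if not document.get("uid"):
        # A missing uid would be answered with 400 as well.
        raise ValueError("the document requires a uid")
    url = f"{base_url}/rest/api/v1/datagenerator/external/{name}"
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": auth_header,
    }
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


request = build_put_request(
    "ciozgcbkvh",
    "Basic dXNlcjpwYXNzd29yZA==",
    {"uid": "mdzaj", "data": {"toldt": ["rvhhf", "wdzun", "perda"]}},
)
print(request.get_method(), request.full_url)
```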
As soon as all documents have been added, the datageneration is committed. During this stage the enhancers iterate over all documents, the storage is synced, and the resources are cleaned up. The REST call itself returns immediately; its response therefore reflects neither problems that occur during the sync nor the point at which the process has finished.
POST
/rest/api/v1/datagenerator/external/{name}/commit
name | The configured datagenerator name |
The body of the REST call is optional and may contain the message which should be submitted with the commit.
Example.
    POST /rest/api/v1/datagenerator/external/vlrqjvjhhn/commit HTTP/1.1
    Content-Type: application/json
    Accept: application/json
    Authorization: Basic dXNlcjpwYXNzd29yZA==
    Host: localhost:8181
    Content-Length: 29

    {
      "message" : "Commit DG"
    }
In case of success a status code 204 (NO CONTENT) is returned.
The following table contains the main HTTP response codes:
204 | NO CONTENT | The commit was successfully started. |
404 | NOT FOUND | The datagenerator name does not exist. |
415 | UNSUPPORTED MEDIA TYPE | Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'. |
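The commit call can be prepared in the same way as the other calls. The Python sketch below is an illustration only: `build_commit_request` and the default base URL are assumed names and values, and the request is built without being sent.

```python
import json
import urllib.request


def build_commit_request(name, auth_header, message=None, base_url="http://localhost:8181"):
    """Build (but do not send) the commit call for the datagenerator `name`."""
    url = f"{base_url}/rest/api/v1/datagenerator/external/{name}/commit"
    headers = {"Accept": "application/json", "Authorization": auth_header}
    data = None
    if message is not None:
        # The body is optional and only carries the message.
        data = json.dumps({"message": message}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_commit_request("vlrqjvjhhn", "Basic dXNlcjpwYXNzd29yZA==", "Commit DG")
print(request.get_method(), request.full_url)
```

Because a 204 response only states that the commit was started, a client that needs to know the outcome has to obtain it by other means than this response.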
A running datageneration can be aborted at any time. The resources reserved on the server side are then freed. The REST call itself returns immediately; its response therefore does not reflect any problems that occur during processing, nor does it indicate when the process has finished.
POST
/rest/api/v1/datagenerator/external/{name}/abort
name | The configured datagenerator name |
The body of the REST call is optional and may contain the message which should be submitted with the abort.
Example.
    POST /rest/api/v1/datagenerator/external/ysxiplvrtd/abort HTTP/1.1
    Content-Type: application/json
    Accept: application/json
    Authorization: Basic dXNlcjpwYXNzd29yZA==
    Host: localhost:8181
    Content-Length: 28

    {
      "message" : "Abort DG"
    }
In case of success a status code 204 (NO CONTENT) is returned.
The following table contains the main HTTP response codes:
204 | NO CONTENT | The abort was successfully started. |
404 | NOT FOUND | The datagenerator name does not exist. |
415 | UNSUPPORTED MEDIA TYPE | Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'. |
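Taken together, begin, add, commit, and abort form a transaction-like session. The following Python sketch shows one possible way to drive such a session and to abort it when a step fails. It is not a haupia client: `run_session` and the `send` callback are hypothetical, with `send(method, path, body)` assumed to perform the HTTP call and return the status code.

```python
BASE_PATH = "/rest/api/v1/datagenerator/external"


def run_session(send, name, documents, message=None):
    """Run begin, add all documents, commit; abort if a step fails."""
    path = f"{BASE_PATH}/{name}"
    status = send("POST", f"{path}/begin", None)
    if status != 204:
        # Nothing was started (for example 412), so there is nothing to abort.
        raise RuntimeError(f"begin failed with HTTP {status}")
    try:
        for document in documents:
            status = send("PUT", path, document)
            if status != 204:
                raise RuntimeError(f"adding a document failed with HTTP {status}")
        body = {"message": message} if message else None
        status = send("POST", f"{path}/commit", body)
        if status != 204:
            raise RuntimeError(f"commit failed with HTTP {status}")
    except Exception:
        # Free the resources held on the server, then report the error.
        send("POST", f"{path}/abort", {"message": "Aborted after error"})
        raise


def fake_send(method, path, body):
    """Stand-in transport for demonstration; always reports success."""
    print(method, path)
    return 204


run_session(fake_send, "demo", [{"uid": "a", "data": {"key": "value"}}], "Commit DG")
```

Passing the transport in as a callback keeps the sequence logic independent of the HTTP library that is actually used.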
The datagenerator has a specific status on the server. The status is shown, for example, on the datagenerator list page. An external datagenerator may only be started if its current status is IDLE or ERROR; otherwise the call to begin returns 412 (PRECONDITION FAILED). The current status of a datagenerator can be retrieved with the following REST service:
GET
/rest/api/v1/datagenerator/external/{name}/status
name | The configured datagenerator name |
The response is a string representing the status of the datagenerator. The result content type is text/plain and should be requested accordingly:
Example.
    GET /rest/api/v1/datagenerator/external/xigtbewetb/status HTTP/1.1
    Accept: text/plain
    Authorization: Basic dXNlcjpwYXNzd29yZA==
    Host: localhost:8181
An example response looks like this: CRAWLING
In case of success a status code 200 (OK) is returned.
The following table contains the main HTTP response codes:
200 | OK | The request was successful. |
404 | NOT FOUND | The datagenerator name does not exist. |
415 | UNSUPPORTED MEDIA TYPE | Either the accept header (as well as the content-type header if content is sent) is missing or not set to 'application/json'. |
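The status can be used to decide whether a begin call is worthwhile. The Python sketch below builds the status request and evaluates a returned status text. The names `build_status_request` and `may_begin` and the default base URL are assumptions for illustration; only the status values IDLE, ERROR, and CRAWLING are taken from this section.

```python
import urllib.request


def build_status_request(name, auth_header, base_url="http://localhost:8181"):
    """Build (but do not send) the status call; the response is plain text."""
    url = f"{base_url}/rest/api/v1/datagenerator/external/{name}/status"
    headers = {"Accept": "text/plain", "Authorization": auth_header}
    return urllib.request.Request(url, headers=headers, method="GET")


def may_begin(status_text):
    """Return True if a datageneration may be started in this status."""
    return status_text.strip() in ("IDLE", "ERROR")


request = build_status_request("xigtbewetb", "Basic dXNlcjpwYXNzd29yZA==")
print(request.get_method(), request.full_url)
print(may_begin("CRAWLING"))
```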
haupia is a product of e-Spirit AG, Dortmund, Germany.
Only a license agreed upon with e-Spirit AG is valid with respect to the user for using the module.