1. Introduction

The CaaS platform is the link between FirstSpirit and the customer’s end application. The REST Interface receives information and updates it in the internal persistence layer of the CaaS platform. The customer’s end application, in turn, updates its data by sending requests to the REST Interface.

The CaaS platform includes the following components, which are available as Docker containers:

REST Interface (caas-rest-api)

The REST Interface is used both for transferring and retrieving data to and from the repository. For this purpose it provides a REST endpoint that can be used by any service. It also supports authentication and authorization.

Between CaaS version 2.11 and 2.13 (inclusive), the authentication and authorization functionality was provided by a separate Security Proxy.

CaaS repository (caas-mongo)

The CaaS repository is not accessible from the Internet and can only be accessed within the platform via the REST Interface. It serves as storage for all project data and internal configuration.

2. Technical requirements

The CaaS platform must be operated with Kubernetes.

If you do not feel able to operate, configure, and monitor the cluster infrastructure, and to analyze and resolve its operating problems, we strongly advise against on-premises operation and refer you to our SaaS offering.

Since the CaaS platform is delivered as a Helm artifact, the Helm client must be available.

It is important that Helm is installed in a secure manner. For more information, refer to the Helm Installation Guide.

For system requirements, please consult the technical data sheet of the CaaS platform.

3. Installation and configuration

The setup of the CaaS platform for operation with Kubernetes is done using a Helm chart. It is part of the delivery and already contains all necessary components.

The following subchapters describe the necessary installation and configuration steps.

3.1. Import of the images

The first step in setting up the CaaS platform for operation with Kubernetes requires the import of the images into your central Docker registry (e.g. Artifactory). The images are contained in the file caas-docker-images-18.2.0.zip in the delivery.

The credentials for cluster access to the repository must be known.

The steps necessary for the import can be found in the documentation of the registry you are using.

3.2. Configuration of the Helm chart

After the import of the images, the Helm chart must be configured. The chart is part of the delivery and contained in the file caas-18.2.0.tgz. A default configuration is already provided in the values.yaml file. Any parameter specified in this values.yaml can be overwritten with a specific value in a manually created custom-values.yaml file.

3.2.1. Authentication

All authentication settings for the communication with or within the CaaS platform are specified in the credentials block of the custom-values.yaml file. Here you will find the usernames and default passwords as well as the CaaS Master API Key. We strongly recommend adjusting the default passwords and the CaaS Master API Key.

All selected passwords must be alphanumeric. Otherwise, problems will occur in connection with CaaS.

The CaaS Master API Key is created automatically during the installation of CaaS and thus allows the direct use of the REST Interface.
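A sketch of how such adjustments might look in a custom-values.yaml file. The key names inside the credentials block are hypothetical placeholders; consult the values.yaml shipped with your chart version for the exact names:

credentials:
  # hypothetical key names - check the values.yaml of your chart for the exact ones
  restApiPassword: "myAlphanumericPassword123"
  masterApiKey: "d0f9e079-97d9-42ff-8749-fd2041481b4a"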

3.2.2. CaaS repository (caas-mongo)

The configuration of the repository includes two parameters:

storageClass

Of the parameters that can be overwritten from the values.yaml file, mongo.persistentVolume.storageClass is the most relevant for the repository.

For performance reasons, we recommend that the underlying MongoDB filesystem is provisioned with XFS.
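For example, the storage class can be overwritten as follows in the custom-values.yaml file; the value fast-xfs is a placeholder for a storage class available in your cluster:

mongo:
  persistentVolume:
    storageClass: fast-xfs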

clusterKey

For the authentication key of the Mongo cluster, a default configuration is delivered. The key can be defined in the parameter credentials.clusterKey. For productive operation, we strongly recommend creating a new key with the following command:

openssl rand -base64 756

This value may only be changed during the initial installation. Changing it at a later time can lead to permanent unavailability of the database, which can only be repaired manually.

3.2.3. Docker Registry

To configure the Docker registry to be used, the parameters imageRegistry and imageCredentials must be adjusted.

sample configuration in a custom-values.yaml
imageRegistry: docker.company.com/e-spirit

imageCredentials:
   username: "username"
   password: "special_password"
   registry: docker.company.com
   enabled: true

3.2.4. Ingress Configurations

Ingress definitions control the incoming traffic to the respective component. However, the definitions contained in the chart are not created in the default configuration. The parameters restApi.ingress.enabled and restApi.ingressPreview.enabled enable the Ingress definitions for the REST Interface.

The Ingress definitions of the Helm chart assume that the NGINX Ingress Controller is used, since the annotations and the Ingress class of this concrete implementation are applied. If you are using a different implementation, you must adapt the annotations and the attribute spec.ingressClassName of the Ingress definitions in your custom-values.yaml file accordingly.

Ingress creation in a custom-values.yaml
restApi:
   ingress:
      enabled: true
      hosts:
         - caas.company.com
   ingressPreview:
      enabled: true
      hosts:
         - caas-preview.company.com

If the setting options are not sufficient for your specific use case, the Ingress can also be created independently. In this case, the corresponding parameter must be set to enabled: false. The following code example provides an orientation for the definition.

Ingress definition for the REST Interface
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
   name: caas
spec:
   ingressClassName: my-ingress-caas
   rules:
   - host: caas-rest-api.mydomain.com
     http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: caas-rest-api
              port:
                number: 80

3.3. Installation of the Helm chart

After the configuration of the Helm chart, it has to be installed into the Kubernetes cluster. The installation is done with the following commands, which must be executed in the directory of the Helm chart.

Installation of the chart
kubectl create namespace caas
helm install RELEASE_NAME . --namespace=caas --values /path/to/custom-values.yaml

The name of the release can be chosen freely.

If the namespace is to have a different name, you must replace the specifications within the commands accordingly.

If an already existing namespace is to be used, the creation is omitted and the desired namespace must be specified within the installation command.

Since the container images are first downloaded from the image registry used, the installation can take several minutes. Ideally, however, the CaaS platform should be operational within five minutes.

The status of each component can be obtained with the following command:

kubectl get pods --namespace=caas

Once all components have the status Running, the installation is complete.

NAME                                 READY     STATUS        RESTARTS   AGE
caas-mongo-0                         2/2       Running       0          4m
caas-mongo-1                         2/2       Running       0          3m
caas-mongo-2                         2/2       Running       0          1m
caas-rest-api-1851714254-13cvn       1/1       Running       0          5m
caas-rest-api-1851714254-7h2wq       1/1       Running       0          4m
caas-rest-api-1851714254-xs6c0       1/1       Running       0          4m

3.4. TLS

The communication of the CaaS platform with the outside world is not encrypted by default. If it is to be protected by TLS, there are two configuration options:

Using an officially signed certificate

To use an officially signed certificate, a TLS secret is required, which must be generated first. It must contain the key tls.key and the certificate tls.crt.

The steps necessary to generate the TLS secret are described in the Kubernetes Ingress Documentation.
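As a sketch, such a secret can also be created directly with kubectl from an existing key and certificate pair; the secret name caas-tls is a placeholder, and the secret must afterwards be referenced in the Ingress configuration:

kubectl create secret tls caas-tls \
  --key tls.key \
  --cert tls.crt \
  --namespace=caas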

Automated certificate management

As an alternative to using an officially signed certificate, certificate management can be automated using the Cert-Manager. It must be installed within the cluster and takes over the generation, distribution, and renewal of all required certificates. The configuration of the Cert-Manager allows, for example, the use and automatic renewal of Let’s Encrypt certificates.

The necessary steps for installation are explained in the Cert-Manager documentation.

3.5. Scaling

In order to quickly process the information transferred to it, the CaaS platform must ensure optimal load distribution at all times. For this reason, the REST Interface and the Mongo database are scalable and already configured to deploy at least three instances each for failover. This minimum number of instances is mandatory, especially for the Mongo cluster.

3.5.1. REST Interface

The scaling of the REST Interface is done with the help of a Horizontal Pod Autoscaler. Its activation and configuration must be done in the custom-values.yaml file, overwriting the default values defined in the values.yaml file.

default configuration of the REST Interface
restApi:
  horizontalPodAutoscaler:
    enabled: false
    minReplicas: 3
    maxReplicas: 9
    targetCPUUtilizationPercentage: 50

The Horizontal Pod Autoscaler allows the REST Interface to be scaled up or down depending on the current CPU load. The parameter targetCPUUtilizationPercentage specifies the percentage value above which scaling takes place. At the same time, the parameters minReplicas and maxReplicas define the minimum and maximum number of REST Interface instances.

The threshold value for the CPU load should be chosen with care:
If the percentage is too low, the REST Interface scales up unnecessarily early under increasing load. If it is too high, the REST Interface does not scale up fast enough as the load increases.

A wrong configuration can therefore endanger the stability of the system.

The official Kubernetes Horizontal Pod Autoscaler documentation and the examples listed in it contain further information on the use of a Horizontal Pod Autoscaler.

3.5.2. Mongo database

We distinguish here between horizontal and vertical scaling. Horizontal scaling means adding instances that share the traffic; vertical scaling means allocating more CPU and RAM to existing instances.

Horizontal scaling

Unlike the REST Interface, the Mongo database can only be scaled horizontally by hand; this cannot be done automatically using a Horizontal Pod Autoscaler.

Scaling the Mongo database is done using the replicas parameter. This parameter must be entered in the custom-values.yaml file to override the default value defined in the values.yaml file.

At least three instances are required to run the Mongo cluster, otherwise no Primary node is available and the database is not writable. If the number of available instances falls below 50% of the configured instances, no Primary node can be elected anymore. A Primary node, however, is essential for the functionality of the REST Interface.

The chapter Consider Fault Tolerance of the MongoDB documentation describes how many nodes can fail before the election of a new Primary node becomes impossible. The information contained in the documentation must be taken into account when scaling the installation.

Further information on scaling and replicating the Mongo database is available in the chapters Replica Set Deployment Architectures and Replica Set Elections.

definition of the replica parameter
mongo:
  replicas: 3

Do not scale the StatefulSet directly in Kubernetes. If you do, certain templated connection URLs will not be correct and the additional instances will not be used properly. Use the Helm chart values instead.

Scaling down the Mongo database is not possible without direct intervention and requires a manual reduction of the replica set of the Mongo database. The MongoDB documentation describes the necessary steps.
Additionally, after removing the deleted instances from the replica set configuration, we recommend that you also delete the corresponding persistent volume claims. Otherwise there is a risk that, should you ever scale up again in the future, the new instances will not be added automatically to the replica set.

Such intervention increases the risk of failure and is therefore not recommended.

Vertical scaling

Vertical scaling is done using a Vertical Pod Autoscaler. Vertical Pod Autoscalers are Custom Resources in Kubernetes, so you first need to ensure that your cluster supports them.

After that, you can configure the following parameters in your custom-values.yaml:

Configuration of the Vertical Pod Autoscaler
mongo:
  verticalPodAutoscaler:
    enabled: false
    apiVersion: autoscaling.k8s.io/v1beta2
    updateMode: Auto
    minAllowed:
      cpu: 100m
      memory: 500Mi
    maxAllowed:
      cpu: 1
      memory: 2000Mi

Applying the configuration

After configuration changes for the REST Interface or the Mongo database, the updated custom-values.yaml file must be applied with the following command.

upgrade command
helm upgrade -i RELEASE_NAME path/to/caas-<VERSIONNUMBER>.tgz --values /path/to/custom-values.yaml

The release name can be determined with the command helm list --all-namespaces.

3.6. Monitoring

The CaaS platform is a microservice architecture and therefore consists of different components. In order to be able to monitor its status properly at any time and to be able to react quickly in the event of an error, integration in a cluster-wide monitoring system is absolutely essential for operation with Kubernetes.

The CaaS platform is already preconfigured for monitoring with Prometheus Operator, since this scenario is widely used in the Kubernetes environment. It includes Prometheus ServiceMonitors for collecting metrics, Prometheus Alerts for notification in case of problems and predefined Grafana dashboards for visualizing the metrics.

3.6.1. Requirements

It is essential to set up monitoring and log persistence for the Kubernetes cluster. Without these prerequisites, there are hardly any analysis possibilities in case of a failure and Technical Support lacks important information.

Metrics

To install the Prometheus Operator, please use the official Helm chart, so that cluster monitoring can be set up based on it. For further information, please refer to the corresponding documentation.

If you are not running a Prometheus Operator, you must turn off the Prometheus ServiceMonitors and Prometheus Alerts.

Logging

With Kubernetes it is possible to provide various containers or services in an automated and scalable way. To ensure that the logs remain available in such a dynamic environment even after an instance has been terminated, an infrastructure must be integrated that persists the logs beforehand.

We therefore recommend the use of a central logging system, such as the Elastic Stack. The Elastic (or ELK) Stack is a collection of open source projects that help to persist, search, and analyze log data in real time.

Here too, you can use an existing Helm chart for the installation.

3.6.2. Prometheus ServiceMonitors

The deployment of the ServiceMonitors provided by the CaaS platform for the REST Interface and the Mongo database is controlled via the custom-values.yaml file of the Helm chart.

Access to the metrics of the REST Interface is secured by an API Key, and access to the metrics of the MongoDB by a corresponding MongoDB user. The respective access data is contained in the credentials block of the values.yaml file of the Helm chart.

Please adjust the credentials in your custom-values.yaml file for security reasons.

Typically, Prometheus is configured to consider only ServiceMonitors with specific labels. The labels can therefore be configured in the custom-values.yaml file and are valid for all ServiceMonitors of the CaaS Helm chart. Furthermore, the parameter scrapeInterval allows a definition of the frequency with which the respective metrics are retrieved.

monitoring:
  prometheus:
    # Prometheus service monitors will be created for enabled metrics. Each Prometheus
    # instance has a configured serviceMonitorSelector property, to be able to control
    # the set of matching service monitors. To allow defining matching labels for CaaS
    # service monitors, the labels can be configured below and will be added to each
    # generated service monitor instance.
    metrics:
      serviceMonitorLabels:
        release: "prometheus-operator"
      mongo:
        enabled: true
        scrapeInterval: "30s"
      caas:
        enabled: true
        scrapeInterval: "30s"

The MongoDB metrics are provided via a sidecar container and retrieved with the help of a separate database user. You can configure this database user in the credentials block of the custom-values.yaml file. The sidecar container is shipped with the following default configuration:

mongo:
  metrics:
    image: mongodb-exporter:0.11.0
    syncTimeout: 1m

3.6.3. Prometheus Alerts

The deployment of the alerts provided by the CaaS platform is controlled via the custom-values.yaml file of the Helm chart.

Prometheus is typically configured to consider only alerts with specific labels. The labels can therefore be configured in the custom-values.yaml file and apply to all alerts in the CaaS Helm chart:

monitoring:
  prometheus:
    alerts:
      # Labels for the PrometheusRule resource
      prometheusRuleLabels:
        app: "prometheus-operator"
        release: "prometheus-operator"
      # Additional Prometheus labels to attach to alerts (or overwrite existing labels)
      additionalAlertLabels: {}
      caas:
        enabled: true
        useAlphaAlerts: false
        # Namespace(s) that should be targeted by the alerts (supports Go template and regular expressions)
        targetNamespace: "{{ .Release.Namespace }}"

3.6.4. Grafana Dashboards

The deployment of the Grafana dashboards provided by the CaaS platform is controlled via the custom-values.yaml file of the Helm chart.

Typically, the Grafana Sidecar Container is configured to consider only configmaps with specific labels and in a defined namespace. The labels of the configmap and the namespace in which it is deployed can therefore be configured in the custom-values.yaml file:

monitoring:
  grafana:
    dashboards:
      enabled: true
      # Namespace that the ConfigMap resource will be created in (supports Go template and regular expressions)
      configmapNamespace: "{{ .Release.Namespace }}"
      # Additional labels to attach to the ConfigMap resource
      configMapLabels: {}
      overviewDashboardsEnabled: false

4. Development Environment

Kubernetes and Helm form the basis of all CaaS platform installations. For development environments, we recommend installing the CaaS platform into a separate namespace on your production cluster or on any similarly configured cluster. We do not recommend using local CaaS platform instances, even for development.

If you need a local environment on developer machines, you have to create a local Kubernetes cluster. One of the following projects may be used to achieve this:

This list does not claim to be exhaustive. Rather, it is intended to give some examples for which we know that operation is generally possible, without us permanently using these projects ourselves.

Each of these projects can be used to manage Kubernetes clusters locally. However, we’re not able to give you support for any of these specific projects. The CaaS platform uses only standard Helm and Kubernetes features and is thus independent of any particular Kubernetes distribution.

Please be sure to configure the following features correctly when using a local Kubernetes cluster:

  • Kubernetes Image Pull Secrets to resolve the Docker images from your local or company Docker registry

  • disabling the monitoring features in custom-values.yaml or installing the needed prerequisites

  • tweaking the host system’s DNS settings to be able to work with Kubernetes Ingress resources, or using local port forwards into the cluster

5. REST Interface

5.1. Authentication

Each request to the REST Interface must be authenticated, otherwise it will be rejected. The various authentication options are explained below.

5.1.1. Authentication with API Keys

Each request to the REST Interface must contain an HTTP Authorization header with the API Key as Bearer token: Authorization: Bearer <key>. The value of <key> is expected to be the value of the key attribute of the corresponding API Key.

See the Validation of API Keys section below for more information.

5.1.2. Authentication with security token

It is possible to generate a short-lived (up to 24 hours) security token for an API Key. The token carries the same permissions as the API Key for which it was generated. There are two ways to generate and use these tokens:

Query Parameter

A GET request authenticated with an API Key to the /_logic/securetoken?tenant=<db> endpoint generates a security token. Such a token can be issued only for one specified database, regardless of whether the API Key has permissions on multiple databases. The parameter &ttl=<lifetime in seconds> is optional. The JSON response contains the security token.

Each request to the REST Interface can optionally be authenticated using a query parameter ?securetoken=<token>.
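A minimal sketch of this flow with curl, assuming an API Key with permissions on the database my-db (the ttl parameter is optional):

# generate a security token that is valid for one hour
curl -H 'Authorization: Bearer my-api-key' \
  'https://REST-HOST:PORT/_logic/securetoken?tenant=my-db&ttl=3600'

# use the securetoken value from the JSON response to authenticate further requests
curl 'https://REST-HOST:PORT/my-db/my-collection?securetoken=<token>'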

A GET request authenticated with an API Key to the /_logic/securetokencookie?tenant=<db> endpoint generates a security token cookie. Such a cookie can be issued only for one specified database, regardless of whether the API Key has permissions on multiple databases. The parameter &ttl=<lifetime in seconds> is optional. The response includes a set-cookie header with the security token.

All requests that include this cookie get automatically authenticated.

5.1.3. Authentication order

If multiple authentication mechanisms are used at the same time in a request, only the first one will be evaluated. The order is as follows:

  1. The securetoken query parameter.

  2. The Authorization header.

  3. The securetoken cookie.

5.2. Query documents and media

The REST Interface can be used to manage and query content in the form of JSON documents over HTTP. They are stored in so-called collections, which are subordinated to databases. The following three-part URL scheme applies:

https://REST-HOST:PORT/<database>/<collection>/<document>

database

This part of the URL contains the tenant ID.

collection

The collection name is composed of the FirstSpirit project UUID and the respective preview or release state.

document

In this case, the UUID of the FirstSpirit element is used together with the language locale.

Binary content (media) is an exception in that it is stored in so-called buckets. The associated collections always end with the suffix .files:

https://REST-HOST:PORT/<tenant>/<project>.<release|preview>.files/<document>

Please note that binary content is not transferred to the CaaS buckets in our cloud offering.

5.2.1. HAL format

The interface returns all results in HAL format. This means that they are not simply raw data, as is traditionally the case with unstructured JSON content.

The HAL format offers the advantage of simple but powerful structuring. In addition to the required content, the results contain additional meta-information on the structure of this content.

Example

{
   "_size": 5,
   "_total_pages": 1,
   "_returned": 3,
   "_embedded": { CONTENT }
}

In this example a filtered query was sent. Without knowing the exact content, its structure can be read directly from the meta information. At this point, the REST Interface returns three results from a set of five documents corresponding to the filter criteria and displays them on a single page.

If the requested element is a medium, the URL only returns its metadata. The HAL format contains corresponding links that refer to the URL with the actual binary content of the medium. For further information, please refer to the documentation.

5.2.2. Page size of queries

The results of the REST Interface are always delivered paginated. To control the page size and the requested page, the HTTP query parameters pagesize and page can be used for GET requests. The default value for the pagesize parameter is 20 in the CaaS platform and the maximum is 100. These values can be changed in your custom-values.yaml file in the case of an on-premises installation. For more information, see the RESTHeart documentation.
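For example, the second page with 50 documents per page can be requested as follows:

curl -H 'Authorization: Bearer my-api-key' \
  'https://REST-HOST:PORT/my-db/my-collection?pagesize=50&page=2'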

5.2.3. Use of filters

Filters are used whenever documents are to be determined not by their ID but by their content. In this way, both single and multiple documents can be retrieved.

For example, the query of all English language documents from the products collection has the following structure:

https://REST-HOST:PORT/Database/products?filter={fs_language: "EN"}
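Note that the filter value must be URL-encoded when it is sent over HTTP. With curl this can be achieved, for example, by using -G together with --data-urlencode:

curl -G 'https://REST-HOST:PORT/Database/products' \
  -H 'Authorization: Bearer my-api-key' \
  --data-urlencode 'filter={"fs_language": "EN"}'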

Beyond this example, there are further filter possibilities. For more information, see the query documentation.

5.2.4. Resolving references

CaaS documents can reference other CaaS documents. When fetching documents, the referenced content is often needed as well. In that case, the query parameter resolveRef can be used to avoid sequential requests.

The following two JSON document examples demonstrate this:

{
  "_id": "my-document",
  "fsType": "ProjectProperties",
  "formData": {
    "ps_audio": {
      "fsType": "FS_REFERENCE",
      "value": {
        "fsType": "PageRef",
        "url": "https://caas-api.e-spirit.cloud/my-db/col/my-referenced-document"
      }
    }
  }
}
{
  "_id": "my-referenced-document",
  "fsType": "PageRef",
  "name": "audio"
}

In the first document, the JSON at formData.ps_audio.value.url contains an absolute URL to another document in CaaS. The following request example shows how that reference can be resolved in the same request:

curl -X GET --location "https://caas-api.e-spirit.cloud/my-db/col/my-document?resolveRef=formData.ps_audio.value.url" \
    -H "Authorization: Bearer my-api-key"

The value of the query parameter must specify the path in the JSON to the URL. The response then contains an additional attribute _resolvedRefs:

{
  "_id": "my-document",
  "fsType": "ProjectProperties",
  "formData": {
    "ps_audio": {
      "fsType": "FS_REFERENCE",
      "value": {
        "fsType": "PageRef",
        "url": "https://caas-api.e-spirit.cloud/my-db/col/my-referenced-document"
      }
    }
  },
  "_resolvedRefs": {
    "https://caas-api.e-spirit.cloud/my-db/col/my-referenced-document": {
      "_id": "my-referenced-document",
      "fsType": "PageRef",
      "name": "audio"
    }
  }
}

Resolving references is affected by the Configuration and limitations!

The resolveRef parameter can also be used for queries on collections. In this case, the references in all returned documents are resolved. The documents of the resolved references are collected in a new document, which is added to the array of the response.

curl -X GET --location "https://caas-api.e-spirit.cloud/my-db/col?resolveRef=formData.ps_audio.value.url" \
    -H "Authorization: Bearer my-api-key"
[
  {
    "_id": "my-document",
    "fsType": "ProjectProperties",
    "formData": {
      "ps_audio": {
       "fsType": "FS_REFERENCE",
       "value": {
         "fsType": "PageRef",
         "url": "https://caas-api.e-spirit.cloud/my-db/col/my-referenced-document"
       }
      }
    }
  },
  {
    "_id": "_resolvedRefs",
    "https://caas-api.e-spirit.cloud/my-db/col/my-referenced-document": {
      "_id": "my-referenced-document",
      "fsType": "PageRef",
      "name": "audio"
    }
  }
]

This additional document is always the last element in the array of the response and can be identified by the ID _resolvedRefs.

The query parameter pagesize does not affect this document, which means that the actual size of the response array can be pagesize + 1.

Transitive references

Referenced documents may in turn contain further references. These can also be resolved in the original request. To do this, the path to the next reference must also be specified in the request, including the prefix $i., where i represents the depth of the reference resolution chain.

Using the previous example as a reference, the following request example demonstrates this and assumes that my-referenced-document contains another reference in the attribute page.url.

curl -X GET --location "https://caas-api.e-spirit.cloud/my-db/col?resolveRef=formData.ps_audio.value.url&resolveRef=$1.page.url" \
    -H "Authorization: Bearer my-api-key"
Reference path syntax

Depth 0 describes the documents that are returned for the original request. The prefix $0. is optional for depth 0. Depth 1+ means all documents that were found by successfully resolving the references at the previous depth.

Depth 0

JSON document:

{
  "data": {
    "url": "<url>"
  }
}

  • resolveRef data.url (equivalent to $0.data.url): in all documents of depth 0, references are searched for under the path data.url and resolved.

JSON document:

{
  "data": [
    { "url": "<url1>" },
    { "url": "<url2>" }
  ]
}

  • resolveRef data[*].url: the array named data is searched in all documents of depth 0. * means that all objects in the array are searched; the value of url is resolved as a reference in these objects.

  • resolveRef data[0].url: resolves url only in the first object of the array.

  • resolveRef data[1].url: resolves url only in the second object of the array.

JSON document:

{
  "data": [
    [ {"url": "<url1>"} ],
    [ {"url": "<url2>"} ]
  ]
}

  • resolveRef data[*][*].url: all arrays in the data array are searched; the value of url is resolved as a reference in the contained objects.

  • resolveRef data[0][*].url: only the first array in the data array is searched; url is resolved in all of its objects.

Depth 1

  • resolveRef $1.<path>: the paths must start with $1.. Apart from that, they can be specified exactly as at depth 0.

Depth n

  • resolveRef $<n>.<path>: the paths must start with $<n>., where <n> corresponds to the depth of the resolution chain. Apart from that, they can be specified exactly as at depth 0.

Configuration and limitations
  • Only absolute URLs can be resolved.

  • No errors are thrown when resolving the references if incorrect paths or URLs are specified. Incorrect paths and references are silently ignored.

  • Paths must be specified according to the Reference path syntax. The syntax is based on JsonPath, but only the documented operators are supported.

  • The URLs are not normalized in the _resolvedRefs object. URL references that are not exactly identical are treated as different documents, even if they point to the same document in CaaS. This can be caused, for example, by an additional / at the end of the URL or by different query parameters.

  • By default, the maximum depth of the reference resolution is 3. This means that at most the reference path prefix $2. can be used. This can be configured in the custom Helm values by setting the value restApi.additionalConfigOverrides./refResolvingInterceptor/max-depth.

  • A maximum of 100 references can be resolved in one request. When this limit is reached, no further references are resolved and they will not be part of the response. The maximum number can be configured in the custom Helm values by setting the value restApi.additionalConfigOverrides./refResolvingInterceptor/limit.

  • The reference resolution is activated by default. It can be deactivated in the custom Helm values by setting the value restApi.additionalConfigOverrides./refResolvingInterceptor/enabled: false.
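The three configuration values mentioned above can be set together in the custom-values.yaml file. The following sketch shows the documented default values:

restApi:
  additionalConfigOverrides:
    "/refResolvingInterceptor/enabled": true
    "/refResolvingInterceptor/max-depth": 3
    "/refResolvingInterceptor/limit": 100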

5.3. Storage of documents

The HTTP methods POST, PUT and PATCH can be used for storing documents. Documents can also be deleted with the DELETE method.

The following excerpt shows the creation of a document my-document within the collection my-collection, which is located in the database my-db.

curl --location --request PUT 'https://REST-HOST:PORT/my-db/my-collection/my-document' \
--header 'Authorization: Bearer my-api-key' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": "some-data"
}'

For more information about saving documents, refer to the corresponding sections in the RESTHeart documentation.

When saving documents using the POST, PUT or PATCH methods, the write mode upsert is used by default. This differs from the default used by RESTHeart. Further information about the write mode can be found in the RESTHeart documentation.

5.4. Managing databases and collections

Unlike the storage of documents, the management of databases and collections is limited to the HTTP methods PUT and DELETE.

The following excerpt shows the creation of the database my-db with a PUT request.

curl --location --request PUT 'https://REST-HOST:PORT/my-db' \
--header 'Authorization: Bearer my-api-key'

There are reserved databases that cannot be used for saving content. The reserved database names include caas_admin, _logic, and graphql. For more information regarding the use of the reserved database graphql, see Managing GraphQL apps.

Further information on database management can be found in the corresponding sections of the RESTHeart documentation.

Managing databases is not supported in our SaaS offering due to access restrictions.

A collection my-collection can be created in the database my-db with a PUT request as follows.

curl --location --request PUT 'https://REST-HOST:PORT/my-db/my-collection' \
--header 'Authorization: Bearer my-api-key'

There are reserved collections that cannot be used for saving content. The reserved collection names include apikeys and gql-apps. For more information regarding their uses, see Management of API Keys → REST endpoints and Managing GraphQL apps.

For more information on managing collections, see the corresponding sections in the RESTHeart documentation.

5.5. Management of API Keys

API Keys, like all other resources in CaaS, can be managed via REST endpoints. In general, it is important to distinguish two levels at which API Keys can be managed: globally, or locally per database. Global API Keys differ from local API Keys by their scope of validity.

When using an API Key for authentication, the CaaS platform always searches the local API Keys first. If no matching API Key is found, the global API Keys are evaluated afterwards.

5.5.1. Global API Keys

Global API Keys are cross-database and are managed in the apikeys collection of the caas_admin database. Unlike local API Keys, they allow permissions to be defined for multiple or even all databases.

5.5.2. Local API Keys

Local API Keys are defined per database and are managed accordingly in the apikeys collection of any database. Unlike global API Keys, local API Keys can only define permissions for resources within the same database.

5.5.3. Authorization Model

The authorization of an API Key is performed on the basis of all of its permission entries. Its permission entries are defined in the permissions attribute.

The url attribute of a permission is used to check whether access should be granted. The value is compared with the URL path of an incoming request. What type of comparison is executed depends on the mode of the permission.

There are three different modes:

  • PREFIX and REGEX

    With mode PREFIX, the authorization checks whether the url attribute of the permission is a prefix of the URL path of an incoming request.

    The REGEX mode expects a regular expression in the url attribute. Using this mode, the authorization checks whether the regular expression pattern matches the URL path of an incoming request.

    Additionally, for the modes PREFIX and REGEX, a general authorization distinction is made based on the type of the API Key: global API Keys always check against the entire path of the request, while local API Keys check against the part of the path after the database. For more information regarding global and local API Keys, see the example Local and global API Key distinction or the chapter Management of API Keys.

  • GRAPHQL
    The mode GRAPHQL of a permission authorizes the execution of a specific GraphQL app. During an authorization check the url attribute of the permissions must exactly match the URI of a GraphQL app that an incoming request is trying to execute. The URI of a GraphQL app is defined in the descriptor.uri attribute of an app definition. See chapter GraphQL for more information.
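For illustration, a single permission entry using this mode might look as follows. The URI matches the example app from the appendix; the methods attribute is only included here to mirror the general permission structure shown in the REST endpoints section and is an assumption for this mode:

{
  "url": "mycorp-dev___products",
  "permissionMode": "GRAPHQL",
  "methods": [ "POST" ]
}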

Local and global API Key distinction

The following table includes examples that illustrate the authorization distinction made for local and global API Keys when using the permission mode PREFIX or REGEX.

Table 1. API Key authorization

authorization in API Key    type of API Key     request URL path           Allowed

/                           global              /                          yes
                                                /project/                  yes
                                                /project/content/          yes
                                                /other-project/            yes
                                                /other-project/content/    yes

/project/                   global              /                          no
                                                /project/                  yes
                                                /project/content/          yes
                                                /other-project/            no
                                                /other-project/content/    no

/                           local in project    /                          no
                                                /project/                  yes
                                                /project/content/          yes
                                                /other-project/            no
                                                /other-project/content/    no

/content/                   local in project    /                          no
                                                /project/                  no
                                                /project/content/          yes
                                                /other-project/            no
                                                /other-project/content/    no

5.5.4. REST endpoints

The following endpoints are available for managing API Keys:

  • GET /<database>/apikeys

  • POST /<database>/apikeys
    Note: the parameters _id and key are mandatory and must have identical values

  • PUT /<database>/apikeys/{id}
    Note: the parameter key must have the same value as the {id} in the URL

  • DELETE /<database>/apikeys/{id}

To manage API Keys, an API Key can itself be used as an authorization method. In this case, the API Key used must have write permission on the corresponding apikeys collection. This also applies to read-only requests and prevents privilege escalation.

Which database must be addressed depends on the type of API Key: global API Keys are managed in the caas_admin database, local API Keys in the respective tenant database.

The following snippet shows the example creation of a local API Key.

curl "https://REST-HOST:PORT/<tenant>/apikeys" \
     -H 'Content-Type: application/json' \
     -u '<USER>:<PASSWORD>' \
     -d $'{
  "_id": "1e0909b7-c943-45a5-ae96-79f294249d48",
  "key": "1e0909b7-c943-45a5-ae96-79f294249d48",
  "name": "New-Apikey",
  "description": "Some descriptive text",
  "permissions": [
    {
      "url": "/<collection>",
      "permissionMode": "PREFIX",
      "methods": [
        "GET",
        "PUT",
        "POST",
        "PATCH",
        "DELETE",
        "HEAD",
        "OPTIONS"
      ]
    }
  ]
}'

In this example, a new API Key is created via cURL, which has the appropriate permissions (defined via the url attribute) for the specified collection.

To create an API Key with permission mode REGEX simply adjust the above example as follows:

"url": "<regex>",
"permissionMode": "REGEX",

The apikeys collections are reserved for API Keys and cannot be used for normal content. They are automatically added to existing databases, along with a validation schema, when the application is started, and are also created at runtime when databases are created or updated.

5.5.5. Validation of API Keys

Each API Key is validated against a stored JSON schema when it is created or updated. The JSON schema secures the basic structure of API Keys and can be queried at /<database>/_schemas/apikeys.

Further validations ensure that no two API Keys can be created with the same key. Likewise, an API Key must not contain the same URL more than once.

If an API Key does not satisfy the requirements, the corresponding request is rejected with HTTP status 400.

If the JSON schema has not been successfully stored in the database before, requests are answered with HTTP status 500.

The key attribute of an API Key should contain a valid UUID. The format of a UUID is strictly specified by RFC 4122 (https://tools.ietf.org/html/rfc4122). This includes, in particular, the use of lowercase letters. Although the CaaS platform does not currently validate this property, we reserve the right to enable this restriction in the future.

5.6. Indexes for efficient query execution

The runtime of queries with filters can increase as the number of documents in a collection grows. If the runtime exceeds a certain value, the query is answered by the REST Interface with HTTP status 408. More efficient execution can be achieved by creating an index on the attributes used in the affected filter queries.

For detailed information on database indexes, please refer to the MongoDB documentation.

5.6.1. Predefined indexes

If you have CaaS Connect in use, predefined indexes that support some frequently used filter queries are already created. The exact definitions can be found at https://REST-HOST:PORT/Database/Collection/_indexes/.

5.6.2. Customer-specific indexes

If the predefined indexes do not cover your use cases and you observe long response times or even request timeouts, you can create your own indexes. The REST Interface can be used to manage the desired indexes; the procedure is described in the RESTHeart documentation.
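As a sketch of the procedure, assuming the index management endpoint described in the RESTHeart documentation, an index on the fs_language attribute could be created like this (index name and attribute are examples):

curl -X PUT 'https://REST-HOST:PORT/my-db/my-collection/_indexes/fs-language-idx' \
  -H 'Authorization: Bearer my-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"keys": {"fs_language": 1}}'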

Please only create the indexes you need.

5.7. Managing GraphQL apps

It is possible to create, update and delete GraphQL applications with the REST Interface.

For a quick introduction to these apps see the tutorial Getting started with GraphQL Apps.

These applications consist of a definition and a GraphQL schema. The definition must first be created in the reserved collection gql-apps for each desired GraphQL application. The collection is automatically created when databases are created/updated. However, if it does not yet exist in the database, it must be created beforehand with a PUT request.

When editing the GraphQL app definitions (e.g., Create/update app), the permissions of the API Key are validated. An operation can only be performed if the API Key has access to all databases and collections listed in the definition.

The gql-apps collections of all databases are reserved for GraphQL purposes and cannot be used for normal content.

Mutations are currently not supported.

5.7.1. Create/update app

To create or update a GraphQL app, send a PUT request including an app definition to the following URL:
https://REST-HOST:PORT/<tenant>/gql-apps/<tenant>___<name>

A GraphQL app can retrieve data using MongoDB queries or aggregations. See Object Mappings for more information. An example using an aggregation to fetch data is provided in the appendix.

Please note that variables are not supported inside any permission-related attributes of aggregation stages, such as database or collection names.

Just like page size limits in REST Interface queries, there is a limit to the number of results in GraphQL queries as well. The default value is 20. Please use arguments for pagination to query more documents. For more information, see the section Field to Field mapping of RESTHeart's Object Mappings chapter.

Creating the definition will provision an endpoint at the following URL:
https://REST-HOST:PORT/graphql/<tenant>___<name>

For more information on how to execute GraphQL queries see chapter GraphQL API.

Complex mappings with multiple foreign key relationships may result in increased query response times. For more efficient query execution, we recommend using indexes. Configuring batching and caching can also help optimize response times. Details can be found in this documentation.

All operations on GraphQL applications via the REST interface are mirrored in the background to the global caas_admin/gql-apps collection for technical reasons.

The Re-synchronization of existing GraphQL apps chapter contains more information on how this mechanism can also be executed manually.

There are two ways for creating a GraphQL app:

Using JSON body ("application/json")

The app definition with the sections descriptor, schema and mappings is passed in the request body (content type application/json). The descriptor.uri parameter must match the last path segment of the URL (i.e. <tenant>___<name>) and, contrary to what is stated in the RESTHeart documentation, is not optional in the CaaS.

An example of such a GraphQL app definition can be found in the GraphQL example application chapter.

Using file upload ("multipart/form-data")

The app definition and schema can also be created or updated using a multipart upload (content type multipart/form-data). This allows storing the app definition and schema as separate files and, more importantly, the schema does not need to be JSON-encoded when creating or updating the app.

To upload the app definition and schema, each must be present in the request as individual parts:

  • Part name: app

    • Contains the app definition as JSON.

  • Part name: schema

    • Contains the raw text version of the schema in the GraphQL schema definition language.

Uploading files using curl
curl -i -X PUT \
  -H "Authorization: Bearer $API_KEY" \
  -F app=@my-app-def.json \
  -F schema=@my-schema.gql \
  https://REST-HOST:PORT/<tenant>/gql-apps/<tenant>___<name>

For fast feedback cycles during development, this can be combined with a tool to watch for file changes to continuously update the GraphQL app:

Uploading files continuously using fswatch and curl
fswatch -o my-app-def.json my-schema.gql | xargs -n1 -I{} \
  curl -i -X PUT \
  -H "Authorization: Bearer $API_KEY" \
  -F app=@my-app-def.json \
  -F schema=@my-schema.gql \
  https://REST-HOST:PORT/<tenant>/gql-apps/<tenant>___<name>

5.7.2. Delete app

To delete a GraphQL app, a DELETE request is made to the following URL:
https://REST-HOST:PORT/<tenant>/gql-apps/<tenant>___<name>

5.7.3. Re-synchronization of existing GraphQL apps

Under certain conditions, such as after restoring individual collections from a backup, the previously created GraphQL apps may be out of sync. In such a case, all tenant-specific GraphQL apps must be resynchronized.

To trigger the resynchronization of all existing GraphQL apps of all tenants, send a POST request to the /_logic/sync-gql-apps endpoint.
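For example, using an appropriately privileged API Key:

curl -X POST 'https://REST-HOST:PORT/_logic/sync-gql-apps' \
  -H 'Authorization: Bearer my-api-key'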

5.8. Push notifications (change streams)

It is often convenient to be notified about changes in the CaaS platform. For this purpose the CaaS platform offers change streams. This feature allows a websocket connection to be established to the CaaS platform, through which events about the various changes are published.

Change streams are created by putting a definition in the metadata of a collection. If you use CaaS Connect, a number of predefined change streams are already created for you. You also have the option to define your own change streams.
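A sketch of such a definition, assuming the streams metadata format described in the RESTHeart documentation; the stream URI all-changes and the empty aggregation pipeline are examples:

curl -X PUT 'https://REST-HOST:PORT/my-db/my-collection' \
  -H 'Authorization: Bearer my-api-key' \
  -H 'Content-Type: application/json' \
  -d '{"streams": [{"uri": "all-changes", "stages": []}]}'

The corresponding websocket would then be reachable under .../my-db/my-collection/_streams/all-changes (see the change stream example in the appendix for the URL format).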

The format of the events corresponds to standard MongoDB events.

When working with websockets, we recommend taking into account connection failures that may occur. Regular ping messages and a mechanism for automatic connection recovery should be included in your implementation.

You can find an example of using change streams in the browser in the appendix.

5.9. Additional information

Additional information regarding the functionality of the REST interface can be found in the official RESTHeart documentation.

6. GraphQL API

Each of the GraphQL applications defined through the management API (see Managing GraphQL apps) provisions a GraphQL API endpoint. This endpoint can be used to fetch data (see Fetch data).

Mutations are currently not supported.

6.1. Authentication and authorization

Authentication and authorization work differently for GraphQL queries than for queries to the REST Interface. Unlike REST Interface queries, where the request path is evaluated against a set of allowed URLs, GraphQL queries are authorized using one of the following permission checks.

As a precondition, local API Keys can only access GraphQL apps of their own database.

  1. Explicit execution permission
    Checks whether the API Key has an explicit authorization (GRAPHQL permissions of the API Key) to execute queries for the GraphQL app.

  2. Implicit execution permission
    Checks whether the API Key has access to all databases and collections. This is done by validating the URL permissions (PREFIX and REGEX) of the API Key against the paths of all databases and collections that the underlying GraphQL app provides access to.

For more information about the different kinds of permissions an API Key can define see chapter Authorization Model.

6.2. Fetch data

The GraphQL API can be queried through HTTP endpoints at:

https://REST-HOST:PORT/graphql/<app-uri>

To query data, send a POST request with JSON to the desired endpoint and specify the query in the request body, for example:

Querying data via GraphQL using curl
curl -i -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"query": "query($lang: [String!]){products(_language: $lang) {name description categories {name} picture {name binaryUrl width height}}}", "variables": {"lang": ["EN"]}}' \
  https://REST-HOST:PORT/graphql/<app-uri>

You can find a more elaborate example of using GraphQL in the appendix.

7. Metrics

Metrics are used for monitoring and error analysis of CaaS components during operation and can be accessed via HTTP endpoints. If metrics are available in Prometheus format, corresponding ServiceMonitors are generated for this purpose, see also Prometheus ServiceMonitors.

7.1. REST Interface

Healthcheck

The Healthcheck endpoint provides information about the functionality of the corresponding component in the form of a JSON document. The status is calculated from several checks. If all checks are successful, the JSON response has HTTP status 200. As soon as at least one check has the value false, the response has HTTP status 500.

The query is made using the URL: http://REST-HOST:PORT/_logic/healthcheck

The functionality of the REST Interface depends on the accessibility of the MongoDB cluster as well as on the existence of a primary node. If the cluster does not have a primary node, it is not possible to perform write operations on the MongoDB.
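For example:

curl -i -H 'Authorization: Bearer my-api-key' \
  'http://REST-HOST:PORT/_logic/healthcheck'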

HTTP Metrics

Performance metrics of the REST Interface can be retrieved in Prometheus format at the following URLs:

  • http://REST-HOST:PORT/_logic/metrics

  • http://REST-HOST:PORT/_logic/metrics-caas

  • http://REST-HOST:PORT/_logic/metrics-jvm

7.2. MongoDB

The metrics of the MongoDB are provided by a sidecar container. This container accesses the MongoDB metrics with a separate database user and provides them via HTTP.

The metrics can be accessed at the following URL: http://MONGODB-HOST:METRICS-PORT/metrics.

Please note that the MongoDB metrics are delivered via a separate port. This port is not accessible from outside the cluster and therefore not protected by authentication.

8. Maintenance

The transfer of data to CaaS can only work if the individual components work properly. If faults occur or an update is necessary, all CaaS components must therefore be considered. The following subchapters describe the necessary steps of an error analysis in case of a malfunction, as well as the execution of a backup or an update.

8.1. Error analysis

CaaS is a distributed system and is based on the interaction of different components. Each of these components can potentially generate errors. Therefore, if a failure occurs while using CaaS, it can have several causes. The basic analysis steps for determining the causes of faults are explained below.

Status of the components

The status of each component of the CaaS platform can be checked using the kubectl get pods --namespace=<namespace> command. If the status of an instance differs from Running or Ready, it is recommended to start debugging at this point and to check the associated log files.

If there are problems with the Mongo database, check whether a Primary node exists. If the number of available instances falls below 50% of the configured instances, no Primary node can be elected anymore. A Primary node, however, is essential for the functionality of the REST Interface. Its absence means that the pods of the REST Interface no longer have the status Ready and are therefore unreachable.

The chapter Consider Fault Tolerance of the MongoDB documentation describes how many nodes can fail before the election of a new Primary node becomes impossible, and how to avoid this situation.

Analysis of the logs

In case of problems, the log files are a good starting point for analysis. They offer the possibility to trace all processes on the systems. In this way, any errors and warnings become apparent.

Current log files of the CaaS components can be viewed using kubectl --namespace=<namespace> logs <pod>, but they only contain events that occurred within the lifetime of the current instance. To be able to analyze the log files after a crash or restart of an instance, we recommend setting up a central logging system.

The log files can only be viewed for the currently running container. For this reason, it is necessary to set up persistent storage in order to access the log files of containers that have already terminated or been restarted.

8.2. Backup

The architecture of CaaS consists of different, independent components that generate and process different information. If there is a need for data backup, this must therefore be done depending on the respective component.

A backup of the information stored in CaaS must be performed using the standard mechanisms of the Mongo database. It can either be done by creating a copy of the underlying files or by using mongodump.
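A sketch of such a dump from within the cluster, assuming the pod naming from the installation example; the required authentication flags depend on your MongoDB configuration and are omitted here:

# stream a dump of the databases from the first mongo pod into a local archive file
kubectl exec --namespace=caas caas-mongo-0 -- \
  mongodump --archive > caas-backup.archive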

8.3. Update

Operating the CaaS platform with Helm in Kubernetes makes it possible to update to a new version without a complete reinstallation.

Before updating the Mongo database, a backup is strongly recommended.

The helm list --all-namespaces command first returns a list of all already installed Helm charts. This list contains both the version and the namespace of the corresponding release.

sample list of installed releases
$ helm list --all-namespaces
NAME            NAMESPACE    REVISION  UPDATED             STATUS    CHART        APP VERSION
firstinstance   integration  1         2019-12-11 15:51..  DEPLOYED  caas-2.10.4  caas-2.10.4
secondinstance  staging      1         2019-12-12 09:31..  DEPLOYED  caas-2.10.4  caas-2.10.4

To update a release, the following steps must be carried out one after the other:

Transfer the settings

To avoid losing the previous settings, you need the custom-values.yaml file with which the initial installation of the Helm chart was carried out.

Adoption of further adjustments

If adjustments have been made to files (e.g. in the config directory), these must also be carried over.

Update

After performing the previous steps, the update can be started. It replaces the existing installation with the new version without any downtime. To do so, execute the following command, which starts the process:

helm upgrade RELEASE_NAME caas-18.2.0.tgz --values /path/to/custom-values.yaml

9. Appendix

9.1. Examples

9.1.1. Change stream example

Usage of change streams with Javascript and Browser API
<script type="module">
  import PersistentWebSocket from 'https://cdn.jsdelivr.net/npm/pws@5/dist/index.esm.min.js';

  // Replace this with your API key (needs read access for the preview collection)
  const apiKey = "your-api-key";

  // Replace this with your preview collection url (if not known copy from CaaS Connect Project App)
  // e.g. "https://REST-HOST:PORT/my-tenant-id/f948bb48-4f6b-4a8a-b521-338c9d352f2b.preview.content"
  const previewCollectionUrl = new URL("your-preview-collection-url");

  const pathSegments = previewCollectionUrl.pathname.split("/");
  if (pathSegments.length !== 3) {
    throw new Error(`The format of the provided url '${previewCollectionUrl}' is incorrect and should only contain two path segments`);
  }

  (async function(){
    // Retrieving temporary auth token
    const token = await fetch(new URL(`_logic/securetoken?tenant=${pathSegments[1]}`, previewCollectionUrl.origin).href, {
      headers: {'Authorization': `Bearer ${apiKey}`}
    }).then((response) => response.json()).then((token) => token.securetoken).catch(console.error);

    // Establishing WebSocket connection to the change stream "crud"
    // ("crud" is the default change stream that the CaaS Connect module provides)
    const wsUrl = `wss://${previewCollectionUrl.host + previewCollectionUrl.pathname}`
      + `/_streams/crud?securetoken=${token}`;
    const pws = new PersistentWebSocket(wsUrl, { pingTimeout: 60000 });

    // Handling change events
    pws.onmessage = event => {
      const {
        documentKey: {_id: documentId},
        operationType: changeType,
      } = JSON.parse(event.data);
      console.log(`Received event for '${documentId}' with change type '${changeType}'`);
    }
  })();
</script>

9.1.2. GraphQL example application

This chapter describes an example use case for a GraphQL application, outlining the individual steps involved in creating a GraphQL application and using it afterwards.

Create the GraphQL app definition

In the example scenario, a GraphQL application is created that can be used to query data records located in the CaaS. The data sets used here are the products from the example project of the fictitious company “Smart Living”. Image references and product categories in the data sets are resolved directly.

The entire command to create the GraphQL app definition for this example scenario looks like this.

Example of a full GraphQL app definition
curl --location --request PUT 'https://REST-HOST:PORT/mycorp-dev/gql-apps/mycorp-dev___products' \
--header 'Authorization: Bearer <PERMITTED_APIKEY>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "descriptor": {
        "name": "products",
        "description": "example app to fetch product relevant information from SLG",
        "enabled": true,
        "uri": "mycorp-dev___products"
    },

    "schema": "type Picture{ name: String! identifier: String! binaryUrl: String! width: Int! height: Int! } type Category{ name: String! identifier: String! } type Product{ name: String! identifier: String! description: String categories: [Category] picture: Picture } type Query{ products(_language: [String!] = [\"DE\", \"EN\"]): [Product] }",

    "mappings": {
        "Category": {
            "name": "displayName",
            "identifier": "_id"
        },
        "Picture": {
            "name": "displayName",
            "identifier": "_id",
            "binaryUrl": "resolutionsMetaData.ORIGINAL.url",
            "width": "resolutionsMetaData.ORIGINAL.width",
            "height": "resolutionsMetaData.ORIGINAL.height"
        },
        "Product": {
            "name": "displayName",
            "identifier": "_id",
            "description": "formData.tt_abstract.value",
            "picture": {
                "db": "mycorp-dev",
                "collection": "d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content",
                "find": {
                    "identifier": {
                        "$fk": "formData.tt_media.value.0.formData.st_media.value.identifier"
                    },
                    "locale.identifier": {
                        "$fk": "locale.identifier"
                    }
                }
            },
            "categories": {
                "db": "mycorp-dev",
                "collection": "d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content",
                "find": {
                    "identifier": {
                        "$in": {
                            "$fk": "formData.tt_categories.value.identifier"
                        }
                    },
                    "locale.identifier": {
                        "$fk": "locale.identifier"
                    }
                }
            }
        },
        "Query": {
            "products": {
                "db": "mycorp-dev",
                "collection": "d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content",
                "find": {
                    "locale.identifier": { "$in": { "$arg": "_language" } },
                    "entityType": "product"
                }
            }
        }
    }
}'

When creating a GraphQL app definition, the schema must be specified as a JSON string. For better readability, we recommend maintaining the schema in a formatted form, as shown below.

Schema of the GraphQL app definition

The schema used for the example contains the following definitions.

Example of a formatted GraphQL schema
 1  type Picture {
 2      name: String!
 3      identifier: String!
 4      binaryUrl: String!
 5      width: Int!
 6      height: Int!
 7  }
 8
 9  type Category {
10      name: String!
11      identifier: String!
12  }
13
14  type Product {
15      name: String!
16      identifier: String!
17      description: String
18      categories: [Category]
19      picture: Picture
20  }
21
22  type Query {
23      products(_language: [String!] = ["DE", "EN"]): [Product]
24  }

Lines 1, 9 and 14 of the schema are the starting point for the type definitions of the objects used in the GraphQL app. In addition, each GraphQL schema contains a query type (line 22) that defines what data can be queried by a GraphQL app. More details about schemas in GraphQL can be found in the GraphQL documentation.

In line 23 we define a query with the name products, which returns a collection of [Product]. To specify the languages in which we need this data, we add the _language variable. Since most of our customers work with German or English content, we also add a default value of ["DE", "EN"]. This default marks the variable as optional.
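
For illustration, both of the following queries are therefore valid against the resulting app; the first relies on the default languages, while the second overrides them.

Example queries using the optional _language argument
# uses the default value ["DE", "EN"]
{ products { name } }

# overrides the default
{ products(_language: ["EN"]) { name } }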

Mapping the GraphQL app definition

The GraphQL app definition mapping represents the connection between the schema and the data in the database. Each type described in the schema generally requires an explicit entry, so this part of a GraphQL app definition is usually the longest. There may be situations where the fields of a type are named exactly like the keys of the data; in this special case, no explicit entry in the mapping is necessary. For details about the mapping in a GraphQL app definition, see the corresponding chapter in the RESTHeart documentation.
The following example is an excerpt from the GraphQL app definition above and illustrates some common use cases.

Example of a GraphQL mapping
 1  {
 2    "Category": {
 3      "name": "displayName",
 4      "identifier": "_id"
 5    },
 6    "Picture": {
 7      "name": "displayName",
 8      "identifier": "_id",
 9      "binaryUrl": "resolutionsMetaData.ORIGINAL.url",
10      "width": "resolutionsMetaData.ORIGINAL.width",
11      "height": "resolutionsMetaData.ORIGINAL.height"
12    },
13    "Product": {
14      "name": "displayName",
15      "identifier": "_id",
16      "description": "formData.tt_abstract.value",
17      "picture": {
18        "db": "mycorp-dev",
19        "collection": "d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content",
20        "find": {
21          "identifier": {
22            "$fk": "formData.tt_media.value.0.formData.st_media.value.identifier"
23          },
24          "locale.identifier": {
25            "$fk": "locale.identifier"
26          }
27        }
28      },
29      "categories": {
30        "db": "mycorp-dev",
31        "collection": "d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content",
32        "find": {
33          "identifier": {
34            "$in": {
35              "$fk": "formData.tt_categories.value.identifier"
36            }
37          },
38          "locale.identifier": {
39            "$fk": "locale.identifier"
40          }
41        }
42      }
43    },
44    "Query": {
45      "products": {
46        "db": "mycorp-dev",
47        "collection": "d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content",
48        "find": {
49          "locale.identifier": {
50            "$in": {
51              "$arg": "_language"
52            }
53          },
54          "entityType": "product"
55        }
56      }
57    }
58  }

The first use case considered is the so-called field to field mapping. In this type of mapping, a field of the type is assigned a corresponding attribute of the data. An example of this can be seen in line 3, where the field Category.name from the schema refers to the attribute displayName from the data.
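
For illustration, a hypothetical category document { "_id": "abc-123", "displayName": "Living Room" } would therefore be returned through the Category type as { "identifier": "abc-123", "name": "Living Room" }; both values are purely illustrative.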

The second use case is the field to query mapping. Here a field in the type is mapped to the result of a data query. An example of such a mapping can be found in line 45ff: the field Query.products is mapped to the data found in the REST Interface under /mycorp-dev/d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content that corresponds to the filters "entityType": "product" and "locale.identifier": { "$in": { "$arg": "_language" } }. This means that exactly those products are queried which are located in the defined source, represent an entity of type “product”, and use one of the language abbreviations passed in the _language argument.

Another example of a “field to query mapping” can be found starting at line 29. In this mapping definition, the product categories, which are maintained in separate records, are resolved via a foreign key relationship. The complete entry from lines 29 to 42 states that the Product.categories field lists all product categories stored under /mycorp-dev/d8db6f24-0bf8-4f48-be47-5e41d8d427fd.preview.content whose identifier is contained in the product's formData.tt_categories.value.identifier field and whose locale.identifier exactly matches the locale.identifier of the product record. Since a product can reference multiple categories under formData.tt_categories.value.identifier, the key $in is used here.

If multiple filters are specified in a "find", they are automatically combined with a logical AND, eliminating the need for an additional parenthesized "$and".
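
For illustration, the find of the products query above therefore behaves as if it had been written with an explicit "$and".

Hypothetical equivalent with an explicit "$and"
"find": {
  "$and": [
    { "locale.identifier": { "$in": { "$arg": "_language" } } },
    { "entityType": "product" }
  ]
}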

Using the GraphQL app

Requests can now be made to a GraphQL application using this app definition.

GraphQL query example
curl --location --request POST 'https://REST-HOST:PORT/graphql/mycorp-dev___products' \
--header 'Authorization: Bearer <PERMITTED_APIKEY>' \
--header 'Content-Type: application/json' \
--data-raw '{"query": "query($lang: [String!]){products(_language: $lang) {name description categories {name} picture {name binaryUrl width height}}}", "variables": {"lang": ["DE"]}}'

This request example shows how to call the GraphQL app using cURL. The app is always available at /graphql/<descriptor.uri>. Through this query, product data is retrieved depending on the variable $lang. The variable is passed as a value to the _language argument defined in the schema. Since a default value for _language is included in the schema, specifying a value is optional in this scenario. Further details on query arguments and variables can be found in the GraphQL documentation.
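
A response to this query might look similar to the following; all values are purely illustrative.

Sample response (illustrative values)
{
  "data": {
    "products": [
      {
        "name": "Ceiling lamp",
        "description": "A dimmable ceiling lamp.",
        "categories": [ { "name": "Lighting" } ],
        "picture": {
          "name": "ceiling_lamp",
          "binaryUrl": "https://REST-HOST:PORT/path/to/binary.jpg",
          "width": 1024,
          "height": 768
        }
      }
    ]
  }
}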

9.1.3. GraphQL example application using Aggregations

Aggregations can be used to create complex queries to the CaaS. For example, it is possible to dynamically add a computed attribute to the returned documents; far more complex aggregations can be built as well. A full list of possible aggregation stages and operators can be found in the MongoDB documentation.

Example of a GraphQL app with an aggregation inside the query mapping
{
  "_id": "mytenantid-dev___pagerefs",
  "descriptor": {
    "name": "pagerefs",
    "description": "Query PageRefs",
    "enabled": true,
    "uri": "mytenantid-dev___pagerefs"
  },
  "schema": "type PageRef { _id: String projectId: String } type Query{ pageRefs(projectId: String): [PageRef] }",
  "mappings": {
    "PageRefs": {
      "count": "count"
    },
    "Query": {
      "pageRefs":{
        "db": "mytenantid-dev",
        "collection": "641154a9-b90c-4b10-a5f7-38677cbb5abc.release.content",
        "stages": [
          { "$match": { "fsType":"PageRef" }},
          { "$addFields": { "projectId": { "$arg": "projectId" } } }
        ]
      }
    }
  }
}
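
Assuming the app above has been created, it can be queried like any other GraphQL app; host and API key are placeholders.

Querying the aggregation-based app
curl --location --request POST 'https://REST-HOST:PORT/graphql/mytenantid-dev___pagerefs' \
--header 'Authorization: Bearer <PERMITTED_APIKEY>' \
--header 'Content-Type: application/json' \
--data-raw '{"query": "query($id: String){pageRefs(projectId: $id) {_id projectId}}", "variables": {"id": "my-project"}}'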

9.2. Tutorials

9.2.1. Getting started with GraphQL Apps

This tutorial serves as an introduction to creating and using GraphQL apps. To complete it you need to be familiar with the REST Interface in terms of authentication and querying documents.

GraphQL apps allow you to query your CaaS documents via your own GraphQL API endpoints. These endpoints also make the document data accessible in a specific format/schema that you define.

In the following steps we will create such a GraphQL app and use it to query document data. We’ll be using curl to send HTTP requests, but you can use alternative tools as well.

  1. Create sample documents

    Let’s create a collection to store some sample documents.

    # set these once so you can re-use them for other commands
    TENANT='YOUR-TENANT-ID'
    API_KEY='YOUR-API-KEY'
    
    curl --location --request PUT "https://$TENANT-caas-api.e-spirit.cloud/$TENANT/posts" \
    --header "Authorization: Bearer $API_KEY"

    And create the documents.

    # you can execute this multiple times to create many documents
    curl --location "https://$TENANT-caas-api.e-spirit.cloud/$TENANT/posts" \
    --header "Authorization: Bearer $API_KEY"
    --header 'Content-Type: application/json' \
    --data "{
        \"content\": \"My post created at $(date)..\"
    }"
  2. Define the desired GraphQL schema

    Now that we have documents available, we need to define a schema for accessing their data.

    We’ll save a simple data model for the sample documents we created earlier, but you can create arbitrarily complex data models.

    Save GraphQL schema definition (schema.gql)
    cat > schema.gql << EOF
    type BlogPost {
      content: String!
    }
    
    type Query {
      posts: [BlogPost!]
    }
    EOF
  3. Create the GraphQL API endpoint

    The next step is to create the GraphQL app using our schema, which automatically provisions a new API endpoint.

    First, however, we need to define a couple of things so that CaaS knows how to provision our new endpoint and how to fetch/map the documents to our schema. We do this by creating a GraphQL app definition.

    Save GraphQL app definition (app.json)
    cat > app.json << EOF
    {
        "descriptor": {
            "name": "myposts",
            "description": "Example app to fetch blog posts.",
            "enabled": true,
            "uri": "${TENANT}___myposts"
        },
        "mappings": {
            "Query": {
                "posts": {
                    "db": "$TENANT",
                    "collection": "posts",
                    "find": {
                        "content": { "\$exists": true }
                    }
                }
            }
        }
    }
    EOF

    Now that we have prepared a schema (schema.gql) and a corresponding app definition (app.json), we can use them both to create a GraphQL app using the REST Interface.

    Creating GraphQL app
    curl -X PUT \
    -H "Authorization: Bearer $API_KEY" \
    -F app=@app.json \
    -F schema=@schema.gql \
    https://$TENANT-caas-api.e-spirit.cloud/$TENANT/gql-apps/${TENANT}___myposts
  4. Query data using the new endpoint

    At this point, CaaS has automatically provisioned a new GraphQL API endpoint using our definitions. The endpoint is available at

    https://{YOUR-TENANT-ID}-caas-api.e-spirit.cloud/graphql/{YOUR-TENANT-ID}___myposts

    and we can query our documents using the new endpoint.

    curl --location "https://$TENANT-caas-api.e-spirit.cloud/graphql/${TENANT}___myposts" \
    --header "Authorization: Bearer $API_KEY" \
    --header 'Content-Type: application/json' \
    --data '{"query":"{ posts { content } }","variables":{}}'
  5. Congratulations on querying data using your own GraphQL app! You should see a result similar to this.

    {
      "data": {
        "posts": [
          {
            "content": "My post created at Tue Aug  8 17:08:32 CEST 2023.."
          },
          {
            "content": "My post created at Tue Aug  8 17:12:23 CEST 2023.."
          }
        ]
      }
    }

The default page size for GraphQL queries is 20. If you need to query more documents than that, use pagination arguments; for more information, see the section Field to Field mapping of RESTHeart's Object Mappings chapter.
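
As a sketch of what such pagination arguments can look like, the posts query from the tutorial could expose skip and limit arguments and pass them through to the mapping. Note that this is an assumption based on RESTHeart's mapping keys, not verified CaaS syntax; the argument names are our own.

Hypothetical pagination sketch (schema excerpt)
type Query {
  posts(skip: Int = 0, limit: Int = 20): [BlogPost!]
}

combined with a mapping such as:

"Query": {
  "posts": {
    "db": "YOUR-TENANT-ID",
    "collection": "posts",
    "find": { "content": { "$exists": true } },
    "skip": { "$arg": "skip" },
    "limit": { "$arg": "limit" }
  }
}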

9.3. Troubleshooting: Known errors

9.3.1. File upload using PUT request fails

The errors

  • E11000 duplicate key error collection: [some-file-bucket].chunks index: files_id_1_n_1 dup key

  • or error updating the file, the file bucket might have orphaned chunks

indicate the presence of orphaned file chunks in the MongoDB data. The orphaned data can be removed with the following mongo shell script:

Clean up orphaned file chunks of a specific file bucket
// Name of the file bucket to clean up (e.g., my-bucket.files)
var filesBucket = "{YOUR_FILE_BUCKET_NAME}";

var chunksCollection = filesBucket.substring(0, filesBucket.lastIndexOf(".")) + ".chunks";
db[chunksCollection].aggregate([
  // avoid accumulating binary data in memory
  { $unset: "data" },
  {
      $lookup: {
        from: filesBucket,
        localField: "files_id",
        foreignField: "_id",
        as: "fileMetadata"
      }
  },
  { $match: { fileMetadata: { $size: 0 } } }
]).forEach(function (c) {
  db[chunksCollection].deleteOne({ _id: c._id });
  print("Removed orphaned GridFS chunk with id " + c._id);
});
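
The script can be executed against the repository with the MongoDB shell, for example as follows; the connection string and the script file name are placeholders.

Running the cleanup script with mongosh
mongosh "mongodb://caas-admin:PASSWORD@caas-mongo:27017/YOUR-DATABASE" cleanup-orphaned-chunks.js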

10. Help

The Technical Support of Crownpeak Technology GmbH provides expert technical support covering any topic related to the FirstSpirit™ product. Further help on related topics is available in our community.

11. Disclaimer

This document is provided for information purposes only. Crownpeak Technology GmbH may change the contents hereof without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. Crownpeak Technology GmbH specifically disclaims any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. The technologies, functionality, services, and processes described herein are subject to change without notice.