Self-hosting Nunux Keeper

May 17, 2017

Self hosting

Before running any script, let’s have a quick overview of the software architecture. Many pieces of software are used to power the solution, and a first glance at the architecture may give you the feeling that the solution is a bit complex. You would be wrong.

A great way to describe an architecture is a good drawing. An efficient schema model is the C4 model introduced by Simon Brown, which is inspired by geographic maps. By analogy with Google Maps, you start at the global scale of the system: the context. Your system is summed up as a big box with short phrases explaining its global features. All interactions with external systems or actors are also illustrated. Then you zoom in to an area of the map: the container level. This is the internal scaffolding of your system: which containers compose the system and how they interact with each other. Zoom in again to the component level: the internal structure of a container. Finally, you zoom once more to see the classes of a component. A class is a direct binding to the code.

Context, Containers, Components and Classes: here comes the C4 model.

By the way, if you enjoy this kind of diagram, feel free to download and adapt them. They are in SVG format and open source!

So let’s start with an overview of the system:

System context diagram

As you can see, the system is straightforward: it consumes web content from external systems and exposes this content through a REST API. This API provides the main features you would expect from a content curation system. Finally, the API can be consumed by other systems such as a CLI, a Web App, etc.

Now, as with Google Maps, we zoom in a bit to figure out how the system is structured.

Here is the container diagram:

Container diagram

The core of the system is the API container. It is a Node.js app powered by Express. This container exposes a RESTful API protected with JWT. Token creation is delegated to an external system: tokens could be forged by Auth0, but in our case they are forged by a great open source IAM product: Keycloak.

The Core API stores web documents and metadata inside a NoSQL data store. This document data store can be MongoDB or ElasticSearch. The data store also persists other entities such as labels, users and sharing information.
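As a rough illustration of what lands in that document store, a kept document could be persisted as a JSON structure along these lines (field names are invented for the example, not the actual Nunux Keeper schema):

```json
{
  "title": "An example article",
  "contentType": "text/html",
  "content": "<p>Extracted web content</p>",
  "labels": ["tech", "to-read"],
  "owner": "alice",
  "date": "2017-05-17T00:00:00Z"
}
```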

In order to enable full text search, documents are indexed by a search engine: ElasticSearch.
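A full-text lookup then boils down to a query against that index. As a hedged illustration (the index and field names are assumptions), a match query sent to the index’s `_search` endpoint could look like:

```json
{
  "query": {
    "match": { "content": "self hosting" }
  }
}
```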

Binary files attached to documents are stored inside an object storage container. This container can be a classic file system or an S3-compatible object storage service. These files are downloaded by a job worker.

All asynchronous tasks are handled by Job Workers. This container implementation is based on Kue. This distributed job framework uses Redis as its job queuing system. Job Workers are autonomous and can be deployed in parallel to handle the load. There are Job Workers to handle file downloads, import/export tasks and some administration tasks like database cleanup.
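The pattern Kue implements on top of Redis can be sketched in a few lines: producers push typed jobs onto a queue, and autonomous workers pull and process them. In this stdlib-only sketch the queue is a plain in-memory array and the function names only mirror Kue’s API in comments; in the real deployment the queue is a Redis list, which is what lets several worker processes share the load.

```javascript
const queue = [];     // stands in for a Redis list
const handlers = {};  // job type → processing function

// Register a handler for a job type (kue: queue.process(type, fn)).
function onJob(type, fn) { handlers[type] = fn; }

// Enqueue a typed job (kue: queue.create(type, data).save()).
function createJob(type, data) { queue.push({ type, data }); }

// One worker tick: pull the next job and dispatch it to its handler.
function workOnce() {
  const job = queue.shift();
  if (job && handlers[job.type]) handlers[job.type](job.data);
}

// Example: a download worker fetching a document attachment.
onJob('download', (data) => console.log(`downloading ${data.url}`));
createJob('download', { url: 'https://example.org/image.png' });
workOnce(); // → downloading https://example.org/image.png
```

Because each job carries its type and payload, adding capacity is just a matter of starting more worker processes against the same Redis instance.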

The Core API and some job workers produce metrics using the StatsD protocol. Any StatsD collector can be used to collect, aggregate and forward metrics to a Time Series Database (such as OpenTSDB, Prometheus or InfluxDB). For instance, the hosting platform of Nunux Keeper uses Telegraf and InfluxDB to handle those metrics.

Regarding the self-hosting need, we can stop here. You should now have a good understanding of the Nunux Keeper architecture.

Now it is time to set up all of these containers. To handle this task, our best friend will be Docker and its new stack creation capability.


For this purpose we created a dedicated project: keeper-docker.
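For reference, a Docker stack file wiring together containers like the ones above would look roughly as follows. This is only a sketch: service names and images are illustrative, not the actual keeper-docker definitions (see the project repository for those).

```yaml
version: "3"
services:
  api:          # the Core API (hypothetical image name)
    image: nunux-keeper/api
    ports:
      - "3000:3000"
  worker:       # a Job Worker (hypothetical image name)
    image: nunux-keeper/worker
  mongodb:
    image: mongo
  elasticsearch:
    image: elasticsearch
  redis:
    image: redis
```

Such a file can be deployed in one shot with `docker stack deploy -c docker-compose.yml keeper`, which is essentially what a deployment target wraps up for you.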

Installation is simple:

$ git clone
$ cd keeper-docker
$ make deploy

With a bit of patience, you will get all the services up and running.

Check the project repository README for more details about the installation and what could be missing to get a full and production ready installation.

Welcome to the wonderful world of self-hosting!

But if you don’t want to handle all this, you are welcome to use our hosting platform.