Design

CONSIDERATIONS

Containerized

Lives in its own container.

Distributed

There will be multiple instances per site for HA. If one of the instances dies, the queue replica from another instance can be used to process the records.

Extensions

The core engine can be extended to achieve features like asynchronous replication.

Backend agnostic

All interactions will go through S3. This way it is not restricted to one backend and we can reuse existing solutions for different backends.

Background Jobs

Backbeat will include a background tasks which runs on a crontab schedule.

Multi-threaded

Using Kafka’s KeyedMessage mechanism, multiple consumers can process entries in parallel achieving high throughput.

DESIGN OVERVIEW

design

DEPLOYMENT (Mono/Multi-site)

The nodes of a Backbeat cluster must live within one given datacenter. Ideally there will be one standalone cluster per site. Each cluster will have multiple instances for HA.

Backbeat will be the backbone of features like asynchronous replication which rely on such local services to perform remote actions. The basis of replication relies on versioning, Metadata and Data propagation from site to site, which does not need the message queue to be shared between sites (and we don’t want that either because the internals of Backbeat are not necessarily geo-aware.

KAFKA

Kafka is used as a queue manager for Backbeat.

PROS

CONS

APPLYING BACKBEAT

USING BACKBEAT FOR ASYNC REPLICATION

crr

For asynchronous replication, we can use Backbeat to replicate the objects from a bucket of one region to the other. The design highlights how this feature can be achieved by using Backbeat. Reference: AWS rules of what is and what is not replicated http://docs.aws.amazon.com/AmazonS3/latest/dev/crr-what-is-isnot-replicated.html

Note: For Active/Passive setup, the target will be READONLY. This would mean that in a Disaster Recovery setup for a customer, if the customer loses site A they can point themselves to site B to perform READONLY actions until site A comes back up. Although they could write to site B these writes will not be replicated to site A (one of the many reasons being site A may have data that hasn’t been fully replicated to site B when the failure occurred).

SECURITY

OSS LICENSES

STATISTICS FOR SLA, METRICS etc

There are two ways we can approach this. Pub/Sub events can be used in addition to the MetaData log in a separate topic (let’s call it statistics). The records in this topic can be leveraged by comparing to the active queue to generate statistics like

Use a decoupled topic in addition to the queue topic. This will be managed by the producers/consumers adding records for non-replicated and replicated entries. Since each entry would have a sequence number, calculating the difference between the sequence numbers of the latest non-replicated and replicated records would give us the required statistics.

WRITING EXTENSIONS

A backbeat extension allows to add more functionality to the core backbeat asynchronous processor. E.g. Asynchronous Replication is one of the extensions available for backbeat.

Extensions are located in the extensions/ directory, with a sub-directory named after the extension name (lowercase).

Extending the Queue Populator

The queue populator is a core backbeat process which reads entries from the metadata log periodically and provides them to all running extensions through the filter() method, while maintaining the offsets of the latest processed log entries.

Extensions can publish messages to one or more kafka topics on reception of particular log entry events.

To achieve this, in the extension module directory, create an ‘index.js’ file exporting the attributes defining the extension metadata and entrypoints, e.g. to create a “mycoolext” extension, put in extensions/mycoolext/index.js:

module.exports = {
    name: 'my cool extension',
    version: '1.0.0',
    queuePopulatorExtension: require('./MyCoolExtensionPopulator.js'),
};

Here, MyCoolExtensionPopulator.js exports a class that:

Extensions are enabled based on config.json having a sub-section in “extensions” matching the name of the extension directory.

E.g. this config.json enables mycoolext extension, and provides configuration param someparam as “somevalue” to extension classes.

{
    [...]
    "extensions": {
        "mycoolext": {
            "someparam": "somevalue"
        }
    }
}