lightningstream's Introduction

Lightning Stream

User documentation can be found here

Lightning Stream is a tool to sync changes between a local LMDB (Lightning Memory-Mapped Database) and an S3 bucket in near real-time. If the application schema is compatible, this can be used in a multi-writer setup where any instance can update any data, with a global eventually-consistent view of the data in seconds.

Our main target application is the sync of LMDB databases in the PowerDNS Authoritative Nameserver (PDNS Auth). We are excited about how Lightning Stream simplifies running multiple distributed PowerDNS Authoritative servers, with full support for keeping DNSSEC keys in sync. Check the Getting Started section to understand how you can use Lightning Stream together with the PowerDNS Authoritative server.

Its use is not limited to the PowerDNS Authoritative server, however. Lightning Stream does not make any assumptions about the contents of the LMDB, and can be used to sync LMDBs for other applications, as long as the data is stored using a compatible schema.

Basic Operation

Lightning Stream is deployed next to an application that uses an LMDB for its data storage:

[Figure: Overview]

Its operation boils down to the following:

  • Whenever it detects that the LMDB has changed, it writes a snapshot of the data to an S3 bucket.
  • Whenever it sees a new snapshot written by a different instance in the S3 bucket, it downloads the snapshot and merges the data into the local LMDB.

The merge of a key is performed based on a per-record last-modified timestamp: the most recent version of the entry wins. Deleted entries are cleared and marked as deleted, together with their deletion timestamp. This allows Lightning Stream to provide Eventual Consistency across all nodes.
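The merge rule described above can be sketched as follows. This is a minimal illustration, not Lightning Stream's actual code: the `Entry` layout, the `merge` function, and the nanosecond timestamp field are hypothetical names chosen for the example, and the handling of equal timestamps (the existing entry is kept) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    """One record plus the metadata the merge relies on (hypothetical layout)."""
    timestamp_ns: int          # last-modified time of this version
    value: Optional[bytes]     # None once the entry has been deleted
    deleted: bool = False      # tombstone marker, kept so deletions propagate


def merge(local: Dict[bytes, Entry], snapshot: Dict[bytes, Entry]) -> Dict[bytes, Entry]:
    """Return the per-key last-writer-wins merge of a snapshot into local data."""
    merged = dict(local)
    for key, incoming in snapshot.items():
        current = merged.get(key)
        # The newer version wins; a tombstone is just another version.
        if current is None or incoming.timestamp_ns > current.timestamp_ns:
            merged[key] = incoming
    return merged


local = {
    b"a": Entry(100, b"old"),
    b"b": Entry(300, b"kept"),
}
snapshot = {
    b"a": Entry(200, b"new"),               # newer than local: replaces it
    b"b": Entry(250, None, deleted=True),   # deletion older than local: ignored
    b"c": Entry(150, b"added"),             # unknown locally: added
}
result = merge(local, snapshot)
```

Because each key is decided only by its timestamp, merging in either direction gives the same result as long as timestamps differ, which is what lets all nodes converge on the same view.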

If the application uses a carefully designed data schema, this approach can be used to support multiple simultaneously active writers. In other instances, it can often be used to sync data from one writer to multiple read-only receivers. Or it can simply create a near real-time backup of a single instance.

Building

At the time of writing, this project requires Go 1.19. Check the go.mod file for the current version.

To install the binary in a given location, simply run:

GOBIN=$HOME/bin go install ./cmd/lightningstream/

Or run ./build.sh to install it in a bin/ subdirectory of this repo.

Simple cross-compilation is not supported, because the LMDB bindings require cgo.

Example in Docker Compose

This repo includes an example of syncing the PowerDNS Authoritative Nameserver LMDB. It runs two DNS servers, each with their own syncer, syncing to a bucket in a MinIO server.

The Lightning Stream config used can be found in docker/pdns/lightningstream.yaml. Note that the config file contents can reference environment variables.

To get it up and running:

docker-compose up -d

You may need to rerun this command once, because of a race condition creating the LMDBs.

To see the services:

docker-compose ps

This should show output like:

         Name                        Command               State                                    Ports
-------------------------------------------------------------------------------------------------------------------------------------------
lightningstream_auth1_1   /run.sh                          Up      127.0.0.1:4751->53/tcp, 127.0.0.1:4751->53/udp, 127.0.0.1:4781->8081/tcp
lightningstream_auth2_1   /run.sh                          Up      127.0.0.1:4752->53/tcp, 127.0.0.1:4752->53/udp, 127.0.0.1:4782->8081/tcp
lightningstream_minio_1   /usr/bin/docker-entrypoint ...   Up      127.0.0.1:4730->9000/tcp, 127.0.0.1:4731->9001/tcp
lightningstream_sync1_1   /usr/local/bin/lightningst ...   Up      127.0.0.1:4791->8500/tcp
lightningstream_sync2_1   /usr/local/bin/lightningst ...   Up      127.0.0.1:4792->8500/tcp

Open one terminal with all the logs:

docker-compose logs

Then in another terminal call these convenience scripts, with a delay between them to allow for syncing:

docker/pdns/pdnsutil -i 1 create-zone example.org
docker/pdns/pdnsutil -i 1 secure-zone example.org
docker/pdns/pdnsutil -i 1 set-meta example.org foo bar
docker/pdns/pdnsutil -i 2 generate-tsig-key example123 hmac-sha512

sleep 2

docker/pdns/curl-api -i 2 /api/v1/servers/localhost/zones/example.org
docker/pdns/curl-api -i 2 /api/v1/servers/localhost/zones/example.org/metadata
docker/pdns/curl-api -i 1 /api/v1/servers/localhost/tsigkeys

To view a dump of the LMDB contents:

docker/pdns/dump-lmdb -i 1
docker/pdns/dump-lmdb -i 2

You can browse the snapshots in MinIO at http://localhost:4731/buckets/lightningstream/browse (login with minioadmin / minioadmin).

Open Source

This is the documentation for the Open Source edition of Lightning Stream. For more information on how we provide support for Open Source products, please read our blog post on this topic.

PowerDNS also offers an Enterprise edition of Lightning Stream that includes professional support, advanced features, deployment tooling for large deployments, Kubernetes integration, and more.

lightningstream's People

Contributors

ahouene, bobdeschot, bodenhaltung, dependabot[bot], franklouwers, habbie, horazont, joel-ling, jsoref, nvaatstra, wojas

lightningstream's Issues

PDNS hang issue with lightningstream and containers

I was playing with PowerDNS and Lightningstream in Kubernetes and ran into a very weird issue. It is a StatefulSet and storage comes from Longhorn.
PowerDNS auth and Lightningstream are both the latest versions and it is running on ARM64.

The issue is that a query to pdns (DNS or REST API) hangs and returns nothing. Running the pdnsutil tool in the pdns container hangs as well. For example, pdnsutil list-all-zones hangs and the prompt never returns.
Even with debug logging, there is no error. It shows the incoming request and that it is processing, but no answer is returned.

At first I thought it was related to the PID clashing issue for containers that is mentioned in the docs. But no matter what value I use for --minimum-pid, the issue persists.

The issue goes away if I send a DNS query to PDNS for a valid record/zone before lightningstream is started. So the command in my lightningstream container is now a shell script that first sends a DNS query to the pdns container and then starts lightningstream.

It still looks somewhat like a locking issue, but the question is: what does the pdns auth server do differently internally when accessing the LMDB backend if it has handled a DNS request before lightningstream starts, compared to when it has not?

It runs stable with no issues so far with the workaround.

Do you have any input on what could be going on here?

Configurations listed below.

My pdns.conf looks like this.

pdns@pdns-ss-0:/$ cat /etc/powerdns/pdns.conf
primary=yes
allow-notify-from=0.0.0.0
allow-axfr-ips=127.0.0.1
api=yes
api-key=secret
config-dir=/etc/powerdns
default-soa-content=a.misconfigured.dns.server.invalid hostmaster.@ 0 10800 3600 604800 3600
default-ttl=3600
default-ksk-algorithm=ed25519
default-zsk-algorithm=ed25519
include-dir=/etc/powerdns/pdns.d
load-modules=liblmdbbackend.so
launch=lmdb
lmdb-filename=/var/lib/powerdns/pdns.lmdb
lmdb-shards=1
lmdb-sync-mode=nometasync
lmdb-schema-version=5
lmdb-random-ids=yes
lmdb-map-size=1000
lmdb-flag-deleted=yes
lmdb-lightning-stream=yes
local-address=0.0.0.0,::
log-dns-details=yes
log-dns-queries=yes
log-timestamp=yes
loglevel=7
loglevel-show=yes
query-logging=yes
resolver=1.1.1.1
# server-id
version-string=anonymous
webserver=yes
webserver-allow-from=127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,192.0.0.0/24
webserver-hash-plaintext-credentials=yes
webserver-loglevel=detailed
webserver-address=0.0.0.0
webserver-port=8081
webserver-password=secret2
zone-cache-refresh-interval=0
zone-metadata-cache-ttl=0

My lightningstream.yaml looks like this.

/app $ cat /lightningstream.yaml
instance: lmdbsync
storage_poll_interval: 10s
lmdb_poll_interval: 10s
storage_force_snapshot_interval: 4h

lmdbs:
  main:
    path: /var/lib/powerdns/pdns.lmdb
    schema_tracks_changes: true
    options:
      no_subdir: true
      create: false
  shard:
    path: /var/lib/powerdns/pdns.lmdb-0
    schema_tracks_changes: true
    options:
      no_subdir: true
      create: false

storage:
  type: s3
  options:
    access_key: pdns
    secret_key: pdns
    bucket: lightningstream
    endpoint_url: http://10.0.3.194:9000
    create_bucket: true
  cleanup:
    enabled: true
    interval: 15m
    must_keep_interval: 24h
    remove_old_instances_interval: 168h

http:
  address: ":8500"

log:
  level: info
  format: human
  timestamp: short

My shell script with the workaround:

/app $ cat start.sh
#!/bin/sh

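# Optional startup delay.
if [ -z "${PDNS_LSTREAM_SLEEP}" ]; then
    echo "no sleep time set"
else
    sleep "${PDNS_LSTREAM_SLEEP}"
fi

# Workaround: send one DNS query to pdns before starting Lightning Stream.
if [ -z "${PDNS_LSTREAM_DNS_SERVER}" ]; then
    echo "no DNS server set"
else
    # PDNS_LSTREAM_DNS_SERVER holds the name of another variable; resolve it.
    eval DNS=\$$PDNS_LSTREAM_DNS_SERVER
    echo "ServiceIP: $DNS"
    echo "PodIP: $PDNS_POD_IP"
    dig @$PDNS_POD_IP $PDNS_LSTREAM_DOMAIN $PDNS_QUERY_TYPE
fi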

/app/lightningstream --config /app/lightningstream.yaml --minimum-pid 200 --instance ${HOSTNAME}-lstream sync

stats: optionally also show how many entries are deleted entries

It would be very helpful if the stats command can also output how many entries per DBI are marked as deleted.

Since this requires a full scan of the LMDB, this should be disabled by default and require a flag like --full.

As a workaround, you can find the number of deleted entries with something like:

lightningstream snapshots dump ... | grep flags=01 | wc -l
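The same count can be done in a small script, which is easier to extend later (for example to group by DBI). This is only a sketch: the `flags=01` marker comes from the command above, while `count_flagged` and the sample lines are hypothetical and do not reflect the real dump layout.

```python
from typing import Iterable


def count_flagged(lines: Iterable[str], marker: str = "flags=01") -> int:
    """Count lines containing the marker, like `grep flags=01 | wc -l`."""
    return sum(1 for line in lines if marker in line)


# Hypothetical dump lines, for illustration only; the real dump layout may differ.
sample = [
    "key=a flags=00",
    "key=b flags=01",
    "key=c flags=01",
]
deleted = count_flagged(sample)
```

To run it on real output, pass `sys.stdin` as `lines` and pipe the dump into the script.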

docs: clarity on LIST operations

The docs mention use_update_marker as a way to reduce S3 bills, but the obvious question is how frequent the LIST calls are in the first place. The docs are silent on that.

The various commercial S3 providers offer different numbers of "free" LIST calls, so it would be nice to have a reasonable estimate of the point at which people would exceed the free tier, and a good idea of the likely S3 bills.
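For a rough idea of the order of magnitude, the arithmetic could look like the sketch below. It assumes one LIST request per database per poll, which is exactly the detail the docs do not confirm, so treat the result as a guess. `list_calls_per_month` and its parameters are made up for this example, and the 10 s interval is only an example value.

```python
def list_calls_per_month(poll_interval_s: float, databases: int, instances: int,
                         days: int = 30) -> int:
    """Estimate LIST requests per month, ASSUMING one LIST per database per poll."""
    polls_per_day = 86_400 / poll_interval_s
    return round(polls_per_day * databases * instances * days)


# Example: 10 s poll interval, 2 databases (main + one shard), 2 instances.
estimate = list_calls_per_month(poll_interval_s=10, databases=2, instances=2)
```

Under these assumptions the example works out to 8,640 polls per day per database, or 1,036,800 LIST requests per month in total.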

Waiting for initial receiver listing: "file does not exist" error

When starting up a fresh docker compose setup, LS hangs on this error:

lightningstream-sync1-1  | level=info msg="[main          ] Waiting for initial receiver listing" 
                           db=main error="file does not exist" instance=instance-1

I suspect that Simpleblob is returning the "file does not exist" instead of an empty listing.
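The behaviour I would expect looks roughly like this sketch, assuming a filesystem-backed store. `list_blobs` is a hypothetical helper written for this report, not Simpleblob's API: a storage location that does not exist yet is treated as an empty listing instead of an error.

```python
import os
import tempfile
from typing import List


def list_blobs(root: str, prefix: str = "") -> List[str]:
    """List blob names under root; a missing directory counts as an empty store."""
    try:
        names = os.listdir(root)
    except FileNotFoundError:
        return []
    return sorted(n for n in names if n.startswith(prefix))


with tempfile.TemporaryDirectory() as tmp:
    # Nothing has been written yet, so the directory does not exist.
    missing = list_blobs(os.path.join(tmp, "not-created-yet"))
    # After the first snapshot is written, the listing contains it.
    open(os.path.join(tmp, "snap-1"), "w").close()
    present = list_blobs(tmp)
```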

enhancement: encryption

From a cursory glance at the docs there currently appears to be no encryption support.

If lightningstream is used in a cloud scenario (as opposed to self-hosted MinIO), it could be attractive to encrypt the contents before uploading them to the cloud.

With modern algorithms I can't imagine this would add much overhead.

docs: add section for S3 migration ?

Please consider enhancing the docs by adding a section where you describe the process for migrating between S3 stores.

For example, say you start on MinIO and you wish to migrate to AWS S3, or vice-versa or any other number of examples.

It is not immediately clear how this can be done safely.

Client side encryption

Hi,

We love this, but providing a geo-redundant S3 bucket in our own infrastructure would pretty much reintroduce the complexity we were trying to remove.

And hosting our data in a public cloud isn't really an option.

Therefore I would like to propose implementing client-side encryption for the data stored in S3. Simple symmetric encryption with a shared key would be plenty good.

docs: detail on integrity checks

Looking through the codebase, it looks like lightningstream does not use the Content-MD5 header in S3-compatible APIs in order to ensure end-to-end integrity of blobs uploaded.

It would be nice to have a couple of paragraphs in the docs as to how lightningstream approaches this.
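For reference, the Content-MD5 value is the base64 encoding of the raw 16-byte MD5 digest of the body. The sketch below shows how a client could compute and check it; `content_md5` and `verify` are hypothetical helpers for illustration and say nothing about what lightningstream does today.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify(body: bytes, expected_header: str) -> bool:
    """Check a downloaded body against a previously computed Content-MD5 value."""
    return content_md5(body) == expected_header


header = content_md5(b"snapshot bytes")
```

A server that supports the header rejects an upload whose body does not match, so corruption in transit is caught at write time instead of at the next download.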

Newer LMDB binary format are not supported

When testing lightningstream with my own project written in JS (See kriszyp/lmdb-js#267) I noticed that the files generated are not readable by lightningstream.

Apparently the LMDB libraries were updated and new binary formats were introduced, while lmdb-go still uses the old one (kriszyp/lmdb-js#273).
Thankfully lmdb-js can fall back to v1, but I wonder if it would be good to move to the new format here.

I guess this is more of a lmdb-go issue. Perhaps I should raise it there?

PS: In the first issue I posted, I included some code. Do you see any serious issues with it? It seems to work, but I have the feeling that lightningstream is not yet used much outside the pdns project...

receiver-only: need s3 write permissions ?

Hi,
I'm trying to build a setup with one master and multiple read-only slaves.

For security reasons, I would like to give the slave nodes read-only S3 access, but that does not seem to work:
I get an "access denied" error.

Granting write permissions works, and the slave nodes then create shards in S3.
Is this expected for receiver-only nodes?
