tableflip's Introduction

Graceful process restarts in Go

It is sometimes useful to update the running code and / or configuration of a network service, without disrupting existing connections. Usually, this is achieved by starting a new process, somehow transferring clients to it and then exiting the old process.

There are many ways to implement graceful upgrades. They vary wildly in the trade-offs they make, and how much control they afford the user. This library has the following goals:

  • No old code keeps running after a successful upgrade
  • The new process has a grace period for performing initialisation
  • Crashing during initialisation is OK
  • Only a single upgrade is ever run in parallel

tableflip works on Linux and macOS.

Using the library

upg, _ := tableflip.New(tableflip.Options{})
defer upg.Stop()

go func() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGHUP)
	for range sig {
		upg.Upgrade()
	}
}()

// Listen must be called before Ready
ln, _ := upg.Listen("tcp", "localhost:8080")
defer ln.Close()

go http.Serve(ln, nil)

if err := upg.Ready(); err != nil {
	panic(err)
}

<-upg.Exit()

Please see the more elaborate graceful shutdown with net/http example.
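As a rough illustration of what graceful shutdown with net/http can look like, here is a minimal sketch that uses only the standard library. The `exit` channel is an assumption standing in for the channel returned by `upg.Exit()`, and `serveUntilExit` and `demo` are hypothetical helper names, not part of tableflip.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"time"
)

// serveUntilExit serves HTTP on ln until exit is closed, then gives in-flight
// requests up to grace to finish. In a tableflip program, exit would be the
// channel returned by upg.Exit() and ln the listener from upg.Listen
// (both are assumptions of this sketch).
func serveUntilExit(ln net.Listener, h http.Handler, exit <-chan struct{}, grace time.Duration) error {
	srv := &http.Server{Handler: h}

	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.Serve(ln) }()

	select {
	case err := <-serveErr:
		// Serve failed before any exit was requested.
		return err
	case <-exit:
	}

	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err
	}
	// After Shutdown, Serve returns http.ErrServerClosed, which is expected.
	if err := <-serveErr; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo starts a server on an ephemeral local port, requests an exit right
// away, and reports the result of the shutdown.
func demo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	exit := make(chan struct{})
	close(exit) // stands in for a completed upgrade
	return serveUntilExit(ln, http.NotFoundHandler(), exit, 5*time.Second)
}

func main() {
	if err := demo(); err != nil {
		log.Fatal(err)
	}
	log.Println("shut down cleanly")
}
```

The grace period bounds how long the old process lingers after the new one has taken over.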

Integration with systemd

[Unit]
Description=Service using tableflip

[Service]
ExecStart=/path/to/binary -some-flag /path/to/pid-file
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/path/to/pid-file

See the documentation as well.

The logs of a process using tableflip may go missing due to a bug in journald, which was fixed in the systemd v244 release. If you are running an older version of systemd, you can work around this by logging directly to journald, for example by using go-systemd/journal and checking for the $JOURNAL_STREAM environment variable.
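A minimal sketch of the $JOURNAL_STREAM check, assuming the variable holds the device and inode numbers of the journal stream in decimal, separated by a colon. `parseJournalStream` is a hypothetical helper; the actual logging to journald (for example via go-systemd/journal) is left out.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseJournalStream splits a $JOURNAL_STREAM value of the form
// "device:inode" (both decimal) into its two numbers.
func parseJournalStream(v string) (dev, ino uint64, ok bool) {
	d, i, found := strings.Cut(v, ":")
	if !found {
		return 0, 0, false
	}
	dev, errDev := strconv.ParseUint(d, 10, 64)
	ino, errIno := strconv.ParseUint(i, 10, 64)
	if errDev != nil || errIno != nil {
		return 0, 0, false
	}
	return dev, ino, true
}

func main() {
	if dev, ino, ok := parseJournalStream(os.Getenv("JOURNAL_STREAM")); ok {
		// A real program would switch to a journald logger here.
		fmt.Printf("started with a journald stream (device %d, inode %d)\n", dev, ino)
		return
	}
	fmt.Println("JOURNAL_STREAM not set; logging to stderr as usual")
}
```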

tableflip's People

Contributors

appleboy, arthurfabre, cbroglie, fasmide, gableroux, hazcod, hunts, jdesgats, kohenkatz, lmb, matthewmueller, nolith, pascaldekloe

tableflip's Issues

Keep foreground control with signal propagation after successful upgrade

First off: I'm loving tableflip, I use it to livereload a dev server for my homegrown static site generator server https://github.com/jschaf/b2/blob/master/cmd/server/server.go#L121.

This is more of a question than an issue. If this isn't a good spot for it, I'm fine with closing it.

After a successful tableflip.Upgrade, I'd like the new process to be the foreground process in a terminal or shell. The reason I want this is so I can forward SIGINT via ctrl-c in the terminal. Using systemd is a bit heavyweight for my simple local development use-case.

What happens now

  1. Start server in terminal with go run ./cmd/server.
  2. Server runs in foreground and output goes to stdout and stderr.
  3. Upgrade server which starts a new server process with a different PID in the background.
  4. Terminal shows prompt since foreground process exited.
  5. New server still writes output to terminal.
  6. Old server exits, leaving the new process parentless, so new process reparents to the PID 1 (systemd --user in my case).

What I'd like to happen

In step 3, the new server should keep running in the foreground and continue writing stdout and stderr of the new process.

I'm not quite sure how to go about this. Would something like the following work?

  • After successful tableflip.Upgrade
  • Keep old process alive and forward signals to all child processes in the process group (PGID).

Alternately, maybe get the new PID from the PID file and do some exec magic?
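One way to explore the PID-file idea is a small wrapper that stays in the foreground and relays terminal signals to whichever process the PID file currently names. This is only a sketch under assumptions: the PID file is assumed to contain the process ID as decimal text, and `parsePID` and `signalPIDFile` are hypothetical helpers, not tableflip APIs.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"strconv"
	"strings"
	"syscall"
)

// parsePID parses the contents of a PID file (assumed to be decimal text).
func parsePID(contents string) (int, error) {
	pid, err := strconv.Atoi(strings.TrimSpace(contents))
	if err != nil {
		return 0, fmt.Errorf("parse pid: %w", err)
	}
	if pid <= 0 {
		return 0, fmt.Errorf("invalid pid %d", pid)
	}
	return pid, nil
}

// signalPIDFile re-reads the PID file and sends sig to the process it names,
// so the signal reaches the newest process after an upgrade.
func signalPIDFile(path string, sig os.Signal) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	pid, err := parsePID(string(data))
	if err != nil {
		return err
	}
	p, err := os.FindProcess(pid)
	if err != nil {
		return err
	}
	return p.Signal(sig)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: forward /path/to/pid-file")
		return
	}
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM, syscall.SIGHUP)
	for s := range sigs {
		if err := signalPIDFile(os.Args[1], s); err != nil {
			fmt.Fprintln(os.Stderr, "forward:", err)
		}
		if s != syscall.SIGHUP {
			return // the server was asked to stop, so the wrapper stops too
		}
	}
}
```

The wrapper does not capture the server's output or notice when the server exits on its own; those parts would need more work.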

supervisord

supervisord requires non-daemonized processes. Is there a way to make tableflip work with it?

Allow multiple back-to-back graceful upgrades

I think tableflip is a great tool. I am experimenting with it on a TCP server that handles long-lived TCP connections. When an upgrade is started, the old process has to wait until all existing TCP connections are done before it can exit. Draining the long-lived connections can take a long time, and as a result a further upgrade is blocked until the old process exits. Could this constraint be relaxed in some way?
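This does not lift the constraint itself, but one way to limit how long the old process blocks the next upgrade is to bound the drain time. The sketch below uses only the standard library; `connTracker` is a hypothetical helper, and the timeout value is an assumption to be tuned per service.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// connTracker counts in-flight connections so shutdown can wait for them.
type connTracker struct{ wg sync.WaitGroup }

func (t *connTracker) add()  { t.wg.Add(1) }
func (t *connTracker) done() { t.wg.Done() }

// wait reports whether all tracked connections finished within d.
// On timeout the helper goroutine keeps waiting, which is acceptable
// when the process is about to exit anyway.
func (t *connTracker) wait(d time.Duration) bool {
	finished := make(chan struct{})
	go func() {
		t.wg.Wait()
		close(finished)
	}()
	select {
	case <-finished:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	var tr connTracker
	tr.add()
	go func() {
		defer tr.done()
		time.Sleep(10 * time.Millisecond) // stands in for a long-lived connection
	}()

	if tr.wait(time.Second) {
		fmt.Println("all connections drained")
	} else {
		fmt.Println("drain timed out; exiting anyway")
	}
}
```

After a timeout the old process would close the remaining connections and exit, trading some dropped connections for the ability to upgrade again.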

Any way to refresh the environment?

This is a bit of a stretch, but is it possible to somehow refresh the environment in between SIGHUPs? It seems like you'd need to pass a config file in, but I'm wondering if there's some magic to potentially refresh os.Environ() in between reloads.

For context, I'm looking to set up something like Heroku, where heroku config:set key value restarts the app with a new environment. It doesn't seem like there's any way to do that without some special handling in the binary itself (e.g. loading from a file).
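A minimal sketch of the "load from a file" approach: each new process reads a KEY=VALUE file at startup and applies it with os.Setenv. The APP_ENV_FILE variable and the helper names are hypothetical, and the parser deliberately ignores quoting and escapes.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEnv parses KEY=VALUE lines, skipping blank lines, comments starting
// with '#', and lines without '='. Quoting is not handled.
func parseEnv(text string) map[string]string {
	env := make(map[string]string)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		env[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return env
}

// applyEnvFile reads path and sets every variable it defines in this process.
func applyEnvFile(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	for k, v := range parseEnv(string(data)) {
		if err := os.Setenv(k, v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	path := os.Getenv("APP_ENV_FILE") // hypothetical variable naming the config file
	if path == "" {
		fmt.Println("APP_ENV_FILE not set; keeping the inherited environment")
		return
	}
	if err := applyEnvFile(path); err != nil {
		fmt.Fprintln(os.Stderr, "load env:", err)
		os.Exit(1)
	}
	fmt.Println("environment refreshed from", path)
}
```

Calling applyEnvFile early in main means every upgraded process picks up the current file contents, regardless of what it inherited.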

Sometimes getting "listen tcp xxx: bind: address already in use" on Upgrade

As I understand it, the key to using tableflip is to replace net.Listen with upg.Fds.Listen, so that the listening socket is inherited by the child on Upgrade.

However, I get errors like the one below on Upgrade after the application has been running for a long time in a production environment.

{"level":"error","msg":"ListenAndServe err can't create new listener: listen tcp 0.0.0.0:8902: bind: address already in use","time":"2019-05-31T23:05:14+08:00"}

It seems that the parent has opened a listener on port 8902 but the child does not inherit it.
What could be the reason?

[Bug] Connections not closed on inherited net.Conn

After a close review of this great library, I found a problem with inherited connections. Let me explain with an example.

As I understand it, in Go, getting the file descriptor of a net.Conn (or vice versa) involves duplicating the fd, and there is a reason for that. This library has two scenarios:

  1. Inherit a listener: This is done via Fds.Listen, which creates a duplicate file descriptor in the process. One copy is used by the listener and the other is kept to be passed to the new process. This seems fine, since real-world processes do not have a large number of listeners.
  2. Inherit a connection: This is done via Fds.Conn. If the connection is not present, none is created. When a connection is inherited, however, the fd is duplicated: one copy is used by the net.Conn and the other is added to the used map. Here is the problem: when the inherited connection is closed, the socket remains open. In the example below, which only works with one connection, the connection hangs after the upgrade.

The fix seems to be to replace f.used[key] = file with file.Close() at fds.go:300, but PacketConn seems to have the same problem. With a large number of connections, the duplicated, unused file descriptors can become a problem too.

The sample code listens on port 8080, writes to the socket every second, and closes it after 30 seconds. If the connection is fresh, it is closed and the client receives a TCP close; if it was inherited, the connection hangs.

package main

import (
	"flag"
	"fmt"
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cloudflare/tableflip"
)

var stop = make(chan bool)
var done = make(chan bool)

func handleConn(conn net.Conn, upg *tableflip.Upgrader) {
	ticker := time.NewTicker(time.Second)
	timer := time.NewTimer(30 * time.Second)

	for {
		select {
		case <-stop:
			log.Printf("Updating...")
			ticker.Stop()
			timer.Stop()
			c := conn.(tableflip.Conn)
			upg.Fds.AddConn("tcp", "0", c)
			conn.Close()
			log.Printf("Done...")
			done <- true
			return

		case t := <-ticker.C:
			log.Printf("Tick: %+v", t)
			conn.SetDeadline(time.Now().Add(time.Second))
			conn.Write([]byte(fmt.Sprintf("It is not a mistake to think you can solve any major problems just with potatoes. [%d]\n", os.Getpid())))

		case t := <-timer.C:
			log.Printf("Closing: %+v", t)
			ticker.Stop()
			timer.Stop()
			conn.Close()
			log.Printf("Closed conn")
			return
		}
	}

}

func main() {
	var (
		listenAddr = flag.String("listen", "localhost:8080", "`Address` to listen on")
		pidFile    = flag.String("pid-file", "", "`Path` to pid file")
	)

	flag.Parse()
	log.SetPrefix(fmt.Sprintf("%d ", os.Getpid()))

	upg, err := tableflip.New(tableflip.Options{
		PIDFile: *pidFile,
	})
	if err != nil {
		panic(err)
	}
	defer upg.Stop()

	// Do an upgrade on SIGHUP
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP)
		for range sig {
			stop <- true
			log.Println("stopping service")
			<-done
			err := upg.Upgrade()
			if err != nil {
				log.Println("upgrade failed:", err)
			}
		}
	}()

	conn, err := upg.Fds.Conn("tcp", "0")
	if err != nil {
		log.Fatalln("Can't get conn:", err)
	}
	if conn != nil {
		log.Printf("Inherited conn: %+v", conn.RemoteAddr())
		go handleConn(conn, upg)
	}

	ln, err := upg.Fds.Listen("tcp", *listenAddr)
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}

	go func() {
		defer ln.Close()

		log.Printf("listening on %s", ln.Addr())

		for {
			c, err := ln.Accept()
			if err != nil {
				log.Printf("Error on Accept: %+v", err)
				return
			}

			go handleConn(c, upg)
		}
	}()

	log.Printf("ready")
	if err := upg.Ready(); err != nil {
		panic(err)
	}
	<-upg.Exit()
	log.Printf("exiting, done, :)")
}

upg.Exit() has no effect in a goroutine?

	go func(upg *tableflip.Upgrader) {
		for {
			select {
			case <-upg.Exit():
				fmt.Println("Exit111111111111111111111111111")
				break
			}
		}
	}(upg)

In this case, is upg.Exit() never triggered?

Complete example:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cloudflare/tableflip"
)

// Version of the current program
const version = "v0.0.1"

func main() {
	upg, err := tableflip.New(tableflip.Options{})
	if err != nil {
		panic(err)
	}
	defer upg.Stop()

	// For demonstration purposes, force a 1s delay at startup and put the process PID in the log prefix
	time.Sleep(time.Second)
	log.SetPrefix(fmt.Sprintf("[PID: %d] ", os.Getpid()))

	// Listen for SIGHUP and use it to trigger a process restart
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP)
		for range sig {
			// The core Upgrade call
			err := upg.Upgrade()
			if err != nil {
				log.Println("Upgrade failed:", err)
			}
		}
	}()

	// Note: the port must be listened on via upg.Listen
	ln, err := upg.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}

	// Create a simple HTTP server; /version returns the current program version
	mux := http.NewServeMux()
	mux.HandleFunc("/version", func(rw http.ResponseWriter, r *http.Request) {
		log.Println(version)
		rw.Write([]byte(version + "\n"))
	})
	server := http.Server{
		Handler: mux,
	}

	// Start the HTTP server as usual
	go func() {
		err := server.Serve(ln)
		if err != http.ErrServerClosed {
			log.Println("HTTP server:", err)
		}
	}()

	if err := upg.Ready(); err != nil {
		panic(err)
	}

	go func(upg *tableflip.Upgrader) {
		for {
			select {
			case <-upg.Exit():
				fmt.Println("Exit111111111111111111111111111")
				break
			}
		}
	}(upg)

	time.Sleep(10 * time.Hour)

	//<-upg.Exit()

}
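Independently of why the exit notification did not arrive in this report, note two Go details in the goroutine above: `break` inside a `select` only leaves the `select`, not the enclosing `for`, and a receive from a closed channel never blocks, so the loop would spin once the channel is closed. A single receive is enough. The sketch below uses a plain channel as a stand-in for `upg.Exit()` (assumed here to be a channel that is closed when it is time to exit); `notifyOnExit` is a hypothetical helper.

```go
package main

import "fmt"

// notifyOnExit waits in a goroutine for exit to be closed, calls onExit once,
// and closes the returned channel when it is done.
func notifyOnExit(exit <-chan struct{}, onExit func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-exit // a single receive is enough; no for/select loop is needed
		onExit()
	}()
	return done
}

func main() {
	exit := make(chan struct{}) // stands in for upg.Exit()
	done := notifyOnExit(exit, func() { fmt.Println("exit requested") })

	close(exit) // stands in for a completed upgrade
	<-done      // wait for the goroutine instead of sleeping
}
```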

Tests are failing for go 1.8 and go 1.10.x

I have not used Go much, but I found that the tests fail on Go 1.8 and Go 1.10.x.

go 1.8: https://travis-ci.com/GabLeRoux/tableflip/jobs/153029874

Using Go 1.5 Vendoring, not checking for Godeps
4.57s$ go get -t -v ./...
github.com/pkg/errors (download)
github.com/cloudflare/tableflip (download)
github.com/pkg/errors
github.com/GabLeRoux/tableflip
# github.com/GabLeRoux/tableflip
./fds.go:16: undefined: syscall.Conn
The command "eval go get -t -v ./... " failed. Retrying, 2 of 3.
github.com/GabLeRoux/tableflip
# github.com/GabLeRoux/tableflip
./fds.go:16: undefined: syscall.Conn
The command "eval go get -t -v ./... " failed. Retrying, 3 of 3.
github.com/GabLeRoux/tableflip
# github.com/GabLeRoux/tableflip
./fds.go:16: undefined: syscall.Conn
The command "eval go get -t -v ./... " failed 3 times.
The command "go get -t -v ./..." failed and exited with 2 during .

go 1.10.x: https://travis-ci.com/GabLeRoux/tableflip/jobs/153029876

--- FAIL: TestFdsListen (0.00s)
	fds_test.go:22: can't create new listener: listen unixgram : unknown network unixgram

I found this while working on #9 ✌️ I think the README should note that the library is only compatible with certain Go versions.

[Question] Windows

Hello,

Do you know if the library would work on Windows, i.e. with GOOS=windows?

Can't catch syscall.SIGINT after upgrade

I can catch syscall.SIGINT before Upgrade() is called, but not after it has been called. How can I restart and still shut down gracefully?

func main() {
	upg, _ := tableflip.New(tableflip.Options{})
	defer upg.Stop()

	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGHUP, os.Interrupt, syscall.SIGTERM)
		for ch := range sig {
			switch ch {
			case syscall.SIGHUP:
				err := upg.Upgrade()
				if err != nil {
					log.Fatal(err)
				}
			default:
				upg.Interrupt()
			}
		}
	}()

	ln, err := upg.Listen("tcp", ":8080")
	if err != nil {
		log.Fatalln("Can't listen:", err)
	}
	defer func(ln net.Listener) {
		_ = ln.Close()
	}(ln)

	count := 0
	router := gin.Default()
	router.GET("/", func(c *gin.Context) {
		time.Sleep(5 * time.Second)
		count++
		c.String(http.StatusOK, strconv.Itoa(count))
	})

	server := http.Server{
		Handler: router,
	}
	go func() {
		if err := server.Serve(ln); err != http.ErrServerClosed {
			log.Fatal("listen: ", err)
		}
	}()

	// Listen must be called before Ready
	if err := upg.Ready(); err != nil {
		log.Fatal(err)
	}

	<-upg.Exit()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := server.Shutdown(ctx); err != nil {
		log.Fatal("Server forced to shutdown:", err)
	}
	println("going to shutdown")
}

Not working with systemd as per docs

The OS is Ubuntu Xenial (16.04) and the command I run is service <name> restart. On restart, systemd stops and then starts the process.

[Service]
User=www-data
Group=www-data
Type=simple
Restart=on-failure
RestartSec=5s
RuntimeDirectory=MA
ExecStart=/path/to/executable serve --port=4001 --pid-file=/var/run/MA/foo.pid
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/MA/foo.pid

After some trial and error, the only way I found to do a seamless upgrade is to also add ExecReload=/bin/kill -HUP $MAINPID and then run service <name> reload.

Upgrade shutting down early for http2 connections but not http1.1

I'm running a web process on FreeBSD. When I connect with curl over HTTP/2 and send SIGHUP to the process, it shuts down and drops the connections without waiting for them to complete.

If I do the same but pass curl the '--http1.1' flag, the running connections complete before shutdown is called.

Any idea why HTTP/2 connections would not be waited on, while HTTP/1.1 connections are?

Thank you,

Request for updating README.MD

Before anything, I'd like to thank you for this amazing repo.

I'd suggest mentioning in the README that reloading a systemd service does not update the service's environment variables.

Having such a systemd unit file:

[Unit]
Description=Service using tableflip

[Service]
EnvironmentFile=/path/to/config-file
ExecStart=/path/to/binary -some-flag /path/to/pid-file
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/path/to/pid-file

After updating the config-file content and running systemctl reload service, the reloaded service does not see the new or updated environment variables. The service has to read its configuration and environment itself.

Impossible to build for windows

Hi,

In an application I use cloudflare/tableflip lib.

I've got an issue when I try to build for windows/386:


# github.com/cloudflare/tableflip
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/env.go:13:2: cannot use syscall.CloseOnExec (type func(syscall.Handle)) as type func(int) in field value
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/fds.go:344:36: not enough arguments in call to syscall.Syscall
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/fds.go:344:37: undefined: syscall.SYS_FCNTL
../../../../go/pkg/mod/github.com/cloudflare/[email protected]/fds.go:344:60: undefined: syscall.F_DUPFD_CLOEXEC

Thanks

Way to share all parent connections

Hello everyone!
Is there a way to share all active connections between processes?
The only way I see is the Fds.Conn() method, but for that I must know the address in the parent process.
Would it be worth adding methods that return all parent connections and listeners?
That would be amazing.

Upgrade function documentation

The Upgrade function is documented as:

Upgrade triggers an upgrade.

It also waits for a response and returns an error if the child process fails. This might be worth spelling out in the description, since the verb 'trigger' gives the impression of a call that completes immediately.
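For callers that do not want to block on such a call, one option is to run it in a goroutine and collect the result from a channel. In this sketch the `upgrade` parameter is a stand-in for a blocking call such as `upg.Upgrade`; `upgradeAsync` and `errChildFailed` are hypothetical names used for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errChildFailed is a made-up error used by the demonstration below.
var errChildFailed = errors.New("child failed")

// upgradeAsync runs a blocking upgrade function in its own goroutine and
// delivers its result on the returned channel.
func upgradeAsync(upgrade func() error) <-chan error {
	result := make(chan error, 1) // buffered so the goroutine never leaks
	go func() { result <- upgrade() }()
	return result
}

func main() {
	// Hypothetical stand-in for a slow, failing upgrade.
	slowUpgrade := func() error {
		time.Sleep(50 * time.Millisecond)
		return errChildFailed
	}

	result := upgradeAsync(slowUpgrade)
	fmt.Println("upgrade started; the caller is free to do other work")

	if err := <-result; err != nil {
		fmt.Println("upgrade failed:", err)
	}
}
```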

Does tableflip work with systemd's socket activation?

Hi there, I'm wondering if this library can work alongside systemd's socket activation?

What I'm trying to do is basically use systemd's root access to listen on port 80 and pass that file descriptor into my Go application so my Go app doesn't have to use sudo.

Is this possible with tableflip? Does tableflip even make sense in this context? My thinking is that tableflip would still be useful for doing the upgrade attempts.

Any feedback here would be greatly appreciated. Thanks!
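For reference, the receiving side of systemd socket activation can be sketched with the standard library alone: systemd sets LISTEN_PID and LISTEN_FDS and passes the sockets starting at file descriptor 3. The helper names below are hypothetical, and whether such listeners can be handed over to tableflip is not addressed by this sketch.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// listenFDsStart is the first descriptor systemd passes to an activated service.
const listenFDsStart = 3

// activationFDs returns the descriptors announced via LISTEN_PID and
// LISTEN_FDS, or nil if the variables are absent, malformed, or meant
// for a different process.
func activationFDs(listenPID, listenFDs string, pid int) []int {
	p, err := strconv.Atoi(listenPID)
	if err != nil || p != pid {
		return nil
	}
	n, err := strconv.Atoi(listenFDs)
	if err != nil || n <= 0 {
		return nil
	}
	fds := make([]int, n)
	for i := range fds {
		fds[i] = listenFDsStart + i
	}
	return fds
}

// activationListeners turns the passed descriptors into net.Listeners.
func activationListeners() ([]net.Listener, error) {
	var lns []net.Listener
	for _, fd := range activationFDs(os.Getenv("LISTEN_PID"), os.Getenv("LISTEN_FDS"), os.Getpid()) {
		f := os.NewFile(uintptr(fd), fmt.Sprintf("LISTEN_FD_%d", fd))
		ln, err := net.FileListener(f)
		f.Close() // FileListener works on its own copy of the descriptor
		if err != nil {
			return nil, err
		}
		lns = append(lns, ln)
	}
	return lns, nil
}

func main() {
	lns, err := activationListeners()
	if err != nil {
		fmt.Fprintln(os.Stderr, "socket activation:", err)
		os.Exit(1)
	}
	fmt.Printf("received %d listener(s) from systemd\n", len(lns))
	for _, ln := range lns {
		ln.Close()
	}
}
```

Because LISTEN_PID names the process systemd started, a child spawned during an upgrade has a different PID and fails this check, so it would have to obtain the socket from its parent instead.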

Expose parent presence

There are situations in which it is useful to detect the first invocation,
e.g. you may want to clean up dangling Unix sockets, but not during an upgrade.

We already have WaitForParent, and it can be exploited to get this information, but it looks hacky:

func isUpgrade(u *tableflip.Upgrader) bool {
	ctx, cancel := context.WithCancel(context.Background())
	// We use a canceled context because WaitForParent returns immediately, without error, only in the parent process.
	// An already expired context ensures an immediate failure inside the child process as well.
	cancel()

	return u.WaitForParent(ctx) != nil
}

This can be easily implemented in Upgrader with 1 line of code + tests.
