I have a toy framework that attempts re-registration (with a 20-second failover timeout). If I attempt to re-register after the frameworkID has been marked as completed, the driver enters the following infinite loop and my program never terminates:
I0218 21:17:06.968481 23437 main.go:220] Attempting to resume with FrameworkID = 20150219-031126-177048842-5050-1714-0000
I0218 21:17:06.968760 23437 main.go:223] &FrameworkInfo{User:*,Name:*Test Framework (Go),Id:&FrameworkID{Value:*20150219-031126-177048842-5050-1714-0000,XXX_unrecognized:[],},FailoverTimeout:*20,Checkpoint:*true,Role:nil,Hostname:nil,Principal:nil,WebuiUrl:nil,XXX_unrecognized:[],}
I0218 21:17:06.970400 23437 scheduler.go:179] Initializing mesos scheduler driver
I0218 21:17:06.970699 23437 scheduler.go:449] Starting the scheduler driver...
I0218 21:17:07.971133 23437 scheduler.go:490] Mesos scheduler driver started with PID= scheduler(1)@10.141.141.1:33160
I0218 21:17:07.971243 23437 scheduler.go:597] Scheduler driver running. Waiting to be stopped.
I0218 21:17:07.978864 23437 scheduler.go:886] Aborting driver, got error ' Completed framework attempted to re-register '
I0218 21:17:07.978904 23437 scheduler.go:649] Aborting framework [20150219-031126-177048842-5050-1714-0000]
I0218 21:17:07.978928 23437 scheduler.go:655] Ignoring Abort, master is disconnected.
I0218 21:17:07.978959 23437 main.go:187] Scheduler received error: Completed framework attempted to re-register
I0218 21:17:09.189569 23437 scheduler.go:886] Aborting driver, got error ' Completed framework attempted to re-register '
I0218 21:17:09.189602 23437 scheduler.go:649] Aborting framework [20150219-031126-177048842-5050-1714-0000]
I0218 21:17:09.189616 23437 scheduler.go:655] Ignoring Abort, master is disconnected.
...forever...
The key here is the message "Ignoring Abort, master is disconnected". It comes from Abort():
func (driver *MesosSchedulerDriver) Abort() (mesos.Status, error) {
    ...(snip)...
    if !driver.connected {
        log.Infoln("Ignoring Abort, master is disconnected.")
        return driver.Status(), fmt.Errorf("Unable to Abort, driver not connected.")
    }
    _, err := driver.Stop(true)
    stat := mesos.Status_DRIVER_ABORTED
    driver.setStatus(stat)
    return stat, err
}
I don't think I understand the point of the if-block in this code. Any abort should probably just kill the driver, no?
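To make the question concrete, here is a minimal, self-contained sketch of the two behaviors. It does not use mesos-go at all: `toyDriver`, `abortGuarded`, `abortAlways`, and the `status` constants are hypothetical stand-ins I made up for illustration, and the sketch leaves out the `driver.Stop(true)` call entirely. It only shows that with the guard, a disconnected driver stays in its running state after an abort request, which matches the loop in the log above.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a hypothetical stand-in for mesos.Status.
type status int

const (
	statusRunning status = iota
	statusAborted
)

// toyDriver models only the two fields relevant to this report.
// It is NOT the real MesosSchedulerDriver.
type toyDriver struct {
	connected bool
	status    status
}

// abortGuarded mirrors the quoted logic: it refuses to abort
// while the driver is disconnected, leaving the status unchanged.
func (d *toyDriver) abortGuarded() (status, error) {
	if !d.connected {
		return d.status, errors.New("unable to abort, driver not connected")
	}
	d.status = statusAborted
	return d.status, nil
}

// abortAlways marks the driver aborted regardless of connection state.
// (Assumption: the real driver would still need to decide whether to
// notify the master; that part is not modeled here.)
func (d *toyDriver) abortAlways() (status, error) {
	d.status = statusAborted
	return d.status, nil
}

func main() {
	guarded := &toyDriver{connected: false, status: statusRunning}
	_, err := guarded.abortGuarded()
	fmt.Println(guarded.status == statusAborted, err)

	always := &toyDriver{connected: false, status: statusRunning}
	_, err = always.abortAlways()
	fmt.Println(always.status == statusAborted, err)
}
```

In the guarded case the status never leaves running, so anything waiting for the driver to stop keeps waiting. Whether the unguarded variant is actually safe in the real driver depends on what the snipped part of Abort() and Stop() do, which this sketch does not cover.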