Problem description
While on a wired network, I set hostName in master.ini to the wired network IP. After I switched to a wireless network the IP changed, so I set hostName to the wireless network IP and started the TubeMQ master. Startup blocks indefinitely, and the log prints nothing after the following line:
(main) [INFO - com.tencent.tubemq.server.master.bdbstore.DefaultBdbStoreService.initEnvConfig(DefaultBdbStoreService.java:996)] ADD HELP HOST
// Blocks here; no further output
(optional) Reproducer snippet
After debugging the source code, I found that when the machine IP changes, RepUtils.ExceptionAwareCountDownLatch#awaitOrException() blocks:
public boolean awaitOrException(long timeout, TimeUnit unit)
throws InterruptedException,
DatabaseException {
// blocked
boolean done = super.await(timeout, unit);
...
}
Finally, I traced the source code to RepNode#run():
public void run() {
...
if (nameIdPair.hasNullId() || !nodeType.isElectable()) {
queryGroupForMembership();
} else {
// blocks here
elections.initiateElection(group, electionQuorumPolicy);
...
}
...
}
Next, Elections#initiateElection():
public synchronized void initiateElection(RepGroupImpl newGroup, QuorumPolicy quorumPolicy, int maxRetries) {
RetryPredicate retryPredicate =
new RetryPredicate(repNode, maxRetries, countDownLatch);
electionThread = new ElectionThread(quorumPolicy, retryPredicate,
envImpl,
(envImpl == null) ? null :
envImpl.getName());
electionThread.start();
try {
// blocks here
/* Wait until we hear of some "new" election result */
countDownLatch.await();
...
}
}
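The countDownLatch.await() call above has no timeout, so it waits until some other thread counts the latch down. As a minimal sketch of the alternative (BoundedLatchWait and awaitResult are hypothetical names, not part of Berkeley DB JE), a bounded wait lets the caller notice that no election result arrived:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Berkeley DB JE or TubeMQ.
class BoundedLatchWait {
    /**
     * Waits for the latch for at most timeoutMs milliseconds.
     * Returns true if the latch reached zero, false on timeout or interrupt.
     */
    static boolean awaitResult(CountDownLatch latch, long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A false return is the point where a caller could log an error or abort startup instead of hanging silently.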
Next, Elections.ElectionThread#run():
public void run() {
...
winningProposal =
proposer.issueProposal(quorumPolicy, retryPredicate);
...
}
Next, Proposer#issueProposal():
public WinningProposal issueProposal(QuorumPolicy quorumPolicy, RetryPredicate retryPredicate) {
while (retryPredicate.retry()) {
try {
final Proposal proposal = nextProposal();
// Keep retrying
final Phase1Result result1 = phase1(quorumPolicy, proposal);
if (result1 == null) {
continue;
}
...
}
}
}
Note that phase1(quorumPolicy, proposal) is retried indefinitely, because it always returns null.
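To illustrate the pattern, here is a minimal, self-contained sketch of a retry loop that gives up after a fixed number of attempts. BoundedRetry, retry and attempts are hypothetical names; this is not the Berkeley DB JE implementation, only a model of the loop shown above with a retry budget added.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the retry loop; not the Berkeley DB JE code.
class BoundedRetry {
    /** Number of attempts made by the most recent call (kept for illustration). */
    static int attempts;

    /**
     * Calls phase until it returns non-null or maxRetries attempts are used up.
     * Returns null when every attempt failed, so the caller can report it.
     */
    static <T> T retry(Supplier<T> phase, int maxRetries) {
        attempts = 0;
        while (attempts < maxRetries) {
            attempts++;
            T result = phase.get();
            if (result != null) {
                return result;
            }
        }
        return null;
    }
}
```

With a phase that always returns null, as phase1 does here, the loop ends after maxRetries attempts rather than spinning forever.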
Next, Proposer#phase1():
private Phase1Result phase1(QuorumPolicy quorumPolicy, Proposal proposal) {
...
Phase1Result result = tallyPhase1Results(proposal, compService);
// always false
if (haveQuorum(quorumPolicy, result.promisories.size())) {
return result;
}
phase1NoQuorum.increment();
// always return null
return null;
}
Next, Proposer#tallyPhase1Results():
private Phase1Result tallyPhase1Results(Proposal currentProposal, final FutureTrackingCompService<MessageExchange> compService) {
...
new Utils.WithFutureExceptionHandler<MessageExchange>
(compService, 2 * elections.getProtocol().getReadTimeout(),
TimeUnit.MILLISECONDS, logger, elections.getRepImpl(), null) {
...
}
Focus on MessageExchange, which is the task being run:
public void run() {
messageExchange();
}
Next, TextProtocol.MessageExchange#messageExchange():
public void messageExchange() {
DataChannel dataChannel = null;
BufferedReader in = null;
PrintWriter out = null;
try {
// On the wireless network the target is still the wired network IP, so the connection fails
// with java.net.ConnectException: Connection refused: no further information
dataChannel =
channelFactory.connect(
target,
new ConnectOptions().
setTcpNoDelay(true).
setOpenTimeout(openTimeoutMs).
setReadTimeout(readTimeoutMs).
setBlocking(true).
setReuseAddr(true));
...
} catch (java.net.SocketTimeoutException e){
this.exception = e;
} catch (SocketException e) {
this.exception = e;
} catch (IOException e) {
this.exception = e;
} catch (TextProtocol.InvalidMessageException ime) {
...
this.exception = ime;
} catch (ServiceConnectFailedException e) {
this.exception = e;
} catch (Exception e) {
...
} finally {
Utils.cleanup(logger, repImpl, formatter, dataChannel, in, out);
}
}
Here the connection fails with java.net.ConnectException: Connection refused: no further information. The exception is caught and stored, but no error message is logged.
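As a rough sketch of how such a failure could be surfaced, the following helper attempts a plain TCP connection and returns a description of the error instead of swallowing it. ConnectProbe and probe are hypothetical names, and the sketch uses java.net.Socket directly; it does not use the DataChannel API shown above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of Berkeley DB JE or TubeMQ.
class ConnectProbe {
    /**
     * Tries to open a TCP connection to host:port within timeoutMs milliseconds.
     * Returns null on success, or a description of the failure.
     */
    static String probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return null;
        } catch (IOException e) {
            // ConnectException and SocketTimeoutException are both IOExceptions
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }
}
```

Logging a non-null result at warning or error level would have made the stale target address visible in the log.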
Although I changed hostName to the wireless network IP, target still uses the wired network IP. I guess it is read from the metadata stored in bdbEnvHome.
(optional) Suggestions for an improvement
We should introduce some mechanism to detect this situation at startup. In addition, the behavior of the sleepycat (Berkeley DB) library here seems unreasonable to me.
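One possible form of such a check, sketched under assumptions: before starting the replicated environment, verify that an address still belongs to this machine. HostNameCheck and isLocalAddress are hypothetical names. The check could be applied to the configured hostName and, if it can be read, to the node address recorded in bdbEnvHome; how to read that stored address is not shown here.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical startup check; not part of TubeMQ or Berkeley DB JE.
class HostNameCheck {
    /**
     * Returns true if hostName resolves to a loopback address or to an
     * address currently bound to one of this machine's network interfaces.
     */
    static boolean isLocalAddress(String hostName) {
        try {
            InetAddress addr = InetAddress.getByName(hostName);
            if (addr.isLoopbackAddress()) {
                return true;
            }
            // getByInetAddress returns null when no interface has this address
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (UnknownHostException | SocketException e) {
            return false;
        }
    }
}
```

If the check fails, the master could stop with a clear error message instead of waiting on an election that cannot complete.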