[PATCH] WL#3704 mgmapi timeouts: Return sane errors for timeout in mgmapi

In ndb_mgm_call, add checks for expired timeout in (Input|Output)Stream.
In case of timeout, we set NdbMgmHandle->last_error and return NULL.

In api calls not using ndb_mgm_call (or using it in conjunction with
own IO), they'll need to check for timeouts manually. Macros are provided
to do this.

Add ndb_mgm_disconnect_quiet(h) to disconnect without checking errors
(so we don't clobber NdbMgmHandle->last_error). This helps us provide
the *consistent* semantic that on timeout we leave the NdbMgmHandle
*disconnected*. We check for this in testMgm.

Change CHECK_REPLY in mgmapi to also check for set error in handle->last_error
This will pick up the ETIMEDOUT errors and return them to client (through
returning correct failure code for API call and setting NdbMgmHandle error).
Applications written to MGMAPI before this patch will behave as before,
and even hopefully check get_last_error and report the error back to the
end user!

Adding the last CHECK_TIMEDOUT_RET and delete in ndb_mgm_call() we
slightly change behaviour of mgmapi. Previously, if disconnect
midway through a reply, where there were only optional parameters left,
we'd get a Properties object from ndb_mgm_call() containing NULLs for
the optional parameters, leading to interesting error messages. This
enables the returning of the *real* message and actually improves the API
without breaking compatibility.

ndb_mgm_start_signallog
ndb_mgm_stop_signallog
ndb_mgm_log_signals
ndb_mgm_set_trace
ndb_mgm_insert_error
ndb_mgm_set_int64_parameter [1]
ndb_mgm_set_string_parameter [1]
ndb_mgm_purge_stale_sessions [2]
 - return error code on error during ndb_mgm_call

TODO:
ndb_mgm_report_event [2]

[1] marked for removal, unused.
[2] return codes incorrect in CHECK_HANDLE/CONNECTED. undocumented.


Server side:
 in Services (per session) add macro for injecting timeout error
 (just waiting 10 seconds before continuing... it does work!)

 We inject these errors in a number of critical places - including
 the tricky api functions that don't just use ndb_mgm_call but do
 their own thing (get_config, get_status and friends)

ATRT:
 Expand testMgm to add timout tests for API. Fully automated.
 *THEORETICALLY* timing dependent - an ultra-slow network will
 cause problems and "fake" failures... I welcome other solutions.

 Tests aren't exhaustive, but cover the generics and the tricky bits.
 Also test some calling semantics (incl disconnected on error).

 It is encouraged to add *more* mgmapi tests, not less :)

InputStream:
  Fix where timedout error is set


Index: ndb-work/storage/ndb/src/mgmapi/mgmapi.cpp
===================================================================
parent 64361ee3
......@@ -59,6 +59,7 @@ SocketInputStream::gets(char * buf, int bufLen) {
if(res == 0)
{
m_timedout= true;
buf[0]=0;
return buf;
}
......@@ -67,7 +68,6 @@ SocketInputStream::gets(char * buf, int bufLen) {
if(res == -1)
{
m_timedout= true;
return 0;
}
......
This diff is collapsed.
......@@ -68,6 +68,8 @@ extern "C" {
*/
NDB_SOCKET_TYPE ndb_mgm_convert_to_transporter(NdbMgmHandle *handle);
int ndb_mgm_disconnect_quiet(NdbMgmHandle handle);
#ifdef __cplusplus
}
#endif
......
......@@ -290,6 +290,8 @@ struct PurgeStruct
#define ERROR_INSERTED(x) (g_errorInsert == x || m_errorInsert == x)
#define SLEEP_ERROR_INSERTED(x) if(ERROR_INSERTED(x)){NdbSleep_SecSleep(10);}
MgmApiSession::MgmApiSession(class MgmtSrvr & mgm, NDB_SOCKET_TYPE sock, Uint64 session_id)
: SocketServer::Session(sock), m_mgmsrv(mgm)
{
......@@ -599,13 +601,23 @@ MgmApiSession::getConfig(Parser_t::Context &,
char *tmp_str = (char *) malloc(base64_needed_encoded_length(src.length()));
(void) base64_encode(src.get_data(), src.length(), tmp_str);
SLEEP_ERROR_INSERTED(1);
m_output->println("get config reply");
m_output->println("result: Ok");
m_output->println("Content-Length: %d", strlen(tmp_str));
m_output->println("Content-Type: ndbconfig/octet-stream");
SLEEP_ERROR_INSERTED(2);
m_output->println("Content-Transfer-Encoding: base64");
m_output->println("");
if(ERROR_INSERTED(3))
{
int l= strlen(tmp_str);
tmp_str[l/2]='\0';
m_output->println(tmp_str);
NdbSleep_SecSleep(10);
}
m_output->println(tmp_str);
free(tmp_str);
......@@ -748,6 +760,7 @@ MgmApiSession::endSession(Parser<MgmApiSession>::Context &,
m_allocated_resources= new MgmtSrvr::Allocated_resources(m_mgmsrv);
SLEEP_ERROR_INSERTED(1);
m_output->println("end session reply");
}
......@@ -998,12 +1011,16 @@ MgmApiSession::getStatus(Parser<MgmApiSession>::Context &,
while(m_mgmsrv.getNextNodeId(&nodeId, NDB_MGM_NODE_TYPE_MGM)){
noOfNodes++;
}
SLEEP_ERROR_INSERTED(1);
m_output->println("node status");
SLEEP_ERROR_INSERTED(2);
m_output->println("nodes: %d", noOfNodes);
SLEEP_ERROR_INSERTED(3);
printNodeStatus(m_output, m_mgmsrv, NDB_MGM_NODE_TYPE_NDB);
printNodeStatus(m_output, m_mgmsrv, NDB_MGM_NODE_TYPE_MGM);
SLEEP_ERROR_INSERTED(4);
printNodeStatus(m_output, m_mgmsrv, NDB_MGM_NODE_TYPE_API);
SLEEP_ERROR_INSERTED(5);
nodeId = 0;
......@@ -1118,8 +1135,10 @@ MgmApiSession::enterSingleUser(Parser<MgmApiSession>::Context &,
Properties const &args) {
int stopped = 0;
Uint32 nodeId = 0;
int result= 0;
args.get("nodeId", &nodeId);
int result = m_mgmsrv.enterSingleUser(&stopped, nodeId);
result = m_mgmsrv.enterSingleUser(&stopped, nodeId);
m_output->println("enter single user reply");
if(result != 0) {
m_output->println("result: %s", get_error_text(result));
......@@ -1616,12 +1635,11 @@ void
MgmApiSession::check_connection(Parser_t::Context &ctx,
const class Properties &args)
{
if(ERROR_INSERTED(1))
{
NdbSleep_SecSleep(10);
}
SLEEP_ERROR_INSERTED(1);
m_output->println("check connection reply");
SLEEP_ERROR_INSERTED(2);
m_output->println("result: Ok");
SLEEP_ERROR_INSERTED(3);
m_output->println("");
}
......@@ -1642,10 +1660,7 @@ MgmApiSession::get_mgmd_nodeid(Parser_t::Context &ctx,
{
m_output->println("get mgmd nodeid reply");
m_output->println("nodeid:%u",m_mgmsrv.getOwnNodeId());
if(ERROR_INSERTED(1))
{
NdbSleep_SecSleep(10);
}
SLEEP_ERROR_INSERTED(1);
m_output->println("");
}
......
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment