    Bug#27876 (SF with cyrillic variable name fails during execution (regression)) · 88e3abf5
    malff/marcsql@weblab.(none) authored
    The root cause of this bug is related to the function skip_rear_comments,
    in sql_lex.cc
    
    Recent code changes in skip_rear_comments changed the prototype from
    "const uchar*" to "const char*", which had an unforeseen impact on this test:
      (endp[-1] < ' ')
    With unsigned characters, this code filters bytes of value [0x00 - 0x20]
    With *signed* characters, this also filters bytes of value [0x80 - 0xFF].
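
    The difference can be reproduced outside the server. The sketch below is
    only an illustration: the helper names are hypothetical, and it assumes
    the usual platforms where converting a byte above 0x7F to signed char
    yields a negative value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of trailing bytes that the test
   (endp[-1] < ' ') strips when each byte is read as a signed char. */
static size_t stripped_when_signed(const unsigned char *s, size_t len)
{
  size_t n = 0;
  /* Assumption: the cast maps 0x80..0xFF to negative values. */
  while (n < len && (signed char) s[len - 1 - n] < ' ')
    n++;
  return n;
}

/* Same test, with each byte read as an unsigned char. */
static size_t stripped_when_unsigned(const unsigned char *s, size_t len)
{
  size_t n = 0;
  while (n < len && s[len - 1 - n] < ' ')
    n++;
  return n;
}

/* Usage: U+043F (Cyrillic "pe") is 0xD0 0xBF in UTF-8. */
void demo_signed_vs_unsigned(void)
{
  const unsigned char pe[] = {0xD0, 0xBF};
  assert(stripped_when_signed(pe, sizeof pe) == 2);   /* whole name lost */
  assert(stripped_when_unsigned(pe, sizeof pe) == 0); /* name kept */
}
```

    With the signed comparison both bytes of the Cyrillic letter compare
    below ' ', so a parameter name made of such letters disappears entirely.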
    
    This caused the reported regression: the bytes of the cyrillic characters
    in the parameter name were treated as whitespace and truncated.
    Note that the regression is present both in 5.0 and 5.1.
    
    With this fix:
    - [0x80 - 0xFF] bytes are no longer considered whitespace.
    This alone fixes the regression.
    
    In addition, filtering [0x00 - 0x20] was found to be bogus and abusive,
    so the code now uses my_isspace when looking for whitespace.
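
    A rough sketch of the predicate-based approach follows. It is not the
    patched code: the standard isspace() stands in for my_isspace (whose
    character set argument is not modelled here), the helper name is
    hypothetical, and the default "C" locale is assumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length that remains after dropping trailing
   whitespace. isspace() is a stand-in for my_isspace(). The cast to
   unsigned char keeps bytes >= 0x80 from being passed as negative ints. */
static size_t rtrim_space_len(const char *s, size_t len)
{
  while (len > 0 && isspace((unsigned char) s[len - 1]))
    len--;
  return len;
}

void demo_rtrim_space(void)
{
  const char padded[] = {'x', ' ', '\n'};
  const char control[] = {'x', '\x01'};
  const char cyrillic[] = {'\xD0', '\xBF'};

  assert(rtrim_space_len(padded, sizeof padded) == 1);     /* space, newline dropped */
  assert(rtrim_space_len(control, sizeof control) == 2);   /* 0x01 is not whitespace */
  assert(rtrim_space_len(cyrillic, sizeof cyrillic) == 2); /* high bytes kept */
}
```

    Unlike a range test, the predicate leaves control bytes such as 0x01
    alone and only drops characters that really are whitespace.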
    
    Note that this fix only addresses the regression affecting UTF-8
    in general; it does not address a more fundamental problem with
    skip_rear_comments: parsing a string *backwards*, starting at end[-1],
    is not safe with multi-byte characters, because end[-1] can mistake the
    last byte of a multi-byte character for a character to filter out.
    
    The only known impact of this remaining issue affects objects that have to
    meet all the conditions below:
    
    - the object is a FUNCTION / PROCEDURE / TRIGGER / EVENT / VIEW
    - the body consists of only *1* instruction, and does *not* contain a
      BEGIN-END block
    - the instruction ends, lexically, with <ident> <whitespace>* ';'?
      For example, "select <ident>;" or "return <ident>;"
    - the last character of <ident> is a multi-byte character
    - the last byte of this character is ';', '*', '/' or whitespace
    
    In this case, the body of the object will be truncated after parsing,
    and stored in an invalid format.
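
    The byte-level ambiguity can be illustrated with a small sketch. The
    helper below is hypothetical and only modelled on the description above.
    UTF-16BE is used purely because its byte values are easy to check
    (U+043B is 0x04 0x3B, and 0x3B is ';'); which character sets actually
    trigger the problem in the server is not established here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-wise backward trim: drops trailing ';', '*', '/'
   and ASCII space without any knowledge of character boundaries. */
static size_t trim_rear_bytewise(const unsigned char *s, size_t len)
{
  while (len > 0 && (s[len - 1] == ';' || s[len - 1] == '*' ||
                     s[len - 1] == '/' || s[len - 1] == ' '))
    len--;
  return len;
}

void demo_trim_rear_bytewise(void)
{
  /* U+043B in UTF-16BE: the second byte has the value of ';'. */
  const unsigned char utf16be[] = {0x04, 0x3B};
  /* The same character in UTF-8: no byte falls in the ASCII range. */
  const unsigned char utf8[] = {0xD0, 0xBB};

  assert(trim_rear_bytewise(utf16be, sizeof utf16be) == 1); /* character split */
  assert(trim_rear_bytewise(utf8, sizeof utf8) == 2);       /* character intact */
}
```

    In the first case the trim stops in the middle of a character, which is
    the kind of truncation described above.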
    
    This last issue has not been fixed in this patch, since the real fix
    will be implemented by Bug 25411 (trigger code truncated), which is caused
    by the very same code.
    The real problem is that the function skip_rear_comments is only a
    work-around, and should be removed entirely: see the proposed patch for
    bug 25411 for details.