    Bug#22638 SOUNDEX broken for international characters · 4b3826ba
    bar@mysql.com authored
    Problem: SOUNDEX returned an invalid string for international
    characters in multi-byte character sets.
    For example, for the Chinese/Japanese 3-byte character
    _utf8 0xE99885 it took only the very first byte, 0xE9,
    put it into the output string and then appended three
    DIGIT ZERO characters, so the result was 0xE9303030, which
    is an invalid utf8 string.
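A minimal C++ sketch of the byte-oriented behavior described above. This is not the MySQL source: `byte_oriented_prefix`, `is_valid_utf8` and `to_hex` are hypothetical names, and the UTF-8 check is simplified to lead/continuation byte structure only. It shows why 0xE9303030 is invalid: the lead byte 0xE9 announces a 3-byte character, but it is followed by 0x30 ('0') instead of a continuation byte in the range 0x80..0xBF.

```cpp
#include <cassert>  // assert() for checking the sketch's results
#include <cstddef>
#include <string>

// Simplified UTF-8 check: verifies lead/continuation byte structure only
// (does not reject overlong forms or surrogate code points).
bool is_valid_utf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len;
    if (c < 0x80) len = 1;
    else if ((c & 0xE0) == 0xC0) len = 2;
    else if ((c & 0xF0) == 0xE0) len = 3;
    else if ((c & 0xF8) == 0xF0) len = 4;
    else return false;                       // stray continuation or invalid lead byte
    if (i + len > s.size()) return false;    // character truncated at end of string
    for (std::size_t k = 1; k < len; ++k)
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
    i += len;
  }
  return true;
}

// Byte-oriented leading step (the buggy behavior): keep only the first BYTE,
// then pad with three DIGIT ZERO characters.
std::string byte_oriented_prefix(const std::string& in) {
  std::string out;
  if (in.empty()) return out;
  out += in[0];
  out += "000";
  return out;
}

// Render bytes as "0x..." upper-case hex, matching the notation used above.
std::string to_hex(const std::string& s) {
  static const char* digits = "0123456789ABCDEF";
  std::string h = "0x";
  for (unsigned char c : s) {
    h += digits[c >> 4];
    h += digits[c & 0x0F];
  }
  return h;
}
```

For example, `to_hex(byte_oriented_prefix("\xE9\x98\x85"))` returns `"0xE9303030"`, and `is_valid_utf8` rejects that string while accepting the original `"\xE9\x98\x85"`.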
    Fix: make SOUNDEX() multi-byte aware and put only complete
    characters into the result, so that it returns only valid strings.
    This patch also makes SOUNDEX() compatible with UCS2.
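A hedged sketch of the idea behind the fix, limited to the leading step: copy the first character whole, and encode the DIGIT ZERO padding in the same character set as the result. MySQL's actual change in item_strfunc.cc goes through the server's character-set layer; the `Enc` enum, `char_len`, `append_ascii` and `mb_aware_prefix` below are hypothetical, support only UTF-8 and big-endian UCS-2 (the byte order MySQL's ucs2 uses), and deliberately skip the digit coding of the remaining letters.

```cpp
#include <cassert>  // assert() for checking the sketch's results
#include <cstddef>
#include <string>

// Hypothetical encodings for this sketch: UTF-8 and big-endian UCS-2.
enum class Enc { Utf8, Ucs2 };

// Byte length of the character starting at s[pos] in encoding enc, or 0 if the
// bytes there do not form a complete character (UTF-8 check is simplified to
// lead/continuation byte structure only).
std::size_t char_len(const std::string& s, std::size_t pos, Enc enc) {
  if (pos >= s.size()) return 0;
  if (enc == Enc::Ucs2) return pos + 2 <= s.size() ? 2 : 0;  // always 2 bytes
  unsigned char c = static_cast<unsigned char>(s[pos]);
  std::size_t len;
  if (c < 0x80) len = 1;
  else if ((c & 0xE0) == 0xC0) len = 2;
  else if ((c & 0xF0) == 0xE0) len = 3;
  else if ((c & 0xF8) == 0xF0) len = 4;
  else return 0;
  if (pos + len > s.size()) return 0;
  for (std::size_t k = 1; k < len; ++k)
    if ((static_cast<unsigned char>(s[pos + k]) & 0xC0) != 0x80) return 0;
  return len;
}

// Append an ASCII character in the target encoding: one byte in UTF-8,
// two bytes (0x00 high byte first) in big-endian UCS-2.
void append_ascii(std::string& out, char c, Enc enc) {
  if (enc == Enc::Ucs2) out += '\0';
  out += c;
}

// Multi-byte aware leading step: copy the first character whole, never a
// fragment of it, then pad with three DIGIT ZERO characters in the same encoding.
std::string mb_aware_prefix(const std::string& in, Enc enc) {
  std::string out;
  std::size_t len = char_len(in, 0, enc);
  if (len == 0) return out;  // empty or malformed input: emit nothing
  out.append(in, 0, len);
  for (int i = 0; i < 3; ++i) append_ascii(out, '0', enc);
  return out;
}
```

With this approach the 3-byte character from the commit's example is kept intact (`0xE99885` followed by `000`), and in UCS-2 each padding digit becomes the two bytes 0x00 0x30 rather than a lone 0x30 that would leave the string with an odd byte count.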
item_strfunc.cc