    Bug#22638 SOUNDEX broken for international characters · b5cc4fa6
    unknown authored
    Problem: SOUNDEX() returned an invalid string for international
    characters in multi-byte character sets.
    For example, for the 3-byte Chinese/Japanese character
    _utf8 0xE99885 it took only the first byte, 0xE9, put it into
    the output string, and then appended three DIGIT ZERO
    characters. The result was 0xE9303030, which is an invalid
    utf8 string.
    Fix: make SOUNDEX() multi-byte aware so that it puts only
    complete characters into the result and thus returns only
    valid strings.
    This patch also makes SOUNDEX() compatible with UCS2.
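
    The difference between the two behaviours can be sketched in
    standalone C++. This is an illustration only, not the code in
    sql/item_strfunc.cc: the helper names (utf8_seq_len,
    is_well_formed_utf8, leading_bytewise, leading_multibyte) are
    hypothetical, the validity check is simplified, and the sketch
    covers only the "first character plus three zeros" step from
    the example above, not the full SOUNDEX() algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte length of a UTF-8 sequence, judged from its
// lead byte alone. Returns 0 for a byte that cannot start a sequence.
inline std::size_t utf8_seq_len(unsigned char lead) {
  if (lead < 0x80) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

// Simplified structural check: every lead byte must be followed by the
// expected number of continuation bytes (10xxxxxx). It does not reject
// overlong forms or surrogates.
inline bool is_well_formed_utf8(const std::string &s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const std::size_t len = utf8_seq_len(static_cast<unsigned char>(s[i]));
    if (len == 0 || i + len > s.size()) return false;
    for (std::size_t k = 1; k < len; ++k) {
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
    }
    i += len;
  }
  return true;
}

// Byte-wise behaviour described in the problem statement: copy exactly
// one byte of the input, then append three DIGIT ZERO characters.
inline std::string leading_bytewise(const std::string &in) {
  if (in.empty()) return std::string();
  std::string out(1, in[0]);
  out.append(3, '0');
  return out;
}

// Multi-byte aware variant: copy the whole first character, then pad.
// An incomplete or invalid leading sequence is skipped in this sketch.
inline std::string leading_multibyte(const std::string &in) {
  if (in.empty()) return std::string();
  std::string out;
  const std::size_t len = utf8_seq_len(static_cast<unsigned char>(in[0]));
  if (len != 0 && len <= in.size()) out.append(in, 0, len);
  out.append(3, '0');
  return out;
}

// Self-check using the character from the commit message, 0xE99885.
inline void soundex_sketch_self_check() {
  const std::string ch = "\xE9\x98\x85";
  assert(!is_well_formed_utf8(leading_bytewise(ch)));   // E9 30 30 30
  assert(is_well_formed_utf8(leading_multibyte(ch)));   // E9 98 85 30 30 30
}
```

    The byte-wise variant reproduces the 0xE9303030 value quoted
    above, because the lead byte 0xE9 announces a 3-byte sequence
    that the zeros do not complete. Copying the complete character
    first keeps the string well formed.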
    
    
    mysql-test/r/ctype_ucs.result:
      Adding tests
    mysql-test/r/ctype_utf8.result:
      Adding tests
    mysql-test/t/ctype_ucs.test:
      Adding tests
    mysql-test/t/ctype_utf8.test:
      Adding tests
    sql/item_strfunc.cc:
      Making soundex multi-byte aware.
item_strfunc.cc 85.9 KB