vim - Same visible character but different bytes
I have two files, each containing the same (Hindi) word, copied into each file from a different source. While the words from both sources look alike visually, their bytes are different. The files are here and here. I am not sure of the original encoding, but in both cases opening the file as UTF-8 displays the characters correctly.
Interestingly, when I make the entries unique with the uniq utility, only one entry is returned, but when I place them in one file and run :sort u in Vim, I get both entries.
Please explain what's going on.
Update:
If you do not want to open the links, here are the Python literals: '\u091c\u0941\u095c\n' and '\u091c\u0941\u0921\u093c\n'. The word looks like जुड़.
095c devanagari letter dddha: ड़
0921 devanagari letter dda: ड
093c devanagari sign nukta (dot below character): ़
You can see in Python that the two sequences are equivalent once normalised (Python 3 syntax here):

    import unicodedata
    # The form name must be upper case; 'nfc' raises ValueError.
    unicodedata.normalize('NFC', '\u0921\u093c') == unicodedata.normalize('NFC', '\u095c')  # => True

You should be able to use :%!uconv -x any-nfc (with ICU installed), or :%!ruby -ne 'puts $_.unicode_normalize(:nfc)' (with Ruby installed) to normalise the file.
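
If you would rather do the comparison and de-duplication in Python than through a Vim filter, something along these lines might work. It is only a minimal sketch: the helper names describe and unique_normalized are made up for illustration, and it assumes the words are already decoded into str (for example, read from a file opened with encoding='utf-8').

```python
import unicodedata

# The two literals from the question, without the trailing newline.
precomposed = '\u091c\u0941\u095c'       # ends with the single code point DDDHA
decomposed = '\u091c\u0941\u0921\u093c'  # ends with DDA followed by NUKTA


def describe(word):
    """Return the Unicode name of each code point in word (hypothetical helper)."""
    return [unicodedata.name(ch) for ch in word]


def unique_normalized(words, form='NFC'):
    """Drop duplicates after normalising, keeping first-seen order (hypothetical helper)."""
    seen = {}
    for word in words:
        seen.setdefault(unicodedata.normalize(form, word), None)
    return list(seen)


if __name__ == '__main__':
    # The raw UTF-8 bytes differ even though the rendered words look the same.
    print(precomposed.encode('utf-8'))
    print(decomposed.encode('utf-8'))
    print(describe(precomposed))
    print(describe(decomposed))
    # After normalisation the two spellings collapse into a single entry.
    print(unique_normalized([precomposed, decomposed]))
```

Comparing the raw strings treats the two spellings as different, which matches what :sort u showed; comparing after normalisation treats them as one word.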
