vim - Same visible character but different bytes

I have two files, each containing the same (Hindi) word, copied into each file from a different source. While the words from both sources look alike visually, their bytes are different. The files are here and here. I am not sure of the original encoding, but in both cases opening the file as UTF-8 displays the characters correctly.

It is interesting that when I de-duplicate them with the uniq utility, only one entry is returned, but when I place them in one file and run :sort u in Vim, I get both entries.

Please explain what's going on.

Update:

If you do not want to open the links, here are the Python literals: '\u091c\u0941\u095c\n' and '\u091c\u0941\u0921\u093c\n'. The word looks like जुड़.

095c devanagari letter dddha: ड़
0921 devanagari letter dda: ड
093c devanagari sign nukta (dot below character): ़
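To make the difference visible, here is a small Python 3 sketch that lists the code points and UTF-8 bytes of the two literals from the question. The helper name describe and the labels are only illustrative and not part of the original post.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), ch.encode("utf-8").hex())
        for ch in text
    ]

# The two spellings from the question, without the trailing newline.
precomposed = "\u091c\u0941\u095c"        # ends in the single code point U+095C
decomposed = "\u091c\u0941\u0921\u093c"   # ends in U+0921 followed by U+093C

for label, word in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(label, len(word.encode("utf-8")), "bytes")
    for row in describe(word):
        print("   ", *row)
```

The first spelling takes 9 bytes in UTF-8 and the second 12, so any tool that compares raw bytes will treat them as two different lines.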

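The two spellings are canonically equivalent: U+095C decomposes to U+0921 followed by U+093C. You can see in Python that they are equal once normalised (Python 3 syntax here):

    import unicodedata
    unicodedata.normalize('NFC', '\u0921\u093c') == unicodedata.normalize('NFC', '\u095c')  # => True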

You should be able to use :%!uconv -x any-nfc (with ICU installed), or :%!ruby -ne 'puts $_.unicode_normalize(:nfc)' (with Ruby installed), to normalise the file.
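If neither tool is at hand, the same idea can be sketched in Python 3: normalise each line first, then de-duplicate. This is only a minimal illustration; the function name unique_normalized is hypothetical, and the sample list stands in for the lines of the file.

```python
import unicodedata

def unique_normalized(lines, form="NFC"):
    """Normalise each line, then drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for line in lines:
        key = unicodedata.normalize(form, line.rstrip("\n"))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

# The two literals from the question stand in for the file contents.
words = ["\u091c\u0941\u095c\n", "\u091c\u0941\u0921\u093c\n"]

print(len(set(words)))                 # raw comparison: 2 distinct strings
print(len(unique_normalized(words)))   # after NFC normalisation: 1
```

Once both lines are in the same normalisation form, a byte-wise comparison such as Vim's :sort u should also see them as identical.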
