java - "Negating" a String gives unexpected behaviour


I was playing around with String and its constructor, and noticed some behaviour that I can't explain.

I created the following method:

    // requires: import java.util.Arrays;
    public static String negate(String s) {
        byte[] b = s.getBytes();
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte)(~b[i] + 1);
        }
        System.out.println(Arrays.toString(b));
        return new String(b);
    }

which simply takes the 2's complement of each byte and returns a new String built from the result. When calling it like

    System.out.println(negate("Hello"));

I got an output of

    [-72, -101, -108, -108, -111]
    �����

which I guess is fine, since there are no negative ASCII values.
But when I nested two calls like so

    System.out.println(negate(negate("Hello")));

my output was this:

    [-72, -101, -108, -108, -111]
    [17, 65, 67, 17, 65, 67, 17, 65, 67, 17, 65, 67, 17, 65, 67]
    ACACACACAC // 5 groups of 3 characters (1 control character and "AC")

I expected the output to match my input string "Hello", but instead I got this. Why? This also happens with every other input string. After the nesting, every single character of the input just becomes AC (preceded by a control character).

I went a bit further and created a method that does the same thing, but only with raw byte arrays:

    public static byte[] n(byte[] b) {
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte)(~b[i] + 1);
        }
        System.out.println(Arrays.toString(b));
        return b;
    }

Here the output is as expected. For

    System.out.println(new String(n(n("Hello".getBytes()))));

I get

    [-72, -101, -108, -108, -111]
    [72, 101, 108, 108, 111]
    Hello

So I guess it has to do with the way Strings are created, since it only happened when I called negate with an instance that had been built from negative bytes?

I even walked down the class tree to look at the internal classes, but I couldn't find where this behaviour comes from.

Also, in the docs of String there's the following paragraph, which might be an explanation:

The behavior of this constructor when the given bytes are not valid in the default charset is unspecified

Can anyone tell me why it's like this and what exactly is happening here?

The issue is that you're taking your inverted bytes and trying to interpret them as a valid byte stream in the default character set (remember, characters are not bytes). As the String constructor docs you quoted tell you, the result of that is unspecified, and probably involves error correction, dropping invalid values, and so on. Naturally, then, it's a lossy process, and reversing it will not get you your original string back.
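To check whether a byte array is valid in a given charset before turning it into a String, a strict `CharsetDecoder` can be used. The following is only a minimal sketch, assuming UTF-8 as the charset; the class name `StrictDecodeDemo` and the helper `isValidUtf8` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {

    // Hypothetical helper: returns true only if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting characters.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] original = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] negated = {-72, -101, -108, -108, -111}; // the negated bytes from the question

        System.out.println("original valid? " + isValidUtf8(original));
        System.out.println("negated valid?  " + isValidUtf8(negated));
    }
}
```

With a decoder configured like this, invalid input is rejected up front instead of being quietly altered.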

If you get the bytes and double-negate them without converting the intermediate bytes to a String, you'll get your original result back.
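If an intermediate String is really wanted, one possible workaround is to use a charset in which every byte value maps to exactly one character, such as ISO-8859-1. This is a sketch under that assumption, not the approach from the question; the class name `Latin1Negate` is made up, and input characters outside ISO-8859-1 would still be lost when encoding.

```java
import java.nio.charset.StandardCharsets;

class Latin1Negate {

    // Same negation as in the question, but with an explicit single-byte charset.
    static String negate(String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (~b[i] + 1);
        }
        // ISO-8859-1 maps each byte 0x00..0xFF to the char U+0000..U+00FF,
        // so no byte sequence is invalid and nothing gets replaced.
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String once = negate("Hello");
        String twice = negate(once);
        System.out.println(twice.equals("Hello")); // round trip is lossless here
    }
}
```

The intermediate String is still not meaningful text; it is only a lossless container for the bytes.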

The following example demonstrates the lossy nature of new String(/*invalid bytes*/):

    // requires: import java.util.Arrays;
    String s = "Hello";
    byte[] b = s.getBytes();
    for (int i = 0; i < b.length; i++) {
        b[i] = (byte)(~b[i] + 1);
    }
    // Show the negated bytes
    System.out.println(Arrays.toString(b));
    String s2 = new String(b);
    // Show the bytes of the string constructed from them; note they're not the same
    System.out.println(Arrays.toString(s2.getBytes()));

On my system, which I believe defaults to UTF-8, I get:

    [-72, -101, -108, -108, -111]
    [-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]

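Note what happened when I took an invalid byte stream, made a String out of it, and then got the bytes of that String: five bytes went in, and fifteen different bytes came out. Negating those fifteen bytes gives exactly the repeating 17, 65, 67 pattern from your nested call.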
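The -17, -65, -67 triple is the UTF-8 encoding (EF BF BD) of U+FFFD, the Unicode replacement character. The sketch below makes that visible; it passes UTF-8 explicitly rather than relying on the default charset, because the `String(byte[], Charset)` constructor is documented to replace malformed input with the charset's default replacement string. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementCharDemo {

    // Negates a copy of the input so the caller's array is left untouched.
    static byte[] negate(byte[] in) {
        byte[] out = in.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (~out[i] + 1);
        }
        return out;
    }

    // Decodes as UTF-8 (invalid sequences become U+FFFD) and encodes again.
    static byte[] decodeAndReencode(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] invalid = negate("Hello".getBytes(StandardCharsets.UTF_8));
        String decoded = new String(invalid, StandardCharsets.UTF_8);

        // Print the code point of each decoded character.
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();

        byte[] reencoded = decodeAndReencode(invalid);
        System.out.println(Arrays.toString(reencoded));
        System.out.println(Arrays.toString(negate(reencoded)));
    }
}
```

Each of the five invalid bytes is decoded to one replacement character, which is why the original byte values cannot be recovered from the String.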

