Java - "Negating" a String gives unexpected behaviour
I was playing around with Strings and their constructors, and noticed some behaviour I can't explain.
I created the following method:
    import java.util.Arrays;

    public static String negate(String s) {
        byte[] b = s.getBytes();
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (~b[i] + 1);
        }
        System.out.println(Arrays.toString(b));
        return new String(b);
    }

which does a 2's complement on each byte and returns a new String from that. When calling it like

    System.out.println(negate("Hello"));

I got an output of

    [-72, -101, -108, -108, -111]
    �����

which I guess is fine, since there are no negative ASCII values.
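For reference, (byte)(~b[i] + 1) is two's complement negation: it yields the same byte as (byte) -b[i], and applying it twice always restores the original byte, so the byte-level step on its own is not lossy. A minimal sketch that checks this for all 256 byte values; the class NegationCheck and its negate(byte) helper are hypothetical names used only for illustration:

```java
import java.util.Arrays;

// Hypothetical helper class, only for illustrating the byte arithmetic.
class NegationCheck {

    // Two's complement negation of a single byte: invert all bits, then add 1.
    static byte negate(byte x) {
        return (byte) (~x + 1);
    }

    public static void main(String[] args) {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte x = (byte) v;
            // ~x + 1 is computed as an int, then truncated back to 8 bits.
            if (negate(x) != (byte) -x) {
                throw new AssertionError("mismatch for " + x);
            }
            // Negating twice gives back the original byte, even for -128.
            if (negate(negate(x)) != x) {
                throw new AssertionError("double negation failed for " + x);
            }
        }
        byte[] hello = {72, 101, 108, 108, 111}; // "Hello" in ASCII
        byte[] negated = new byte[hello.length];
        for (int i = 0; i < hello.length; i++) {
            negated[i] = negate(hello[i]);
        }
        System.out.println(Arrays.toString(negated)); // [-72, -101, -108, -108, -111]
    }
}
```

It prints the same negated byte array as the question, which suggests the surprise must come from somewhere other than the arithmetic.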
When I nested the calls like so

    System.out.println(negate(negate("Hello")));

my output was this:

    [-72, -101, -108, -108, -111]
    [17, 65, 67, 17, 65, 67, 17, 65, 67, 17, 65, 67, 17, 65, 67]
    ACACACACAC // 5 groups of 3 characters (1 ctrl-char and "AC")

I expected the output to match the input string "Hello", but instead I got this. Why? The same thing happens with other input strings: after nesting, each single character of the input becomes AC (preceded by a control character).
I went further and created a method that does the same thing, but on raw byte arrays:

    public static byte[] n(byte[] b) {
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (~b[i] + 1);
        }
        System.out.println(Arrays.toString(b));
        return b;
    }

Here the output is as expected.
    System.out.println(new String(n(n("Hello".getBytes()))));

I get

    [-72, -101, -108, -108, -111]
    [72, 101, 108, 108, 111]
    Hello

So I guess it has something to do with the way Strings are created, since it only happened when I called negate with an instance that had negative bytes?
I walked down the class tree and looked at the internal classes, but couldn't find where this behaviour comes from.
Also, in the docs of String there's the following paragraph, which might be an explanation:

The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.
Can anyone tell me why this is happening here?
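The same Javadoc for String(byte[]) points to the CharsetDecoder class for cases where more control over decoding is required. As a sketch, assuming UTF-8 rather than the platform default, a decoder configured with CodingErrorAction.REPORT turns the silent substitution into an exception, which makes the invalid input visible. StrictDecode and decodeStrict are hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
class StrictDecode {

    // Decodes bytes as UTF-8 and throws on malformed input instead of
    // substituting replacement characters. REPORT is already the default for
    // a freshly created decoder; it is set explicitly here for clarity.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] negated = {-72, -101, -108, -108, -111}; // negated "Hello"
        try {
            System.out.println(decodeStrict(negated));
        } catch (CharacterCodingException e) {
            // Reached for this input: 0xB8 is a UTF-8 continuation byte
            // with no preceding lead byte, i.e. malformed input.
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```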
The issue is that you're taking the inverted bytes and trying to interpret them as a valid byte stream in the default character set (remember, characters are not bytes). As the String constructor docs you quoted tell you, the result is unspecified, and in practice involves error correction: replacing or dropping invalid values, etc. Naturally, then, it's a lossy process, and reversing it will not give you back the original String.
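Concretely, when the bytes are decoded as UTF-8 the replacement is visible: each of the five negated bytes (0xB8, 0x9B, 0x94, 0x94, 0x91) is a lone UTF-8 continuation byte in the range 0x80 to 0xBF, so each one is malformed and becomes one U+FFFD REPLACEMENT CHARACTER. A minimal sketch that uses an explicit UTF-8 charset so the result does not depend on the platform default (the class ReplacementDemo is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
class ReplacementDemo {

    // With an explicit charset, String(byte[], Charset) replaces malformed
    // input with the charset's replacement string instead of reporting it.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] negated = {-72, -101, -108, -108, -111}; // negated "Hello"
        String s = decodeLenient(negated);
        System.out.println(s.length()); // 5
        for (char c : s.toCharArray()) {
            // Every malformed byte became U+FFFD.
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The original bytes are gone at this point: all five characters are identical, so nothing in the String records what the input was.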
If you get the bytes and double-negate them without converting the intermediate bytes to a String, you'll get back the original result.
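If an intermediate String is really needed, one option (not part of the original answer) is a single-byte charset such as ISO-8859-1, which maps every byte value 0x00 to 0xFF to exactly one char U+0000 to U+00FF and back, so decoding can never hit invalid input. A sketch assuming the input contains only characters up to U+00FF (anything else would be encoded as '?' and lost); negateLatin1 is a hypothetical variant of the question's negate:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
class Latin1Negate {

    // Variant of negate() that always uses ISO-8859-1. Every byte decodes to
    // exactly one char and encodes back to the same byte, so the round trip
    // is lossless as long as the input only contains chars <= U+00FF.
    static String negateLatin1(String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (~b[i] + 1);
        }
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(negateLatin1(negateLatin1("Hello"))); // Hello
    }
}
```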
This example demonstrates the lossy nature of new String(/*invalid bytes*/):
    import java.util.Arrays;

    String s = "Hello";
    byte[] b = s.getBytes();
    for (int i = 0; i < b.length; i++) {
        b[i] = (byte) (~b[i] + 1);
    }
    // show the negated bytes
    System.out.println(Arrays.toString(b));
    String s2 = new String(b);
    // show the bytes of the String constructed from them; note they're not the same
    System.out.println(Arrays.toString(s2.getBytes()));

On my system, which I believe defaults to UTF-8, I get:

    [-72, -101, -108, -108, -111]
    [-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]
Note what happened when I took an invalid byte stream, made a String out of it, and then got the bytes of that String.
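This also explains the numbers in the question: U+FFFD is encoded in UTF-8 as the three bytes EF BF BD, i.e. [-17, -65, -67] as signed Java bytes. Negating those gives [17, 65, 67], which is the DC1 control character (17) followed by "AC". With five replacement characters, negate(negate("Hello")) therefore yields five such groups. A small sketch (the class ReplacementBytes and its negateBytes helper are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, only for illustration.
class ReplacementBytes {

    // Same per-byte two's complement negation as in the question,
    // but on a copy so the input array is left untouched.
    static byte[] negateBytes(byte[] b) {
        byte[] out = b.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (~out[i] + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // U+FFFD REPLACEMENT CHARACTER is three bytes in UTF-8: EF BF BD.
        byte[] repl = "\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(repl));    // [-17, -65, -67]
        byte[] negated = negateBytes(repl);
        System.out.println(Arrays.toString(negated)); // [17, 65, 67]
        // 17 is the DC1 control character; 65 and 67 are 'A' and 'C'.
        String decoded = new String(negated, StandardCharsets.US_ASCII);
        System.out.println(decoded.equals("\u0011AC")); // true
    }
}
```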