android - How to detect whether text is human readable?
I am wondering if there's a way to tell whether a given text is human readable. By human readable, I mean: it has some meaning and is formatted like an article written by somebody, or is at least generated by a software translator and intended to be read by a human.
Here's the background story: I am making an app that allows users to upload short texts to a database. At one stage of deployment I noticed that a user had uploaded corrupted text due to an encoding problem. The problem was fixed later, but it leaves me wondering if there's a way to pick out non-human-readable text before serving the text to users.
Any advice is appreciated. The scope might be large enough to include other languages, but at the moment let's limit the discussion to English only.
You can try a language identification tool, or something similar.
Basically, you have to count characters, or groups of characters (character n-grams), and compare the distribution of letters in the submitted text with the distribution of letters in a collection of texts written in English. (Make sure that such a collection of texts is representative of the expected input.)
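As a rough illustration of the n-gram comparison, here is a minimal Python sketch. It builds a character-bigram frequency profile and scores a submitted text by cosine similarity against a reference profile. The names `ngram_profile`, `cosine_similarity`, `english_score` and the tiny `REFERENCE_TEXT` are all made up for this example; a real reference profile would come from a large, representative English corpus, and cosine similarity is just one of several possible ways to compare the two distributions.

```python
from collections import Counter
import math

# Hypothetical stand-in for a real corpus; far too small for production use.
REFERENCE_TEXT = (
    "the user wrote a short note about the weather and the people of the "
    "town read it in the morning before they went to work"
)


def ngram_profile(text, n=2):
    """Return the relative frequency of each character n-gram in the text."""
    cleaned = " ".join(text.lower().split())  # lowercase, collapse whitespace
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles (0.0 to 1.0)."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def english_score(text, reference_profile, n=2):
    """Score how closely the text's n-gram distribution matches the reference."""
    return cosine_similarity(ngram_profile(text, n), reference_profile)


reference_profile = ngram_profile(REFERENCE_TEXT)
```

Texts scoring below some cut-off could then be flagged for review before being served. The cut-off is not something this sketch can supply; it would have to be tuned on real uploads, and very short texts will give noisy scores.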
In continuity with the n-gram approach, you might want to try a dictionary-based approach and check for the presence of 'stop words' (e.g. 'the', 'a', 'an', 'of') in the input text.
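The stop-word check can be sketched in a few lines of Python. The `STOP_WORDS` set below is a small hand-picked sample (the four words mentioned above plus a few other common ones), and `stop_word_ratio` is a hypothetical helper name; a fuller stop-word list would be needed in practice.

```python
import re

# Small illustrative sample; a real list would be considerably longer.
STOP_WORDS = frozenset(
    {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}
)


def stop_word_ratio(text):
    """Return the fraction of words in the text that are English stop words."""
    # Treat runs of ASCII letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in STOP_WORDS)
    return hits / len(words)
```

Ordinary English prose usually contains a noticeable share of stop words, while text corrupted by an encoding problem tends to contain few or none, so a ratio near zero on a text of reasonable length is a hint that something is wrong. As with the n-gram score, the exact cut-off is an assumption to be tuned on real data.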