Online / 5 & 6 February 2022


Messing with unicode

A few possible attacks with unicode

Let's look at a few Unicode 'tricks' that can make a program appear to do (or not do, for that matter) something other than what it actually does. Based on the findings of a recent publication, these are well worth being aware of, both from a security point of view and simply to be on your guard against friends who may be trying to pull a prank on you :-D.

These tricks are well suited to trojan attacks because they can be difficult to detect even in a manual code review, thanks to aspects of Unicode such as bidirectional (bidi) control characters.
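As a rough illustration of why such characters slip past reviewers but not past a scanner, here is a minimal Python sketch that reports bidi control characters in a piece of source text. The function name `find_bidi_controls` and the choice of code points (the nine embedding, override and isolate controls) are assumptions made for this example, not something prescribed by the paper or by PEP 672.

```python
import unicodedata

# Assumed set for this sketch: bidi embedding, override and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source):
    """Return (line, column, Unicode name) for each bidi control found."""
    hits = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # Hypothetical snippet with a right-to-left override hidden in a comment.
    sample = "is_admin = False  # \u202e check later\nprint(is_admin)"
    for line_no, col, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {name}")
```

A check along these lines could run in a pre-commit hook or CI job; whether to reject every such character or only flag it for review is a policy decision left open here.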

The talk is based on some of the possibilities described in the paper "Trojan Source: Invisible Vulnerabilities" by Nicholas Boucher and Ross Anderson of the University of Cambridge. The implications of this work for Python are outlined in PEP 672.

Examples of using/abusing Unicode include:
- Look-alike characters (homoglyphs) used to define two different functions, so that calls to one look like calls to the other (e.g. Cyrillic е and Latin e are too similar for us to distinguish easily).
- Bidi control characters used to make part of the code appear to be present when it is actually part of a comment.
- The classic trick of naming files so that even an .exe file can look like a .pdf.
- Invisible characters used to make strings look the same when they aren't.
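The examples above can be made concrete with a short Python sketch. It compares a Latin-only string with a look-alike containing a Cyrillic letter, exposes a trailing zero-width space, and shows that a file name containing a right-to-left override still carries its real extension. The helper `describe_non_ascii` and all sample strings are hypothetical, chosen only for illustration.

```python
import os.path
import unicodedata


def describe_non_ascii(text):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


latin = "check"
mixed = "ch\u0435ck"              # Cyrillic small letter ie instead of Latin "e"
padded = "admin\u200b"            # zero-width space appended
disguised = "invoice\u202efdp.exe"  # many UIs render the tail as "exe.pdf"

if __name__ == "__main__":
    print(latin == mixed)                 # the two strings are not equal
    print(describe_non_ascii(mixed))      # reveals the Cyrillic letter
    print(describe_non_ascii(padded))     # reveals the invisible character
    print(os.path.splitext(disguised)[1]) # the real extension
```

Printing code points and names like this is only a debugging aid; it does not decide whether a given non-ASCII character is legitimate, which depends on the project and the languages it uses.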


Julin Shaji