URL, URL, Little Do We Know Thee
By Razvan Peteanu for SecurityPortal
About Schemes and Men
Recently, many smiled and Microsoft got angry at a spoof of its Knowledge Base articles posted at a URL starting with "http://www.microsoft.com." Emails went around and people clicked on the link, possibly before looking closer at it. Surprised by the content, they may have checked the URL again, noticed the other "www"-like string in it, figured out that it must have something to do with the real host, forwarded the email to friends and then returned to their work.
Today we will look closer at URLs and the associated security implications. "Interesting" ways of using them have been known by spammers for a while, but now the KB spoof and the February issue of Crypto-Gram have made the Internet community more aware of what URLs can do.
Although most Internet users will associate URLs with WWW addresses, or perhaps FTP, Uniform Resource Locators are more general in scope. URLs are standardized in RFC 1738, and in their most generic form, they are defined as
<scheme>:<scheme-specific-part>
The best-known form is the Common Internet Scheme Syntax, in which the <scheme> is the name of a protocol and the <scheme-specific-part> is defined as:
//<user>:<password>@<host>:<port>/<url-path>
in which only the host part is mandatory. The ":" and "@" characters have a special meaning and thus the server can parse the entire string. If a user and a password are provided, the host part only comes after the @ character. In the KB spoof mentioned earlier, the link was
Understandably, it is no longer available. (In case you find a copy elsewhere, be aware that the page uses strong language and might trigger some content scanners as well.) As you have guessed, the real host of the page was www.hwnd.net. The string "www.microsoft.com" in this case is just a bogus username that is ignored by the web server.
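How a parser separates the bogus username from the real host can be illustrated with a minimal Python sketch. It assumes the standard library's urllib.parse module, and the path in the sample URL is a made-up placeholder, since the original link is not reproduced here.

```python
from urllib.parse import urlsplit

def real_host(url: str) -> str:
    """Return the host that a client would actually contact for this URL."""
    # Everything before the "@" in the authority part is user information,
    # so .hostname reflects only what follows it.
    return urlsplit(url).hostname or ""

# Hypothetical URL modelled on the spoof; the path is an assumption.
spoof = "http://www.microsoft.com@www.hwnd.net/placeholder.html"

print(urlsplit(spoof).username)  # www.microsoft.com (just a username)
print(real_host(spoof))          # www.hwnd.net (the real destination)
```

The familiar-looking name never takes part in choosing the server; it is only passed along as a credential.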
Although perfectly valid syntactically, the above usage can be considered as having security relevance. While no technological resource is affected, the attack is targeted at the other (and often ignored) half of the picture: ourselves. At the end of most Internet nodes, beyond network cards, modems and computers, there are human users who, consciously or not, make security decisions every time they decide to trust what they see on the screen.
Trust is a fundamental security value. Crafting the URL as above exploits the trust we have in our understanding of what a URL is like and in whoever provided us the link. It also exploits the fact that our attention is focused on the content frame and not on the location although they are equally important in a decision of trust. In SSL-protected sites, the latter is in part taken care of by the browser, which compares the domain with the information in the SSL certificate; otherwise mere encryption would not provide much value if the destination is bogus.
The URL analyzed above only superficially hides its real destination. Let us look further into better ways of doing this. For some reason (probably related to internal handling), some operating systems accept IP addresses not only in the form we are used to, aaa.bbb.ccc.ddd, but also as the equivalent single decimal number.
The above generic address can also be written as the decimal value of aaa*256^3+bbb*256^2+ccc*256+ddd. Thus, 3633633987 is 216.148.218.195 (belonging to www.redhat.com). You can copy and paste 3633633987 into your browser, and you will find yourself browsing Red Hat's main site. The above works with Internet Explorer 5.x and also with Lynx on Linux, but I have not tested all operating systems, so your mileage may vary. Some applications may complain of invalid URLs if they parse the domain name for periods, but if you experiment with a few applications, including standard utilities like ping, you should be able to figure out whether the OS itself supports this usage.
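The conversion is plain base-256 arithmetic. The following Python sketch shows both directions; the function names are illustrative, and the reverse direction leans on the standard ipaddress module.

```python
import ipaddress

def dotted_to_decimal(dotted: str) -> int:
    """Apply aaa*256^3 + bbb*256^2 + ccc*256 + ddd to a dotted address."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d

def decimal_to_dotted(value: int) -> str:
    """Turn a single 32-bit number back into the familiar dotted form."""
    return str(ipaddress.IPv4Address(value))

print(decimal_to_dotted(3633633987))   # 216.148.218.195
print(dotted_to_decimal("10.0.0.1"))   # 167772161
```

Running the decimal value from a suspicious link through decimal_to_dotted is a quick way to see which address it really names.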
Thus more obfuscation could be obtained by creating a URL such as http://www.toronto.com:ontario@3633633987 which still goes to Red Hat. Surfers are used to seeing strings of digits in a URL because many sites store the HTTP SessionID in the URL instead of in a cookie, so the above would not appear particularly suspicious. The password can be absent, so we end up having http://www.toronto.com@3633633987, "easy to read, easy to misunderstand" at first glance.
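A reader who wants to unpick such a URL could use something like the following Python sketch. The function name and the shape of the returned dictionary are illustrative assumptions, not an established tool.

```python
import ipaddress
from urllib.parse import urlsplit

def explain_url(url: str) -> dict:
    """Separate the decoy user information from the real host of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    info = {"userinfo": parts.username, "host": host, "dotted": None}
    # A host made only of ASCII digits may be an IP address in decimal form.
    if host.isascii() and host.isdigit() and int(host) < 2**32:
        info["dotted"] = str(ipaddress.IPv4Address(int(host)))
    return info

print(explain_url("http://www.toronto.com:ontario@3633633987"))
print(explain_url("http://www.toronto.com@3633633987"))
```

In both cases the "userinfo" entry holds the reassuring name, while "host" and "dotted" show where the request would really go.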
Now, for the final touch, we can use a bit of HTML knowledge: the anchor tag allows the display text for a link to be different than the target itself, so the above link can appear as http://www.toronto.com. In IE 5.5, hovering with the mouse over it displays the number only in the status bar, not very indicative of a wrong target, so only clicking on it would show us the real target.
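The gap between what an anchor displays and where it points can be checked mechanically. The Python sketch below uses the standard html.parser module to list links whose visible text looks like a URL for one host while the href leads to another; the class and function names are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkAuditor(HTMLParser):
    """Collect (display text, href) pairs for every anchor in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def mismatched_links(html: str) -> list:
    """Return links whose displayed host differs from the target host."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    suspicious = []
    for text, href in auditor.links:
        shown = urlsplit(text).hostname    # None when the text is not a URL
        actual = urlsplit(href).hostname
        if shown and shown != actual:
            suspicious.append((text, href))
    return suspicious

sample = '<a href="http://www.toronto.com@3633633987">http://www.toronto.com</a>'
print(mismatched_links(sample))
```

Links with ordinary wording as their display text are ignored here, since only text that parses as a URL makes a claim about the destination.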
Yet another way of exploiting trust is by using the indirection provided by genuine websites. A number of well-known sites track if their visitors follow external links by first creating the links of the form http://www.thisisarespectablesite.com/outsidelinks/http://externalsite, trapping the request at the server side and then redirecting the user to the real destination.
The problem with this approach is that anyone can use their indirection, combined with URL obfuscation, in order to provide more legitimacy to false URLs. What this can lead to depends both on the attacker and on the victim. The HTTP REFERER field, limited as it is, can be of some value to reduce abuses, but not all sites seem to use it.
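To see where such a link finally leads, the embedded destination can be pulled out of the path. This Python sketch assumes the "/outsidelinks/" layout used in the example above; real sites use their own conventions, and a query string inside the embedded URL would need extra handling.

```python
from urllib.parse import urlsplit

def embedded_target(url: str, prefix: str = "/outsidelinks/"):
    """Return the destination embedded after the prefix, or None if absent."""
    path = urlsplit(url).path
    if not path.startswith(prefix):
        return None
    return path[len(prefix):]

link = ("http://www.thisisarespectablesite.com/outsidelinks/"
        "http://www.toronto.com@3633633987")

target = embedded_target(link)
print(urlsplit(link).hostname)    # the respectable site the user sees first
print(target)                     # the obfuscated URL riding along in the path
print(urlsplit(target).hostname)  # the host that is finally contacted
```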
And if the above was not enough, the characters in the real destination can themselves be obfuscated through URL and Unicode encoding, so that only the hex codes are visible. URL encoding is required for many special characters, but can be applied to regular alphanumeric characters as well.
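The effect of URL encoding on ordinary characters is easy to reproduce. In this Python sketch, percent_encode_all is an illustrative helper written for this example, while unquote is the standard library's decoder.

```python
from urllib.parse import unquote

def percent_encode_all(text: str) -> str:
    """Percent-encode every byte, including harmless letters and digits."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

hidden = percent_encode_all("www")
print(hidden)           # %77%77%77
print(unquote(hidden))  # www
```

Decoding a link with unquote before reading it removes this layer of disguise.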
None of the above is new to knowledgeable spammers, but it will likely be quite successful as an attack targeted at the average unsuspecting user.
Let's explore the security implications of the URL even further. One of the "standard" attacks would be to cause a buffer overflow. As far as the browsers go, however, by now this would be a very beaten path; many a hacker has tried to crash IE or Netscape. What about other protocols? Indeed, what other protocols are recognized on a machine?
To find out the answer for a Windows box, I turned to the registry. The following keys contain such information: HKEY_LOCAL_MACHINE\SOFTWARE\Classes\PROTOCOLS\Handler and those keys under HKEY_CLASSES_ROOT that carry a value named "URL Protocol." (You will have to do some searching for those in the latter category, but it does not take long.)
The search results proved interesting: apart from the expected ftp://, http://, https://, mailto://, news://, pnm:// and several others, I found some schemes I had never heard of before, such as msee://. A quick experiment showed that it is the scheme used by Microsoft Encarta, perhaps to refer to articles inside the encyclopedia. Whether Encarta is safe from buffer overflows and, if not, whether they can be practically exploited, well, this is something that would need investigation.
The story repeated with other URL schemes that were installed by various applications (such as copernic:// owned by the Copernic search tool). There have been other interesting discoveries, but have a look for yourself.
Apart from the possibility of remote exploitation of applications that are not otherwise remotely accessible, even more discomfort is caused by the absence of any administrative interface allowing inspection of the associations between a URL scheme and the application using it (apart from a very scope-limited dialog in Internet Explorer under Tools/Options/Programs which only displays a handful of standard protocols).
It turns out that registering a new URL scheme in Windows is trivial and the change takes place immediately. It is done by adding the necessary registry entries as described in this MSDN documentation. Unfortunately, this also means this can be done by scripted viruses such as KakWorm (which are executed by simply viewing an email on a vulnerable system).
Associating a benign protocol with a dangerous command is, well, dangerous. Granted, this is not a URL-specific attack. It can be done using file associations as well, but the risk is still there, and the existence of other attack paths does not mean this one will not be exploited. And, of course, nothing forces an attacker to use only the techniques described here.
Until there are more mechanisms to inform us of such attacks and protect us from them, the best defense is to be cautious and not to follow directions in emails you cannot trust. Sometimes, you just feel something isn't right.
Now, if you would only click this link for some free advice :-) ... Did you?