Cyber-Criminal OPSEC – a Three-Part Series. Part II: Environmentals

Posted on 12 June 2012 by


In Part I of this three-part series, we discussed the most basic of attribution methods, IP address analysis. In Part II, we talk about computer environmentals, and building a device fingerprint. And in Part III, we talk about tools, techniques, tactics and procedures used by cyber criminals.

One of the things that mystifies us most is the general lack of comprehension exhibited by the vast majority of Internet users about something they do each and every day: surf the web. One of the most common misunderstandings is that, when you visit a web page, there’s a page there, like the one you’d find in a newspaper or magazine.

In fact, contemporary websites are little more than a series of scripts which interact with your web browser, negotiating what is possible for your browser to display, and then producing dynamically the appropriate content for you. This interactivity, which is the basis of the whole, you know, Internet thing, is based on rapidly identifying what flavor your computer is, what it can do, what it can’t.

The sum of that process, it may disturb you to know, is a picture of a rare – and in some cases, unique – user: you.

For example, my computer has 495 installed fonts, from Abadi MT Condensed Extra Bold, through Apple Braille Outline 6 Dot, Braggadocio, Gill Sans MT Bold Italic, Hiragino Sans GB W3, the Microsoft family (Times New Roman, Verdana, Trebuchet, etc), STIXSizeOneSym Bold all the way to Zapf Dingbats. What are the chances that you have precisely the same fonts installed that I have?

Really low.

How do I know I have 495 fonts installed? Because a website asked me, and my computer ratted me out.

It also dropped a dime on the fact that I’m running a MacBook Pro and the version of the operating system, every single browser plug-in I have installed from Chrome PDF Viewer to WebEx64 General Plugin Container Version 202; that my monitor was set to 1280×800 and 24-bit color-depth, that I’ll accept zipped content and, actually, a whole bunch more information including the fact that, today, I’m coming in from Lower Manhattan, New York City, New York State, USA.

How long did it take my computer to pass on that information? Less time than it takes to blink your eye.

In this article, we’ll talk about how these attributes can be – and frequently are – used to profile visitors to a website. Like most things on the Internet, they are put to uses that are good, “a little creepy,” and bad.

Good Uses
As a few of literally millions of good examples, the kind of user fingerprinting or profiling we’re discussing here can be used to deliver your morning news page from a national site like Google or the New York Times in the order that makes sense for you. When you go to Google News you’ll see a vastly different page from when I go to the same site, because Google tracks a whole mess of information about who you are based on your computer environmentals, and even more if you’re signed in.

When you go to the AAA website to plan a roadtrip, you might notice that you’re directed to the branch office reasonably near you – again, this is geographical profiling bringing you to the place that AAA has deemed most likely you want to go.

When you go to multimedia sites and you get the high-end, super-groovy Flash and Java filled page, that’s because your browser was asked – and answered in the affirmative – whether it could support those things.

And the page was customized to give you what the site designers consider to be the “best” experience your equipment can support, and the content tailored to your language settings, country settings and geographical region.

A Little Creepy
You’ve probably noticed that, as you surf the web, ads which speak directly to things you’re interested in keep showing up in the weirdest places. That’s based on a number of things starting with your environmentals and going through “cookies”, which are small text files containing information about stuff you’ve seen, places you’ve visited, things you’ve bought, forums you’ve been to and other really personal information.

This customization is highly personal, because it is based not just on the objective information about your hardware and software, but also on totally subjective things such as subjects in which you have a demonstrated interest. To take things one step further, some of the sites you visit rent you out to others, who place third-party cookies on your browser – that is, people have decided for you that, if you like A, you also like B.

And B, by the way, throws lots of information about you around so that B (and anyone who pays B) has access to a wealth of information about you that can grow over time as you surf the web and visit A, B, and friends (that is, paid associates or partners) of B.

You could be, in fact you almost certainly are, lugging around a veritable sack of cookies on which personal information from thousands of places resides.

Bad Uses
Some bad uses – and there are gazillions of them – would include examination of your environmentals for the purposes of presenting your computer with malicious software designed to exploit the vulnerabilities of your setup. Using an old version of Office or Adobe Acrobat? Pa-POW! Here’s a nice file, just for you. Oh, and thanks for telling us that you’ll accept that kind of document right there in your HTTP request.

In another post we’ll get into how this kind of thing can install, in seconds and without you noticing, software on your computer which can allow criminals to remotely monitor and control it.

You won’t sleep for a week.

Below we’ll give a couple of resources for addressing some of this, but this post is not really about that.

What Information Are You Handing Out?
Speaking of scaring the be-jeepers out of yourself, have a look at what the Electronic Frontier Foundation (EFF) has put together on you in a single click of your mouse. How do they do it?

They ask you.

To brush up, read the Wikipedia section on Hypertext Transfer Protocol (HTTP) and let’s look at a typical HTTP request from your browser to a server.

At its simplest, when you type an address into a web browser you’re doing a few things. First, you’re asking your computer to figure out that, for example, typing the URL

http://nickselby.com

is really making a request to visit the IP address

74.208.86.205

That little translation from name space to IP happens at a Domain Name System (DNS) server, which translates human-readable names into a somewhat less human-readable IPv4 address (and this step is an important one in the world of cyber-crime, which we will come back to in a later post).
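To make that translation step concrete, here is a minimal Python sketch that asks your operating system's resolver for the IPv4 addresses behind a name. The function name resolve_ipv4 is mine, purely for illustration, and the address you get back for any given site today may well differ from the one shown above.

```python
import socket

def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver reports."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # An IP literal comes back unchanged, with no network lookup needed.
    print(resolve_ipv4("127.0.0.1"))
    # A real name needs a working resolver and network connection:
    # print(resolve_ipv4("nickselby.com"))
```

Your browser does the equivalent of this before it sends a single byte to the website itself.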

The next thing that happens is, essentially, this:

You: Yo! How ya doin’? I’m Nick, I’m in the northeast USA, running a Mac in US English with Chrome and these {insert long list of plug-ins, file-formats, doodads and extras} things here and I want a page.
Website: Hey, Nick. Good ta see ya. I see you like {same list}, but I gotta ask: do you use {whatever special thing it may want}?
You: Yes.
Website: OK {it says, while grabbing information from its database, from those of its partners and paid advertisers, then slapping them together}, here’s a page for you! Go ta town and lemme know when you want some more.

This happens, every single time you go to a website, in the time between your hitting the ENTER key or clicking the link and the time the page appears in your browser. That regularity is essential for you to understand: it never doesn’t happen.
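Stripped of the banter, the opening line of that conversation is just a few lines of text. The sketch below assembles a bare-bones HTTP/1.1 GET request in Python so you can see the shape of it. The header names are standard HTTP ones, but build_request and every header value here are illustrative assumptions, not what any particular browser actually sends.

```python
def build_request(host, path="/", extra_headers=None):
    """Assemble the text of a minimal HTTP/1.1 GET request."""
    headers = {
        "Host": host,
        # Illustrative values only; a real browser fills these in
        # from its own version, settings and installed software.
        "User-Agent": "ExampleBrowser/1.0 (Macintosh)",
        "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.8",
        "Accept-Encoding": "gzip,deflate",
    }
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the header block.
    return "\r\n".join(lines) + "\r\n\r\n"

if __name__ == "__main__":
    print(build_request(
        "nickselby.com",
        extra_headers={"Referer": "http://example.com/some-page"},
    ))
```

Everything after the first line is you volunteering information, and the server has not had to ask for any of it yet.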

Some of the basic stuff you’re sending
IP Address This was the subject of Part I.

User Agent Your user-agent is a string which describes your computer and browser to the website. Today mine looks like this:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.53 Safari/536.5

As you can see, I’m on a Mac, I’m visiting this website using Google Chrome.
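Pulling that string apart takes very little code. The following is a rough Python sketch, not a complete user-agent parser: parse_user_agent is a made-up helper that grabs the first parenthesised comment (conventionally the platform) and every Name/Version token. Note that Chrome's string also names Mozilla and Safari for compatibility reasons, so a naive check for the word "Safari" would get my browser wrong.

```python
import re

UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 "
      "(KHTML, like Gecko) Chrome/19.0.1084.53 Safari/536.5")

def parse_user_agent(ua):
    """Split a user-agent string into its platform comment and product tokens."""
    # The first parenthesised comment conventionally describes the platform.
    platform = re.search(r"\(([^)]*)\)", ua)
    # Product tokens look like Name/Version.
    products = dict(re.findall(r"([A-Za-z]+)/([\d.]+)", ua))
    return {
        "platform": platform.group(1) if platform else None,
        "products": products,
    }

if __name__ == "__main__":
    info = parse_user_agent(UA)
    print(info["platform"])
    print(info["products"])
```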

Nota Bene It’s important to realize that, like the IP Address, the user-agent string is easily forged, and can’t be relied upon as a fingerprinting method. In fact, many more advanced security researchers and monitors use badly formed forged user-agent strings as a good indicator of malicious intent or actual malware. It’s not definitive, but it’s a good place to start: the real world equivalent is when you do a car stop and someone gives you a false name and date of birth. It’s not proof that something is up beyond the offense of failure to identify, but it’s a damned good indicator that something will prove to be not Kosher. Many of the environmentals that we discuss are like this. A fingerprint comprises a whole boatload of things that, taken together, amount to a profile or (if you’re very lucky) a unique and identifying collection of components.
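One simple way to picture "taken together" is to boil a pile of environmentals down to a single identifier. The Python sketch below does that with a hash. It is only an illustration under my own assumptions: the attribute names are hypothetical, and a real fingerprinting product would presumably need to tolerate small changes (a new font, a browser update) rather than treating them as a brand-new device, which an exact hash like this one does.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of environmentals into one short, stable identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

if __name__ == "__main__":
    # Hypothetical attribute names; the values echo the examples in this post.
    me = {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4)",
        "screen": "1280x800x24",
        "font_count": 495,
        "accept_encoding": "gzip,deflate",
    }
    print(fingerprint(me))
    # Change any one attribute and the identifier changes with it.
    print(fingerprint({**me, "font_count": 496}))
```

No single attribute in that dictionary identifies anyone. The combination is what does the work.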

Referer One interesting tidbit you bring along with you is the exact last page you came from (if you followed a link as opposed to, say, just typing in the URL). This is the HTTP Referer header, and I discussed it in Part I, when Google.in sent me some traffic.

Content Negotiation This is where things start to get personal. Up until now, we’ve told on the purported IP address and the kind of computer and browser we claim to have. Here’s where we start telling stuff about us: what language we speak, what timezone we say we’re in, what applications we like, whether we’ll accept cookies, whether we run JavaScript, whether we run Flash, what images we can view, how we want you to send us messages, what kind of audio and video we can cope with, and a whole range of other attributes.

These look like:

(QuickTime HTML (QHTM); text/x-html-insertion; qht,qhtm) (MIDI; audio/mid; mid,midi,smf,kar) (QuickTime Player Movie; application/x-quicktimeplayer; qtl) (TIFF image; image/tiff; tif,tiff) (QuickTime Movie; video/quicktime; mov,qt,mqv) (MIDI; audio/x-midi; mid,midi,smf,kar) (JPEG2000 image; image/jpeg2000-image; jp2) (AIFF audio; audio/x-aiff; aiff,aif,aifc,cdda) (AC3 audio; audio/ac3; ac3) (MP3 playlist; audio/mpegurl; m3u,m3url) (3GPP2 media; video/3gpp2; 3g2,3gp2) (SMIL 1.0; application/smil; smi,sml,smil) (GSM audio; audio/x-gsm; gsm); …..

and can go on and on and on. You can get a fairly complete list of HTTP Header statements here.
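Among the headers in that list, Accept-Language is one of the easiest to read: a comma-separated run of language tags, each with an optional q weight between 0 and 1. As a small sketch of how a server might rank what you say you speak, here is a simplified Python parser. parse_accept_language is an illustrative helper of mine and skips some corner cases of the full HTTP grammar.

```python
def parse_accept_language(header):
    """Return (language, weight) pairs from an Accept-Language value, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        weight = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # treat a malformed weight as least preferred
        prefs.append((lang.strip(), weight))
    # sorted() is stable, so ties keep the order the browser sent them in.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)

if __name__ == "__main__":
    print(parse_accept_language("pl,en-US;q=0.8,en;q=0.6"))
```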

Later, some interesting analysis can be done by comparing these things which don’t add up – like an IP address resolving to Boston, MA, USA, a language setting of Polish and a timezone setting of GMT+2.
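A toy version of that comparison might look like the Python sketch below. Everything in it is an assumption made for illustration: the field names, the inconsistencies helper, and especially the tiny lookup table of what to "expect" from a country, which a real system would replace with proper geolocation and locale data. A mismatch is an indicator, not proof; plenty of Polish speakers live in Boston.

```python
# Hypothetical, deliberately rough expectations per geolocated country.
EXPECTED = {
    "US": {"languages": {"en"}, "utc_offsets": {-10, -9, -8, -7, -6, -5, -4}},
    "PL": {"languages": {"pl"}, "utc_offsets": {1, 2}},
}

def inconsistencies(observed):
    """Return notes on environmentals that disagree with the IP's location."""
    notes = []
    profile = EXPECTED.get(observed["ip_country"])
    if profile is None:
        return notes  # no expectations recorded for this country
    # Compare only the primary language subtag, e.g. "en" from "en-US".
    primary_language = observed["language"].split("-")[0].lower()
    if primary_language not in profile["languages"]:
        notes.append(
            f"language {observed['language']} is unusual for {observed['ip_country']}"
        )
    if observed["utc_offset"] not in profile["utc_offsets"]:
        notes.append(
            f"UTC offset {observed['utc_offset']:+d} is unusual for {observed['ip_country']}"
        )
    return notes

if __name__ == "__main__":
    # The example from the text: a Boston IP, Polish language, GMT+2 clock.
    visitor = {"ip_country": "US", "language": "pl", "utc_offset": 2}
    for note in inconsistencies(visitor):
        print(note)
```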

But let’s stay in capture for the time being.

Supercookies Many people know about cookies, and try to delete them. They remind me somewhat of the Windows De-frag, which was heavily relied upon by lazy call-center workers to get people off the phone by telling them to defrag their drives, and is now the first thing people do when their computer starts running slowly. To combat the wholesale deletion of cookies from browsers, the supercookie was invented. It is a file which is hidden in non-standard places by programs like Flash and Microsoft’s Silverlight, and you can read a good article about what they are here.

Other Environmentals
What I’ve just described is the most basic of the basics. Nothing in there is particularly sophisticated or difficult to elicit from your browser when you come a-knockin’. On the page above from the EFF, for example, they use a script from browserspy.dk to pull information about the fonts you have installed.

Browserspy also can pull, depending on how your browser is set up, a bunch of interesting cookie information showing where you’ve been lately.

They’re one of hundreds or thousands of groups tracking this kind of stuff. Firms like The 41st Parameter probe your system and its software and hardware components in a manner I have previously described as “diabolical”.

For a bit more information about the things discussed here, have a look at articles from the EFF on browser uniqueness and from How To Geek on user tracking, and the like.

So far this series has concentrated on giving the basics of what cyber criminals know about how people track them. In Part I we discussed IP addresses. In Part II we covered browser and computer environmentals and other things which can be used to create device fingerprints.

Next we get ready for Part III, in which we will discuss methods of thwarting these for the purposes of committing cyber crimes without leaving, well, fingerprints. Put another way, now that we’ve covered the basics of online attribution, we will explore ways to subvert them.