
Category Archives: Computer Engineering

Trojan Horses- The menace.

What Is A Trojan Horse????

  • Unauthorized instructions contained within a legitimate program. These
    instructions perform functions unknown to (and probably unwanted by) the user.
  • A legitimate program that has been altered by the placement of unauthorized
    instructions within it. These instructions perform functions unknown to (and
    probably unwanted by) the user.
  • Any program that appears to perform a desirable and necessary function but
    that (because of unauthorized instructions within it) performs functions
    unknown to (and probably unwanted by) the user.
  • Under a restricted environment (a restricted Unix shell or a restricted
    Windows computer), malicious trojans can’t do much, since they are restricted
    in their actions. But on a home PC, trojans can be lethal and quite
    destructive.

The Name- TROJAN

In the 12th century B.C., Greece declared war on the city of Troy. The dispute
erupted when the prince of Troy abducted the queen of Sparta and declared that
he wanted to make her his wife, which made the Greeks and especially the queen
of Sparta quite furious.
The Greeks gave chase and engaged Troy in a 10-year war, but unfortunately
for them, all of their efforts went down the drain. Troy was simply too well
fortified.
In a last effort, the Greek army pretended to be retreating, leaving behind a
huge wooden horse. The people of Troy saw the horse, and, thinking it was some
kind of a present from the Greeks, pulled the horse into their city, without
knowing that the finest soldiers of Greece were sitting inside it, since the
horse was hollow.
Under the cover of night, the soldiers snuck out and opened the gates of the
city, and later, together with the rest of the army, killed the entire army of
Troy.

This is why such a program is called a trojan horse – it pretends to do
something while it does something completely different, or does what it is
supposed to do and hides its malicious actions from the user’s prying eyes.

Remote Administration Trojans

These are the most popular trojans right now. Everyone wants to have one
because they give you access to your victim’s hard drive and let you perform
many functions on his computer (open and close his CD-ROM drive, put message
boxes on his computer, etc.), which will scare off most computer users and is
also a hell of a lot of fun to run on your friends or enemies.
Modern RATs (remote administration trojans) are very simple to use. They
come packaged with two files – the server file and the client file (if you
don’t know which is which, look for a help file, a FAQ, a readme or
instructions on the trojan’s homepage). Just fool someone into running the
server file and get his IP and you have FULL control over his/her computer
(some trojans are limited by their functions, but more functions also mean
larger server files. Some trojans are merely meant for the attacker to use them
to upload another trojan to his target’s computer and run it, hence they take
very little disk space). You can also bind trojans into other programs
which appear to be legitimate.
RATs have the common remote access trojan functions like: keylogging
(logging the target’s keystrokes and sometimes even interfering with them,
thus being able to use the target’s keyboard to type instead of the target and
say weird things in chatrooms or scare the hell out of people), upload and
download functions, taking a screenshot of the target’s monitor and so on.
Some people use the trojans for malicious purposes. They use them to
irritate, scare or harm their enemies, to scare the hell out of their friends
and seem like a “super hacker” to them, to get information about people and
spy on them, or just to get into people’s computers and delete stuff. This is
considered very lame.
There are many programs out there that detect the most common trojans (such
as Nemesis at blacksun.box.sk, which also detects people trying to access
your computer), but new trojans are released every day and it’s pretty hard to
keep track of things.
Trojans would usually want to automatically start whenever you boot up your
computer. If you use Windows, you can get b00tm0n from blacksun.box.sk (note:
at the time this tutorial was released, b00tm0n was not ready yet, but it
should be ready some time before the year 2000, so if you’re reading this after
Y2K, b00tm0n should probably be available at blacksun.box.sk). Under Unix, we
suggest getting some sort of IDS (Intrusion Detection System) program to
monitor your system.
Most Windows trojans hide from the Alt+Ctrl+Del menu (we haven’t seen any Unix
program that had the ability to hide itself from the processes list yet, but
you can never know – one day someone might discover a way to do so. Hell,
someone might have already done so). This is bad because there are people who
use the task list to see which processes are running. There are programs that
will tell you exactly what processes are running on your computer (such as
Wintop, which is the Windows version of the popular Unix program called top).
Some trojans, however, use fake names and it’s a little harder for certain
people to realize that they are infected.
Also, some trojans might simply open an FTP server on your computer (usually
NOT on port 21, the default FTP port, in order to be less noticeable). The FTP
server is, of course, unpassworded, or has a password which the attacker has
determined, and allows the attacker to download, upload and execute files
quickly and easily.

How RATS Work????

Remote administration trojans open a port on your computer and bind themselves
to it (the server file listens for incoming connections and data coming
through this port). Then, once someone runs his client program and
enters the victim’s IP, the trojan starts receiving commands from the
attacker and runs them on the victim’s computer.
Some trojans let you change this port to any other port and also set a
password so that only the person who infected this specific computer will be
able to use the trojan. However, some of these password protections can be
cracked due to bugs in the trojan (people who program RATs usually don’t have
much knowledge in the field of programming), and in some cases the creator of
the trojan would also put a backdoor (which can sometimes be detected, under
certain conditions) within the server file itself so he’ll be able to access
any computer running his trojan without the need to enter a password. This is
called “a backdoor within a backdoor”.
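
Since a RAT server has to listen on a port, one simple defensive check is to see whether anything on your own machine accepts connections on a given port. The Python sketch below is only an illustration of that idea. The port numbers are assumptions (the defaults commonly reported for NetBus and Sub7), and since a trojan can be set to use any other port, an empty result does not prove you are clean.

```python
import socket

# Assumption: commonly reported default TCP ports, not an authoritative list.
SUSPECT_PORTS = {
    12345: "NetBus (commonly reported default)",
    27374: "Sub7 (commonly reported default)",
}

def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0

def scan_local(ports=SUSPECT_PORTS) -> dict:
    """Check the local machine only; return the suspect ports that are open."""
    return {port: label for port, label in ports.items()
            if is_listening("127.0.0.1", port)}

if __name__ == "__main__":
    for port, label in scan_local().items():
        print(f"port {port} is open: {label}")
```

An open port only tells you that some program is listening there. You still have to find out which program it is before deciding that it is a trojan.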

The most popular RATs are Netbus (because of its simplicity), BO (has many
functions and hides itself pretty well) and Sub7 (lots of functions and easy
to use). These are all Windows RATs.
If you haven’t done so already, it is advised to get some RAT and play around
with it, just to see how the whole thing works.

Legitimate purposes

Some people use RATs to remotely administer computers they are allowed to have
access to. This is all good and fine, but anyway, you should always be careful
while working with RATs. Make sure you have legal access and the right to
remotely administer a computer before using a RAT on it.

Password Trojans

Yes, password trojans. Password trojans scour your computer for passwords and
then send them to the attacker or the author of the trojan. Whether it’s your
Internet password, your Hotmail password, your ICQ password or your IRC
passwords, there is a trojan for every password.
These trojans usually send the information back to the attacker via Email.

Privilege-Elevating Trojans

These trojans would usually be used to fool system administrators. They can
either be bound into a common system utility or pretend to be something
harmless and even quite useful and appealing. Once the administrator runs it,
the trojan will give the attacker more privileges on the system.
These trojans can also be sent to less-privileged users and give the attacker
access to their accounts.

Keyloggers

These trojans are very simple. They log all of your keystrokes (including
passwords), and then either save them on a file or Email them to the attacker
once in a while.
Keyloggers usually don’t take much disk space and can masquerade as important
utilities, thus making them very hard to detect.
Some keyloggers can also highlight passwords found in text boxes with titles
such as ‘enter password’ or just the word password somewhere within the title
text.

Destructive Trojans

These little fellows do nothing but damage your computer. These trojans can
destroy your entire hard drive, encrypt or just scramble important files and
basically make you feel very unpleasant. I wouldn’t want to bump into one in a
dark alley.
Some might seem like joke programs, while they are actually tearing every file
they encounter to pieces.

Joke Programs

Joke programs are nice, cute and harmless. They can pretend to be
formatting your hard drive, sending all of your passwords to some evil
cracker, self-destructing your computer, turning in all information about
illegal and pirated software you might have on your computer to the FBI, etc.
They are certainly no reason to worry (except if you work in tech
support, since inexperienced computer users tend to get scared off pretty
easily by joke programs).

HOW TO PROTECT YOURSELF??

IN UNIX

If you are working on your PC, DO NOT work as root! If you run a trojan as
root, you can endanger your entire system! The whole point in multi-users on a
single-user system is limiting yourself in such cases (or in case you want to
prevent yourself from doing anything stupid). Switch to root only when you
NEED root, and when you know what you’re running. Also, remember that even if
you’re working in a restricted environment, you still put the passwords and
files you have access to at risk. Also, if someone has a keylogger on
your system, and you type in some passwords (especially the root password),
they will be logged!
Also, DO NOT download any files from untrusted sources
(small websites, underground websites, Usenet newsgroups, IRC, etc.), even if
they come in the form of source code.

IN WINDOWS

Windows is a whole lot different in this aspect. Limiting yourself under
Windows is quite an annoyance. It is almost impossible to work like that, in
comparison to Unix.
Also, make sure you don’t run any untrusted software. There are many more evil
trojans for Windows than for Unix, since people are less motivated to
write trojans for Unix (because of all the security Unix imposes).
Also, when running in a restricted Windows environment, you cannot just act
like you’re so protected and all. Remember that people can still steal
passwords owned by the restricted user, and also, some trojans can break into
administrator privileges and then compromise your entire system, since
Windows imposes such lame security.

Oh, and one last tip – you should try to download and use at least some of the
types of trojans listed above, so you could get to know them better and be
able to remove them in case you get infected.


Proxy Servers

 

Some home networks, corporate intranets, and Internet Service Providers (ISPs) use proxy servers (also known as proxies). A proxy server acts as a “middleman” or broker between the two ends of a client/server network connection by intercepting all requests to the real server to see if it can fulfill the requests itself. If not, it forwards the request to the real server. Proxy servers work well between Web browsers and servers, or other applications, by supporting underlying network protocols like HTTP.

Proxy servers have two main purposes. The first is to improve performance for groups of users, sometimes dramatically. This is because a proxy server saves the results of all requests for a certain amount of time. Consider the case where both user X and user Y access the World Wide Web through a proxy server. First user X requests a certain Web page, which will be called Page 1. Sometime later, user Y requests the same page. Instead of forwarding the request to the Web server where Page 1 resides, which can be a time-consuming operation, the proxy server simply returns the Page 1 that it already fetched for user X. Since the proxy server is often on the same network as the user, this is a much faster operation. Real proxy servers support hundreds or thousands of users. The major online services such as America Online, MSN and Yahoo, for example, employ an array of proxy servers.
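
The user X / user Y scenario can be sketched in a few lines of Python. This is a toy illustration, not a real HTTP proxy: the class name, the `fetch` callback standing in for the trip to the real Web server, and the 60-second lifetime are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple

class CachingProxy:
    """Toy cache in front of a fetch function; not a real HTTP proxy."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # hypothetical function that contacts the real server
        self._ttl = ttl_seconds    # how long a saved response stays usable
        self._clock = clock        # injectable so the expiry logic can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # cache hit: no trip to the real server
        body = self._fetch(url)    # miss or expired: forward the request
        self._cache[url] = (now, body)
        return body

if __name__ == "__main__":
    forwarded = []

    def origin(url: str) -> str:
        forwarded.append(url)
        return f"<html>contents of {url}</html>"

    proxy = CachingProxy(origin)
    proxy.get("http://www.example.com/page1")   # user X: forwarded to the server
    proxy.get("http://www.example.com/page1")   # user Y: answered from the cache
    print("requests that reached the server:", len(forwarded))
```

A real proxy also has to respect what the server says about how long a page may be cached. The fixed lifetime here just keeps the example short.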

The second purpose of proxy servers is to filter requests. For example, a company might use a proxy server to prevent its employees from accessing a specific set of Web sites.
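
Request filtering can be as simple as comparing the requested host against a list. The sketch below is a minimal Python illustration. The host names in the blocklist are made up, and a real filtering proxy would load its policy from configuration instead of hard-coding it.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for the example.
BLOCKED_HOSTS = {"games.example.com", "video.example.net"}

def is_allowed(url: str, blocked=BLOCKED_HOSTS) -> bool:
    """Return False if the URL's host is a blocked host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return not any(host == b or host.endswith("." + b) for b in blocked)

if __name__ == "__main__":
    for url in ("http://games.example.com/play", "http://www.example.org/"):
        print(url, "->", "forward" if is_allowed(url) else "refuse")
```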

Proxies can do many other things. For example, they could translate pages between languages. They could shrink the size of a response so it fits on one’s mobile phone screen. They could also filter nasty language or subjects.

Firewalling and Filtering- Proxy Servers

Proxy servers work at the Application layer (Layer 7) of the OSI model. As such, they aren’t as popular as ordinary firewalls that work at lower layers and support application-independent filtering. Proxy servers are also more difficult to install and maintain than firewalls, as proxy functionality for each application protocol like HTTP, SMTP, or SOCKS must be configured individually. But, a properly configured proxy server improves network security and performance. Proxies have capability that ordinary firewalls simply cannot provide.

Some network administrators deploy both firewalls and proxy servers to work together. To do this, they install both firewall and proxy server software on a server gateway.

Connection Sharing with Proxy Servers

Various software products for connection sharing on small home networks have appeared in recent years. In medium- and large-sized networks, however, actual proxy servers offer a more scalable and cost-effective alternative for shared Internet access. Rather than give each client computer a direct Internet connection, all internal connections can be funneled through one or more proxies that in turn connect to the outside.

Proxy Servers and Caching

The caching of Web pages by proxy servers can improve a network’s “quality of service” in three ways. First, caching may conserve bandwidth on the network, increasing scalability. Next, caching can improve response time experienced by clients. With an HTTP proxy cache, for example, Web pages can load more quickly into the browser. Finally, proxy server caches increase availability. Web pages or other files in the cache remain accessible even if the original source or an intermediate network link goes offline.

Types of Proxy servers

Web

Proxies that attempt to block offensive web content are implemented as web proxies. Other web proxies reformat web pages for a specific purpose or audience; for example, Skweezer reformats web pages for cell phones and PDAs. Network operators can also deploy proxies to intercept computer viruses and other hostile content served from remote web pages.

A special case of web proxies are “CGI proxies.” These are web sites that allow a user to access a site through them. They generally use PHP or CGI to implement the proxying functionality. CGI proxies are frequently used to gain access to web sites blocked by corporate or school proxies. Since they also hide the user’s own IP address from the web sites they access through the proxy, they are sometimes also used to gain a degree of anonymity, called “Proxy Avoidance.”

Intercepting

Many organizations, including corporations, schools, and families, use a proxy server to enforce acceptable network use policies (see content-control software) or to provide security, anti-malware and/or caching services. A traditional web proxy is not transparent to the client application, which must be configured to use the proxy (manually or with a configuration script). In some cases, where alternative means of connection to the Internet are available (e.g. a SOCKS server or NAT connection), the user may be able to avoid policy control by simply resetting the client configuration and bypassing the proxy. Furthermore, administration of browser configuration can be a burden for network administrators.

An intercepting proxy (also known as a forced proxy, and often incorrectly called a transparent proxy) combines a proxy server with NAT. Connections made by client browsers through the NAT are intercepted and redirected to the proxy without client-side configuration (or often knowledge).

Intercepting proxies are commonly used in businesses to prevent avoidance of acceptable use policy, and to ease administrative burden, since no client browser configuration is required.

Intercepting proxies are also commonly used by Internet Service Providers in many countries in order to reduce upstream link bandwidth requirements by providing a shared cache to their customers.

It is often possible to detect the use of an intercepting proxy server by comparing the external IP address to the address seen by an external web server, or by examining the HTTP headers on the server side.
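
The header check can be sketched on the server side as below. This is a hedged Python illustration: `Via` is a standard HTTP header, while `X-Forwarded-For` and `Forwarded` are headers that proxies commonly (but not always) add, so the absence of all three proves nothing. The function name is made up for the example.

```python
# Header names that suggest a proxy handled the request. A proxy is free to
# omit all of them, so an empty result does not mean "no proxy".
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded")

def proxy_hints(headers: dict) -> dict:
    """Return the subset of request headers that suggest a proxy was involved."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in PROXY_HEADERS if name in lowered}

if __name__ == "__main__":
    request_headers = {"Host": "www.example.com", "Via": "1.1 cache.example.net"}
    print(proxy_hints(request_headers))
```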

Some poorly implemented intercepting proxies have historically had certain downsides, e.g. an inability to use user authentication if the proxy does not recognize that the browser was not intending to talk to a proxy. Some problems are described in RFC 3143 (Known HTTP Proxy/Caching Problems). A well-implemented proxy should not inhibit browser authentication at all.

Open

An open proxy is a proxy server which will accept client connections from any IP address and make connections to any Internet resource. Abuse of open proxies is currently implicated in a significant portion of e-mail spam delivery. Spammers frequently install open proxies on unwitting end users’ operating systems by means of computer viruses designed for this purpose. Internet Relay Chat (IRC) abusers also frequently use open proxies to cloak their identities.

Because proxies might be used for abuse, system administrators have developed a number of ways to refuse service to open proxies. IRC networks such as the Blitzed network automatically test client systems for known types of open proxy. Likewise, an email server may be configured to automatically test e-mail senders for open proxies, using software such as Michael Tokarev’s “proxycheck.”

Groups of IRC and electronic mail operators run DNSBLs publishing lists of the IP addresses of known open proxies, such as AHBL, CBL, NJABL, and SORBS.
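
A DNSBL is queried through ordinary DNS: the octets of the IPv4 address are reversed, the list’s zone name is appended, and the resulting name is looked up. The Python sketch below shows that convention. The zone name is a placeholder and not a real list, and `is_listed` treats any failed lookup as “not listed”, which is a simplification.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)          # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """True if the query name resolves; False if the lookup fails.

    Simplification: a DNS outage also shows up as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

if __name__ == "__main__":
    # "dnsbl.example.org" is a placeholder zone, not a real blocklist.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```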

The ethics of automatically testing clients for open proxies are controversial. Some experts, such as Vernon Schryver, consider such testing to be equivalent to an attacker portscanning the client host. Others consider the client to have solicited the scan by connecting to a server whose terms of service include testing.

Reverse

A reverse proxy is a proxy server that is installed in the neighborhood of one or more web servers. All traffic coming from the Internet and with a destination of one of the web servers goes through the proxy server. There are several reasons for installing reverse proxy servers:

  • Security: the proxy server is an additional layer of defense and therefore protects the web servers further up the chain.
  • Encryption / SSL acceleration: when secure web sites are created, the SSL encryption is often not done by the web server itself, but by a reverse proxy that is equipped with SSL acceleration hardware. See Secure Sockets Layer.
  • Load balancing: the reverse proxy can distribute the load to several web servers, each web server serving its own application area. In such a case, the reverse proxy may need to rewrite the URLs in each web page (translation from externally known URLs to the internal locations).
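
The URL translation mentioned in the load balancing item can be sketched as a lookup table from external path prefixes to internal servers. This Python example is only an illustration. The prefixes, internal addresses and function names are invented, and a real reverse proxy would also rewrite links inside the HTML it returns.

```python
# Hypothetical mapping: each application area lives on its own internal server.
ROUTES = {
    "/shop/": "http://10.0.0.11:8080/",
    "/blog/": "http://10.0.0.12:8080/",
}

def to_internal(path: str, routes=ROUTES) -> str:
    """Translate an externally known path into an internal URL."""
    # Try the longest prefix first so that more specific routes win.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix] + path[len(prefix):]
    raise LookupError(f"no backend for {path}")

def to_external(url: str, routes=ROUTES) -> str:
    """Rewrite an internal URL found in a page back to its external path."""
    for prefix, backend in routes.items():
        if url.startswith(backend):
            return prefix + url[len(backend):]
    return url  # not one of our backends; leave it untouched

if __name__ == "__main__":
    print(to_internal("/shop/cart?id=1"))
    print(to_external("http://10.0.0.12:8080/post/5"))
```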

Split

A split proxy is effectively a pair of proxies installed across two computers. Since they are effectively two parts of the same program, they can communicate with each other in a more efficient way than they can communicate with a more standard resource or tool such as a website or browser. This is ideal for compressing data over a slow link, such as a wireless or mobile data service, and also for reducing the issues regarding high latency links (such as satellite internet) where establishing a TCP connection is time consuming. Taking the example of web browsing, the user’s browser is pointed to a local proxy which then communicates with its other half at some remote location. This remote server fetches the requisite data, repackages it and sends it back to the user’s local proxy, which unpacks the data and presents it to the browser in the standard fashion.
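
The “repackage” and “unpack” steps could be as simple as compressing on one side and decompressing on the other. The Python sketch below uses the standard zlib module to show the round trip. The function names are made up, and real split proxies do more than this (for example, reusing one long-lived connection between the two halves).

```python
import zlib

def remote_half_pack(payload: bytes) -> bytes:
    """Remote proxy: compress the fetched data before it crosses the slow link."""
    return zlib.compress(payload, 9)   # 9 = best compression

def local_half_unpack(packed: bytes) -> bytes:
    """Local proxy: restore the original bytes before handing them to the browser."""
    return zlib.decompress(packed)

if __name__ == "__main__":
    page = b"<p>hello from the remote half</p>" * 200
    packed = remote_half_pack(page)
    assert local_half_unpack(packed) == page
    print(len(page), "bytes fetched,", len(packed), "bytes sent over the link")
```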

Anonymous Proxy Servers 

Anonymous proxy servers hide one’s IP address and thereby prevent unauthorized access to that computer through the Internet. They do not provide anyone with that IP address and effectively hide all information about the user at hand. Besides that, they don’t even let anyone know that you are surfing through a proxy server. Anonymous proxy servers can be used for all kinds of Web services, such as Web mail (MSN Hotmail, Yahoo mail), web chat rooms, FTP archives, etc. ProxySite.com is a place where a huge list of public proxies is compiled. Its database always holds the most recent lists: the proxies are checked every minute, and the list is updated daily from various sources. The system collects and sorts the proxy servers and checks which ones allow anonymous access. Search results can always be saved as an Excel file.


 

Firewalls

What is a network firewall?

A firewall is a system or group of systems that enforces an access control policy between two or more networks. The actual means by which this is accomplished varies widely, but in principle, the firewall can be thought of as a pair of mechanisms: one which exists to block traffic, and the other which exists to permit traffic. Some firewalls place a greater emphasis on blocking traffic, while others emphasize permitting traffic. Probably the most important thing to recognize about a firewall is that it implements an access control policy. If you don’t have a good idea of what kind of access you want to allow or to deny, a firewall really won’t help you. It’s also important to recognize that the firewall’s configuration, because it is a mechanism for enforcing policy, imposes its policy on everything behind it. Administrators for firewalls managing the connectivity for a large number of hosts therefore have a heavy responsibility.

Need of Firewall??

The Internet, like any other society, is plagued with the kind of jerks who enjoy the electronic equivalent of writing on other people’s walls with spraypaint, tearing their mailboxes off, or just sitting in the street blowing their car horns. Some people try to get real work done over the Internet, and others have sensitive or proprietary data they must protect. Usually, a firewall’s purpose is to keep the jerks out of your network while still letting you get your job done.

Many traditional-style corporations and data centers have computing security policies and practices that must be followed. In a case where a company’s policies dictate how data must be protected, a firewall is very important, since it is the embodiment of the corporate policy. Frequently, the hardest part of hooking to the Internet, if you’re a large company, is not justifying the expense or effort, but convincing management that it’s safe to do so. A firewall provides not only real security–it often plays an important role as a security blanket for management.

What can a firewall protect against??

Some firewalls permit only email traffic through them, thereby protecting the network against any attacks other than attacks against the email service. Other firewalls provide less strict protections, and block services that are known to be problems.

Generally, firewalls are configured to protect against unauthenticated interactive logins from the “outside” world. This, more than anything, helps prevent vandals from logging into machines on your network. More elaborate firewalls block traffic from the outside to the inside, but permit users on the inside to communicate freely with the outside. The firewall can protect you against any type of network-borne attack if you unplug it.
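
The “block from the outside, permit from the inside” idea can be written down as an ordered rule list where the first matching rule wins and anything unmatched is blocked. The Python sketch below is a simplified, stateless illustration. The networks, the mail host address and the rule format are assumptions for the example, and a real firewall also has to track connections so that replies to inside users get back in.

```python
import ipaddress
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    action: str                  # "permit" or "block"
    src: str                     # source network in CIDR notation
    dst: str                     # destination network in CIDR notation
    port: Optional[int] = None   # None matches any destination port

# Hypothetical policy: 10.0.0.0/8 is "inside", everything else is "outside".
POLICY = [
    Rule("permit", "10.0.0.0/8", "0.0.0.0/0"),         # inside may talk to anyone
    Rule("permit", "0.0.0.0/0", "10.0.0.25/32", 25),   # outside may reach the mail host
]

def decide(src: str, dst: str, port: int, policy=POLICY) -> str:
    """Return the action of the first matching rule, or "block" if none matches."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in policy:
        if (s in ipaddress.ip_network(rule.src)
                and d in ipaddress.ip_network(rule.dst)
                and rule.port in (None, port)):
            return rule.action
    return "block"   # default deny

if __name__ == "__main__":
    print(decide("10.1.2.3", "198.51.100.7", 80))    # inside user browsing out
    print(decide("203.0.113.9", "10.0.0.7", 23))     # outside login attempt
```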

Firewalls are also important since they can provide a single “choke point” where security and audit can be imposed. Unlike in a situation where a computer system is being attacked by someone dialing in with a modem, the firewall can act as an effective “phone tap” and tracing tool. Firewalls provide an important logging and auditing function; often they provide summaries to the administrator about what kinds and amount of traffic passed through it, how many attempts there were to break into it, etc.

Because of this, firewall logs are critically important data. They can be used as evidence in a court of law in most countries. You should safeguard, analyze and protect your firewall logs accordingly.

What can’t a firewall protect against?

Firewalls can’t protect against attacks that don’t go through the firewall. Many corporations that connect to the Internet are very concerned about proprietary data leaking out of the company through that route. Unfortunately for those concerned, magnetic tapes, compact discs, DVDs, or USB flash drives can just as effectively be used to export data. Many organizations that are terrified (at a management level) of Internet connections have no coherent policy about how dial-in access via modems should be protected. It’s silly to build a six-foot thick steel door when you live in a wooden house, but there are a lot of organizations out there buying expensive firewalls and neglecting the numerous other back-doors into their network. For a firewall to work, it must be a part of a consistent overall organizational security architecture. Firewall policies must be realistic and reflect the level of security in the entire network. For example, a site with top secret or classified data doesn’t need a firewall at all: they shouldn’t be hooking up to the Internet in the first place, or the systems with the really secret data should be isolated from the rest of the corporate network.

Another thing a firewall can’t really protect you against is traitors or idiots inside your network. While an industrial spy might export information through your firewall, he’s just as likely to export it through a telephone, FAX machine, or Compact Disc. CDs are a far more likely means for information to leak from your organization than a firewall. Firewalls also cannot protect you against stupidity. Users who reveal sensitive information over the telephone are good targets for social engineering; an attacker may be able to break into your network by completely bypassing your firewall, if he can find a “helpful” employee inside who can be fooled into giving access to a modem pool. Before deciding this isn’t a problem in your organization, ask yourself how much trouble a contractor has getting logged into the network or how much difficulty a user who forgot his password has getting it reset. If the people on the help desk believe that every call is internal, you have a problem that can’t be fixed by tightening controls on the firewalls.

Firewalls can’t protect against tunneling over most application protocols to trojaned or poorly written clients. There are no magic bullets and a firewall is not an excuse to not implement software controls on internal networks or ignore host security on servers. Tunneling “bad” things over HTTP, SMTP, and other protocols is quite simple and trivially demonstrated. Security isn’t “fire and forget”.

Lastly, firewalls can’t protect against bad things being allowed through them. For instance, many Trojan Horses use the Internet Relay Chat (IRC) protocol to allow an attacker to control a compromised internal host from a public IRC server. If you allow any internal system to connect to any external system, then your firewall will provide no protection from this vector of attack.

What about viruses??

Firewalls can’t protect very well against things like viruses or malicious software (malware). There are too many ways of encoding binary files for transfer over networks, and too many different architectures and viruses to try to search for them all. In other words, a firewall cannot replace security-consciousness on the part of your users. In general, a firewall cannot protect against a data-driven attack–attacks in which something is mailed or copied to an internal host where it is then executed. This form of attack has occurred in the past against various versions of sendmail, ghostscript, scripting mail user agents like Outlook, and Web browsers like Internet Explorer.

Organizations that are deeply concerned about viruses should implement organization-wide virus control measures. Rather than only trying to screen viruses out at the firewall, make sure that every vulnerable desktop has virus scanning software that is run when the machine is rebooted. Blanketing your network with virus scanning software will protect against viruses that come in via floppy disks, CDs, modems, and the Internet. Trying to block viruses at the firewall will only protect against viruses from the Internet. Virus scanning at the firewall or e-mail gateway will stop a large number of infections.

 

Internet Cookies

Most of the press articles and many simple books define cookies as-

“Cookies are programs that Web sites put on your hard disk. They sit on your computer gathering information about you and everything you do on the Internet, and whenever the Web site wants to it can download all of the information the cookie has collected.”
The problem is, none of that information is correct. Cookies are not programs, and they cannot run like programs do. Therefore, they cannot gather any information on their own. Nor can they collect any personal information about you from your machine.
Here is a valid definition of a cookie: A cookie is a piece of text that a Web server can store on a user’s hard disk. Cookies allow a Web site to store information on a user’s machine and later retrieve it. The pieces of information are stored as name-value pairs.
Most Internet cookies are incredibly simple, but they are one of those things that have taken on a life of their own. Cookies started receiving tremendous media attention back in 2000 because of Internet privacy concerns, and the debate still rages.
On the other hand, cookies provide capabilities that make the Web much easier to navigate. The designers of almost every major site use them because they provide a better user experience and make it much easier to gather accurate information about the site’s visitors.
In this article, we will take a look at the basic technology behind cookies, as well as some of the features they enable.
If you use Microsoft’s Internet Explorer to browse the Web, you can see all of the cookies that are stored on your machine. The most common place for them to reside is in a directory called c:\windows\cookies. When I look in that directory on my machine, I find 165 files. Each file is a text file that contains name-value pairs, and there is one file for each Web site that has placed cookies on my machine.
You can see in the directory that each of these files is a simple, normal text file. You can see which Web site placed the file on your machine by looking at the file name (the information is also stored inside the file). You can open each file by clicking on it.
The vast majority of sites store just one piece of information — a user ID — on your machine. But a site can store many name-value pairs if it wants to.
A name-value pair is simply a named piece of data. It is not a program, and it cannot “do” anything. A Web site can retrieve only the information that it has placed on your machine. It cannot retrieve information from other cookie files, nor any other information from your machine.
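
To see that cookie data really is just text, here is a small Python sketch that splits a cookie string into its name-value pairs using the standard http.cookies module. The cookie names and values in the example are made up.

```python
from http.cookies import SimpleCookie

def parse_cookie_header(header_value: str) -> dict:
    """Turn 'name=value; name2=value2' into a plain dict of strings."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}

if __name__ == "__main__":
    # Hypothetical cookie text: a user ID and one preference.
    print(parse_cookie_header("UserID=A9A3BECE0563982D; theme=dark"))
```

The result is nothing more than a dictionary of strings, which is all a Web site ever gets back from a cookie.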

How does cookie data move?

As you saw in the previous section, cookie data is simply name-value pairs stored on your hard disk by a Web site. That is all cookie data is. The Web site stores the data, and later it receives it back. A Web site can only receive the data it has stored on your machine. It cannot look at any other cookie, nor anything else on your machine.
The data moves in the following manner:

  • If you type the URL of a Web site into your browser, your browser sends a request to the Web site for the page (see How Web Servers Work for a discussion). For example, if you type the URL http://www.amazon.com into your browser, your browser will contact Amazon’s server and request its home page.
  • When the browser does this, it will look on your machine for a cookie file that Amazon has set. If it finds an Amazon cookie file, your browser will send all of the name-value pairs in the file to Amazon’s server along with the URL. If it finds no cookie file, it will send no cookie data.
  • Amazon’s Web server receives the cookie data and the request for a page. If name-value pairs are received, Amazon can use them.
  • If no name-value pairs are received, Amazon knows that you have not visited before. The server creates a new ID for you in Amazon’s database and then sends name-value pairs to your machine in the header for the Web page it sends. Your machine stores the name-value pairs on your hard disk.
  • The Web server can change name-value pairs or add new pairs whenever you visit the site and request a page.
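The steps above can be sketched from the server's side as follows. This is only an illustration under stated assumptions: the visitors dictionary stands in for the site's database, the cookie name visitor_id is invented, and a real server would sit behind an actual HTTP framework.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical in-memory stand-in for the site's visitor database.
visitors = {}


def handle_request(cookie_header: Optional[str]):
    """Return (visitor_id, response_headers) for one page request."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)

    response_headers = {}
    morsel = jar.get("visitor_id")
    if morsel is not None and morsel.value in visitors:
        # Name-value pairs came back: this browser has visited before.
        visitor_id = morsel.value
    else:
        # No usable cookie: create a new ID and send it in the response header.
        visitor_id = uuid.uuid4().hex
        visitors[visitor_id] = {}
        out = SimpleCookie()
        out["visitor_id"] = visitor_id
        response_headers["Set-Cookie"] = out["visitor_id"].OutputString()
    return visitor_id, response_headers


# First visit: no cookie is sent, so the server issues one.
first_id, first_headers = handle_request(None)
# Second visit: the browser returns the stored pair with its request.
second_id, second_headers = handle_request(first_headers["Set-Cookie"])
```

On the second call the server recognizes the ID and sends no new cookie, which mirrors the returning-visitor case in the list.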

There are other pieces of information that the server can send with the name-value pair. One of these is an expiration date. Another is a path (so that the site can associate different cookie values with different parts of the site).
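As a rough sketch of what the extra information looks like, the snippet below builds a cookie header that carries an expiration date and a path. The cookie name, value, date and path are all assumptions chosen for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str, expires_at: datetime, path: str = "/") -> str:
    """Build the value of a Set-Cookie header with an expiry and a path."""
    cookie = SimpleCookie()
    cookie[name] = value
    # The expiration date is sent as an HTTP date string in GMT.
    cookie[name]["expires"] = format_datetime(expires_at, usegmt=True)
    # The path limits the cookie to one part of the site.
    cookie[name]["path"] = path
    return cookie[name].OutputString()


# Hypothetical values: a zip-code cookie that only applies under /weather.
header = build_set_cookie(
    "zip", "27601", datetime(2031, 1, 1, tzinfo=timezone.utc), path="/weather"
)
print(header)
```

The browser keeps the pair until the expiration date passes and sends it back only for pages under the given path.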
You have control over this process. You can set an option in your browser so that the browser informs you every time a site sends name-value pairs to you. You can then accept or deny the values.

How do Web sites use cookies?

Cookies evolved because they solve a big problem for the people who implement Web sites. In the broadest sense, a cookie allows a site to store state information on your machine. This information lets a Web site remember what state your browser is in. An ID is one simple piece of state information — if an ID exists on your machine, the site knows that you have visited before. The state is, “Your browser has visited the site at least one time,” and the site knows your ID from that visit.
Web sites use cookies in many different ways. Here are some of the most common examples:

  • Sites can accurately determine how many people actually visit the site. It turns out that because of proxy servers, caching, concentrators and so on, the only way for a site to accurately count visitors is to set a cookie with a unique ID for each visitor. Using cookies, sites can determine:
    • How many visitors arrive
    • How many are new versus repeat visitors
    • How often a visitor has visited
  • The way the site does this is by using a database. The first time a visitor arrives, the site creates a new ID in the database and sends the ID as a cookie. The next time the user comes back, the site can increment a counter associated with that ID in the database and know how many times that visitor returns.
  • Sites can store user preferences so that the site can look different for each visitor (often referred to as customization). For example, if you visit msn.com, it offers you the ability to “change content/layout/color.” It also allows you to enter your zip code and get customized weather information. When you enter your zip code, the following name-value pair gets added to MSN’s cookie file:
    WEAT  CC=NC%5FRaleigh%2DDurham&REGION=  www.msn.com/
    Since I live in Raleigh, N.C., this makes sense.
  • Most sites seem to store preferences like this in the site’s database and store nothing but an ID as a cookie, but storing the actual values in name-value pairs is another way to do it (we’ll discuss later why this approach has lost favor).
  • E-commerce sites can implement things like shopping carts and “quick checkout” options. The cookie contains an ID and lets the site keep track of you as you add different things to your cart. Each item you add to your shopping cart is stored in the site’s database along with your ID value. When you check out, the site knows what is in your cart by retrieving all of your selections from the database. It would be impossible to implement a convenient shopping mechanism without cookies or something like them.
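The visitor counting and shopping cart ideas in the list above both come down to keying database records by the ID in the cookie. Here is a minimal sketch, assuming two in-memory dictionaries in place of a real database and using invented function names and item names.

```python
from collections import defaultdict

# Hypothetical stand-ins for tables in the site's database,
# both keyed by the ID stored in the visitor's cookie.
visit_counts = defaultdict(int)
carts = defaultdict(list)


def record_visit(visitor_id: str) -> int:
    """Increment and return the visit counter for this ID."""
    visit_counts[visitor_id] += 1
    return visit_counts[visitor_id]


def add_to_cart(visitor_id: str, item: str) -> None:
    """Store a selected item in the database under the visitor's ID."""
    carts[visitor_id].append(item)


def checkout(visitor_id: str) -> list:
    """Retrieve everything in the cart and empty it."""
    return carts.pop(visitor_id, [])


record_visit("id-123")
visits = record_visit("id-123")  # same ID returns: a repeat visitor
add_to_cart("id-123", "book")
add_to_cart("id-123", "lamp")
order = checkout("id-123")
```

The cookie itself never holds the cart or the counter. It only holds the ID that lets the site look them up.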

In all of these examples, note that what the database stores is the things you have selected from the site, the pages you have viewed on the site, the information you have given to the site in online forms, and so on. All of the information is stored in the site’s database, and in most cases, a cookie containing your unique ID is all that is stored on your computer.
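The MSN weather pair shown earlier is the exception to this pattern, because the value itself lives in the cookie. Values stored that way are percent-encoded, and a short sketch with Python's urllib.parse shows what the pair decodes to. Only the CC=...&REGION= portion is taken from the example. How MSN actually interprets those fields is not something this sketch claims to know.

```python
from urllib.parse import parse_qs, unquote

# The value portion of the cookie pair quoted in the MSN example.
raw_value = "CC=NC%5FRaleigh%2DDurham&REGION="

# Split the value into its sub-fields, keeping the empty REGION field.
fields = parse_qs(raw_value, keep_blank_values=True)

# Decode a single percent-encoded string directly.
city_code = unquote("NC%5FRaleigh%2DDurham")

print(fields)
print(city_code)
```

Here %5F decodes to an underscore and %2D to a hyphen, so the stored city code reads NC_Raleigh-Durham.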

Problems with Cookies

Cookies are not a perfect state mechanism, but they certainly make a lot of things possible that would be impossible otherwise. Here are several of the things that make cookies imperfect.

  • People often share machines – Machines in public areas, and many machines in offices and homes, are shared by multiple people. Let’s say that you use a public machine (in a library, for example) to purchase something from an online store. The store will leave a cookie on the machine, and someone could later try to purchase something from the store using your account. That is why stores usually post large warnings about this problem. Even so, mistakes can happen. For example, I once used my wife’s machine to purchase something from Amazon. Later, she visited Amazon and clicked the “one-click” button, not realizing that it really does allow the purchase of a book in exactly one click.
  • On something like a Windows NT machine or a UNIX machine that uses accounts properly, this is not a problem, because the accounts keep each user’s cookies separate. On operating systems where accounts are handled more loosely, it is a problem.
  • If you try the example above on a public machine, and if other people using the machine have visited HowStuffWorks, then the history URL may show a very long list of files.
  • Cookies get erased – If you have a problem with your browser and call tech support, probably the first thing that tech support will ask you to do is to erase all of the temporary Internet files on your machine. When you do that, you lose all of your cookie files. Now when you visit a site again, that site will think you are a new user and assign you a new cookie. This tends to skew the site’s record of new versus return visitors, and it also can make it hard for you to recover previously stored preferences. This is why sites ask you to register in some cases — if you register with a user name and a password, you can log in, even if you lose your cookie file, and restore your preferences. If preference values are stored directly on the machine (as in the MSN weather example above), then recovery is impossible. That is why many sites now store all user information in a central database and store only an ID value on the user’s machine.
  • If you erase your cookie file for HowStuffWorks and then revisit the history URL in the previous section, you will find that HowStuffWorks has no history for you. The site has to create a new ID and cookie file for you, and that new ID has no data stored against it in the database. (Also note that the HowStuffWorks Registration System allows you to reset your history list whenever you like.)
  • Multiple machines – People often use more than one machine during the day. For example, I have a machine in the office, a machine at home and a laptop for the road. Unless the site is specifically engineered to solve the problem, I will have a separate cookie file on each of the three machines. Any site that I visit from all three machines will track me as three separate users. It can be annoying to set preferences three times. Again, a site that allows registration and stores preferences centrally may make it easy for me to have the same account on three machines, but the site developers must plan for this when designing the site.
  • If you visit the history URL demonstrated in the previous section from one machine and then try it again from another, you will find that your history lists are different. This is because the server created two IDs for you, one on each machine.

There are probably no easy solutions to these problems, other than asking users to register and storing everything in a central database.
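A minimal sketch of that registration approach is shown below. It assumes two in-memory dictionaries in place of a central database, invents the function names and the sample zip code, and leaves out the password check entirely.

```python
import uuid

# Hypothetical central storage: preferences belong to the account,
# while each machine only holds a cookie ID that points at the account.
preferences = {}  # username -> {setting: value}
sessions = {}     # cookie ID -> username


def log_in(username: str) -> str:
    """Issue a fresh cookie ID for this machine (credential check omitted)."""
    cookie_id = uuid.uuid4().hex
    sessions[cookie_id] = username
    return cookie_id


def save_preference(cookie_id: str, key: str, value: str) -> None:
    """Store a preference against the account behind this cookie ID."""
    preferences.setdefault(sessions[cookie_id], {})[key] = value


def load_preferences(cookie_id: str) -> dict:
    """Look up the account's preferences; unknown IDs get nothing."""
    username = sessions.get(cookie_id)
    return dict(preferences.get(username, {})) if username else {}


office_cookie = log_in("alice")
save_preference(office_cookie, "zip", "27601")
laptop_cookie = log_in("alice")  # different machine, different cookie
laptop_prefs = load_preferences(laptop_cookie)
```

Because the preferences hang off the account rather than the cookie, losing a cookie or switching machines only costs you a new login.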

Cookies on the Internet: Privacy Issues

If you have read the article to this point, you may be wondering why there has been such an uproar in the media about cookies and Internet privacy. You have seen in this article that cookies are benign text files, and you have also seen that they provide lots of useful capabilities on the Web.
There are two things that have caused the strong reaction around cookies:

  • The first is something that has plagued consumers for decades. Let’s say that you purchase something from a traditional mail order catalog. The catalog company has your name, address and phone number from your order, and it also knows what items you have purchased. It can sell your information to others who might want to sell similar products to you. That is the fuel that makes telemarketing and junk mail possible.
  • On a Web site, the site can track not only your purchases, but also the pages that you read, the ads that you click on, etc. If you then purchase something and enter your name and address, the site potentially knows much more about you than a traditional mail order company does. This makes targeting much more precise, and that makes a lot of people uncomfortable.
  • Different sites have different policies. HowStuffWorks has a strict privacy policy and does not sell or share any personal information about our readers with any third party except in cases where you specifically tell us to do so (for example, in an opt-in e-mail program). We do aggregate information and distribute it. For example, if a reporter asks how many visitors HowStuffWorks has or which page on the site is the most popular, we create those aggregate statistics from data in the database.
  • The second is unique to the Internet. There are certain infrastructure providers that can actually create cookies that are visible on multiple sites. DoubleClick is the most famous example of this. Many companies use DoubleClick to serve banner ads on their sites. DoubleClick can place small (1×1 pixel) GIF files on the site that allow DoubleClick to load cookies on your machine. DoubleClick can then track your movements across multiple sites. It can potentially see the search strings that you type into search engines (due more to the way some search engines implement their systems than to any sinister intent). Because it can gather so much information about you from multiple sites, DoubleClick can form very rich profiles. These are still anonymous, but they are rich.
  • DoubleClick then went one step further. By acquiring a company, DoubleClick threatened to link these rich anonymous profiles back to name and address information — it threatened to personalize them, and then sell the data. That began to look very much like spying to most people, and that is what caused the uproar.
  • DoubleClick and companies like it are in a unique position to do this sort of thing, because they serve ads on so many sites. Cross-site profiling is not a capability available to individual sites, because cookies are site specific.
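The mechanism behind cross-site profiles can be sketched in a few lines. This is a simplified model, not a description of any real ad network: the function name, the profiles dictionary and the example.com-style addresses are all invented for illustration.

```python
import uuid
from collections import defaultdict
from typing import Optional

# Hypothetical ad server state: one anonymous profile per cookie ID.
profiles = defaultdict(list)


def serve_pixel(tracker_cookie: Optional[str], referring_page: str) -> str:
    """Handle a request for the 1x1 image embedded on some site's page.

    The browser sends the ad server's own cookie (if any), and the
    request reveals which page embedded the image.
    """
    if tracker_cookie is None:
        tracker_cookie = uuid.uuid4().hex  # first sighting: issue an ID
    profiles[tracker_cookie].append(referring_page)
    return tracker_cookie


# The same browser loads the pixel from pages on two unrelated sites.
cookie = serve_pixel(None, "https://news.example/story")
cookie = serve_pixel(cookie, "https://shop.example/shoes")
history = profiles[cookie]
```

Each individual site still sees only its own cookie. The ad server sees one cookie everywhere its image is embedded, which is what lets it join the visits together.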
 
 
 