Computer Organization
The question 'How does a computer work?' is the concern of
computer organization. Computer organization encompasses all the physical aspects
of computer systems: the way in which various circuits and structural
components come together to make up a fully functional computer system is how
the system is organized.
fig-1.1
Computer Architecture
Computer architecture is concerned with the structure and
behavior of the computer system and refers to all the logical aspects of system
implementation as seen through the eyes of a programmer. For example, in
multithreaded programming, the implementation of mutexes and semaphores to
restrict access to critical regions and to prevent race conditions among
processes is an architectural concern.
fig-1.2
Therefore:
What does a computer do? = Architectural concern.
How does a computer work? = Organizational concern.
Taking automobiles as an example, making a car is, at a general
level, a two-step process: 1) making a logical design of the car, and 2)
making a physical implementation of that design. The designer has to decide on
everything, from how to design the car to achieve maximum environmental
friendliness, to what materials to use to make the car cost effective
while still looking good. All of this falls under the category of architecture.
When that design is actually implemented in a car manufacturing plant and a
real car is built, we say that organization has taken place.
What are the benefits of studying computer architecture and organization?
Before delving into the technicalities, common sense tells us
that as a user it is perfectly all right to operate a computer
without understanding what it does internally to make 'magical'
things happen on the screen, or how it does it.
But if a computer science student (lacking knowledge of
computer architecture and organization) were to write code that does not
comply with the internal architecture and organization of his or her computer,
the computer would behave strangely, and in the end the student would have to
take it to a service center and blindly rely on its staff to fix the
problem. If that is the case, then the only difference between a user and a
computer science student is that the latter knows how to display the words
'Hello World' on the screen in a couple of different programming
languages.
Say, for example, a game developer sets the 'frames per second'
property somewhere above 30 fps: the gameplay would become much faster, but
CPU usage would rise dramatically, making it very difficult for the CPU to do
much else while the game is running and slowing the computer down noticeably.
This is because the frames-per-second value specifies how many times the
screen is updated every second. Too large a value means too many updates per
second, so more and more of the CPU's power and attention must be devoted to
running the game. Too low a value means CPU usage falls remarkably, but the
game becomes much slower. A value of about 30 therefore makes the game
reasonably fast without using up too much of the CPU. Without this knowledge,
a game developer would unknowingly be making inefficient games that would
struggle to make any commercial impact.
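The trade-off above comes down to a simple budget: the higher the frame rate, the less time the CPU has per frame. A minimal sketch (the function name and sample rates are illustrative, not from any game engine):

```python
# Per-frame CPU time budget implied by a target frame rate.
# At higher fps, each frame's work must fit into a smaller slice of time.
def frame_budget_ms(fps):
    """Milliseconds available to simulate and render one frame."""
    return 1000.0 / fps

for fps in (15, 30, 60):
    print(f"{fps:>3} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

At 30 fps the CPU has about 33 ms per frame; at 60 fps only about 17 ms, which is why raising the frame rate raises CPU load.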
What are the factors that prevent us from speeding up?
1) The fetch-execute cycle: The CPU
generally fetches an instruction from memory and then executes it,
systematically. This whole process can be cumbersome if the computer does not
implement a system such as 'pipelining', because in general the
fetch-execute cycle consists of three stages:
a) Fetch
b) Decode
c) Execute
Without a pipelining system, if the
CPU is already dealing with a particular instruction, it will only consider
another once it is completely done executing the current one; there
is no overlap at all. That slows the computer down considerably, as each
instruction must wait for the previous one to finish.
If, on the other hand, a pipelining
system is implemented, the CPU can begin to fetch a new instruction while
decoding another. By the time it finishes executing the initial
instruction, the next one is ready to be executed immediately.
In this way the CPU overlaps the fetch, decode and execute stages of several
instructions and, once the pipeline is full, can complete roughly one
instruction per clock cycle instead of one every three.
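The effect of overlapping the three stages can be seen in a toy cycle count (an idealized model: one cycle per stage, no stalls or hazards):

```python
# Cycles needed to run n instructions through a 3-stage
# (fetch, decode, execute) datapath, with and without pipelining.
STAGES = 3

def cycles_sequential(n):
    # Each instruction occupies the whole datapath for all 3 stages.
    return n * STAGES

def cycles_pipelined(n):
    # Fill the pipeline once (3 cycles), then one instruction
    # completes every subsequent cycle.
    return STAGES + (n - 1) if n > 0 else 0

print(cycles_sequential(10))  # 30 cycles without pipelining
print(cycles_pipelined(10))   # 12 cycles with pipelining
```

For long instruction streams the pipelined count approaches one cycle per instruction, which is the speedup the text describes.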
2) Hardware limitations: A simple CPU
traditionally has a single ALU (Arithmetic Logic Unit), for example, and
this restricts it to handling only one instruction per clock cycle.
With the rapid advancement of technology and the resulting rise in computing
demands, this is clearly not fast enough. A possible solution is to
take the architecture to the superscalar level.
A superscalar architecture is one that consists
of two or more functional units and can therefore carry out two or more
instructions per clock cycle. A CPU could be made with two ALUs, for example.
fig-1.3
The figure above shows the micro-architecture
of a processor. It consists of two ALUs, an on-board FPU (Floating Point Unit)
with its own set of floating-point registers, and a BPU (Branch Prediction
Unit) for handling branch instructions. The cache memory on top allows the
processor to fetch instructions much faster. Clearly such a processor is built
for speed: with its four execution units it could execute four instructions
at once.
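The benefit of extra functional units can be put in the same toy terms (a best-case model that assumes every instruction is independent, which real superscalar hardware must verify by resolving dependencies):

```python
import math

# Best-case throughput model: k execution units can retire up to
# k independent instructions per clock cycle.
def cycles_superscalar(n_instructions, n_units):
    return math.ceil(n_instructions / n_units)

print(cycles_superscalar(100, 1))  # 100 cycles on a scalar CPU
print(cycles_superscalar(100, 4))  # 25 cycles with four units
```

In practice, dependencies between instructions keep real processors well below this ideal, but the direction of the gain is the same.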
3) Parallelism: A typical computer with a
single CPU can be tweaked to perform faster, but that performance will always
be limited; after all, how much can you really get out of a single
processor? But imagine having more than one processor in a computer
system.
Such systems do exist, in the form of
multiprocessors and parallel computers. They usually have between two and a few
thousand interconnected processors, each either having its own private memory
or sharing a common memory. The obvious advantage of such a system is
that any task it undertakes can be shared among the processors by allocating
a part of the task to each one.
For example, suppose a calculation takes
ten hours to complete on a conventional computer with one CPU. If the
calculation can be split into ten chunks, each requiring one hour, then on
a multiprocessor system with ten processors the entire calculation would take
only one hour, because the chunks run in parallel.
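The ten-hour example can be written out as arithmetic (an idealization that assumes the chunks are equal and there is no communication overhead between processors):

```python
import math

# Time to finish a job split into equal chunks on p processors.
# Chunks run in "waves": each wave runs up to p chunks in parallel.
def parallel_hours(total_hours, chunks, processors):
    chunk_time = total_hours / chunks
    waves = math.ceil(chunks / processors)
    return waves * chunk_time

print(parallel_hours(10, 10, 1))   # 10.0 hours on one CPU
print(parallel_hours(10, 10, 10))  # 1.0 hour on ten CPUs
```

Real parallel programs fall short of this ideal because not all work divides evenly and processors must spend time coordinating.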
4) Clock Speed and the Von Neumann Bottleneck: The clock speed of a computer
tells us the rate at which the CPU operates. Some of the first microcomputers
had clock speeds in the range of MHz (megahertz); nowadays we have speeds in
the range of GHz (gigahertz).
One possible route to greater speed would be to increase the clock
speed. However, it turns out that increasing the clock speed alone does not
guarantee noticeable gains in speed and performance, because the effective
speed of the processor is also limited by the rate at which it can retrieve
data and instructions from memory.
Suppose a particular task takes ten units
of time to complete: eight units are spent waiting on memory and the
remaining two are spent on processing. Doubling the clock speed without
improving the memory access time reduces the processing time from two units
to one, while the memory time is unchanged. The overall time therefore falls
only from ten units to nine, a gain of just ten percent.
This limitation, caused by the mismatch in
speed between the CPU and memory, is known as the Von Neumann bottleneck. One
simple mitigation is to install a cache memory between the CPU
and main memory, which effectively reduces the average memory access time.
Cache memory has a faster access time than main memory, but a lower storage
capacity, and it is more expensive.
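Both ideas above can be checked with a few lines of arithmetic. The first function reproduces the ten-unit example; the second is the standard average-memory-access-time formula, with an illustrative hit rate and latencies that are assumptions, not measurements:

```python
# The worked example: doubling the clock only shrinks the compute
# portion, while the memory wait time stays fixed.
def total_time(memory_units, compute_units, clock_multiplier=1.0):
    return memory_units + compute_units / clock_multiplier

before = total_time(8, 2)                        # 10 units
after = total_time(8, 2, clock_multiplier=2.0)   # 9 units
print(before, after)  # only a 10% reduction in total time

# Why a cache helps: average access time blends fast hits with
# occasional slow trips to main memory (illustrative numbers).
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

print(amat(hit_time=1, miss_rate=0.05, miss_penalty=20))  # 2.0 cycles
```

With a 95% hit rate, the average access costs 2 cycles instead of the 20 a direct memory access would take, which is how a cache narrows the bottleneck.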
5) Branch prediction: The processor acts like a psychic, predicting which
branches or groups of instructions it will have to deal with next by looking
at the instruction code fetched from memory. If the
processor guesses correctly most of the time, it can fetch the likely
instructions in advance and buffer them, which keeps the processor busy most
of the time. There are various algorithms for implementing branch prediction,
some very complex, often predicting multiple branches ahead. All of this
is aimed at keeping the CPU supplied with useful work and therefore optimizing
speed and performance.
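One of the simplest such algorithms is the classic two-bit saturating counter, sketched below. Real predictors are far more elaborate; this is only a minimal illustration of the idea that recent branch history drives the next guess:

```python
# Two-bit saturating-counter branch predictor.
# States 0-1 predict "not taken"; states 2-3 predict "taken".
# Two bits mean a single surprise does not flip the prediction.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 or 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch: taken nine times, then falls through once.
p = TwoBitPredictor()
history = [True] * 9 + [False]
hits = 0
for taken in history:
    if p.predict() == taken:
        hits += 1
    p.update(taken)
print(f"{hits}/{len(history)} correct")  # 9/10
```

The predictor misses only the final iteration of the loop, which is exactly the behavior that makes this scheme effective for loop-heavy code.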