## **:: ASSIGNMENT # 2 ::**

**ID:** 11533

Name: Ashir Ali Khan

**Subject: Computer Architecture** 

**Teacher: Sir Muhammad Amin** 



**Iqra National University** 

| Assignment: 2 1-                                                                                                                                                                           |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Assign 1:                                                                                                                                                                                  |
| Question: 1:-                                                                                                                                                                              |
| Answer: - decktors applications that require                                                                                                                                               |
| Different deskhops applications that require  Different deskhops applications that require  He great power of contemporary microprocessor—  He great power of contemporary microprocessor— |
|                                                                                                                                                                                            |
| 22096                                                                                                                                                                                      |
| Speech VECSGITHIS                                                                                                                                                                          |
| Video confeiencing.  Mulhmedia authoring.                                                                                                                                                  |
| e voice and video annotation q files.                                                                                                                                                      |
| Simulation modeling.                                                                                                                                                                       |
| X                                                                                                                                                                                          |
| Part : B :- Answer :-                                                                                                                                                                      |
| The techniques used for contemporary                                                                                                                                                       |
| processor to increase speed are.                                                                                                                                                           |
| 1. Pipelining:                                                                                                                                                                             |
| pipelining is when computer, receives<br>multiple instructions and carry them out as                                                                                                       |
| they are received.                                                                                                                                                                         |
| 2. Branch Prediction:                                                                                                                                                                      |
| Errock prediction is the process of being                                                                                                                                                  |
| able to predict the next set of instruction                                                                                                                                                |
| so that frey can cauted our                                                                                                                                                                |
| Superscalar Execution:                                                                                                                                                                     |
| Superscalar Execution is when you are able to give more that than one set of instruction at line                                                                                           |
| at time                                                                                                                                                                                    |
|                                                                                                                                                                                            |

4. Data flow Analysis: Data flow analysis analyzes instruction that need each other 5. Speculative execution: speculative execution carry out instruction before they are actually executed Part: C:-Answer :clock speeds and logic density increase a number of obstacles become more significant including. · Power :-The power density increases with an increase with an increase in logic density and clock speed. One Challenge of this is the difficulty of dissipating the heat generated on high density, high - speed chips RC delay: The speed at which electrons can flow on a chip between transishors is limited by the resistance and capacitance of the metal wire cornecting Hem. Delay increases as the Re product intreases. As components on the chip decrease in size, the wifer are closes together, increasing capacitance Memory lakency:-Memory speeds lag processor speeds.

Paet: D:-

Answer: -

Amdahl's law deals with the potential speedup of a program using multiple processor compared to a single processor. Consider a program running on a single processor such that a fraction (1-f) of the execution time involves code that is inherently sequential, and a fraction of that involves code that is infinitely parallelizable with no scheduling avertead. Let The the total execution time of the program using a single processor. Then the speed up using a parallel processor with N processors that fully exploits the parallel portion of the program is as follows.

Speedup - Time to execute program on a single processor Time to execute program on M parallel processor

 $= \frac{T(1-f) + Tf}{T(1-f) + Tf} = \frac{1}{(1-f) + f}$ N

Two important conclusions can be diaun: 4) when f is small, the use of parallel processors has little effect. 2) As Napproaches infinity, speedup is bound

2) As Napproaches infinity, speedup is bound by 1/(1-f), so that Here are diminishing returns for using more processors.

## Part: E:-

Answer:-

Multicore :-

Multicore refers to an architecture is a single physical processor incorporate the core logic of more than one processor. A single integrated circuit is used he package or hold these processors. These single integrated circuit are known as a die Multicore architecture places multiple processors cores and bundles them as a single physical processors. The objectives is to create a system that can complete more tasks at the same time, thereby gaining better overall system performance. This technology is most commande multicure processors, where more processors chips or cores for concurrently as a single system. Multicorebased processors are used in mobile device, desletop, work stations and servers. The concept of multicore technology is mailly he possibility of parallel computing, which can significantly computer speed and efficiently two or more central processing units (CPUs) in a single chip. This reduces the system heat and power consumption. This mean much better performance with less or the some amount of energ(5)

chip manufacturers are now in the process

a making a huge leap forward in the

number of cores per chip, with more than

so cores per chip. The leap in performance

as well as the challenger in developing

software to exploit such a large

number of cores has led to the the

introduction of a new term known as many

Integrated core (MIC).

GPGPU:
A general - purpose GPU (GPGPU) is a

graphic processing unit (GPU) that perform

non-specialized calculations that would

A general - purpose GPU (GPGPU) is a graphic processing unit (GPU) Heat perform non-specialized calculations that would hypically be conducted by the (PU (central processing unit). Ordinarily, the GPU is dedicated to graphics lendering.

GPGPUS are used for tasks, that were formerly the domain of high-power CPUs, such as physics calculations, encryption, scientific computations, and the generation of crypto currencies such as Bit coin.

Because graphics couds are constructed for massive parallelism, they can dwarf the calculation reate of even the most powerful (PUs for many parallel processing trasts. The same shader cover that allows multiple pixels to be rendered simultaneously can similarly process multiple streams of date

| at the same time. Although a shade |
|------------------------------------|
| - 3/5 00019 00                     |
| liel and CDO miles have            |
| of contracts                       |
| might have eight or helve core     |
| V X                                |

Question: - 2:-Part: A:-

Given:

Clock speed of the processor = 60 MH, Number of instructions the executed program consist = 104,000

| Instruction type   | Instruction Count | Cyclespe introd |
|--------------------|-------------------|-----------------|
| Integer authenetic | 46000             | 1               |
| Data Transfer      | 33000             | 2               |
| Flughing point     | 16000             | 2               |
| Control transfer   | 910810            | 2               |

To Find .-

CP1 = ?

MIPS rate = 2

Execution time = ?

Solution

Calculating the CPI is:

CPI - Instruction count x (yeles per second

Number of instruction the executed program

consist.

Substitute the values of "instruction count" and

"cycles per second" from the above table in



| H | Frecultor hime                                                                                                                                                           |                   |             |  |  |  |
|---|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|-------------|--|--|--|
| r | Part: B:- X                                                                                                                                                              |                   |             |  |  |  |
|   | Answer:                                                                                                                                                                  |                   |             |  |  |  |
|   | Given :-                                                                                                                                                                 |                   |             |  |  |  |
| 1 | 7,                                                                                                                                                                       |                   |             |  |  |  |
| 1 | lockuction type                                                                                                                                                          | Instruction Count | Cycles per- |  |  |  |
| 1 | Machine A.                                                                                                                                                               | (millions)        |             |  |  |  |
| 1 | Arithmetic and logic                                                                                                                                                     |                   | 1           |  |  |  |
| 1 | load and store                                                                                                                                                           | <b>2</b> 4        | 3           |  |  |  |
| 1 | Branch .                                                                                                                                                                 | 2                 | 4           |  |  |  |
| 1 | others.                                                                                                                                                                  | 4                 | 3           |  |  |  |
| 1 | Machine A.                                                                                                                                                               |                   |             |  |  |  |
| - | Authoretic and logic                                                                                                                                                     | 10                | 1           |  |  |  |
| - | load and share                                                                                                                                                           | 8                 | 2           |  |  |  |
| - | Branch                                                                                                                                                                   | 2                 | 4           |  |  |  |
| - | offers.                                                                                                                                                                  | 1 4               | 3           |  |  |  |
|   | Solution                                                                                                                                                                 | T. (0.1           |             |  |  |  |
| - | CPIA = SCPIEX                                                                                                                                                            |                   |             |  |  |  |
| - |                                                                                                                                                                          | (8+4+             | 2+4) x10    |  |  |  |
|   |                                                                                                                                                                          | 22.               | 6 00        |  |  |  |
|   | MIPS = F = 200 x 106 = 90                                                                                                                                                |                   |             |  |  |  |
|   | (PIA X106 2.22 X 106                                                                                                                                                     |                   |             |  |  |  |
|   | CPUA = IX CPIA = 18 x 104 x 2.2 - 0.25                                                                                                                                   |                   |             |  |  |  |
|   | E 2100 x 10°                                                                                                                                                             |                   |             |  |  |  |
|   | $CPI_{B} = \frac{200 \times 10^{6}}{(10 \times 1 + 8 \times 2 + 2 \times 4 + 4 \times 3) \times 10^{6}}$ $I_{B} = \frac{10 + 8 + 2 + 4}{(10 + 8 + 2 + 4) \times 10^{6}}$ |                   |             |  |  |  |
|   | J. (10+8+2+4) x 10°                                                                                                                                                      |                   |             |  |  |  |
|   |                                                                                                                                                                          | = 1.92.           |             |  |  |  |
|   |                                                                                                                                                                          |                   |             |  |  |  |
| - |                                                                                                                                                                          |                   |             |  |  |  |

| 0    | , | - |    |
|------|---|---|----|
| Pai  | - |   | 1- |
| · uc | - | / | •  |

Answer:-

(A) Determine the average (P1:Since we have the same instruction mix,
that means the additional instructions
for each task could be allocated appropriate
between the instruction types. Therefore
the following table be gotten:

| н                    |                      |               |                 |
|----------------------|----------------------|---------------|-----------------|
|                      | Instruction type     | CPT           | Instruction Mix |
| -                    | Arithmetic and logic | 1             | 60%             |
| -                    | 100d/store with      | 1-5 // 1-4    | 181/            |
| -                    | cache hit            |               |                 |
| -                    | Branch               | -4            | 12/             |
| -                    | Memory               | 12            | 10%             |
| -                    | reference with       | Lagrania III. |                 |
| Section and Personal | cache miss           |               |                 |
|                      |                      |               |                 |

The average CPI = (1x0.6) + (2x0.18) + (4x0.12) + (12x0.1) = 2.64

Therefore, the CPI has been increased since the time for memory access is also increased

(B) Determine the average MIPS rate:

MIPS = 400/2.64 = 152.

MIPS = 152

There is a corresponding drop in the MIPS rate.

(C) Calculate He speedup factor:

The speedup factor equals to the ratio of the execution times. The execution time is calculated as the following: T = 1c/NIPS x 186 For the one processor, T, = (2×106)/(178×10,) For the 8 processor, each processor executes 1/8 of the 2 million instruction plus the 1-8 ms 152 X106 Therefore we have Speedup - time to execute program in single processor time to execute program on N parallel processor = 11 = 6.11 1.8 (d) Compare the actual speedup Factor with the theoretical speedup factor determined by Amdahl's law. In fact, there are two inefficiencies in the parallel system The first one is that there are more additional instructions which is added to roordinate between Heads. The second one is that there is contention for memory access. Thus, none of the code

(12) is inherently serial, and all git is parallelisable but with scheduling overhead. It could be said that the membry access conflict means some extent memory reference instroctions are not parallelizable By depending on the information given, it is not obvious how to quantify this effect in Amdahl's equation. Therefore, if it is supposed that the brackion of code, which is parallelisable, is F=1, Hen Andahi's law decreases speed up = N = 8. Therefore, the actual speedup is only about 75%, of the theoretical