Generic Code Generation Algorithm in Compilers

Generic Code Generation Algorithm

Assume that for each operator in the statement, there is a corresponding target language operator

The computed results can be left in registers as long as possible, storing them only if the register is needed for another computation or just before a procedure call, jump or labeled statement

Register and Address Descriptors

These are the primary data structures used by the code generator. They keep track of what values are in each registers, as well as where a given value resides

Each register has a register descriptor containing the list of variables currently stored in this register. At the start of the basic block, all register descriptors are empty. It keeps track of recent/current variable in each register. It is constructed whenever a new register is needed
Each variable has an address descriptor containing the list of locations where this variable is currently stored. Possibilities are its memory location and one or more registers
The memory location might be in the static area, the stack, r presumably the heap. The register descriptors can be computed from the address descriptors. For each name of the block, an address descriptors is maintained that keep track of location where current value of name is found at runtime

There are basically three aspects to be considered in code generation:

Choosing registers

Generating instructions

Managing descriptors

Register allocation is done in a function getReg(Instruction). The instruction generation algorithms uses getReg() and the descriptors

The generation of machine instruction is done as follows:

Given a TAC, OP x, Y(i.e., x=x OP y), generation of machine instructions proceeds as follows:

Step 1: Call getReg(OP, x, y) to get R_x and R_y, the registers to be used for x and y respectively. GetReg merely selects the registers, it does not gurantee that the desired values are present in these registers.

Step 2: Check the register descriptor for R_y. If y is not present in Ry, check the address descriptor for y and issue LD R_y, y

Step 3: Similar treatment is done for R_x

Step 4: Generate the instruction OP R_x, R_y

When the TAC is x = y, step 1 and step 2 are same (getReg() will set Rx =Ry). Step 3 is empty and step 4 is omitted. If ‘y’ was already in a register before the copy instruction, no code is generated at this point. Since the value of ‘y’ is not in its memory location, we may need to store this value back into ‘y’ at block exit.

All variables needed by (dynamically) subsequent blocks (i.e., that live-on-exit) have their current values in their memory locations. Such live variables are identified as follows:

Temporaries never live beyond a basic block. Hence, they are ignored
Variables dead on exit are also ignored
All live on exit variables need to be stored in their memory location on exit from the block. So, check the address descriptor for each live on exit variable. If its own memory location is not listed, generate ST X, R, where R is a register listed in the address descriptor

The management of register and address descriptor is performed as follows:

For a register R, let Desc(R) be is register descriptor. For a program variable x, let Desc(x) be its address descriptor. The management of descriptor for load, store, operation and copy are given below

Load: LD R, x

Desc(R) = x(removing everything else from Desc(R))

Add R to Desc(x) (leaving alone everything else in Desc(x))

Remove R from Desc(w) for all w ≠ x

Store: ST x, R

Add the memory location of x to Desc(x)

Operation: OP Rx, Ry implementing the quas OP x, y

Desc(R_x) = x

Desc(x) = R_x

After operation R_x has R_x OP R_y

Copy: for x = y after processing the load

Add x to Desc(R_y) (note y not x)

Desc(x) = R_y

Minimize the number of registers used:

When a register holds a temporary value and there are no subsequent uses for this value, we reuse that register
When a register holds the value of a program variable and there are no subsequent uses of this value, we reuse that register providing this value also in the memory location for the variable
When a register holds the values of a program variable and all subsequent uses of this value are preceded by a redefinition, we could reuse this register. But to know about all subsequent uses, one may require live/dead-on-exit knowledge

Assume a, b, c and d are program variables and t, u, v are compiler generated temporaries. These are represented as t$1, t$2 and t$3. The code generated for different TACs is given below:

t = a –b

LD R1, a

LD R2, b

SUB R1, R2

U = a – c

LD R3, a

LD R2, c

SUB R3, R2

v = t + u

ADD R1, R3

a = d

LD R2, d

ST a, R2

d = v + u

ADD R1, R3

ST d, R1

Exit

About Data Sciences by Venu

Hi, My name is Venugopala Chary and I'm Currently working as Associate Professor in Reputed Engineerng College, Hyderabad. I have B.Tech and M.tech in regular from JNTU Hyderabad. I have 11 Years of Teaching Experience for both B.Tech and M.Tech Courses.

1 comments:

Alphabin17 February 2025 at 01:14
Just as the 'Generic Code Generation Algorithm in Compilers' optimizes the process of converting high-level instructions into machine code, automation testing streamlines the testing process in software development. It eliminates manual errors, enhances efficiency, and ensures robust software performance. Both play critical roles in achieving precision and quality in their fields.