Generic Code Generation Algorithm
Assume that for each operator in the statement, there is a corresponding target language operator
The computed results can be left in registers as long as possible, storing them only if the register is needed for another computation or just before a procedure call, jump or labeled statement
Register and Address Descriptors
These are the primary data structures used by the code generator. They keep track of what values are in each registers, as well as where a given value resides
- Each register has a register descriptor containing the list of variables currently stored in this register. At the start of the basic block, all register descriptors are empty. It keeps track of recent/current variable in each register. It is constructed whenever a new register is needed
- Each variable has an address descriptor containing the list of locations where this variable is currently stored. Possibilities are its memory location and one or more registers
- The memory location might be in the static area, the stack, r presumably the heap. The register descriptors can be computed from the address descriptors. For each name of the block, an address descriptors is maintained that keep track of location where current value of name is found at runtime
There are basically three aspects to be considered in code generation:
Choosing registers
Generating instructions
Managing descriptors
Register allocation is done in a function getReg(Instruction). The instruction generation algorithms uses getReg() and the descriptors
The generation of machine instruction is done as follows:
Given a TAC, OP x, Y(i.e., x=x OP y), generation of machine instructions proceeds as follows:
Step 1: Call getReg(OP, x, y) to get Rx and Ry, the registers to be used for x and y respectively. GetReg merely selects the registers, it does not gurantee that the desired values are present in these registers.
Step 2: Check the register descriptor for Ry. If y is not present in Ry, check the address descriptor for y and issue LD Ry, y
Step 3: Similar treatment is done for Rx
Step 4: Generate the instruction OP Rx, Ry
When the TAC is x = y, step 1 and step 2 are same (getReg() will set Rx =Ry). Step 3 is empty and step 4 is omitted. If ‘y’ was already in a register before the copy instruction, no code is generated at this point. Since the value of ‘y’ is not in its memory location, we may need to store this value back into ‘y’ at block exit.
All variables needed by (dynamically) subsequent blocks (i.e., that live-on-exit) have their current values in their memory locations. Such live variables are identified as follows:
- Temporaries never live beyond a basic block. Hence, they are ignored
- Variables dead on exit are also ignored
- All live on exit variables need to be stored in their memory location on exit from the block. So, check the address descriptor for each live on exit variable. If its own memory location is not listed, generate ST X, R, where R is a register listed in the address descriptor
The management of register and address descriptor is performed as follows:
- For a register R, let Desc(R) be is register descriptor. For a program variable x, let Desc(x) be its address descriptor. The management of descriptor for load, store, operation and copy are given below
Load: LD R, x
Desc(R) = x(removing everything else from Desc(R))
Add R to Desc(x) (leaving alone everything else in Desc(x))
Remove R from Desc(w) for all w ≠ x
Store: ST x, R
Add the memory location of x to Desc(x)
Operation: OP Rx, Ry implementing the quas OP x, y
Desc(Rx) = x
Desc(x) = Rx
After operation Rx has Rx OP Ry
Copy: for x = y after processing the load
Add x to Desc(Ry) (note y not x)
Desc(x) = Ry
Minimize the number of registers used:
- When a register holds a temporary value and there are no subsequent uses for this value, we reuse that register
- When a register holds the value of a program variable and there are no subsequent uses of this value, we reuse that register providing this value also in the memory location for the variable
- When a register holds the values of a program variable and all subsequent uses of this value are preceded by a redefinition, we could reuse this register. But to know about all subsequent uses, one may require live/dead-on-exit knowledge
Assume a, b, c and d are program variables and t, u, v are compiler generated temporaries. These are represented as t$1, t$2 and t$3. The code generated for different TACs is given below:
t = a –b
LD R1, a
LD R2, b
SUB R1, R2
U = a – c
LD R3, a
LD R2, c
SUB R3, R2
v = t + u
ADD R1, R3
a = d
LD R2, d
ST a, R2
d = v + u
ADD R1, R3
ST d, R1
Exit
0 comments:
Post a Comment