Lexical Analyzer in Compiler Design

Lexical Analyzer

[Figure: Interaction between lexical analyzer and parser]

  • The task of the first phase of a compiler is to read the input characters of the source program and group them into sequences of characters with a collective meaning, known as tokens.
  • The lexical analyzer reads the source program and performs the following tasks:
Produces a stream of tokens
Ignores white space (blanks, newlines, tabs)
Ignores comments, if any
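The three tasks above can be sketched as a single hand-written loop. This is a minimal illustration, not a production lexer: it assumes C-style comments (// to end of line and /* ... */) and three hypothetical token classes, id, number and op, where op is any other single character.

```python
from __future__ import annotations


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (token_name, lexeme) pairs, skipping white space and comments.

    Assumes C-style comments and the hypothetical token classes
    id, number and op (any other single character).
    """
    tokens: list[tuple[str, str]] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in " \t\n":                       # ignore blanks, tabs and newlines
            i += 1
        elif source.startswith("//", i):        # ignore a line comment
            end = source.find("\n", i)
            i = n if end == -1 else end + 1
        elif source.startswith("/*", i):        # ignore a block comment
            end = source.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated comment")
            i = end + 2
        elif ch.isalpha():                      # identifier: letter (letter | digit | _)*
            start = i
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("id", source[start:i]))
        elif ch.isdigit():                      # integer literal
            start = i
            while i < n and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        else:                                   # any other character is an operator/punctuator
            tokens.append(("op", ch))
            i += 1
    return tokens
```

For example, tokenize("x = 42; // answer\n/* done */") returns [('id', 'x'), ('op', '='), ('number', '42'), ('op', ';')]: the blanks and both comments never reach the token stream.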
Definition of a token:
  • A sequence of characters with a logical meaning is known as a token
(or)
  • The smallest individual unit of a program is known as a token
Definition of pattern rule:
  • A pattern rule is a description of the form that the lexemes of a token may take
Definition of Lexeme:
  • A lexeme is a sequence of characters in the source program that matches the pattern for a token
(or)
  • The actual representation of a token
  • Each lexeme is classified under a token name
  • The general form of a token is <token-name, attribute-value>, where token-name is an abstract symbol used during the next phase of the compiler (syntax analysis) and attribute-value points to the symbol-table entry for the token
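The <token-name, attribute-value> pair can be modelled with two small classes. This is only a sketch under assumptions: the class names Token and SymbolTable, the install method, and the single "lexeme" field in each entry are hypothetical, chosen to show the attribute value as an index into the symbol table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Token:
    name: str                        # abstract symbol the parser sees, e.g. "id"
    attribute: Optional[int] = None  # index of a symbol-table entry (None if unused)


@dataclass
class SymbolTable:
    entries: List[dict] = field(default_factory=list)     # one attribute record per name
    index: Dict[str, int] = field(default_factory=dict)   # lexeme -> entry number

    def install(self, lexeme: str) -> int:
        """Return the entry number for lexeme, creating the entry the first time."""
        if lexeme not in self.index:
            self.index[lexeme] = len(self.entries)
            self.entries.append({"lexeme": lexeme})
        return self.index[lexeme]
```

With table = SymbolTable(), calling table.install("count") twice returns 0 both times, so every occurrence of count becomes Token(name='id', attribute=0) and all of them share one symbol-table entry.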
Example:
DO 5 I = 1.12;
  • The output would be <DO> <number> <id, I> <assign_op> <number> <semicolon>
  • When the lexical analyzer recognizes a token as an identifier (id), it enters the lexeme into the symbol table along with its attributes
  • The lexical analyzer is also known as a scanner
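The tokenization of the example DO 5 I = 1.12; can be reproduced with a regular-expression-driven loop. This is a sketch under stated assumptions: DO is the only keyword, every other word is an identifier, and the placeholder attribute record {"type": None} stored for each identifier is hypothetical. For readability, the output prints the lexeme in <id, I> as the example does, rather than a symbol-table index.

```python
import re

KEYWORDS = {"DO"}  # assumed keyword set for this one example

# Token patterns, tried in order; "skip" covers blanks, tabs and newlines.
TOKEN_SPEC = [
    ("number",    r"\d+(?:\.\d+)?"),
    ("word",      r"[A-Za-z][A-Za-z0-9]*"),   # keyword or identifier
    ("assign_op", r"="),
    ("semicolon", r";"),
    ("skip",      r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str, symbol_table: dict) -> list:
    """Return the token stream as strings such as '<id, I>', filling symbol_table."""
    out = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        if kind == "word" and lexeme in KEYWORDS:
            out.append(f"<{lexeme}>")
        elif kind == "word":
            # Enter the identifier with a placeholder attribute record (hypothetical).
            symbol_table.setdefault(lexeme, {"type": None})
            out.append(f"<id, {lexeme}>")
        else:
            out.append(f"<{kind}>")
    return out
```

Joining the result of lex("DO 5 I = 1.12;", table) with spaces gives <DO> <number> <id, I> <assign_op> <number> <semicolon>, and afterwards the table holds an entry for I but not for the keyword DO.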
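Scanning versus lexical analysis
  • Lexical analyzers are sometimes divided into a cascade of two processes, which is why the whole component is often called a scanner
  • Scanning performs the simple tasks that do not require tokenization of the input, such as deleting comments and compacting consecutive white-space characters into one
  • Lexical analysis proper is the more complex part, which produces tokens from the output of the scanner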
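The two-stage cascade above can be sketched as two functions. This is illustrative only: it assumes /* ... */ comments, ASCII input, and three hypothetical token classes (number, id, op).

```python
import re
from typing import List, Tuple


def scan(source: str) -> str:
    """Scanning: no tokenization, only delete /* */ comments and compact white space."""
    no_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return re.sub(r"\s+", " ", no_comments).strip()


def lex(text: str) -> List[Tuple[str, str]]:
    """Lexical analysis proper: group the scanner's output into (token, lexeme) pairs."""
    pattern = r"(?P<number>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[^\w\s])"
    # finditer skips the single spaces left by scan(); this sketch assumes ASCII input.
    return [(m.lastgroup, m.group()) for m in re.finditer(pattern, text)]
```

A comment is replaced by a blank rather than deleted outright, so the characters on either side of it cannot merge into one lexeme. For instance, scan("a  =  b /* sum */ + 12\n") yields "a = b + 12", which lex then turns into id, op, id, op and number tokens.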
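Why separate lexical analysis from parsing?
  • Simplicity of design: the parser does not have to deal with white space and comments, because the lexical analyzer has already removed them
  • Compiler efficiency is improved: specialized techniques, such as buffering for reading input characters, can be applied to the lexical task alone
  • Compiler portability is enhanced: input-device-specific peculiarities can be restricted to the lexical analyzer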
Specification of a token
  • Tokens can be specified using regular expressions
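As an illustration of specifying tokens with regular expressions, the sketch below writes textbook-style regular definitions as Python regexes and classifies a lexeme by the pattern it fully matches. The patterns for id, number and relop and the keyword list are assumptions for illustration, not the rules of any particular language.

```python
import re
from typing import Optional

# Assumed regular definitions, in the usual textbook style:
#   letter -> [A-Za-z]     digit -> [0-9]
#   id     -> letter (letter | digit)*
#   number -> digit+ (. digit+)?
#   relop  -> < | <= | = | <> | > | >=
TOKEN_PATTERNS = {
    "id":     re.compile(r"[A-Za-z][A-Za-z0-9]*"),
    "number": re.compile(r"[0-9]+(\.[0-9]+)?"),
    "relop":  re.compile(r"<=|>=|<>|<|>|="),
}
KEYWORDS = {"if", "then", "else"}  # hypothetical keyword list


def classify(lexeme: str) -> Optional[str]:
    """Return the token name whose pattern matches the whole lexeme, or None."""
    if lexeme in KEYWORDS:           # keywords also match the id pattern, so check them first
        return "keyword"
    for name, pattern in TOKEN_PATTERNS.items():
        if pattern.fullmatch(lexeme):
            return name
    return None
```

Checking keywords before the id pattern reflects the usual convention that a reserved word is not an identifier even though its spelling matches the identifier pattern.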

Identifier
An identifier is a sequence of alphanumeric characters whose first character must be a letter (C also allows the underscore, both as the first character and later in the name)
Rules for valid identifiers
  • The name of an identifier must not begin with a digit or a special character. For example, 1index and $currency are invalid identifiers, but index1 and amount_count are valid ones
  • An identifier must not contain spaces. For example, in int total amount; the name total amount is not a valid identifier
  • An identifier must not be a keyword. For example, int switch; is invalid because switch is a keyword
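The three rules can be checked together with one regular expression and a keyword lookup. This sketch assumes C's rules (letters, digits and underscores, not starting with a digit) and uses only a small, hypothetical subset of C keywords; the real keyword list is longer.

```python
import re

# A subset of C keywords, for illustration only (the full list is longer).
C_KEYWORDS = {"int", "float", "char", "if", "else", "for", "while",
              "switch", "case", "return", "void", "break"}

# First character: letter or underscore; the rest: letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Proper first character, no spaces or special characters, and not a keyword."""
    return IDENTIFIER.fullmatch(name) is not None and name not in C_KEYWORDS
```

With these definitions, index1 and amount_count are accepted, while 1index, $currency, total amount and switch are rejected.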


About Data Sciences by Venu

Hi, my name is Venugopala Chary, and I am currently working as an Associate Professor at a reputed engineering college in Hyderabad. I hold B.Tech and M.Tech degrees (regular) from JNTU Hyderabad and have 11 years of teaching experience in both B.Tech and M.Tech courses.