Parsing and Grammar



Question 1: For programming assignment 1, you were required to scan a file and find the words, while disregarding anything within <>. The following questions relate to this problem. You can use some special symbols: &lt and &gt can represent the terminals < and >. You may also use symbols from Java’s Pattern class, e.g., \p{Lower}, \p{Upper}, \p{Alpha}, \p{Digit} and \p{Punct}.

  1. Give a grammar and a regular expression for the language consisting of words (which consist of just letters) separated by punctuation or whitespace (use the special symbols from Java’s Pattern class for letters, numbers, punctuation and whitespace).


if ($string1 =~ m/\w/) {
    print "There is one alphanumeric ";
    print "character in $string1 (A-Z, a-z, 0-9, _)\n";
}
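One possible answer to the word question can be sketched in Java. This is a minimal sketch, not the assignment's reference solution: the class name `WordScanner` and its methods are hypothetical, and it assumes that a word is a maximal run of `\p{Alpha}` characters and that a tag is anything between `<` and the next `>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class WordScanner {
    // Grammar sketch (EBNF style), assuming a separator is punctuation or whitespace:
    //   Text -> Sep* (Word Sep+)* Word?
    //   Word -> Letter+
    //   Sep  -> \p{Punct} | \s
    static final Pattern WORD_LIST =
        Pattern.compile("[\\p{Punct}\\s]*(\\p{Alpha}+[\\p{Punct}\\s]+)*\\p{Alpha}*");

    // Assumption: a tag runs from '<' to the next '>'.
    static final Pattern TAG = Pattern.compile("<[^>]*>");
    static final Pattern WORD = Pattern.compile("\\p{Alpha}+");

    // True if the whole string is a sequence of words separated by punctuation/whitespace.
    static boolean isWordList(String text) {
        return WORD_LIST.matcher(text).matches();
    }

    // Extracts the words, ignoring everything inside <...>.
    static List<String> words(String text) {
        // Replace each tag with a space so the words on either side stay separated.
        String stripped = TAG.matcher(text).replaceAll(" ");
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(stripped);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("<b>Hello</b>, world!"));
        System.out.println(isWordList("Hello, world!"));
    }
}
```

Because digits are neither letters nor separators in this sketch, a string such as `abc123` is not in the language accepted by `isWordList`, although `words` would still extract `abc` from it.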

  1. Assuming we decided to identify numbers, give a grammar and a regular expression for identifying numbers, both integers and floats (e.g., 1.2, 365.492).
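A possible answer for numbers, as a hedged sketch: it assumes unsigned decimal numbers only (no sign, no exponent, and at least one digit on each side of the decimal point). The class name `NumberScanner` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class NumberScanner {
    // Grammar sketch, under the assumptions stated above:
    //   Number -> Digits | Digits "." Digits
    //   Digits -> Digit | Digit Digits
    //   Digit  -> \p{Digit}
    static final Pattern NUMBER = Pattern.compile("\\p{Digit}+(\\.\\p{Digit}+)?");

    // True if the whole string is a single integer or float.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    // Collects every integer or float found in the text, left to right.
    static List<String> numbers(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbers("1.2 365.492 and 7"));
    }
}
```

Under these assumptions a trailing dot is not part of the number, so `3.` is rejected by `isNumber`, while scanning it with `numbers` yields just `3`.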


  1. Assuming we decided to recognize headings in HTML (e.g., <h1>…</h1>), give a grammar for recognizing headings h1 through h3 where the heading is “h” followed by a digit such that the beginning and end digits match (your grammar does not need to include nested headings).

String pattern = "(?i)(<h1.*?>)(.+?)(</h1>)";

String pattern = "(?i)(<h2.*?>)(.+?)(</h2>)";

String pattern = "(?i)(<h3.*?>)(.+?)(</h3>)";
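The three per-level patterns can also be folded into one by capturing the digit and requiring the same digit in the closing tag with a backreference. The sketch below is one way to do that, not the assignment's official answer; the class name `HeadingScanner` and the output format are hypothetical. A backreference goes beyond a strictly regular expression, which is why the grammar in the comment lists one production per level instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class HeadingScanner {
    // Grammar sketch: one production per level, so the opening and closing digits agree.
    //   Heading -> "<h1>" Text "</h1>" | "<h2>" Text "</h2>" | "<h3>" Text "</h3>"
    //
    // (?i) ignores case, (?s) lets '.' span line breaks,
    // group 1 is the level digit, and \1 requires the same digit in the closing tag.
    static final Pattern HEADING =
        Pattern.compile("(?is)<h([1-3])\\b[^>]*>(.+?)</h\\1>");

    // Returns entries such as "h1: Title" for every well-matched heading.
    static List<String> headings(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HEADING.matcher(html);
        while (m.find()) {
            result.add("h" + m.group(1) + ": " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(headings("<h1>Title</h1><h2 class=\"x\">Sub</h2><h3>bad</h2>"));
    }
}
```

In this sketch a heading whose closing digit differs from its opening digit, such as `<h3>bad</h2>`, is simply not reported.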

Question 2: For each of the following languages on Τ = {a, b, c}, construct the corresponding regular expression and regular grammar.

Τ = {a, b, c}, T = a*b*c*
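For the expression a*b*c*, one possible regular (right-linear) grammar is given in the comment below, together with a small Java check that the grammar and the expression accept the same strings. This is a sketch under the assumption that the intended language is "any number of a's, then any number of b's, then any number of c's"; the class name `AbcLanguage` is hypothetical.

```java
// Hypothetical helper class; the names are illustrative only.
class AbcLanguage {
    // Regular grammar, one possible form:
    //   S -> a S | b B | c C | ε
    //   B -> b B | c C | ε
    //   C -> c C | ε
    // The current nonterminal is tracked as a single character.
    static boolean acceptsByGrammar(String s) {
        char state = 'S';
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case 'a':
                    if (state != 'S') return false; // only S has an 'a' production
                    break;
                case 'b':
                    if (state == 'C') return false; // C has no 'b' production
                    state = 'B';
                    break;
                case 'c':
                    state = 'C';                    // S, B and C all allow 'c'
                    break;
                default:
                    return false;                   // symbol outside {a, b, c}
            }
        }
        return true; // every nonterminal has an ε production
    }

    static boolean acceptsByRegex(String s) {
        return s.matches("a*b*c*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "aabbcc", "ac", "ba", "cb"}) {
            System.out.println("\"" + s + "\" " + acceptsByGrammar(s) + " " + acceptsByRegex(s));
        }
    }
}
```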

All strings containing at most three b’s.

T = (a|c)* | (a|c)*b(a|c)* | (a|c)*b(a|c)*b(a|c)* | (a|c)*b(a|c)*b(a|c)*b(a|c)*
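A hedged Java sketch for "at most three b's", assuming the alphabet {a, b, c} and that the phrase means zero to three occurrences of b anywhere in the string. The compact pattern `[ac]*(b[ac]*){0,3}` is one way to write this; the class name `AtMostThreeBs` is hypothetical.

```java
// Hypothetical helper class; the names are illustrative only.
class AtMostThreeBs {
    // Regular grammar, one possible form (the nonterminal records how many b's were seen):
    //   S0 -> a S0 | c S0 | b S1 | ε
    //   S1 -> a S1 | c S1 | b S2 | ε
    //   S2 -> a S2 | c S2 | b S3 | ε
    //   S3 -> a S3 | c S3 | ε
    static boolean acceptsByGrammar(String s) {
        int bCount = 0; // index of the current nonterminal S0..S3
        for (char ch : s.toCharArray()) {
            if (ch == 'a' || ch == 'c') {
                continue;              // stay in the same nonterminal
            } else if (ch == 'b') {
                if (bCount == 3) {
                    return false;      // S3 has no 'b' production
                }
                bCount++;
            } else {
                return false;          // symbol outside {a, b, c}
            }
        }
        return true; // every nonterminal has an ε production
    }

    static boolean acceptsByRegex(String s) {
        return s.matches("[ac]*(b[ac]*){0,3}");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "accc", "abcabcab", "bbbb"}) {
            System.out.println("\"" + s + "\" " + acceptsByGrammar(s) + " " + acceptsByRegex(s));
        }
    }
}
```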