
Lex: An Introduction

Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program that reads an input stream, copying it to an output stream and partitioning the input into strings that match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.
The lexical analysis programs written with Lex accept ambiguous specifications and choose the longest match possible at each input point. If necessary, substantial look ahead is performed on the input, but the input stream will be backed up to the end of the current partition, so that the user has general freedom to manipulate it.
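The longest-match choice can be sketched in C. This is only an illustration under simplifying assumptions: the patterns are plain literal strings rather than regular expressions, the function name longest_match is hypothetical, and ties are given to the earlier rule. A real Lex scanner makes the same kind of choice with a table-driven automaton instead of a loop over patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the rule whose literal pattern matches the longest
   prefix of `input`, or -1 if no pattern matches. When two patterns match
   the same length, the earlier rule wins. The matched length goes to *len. */
int longest_match(const char *const patterns[], size_t count,
                  const char *input, size_t *len)
{
    assert(patterns != NULL && input != NULL && len != NULL);
    int best = -1;
    *len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t plen = strlen(patterns[i]);
        /* Strictly longer only, so an earlier rule keeps a tie. */
        if (plen > *len && strncmp(input, patterns[i], plen) == 0) {
            best = (int)i;
            *len = plen;
        }
    }
    return best;
}
```

With the rules int, integer, and in, the input integers selects the second rule (seven characters matched), even though the first rule also matches at that point.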
Lex is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem-oriented specification for character string matching, and produces a program in a general-purpose language which recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to Lex. The Lex-written code recognizes these expressions in an input stream and partitions the input stream into strings matching the expressions. At the boundaries between strings, program sections provided by the user are executed. The Lex source file associates the regular expressions and the program fragments. As each expression appears in the input to the program written by Lex, the corresponding fragment is executed.
Lex is not a complete language, but rather a generator representing a new language feature which can be added to different programming languages, called ``host languages.'' Just as general purpose languages can produce code to run on different computer hardware, Lex can write code in different host languages. The host language is used for the output code generated by Lex and also for the program fragments added by the user. Compatible run-time libraries for the different host languages are also provided. This makes Lex adaptable to different environments and different users. Each application may be directed to the combination of hardware and host language appropriate to the task, the user's background, and the properties of local implementations.
Several tools have been built for constructing lexical analyzers from special-purpose notations based on regular expressions. Lex is a widely used tool for specifying lexical analyzers for a variety of languages. We refer to the tool as the Lex compiler and to its input specification as the Lex language.

Lex Regular Expressions
A regular expression specifies a set of strings to be matched. It contains text characters (which match the corresponding characters in the strings being compared) and operator characters (which specify repetitions, choices, and other features). The letters of the alphabet and the digits are always text characters; thus the regular expression integer matches the string integer wherever it appears and the expression A123Ba looks for the string A123Ba.
For a trivial example, consider a program to delete from the input all blanks or tabs at the ends of lines:
%%
[ \t]+$ ;
This is all that is required. The program contains a %% delimiter to mark the beginning of the rules, and one rule. This rule contains a regular expression which matches one or more instances of the characters blank or tab (written \t for visibility, in accordance with the C language convention) just prior to the end of a line. The brackets indicate the character class made of blank and tab; the + indicates "one or more ..."; and the $ indicates "end of line". No action is specified, so the program generated by Lex (yylex) will ignore these characters. Everything else will be copied. To change any remaining string of blanks or tabs to a single blank, add another rule:
%%
[ \t]+$ ;
[ \t]+ printf(" ");
The finite automaton generated for this source will scan for both rules at once, observing at the termination of the string of blanks or tabs whether or not there is a newline character, and executing the desired rule action. The first rule matches all strings of blanks or tabs at the end of lines, and the second rule all remaining strings of blanks or tabs.
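To make the behavior of these two rules concrete, here is a hand-written C sketch of the same transformation. It is not the code Lex generates. The function name squeeze_blanks is hypothetical, and the sketch assumes that a run of blanks at the very end of the input with no newline after it is treated as an ordinary run.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates the two rules by hand:
     [ \t]+$   ;              run of blanks/tabs right before a newline: dropped
     [ \t]+    printf(" ");   any other run of blanks/tabs: one blank
   Writes the result to dst, which must hold at least as many bytes as src
   (including the terminating NUL), and returns the length written. */
size_t squeeze_blanks(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    while (*src != '\0') {
        if (*src == ' ' || *src == '\t') {
            /* Consume the longest possible run of blanks and tabs. */
            while (*src == ' ' || *src == '\t')
                src++;
            /* Look at the character that ends the run: a newline selects
               the first rule (no output), anything else the second. */
            if (*src != '\n')
                dst[n++] = ' ';
        } else {
            dst[n++] = *src++;   /* default: copy unmatched input */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Both rules are decided in a single pass: the run is consumed first, and only the character that ends it determines which action applies.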
In the program written by Lex, the user's fragments (representing the actions to be performed as each regular expression is found) are gathered as cases of a switch. The automaton interpreter directs the control flow. Opportunity is provided for the user to insert either declarations or additional statements in the routine containing the actions, or to add subroutines outside this action routine.
Operators: The operator characters are:
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
and if they are to be used as text characters, an escape should be used. The quotation mark operator (") indicates that whatever is contained between a pair of quotes is to be taken as text characters. Thus
xyz"++"
matches the string xyz++ when it appears. Note that a part of a string may be quoted. It is harmless but unnecessary to quote an ordinary text character; the expression
"xyz++"
is the same as the one above. Thus by quoting every non-alphanumeric character being used as a text character, the user can avoid remembering the list.
An operator character may also be turned into a text character by preceding it with \ as in
xyz\+\+
which is another, less readable, equivalent of the above expressions. Another use of the quoting mechanism is to get a blank into an expression; normally, as explained above, blanks or tabs end a rule. Any blank character not contained within [] must be quoted. Several normal C escapes with \ are recognized: \n is new line, \t is tab, and \b is backspace. To enter \ itself, use \\. Since new line is illegal in an expression, \n must be used; it is not required to escape tab and backspace. Every character but blank, tab, new line and the list above is always a text character.
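The equivalence of the three spellings can be checked with a small C sketch that strips the quoting from a pattern and returns the text it stands for. This is an illustration only: the function name literal_text is hypothetical, and the sketch handles just quotes and backslash escapes, treating every other character as text, so it is not a Lex pattern parser.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes a pattern that uses only text characters, "..." quoting and
   backslash escapes into the literal text it matches. dst must hold at
   least as many bytes as pattern (including the NUL). Returns the length. */
size_t literal_text(const char *pattern, char *dst)
{
    assert(pattern != NULL && dst != NULL);
    size_t n = 0;
    while (*pattern != '\0') {
        char c = *pattern++;
        if (c == '"') {
            continue;                 /* quote marks delimit text, not part of it */
        } else if (c == '\\' && *pattern != '\0') {
            char e = *pattern++;      /* escaped character */
            switch (e) {
            case 'n': dst[n++] = '\n'; break;
            case 't': dst[n++] = '\t'; break;
            case 'b': dst[n++] = '\b'; break;
            default:  dst[n++] = e;    break;   /* \+ is +, \\ is \ */
            }
        } else {
            dst[n++] = c;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Passing xyz"++", "xyz++" and xyz\+\+ (written as C string literals with the usual extra escaping) yields the same five characters, xyz++, in each case.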

Lex is generally used in the manner depicted in Fig. 9.1. First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex language. Then lex.l is run through the Lex compiler to produce a C program lex.yy.c. The program lex.yy.c consists of a tabular representation of a transition diagram constructed from the regular expressions of lex.l, together with a standard routine that uses the table to recognize lexemes. The lexical analysis phase reads the characters in the source program and groups them into a stream of tokens in which each token represents a logically cohesive sequence of characters, such as an identifier, a keyword (if, while, etc.), a punctuation character, or a multi-character operator like :=. The character sequence forming a token is called the lexeme for the token. The actions associated with regular expressions in lex.l are pieces of C code and are carried over directly to lex.yy.c. Finally, lex.yy.c is run through the C compiler to produce an object program a.out.

Lex source program lex.l -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out

Fig. 9.1 Creating a Lexical Analyzer with Lex
Lex Source
The general format of Lex source is:
{definitions}
%%
{rules}
%%
{user subroutines}
where the definitions and the user subroutines are often omitted. The second %% is optional, but the first is required to mark the beginning of the rules. The absolute minimum Lex program is thus
%%
(no definitions, no rules) which translates into a program which copies the input to the output unchanged.
In the outline of Lex programs shown above, the rules represent the user's control decisions; they are a table, in which the left column contains regular expressions and the right column contains actions, program fragments to be executed when the expressions are recognized. Thus an individual rule might appear
integer printf("found keyword INT");
to look for the string integer in the input stream and print the message "found keyword INT" whenever it appears. In this example the host procedural language is C and the C library function printf is used to print the string. The end of the expression is indicated by the first blank or tab character. If the action is merely a single C expression, it can just be given on the right side of the line; if it is compound, or takes more than a line, it should be enclosed in braces.
As a slightly more useful example, suppose it is desired to change a number of words from British to American spelling. Lex rules such as
colour printf("color");
mechanise printf("mechanize");
petrol printf("gas");
would be a start. These rules are not quite enough, since the word petroleum would become gaseum; a way of dealing with this will be described later.
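The weakness of these rules is easy to reproduce with a hand-written C sketch that applies the same three replacements wherever the words occur. The table spellings and the function name americanize are hypothetical names chosen for this illustration. The sketch uses plain string comparison, not a generated automaton.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring the three rules above. */
static const struct { const char *from, *to; } spellings[] = {
    { "colour", "color" }, { "mechanise", "mechanize" }, { "petrol", "gas" },
};

/* Rewrites src into dst (capacity cap bytes, including the NUL), replacing
   each table word wherever it starts, with no check for word boundaries.
   Returns the length written, or (size_t)-1 if dst is too small. */
size_t americanize(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL && cap > 0);
    size_t n = 0;
    while (*src != '\0') {
        const char *to = NULL;
        size_t from_len = 0;
        for (size_t i = 0; i < sizeof spellings / sizeof spellings[0]; i++) {
            size_t len = strlen(spellings[i].from);
            if (strncmp(src, spellings[i].from, len) == 0) {
                to = spellings[i].to;
                from_len = len;
                break;
            }
        }
        if (to != NULL) {
            size_t to_len = strlen(to);
            if (n + to_len >= cap)
                return (size_t)-1;
            memcpy(dst + n, to, to_len);   /* run the "action" */
            n += to_len;
            src += from_len;
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = *src++;             /* default: copy the character */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Given the input colour of petroleum, the sketch writes color of gaseum: petrol is replaced inside the longer word and the remaining eum is copied through unchanged.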



Character Classes
Classes of characters can be specified using the operator pair []. The construction [abc] matches a single character, which may be a, b, or c. Within square brackets, most operator meanings are ignored. Only three characters are special: these are \ - and ^. The - character indicates ranges. For example,
[a-z0-9<>_]
indicates the character class containing all the lower case letters, the digits, the angle brackets, and underline. Ranges may be given in either order. Using - between any pair of characters which are not both upper case letters, both lower case letters, or both digits is implementation dependent and will get a warning message. (E.g., [0-z] in ASCII is many more characters than it is in EBCDIC). If it is desired to include the character - in a character class, it should be first or last; thus
[-+0-9]
matches all the digits and the two signs.
In character classes, the ^ operator must appear as the first character after the left bracket; it indicates that the resulting string is to be complemented with respect to the computer character set. Thus
[^abc]
matches all characters except a, b, or c, including all special or control characters; or
[^a-zA-Z]
is any character which is not a letter. The \ character provides the usual escapes within character class brackets.
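The class rules described in this section can be sketched as a small C membership test. This is an illustrative reading of the text, not Lex's implementation: the names in_class and unescape are hypothetical, the argument is the text between the brackets, and only the \n, \t, and \b escapes mentioned in this document are mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Maps the character after a backslash to the character it stands for. */
static char unescape(char e)
{
    switch (e) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'b': return '\b';
    default:  return e;
    }
}

/* Returns true if c belongs to the class whose body (the text between
   [ and ]) is `body`, for example "a-z0-9<>_" or "^abc". */
bool in_class(const char *body, char c)
{
    assert(body != NULL);
    bool negate = false, found = false;
    size_t i = 0;
    if (body[0] == '^') {            /* ^ is special only in first position */
        negate = true;
        i = 1;
    }
    while (body[i] != '\0') {
        char lo = body[i++];
        if (lo == '\\' && body[i] != '\0')
            lo = unescape(body[i++]);
        char hi = lo;
        /* A - with a character on each side forms a range; a - that is
           first or last in the class is an ordinary character. */
        if (body[i] == '-' && body[i + 1] != '\0') {
            i++;
            hi = body[i++];
            if (hi == '\\' && body[i] != '\0')
                hi = unescape(body[i++]);
        }
        if (lo > hi) {               /* ranges may be given in either order */
            char t = lo; lo = hi; hi = t;
        }
        if (c >= lo && c <= hi)
            found = true;
    }
    return negate ? !found : found;
}
```

For instance, in_class("-+0-9", '-') is true because the leading - is taken literally, while in_class("^abc", 'b') is false and in_class("^abc", '\n') is true.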

FOREX-Dollar, euro fall vs yen as recession worries mount

NEW YORK, Oct 15 (Reuters) - The dollar fell against the yen on Wednesday as a sharp slide in September retail sales left investors fretting the government's $250 billion injection into troubled banks may not keep the economy out of recession.

The euro and high-yielding currencies also slid as world stocks fell, ending a rally seen earlier this week after U.S. and European governments announced sweeping bank rescue plans.

By midday, the dollar and euro were both at least 1 percent lower against the yen, which rises when risk appetite wanes.

Data showing U.S. retail sales posted their biggest monthly decline in more than three years last month 'really highlights the problems we are seeing in the U.S. economy,' said Kathy Lien, director of currency research at GFT Forex in New York.

'The question on everyone's minds is how deep of a recession. Today's number indicates a very strong chance of negative growth for the third quarter' and hints at more interest rate cuts from the Federal Reserve, she added.

The Fed cut the key federal funds rate by half a percentage point last week in concert with other major central banks, and Fed Chairman Ben Bernanke was set to speak in New York at 12:15 p.m. (1615 GMT) on the economic outlook and financial markets.

Late morning, the dollar was changing hands at 101.40 yen, down 0.8 percent but off a session low of 100.88. The euro was down 1.2 percent at 137.57 yen and was off 0.4 percent against the dollar at $1.3571. Sterling rose 0.2 percent to $1.7440.

The low-yield Japanese currency rallies when risk appetite fades as investors rush to get out of trades in higher-yielding currencies and assets financed with cheaply borrowed yen.

The greenback has lately also tended to benefit in such an environment against the euro and high-yield currencies as investors seek relative safety in dollar-denominated assets.

The Australian dollar fell 2.4 percent to $0.6829, while the greenback rose 1.6 percent against its Canadian counterpart to C$1.1799 as oil prices fell.

The dollar rose 1.8 percent against the Norwegian crown after Norway cut interest rates by half a percentage point.

RECESSION, RATE CUTS IN FOCUS

Governments around the world in recent days have announced plans to kick-start lending and shock the financial system out of paralysis by injecting billions of dollars directly into banks and guaranteeing many types of bank borrowing.

Traders said fears about the financial crisis receded after short-term interest rates for dollars eased Wednesday, though analysts warn the economic fallout from the crisis is still likely to slow global growth sharply.

The U.S. sales data bolstered that view, as did remarks late Tuesday from San Francisco Fed President Janet Yellen, who said the United States 'appears to be in a recession' and 'virtually every major sector of the economy has been hit by the financial shock.'

The market will soon refocus on economics, 'and when it does, it will likely trade on the notion that global growth is likely to have a very soft 2009,' said Dustin Reid, head of FX strategy at RBS Global Banking & Markets in Chicago.

A U.S. report showing a relatively tame increase in core producer prices, which strip out food and energy, may provide cover for more Fed rate cuts by year end.

The European Central Bank is also expected to cut rates again after reducing its refinancing rate to 3.75 percent this month. Euro-zone data on Wednesday showed that slowing growth in energy and food prices helped to curb inflation in the region in September.


US Gov’t Injects Capital into Banks

The greenback recovered earlier losses, edging off its session lows against the euro at 1.3768 and sterling at 1.7628. Treasury Secretary Paulson announced today the government would inject $250 billion into US banks in exchange for senior preferred shares. The move effectively translates into a partial nationalization of these banks in an effort to recapitalize the ailing financial system. Although Treasury Secretary Paulson said nationalizing any private US company was objectionable, “the alternative of leaving businesses and consumers without access to financing is totally unacceptable”. Fed Chairman Ben Bernanke chimed in to assure markets of the move to stabilize the financial system, saying “Americans can be confident that every resource is being brought to bear, historical understanding, technical expertise, economic analysis and political leadership”.
Euro Pares Gains

The euro gave back earlier gains versus the yen and dollar as the initial excitement over the Treasury's move to pump $250 billion into US financial institutions was tempered, with US equity bourses paring back from their initial rally. Although the recent capital injections have quelled fears over the solvency of global banks and may have staved off a meltdown in the financial system, the likelihood of a severe global economic slowdown remains.

Germany’s ZEW sentiment survey reflected the rapid deterioration in confidence at the height of the financial crisis, as the economic sentiment collapsed in September to -63.0 – far exceeding estimates for a decline to -51.1 from -41.1 in August. The current conditions indicator plunged to -35.9 from -1.0 a month earlier. The ZEW said that perspectives for economic development have significantly deteriorated due to the financial crisis but a separate analysis following the bank rescue package reveals a less pronounced decline in expectations.

In the coming session, Eurozone reports include Germany September CPI, HICP and Eurozone September inflation figures.

EURUSD holds steady near 1.3670, with support seen at 1.3640, followed by 1.36 and 1.3550. Subsequent floors are eyed at 1.3520, backed by 1.35 and 1.3470. On the upside, gains will encounter ceilings at 1.37, followed by 1.3740 and 1.3770. Additional ceilings will emerge at 1.38, followed by 1.3840 and 1.3870.