I'm using YACC for the first time and getting used to using BNF grammar. I'm currently building a list of types from a comma-separated list (e.g. int, float, string):

def p_type(p):
    '''type : primitive_type
            | array
            | generic_type
            | ID'''
    p[0] = p[1]

def p_type_list(p):
    '''type_list : type
                 | type COMMA type_list'''
    if not isinstance(p[0], list):
        p[0] = list()
    p[0].append(p[1])
    if len(p) == 4:
        p[0] += p[3]

The rules work, but I'm getting the sense that my p_type_list logic is a bit of a kludge and could be simplified into a one-liner.

There are two productions. Use two separate functions. (There is no extra cost :-) )

def p_type_list_1(p):
    '''type_list : type'''
    p[0] = [p[1]]

def p_type_list_2(p):
    '''type_list : type_list COMMA type'''
    p[0] = p[1] + [p[3]]
Note: I fixed your grammar to use left-recursion. With bottom-up parsing, left-recursion is almost always what you want, because it avoids unnecessary parser stack usage, and more importantly because it often simplifies actions. In this case, I could have written the second function as:
def p_type_list_2(p):
    '''type_list : type_list COMMA type'''
    p[0] = p[1]
    p[0] += [p[3]]
Or "simplify" p_type_list
to (you reduce by 1 line of code, not sure if that's worth it):
def p_type_list(p): '' 'type_list : type | type_list COMMA type '' ' if len(p) == 2: p[0] = [p[1]] else: p[0] = p[1] + [p[3]]
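To see the left-recursive rules in action, here is a minimal, self-contained sketch. It is not from the original answer: it uses a hypothetical two-token lexer and collapses type down to a bare ID, just to exercise the list-building actions:

import ply.lex as lex
import ply.yacc as yacc

# Hypothetical minimal lexer: identifiers and commas only
tokens = ('ID', 'COMMA')
t_ID = r'[A-Za-z_][A-Za-z0-9_]*'
t_COMMA = r','
t_ignore = ' \t'

def t_error(t):
    t.lexer.skip(1)

start = 'type_list'   # parse a type_list rather than the first rule defined

def p_type(p):
    '''type : ID'''   # simplified: a type is just an identifier here
    p[0] = p[1]

def p_type_list_1(p):
    '''type_list : type'''
    p[0] = [p[1]]

def p_type_list_2(p):
    '''type_list : type_list COMMA type'''
    p[0] = p[1] + [p[3]]

def p_error(p):
    print("Syntax error")

lexer = lex.lex()
parser = yacc.yacc()
print(parser.parse("int, float, string"))   # -> ['int', 'float', 'string']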
PLY consists of two separate modules, lex.py and yacc.py, both of which are found in a Python package called ply. The lex.py module is used to break input text into a collection of tokens specified by a collection of regular expression rules. yacc.py is used to recognize language syntax that has been specified in the form of a context-free grammar.

The identification of tokens is typically done by writing a series of regular expression rules; the next section shows how this is done using lex.py. The lex.lex() function uses Python reflection (or introspection) to read the regular expression rules out of the calling context and build the lexer. Once the lexer has been built, two methods can be used to control it: lexer.input() to supply input text and lexer.token() to retrieve tokens. Building the lexer in debug mode will produce various sorts of debugging information, including all of the added rules, the master regular expressions used by the lexer, and the tokens generated during lexing.
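As a small illustration (my own snippet, not from the original text), debug output is enabled by passing debug=True when the lexer is built:

import ply.lex as lex

tokens = ('NUMBER',)
t_NUMBER = r'\d+'
t_ignore = ' '

def t_error(t):
    t.lexer.skip(1)

# debug=True dumps the added rules and the master regular expression
# at build time, and logs each token as it is matched during lexing.
lexer = lex.lex(debug=True)
lexer.input("1 2 3")
while lexer.token():
    pass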
For example, suppose the user supplies the input string:

x = 3 + 42 * (s - t)

A tokenizer splits this string into individual tokens:

'x', '=', '3', '+', '42', '*', '(', 's', '-', 't', ')'

Tokens are usually given names to indicate what they are:

'ID', 'EQUALS', 'NUMBER', 'PLUS', 'NUMBER', 'TIMES',
'LPAREN', 'ID', 'MINUS', 'ID', 'RPAREN'

More specifically, the input is broken into pairs of token types and token values:

('ID', 'x'), ('EQUALS', '='), ('NUMBER', '3'), ('PLUS', '+'),
('NUMBER', '42'), ('TIMES', '*'), ('LPAREN', '('), ('ID', 's'),
('MINUS', '-'), ('ID', 't'), ('RPAREN', ')')
The following calclex.py module shows how such a tokenizer is written with lex.py:

# ------------------------------------------------------------
# calclex.py
#
# tokenizer for a simple expression evaluator for
# numbers and +, -, *, /
# ------------------------------------------------------------
import ply.lex as lex

# List of token names. This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

# Regular expression rules for simple tokens
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# A regular expression rule with some action code
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore = ' \t'

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()
# Test it out
data = '''
3 + 4 * 10
  + -20 * 2
'''

# Give the lexer some input
lexer.input(data)

# Tokenize
while True:
    tok = lexer.token()
    if not tok:
        break      # No more input
    print(tok)
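When run, each call to lexer.token() returns a LexToken whose printed form shows the token type, value, line number, and position. The output should look something like the following (the exact lineno and lexpos fields depend on how the whitespace in data is laid out):

LexToken(NUMBER,3,2,1)
LexToken(PLUS,'+',2,3)
LexToken(NUMBER,4,2,5)
LexToken(TIMES,'*',2,7)
LexToken(NUMBER,10,2,9)

and so on for the second line of input.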
To build the parser, call the yacc.yacc() function. This function looks at the module and attempts to construct all of the LR parsing tables for the grammar you have specified. The first time yacc.yacc() is invoked, you will get a message reporting that the tables are being generated. If any errors are detected in your grammar specification, yacc.py will produce diagnostic messages and possibly raise an exception. PLY's error messages and warnings are also produced using the logging interface; this can be controlled by passing a logging object using the errorlog parameter.

To resolve ambiguity, especially in expression grammars, yacc.py allows individual tokens to be assigned a precedence level and associativity. This is done by adding a variable precedence to the grammar file.
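Putting this together, here is a minimal sketch of a parser module for the calclex.py tokens above, including a precedence table. This is my sketch rather than text from the original; it assumes calclex.py from the previous section is importable, and the rule names are illustrative:

import ply.yacc as yacc
from calclex import tokens   # reuse the token list from the lexer module

# Lowest precedence first: '+'/'-' bind more loosely than '*'/'/'
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    else:
        p[0] = p[1] / p[3]

def p_expression_group(p):
    '''expression : LPAREN expression RPAREN'''
    p[0] = p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error in input!")

parser = yacc.yacc()
print(parser.parse("3 + 4 * (10 - 2)"))   # -> 35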