parser Package

This package contains the actual wikicode parser, split up into two main modules: the tokenizer and the builder. This module joins them together under one interface.

class mwparserfromhell.parser.Parser(text)[source]

Represents a parser for wikicode.

Actual parsing is a two-step process: first, the text is split up into a series of tokens by the Tokenizer, and then the tokens are converted into trees of Wikicode objects and Nodes by the Builder.

parse()[source]

Parse the text and return it as a Wikicode object tree.

builder Module

class mwparserfromhell.parser.builder.Builder[source]

Combines a sequence of tokens into a tree of Wikicode objects.

To use, pass a list of Tokens to the build() method. The list will be exhausted as it is parsed and a Wikicode object will be returned.

_handle_argument()[source]

Handle a case where an argument is at the head of the tokens.

_handle_attribute()[source]

Handle a case where a tag attribute is at the head of the tokens.

_handle_comment()[source]

Handle a case where a hidden comment is at the head of the tokens.

_handle_entity()[source]

Handle a case where an HTML entity is at the head of the tokens.

_handle_heading(token)[source]

Handle a case where a heading is at the head of the tokens.

_handle_parameter(default)[source]

Handle a case where a parameter is at the head of the tokens.

default is the value to use if no parameter name is defined.

_handle_tag(token)[source]

Handle a case where a tag is at the head of the tokens.

_handle_template()[source]

Handle a case where a template is at the head of the tokens.

_handle_token(token)[source]

Handle a single token.

_handle_wikilink()[source]

Handle a case where a wikilink is at the head of the tokens.

_pop(wrap=True)[source]

Pop the current node list off of the stack.

If wrap is True, we will call _wrap() on the list.

_push()[source]

Push a new node list onto the stack.

_wrap(nodes)[source]

Properly wrap a list of nodes in a Wikicode object.

_write(item)[source]

Append a node to the current node list.

build(tokenlist)[source]

Build a Wikicode object from a list of tokens and return it.
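The way the stack helpers (_push(), _pop(), _write()) cooperate with build() can be illustrated with a toy builder. This is only a sketch under stated assumptions: MiniBuilder, _handle_group(), and the "open"/"close" string tokens are hypothetical, and the result is a tree of nested Python lists rather than Wikicode objects. It does mirror the documented behaviour that the token list is exhausted as it is parsed.

```python
class MiniBuilder:
    """Toy builder (hypothetical): flat token list -> nested lists."""

    def __init__(self):
        self._tokens = []
        self._stacks = []

    def _push(self):
        # Start collecting nodes for a new nesting level.
        self._stacks.append([])

    def _pop(self):
        # Finish the current nesting level and hand back its nodes.
        return self._stacks.pop()

    def _write(self, item):
        # Append a node to the innermost node list.
        self._stacks[-1].append(item)

    def _handle_group(self):
        # Called after an "open" token was consumed: collect nodes
        # until the matching "close" token is found.
        self._push()
        while self._tokens:
            token = self._tokens.pop()
            if token == "close":
                return self._pop()
            self._write(self._handle_token(token))
        raise ValueError("missing close token")

    def _handle_token(self, token):
        if token == "open":
            return self._handle_group()
        return token  # anything else is treated as plain text

    def build(self, tokenlist):
        # Reverse in place so that pop() yields tokens in their
        # original order; the caller's list is consumed as we go.
        self._tokens = tokenlist
        self._tokens.reverse()
        self._push()
        while self._tokens:
            self._write(self._handle_token(self._tokens.pop()))
        return self._pop()


tokens = ["a", "open", "b", "open", "c", "close", "close", "d"]
tree = MiniBuilder().build(tokens)
print(tree)    # ['a', ['b', ['c']], 'd']
print(tokens)  # [] (the input list has been exhausted)
```

Each nesting level gets its own node list, so a handler can push a list, fill it, and pop it without knowing how deeply it is nested.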

contexts Module

This module contains various “context” definitions, which are essentially flags set during the tokenization process, either on the current parse stack (local contexts) or affecting all stacks (global contexts). They represent the context the tokenizer is in, such as inside a template’s name definition, or inside a level-two heading. The tokenizer uses them to determine which tokens are valid at the current point and whether the current parsing route is invalid.

The tokenizer stores context as an integer, with these definitions bitwise OR’d to set them, AND’d to check if they’re set, and XOR’d to unset them. The advantage of this is that contexts can have sub-contexts (as FOO == 0b11 will cover BAR == 0b10 and BAZ == 0b01).
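The flag arithmetic described above can be sketched in a few lines. FOO, BAR, and BAZ and their values come from the example in the text; the context variable is illustrative only and is not the tokenizer's actual attribute.

```python
# Illustrative flag values from the example above; the real constants
# in the contexts module have their own values.
BAR = 0b10
BAZ = 0b01
FOO = BAR | BAZ  # 0b11: the parent context covers both sub-contexts

context = 0

# Set a sub-context with bitwise OR.
context |= BAR

# Check with bitwise AND: a non-zero result means the flag is set.
in_bar = bool(context & BAR)  # True
in_baz = bool(context & BAZ)  # False
in_foo = bool(context & FOO)  # True, because BAR is part of FOO

# Unset with XOR. This clears the flag only if it is currently set;
# XOR on an unset flag would turn it on instead.
context ^= BAR
cleared = context == 0        # True
```

Checking against the parent flag (FOO) answers "are we anywhere inside this construct?", while checking against a sub-flag answers "which part of it are we in?".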

Local (stack-specific) contexts:

  • TEMPLATE

    • TEMPLATE_NAME
    • TEMPLATE_PARAM_KEY
    • TEMPLATE_PARAM_VALUE
  • ARGUMENT

    • ARGUMENT_NAME
    • ARGUMENT_DEFAULT
  • WIKILINK

    • WIKILINK_TITLE
    • WIKILINK_TEXT
  • HEADING

    • HEADING_LEVEL_1
    • HEADING_LEVEL_2
    • HEADING_LEVEL_3
    • HEADING_LEVEL_4
    • HEADING_LEVEL_5
    • HEADING_LEVEL_6
  • COMMENT

Global contexts:

  • GL_HEADING

tokenizer Module

class mwparserfromhell.parser.tokenizer.Tokenizer[source]

Creates a list of tokens from a string of wikicode.

END = <object object at 0x1c98550>
MARKERS = [u'{', u'}', u'[', u']', u'<', u'>', u'|', u'=', u'&', u'#', u'*', u';', u':', u'/', u'-', u'!', u'\n', <object object at 0x1c98550>]
START = <object object at 0x1c98540>
_context[source]

The current token context.

_fail_route()[source]

Fail the current tokenization route.

Discards the current stack/context/textbuffer and raises BadRoute.

_handle_argument_end()[source]

Handle the end of an argument at the head of the string.

_handle_argument_separator()[source]

Handle the separator between an argument’s name and default.

_handle_heading_end()[source]

Handle the end of a section heading at the head of the string.

_handle_template_end()[source]

Handle the end of a template at the head of the string.

_handle_template_param()[source]

Handle a template parameter at the head of the string.

_handle_template_param_value()[source]

Handle a template parameter’s value at the head of the string.

_handle_wikilink_end()[source]

Handle the end of a wikilink at the head of the string.

_handle_wikilink_separator()[source]

Handle the separator between a wikilink’s title and its text.

_parse(context=0)[source]

Parse the wikicode string, using context for when to stop.

_parse_argument()[source]

Parse an argument at the head of the wikicode string.

_parse_comment()[source]

Parse an HTML comment at the head of the wikicode string.

_parse_entity()[source]

Parse an HTML entity at the head of the wikicode string.

_parse_heading()[source]

Parse a section heading at the head of the wikicode string.

_parse_template()[source]

Parse a template at the head of the wikicode string.

_parse_template_or_argument()[source]

Parse a template or argument at the head of the wikicode string.

_parse_wikilink()[source]

Parse an internal wikilink at the head of the wikicode string.

_pop(keep_context=False)[source]

Pop the current stack/context/textbuffer, returning the stack.

If keep_context is True, then we will replace the underlying stack’s context with the current stack’s.

_push(context=0)[source]

Add a new token stack, context, and textbuffer to the list.

_push_textbuffer()[source]

Push the textbuffer onto the stack as a Text node and clear it.

_read(delta=0, wrap=False, strict=False)[source]

Read the value at a relative point in the wikicode.

The value is read from self._head plus the value of delta (which can be negative). If wrap is False, a negative self._head + delta is not allowed to wrap around and read from the end of the string. If strict is True, the route will be failed (with _fail_route()) if we try to read from past the end of the string; otherwise, self.END is returned. If we try to read from before the start of the string, self.START is returned.
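The rules above can be restated as a small standalone function. This is a sketch based only on the description, not the tokenizer's own code: read() takes the text and head position as explicit arguments, and RouteFailed is a hypothetical stand-in for the effect of _fail_route().

```python
START = object()  # sentinel: read from before the start of the text
END = object()    # sentinel: read from past the end of the text


class RouteFailed(Exception):
    """Hypothetical stand-in for failing the current route."""


def read(text, head, delta=0, wrap=False, strict=False):
    index = head + delta
    # A negative index means "before the start" unless wrapping is
    # allowed and the index still falls inside the text.
    if index < 0 and (not wrap or abs(index) > len(text)):
        return START
    try:
        return text[index]
    except IndexError:
        # Past the end: fail the route in strict mode, else return END.
        if strict:
            raise RouteFailed()
        return END


print(read("abc", 1))                               # b
print(read("abc", 0, delta=-1) is START)            # True
print(read("abc", 0, delta=-1, wrap=True))          # c
print(read("abc", 2, delta=1) is END)               # True
```

Using unique sentinel objects rather than None or an empty string means a caller can compare the result against real characters without a special case for the text boundaries.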

_really_parse_entity()[source]

Actually parse an HTML entity and ensure that it is valid.

_stack[source]

The current token stack.

_textbuffer[source]

The current textbuffer.

_verify_safe(unsafes)[source]

Verify that there are no unsafe characters in the current stack.

The route will be failed if the name contains any element of unsafes (not merely at the beginning or end). This is used when parsing a template name or parameter key, which cannot contain newlines.

_write(token)[source]

Write a token to the end of the current token stack.

_write_all(tokenlist)[source]

Write a series of tokens to the current stack at once.

_write_first(token)[source]

Write a token to the beginning of the current token stack.

_write_text(text)[source]

Write text to the current textbuffer.

_write_text_then_stack(text)[source]

Pop the current stack, write text, and then write the stack.

regex = <_sre.SRE_Pattern object at 0x19a8030>
tokenize(text)[source]

Build a list of tokens from a string of wikicode and return it.

exception mwparserfromhell.parser.tokenizer.BadRoute[source]

Raised internally when the current tokenization route is invalid.
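How a failed route leads to backtracking can be shown with a deliberately tiny tokenizer. Everything here is a simplified assumption for illustration: MiniTokenizer, the ("template", name) tuple token, and the rule that an unclosed "{{" is re-read as plain text are not taken from the real Tokenizer, which handles contexts, text buffers, and nesting that this sketch omits.

```python
class BadRoute(Exception):
    """Raised when the current tokenization route is invalid."""


class MiniTokenizer:
    """Toy tokenizer (hypothetical) that only understands {{name}}."""

    def __init__(self):
        self._text = ""
        self._head = 0
        self._stacks = []

    def _push(self):
        self._stacks.append([])

    def _pop(self):
        return self._stacks.pop()

    def _write(self, token):
        self._stacks[-1].append(token)

    def _fail_route(self):
        # Discard the speculative stack and tell the caller to backtrack.
        self._pop()
        raise BadRoute()

    def _parse_template(self):
        # Speculatively read "{{ ... }}"; fail if the text ends first.
        self._push()
        self._head += 2
        while self._head < len(self._text):
            if self._text.startswith("}}", self._head):
                self._head += 2
                return self._pop()
            self._write(self._text[self._head])
            self._head += 1
        self._fail_route()

    def tokenize(self, text):
        self._text, self._head = text, 0
        self._push()
        while self._head < len(self._text):
            if self._text.startswith("{{", self._head):
                reset = self._head
                try:
                    name = "".join(self._parse_template())
                except BadRoute:
                    # Backtrack: treat the brace as ordinary text.
                    self._head = reset
                    self._write("{")
                    self._head += 1
                else:
                    self._write(("template", name))
            else:
                self._write(self._text[self._head])
                self._head += 1
        return self._pop()


print(MiniTokenizer().tokenize("a{{b}}c"))  # ['a', ('template', 'b'), 'c']
print(MiniTokenizer().tokenize("a{{b"))     # ['a', '{', '{', 'b']
```

Because the speculative tokens live on their own stack, a failed route can be thrown away without disturbing anything written before it.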

tokens Module

This module contains the token definitions that are used as an intermediate parsing data type: they are stored in a flat list, with each token being identified by its type and optional attributes. The token list is generated in a syntactically valid form by the Tokenizer, and then converted into the Wikicode tree by the Builder.

class mwparserfromhell.parser.tokens.Token(**kwargs)[source]

A token stores the semantic meaning of a unit of wikicode.

class mwparserfromhell.parser.tokens.Text(**kwargs)
class mwparserfromhell.parser.tokens.TemplateOpen(**kwargs)
class mwparserfromhell.parser.tokens.TemplateParamSeparator(**kwargs)
class mwparserfromhell.parser.tokens.TemplateParamEquals(**kwargs)
class mwparserfromhell.parser.tokens.TemplateClose(**kwargs)
class mwparserfromhell.parser.tokens.ArgumentOpen(**kwargs)
class mwparserfromhell.parser.tokens.ArgumentSeparator(**kwargs)
class mwparserfromhell.parser.tokens.ArgumentClose(**kwargs)
class mwparserfromhell.parser.tokens.WikilinkOpen(**kwargs)
class mwparserfromhell.parser.tokens.WikilinkSeparator(**kwargs)
class mwparserfromhell.parser.tokens.WikilinkClose(**kwargs)
class mwparserfromhell.parser.tokens.HTMLEntityStart(**kwargs)
class mwparserfromhell.parser.tokens.HTMLEntityNumeric(**kwargs)
class mwparserfromhell.parser.tokens.HTMLEntityHex(**kwargs)
class mwparserfromhell.parser.tokens.HTMLEntityEnd(**kwargs)
class mwparserfromhell.parser.tokens.HeadingStart(**kwargs)
class mwparserfromhell.parser.tokens.HeadingEnd(**kwargs)
class mwparserfromhell.parser.tokens.CommentStart(**kwargs)
class mwparserfromhell.parser.tokens.CommentEnd(**kwargs)
class mwparserfromhell.parser.tokens.TagOpenOpen(**kwargs)
class mwparserfromhell.parser.tokens.TagAttrStart(**kwargs)
class mwparserfromhell.parser.tokens.TagAttrEquals(**kwargs)
class mwparserfromhell.parser.tokens.TagAttrQuote(**kwargs)
class mwparserfromhell.parser.tokens.TagCloseOpen(**kwargs)
class mwparserfromhell.parser.tokens.TagCloseSelfclose(**kwargs)
class mwparserfromhell.parser.tokens.TagOpenClose(**kwargs)
class mwparserfromhell.parser.tokens.TagCloseClose(**kwargs)
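The idea of a flat list of tokens, each identified by its type plus optional keyword attributes, can be sketched as follows. The class bodies are an assumption made for illustration and are not the library's implementation; only the class names and the **kwargs constructor signature come from the listing above, and the token sequence shown for "Hello {{foo}}" is a plausible example rather than verified tokenizer output.

```python
class Token:
    """Toy token: its type is its class, its attributes are kwargs."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an instance attribute.
        self.__dict__.update(kwargs)

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in self.__dict__.items())
        return f"{type(self).__name__}({args})"

    def __eq__(self, other):
        # Two tokens match if they share a type and all attributes.
        return type(self) is type(other) and self.__dict__ == other.__dict__


class Text(Token):
    pass


class TemplateOpen(Token):
    pass


class TemplateClose(Token):
    pass


# A flat token list that could represent "Hello {{foo}}".
tokens = [Text(text="Hello "), TemplateOpen(), Text(text="foo"), TemplateClose()]

print(tokens[0])       # Text(text='Hello ')
print(tokens[0].text)  # Hello 
print(isinstance(tokens[1], TemplateOpen))  # True
```

Keeping the tokens flat leaves all nesting decisions to the builder, which can match each open token with its close token as it consumes the list.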
