|
Luddite is the language Komodo uses for user-defined syntax
highlighting and multi-language lexing. An introduction to the
Luddite language can be found in the User-Defined Language Support
documentation.
The following is a list of Luddite keywords - not to be
confused with the keywords described by
the UDL file itself for the target language definition. The
Luddite keyword is always given followed by its argument list,
where applicable; optional arguments are given in square brackets
([ ... ]).
- accept
name-or-string-list | all
- Used in token_check definitions to disambiguate
characters. If
all is specified, then a
token_check test will always succeed. If the
previous token falls in the name-or-string list, the
token_check test succeeds. Otherwise Luddite will
look at the results of any skip and reject statements in the current family to decide
which action to take.
- all
- Used in accept,
reject, and skip statements in token_check definitions.
This means that the
token_check test will follow
whatever directive is associated with this occurrence of
all.
- at_eol(
state-name)
- Used to specify the Luddite state to transition to when the
end of the current line is reached. This is useful for
processing languages like the Perl-based template language,
Mason, in which one can embed Perl code on a single line by
putting a '%' character at the start of the line, as in this
example:
-
Good
% if ($time > 12) {
afternoon
% } else {
morning
% }
- This is easily handled with this Luddite code:
-
state IN_M_DEFAULT:
/^%/ : paint(upto, M_DEFAULT), paint(include, TPL_OPERATOR), \
at_eol(IN_M_DEFAULT), => IN_SSL_DEFAULT
- If there's a percent character at the start of a line in
markup mode, cause it to be lexed as a template
operator, and switch to Perl lexing (IN_SSL_DEFAULT),
and switch back to markup-based lexing when it reaches the end
of the line.
- clear_delimiter
- Used in conjunction with keep_delimiter, this action
explicitly clears the current delimiter (useful in recognizing
terminating identifiers in here documents).
- delimiter
- Used in place of a pattern or match-string in transitions,
this will succeed if there's a current delimiter defined, and
it can be matched at the current point in the buffer. When it's
matched, the current delimiter will be unset, unless the
keep_delimiter
action is given in the transition.
- family
csl | markup | ssl | style | tpl
-
The sub-type of this sub-language. The
meaning of each family name is:
- csl
- client-side language (e.g. usually JavaScript)
- markup
- markup language (e.g. HTML, XHTML, other XML
dialects)
- ssl
- server-side language (e.g. Perl, Python, Ruby)
- style
- stylesheet language (usually CSS)
- tpl
- a "template-oriented" language, usually geared to
either adding template-side processing to the server-side
language, or making it easier to inject the server-side
language into other parts of the full multi-language
document, as in CSS or JS. Examples include Smarty for PHP
processing.
- fold
"name-or-string" style
+|-
- Used to describe which tokens start and end a code-folding
block. For example:
fold "{" CSL_OPERATOR +
fold "if" SSL_WORD +
fold "end" SSL_WORD -
- keep_delimiter
- Used when the
delimiter directive was used in
place of a pattern or string, this tells the UDL lexer to
retain that directive for subsequent matching. This is useful
for matching a construct like Perl's s,/,\\,
construct, where we would be using ',' as the regex delimiter,
instead of the normal '/'.
- keywords
name-or-string-list
- List the words that should be colored as keywords (as
opposed to identifiers) in the defined language. Keywords for
the target language that are the same as reserved words in
Luddite must be quoted.
keywords ["en-US\luddite.html", "193", BEGIN, class ensure nil ]
- keyword_style
style-name =>
style-name
- This directive works with keywords to
tell UDL which identifier-type style needs to be promoted to a
keyword style. So all tokens listed in keywords that
are colored as the first style normally will be promoted to the
second style.
keyword_style SSL_IDENTIFIER => SSL_WORD
- include
[path/]file
- Include a UDL file. The include path starts with the
current directory; if you include a file in a different
directory, that file's directory is appended to the include
path.
include "html2ruby.udl"
- include
- When used as an argument to the paint keyword, include a style.
- initial
state-name
- The state processing for the current family should start
at.
initial IN_SSL_DEFAULT
- language
name
- The name of the language being defined. The language
directive must be given. It may be given more than once, but
only with the same value is given each time. This value shows
up in the Komodo UI in places like Language Preferences and the
"New File" command.
language RHTML
- namespace XML Namespace identifier
- The namespace declaration tells Komodo to load the language
service described by the current Luddite program when it sees
an XML file that uses the specified namespace as a default
attribute value.
-
namespace "http://www.w3.org/2001/XMLSchema"
namespace "http://www.w3.org/1999/XMLSchema"
# Be prepared if a new version is released in a few centuries.
namespace "http://www.w3.org/2525/XMLSchema"
- no_keyword
- The no_keyword command is used to prevent identifier =>
keyword promotion when an identifier is recognized. This is
useful for programming languages that allow any token, even
keywords, to be used in certain contexts, such as after the "."
in Python, Ruby, and VBScript, or after "::" or "->" in
Perl.
-
state IN_SSL_DEFAULT:
/\.(?=[a-z])/ : paint(upto, SSL_DEFAULT), \
paint(include, SSL_OPERATOR), => IN_SSL_NON_KEYWORD_IDENTIFIER_1
state IN_SSL_NON_KEYWORD_IDENTIFIER_1:
/[^a-zA-Z0-9_]/ : paint(upto, SSL_IDENTIFIER), redo, no_keyword, \
=> IN_SSL_DEFAULT
- paint(
include|upto, style-name)
- These directives are used in state transitions, and are key
to the way UDL generates lexers. They appear in a context like
this:
-
state IN_SSL_COMMENT_1 :
/$/ : paint(include, SSL_COMMENT) => IN_SSL_DEFAULT
- This means that when we're in a state with name
"IN_SSL_COMMENT_1" (remember that state names are completely
arbitrary in your Luddite programs), and we find the end of the
line, we should paint everything from the last "paint point" to
the end of the line with the style "SCE_UDL_SSL_COMMENT". This
will also move the paint-point to the point after the last
character this command styles.
- A transition can have a paint(include...) command,
a paint(upto...) command, both of them, or neither.
- pattern
name = string
- These are used as macro substitution inside patterns used
in state tables. The names should contain only letters, and are
referenced in patterns and the right-hand side of Luddite
patterns by putting a "$" before them.
-
pattern NMSTART = '_\w\x80-\xff' # used inside character sets only
pattern NMCHAR = '$NMSTART\d' # used inside character sets only
...
state IN_SSL_SYMBOL_1:
/[$NMCHAR]+/ : paint(include, SSL_STRING), => IN_SSL_DEFAULT
- public_id DOCTYPE public identifier
- The public_id declaration tells Komodo to load the language
service described by the current Luddite program when it sees
an XML or SGML file that uses the specified public declaration
in its Doctype declaration.
-
public_id "-//MOZILLA//DTD XBL V1.0//EN"
publicid "-//MOZILLA//DTD XBL V1.1//EN" # Be prepared
# "publicid" is a synonym for "public_id"
- redo
- The redo command indicates that the lexer should not
advance past the current characters being matched, but should
stay there in the new state.
-
state IN_SSL_NUMBER_3:
...
/[^\d]/ : paint(upto, SSL_NUMBER), redo, => IN_SSL_DEFAULT
This specifies that when we're processing numbers
(assuming the arbitrary name "IN_SSL_NUMBER_3" reflects the
idea that we're trying to recognize numeric terms in the
input), and find a non-digit, we should paint everything up
to but not including the start of the matched text with the
SCE_UDL_SSL_NUMBER style, change to the default state for
this family, and retry lexing the same character.
Since having both a redo action and a
paint(include...) action in the same transition
would be contradictory, the Luddite compiler gives an error
message when it detects this condition and stops
processing.
- reject
name-or-string-list | all
- Used in token_check definitions to disambiguate
characters. If all is specified, then a token_check test will always fail. If
the previous token falls in the name-or-string list, the token_check test fails.
Otherwise Luddite will look at the results of any reject and skip statements in the current family to
decide which action to take.
- set_delimiter
(number)
- This transition action takes a single argument, which must
be a reference to a grouped sub-pattern in the transition's
pattern. It sets the current delimiter to that contents of the
subpattern.
-
/m([^\w\d])/ : paint(upto, SSL_DEFAULT), set_delimiter(1), \
=> IN_SSL_REGEX1_TARGET
- set_opposite_delimiter
(number)
- This transition action takes a single argument, which must
be a reference to a grouped sub-pattern in the transition's
pattern. It sets the current delimiter to that contents of the
subpattern.
-
/m([\{\[\(\<])/ : paint(upto, SSL_DEFAULT), \
set_opposite_delimiter(1), => IN_SSL_REGEX1_TARGET
- skip
name-or-string-list | all
- Used in token_check definitions to disambiguate
characters. If all is specified, then a token_check test will retry with the
token preceding this one. If the previous token falls in the
name-or-string
list, the token_check test keeps going. Otherwise
Luddite will look at the results of any accept and reject statements in the current family to decide
which action to take.
- state
state-name:
transition-list
- The core of Luddite programs. Each state block is in effect
upto the next Luddite keyword.
- sublanguage
name
- The common name of the language this family targets.
-
sublanguage ruby
- sub_language
- Synonym for sublanguage
- system_id DOCTYPE system identifier
- The system_id declaration tells Komodo to load the language
service described by the current Luddite program when it sees
an XML or SGML file that uses the specified system declaration
in its Doctype declaration.
-
system_id "http://www.w3.org/2001/XMLSchema"
systemid "http://www.w3.org/1999/XMLSchema"
# "systemid" is a synonym for "system_id"
- token_check:
token-recognition-list
-
A set of directives that define the token
checking set for this family.
Used in state transitions immediately after a
pattern or string (match target) is given. This is used for
token disambiguation. For example in Ruby, Perl, and
JavaScript, a '/' can either be a division operator or the
start of a regular expression. By looking at how the previous
token was styled, we can determine how to interpret this
character.
If the token_check test fails, the pattern is
deemed to fail, and UDL tries subsequent transitions in the
state block.
- start_style
style-name
- The lowest style in the list of styles Luddite will use for
this family (in terms of the values given in Scintilla.iface).
This is used for token_check processing while lexing, so it's
not required otherwise.
- end_style
- The highest style in the list of styles Luddite will use
for this family (in terms of the values given in
Scintilla.iface). This is used for token_check processing while lexing, so it's
not required otherwise.
- upto
- See paint
- spop_check
- See spush_check
- spush_check(
state-name)
- Used to specify the Luddite state to transition to at a
later point.
-
state IN_SSL_DSTRING:
'#{' : paint(include, SSL_STRING), spush_check(IN_SSL_DSTRING), \
=> IN_SSL_DEFAULT
...
state IN_SSL_OP1:
'{' : paint(include, SSL_OPERATOR), spush_check(IN_SSL_DEFAULT) \
=> IN_SSL_DEFAULT
...
state IN_SSL_DEFAULT:
'}' : paint(upto, SSL_DEFAULT), paint(include, SSL_OPERATOR), \
spop_check, => IN_SSL_DEFAULT
- This code is used to account for the way the Ruby lexer is
intended to color expressions inside #{...} brackets in strings
as if they weren't in strings. When the "}" is reached, the
lexer needs to know whether it should go back to lexing a
string, or to the default state. We use the spush_check and
spop_check directives to specify this.
The style names are all listed in Scintilla.iface, in the section for
UDL.
The ones for client-side, server-side, style, and template
languages are generic, and shouldn't need much explanation
here. This section will focus on the markup section in
specifics.
First, we are limited to about 56 styles, as UDL partitions
the style byte associated with each character by 6 bits for
styles, 1 bit for error squigging, and 1 bit for warnings.
Additionally, scintilla reserves styles 32 through 39 for its
own use, leaving us with 56.
This reduces the number of available styles for each family
down to the basics. This includes the default style (which
should only be used for white-space in most cases), comments,
numbers, strings, keywords, identifiers, operators, and regular
expressions. We add a style for variable names for the
server-side, and leave comment-blocks in as well, although
their usefulness is in question.
Some of the styles in the markup block reflect terms used in
ISO 8859, the SGML standard. These are:
- SCE_UDL_M_STAGO
- Start-tag-open character, as in "<"
- SCE_UDL_M_TAGNAME
- The name immediately after a tag-open character.
- SCE_UDL_M_TAGSPACE
- White-space in start-tags, end-tags, and empty tags.
- SCE_UDL_M_ATTRNAME
- Attribute names in start-tags.
- SCE_UDL_M_OPERATOR
- Operator characters, which is essentially just the '=' in
attribute assignments in start-tags.
- SCE_UDL_M_STAGC
- Start-tag-close character (">")
- SCE_UDL_M_EMP_TAGC
- Tag-close sequence for an empty start-tag ("/>")
- SCE_UDL_M_STRING
- String attribute values in start-tags
- SCE_UDL_M_ETAGO
- End-tag-open character ("</")
- SCE_UDL_M_ETAGC
- End-tag-close character (">"). Distinguished from an
"STAGC" only by context.
- SCE_UDL_M_ENTITY
- A complete entity reference, from the leading "&"
through the closing ";".
- SCE_UDL_M_PI
- Processing instructions, including the leading XML
declaration, from the leading "<?" through the closing
"?>".
- SCE_UDL_M_CDATA
- CDATA marked sections.
- SCE_UDL_M_COMMENT
- SGML comments.
- name-or-string
- Either a quoted string, delimited by single- or
double-quote characters, with -escape sequences, or a
sequence of characters starting with a letter, followed by
zero or more alphanumeric characters, hyphens, or
underscores.
- name-or-string-list
- List of strings or non-keyword-names surrounded by square
brackets. Commas are optional.
- state-name
- A symbolic name for a UDL state. These names are are
completely arbitrary, but we recommend the convention that
they start with "IN_", then contain a sequence that reflects
the family they belong to ("SSL", "CSL", "CSS", "TPL", or "M"
for "Markup"), and finally a descriptive part.
- Example:
- IN_TPL_BLOCK_COMMENT_1
- style-name
- A symbolic name that reflects the style names used in .h
files derived from the Scintilla.iface file. These names all
start with the prefix "SCE_UDL_", which may always be
omitted.
- Example:
- SSL_STRING
- token-recognition-list
- A line-oriented list of directives containing a
style-name, a colon, and then an accept, reject, or skip directive.
- transition-list
- Zero or more transition, separated by
newlines
- transition
- Consists of a string or pattern to be matched, an
optional token_check, ":" (optional), an optional
list of actions (comma-separated), and optionally, an "=>"
followed by a style-name. Actions include paint, redo, spush_check, and spop_check.
- Order of token_check statements
-
The statements in a token_check directive are followed in
order, and are family-specific. The default action, if no
term is chosen, depend on what kinds of statements are
specified:
- Only reject list: accept all others
- Only accept list: reject everything else
- Only skip list: reject everything else, but
- having only a set of skip items is redundant, as
everything will be
- rejected anyway.
- If a style has two of the lists, the missing one is
the default
- If a style has all three lists, the default action is
'reject'.
- Order of state transitions
-
UDL walks the list of transitions for the current state.
As soon as it finds one, it does whatever painting is
called for, switches to the specified state, and then
restarts at the next state.
The target state is optional. A transition that contains
both a "paint(include...)" and a "redo" directive would
loop indefinitely, so UDL contains a limit of 1000 before
it bypasses the current character and continue with the
next one.
|