Disallow the use of unescaped special characters

Rule ID:
no-raw-characters
Category:
HTML Syntax and concepts
Standards:
  • HTML5

Some characters hold special meaning in HTML and must be escaped using character references (HTML entities) to be used as plain text:

Additionally, unquoted attribute values further restrict the allowed characters:

Quoted attribute values must escape only the following characters:

Rule details

Examples of incorrect code for this rule:

<p>Fred & Barney</p>
<p class=foo's></p>

error: Raw "&" must be encoded as "&amp;" (no-raw-characters) at inline:1:9:
> 1 | <p>Fred & Barney</p>
    |         ^
  2 | <p class=foo's></p>


error: Raw "'" must be encoded as "&apos;" (no-raw-characters) at inline:2:13:
  1 | <p>Fred & Barney</p>
> 2 | <p class=foo's></p>
    |             ^


2 errors found.

Examples of correct code for this rule:

<p>Fred &amp; Barney</p>
<p class=foo&apos;s></p>
<p class="'foo'"></p>
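When markup is generated programmatically, the escaping shown in the correct examples can be applied before the value is written out. The following is a minimal sketch, not part of HTML-validate: the helper names `escapeText` and `escapeDoubleQuoted` are hypothetical, and the sketch only covers text content and double-quoted attribute values.

```typescript
// Hypothetical helpers for illustration; they are not part of html-validate.

// Characters that are special in text content and their character references.
const TEXT_REFERENCES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
};

// Escape a string for use as text content, e.g. inside <p>...</p>.
function escapeText(raw: string): string {
  return raw.replace(/[&<>]/g, (ch) => TEXT_REFERENCES[ch] ?? ch);
}

// Escape a string for use inside a double-quoted attribute value.
// Single quotes are left alone because they cannot end the value.
function escapeDoubleQuoted(raw: string): string {
  return raw.replace(/[&"]/g, (ch) => (ch === "&" ? "&amp;" : "&quot;"));
}

console.log(`<p>${escapeText("Fred & Barney")}</p>`);
console.log(`<p class="${escapeDoubleQuoted("'foo'")}"></p>`);
```

Both replacements run in a single pass over the input, so an ampersand introduced by one substitution is never escaped a second time.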

Parser

Note that documents using unescaped < might not parse properly due to the strict parsing of HTML-validate. This is intentional.

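For instance, in the following case <3 is interpreted as the opening of a tag named 3 with a boolean attribute Barney. Tokenization then fails at </p>, where the parser expects another attribute or the end of the tag.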

<p>Fred <3 Barney</p>

error: failed to tokenize "</p>", expected attribute, ">" or "/>" (parser-error) at inline:1:18:
> 1 | <p>Fred <3 Barney</p>
    |                  ^


1 error found.

Options

This rule takes an optional object:

{
  "relaxed": false
}

relaxed

HTML5 introduces the concept of ambiguous ampersands and relaxes the rules slightly. With this option enabled, ampersands (&) only need to be escaped if the context is ambiguous (this applies to both text nodes and attribute values).

The option is disabled by default because explicit encoding is easier on readers than having to work out whether encoding is needed in each case.
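As a sketch of how the option might be switched on, the object below pairs a severity with the options object, assuming the usual `[severity, options]` form for rule settings. The `config` variable name and the `"error"` severity are illustrative choices, not requirements.

```typescript
// Illustrative configuration object; the shape mirrors a JSON configuration file.
const config = {
  rules: {
    // Severity first, then the options object described above.
    "no-raw-characters": ["error", { relaxed: true }],
  },
} as const;

// Serialize the object, e.g. to paste into a JSON configuration file.
const serialized = JSON.stringify(config, null, 2);
console.log(serialized);
```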

Examples of correct code with this option:

<!-- Not ambiguous: & is followed by whitespace -->
<p>Fred & Barney</p>

<!-- Not ambiguous: &Barney is not terminated by ; -->
<p>Fred&Barney</p>

<!-- Not ambiguous: = and " both stop the character reference -->
<a href="?foo&bar=1&baz"></a>

<!-- Not ambiguous: even in an unquoted value, & is understood to be stopped by > -->
<a href=?foo&bar></a>
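The distinction drawn in these examples can be sketched in code. The block below is an approximation under stated assumptions and not HTML-validate's actual implementation: `KNOWN_NAMES` is a deliberately tiny, hypothetical stand-in for the full table of named character references, and `classifyAmpersand` and `rawAmpersands` are made-up names. An ampersand counts as ambiguous when it is followed by alphanumerics and a semicolon that do not form a known reference.

```typescript
// Hypothetical, deliberately tiny table; the HTML standard defines many more names.
const KNOWN_NAMES = new Set(["amp", "lt", "gt", "quot", "apos"]);

type AmpersandKind = "reference" | "ambiguous" | "unambiguous";

// Classify the ampersand found at `index` in `text`.
function classifyAmpersand(text: string, index: number): AmpersandKind {
  const rest = text.slice(index + 1);
  // Numeric references such as &#38; or &#x26;
  if (/^#(?:[0-9]+|[xX][0-9a-fA-F]+);/.test(rest)) {
    return "reference";
  }
  // Alphanumerics terminated by a semicolon look like a named reference.
  const named = /^([a-zA-Z0-9]+);/.exec(rest);
  if (named) {
    return KNOWN_NAMES.has(named[1] ?? "") ? "reference" : "ambiguous";
  }
  // Followed by whitespace, =, ", > or an unterminated name: not ambiguous.
  return "unambiguous";
}

// Return the offsets of ampersands that would need escaping.
// Strict mode reports every raw ampersand; relaxed mode only ambiguous ones.
function rawAmpersands(text: string, relaxed: boolean): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== "&") continue;
    const kind = classifyAmpersand(text, i);
    if (kind === "ambiguous" || (kind === "unambiguous" && !relaxed)) {
      offsets.push(i);
    }
  }
  return offsets;
}

console.log(rawAmpersands("Fred & Barney", false)); // strict mode
console.log(rawAmpersands("Fred & Barney", true)); // relaxed mode
```

In this sketch the strict call reports the ampersand at offset 5 and the relaxed call reports nothing, which matches the first example above.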