From http://www.w3.org/html/wg/drafts/html/master/single-page.html#serializing-html-fragments
Escaping a string (for the purposes of the algorithm* above) consists of running the following steps:
- Replace any occurrence of the "&" character by the string "&amp;".
- Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;".
- If the algorithm was invoked in the attribute mode, replace any occurrences of the """ character by the string "&quot;".
- If the algorithm was not invoked in the attribute mode, replace any occurrences of the "<" character by the string "&lt;", and any occurrences of the ">" character by the string "&gt;".
*The algorithm here is the built-in serialization algorithm, as invoked e.g. by the innerHTML getter.
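For illustration, here is a minimal Python sketch of the four quoted steps. The names escape_html and attribute_mode are my own, not from the spec, and this is not how any particular browser implements it; it only mirrors the order of the replacements.

```python
def escape_html(text: str, attribute_mode: bool = False) -> str:
    """Escape a string following the four steps quoted above (hypothetical helper)."""
    # Ampersands go first, so the entities added by later steps are not re-escaped.
    text = text.replace("&", "&amp;")
    # U+00A0 NO-BREAK SPACE becomes a named entity in both modes.
    text = text.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        # Inside an attribute value, only the double quote needs escaping.
        text = text.replace('"', "&quot;")
    else:
        # In text content, angle brackets are escaped and quotes are left alone.
        text = text.replace("<", "&lt;").replace(">", "&gt;")
    return text


print(escape_html("a < b & c"))                             # a &lt; b &amp; c
print(escape_html('say "hi" & <go>', attribute_mode=True))  # say &quot;hi&quot; &amp; <go>
```

Note that the order matters: if "&" were replaced last, the "&" in an already inserted "&lt;" would be escaped a second time.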
Strictly speaking, this is not exactly an answer to your question, since it deals with serialization rather than parsing. On the other hand, the serialized output is designed to be safely parsable. So, by implication, when writing markup:
- The "&" character should be replaced by "&amp;"
- Non-breaking spaces should be escaped as "&nbsp;" (surprise!...)
- Within attributes, the double-quote character (") should be escaped as "&quot;"
- Outside of attributes, "<" should be escaped as "&lt;" and ">" should be escaped as "&gt;"
I'm intentionally writing "should", not "must", since parsers may be able to correct violations of the above.