If a non-toplevel macro has the title argument but no explicit id argument, an Element ID is created automatically from the title by applying the following transformations:

- do an id output format conversion on the title, to remove, for example, any HTML tags that would be present in the conversion output
- convert all characters to lowercase. This uses JavaScript case conversion. Note that this does convert non-ASCII characters to lowercase, e.g. É to é.
- if idnormalizelatin is true (the default), do Latin normalization. This converts e.g. é to e.
- if idnormalizepunctuation is true (the default), do Punctuation normalization. This converts e.g. + to plus.
- convert each consecutive sequence of ASCII characters outside a-z0-9 to a single hyphen -. Note that this leaves non-ASCII characters untouched.
- strip leading and trailing hyphens

Note how those rules leave non-ASCII Unicode characters untouched, except for:

- capitalization changes where applicable, e.g. É to é
- Latin normalization, when enabled, e.g. é to e

Non-ASCII characters are otherwise kept as-is because deciding things such as capitalization, or whether a character "is a letter or not", can be tricky for them.
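The transformation list can be sketched in TypeScript, since the case-conversion step is specified in terms of JavaScript. This is a minimal sketch under stated assumptions, not OurBigBook's code: the id output format conversion is approximated by stripping tags from already-rendered HTML, Latin normalization by Unicode NFD decomposition plus a tiny Greek table, and Punctuation normalization by a one-entry table mapping + to plus, padded with hyphens. The names autoId, latinNormalize and mapChars, and the normalizeLatin/normalizePunctuation options standing in for idnormalizelatin and idnormalizepunctuation, are all hypothetical.

```typescript
// Hypothetical sketch of the automatic ID transformations listed above.
// It is NOT OurBigBook's implementation; assumptions are marked below.

interface AutoIdOptions {
  normalizeLatin?: boolean;       // stands in for idnormalizelatin
  normalizePunctuation?: boolean; // stands in for idnormalizepunctuation
}

// Assumption: tiny, deliberately incomplete replacement tables.
const GREEK = new Map<string, string>([
  ["α", "alpha"], ["β", "beta"], ["γ", "gamma"], ["δ", "delta"],
]);
const PUNCTUATION = new Map<string, string>([["+", "plus"]]);

// Replace every character found in `table` by its word, wrapped in `pad`.
function mapChars(s: string, table: Map<string, string>, pad: string): string {
  let out = "";
  for (const c of s) {
    const word = table.get(c);
    out += word === undefined ? c : pad + word + pad;
  }
  return out;
}

// Assumption: Latin normalization approximated as NFD decomposition,
// removal of combining diacritics (é -> e), then NFC recomposition so that
// scripts such as Hangul are not left decomposed, plus the Greek table.
function latinNormalize(s: string): string {
  const stripped = s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .normalize("NFC");
  return mapChars(stripped, GREEK, "");
}

function autoId(titleHtml: string, opts: AutoIdOptions = {}): string {
  const { normalizeLatin = true, normalizePunctuation = true } = opts;
  // Assumption: "id output format conversion" approximated by stripping
  // HTML tags from already-rendered HTML (entities are not decoded here).
  let id = titleHtml.replace(/<[^>]*>/g, "");
  // JavaScript case conversion: also lowercases non-ASCII, e.g. É -> é.
  id = id.toLowerCase();
  if (normalizeLatin) id = latinNormalize(id);
  // Padding with hyphens is what turns "c++" into "c-plus-plus".
  if (normalizePunctuation) id = mapChars(id, PUNCTUATION, "-");
  // Collapse each run of ASCII characters outside a-z0-9 into one hyphen;
  // non-ASCII characters such as 你 are left untouched.
  id = id.replace(/[\u0000-\u002f\u003a-\u0060\u007b-\u007f]+/g, "-");
  // Strip leading and trailing hyphens.
  return id.replace(/^-+|-+$/g, "");
}
```

With these assumptions, autoId("C++ is great") returns "c-plus-plus-is-great", autoId("I <i>love</i> dogs.") returns "i-love-dogs", and autoId("É你", { normalizeLatin: false }) returns "é你", matching the corresponding rows of Table 2.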
For the toplevel header, i.e. the first header of a file, the ID is instead derived from the filename.
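Taking the basename of the file without its extension can be sketched as follows. The function name toplevelIdFromPath is hypothetical, / is assumed to be the path separator, and since it is not stated whether the basename then goes through any further transformation, the sketch returns it unchanged.

```typescript
// Hypothetical sketch: basename without extension, assuming "/" separators.
function toplevelIdFromPath(filePath: string): string {
  // Everything after the last "/", e.g. "docs/my-file.bigb" -> "my-file.bigb".
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  // A leading dot (".hidden") is not treated as an extension separator.
  return dot > 0 ? base.slice(0, dot) : base;
}
```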
For example, the following automatic IDs would be generated, as listed in Table 2 "Examples of automatically generated IDs":
Table 2. Examples of automatically generated IDs

| title | id | latin normalization | punctuation normalization | comments |
|---|---|---|---|---|
| My favorite title | my-favorite-title | | | |
| Ciro's markdown is awesome | ciro-s-markdown-is-awesome | | | ' is an ASCII character, but it is not in a-z0-9, therefore it gets converted to a hyphen - |
| É你 | e你 | true | | The Latin acute accented E, É, is converted to its lowercase form é as per the JavaScript case conversion, and then to e by Latin normalization. The Chinese character 你 is left untouched, as Chinese characters have no case and no ASCII analogue. |
| É你 | é你 | false | | Same as the previous row, but é is not converted to e since Latin normalization is turned off. |
| C++ is great | c-plus-plus-is-great | | true | This is the effect of Punctuation normalization. |
| I *love* dogs. | i-love-dogs | | | love is extracted from the italic tags `<i>love</i>` with id output format conversion. |
| β Centauri | beta-centauri | | | Our Latin normalization is amazing and knows Greek! |
For the toplevel header, its ID is derived from the basename of the OurBigBook file without extension instead of from the title argument.

TODO:
- maybe we should also remove some or all non-ASCII punctuation. All of it could be matched with `\p{IsPunctuation}`, see stackoverflow.com/questions/13925454/check-if-string-is-a-punctuation-character, but we need to check that we really want to remove all of them.
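One possible direction for this TODO, sketched in TypeScript and not current behavior: `\p{IsPunctuation}` is Java regex syntax, while JavaScript spells the same Unicode category `\p{P}` (long form `\p{Punctuation}`) and requires the u flag. The function name hyphenateNonAsciiPunctuation is hypothetical, and replacing each run with a hyphen rather than deleting it is an assumed design choice.

```typescript
// Hypothetical sketch for the TODO above; this is NOT current behavior.
// The class [^\P{P}\u0000-\u007f] means "neither non-punctuation nor
// ASCII", i.e. non-ASCII punctuation only, so ASCII punctuation keeps
// going through the existing hyphen-collapsing step.
const NON_ASCII_PUNCTUATION = /[^\P{P}\u0000-\u007f]+/gu;

// Design choice (assumption): replace each run with a hyphen instead of
// deleting it, so that words on either side are not glued together; the
// existing collapse and strip steps then clean up the hyphens.
function hyphenateNonAsciiPunctuation(s: string): string {
  return s.replace(NON_ASCII_PUNCTUATION, "-");
}
```

For example, hyphenateNonAsciiPunctuation("你好，世界。") returns "你好-世界-", which the existing collapse and strip steps would turn into 你好-世界, while ASCII punctuation such as the . in "a.b" is left for those steps to handle.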