If a non-toplevel macro has the `title` argument present but no explicit `id` argument is given, an Element ID is created automatically from the `title`, by applying the following transformations:
- do an `id` output format conversion on the title, to remove for example any HTML tags that would be present in the conversion output
- convert all characters to lowercase. This uses JavaScript case conversion. Note that this does convert non-ASCII characters to lowercase, e.g. `É` to `é`.
- if `id` `normalize` `latin` is `true` (the default) do Latin normalization. This converts e.g. `é` to `e`.
- if `id` `normalize` `punctuation` is `true` (the default) do Punctuation normalization. This converts e.g. `+` to `plus`.
- convert consecutive sequences of all non `a-z0-9` ASCII characters to a single hyphen `-`. Note that this leaves non-ASCII characters untouched.
- strip leading or trailing hyphens

Note how those rules leave non-ASCII Unicode characters untouched, except for:
- capitalization changes where applicable, e.g. `É` to `é`

as capitalization and determining if something "is a letter or not" in those cases can be tricky.
For toplevel headers, see: the ID of the first header is derived from the filename.
So for example, the following automatic IDs would be generated:

Table 2. "Examples of automatically generated IDs".

| title | id | latin normalization | punctuation normalization | comments |
|---|---|---|---|---|
| My favorite title | `my-favorite-title` | | | |
| Ciro's markdown is awesome | `ciro-s-markdown-is-awesome` | | | `'` is an ASCII character, but it is not in `a-z0-9`, therefore it gets converted to a hyphen `-` |
| É你 | `e你` | true | | The Latin acute accented e, `É`, is converted to its lower case form `é` as per the JavaScript case conversion, and then to `e` by Latin normalization. The Chinese character `你` is left untouched as Chinese characters have no case, and no ASCII analogue. |
| É你 | `é你` | false | | Same as the previous, but `é` is not converted to `e` since Latin normalization is turned off. |
| C++ is great | `c-plus-plus-is-great` | | true | This is the effect of Punctuation normalization. |
| I <i>love</i> dogs. | `i-love-dogs` | | | `love` is extracted from the italic tags `<i>love</i>` with `id` output format conversion. |
| β Centauri | `beta-centauri` | | | Our Latin normalization is amazing and knows Greek! |
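
The transformation steps can be sketched in a few lines of JavaScript. This is only an illustration and not OurBigBook's actual implementation: the function names, the `normalizeLatin` / `normalizePunctuation` options, the two tiny lookup tables and the regex-based tag stripping are all assumptions made for the example, and the real normalizations cover far more characters.

```javascript
// Hypothetical sketch of the title-to-ID steps described above.
// This is not OurBigBook's actual code; all names here are made up.

// Assumption: tiny stand-in tables, just enough for the examples.
const PUNCTUATION_WORDS = new Map([['+', 'plus']]);
const EXTRA_LATIN = new Map([['β', 'beta']]);

// Stand-in for the "id output format conversion": drop HTML-like tags.
function stripTags(s) {
  return s.replace(/<[^>]*>/g, '');
}

// Latin normalization: decompose accented letters (NFD), drop the
// combining marks, recompose (NFC), then apply the small extra table.
function latinNormalize(s) {
  const noMarks = s
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .normalize('NFC');
  return Array.from(noMarks, (c) => EXTRA_LATIN.get(c) ?? c).join('');
}

// Punctuation normalization: spell out known symbols, padded with spaces
// so that they end up as separate hyphen-delimited words.
function punctuationNormalize(s) {
  return Array.from(s, (c) =>
    PUNCTUATION_WORDS.has(c) ? ` ${PUNCTUATION_WORDS.get(c)} ` : c
  ).join('');
}

function titleToId(title, { normalizeLatin = true, normalizePunctuation = true } = {}) {
  let id = stripTags(title);
  id = id.toLowerCase();
  if (normalizeLatin) id = latinNormalize(id);
  if (normalizePunctuation) id = punctuationNormalize(id);
  // Runs of ASCII characters outside a-z0-9 collapse to one hyphen;
  // non-ASCII characters are not in this class, so they pass through.
  id = id.replace(/[\x00-\x2f\x3a-\x60\x7b-\x7f]+/g, '-');
  // Strip leading and trailing hyphens.
  return id.replace(/^-+|-+$/g, '');
}

console.log(titleToId('C++ is great'));
console.log(titleToId('É你', { normalizeLatin: false }));
```

With these stand-in tables, `titleToId('C++ is great')` gives `c-plus-plus-is-great`, and turning `normalizeLatin` off keeps the `é` in `é你`.
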
For the toplevel header, its ID is derived from the basename of the OurBigBook file without extension instead of from the `title` argument.

TODO:
- maybe we should also remove some or all non-ASCII punctuation. All can be done with `\\p{IsPunctuation}`: stackoverflow.com/questions/13925454/check-if-string-is-a-punctuation-character but we need to check that we really want to remove all of them.
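
As a rough idea of what the TODO could look like in JavaScript, here is a minimal sketch. It assumes the Unicode property escape `\p{P}` (general category "Punctuation", available with the `u` flag) rather than the `\\p{IsPunctuation}` spelling mentioned above, and the function name is hypothetical. Nothing here is part of OurBigBook.

```javascript
// Hypothetical sketch: collapse runs of Unicode punctuation into a hyphen.
// \p{P} matches the Unicode general category "Punctuation", ASCII or not.
const UNICODE_PUNCTUATION = /\p{P}+/gu;

function replaceUnicodePunctuation(s) {
  return s.replace(UNICODE_PUNCTUATION, '-');
}

// The fullwidth comma (U+FF0C) is punctuation, so it becomes a hyphen.
console.log(replaceUnicodePunctuation('你好，世界'));
// '+' is a math symbol, not punctuation, so it is left alone here.
console.log(replaceUnicodePunctuation('a+b'));
```

One thing this makes visible is that "punctuation" in the Unicode sense does not include symbols such as `+` or `$`, which is part of why removing "all of them" needs checking first.
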