Automatic ID from title

If a non-toplevel macro has the title argument present but no explicit id argument is given, an Element ID is created automatically from the title by applying the following transformations:

do an id output format conversion on the title to remove, for example, any HTML tags that would be present in the conversion output
convert all characters to lowercase. This uses JavaScript case conversion. Note that this does convert non-ASCII characters to lowercase, e.g. É to é.
if id normalize latin is true (the default) do Latin normalization. This converts e.g. é to e.
if id normalize punctuation is true (the default) do Punctuation normalization. This converts e.g. + to plus.
convert each run of consecutive ASCII characters outside a-z0-9 to a single hyphen -. Note that this leaves non-ASCII characters untouched.
strip leading or trailing hyphens

Note how those rules leave non-ASCII Unicode characters untouched, except for:

capitalization changes where applicable, e.g. É to é

as capitalization and determining if something "is a letter or not" in those cases can be tricky.
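
The transformation steps listed above can be sketched roughly as follows. This is a minimal illustration and not the actual OurBigBook implementation: the function name `titleToId`, the option names `normalizeLatin` and `normalizePunctuation`, the two tiny lookup tables, and the regex-based tag stripping are all assumptions made for the example.

```javascript
// Hypothetical sketch of the title-to-ID steps; not the actual OurBigBook source.

// Illustrative stand-ins: the real normalization tables are presumably much larger.
const GREEK_TO_LATIN = { '\u03b2': 'beta' }; // β
const PUNCTUATION_TO_WORD = { '+': 'plus' };

function titleToId(title, { normalizeLatin = true, normalizePunctuation = true } = {}) {
  // 1. Crude stand-in for the id output format conversion: drop HTML-like tags.
  let id = title.replace(/<[^>]*>/g, '');
  // 2. JavaScript case conversion, which also lowercases non-ASCII letters (É to é).
  id = id.toLowerCase();
  // 3. Latin normalization: decompose, drop combining accents, recompose.
  if (normalizeLatin) {
    id = id.normalize('NFD').replace(/[\u0300-\u036f]/g, '').normalize('NFC');
    for (const [from, to] of Object.entries(GREEK_TO_LATIN)) {
      id = id.split(from).join(to);
    }
  }
  // 4. Punctuation normalization: spell the symbol out, padded so it becomes its own word.
  if (normalizePunctuation) {
    for (const [from, to] of Object.entries(PUNCTUATION_TO_WORD)) {
      id = id.split(from).join(` ${to} `);
    }
  }
  // 5. Collapse each run of ASCII characters outside a-z0-9 into one hyphen.
  id = id.replace(/[\x00-\x2f\x3a-\x60\x7b-\x7f]+/g, '-');
  // 6. Strip leading and trailing hyphens.
  return id.replace(/^-+|-+$/g, '');
}

console.log(titleToId('My favorite title'));   // my-favorite-title
console.log(titleToId('C++ is great'));        // c-plus-plus-is-great
console.log(titleToId('I <i>love</i> dogs.')); // i-love-dogs
```

Padding the replacement word with spaces in step 4 lets step 5 turn `c plus  plus  is great` into `c-plus-plus-is-great` without any special casing.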

For toplevel headers, see: the ID of the first header is derived from the filename.

So for example, the following automatic IDs would be generated: Table 2. "Examples of automatically generated IDs".

Table 2.

Examples of automatically generated IDs

title	id	latin normalization	punctuation normalization	comments
My favorite title	my-favorite-title
Ciro's markdown is awesome	ciro-s-markdown-is-awesome			`'` is an ASCII character, but it is not in `a-z0-9`, therefore it gets converted to a hyphen `-`
É你	e你	true		The Latin acute accented `e`, `É`, is converted to its lower case form `é` as per the JavaScript case conversion. Then, due to Latin normalization, `é` is converted to `e`. The Chinese character `你` is left untouched as Chinese characters have no case, and no ASCII analogue.
É你	é你	false		Same as the previous, but `é` is not converted to `e` since Latin normalization is turned off.
C++ is great	c-plus-plus-is-great		true	This is the effect of Punctuation normalization.
I love dogs.	i-love-dogs			`love` is extracted from the italic tags `<i>love</i>` with `id` output format conversion.
β Centauri	beta-centauri			Our Latin normalization is amazing and knows Greek!

For the toplevel header, its ID is derived from the basename of the OurBigBook file without extension instead of from the title argument.
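
As a rough illustration of "basename without extension", the sketch below uses Node.js `path.parse`. The helper name `toplevelIdFromPath` and the file name `my-topic.bigb` are made up for the example, and any special cases the real implementation may handle are not covered.

```javascript
const path = require('node:path');

// Hypothetical helper: the file's basename without its extension.
function toplevelIdFromPath(filePath) {
  return path.parse(filePath).name;
}

console.log(toplevelIdFromPath('my-topic.bigb')); // my-topic
```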

TODO:

Maybe we should also remove some or all non-ASCII punctuation. All of it can be matched with \\p{IsPunctuation}: stackoverflow.com/questions/13925454/check-if-string-is-a-punctuation-character but we need to check that we really want to remove all of it.
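
To get a feel for what such a rule would remove, here is a small sketch. It is not current behavior, and the function name `stripNonAsciiPunctuation` is made up for the example. It assumes the JavaScript spelling `\p{P}` (the Unicode Punctuation general category, available with the `u` flag) in place of the Java-style `\p{IsPunctuation}` from the linked question.

```javascript
// Hypothetical extra step, not current behavior: drop non-ASCII punctuation.
// The lookahead skips ASCII characters so that only non-ASCII punctuation is removed.
function stripNonAsciiPunctuation(text) {
  return text.replace(/(?![\x00-\x7f])\p{P}/gu, '');
}

console.log(stripNonAsciiPunctuation('¿qué? 你好。')); // qué? 你好
```

Note that `\p{P}` does not cover symbols such as `+`, which belong to the Symbol categories, so whether those should also go would be a separate decision.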
