OurBigBook
ASCII normalization is a custom, OurBigBook-defined normalization that converts many characters that look like Latin characters into actual ASCII Latin characters.
For now, we are using the deburr method of Lodash: lodash.com/docs/4.17.15#deburr, which only affects Latin-like characters.
In addition to deburr we also convert:
  • en-dash and em-dash to the simple ASCII dash -. Wikipedia loves en-dashes in its article titles!
  • Greek letters to their standard Latin names, e.g. α to alpha
One notable effect is that variants of ASCII letters are converted to the plain ASCII letters, e.g. é becomes e, with the accent removed.
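The conversions described above can be sketched roughly as follows. This is a hypothetical illustration, not the OurBigBook source: the function name normalizeLatinSketch and the CUSTOM table are made up for this example, the Greek table is deliberately partial, and the built-in String.prototype.normalize plus a combining-mark regex stands in for Lodash's deburr so that the snippet runs without dependencies.

```javascript
// Hypothetical sketch: NOT the OurBigBook source and NOT lodash itself.
// Custom replacements, applied one character at a time.
const CUSTOM = new Map([
  ['\u2013', '-'], // en-dash to ASCII dash
  ['\u2014', '-'], // em-dash to ASCII dash
  // Letters that have no Unicode decomposition (deburr handles these itself).
  ['æ', 'ae'], ['Æ', 'Ae'], ['œ', 'oe'], ['Œ', 'Oe'], ['ß', 'ss'],
  // Partial Greek table, for illustration only.
  ['α', 'alpha'], ['β', 'beta'], ['γ', 'gamma'],
]);

function normalizeLatinSketch(s) {
  // Stand-in for deburr: decompose, then drop combining diacritical marks.
  const withoutMarks = s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  // Map each remaining code point through the custom table.
  return Array.from(withoutMarks, (c) => CUSTOM.get(c) ?? c).join('');
}

console.log(normalizeLatinSketch('Schr\u00f6dinger')); // Schrodinger
console.log(normalizeLatinSketch('1914\u20131918'));   // 1914-1918
console.log(normalizeLatinSketch('α particle'));       // alpha particle
```

Unlike deburr, the NFD-based stand-in is not limited to Latin-like characters, so it may also strip marks from letters in other scripts. A real implementation would call deburr directly and keep only the custom table.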
This operation is roughly a superset of what Unicode normalization can achieve on Latin-like characters: Unicode normalization basically only separates things like diacritics from their base letter, so that they can then be removed.
OurBigBook normalization, on the other hand, also does other natural transformations that Unicode does not do, e.g. æ to ae, as encoded by deburr and by the further custom replacements.
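The difference from plain Unicode normalization can be checked with the built-in String.prototype.normalize. This is a minimal sketch; the variable names are only illustrative.

```javascript
// NFD splits é into the base letter e plus U+0301 (combining acute accent).
const decomposed = '\u00e9'.normalize('NFD');
console.log(decomposed.length); // 2

// Dropping the combining mark afterwards leaves the plain ASCII letter.
const stripped = decomposed.replace(/[\u0300-\u036f]/g, '');
console.log(stripped); // e

// æ has no Unicode decomposition, so every normalization form leaves it as is.
const aeForms = ['NFC', 'NFD', 'NFKC', 'NFKD'].map((form) => 'æ'.normalize(form));
console.log(aeForms.every((s) => s === 'æ')); // true
```

This is why a replacement such as æ to ae has to come from deburr or from a custom table rather than from Unicode normalization alone.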
TODO lodash.deburr:
Bibliography:
