One day we'll enable it, e.g. with Cloudflare:
During --web upload.
The exact name of that article is a somewhat difficult question. If we go non-magical, then:
My deleted articles
will do. But we could go more magical:
or, polluting the namespace a bit, without the _:
Perhaps such a namespace would be useful for stuff like my/contact, my/body, my/hardware and so on.
This would have the same effect as whatever we decide to make Section 1.2. "Move articles deleted locally to under a trash article on web" do.
We are better than HTML: we have arguments! This is just a style matter; HTML was wrong to add it to the content model.
= path/to/myfile.txt
would generate a breadcrumb like:
where {split} is a possibly new argument that ensures it links to the split page if there are split pages, rather than to the current page:
This would make file autogen much more useful and visible. The general premise is that we should always preferentially link to the split {file} page.
Pre-requisite: Allow linking to auto-generated files
Like big files, only show on split pages.
Once this is done, we can entirely replace the custom directory listing generated by the ourbigbook executable with it, which will then go through the exact same code path as {file} generation.
There is an outstanding nested set index corruption whose cause hasn't been identified yet. Running on Heroku:
blew up with:
        throw new ValidationError(`the parent choice "${newParentId}" would create an infinite loop`)

    at /app/web/convert.js:459:15
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /app/web/node_modules/sequelize/dist/lib/sequelize.js:463:24
    at async Object.convertArticle (/app/web/convert.js:176:3)
    at async /app/web/models/article.js:844:9
    at async /app/web/node_modules/sequelize/dist/lib/sequelize.js:463:24
    at async Article.rerender (/app/web/models/article.js:842:5)
    at async Article.rerender (/app/web/models/article.js:1615:9)
    at async /app/web/bin/rerender-articles.js:19:1 {
  info: undefined,
  errors: 'the parent choice "@cirosantilli/conceptual-model" would create an infinite loop',
  status: 422
and the DB check:
heroku run web/bin/normalize -c nested-set -u cirosantilli
failed with:
AssertionError [ERR_ASSERTION]: nested-set: (slug, nestedSetIndex, nestedSetNextSibling, depth): actual: (cirosantilli/natural-science, 419, 3414, 2) !== expected: (@cirosantilli/natural-science, 419, 3411, 2)
    at Object.normalize (/app/web/models/index.js:400:20)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /app/web/bin/normalize:28:3 {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: 3414,
  expected: 3411,
  operator: 'strictEqual'
The local source corresponding to that was:
= Conceptual model
{parent=Scientific method}

= Model

= Simulation
{parent=Conceptual model}
and according to git log it hadn't changed in a long time. Also:
= Natural science


= Thermo Electron
{parent=Thermo Fisher Scientific}

= Natural science YouTube channel
{parent=Natural science}

= The Thought Emporium
{parent=Natural science YouTube channel}



= Scientific method
are three consecutive siblings.
Some related database lines via:
bin/psql -A -F' ' <<EOF >db.tmp
select "nestedSetIndex","nestedSetNextSibling",slug from "Article" where slug like 'cirosantilli/%' order by "nestedSetIndex";
EOF
  419 | 3414 | cirosantilli/natural-science

 3411 | 3412 | cirosantilli/thermo-electron
 3412 | 3414 | cirosantilli/natural-science-youtube-channel
 3413 | 3414 | cirosantilli/the-thought-emporium
 3414 | 3553 | cirosantilli/linguistics

 3551 | 3553 | cirosantilli/chinese-slang
 3552 | 3553 | cirosantilli/shabi
 3553 | 3864 | cirosantilli/scientific-method
Hmm, that index looks correct. What's going on?
So I hack in some debug output:

@@ -392,6 +392,7 @@ async function normalize({
         const articles = await Article.treeFindInOrder({ username, transaction })
         if (check) {
           const nestedSetsFromRefs = await Article.getNestedSetsFromRefs(username, { transaction })
+          nestedSetsFromRefs.map(e => console.error(`${e.nestedSetIndex} ${e.nestedSetNextSibling} ${e.id}`))
and see:
3550 3864 @cirosantilli/scientific-method
There's an offset of 3 somewhere!
OK, the first glaring error in the DB is social-science right in the middle of physics articles:
1497 1498 cirosantilli/physx
1500 1801 cirosantilli/social-science
1501 1503 cirosantilli/3d-ridig-body-dynamics-benchmark
1502 1503 cirosantilli/simbenchmark
Also, ourbigbook.com/cirosantilli/social-science gave a 500 error.
Possibly related:
1501 1503 cirosantilli/3d-ridig-body-dynamics-benchmark
was a recent change, and part of this complex source code move that can be simplified to:
--- a/science.bigb
+++ b/science.bigb
@@ -345,40 +345,7 @@ https://www.youtube.com/watch?v=H_H_TF5Kxks This Lab is RIDICULOUS (2021) gives
-= 3D physics engine
-{parent=Physics engine}
-= 3D physics engine benchmark
-{parent=3D physics engine}
@@ -512,3 +512,117 @@ This idealization does not seems to be possible at all in the context of <Maxwel
 = Rigid body
 {parent=Point particle}
+= Rigid body dynamics
+{parent=Rigid body}
+= 3D rigid body dynamics
+{parent=Rigid body dynamics}
+= 3D rigid body dynamics simulator
+{parent=3D rigid body dynamics}
+= 3D physics engine
+= PhysX
+{parent=3D rigid body dynamics simulator}
+{tag=C++ library}
+= 3D ridig body dynamics benchmark
+{parent=3D rigid body dynamics}
+= 3D physics engine benchmark
+= SimBenchmark
+{parent=3D ridig body dynamics benchmark}
so it contains two simultaneous renames, before:
= 3D physics engine
  = 3D physics engine benchmark
= 3D rigid body dynamics
  = 3D rigid body dynamics simulator (3D physics engine)
    = PhysX
  = 3D ridig body dynamics benchmark (3D physics engine benchmark)
    = SimBenchmark
Gotta try to make a minimal test reproduction for this mess.
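For such a minimal reproduction, it may help to recompute the expected values from the tree itself and diff them against the DB. The dumps above are consistent with a preorder numbering in which nestedSetNextSibling is the index right after a node's last descendant (e.g. the leaf cirosantilli/thermo-electron has 3411 and 3412). A minimal sketch under that assumption, with hypothetical TreeNode and NestedSetRow types rather than the actual Article model, children assumed to be already in sibling order, and the root at depth 0:

```typescript
// Hypothetical in-memory node: children must already be in sibling order.
interface TreeNode {
  slug: string
  children: TreeNode[]
}

interface NestedSetRow {
  slug: string
  nestedSetIndex: number
  nestedSetNextSibling: number
  depth: number
}

// Preorder traversal: a node gets its index before its descendants, and its
// nestedSetNextSibling is the first index after its whole subtree.
function computeNestedSet(root: TreeNode, startIndex = 0): NestedSetRow[] {
  const rows: NestedSetRow[] = []
  let next = startIndex
  function visit(node: TreeNode, depth: number): void {
    const row: NestedSetRow = { slug: node.slug, nestedSetIndex: next++, nestedSetNextSibling: -1, depth }
    rows.push(row)
    for (const child of node.children) visit(child, depth + 1)
    row.nestedSetNextSibling = next
  }
  visit(root, 0)
  return rows
}

// Returns the first (actual, expected) pair that differs, or undefined if all match.
function firstMismatch(
  actual: NestedSetRow[],
  expected: NestedSetRow[],
): [NestedSetRow | undefined, NestedSetRow] | undefined {
  for (let i = 0; i < expected.length; i++) {
    const a = actual[i]
    const e = expected[i]
    if (!a || a.slug !== e.slug || a.nestedSetIndex !== e.nestedSetIndex ||
        a.nestedSetNextSibling !== e.nestedSetNextSibling || a.depth !== e.depth) {
      return [a, e]
    }
  }
  return undefined
}
```

The first mismatch is then the place to start looking, much like the (slug, nestedSetIndex, nestedSetNextSibling, depth) assertion of the normalize check.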
cirosantilli.com/brazilian-music-split doesn't have a nosplit link.
On ourbigbook.com, like and unlike take 4s-10s! Something is wrong. Could it be because of complex side effects such as topic updating? Maybe those should be deferred? This only seems to be noticeable on larger databases.
Both web and CLI.
Both CLI and web. E.g.:
= Index

= Notindex

= Notindex2
ourbigbook .
the output notindex.html does not have an incoming links metadata section, but with <notindex> it does. The resulting metadata section should be identical in both cases.
Same for tags that use the synonym.
Should only blow up if the \x does not have its content explicitly set. See the broken test "cross reference from image title to previous non-header with content is allowed".
To fix this, we need to store some extra data on the Ref or Id table indicating whether the reference needs the title to determine its own ID.
Maybe there is a valid use case for rows with different numbers of columns, but by default we should likely error unless the user explicitly allows this.
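A minimal sketch of that default check, assuming a table is available as a list of rows of cells. The allowRaggedRows option is a hypothetical name for the explicit opt-in, not an existing argument, and colspan-like constructs are ignored:

```typescript
// Hypothetical minimal table representation: one array of cell strings per row.
type Row = string[]

interface TableCheckOptions {
  // Hypothetical opt-in; not an existing OurBigBook argument.
  allowRaggedRows?: boolean
}

// Returns one error message per row whose cell count differs from the first row's.
function checkTableColumns(rows: Row[], opts: TableCheckOptions = {}): string[] {
  if (opts.allowRaggedRows || rows.length === 0) return []
  const expected = rows[0].length
  const errors: string[] = []
  rows.forEach((row, i) => {
    if (row.length !== expected) {
      errors.push(`row ${i + 1} has ${row.length} cells, expected ${expected}`)
    }
  })
  return errors
}
```

Using the first row as the reference keeps the error messages simple; a real implementation might prefer the most common row length instead.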
When hydrating JSON ASTs from the server, we just need to do that extra join.
Edit: done for CLI. On web, we are just showing IDs to the user to start with. I attempted to render them on the fly, but that is failing for now. That's the only missing thing.
Both web and CLI.
This is way more user friendly. We have already done that for parent=, which would be:
= Calculus
but is now just set on the Web UI.
But we should likely do this for every other metadata, e.g.:
  • synonym
  • id
  • scope
  • title2
  • c
and so on.
The underlying reason is that:
.getArticles({includeParentAndPreviousSibling: true
is broken. The singular version getArticle however is not.
We've started noticing this as we went along and became more familiar with proper database design:
  • Ref.from_id and to_id should point to Id
  • File should be removed when deleted: github.com/ourbigbook/ourbigbook/issues/216 Currently this can only happen locally. Edit: will also start happening on upstream with synonym moves.
  • toplevel_id
    • File.toplevel_id should point to an Id object via primary key. Currently done via idid text.
    • Id.toplevel_id should point to an Id object. No links at all apparently.
  • Article.topicId should point to Topic.id, not be TEXT
We could then consider removing several manual Ref.destroy and Id.destroy calls, and letting ON DELETE CASCADE on File and Id do that work instead.
Would be cool, easily allowing full website download for offline viewing! One day.
= Index

Then the first:
ourbigbook .
fails as desired before any rendering takes place:
extract_ids: README.bigb
extract_ids: README.bigb finished in 45.4546590000391 ms
README.bigb:3:2: cross reference to unknown id: "asdf"
Then the second:
ourbigbook .
fails differently, incorrectly trying to render but failing:
extract_ids: README.bigb
extract_ids: README.bigb skipped by timestamp
render: README.bigb
error: README.bigb:3:2: cross reference to unknown id: "asdf" at render time
copy README.bigb -> out/html/_raw/README.bigb
render: README.bigb -> out/html/index.html finished in 51.55012200027704 ms
with an error at render time.
This is especially noticeable and confusing when converting a large number of files: instead of failing early, the second run starts converting lots of files until it eventually hits the error while rendering the offending file.
The key code point is:
async function check_db(sequelize, paths_converted, { transaction }) {
We are only checking the DB for the converted paths, but due to parse skipping, those paths are skipped on the second run and don't get checked anymore.
Instead, we should check the entire database.
The question then is: is there a way to do this efficiently with a query, without bringing the entire Refs database into memory, notably considering inflections?
And also to full links, at least on ToC.
= my/file.txt

= Asdf
Same for tags.
Currently there is some confusion in the code on treating the <>{file} like the file in = Header{file}: one is about pointing to things, the other is about the current thing. We will disambiguate with parentFile.
Same for tag and tagFile.
On Firefox 109, tab lists such as those on the home page don't wrap if the screen is narrow.
This is due to:
.tab-item { white-space: pre; }
but this does not make much sense, as it should only take effect inside .tab-item, not on the .tab-list itself. It feels like a Firefox bug.
We want the white-space: pre so that tab entries won't be broken up across lines.
Works fine in Chromium 109.
TODO: can't reproduce on a minimal HTML page, so annoying!!!
<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>Min sane</title>
<style>
span {
  white-space: pre;
}
</style>
  <span>My item 1</span>
  <span>My item 2</span>
  <span>My item 3</span>
  <span>My item 4</span>
  <span>My item 5</span>
  <span>My item 6</span>
  <span>My item 7</span>
  <span>My item 8</span>
  <span>My item 9</span>
  <span>My item 10</span>
  <span>My item 11</span>
  <span>My item 12</span>
  <span>My item 13</span>
  <span>My item 14</span>
  <span>My item 15</span>
  <span>My item 16</span>
  <span>My item 17</span>
  <span>My item 18</span>
  <span>My item 19</span>
This is one of those things that require a smart algorithm, otherwise they quickly become useless.
Currently on web:
  • <@user-name> produces a working link, but with bad title "index"
  • <#some-topic> fails
What we want to work is either of:
  • @user-name
  • #some-topic
  • <#some topic>
Perfect topic rendering can be a bit tricky, because it might require fetching the actual topic from the DB to see its preferred title.
At mentions should ideally have the side effect of sending notifications, but then we have to think a bit about spam too.
The at mention and topic constructs from web should also just work locally, and redirect to ourbigbook.com by default.
Once they work, document them with something like:
= `\x` `href` argument
{parent=`\x` arguments}

If the `href` argument starts with certain prefixes, magic links are generated:
* `@`: link to <OurBigBook.com> user profiles, e.g.:
  I love \a[@cirosantilli], he is great!
  links to: https://ourbigbook.com/cirosantilli

  TODO make it work without the `\a`, just: `@cirosantilli`.
* `#`: link to <OurBigBook.com> <OurBigBook Web topics>[topics]:
  \a[#quantum-mechanics][Quantum mechanics] is very difficult to understand.
  links to: https://ourbigbook.com/go/topic/quantum-mechanics
It is not perfectly elegant to use <> for this, especially locally, since it means linking to IDs that don't exist (on web, @username is actually a regular ID in the DB, but #topic isn't). But perhaps having <> link to non-files is just the way to go.
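The prefix handling from the draft docs above could be sketched as a small href rewriting step. The URL shapes come from the examples there (https://ourbigbook.com/cirosantilli and https://ourbigbook.com/go/topic/quantum-mechanics); the function name and the whitespace-to-hyphen rule for forms like <#some topic> are assumptions, and the real topic ID conversion may differ:

```typescript
// Base URL taken from the examples above.
const OURBIGBOOK_BASE = "https://ourbigbook.com"

// Hypothetical helper: returns the magic URL for @user and #topic hrefs,
// or undefined if the href should be handled as a normal link.
function magicHref(href: string): string | undefined {
  if (href.startsWith("@")) {
    return `${OURBIGBOOK_BASE}/${encodeURIComponent(href.slice(1))}`
  }
  if (href.startsWith("#")) {
    // Assumption: "#Some topic" maps to topic ID "some-topic"; the real
    // conversion may treat punctuation and non-ASCII characters differently.
    const topicId = href.slice(1).trim().toLowerCase().replace(/\s+/g, "-")
    return `${OURBIGBOOK_BASE}/go/topic/${encodeURIComponent(topicId)}`
  }
  return undefined
}
```

Locally, the base URL would be the default redirect target mentioned above; on web it could instead resolve @user to the regular user ID.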
This is a bit hard to do properly, as it requires making sure that a billion dependent objects are also deleted:
  • issues
  • comments of those issues
  • file
  • IDs defined in that article
  • change the parentId of all children to the parent article of the deleted article, and also update the nested set index
Some of those can be handled by database cascades, but others will require explicit side effects.
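The tree part of that last step could look like the following minimal sketch, assuming an in-memory map of hypothetical ArticleNode records. Issues, comments, the File and the IDs would still need their own cascades or side effects, and nested set values must be recomputed separately:

```typescript
// Hypothetical minimal article record; the real Article model has many more fields.
interface ArticleNode {
  id: string
  parentId: string | null
  children: string[] // child IDs in sibling order
}

// Removes `id` and splices its children into its former position under its
// parent, so the children keep their relative order. Only parent pointers and
// sibling order are fixed here; nested set values are not touched.
function deleteAndReparent(articles: Map<string, ArticleNode>, id: string): void {
  const article = articles.get(id)
  if (!article) throw new Error(`unknown article: ${id}`)
  if (article.parentId === null) throw new Error("cannot delete the root article")
  const parent = articles.get(article.parentId)
  if (!parent) throw new Error(`unknown parent: ${article.parentId}`)
  const position = parent.children.indexOf(id)
  if (position === -1) throw new Error(`${id} is not listed as a child of ${parent.id}`)
  parent.children.splice(position, 1, ...article.children)
  for (const childId of article.children) {
    const child = articles.get(childId)
    if (child) child.parentId = parent.id
  }
  articles.delete(id)
}
```

Splicing the children in place, rather than appending them at the end, keeps the previousSiblingId relations of the surrounding articles meaningful.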
I.e. save the output of katex.renderToString to JSON or some other format. This approach would ensure minimal load times no matter what KaTeX is doing, and possibly provide some good portability.
Off the bat, JSON.stringify doesn't work due to circular references, though that can be overcome: stackoverflow.com/questions/10392293/stringify-convert-to-json-a-javascript-object-with-circular-reference
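One common way around the circular references is a JSON.stringify replacer that drops objects it has already seen. A minimal sketch; note that this is lossy (it also drops repeated but non-circular references), so whether the result would still be usable as a KaTeX cache is an open question:

```typescript
// Serializes `value`, omitting any object already visited during this call.
// Omitted object properties disappear; omitted array elements become null.
function stringifyDroppingCycles(value: unknown): string {
  const seen = new WeakSet<object>()
  return JSON.stringify(value, (_key: string, val: unknown) => {
    if (typeof val === "object" && val !== null) {
      if (seen.has(val)) return undefined
      seen.add(val)
    }
    return val
  })
}
```

A lossless alternative would be to assign IDs to objects and serialize references explicitly, at the cost of a custom deserializer.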
Web upload breaks with a duplicate ID if you rename a header and make the old name a synonym.
Working on static.
ourbigbook --web sometimes randomly times out on ourbigbook.com. First an ID extraction or render hangs, and then after a few seconds things blow up. It usually happens after a few thousand articles have been uploaded.
I've seen it happen once or twice locally as well.
There are no server exceptions in the Heroku logs. I simply can't understand why it happens.
Once the error was ETIMEDOUT, but most times it was ECONNRESET.
Next time it happens, I'm just going to add a timeout plus retry mechanism. The failure is rare enough that this shouldn't matter, and the problem does seem to go away if I continue the upload immediately afterwards: given the SHA-2-based skips, restarting from the CLI just picks up exactly where we left off, so hopefully a retry will also work from JavaScript.
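A minimal sketch of such a wrapper, assuming each upload step is an async function that can simply be called again (which the SHA-based skips should make cheap). RetryOptions and its fields are hypothetical; ETIMEDOUT and ECONNRESET are the codes seen above. A timed-out attempt is only abandoned, not cancelled, so the underlying request may still complete in the background:

```typescript
// Hypothetical options; the numbers would need tuning to the observed failures.
interface RetryOptions {
  retries: number    // extra attempts after the first one
  timeoutMs: number  // per-attempt timeout
  delayMs: number    // pause between attempts
  retryOn: string[]  // error codes considered transient
}

// Rejects with an ETIMEDOUT-coded error if `promise` takes longer than `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(Object.assign(new Error(`timed out after ${ms} ms`), { code: "ETIMEDOUT" })),
      ms,
    )
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Calls `attempt` until it succeeds, retrying only errors whose code is in `retryOn`.
async function withRetry<T>(attempt: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await withTimeout(attempt(), opts.timeoutMs)
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code
      const transient = typeof code === "string" && opts.retryOn.includes(code)
      if (!transient || i >= opts.retries) throw err
      await sleep(opts.delayMs)
    }
  }
}
```

Restricting retries to known transient codes avoids silently re-sending requests that failed for legitimate reasons such as validation errors.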
= Asdf

== Qwer
ourbigbook --split-headers README.bigb
leads to out/html/split.html that contains the Qwer header, and no qwer.html output.
This construct should just be forbidden by linting, instead forcing the preferred form:
= Asdf

== Qwer

Similar problem with a preceding paragraph:
= Asdf

== Qwer
The root failure case in both cases is that the header goes inside the paragraph.
Hmm, perhaps that is not a bad behaviour... OK so going back a bit further, the problem is the outcome of:
ourbigbook --web .
in such cases, which leads to errors.
This goes one step beyond "skip re-render from API if article was unchanged", as it removes the need to actually upload thousands of lines of content.
It requires negotiating with the server instead.
This would be particularly powerful if we included the descendants in the SHA of each parent, much like Git does. This way, we could skip entering unmodified subtrees entirely.
Yes, we are somewhat re-implementing parts of Git with this. But at least it is simple, and it works at a sub-blob level thanks to our greater specialization to our specific use case.
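A minimal sketch of Git-like subtree hashing, assuming a hypothetical in-memory SourceNode tree and SHA-256 from Node's crypto module. Each subtree hash covers the node's own source plus its children's subtree hashes, so client and server could compare top-down and never enter a subtree whose hash matches:

```typescript
import { createHash } from "node:crypto"

// Hypothetical tree node: `source` is the article's own source, without descendants.
interface SourceNode {
  id: string
  source: string
  children: SourceNode[]
}

// Fills `out` with id -> SHA-256 over the node's ID and source plus its children's
// subtree hashes in order, like a Git tree object. A change in any descendant
// changes the hash of every ancestor, but not of unrelated subtrees.
function subtreeHashes(node: SourceNode, out = new Map<string, string>()): Map<string, string> {
  const hash = createHash("sha256")
  hash.update(node.id).update("\0").update(node.source).update("\0")
  for (const child of node.children) {
    subtreeHashes(child, out)
    hash.update(out.get(child.id)!).update("\0")
  }
  out.set(node.id, hash.digest("hex"))
  return out
}

// Top-down comparison against hashes reported by the server: returns the IDs of
// nodes whose subtree changed, never descending into an unchanged subtree.
function changedSubtreeIds(node: SourceNode, local: Map<string, string>, server: Map<string, string>): string[] {
  if (local.get(node.id) === server.get(node.id)) return []
  return [node.id, ...node.children.flatMap(child => changedSubtreeIds(child, local, server))]
}
```

A node listed by changedSubtreeIds may itself be unchanged (only a descendant changed), so a per-article content hash like the existing one would still be needed to decide what to actually upload.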
OK, I need that, let's go. __ like Asciidoctor?

In quote

Another paragraph.

Maybe also just add inline (non-block) quotes now?
We could also consider an indent based method:
> In quote

  Another paragraph.

The cool thing about that is that it would save the sweet sweet one liners:

> In quote

but meh, too much indentation typing I think.
Prototype implemented on branch insane-quote with just the single underscore _ version, to make it fully symmetric with code/math, which is easier to implement. Just by running the tests, we saw some common conflicts with the single _ due to it appearing in local file paths and pieces of URLs, e.g.:
= My topic
= Notindex


== path/to/my_file.jpg
Some ideas:
  • generalize things a bit so that _ does not exist. Inline quotes just use the usual "" ASCII art
  • make some arguments literal by default to cover those common cases. This makes the language a bit more insane, but perhaps it is for the best: we don't want HTML expansion in anything that won't end up in the HTML. That makes the possible future case of defining variables a bit harder, but we could overcome it by making literals non-literal when literal is the default, e.g.:
    = My topic
    = My topic
Maybe we should only do this after: github.com/ourbigbook/ourbigbook/issues/248 to prevent data loss.
One possibility is to prevent deletion/renaming of headers. We could just check the new ID list against the previous ID list.
This was possible previously on web, but we forbade it for the sake of implementation simplicity.
We can then think about what the UI would look like: there might be an "Edit article and descendants" button, e.g. only on the toplevel.
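The ID list check could be as simple as a set difference, since a rename shows up as the old ID disappearing. A minimal sketch; the function names and the error message are hypothetical:

```typescript
// IDs present in the previous render but missing from the new one.
function removedIds(previous: Iterable<string>, next: Iterable<string>): string[] {
  const kept = new Set(next)
  return Array.from(previous).filter(id => !kept.has(id))
}

// Rejects an edit that would delete or rename any existing header.
function assertNoHeaderRemoved(previous: Iterable<string>, next: Iterable<string>): void {
  const removed = removedIds(previous, next)
  if (removed.length > 0) {
    throw new Error(`editing would delete or rename these headers: ${removed.join(", ")}`)
  }
}
```

Adding new headers passes the check, which matches the goal of only preventing data loss.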
Otherwise the difference in ToC line entry spacing is very unnerving.
Self headers done. ToC missing.
The "article create and update slow on web" issue was an extreme case of slowness, but it taught us that we do want some kind of immediate feedback as soon as users submit a form, and that this feedback should block further actions such as typing.
E.g. ourbigbook.com/cirosantilli/mathematics#cirosantilli/physics should redirect to ourbigbook.com/cirosantilli/physics
This already works on the static website: cirosantilli.com/mathematics#physics does redirect to cirosantilli.com/physics
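The rule behind those redirects could be sketched as: if the fragment is not an ID on the current page, treat it as a path. This is an assumption about the intended behaviour, not a description of how the static website actually implements it; idsOnPage is a hypothetical set of element IDs rendered on the page:

```typescript
// Returns the path to redirect to, or undefined if the fragment should just scroll.
function fragmentRedirectTarget(hash: string, idsOnPage: ReadonlySet<string>): string | undefined {
  const id = decodeURIComponent(hash.replace(/^#/, ""))
  if (id === "" || idsOnPage.has(id)) return undefined
  return `/${id}`
}
```

In the browser, this would be called with window.location.hash, followed by window.location.replace(target) whenever a target is returned.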
Current failure behaviour if the user submits anyway: the API error previousSiblingId "@cirosantilli/physics" does not exist, is not a header or is not a child of parentId "@cirosantilli/test-article" is shown under the title, and it only goes away if you edit the title, which is confusing as it is not title related. Also, while the title error is visible, the submit button is inactive, so the user is left a bit stuck.
The move to the dynamic tree slowed things down a lot for large pages such as ourbigbook.com/cirosantilli, making them unacceptably slow, and the rendering work actually blocks any other page loads while the server is busy.
These measurements were taken with cirosantilli.github.io at aa60ccb934bf9646d548e6b761489d31aec1a341, which has almost 7k articles.
Some benchmarks on Chromium:
  • ping cirosantilli.com: 17 ms
  • cirosantilli.com GET /: 1.3s. Waiting for server: ping time only, the rest is content download. content-length from response: 300 kB zipped.
  • ourbigbook.com/cirosantilli GET /:
    • Waiting for server response: 3.5s to 4s. That's our problem!
    • Content download: 2.5s
  • localhost:3000/cirosantilli npm run dev GET /:
    • Waiting for server response: between 2s and 3s. So we reproduce relatively well locally.
      curl time_starttransfer after a few stabilizing runs: 2.6s
    • Content download: 1.6s
If we comment out the single line in Article.tsx:
//html += renderTocFromEntryList({ entry_list })
TTFB falls from 2.6s to 0.77s.
Removing the renderRefCallback drops it to between 2.2s and 2.4s.
Limiting the ToC to 1k articles on the server side leads to 0.5s. Maybe that's the first workaround we have to do until something else is understood. It is a shame that we have to go so much lower than the static website does.
Maybe we can use some of the techniques from: reactjs.org/docs/optimizing-performance.html#virtualize-long-lists to improve things.
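The core of the list virtualization technique described there is computing which rows intersect the viewport, and rendering only those inside a spacer of the full list height. A minimal sketch, assuming every ToC line has the same fixed height, which may not hold for real ToC entries:

```typescript
interface WindowRange {
  start: number // first row to render
  end: number   // one past the last row to render
}

// Rows intersecting [scrollTop, scrollTop + viewportHeight), plus `overscan`
// extra rows on each side so that fast scrolling does not show blank space.
function visibleRange(scrollTop: number, viewportHeight: number, rowHeight: number,
                      rowCount: number, overscan = 5): WindowRange {
  const first = Math.floor(scrollTop / rowHeight)
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight)
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, last + overscan),
  }
}
```

The rendered slice would then be positioned at start * rowHeight inside a container of height rowCount * rowHeight, so the scrollbar still reflects the full ToC.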
This is closely related to: Reach the same performance as static website with dynamic tree. Performance considerations should guide if we actually want this or not.
No more need for:
for (const h of elem.querySelectorAll('.h')) {
in Article.tsx: now that we have separate headers, we can just inject them one by one.
We should be able to write:
= Animal

= Dog
since the dog.bigb file should ideally be fully equivalent to
= Dog
Edit: a use case has come up for this: if we can find an existing article that the user is trying to update, we might be able to determine that it does not need to be converted in the first place: skip re-render from API if article was unchanged. But then of course we can't render the article to find its ID, as the whole point is to skip that render in the first place.
We likely want to get rid of the path parameter, and instead determine IDs fully from more "in-band" things like {id} and {scope}.
Both {scope} for subdirs and {id} for a custom ID basename different from the title should already be working; I think we just haven't set up the ourbigbook CLI to inject {id} based on the file path.
{scope} is however not really usable in general on the same source tree of cirosantilli.github.io due to github.com/ourbigbook/ourbigbook/issues/284.
This would forbid some constructs that are currently possible locally, e.g. scopes that are not children such as:
= Parent

== Child

=== Child 2
= Parent

= Subdir
but that is fine: it is saner to enforce that scopes match the article tree hierarchy.
The links don't show without JavaScript, as can be seen by disabling JS.
The counts can be dynamically loaded, but the links we really want to generate at compile time... any way to do that?
We likely just have to set the path: API argument based on whether the parent article has a scope.
As of the commit that adds this line, it should likely be possible to do it on the backend. On the frontend however we convert / to - so it doesn't work on the existence checks. We need a more accurate ID conversion there.
Currently these do not affect the article's ID due to a fundamental limitation of the convert function, which determines the "file path" based solely on the title content, and ignores header arguments such as disambiguate. The ID of the first header, the toplevel ID, then just gets fixed to that (as is meant to happen: the first toplevel header gets its ID fixed), ignoring disambiguate.
This is worked around on web upload by explicitly passing the ID as the path argument. But this argument is not available on the web UI, and it would be ugly anyway: what we want is for it to "just work" by default, without the user having to explicitly set an ID on the web UI.
For now I made them almost fully correct AFAIS:
  • no ID conflicts that would show on the same page, e.g. across issue IDs and comment IDs
  • links seem to go to where we want them to
The only known bug is: cannot link from comment to article
However, in order to achieve this easily we used scopes liberally, and so the fragments are horrendously long.
The ideal fragment setup for both comments and issues would be either:
  • if we don't ever want to show multiple comments/issues from different issues on the same page:
    • issue IDs:
      • regular elements my-header
      • ToC IDs
        • the ToC: _toc
        • the links: _toc/my-header
    • comment IDs:
      • regular elements _comment/1/my-header
      • ToC IDs
        • the ToC: _comment/1/_toc
        • the links: _comment/1/_toc/my-header
  • if we want to show multiple comments/issues from different issues on the same page:
    • issue IDs:
      • regular elements _issue/barack-obama/article-topic/1/my-header
      • ToC IDs
        • the ToC: _issue/barack-obama/article-topic/1/_toc
        • the links: _issue/barack-obama/article-topic/1/_toc/my-header
    • comment IDs:
      • regular elements _comment/barack-obama/article-topic/<issue-id>/<comment-id>/my-header
      • ToC IDs
        • the ToC: _comment/barack-obama/article-topic/<issue-id>/<comment-id>/_toc
        • the links: _comment/barack-obama/article-topic/<issue-id>/<comment-id>/_toc/my-header
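Both layouts above follow one pattern: a prefix, then either the element ID, _toc, or _toc/ followed by the element ID. A minimal sketch with hypothetical helper names, where issueNumber and commentNumber stand for the <issue-id> and <comment-id> placeholders:

```typescript
interface FragmentScheme {
  element(id: string): string // regular elements
  toc(): string               // the ToC itself
  tocLink(id: string): string // ToC entries linking to an element
}

function makeScheme(prefix: string): FragmentScheme {
  const p = prefix === "" ? "" : `${prefix}/`
  return {
    element: id => `${p}${id}`,
    toc: () => `${p}_toc`,
    tocLink: id => `${p}_toc/${id}`,
  }
}

// Layout 1: never more than one issue/comment from different issues per page.
const singleIssue = () => makeScheme("")
const singleComment = (commentNumber: number) => makeScheme(`_comment/${commentNumber}`)

// Layout 2: several issues/comments from different issues on the same page.
const multiIssue = (user: string, article: string, issueNumber: number) =>
  makeScheme(`_issue/${user}/${article}/${issueNumber}`)
const multiComment = (user: string, article: string, issueNumber: number, commentNumber: number) =>
  makeScheme(`_comment/${user}/${article}/${issueNumber}/${commentNumber}`)
```

Centralizing the prefix in one place would make switching between the two layouts a local change.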
As of now, does work with a leading slash: </test data>.
Also: it does work if there is a header in the comment before the link.
By that we mean the hardcoded #n area with the metadata, not an h1.
However, if you refresh the page, it highlights! Mystery.
These newlines were added for debugging purposes, but debugging should just be done with:
npx js-beautify min.html
Newlines just add complexity to our codebase, and as things stand they are not even removed from the final output, so they take up a little bit of useless space.
Happens on CLI and Web, though the web one is a bit artificial.
E.g. cirosantilli.com/x86-paging#toc-x86-paging/sample-code should instead be just: cirosantilli.com/x86-paging#toc-sample-code. Links from headers to currently work however,
On web, this will require extra caution since we decided to initially stop culling scopes: header metadata such as the like button, same topic and issue links is missing on headers under a scope.
Either with scroll or a load more button. Slightly tempted by a load more button?
To implement it, we just have to expose the ArticlePage.ts fetch as an API. The page then tracks the current limit in a state variable, and just requests more from that point onwards.
Starting from the commit of this line, we are also going to limit the ToC, so a load more button on the ToC would also be of interest: load more ToC entries.
Likely also at same time do a source character count.
This would likely be easy to implement, as it would reuse the exact same query that we already use to update ancestors in the nested set index.
Was removed at: remove word count on web because would require actually implementing properly but lazy.
We should likely not show it on link hover, however, only on headers, as doing so would mean having to update every single page that links to a header for correctness. If this is ever done, it should be JS runtime stuff only.
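In nested set terms, the ancestors of an article are exactly the rows whose [nestedSetIndex, nestedSetNextSibling) range strictly contains its index, and a descendant-inclusive count is a sum over the article's own range. A minimal in-memory sketch with a hypothetical row type; in the database the same conditions would be range filters in the query. The test data reuses rows from the nested set corruption dump earlier on this page:

```typescript
// Hypothetical row type mirroring the Article columns shown in the DB dumps.
interface NestedSetRow {
  slug: string
  nestedSetIndex: number
  nestedSetNextSibling: number
}

// `a` is a strict ancestor of `node` iff node's index lies strictly inside a's subtree range.
function isAncestor(a: NestedSetRow, node: NestedSetRow): boolean {
  return a.nestedSetIndex < node.nestedSetIndex && node.nestedSetIndex < a.nestedSetNextSibling
}

// Rows whose cached totals must change when `node`'s own count changes.
function ancestors(rows: NestedSetRow[], node: NestedSetRow): NestedSetRow[] {
  return rows.filter(a => isAncestor(a, node))
}

// Descendant-inclusive total (e.g. words or characters) of `node`'s subtree.
function subtreeTotal(rows: NestedSetRow[], counts: Map<string, number>, node: NestedSetRow): number {
  return rows
    .filter(r => node.nestedSetIndex <= r.nestedSetIndex && r.nestedSetIndex < node.nestedSetNextSibling)
    .reduce((sum, r) => sum + (counts.get(r.slug) ?? 0), 0)
}
```

Updating cached totals incrementally would then mean adding the changed article's delta to every row returned by ancestors.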
It seems that the third party library we are using is just a hack and doesn't properly provide the thing offline... OMG could it be so crap? stackoverflow.com/questions/59773190/monaco-editor-with-nextjs/68611592#68611592
Would require either moving htmlXExtension vs --no-html-x-extension processing out of index.js, or more ideally moving the redirection generation into index.js.
But ain't nobody got time for that!
Reasonable results can already be obtained with:
The main issue with that is the possibly changing _raw/js/matterjs/examples.html path depending on scopes, and it is also not very nice to have to write _raw explicitly.
Instead, we should do the same handling as is currently done for \a[] and \Image[] on these paths.

Closed issues

Currently impossible at the API level: it just takes the logged in user and uses that as the new/edit target.
Current use case: allow admins to edit other users' articles, e.g. as part of moderation.
= Index


Added a commented out test to test_bigb_output.bigb:
* p1
* p2
renders to just:
* p1
* p2
Also it might be possible to get an extra newline due to this which breaks web upload, but we don't have a min repro currently.
Currently, large files don't render in either multi or split headers.
But we want them to render on split headers, because the _raw version does not always display on GitHub Pages, but rather gets downloaded, which is bad.
The {file} version is also cool as it allows easy navigation to other files, and comments to be added.
This is currently not so easy to implement because things are done at the ast tree level rather than at render time, which is bad. So the same ast ends up going for both split and nosplit renders.
This is a hard refactor that likely will never be done. But so be it.
Can start simple with either raw or contained, and then add both some day. GitHub copy.
Test repo source code size during tests: 7.2 MiB
Full ToC removal with hack:
 function renderTocFromEntryList({ add_test_instrumentation, entry_list, descendant_count_html, tocIdPrefix }) {
+  return ''
Test repo output size: 166.6 MB -> 114.2 MB, so ToC was 31%.
Let's check header knockout with:
        [Macro.HEADER_MACRO_NAME]: function(ast, context, opts={}) {
          return ''
down to 151.7 MiB, so headers were about 9%.
And finally removing the toplevel stuff:
       toplevel_child_modifier: function(ast, context, out) {
+        return out
down to 161.8 MB, so these were only about 3%.
These should really be the only bulky things we have; everything else will likely be much harder to get wrong.
= Test data

== Tmp
ourbigbook --web-test
Then modify to:
= Test data

== Tmp2

= Tmp
and rerun:
ourbigbook --web-test
Error message:
param "bodySource" is mandatory when not rendering or when "path" to an existing article is not given. path="tmp"
Can be worked around by:
rm -rf out
so it is just a case of some outdated local state (thank God for that), and should be simple to fix.
The root problem seems to be that sqlite3 out/web/web.sqlite3 .dump still contains tmp: we have to get rid of any synonym headers during ID extraction.
All dynamic.
Only happens when the title would fit in a single line:
For long titles that go over a single line, it doesn't happen.
Removing from ourbigbook.scss:
figure {
  overflow-x: auto;
}
fixes it for some reason, but breaks everything else, as it adds a global vertical scrollbar to the page if there are any images wider than it (when above the mobile mode, where images are just 100% wide).
The fundamental issue seems to be: stackoverflow.com/questions/6421966/css-overflow-x-visible-and-overflow-y-hidden-causing-scrollbar-issue which we don't know how to work around. Omg.
To get a clearer effect edit ourbigbook.scss to:
.katex { font-size: 20.2em; }
Only the separation between a and its subscript b seems to matter.
Otherwise the following sequence leads to a hard to understand failure for the end user.
First the user uploads with:
== Header 1

== Header 2

\Image[img.png]{title=My image}
Then, Header 2 is completely removed from all source files and the image is moved to Header 1:
== Header 1

\Image[img.png]{title=My image}
Then, when the user tries to upload again, it fails because of the duplicated ID image-my-image.
The above sequence of events is not ideal from the user's perspective, as generating a synonym would lead to better URLs:
== Header 1

= Header 2

\Image[img.png]{title=My image}
In that sequence, the File for Header 2 would be effectively emptied of Ids, and there would be no duplicates.
But still, if the user deletes a header, it becomes very difficult to detect that later on. So perhaps when the CLI downloads the SHA list, it could also check if there are articles on the server that both:
  • are not present locally anymore
  • have a non-empty hash
and then proceed to make any such headers empty to avoid ID duplication.
Additionally, it would also be good to move the deleted articles to some predefined header to avoid cluttering the headers. E.g. we could start with a dummy "My deleted articles". Dedicated section: Section 1.2. "Move articles deleted locally to under a trash article on web".
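The check described above could be a simple filter over the downloaded SHA list. A minimal sketch, assuming hypothetical list entries of the form { path, hash } and that an already emptied article is reported with an empty hash string:

```typescript
// Hypothetical shape of one entry of the SHA list downloaded from the server.
interface ServerHashEntry {
  path: string
  hash: string // assumed to be "" for articles that were already emptied
}

// Server articles that still have content but no longer exist locally: the ones
// to empty (and possibly move under a trash article) before uploading.
function articlesToEmpty(server: ServerHashEntry[], localPaths: Iterable<string>): string[] {
  const local = new Set(localPaths)
  return server.filter(entry => entry.hash !== "" && !local.has(entry.path)).map(entry => entry.path)
}
```

Skipping entries with an empty hash keeps the operation idempotent: a second upload run finds nothing left to empty.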
We have recently implemented SHA-256 skips when article content hasn't changed.
But we also need to check if the parent or previous sibling has changed, and if it has then update that.
We could just return parent and previous sibling on the /hash endpoint.
Or we need to add that information to the SHA.
Ideally we should also have a way to change the tree without re-render, though we could start with re-render for simplicity.
This actually breaks uploads because it leads to inconsistencies when finding previousSiblingId.
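For the option of adding that information to the SHA, a minimal sketch, assuming a hypothetical ArticleState with the relevant fields and SHA-256 from Node's crypto module. JSON-encoding the tuple keeps field boundaries unambiguous:

```typescript
import { createHash } from "node:crypto"

// Hypothetical hash input: the article source plus its position in the tree.
interface ArticleState {
  source: string
  parentId: string | null
  previousSiblingId: string | null
}

// JSON encoding prevents collisions such as ("ab", "c") vs ("a", "bc").
function articleHash(a: ArticleState): string {
  return createHash("sha256")
    .update(JSON.stringify([a.source, a.parentId, a.previousSiblingId]))
    .digest("hex")
}
```

One consequence of this design: moving an article also changes the previousSiblingId of the article that used to follow it, so that article gets re-sent as well, which seems to be what is needed to keep the tree consistent.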
E.g.: cirosantilli.com/education-level nosplit points to the page itself. Should instead point to cirosantilli.com/education#education-level
Possibly happens only with:
  "publishOptions": {
    "toSplitHeaders": true,
    "htmlXExtension": false,
    "xPrefix": "https://ourbigbook.com/cirosantilli/"
redirects enabled.
Further investigation shows that "toSplitHeaders": true is the issue. Fixing this for now by just removing the split/nosplit in that case.
Otherwise uploads become irreproducible if you stop half way. Unacceptable.
E.g. from ourbigbook.com/cirosantilli/ciro-santilli:
renders as:
instead of the desired:
However the link to the non-synonym header:
<Python (programming language)>
renders correctly without the fragment
OK, this was also reproducible on CLI, links to toplevel synonyms had fragments, just it is infinitely more visible on web where everything is toplevel.
We should have:
  • _raw/path/to/main.py: raw file
for every file path/to/main.py in the repo to avoid URL clashes, e.g. between:
Will also serve as a mechanism to view .bigb source without GitHub.
OK, now after the big redirect from cirosantilli.com to ourbigbook.com, this is becoming more pressing.
We could start simple without any in-browser things, email only. Though having both would be ideal...
Edit: first implemented the _index thing. But then noticed that it would be saner to have a separate _dir/ prefix for directories. Otherwise a recursive wget/zip would not work out of the box, which makes me sad. It is a bit sad that you can't just remove the file name from _raw/path/to/file.txt to go to _raw/path/to/, as you need _dir/path/to/ instead. But so be it.
E.g. subdir/index.html, which would show up under _raw/subdir/index.html, gets overridden by the directory listing of subdir/, which goes to the same location.
One possibility would be to add an underscore: _raw/subdir/_index.html for the file. And another underscore for _index.html and so on.
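Here is a minimal sketch of this underscore escaping idea. It assumes directory listings always own `<dir>/index.html`. The function names are hypothetical. The Edit earlier in this issue replaced the scheme with a separate _dir/ prefix, so this only illustrates the original option.

```javascript
// Hypothetical sketch of the underscore escaping idea, not the implemented
// scheme. Directory listings own `<dir>/index.html`, so a real file whose
// basename is index.html, _index.html, __index.html, ... gets one extra
// leading underscore. No file can then map to a listing's URL, and no two
// files map to the same URL.
function rawUrlForFile(path) {
  const slash = path.lastIndexOf('/')
  const dir = path.slice(0, slash + 1) // '' for files at the repository root
  const base = path.slice(slash + 1)
  const escaped = /^_*index\.html$/.test(base) ? '_' + base : base
  return `_raw/${dir}${escaped}`
}

function rawUrlForDirListing(dirPath) {
  return dirPath === '' ? '_raw/index.html' : `_raw/${dirPath}/index.html`
}

console.log(rawUrlForFile('subdir/index.html'))  // _raw/subdir/_index.html
console.log(rawUrlForFile('subdir/_index.html')) // _raw/subdir/__index.html
console.log(rawUrlForDirListing('subdir'))       // _raw/subdir/index.html
```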
E.g. web.archive.org/web/20230227073734im_/https://upload.wikimedia.org/wikipedia/commons/2/2b/STED_Mikroskop_PSFs.jpg to upload.wikimedia.org/wikipedia/commons/2/2b/STED_Mikroskop_PSFs.jpg.
Edit: OK that is useless. The source needs to be an HTML page, and we can't infer that from the archive links. Manual sources are necessary in that case.
We should have:
  • _file/path/to/main.py: OurBigBook section showing preview of file + comments/metadata
  • _raw/path/to/main.py: raw file
Done the _file part.
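Here is a minimal sketch of this prefix layout. It assumes one top-level prefix per kind of view: _file/ and _raw/ from the list above, plus the _dir/ prefix from the Edit earlier in this issue. `viewUrl` is a hypothetical helper.

```javascript
// Hypothetical sketch: one top-level prefix per kind of view of a repository
// path, so the preview page, the raw file and a directory listing of the same
// path can never collide with each other.
const VIEW_PREFIXES = new Map([['file', '_file'], ['raw', '_raw'], ['dir', '_dir']])

function viewUrl(kind, path) {
  const prefix = VIEW_PREFIXES.get(kind)
  if (prefix === undefined) throw new Error(`unknown view kind: ${kind}`)
  // Directory listings get a trailing slash, e.g. _dir/path/to/
  const suffix = kind === 'dir' && path !== '' && !path.endsWith('/') ? '/' : ''
  return `${prefix}/${path}${suffix}`
}

console.log(viewUrl('file', 'path/to/main.py')) // _file/path/to/main.py
console.log(viewUrl('raw', 'path/to/main.py'))  // _raw/path/to/main.py
console.log(viewUrl('dir', 'path/to'))          // _dir/path/to/
```

This also shows the downside mentioned in the Edit. To go from _raw/path/to/main.py up to its directory, you have to switch the prefix to _dir/. Dropping the last path component is not enough.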
= Tmp

== Tmp 2
ourbigbook .
extract_ids: README.bigb
extract_ids: README.bigb finished in 43.47357300110161 ms
README.bigb:4:1: cross reference to unknown id: "tmp-2"
expected outcome:
README.bigb:4:1: cross reference to unknown id: "asdf"
E.g. for Twitter and LinkedIn.
Maybe a screenshot of the website?
If we could represent topics somehow that would be ideal...
Going to close it for now as irreproducible. Worked around it by fixing the data manually with the new nested-set CLI tool. Will try to debug further if it shows up again in the future.
On web now:
We can only reproduce it locally by copying the database. We haven't managed to reach such a state with a clean sequence of pure API calls, and a clean naive run of web/bin/generate-demo-data -C -u1 didn't reproduce it either.
Upon quickly inspecting the DB, we see that the nested set indexes are wrong:
[ 'barack-obama', 0, 36 ],
[ 'barack-obama/mathematics', 1, 9 ],
[ 'barack-obama/fundamental-theorem-of-calculus', 9, 11 ],
barack-obama/mathematics should end at something larger than 9 in order to include barack-obama/fundamental-theorem-of-calculus and its other children.
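Here is a minimal consistency check for this kind of corruption. It assumes the three columns above are slug, nestedSetIndex and nestedSetNextSibling. It also assumes each row knows its parent's slug. That `parentSlug` field is not in the dump. The check is that a child must fall strictly inside its parent's half-open range. This is an illustration, not the project's nested-set CLI tool.

```javascript
// Hypothetical consistency check, not the project's nested-set CLI tool.
// Each row: { slug, parentSlug, nestedSetIndex, nestedSetNextSibling }, with
// parentSlug === null for the root. A child must lie strictly inside its
// parent's half-open range [nestedSetIndex, nestedSetNextSibling).
function findNestedSetViolations(rows) {
  const bySlug = new Map(rows.map(row => [row.slug, row]))
  const violations = []
  for (const row of rows) {
    if (row.parentSlug === null) continue
    const parent = bySlug.get(row.parentSlug)
    if (parent === undefined) continue // parent not part of this sample
    const inside =
      parent.nestedSetIndex < row.nestedSetIndex &&
      row.nestedSetNextSibling <= parent.nestedSetNextSibling
    if (!inside) violations.push(row.slug)
  }
  return violations
}

// The three rows from the dump above, with parentSlug filled in by assumption.
const sampleRows = [
  { slug: 'barack-obama', parentSlug: null, nestedSetIndex: 0, nestedSetNextSibling: 36 },
  { slug: 'barack-obama/mathematics', parentSlug: 'barack-obama', nestedSetIndex: 1, nestedSetNextSibling: 9 },
  { slug: 'barack-obama/fundamental-theorem-of-calculus', parentSlug: 'barack-obama/mathematics', nestedSetIndex: 9, nestedSetNextSibling: 11 },
]
console.log(findNestedSetViolations(sampleRows)) // [ 'barack-obama/fundamental-theorem-of-calculus' ]
```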
The question now is whether this is still reachable, or if it was due to a previous bug.
This is the only way to have portable maths across local and server.
The definitions are also useful by default to users, and should just be enabled out-of-box.
They are not enabled there: conversion just fails, and the user can't submit because the button is grayed out. It already works via the API, however, supposing the user has defined them: enable web math defines default on non-web.
Edit: caching only fails if you edit on web, then somehow download the article, change it and re-upload it. But a change implies a rerender, so it is hard to see a case where this actually causes problems.
We are auto-formatting locally for splitting and to get rid of includes.
So we need to also format on web in order for content caching to work.
ourbigbook --web
stores the split renders under:
since it is a bigb output.
However, that bigb output is different from the one generated with:
ourbigbook -O bigb .
since the latter contains \Include macros, which need to be removed from the web/ output.
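Here is a minimal sketch of the normalization this would need before hashing. It assumes every \Include macro sits alone on its own line, e.g. \Include[some-id], and that nothing else differs between the two outputs. `normalizeForWeb` and `webContentHash` are hypothetical names. A real implementation would also have to reproduce the exact whitespace that the web/ output uses.

```javascript
const crypto = require('node:crypto')

// Hypothetical sketch: drop lines that consist only of an \Include[...] macro,
// so that local `ourbigbook -O bigb .` output can be compared with web/ output.
function normalizeForWeb(bigbSource) {
  return bigbSource
    .split('\n')
    .filter(line => !/^\\Include\[[^\]]*\]\s*$/.test(line))
    .join('\n')
}

// Hash of the normalized source, usable as a content cache key.
function webContentHash(bigbSource) {
  return crypto.createHash('sha256').update(normalizeForWeb(bigbSource)).digest('hex')
}

console.log(JSON.stringify(normalizeForWeb('= Index\n\n\\Include[asdf]\n\\Include[qwer]\n'))) // "= Index\n\n"
```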
Would radically speed sync up.
This would possibly be a good solution to github.com/ourbigbook/ourbigbook/issues/274, now that we necessarily already have link click capturing.
Index has no parent, so the line may be empty in that case.
For user cirosantilli, just pushed cirosantilli.github.io at aa60ccb934bf9646d548e6b761489d31aec1a341, which has almost 7k articles.
The POST to ourbigbook.com/api/articles is taking about 3s to 4s on ourbigbook.com ("Waiting for server response"). 3.44s seems to be a common value that comes back often.
This was after the dynamic article tree change, so one suspicion is that it might be linked to maintaining the nested set state on a large set of articles. I really hope that's not it, as it would be hard to fix.
Doing it from the barack-obama test user, which only has about 50 articles, leads to the same result. But we believe this is because we are not indexing things by user properly (this was added later), so it might still be due to the nested set.
Locally on SQLite it takes only 800 ms to 900 ms.
Locally on PostgreSQL with the barack-obama user and no cirosantilli data, however, the POST is almost instantaneous... 100ms or less!
Local PostgreSQL with the cirosantilli user after uploading cirosantilli.github.io with about 7k articles: usually around 2.4s, sometimes a bit less.
Same but with barack-obama: 2.16s! So the main slowdown is likely that we are not properly indexing things, as one user's articles affect the other's!
Let's hack it up to see:
alter table "Article" add "authorId" integer;
update "Article" as a set "authorId" = f."authorId" from "File" as f where a."fileId" = f.id;
create index idx_nested ON "Article" using btree ("authorId", "nestedSetIndex");
and patch:
diff --git a/web/convert.js b/web/convert.js
index 59870ae3..ff834708 100644
--- a/web/convert.js
+++ b/web/convert.js
@@ -568,6 +568,7 @@ async function convertArticle({
         const rendered_output = extra_returns.rendered_outputs[outpath]
         const renderFull = rendered_output.full
+          authorId: file.authorId,
           depth: newDepth,
           fileId: file.id,
           h1Render: renderFull.substring(0, rendered_output.h1RenderLength),
@@ -599,6 +600,7 @@ async function convertArticle({
           updateOnDuplicate: [
+            'authorId',
diff --git a/web/models/article.js b/web/models/article.js
index 40dbbdba..1c382acc 100644
--- a/web/models/article.js
+++ b/web/models/article.js
@@ -13,6 +13,10 @@ module.exports = (sequelize) => {
       // E.g. `johnsmith/mathematics`.
+      authorId: {
+        type: DataTypes.INTEGER,
+        allowNull: false,
+      },
       slug: {
         type: DataTypes.TEXT,
         unique: {
Didn't help :-(
A cleaner benchmarking can now be done with:
OURBIGBOOK_POSTGRES=1 ./bin/generate-demo-data.js -C -u1 -a650 -i0 -c0
At this size, things are already unreasonably slow, and you can visibly see the later renders being much slower than the early ones. We can then do a very minimal one-off benchmark of a single slow article update with:
OURBIGBOOK_POSTGRES=1 ./bin/generate-demo-data.js -u1 -a1 -i0 -c0
OK, now the test case is clearer.
By also enabling SQL logging:
time DEBUG='*:sql:*' OURBIGBOOK_POSTGRES=1 ./bin/generate-demo-data.js -u1 -a1 -i0 -c0
which gave:
real    0m1.400s
user    0m0.643s
sys     0m0.068s
With SQL logging enabled we can see any slow queries, and the following two stand out as massively slow:
SELECT
  "to"."id" AS "to.id",
  "to"."idid" AS "to.idid",
  "to"."path" AS "to.path",
  "to"."toplevel_id" AS "to.toplevel_id",
  "to"."ast_json" AS "to.ast_json",
  "to"."macro_name" AS "to.macro_name",
  "to"."createdAt" AS "to.createdAt",
  "to"."updatedAt" AS "to.updatedAt",
  "to->File"."id" AS "to.File.id",
  "to->File"."path" AS "to.File.path",
  "to->File"."toplevel_id" AS "to.File.toplevel_id",
  "to->File"."last_parse" AS "to.File.last_parse",
  "to->File"."last_render" AS "to.File.last_render",
  "to->File"."titleSource" AS "to.File.titleSource",
  "to->File"."bodySource" AS "to.File.bodySource",
  "to->File"."createdAt" AS "to.File.createdAt",
  "to->File"."updatedAt" AS "to.File.updatedAt",
  "to->File"."authorId" AS "to.File.authorId",
  "to->File->file"."id" AS "to.File.file.id",
  "to->File->file"."slug" AS "to.File.file.slug",
  "to->File->file"."topicId" AS "to.File.file.topicId",
  "to->File->file"."titleRender" AS "to.File.file.titleRender",
  "to->File->file"."titleSource" AS "to.File.file.titleSource",
  "to->File->file"."titleSourceLine" AS "to.File.file.titleSourceLine",
  "to->File->file"."render" AS "to.File.file.render",
  "to->File->file"."h1Render" AS "to.File.file.h1Render",
  "to->File->file"."h2Render" AS "to.File.file.h2Render",
  "to->File->file"."depth" AS "to.File.file.depth",
  "to->File->file"."score" AS "to.File.file.score",
  "to->File->file"."nestedSetIndex" AS "to.File.file.nestedSetIndex",
  "to->File->file"."nestedSetNextSibling" AS "to.File.file.nestedSetNextSibling",
  "to->File->file"."createdAt" AS "to.File.file.createdAt",
  "to->File->file"."updatedAt" AS "to.File.file.updatedAt",
  "to->File->file"."fileId" AS "to.File.file.fileId",
  "from"."id" AS "from.id",
  "from"."idid" AS "from.idid",
  "from"."path" AS "from.path",
  "from"."toplevel_id" AS "from.toplevel_id",
  "from"."ast_json" AS "from.ast_json",
  "from"."macro_name" AS "from.macro_name",
  "from"."createdAt" AS "from.createdAt",
  "from"."updatedAt" AS "from.updatedAt",
  "from->File"."id" AS "from.File.id",
  "from->File"."path" AS "from.File.path",
  "from->File"."toplevel_id" AS "from.File.toplevel_id",
  "from->File"."last_parse" AS "from.File.last_parse",
  "from->File"."last_render" AS "from.File.last_render",
  "from->File"."titleSource" AS "from.File.titleSource",
  "from->File"."bodySource" AS "from.File.bodySource",
  "from->File"."createdAt" AS "from.File.createdAt",
  "from->File"."updatedAt" AS "from.File.updatedAt",
  "from->File"."authorId" AS "from.File.authorId",
  "from->File->file"."id" AS "from.File.file.id",
  "from->File->file"."slug" AS "from.File.file.slug",
  "from->File->file"."topicId" AS "from.File.file.topicId",
  "from->File->file"."titleRender" AS "from.File.file.titleRender",
  "from->File->file"."titleSource" AS "from.File.file.titleSource",
  "from->File->file"."titleSourceLine" AS "from.File.file.titleSourceLine",
  "from->File->file"."render" AS "from.File.file.render",
  "from->File->file"."h1Render" AS "from.File.file.h1Render",
  "from->File->file"."h2Render" AS "from.File.file.h2Render",
  "from->File->file"."depth" AS "from.File.file.depth",
  "from->File->file"."score" AS "from.File.file.score",
  "from->File->file"."nestedSetIndex" AS "from.File.file.nestedSetIndex",
  "from->File->file"."nestedSetNextSibling" AS "from.File.file.nestedSetNextSibling",
  "from->File->file"."createdAt" AS "from.File.file.createdAt",
  "from->File->file"."updatedAt" AS "from.File.file.updatedAt",
  "from->File->file"."fileId" AS "from.File.file.fileId"
FROM
  (
    SELECT *
    FROM
      "Ref" AS "Ref"
    WHERE
      "Ref"."to_id" = '@barack-obama/test-data'
      AND "Ref"."type" = 0
  ) AS "Ref"
  LEFT OUTER JOIN "Id" AS "to" ON "Ref"."to_id" = "to"."idid"
  LEFT OUTER JOIN "File" AS "to->File" ON "to"."idid" = "to->File"."toplevel_id"
  LEFT OUTER JOIN "Article" AS "to->File->file" ON "to->File"."id" = "to->File->file"."fileId"
  LEFT OUTER JOIN "Id" AS "from" ON "Ref"."from_id" = "from"."idid"
  LEFT OUTER JOIN "File" AS "from->File" ON "from"."idid" = "from->File"."toplevel_id"
  LEFT OUTER JOIN "Article" AS "from->File->file" ON "from->File"."id" = "from->File->file"."fileId";


SELECT
  "File"."id" AS "File.id",
  "File"."path" AS "File.path",
  "File"."toplevel_id" AS "File.toplevel_id",
  "File"."last_parse" AS "File.last_parse",
  "File"."last_render" AS "File.last_render",
  "File"."titleSource" AS "File.titleSource",
  "File"."bodySource" AS "File.bodySource",
  "File"."createdAt" AS "File.createdAt",
  "File"."updatedAt" AS "File.updatedAt",
  "File"."authorId" AS "File.authorId",
  "File->file"."id" AS "File.file.id",
  "File->file"."slug" AS "File.file.slug",
  "File->file"."topicId" AS "File.file.topicId",
  "File->file"."titleRender" AS "File.file.titleRender",
  "File->file"."titleSource" AS "File.file.titleSource",
  "File->file"."titleSourceLine" AS "File.file.titleSourceLine",
  "File->file"."render" AS "File.file.render",
  "File->file"."h1Render" AS "File.file.h1Render",
  "File->file"."h2Render" AS "File.file.h2Render",
  "File->file"."depth" AS "File.file.depth",
  "File->file"."score" AS "File.file.score",
  "File->file"."nestedSetIndex" AS "File.file.nestedSetIndex",
  "File->file"."nestedSetNextSibling" AS "File.file.nestedSetNextSibling",
  "File->file"."createdAt" AS "File.file.createdAt",
  "File->file"."updatedAt" AS "File.file.updatedAt",
  "File->file"."fileId" AS "File.file.fileId"
FROM
  (
    SELECT *
    FROM
      "Id" AS "Id"
    WHERE
      "Id"."idid" = '@barack-obama'
  ) AS "Id"
  LEFT OUTER JOIN "File" AS "File" ON "Id"."idid" = "File"."toplevel_id"
  LEFT OUTER JOIN "Article" AS "File->file" ON "File"."id" = "File->file"."fileId";

Hmmm, so no slow updates, only selects. Surprising!
The first query comes from:
const oldRef = await sequelize.models.Ref.findOne({
OK, we manually reduced the first query to a subset:
SELECT
  "From"."idid" AS "From.idid",
  "From->File"."titleSource" AS "From->File.titleSource"
FROM "Ref"
INNER JOIN "Id" AS "From"
  ON "Ref"."from_id" = "From"."idid" AND
  "Ref"."to_id" = '@barack-obama/test-data' AND
  "Ref"."type" = 0
INNER JOIN "File" AS "From->File"
  ON "From"."idid" = "From->File"."toplevel_id"
INNER JOIN "Article" AS "From->Article"
  ON "From->File"."id" = "From->Article"."fileId";
We were certain this should not be slow. But by commenting things out, we learned that foreign keys are not automatically indexed in PostgreSQL, so the fileId lookup was super slow!!! OMG.
That single one-line change, indexing fileId, halves the creation time, amazing:
real    0m0.668s
user    0m0.638s
sys     0m0.035s
On Heroku, after manually running:
create index article_file_id ON "Article" using btree ("fileId");
the new article time fell to 400ms, which is amazing. A one-liner! Special thanks to Sequelize for the timing info.
After this, the only DB activity that has more than 15ms is:
sequelize:sql:pg Executing (default): SELECT "id", "username", "ip", "displayName", "email", "image", "hash", "salt", "score", "followerCount", "admin", "verified", "verificationCode", "verificationCodeSent", "maxArticles", "maxArticleSize", "createdAt", "updatedAt" FROM "User" AS "User" WHERE "User"."username" = 'barack-obama'; +53ms
but we can't reproduce it in isolation, so something else must be in play, e.g. something running in parallel.
As of this commit, we did some further investigation with better tooling and more understanding. Notably, we now run:
OURBIGBOOK_LOG_DB=1 npm run dev-pg
Heroku is definitely slower than local, at around 1 to 2 s per page for the first ten pages:
ourbigbook --web --web-force-render --web-max-renders 10
but local was also rather slow when we have about the same number of articles for the user.
After improving the benchmarking setup, there seem to be two separate causes:
  • calls to options.db_provider.fetch_header_tree_ids on web. They should be prevented, as they are not necessary: we render the ToC dynamically.
    This matters the most for toplevel articles with many descendants.
  • the other problem, which we haven't solved yet: the nested set index update queries are slow, and we don't know how to solve that easily.
    Those queries simply update a huge number of rows.
    Maybe we could have a fallback mechanism that builds that index in the background, and use the tree index temporarily?
    Hard call.
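To illustrate why those queries touch so many rows, here is a minimal in-memory model of inserting a new last child into a nested set. It is not the actual query code. The row fields follow the column names from the query logs, and `parentSlug` is an assumed extra field. Two groups of rows have to be rewritten: every row after the insertion point in pre-order, and every ancestor. So in this model, adding an article early in a large tree rewrites almost the whole tree.

```javascript
// Hypothetical in-memory model, not the actual update queries.
// Rows: { slug, parentSlug, nestedSetIndex, nestedSetNextSibling }.
// Inserts newSlug as the last child of parentSlug and returns how many
// existing rows had to be modified.
function insertLastChild(rows, parentSlug, newSlug) {
  const bySlug = new Map(rows.map(row => [row.slug, row]))
  const p = bySlug.get(parentSlug).nestedSetNextSibling // insertion position
  // The parent and all of its ancestors must grow to contain the new node.
  const ancestors = new Set()
  for (let s = parentSlug; s !== null; s = bySlug.get(s).parentSlug) ancestors.add(s)
  let updatedRows = 0
  for (const row of rows) {
    if (row.nestedSetIndex >= p) {
      // Everything after the insertion point in pre-order shifts right by one.
      row.nestedSetIndex += 1
      row.nestedSetNextSibling += 1
      updatedRows += 1
    } else if (ancestors.has(row.slug)) {
      // Ancestors now contain one more descendant.
      row.nestedSetNextSibling += 1
      updatedRows += 1
    }
  }
  rows.push({ slug: newSlug, parentSlug, nestedSetIndex: p, nestedSetNextSibling: p + 1 })
  return updatedRows
}

// root -> (a -> b), c
const tree = [
  { slug: 'root', parentSlug: null, nestedSetIndex: 0, nestedSetNextSibling: 4 },
  { slug: 'a', parentSlug: 'root', nestedSetIndex: 1, nestedSetNextSibling: 3 },
  { slug: 'b', parentSlug: 'a', nestedSetIndex: 2, nestedSetNextSibling: 3 },
  { slug: 'c', parentSlug: 'root', nestedSetIndex: 3, nestedSetNextSibling: 4 },
]
console.log(insertLastChild(tree, 'a', 'd')) // 3: root and a grow, c shifts
```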
Just keep making the viewport smaller and smaller until it happens. Sample width that reproduces: 680px.
Removing white-space: pre-wrap solves it. But then the space between (beta) and OurBigBook.com gets removed.
OK: found out I had already previously solved the same issue with &nbsp;, redoing the "hack". Every header space has to be &nbsp;.
This might have something to do with us trying to have a dummy fallback image when the image URL does not exist.
The request is:
GET https://static.productionready.io/images/smiley-cyrus.jpg net::ERR_INTERNET_DISCONNECTED
so it appears to be trying to infinitely fetch the default image.
For now we seem to have managed to stop it from going infinite by selecting an image that is stored locally in the website.
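It is only a guess that the loop comes from an error handler that keeps re-assigning the fallback URL. If it does, a common guard is to make the handler remove itself before swapping in the fallback, so a failing fallback cannot trigger another request. `setImageFallback` is a hypothetical helper. It works on a DOM image element or on any object with `src` and `onerror` fields.

```javascript
// Hypothetical helper: install a one-shot fallback for a failed image load.
function setImageFallback(img, fallbackSrc) {
  img.onerror = () => {
    // Detach first: if the fallback fails too, nothing fires again.
    img.onerror = null
    img.src = fallbackSrc
  }
}

// Stand-in for an <img> element; in a browser, pass the real element.
const img = { src: 'https://example.com/missing.jpg', onerror: null }
setImageFallback(img, '/default-avatar.png')
img.onerror() // simulate the first load failure
console.log(img.src, img.onerror) // /default-avatar.png null
```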
Otherwise it is too confusing to tell what is what when fields are pre-filled, e.g. when editing existing content, and in the future when clicking an "add here" button.
Maybe some will be list by default, but some will definitely be article show by default. Notably topic has to show the rendered body by default.
This is a superset of: github.com/ourbigbook/ourbigbook/issues/270
Likes will not go under the header, which therefore does not need to be present, so we're going to remove it.
We noticed this is hard to implement, because we want internal links to still work, and just adding a prefix to every ID does not take that into account.
We later noticed that what we actually want to solve the comment use case, is a custom toplevel scope, which we can easily implement with a custom named directory. So... scopes save the day for once?
Will be useful for comments on web, since a single author can make multiple comments, so prefixing by username won't be enough.
For topic pages, we can just prefix by username, and that is already currently done.
E.g.: cirosantilli.com/physics#how-to-teach-and-learn-physics
Broken ToC HTML render?
OK, understood the root cause: we moved to rendering the ToC from inside the H rendering function itself, and as a result there is a single toplevel_child_modifier which acts on that entire output.
We'll need to create something more custom to properly handle this case.
And therefore lead users to the toplevel page instead of a link to current header.
The links by clicking on the header itself are correct and go to a dedicated page with it on top. The problem is just for the on-hover links on the margin which we'd like to link to self in the current page.
OK, everything was reversed, I just hadn't noticed before because there was no numbered test data :-)
E.g. currently have: localhost:3000/barack-obama#toc-@barack-obama/mitochondrion It works, but is ugly.
E.g. in:
= x86

== Sample code

== x86 paging

=== Sample code
both Sample code headers have id="sample-code", which would lead to ID conflicts on the same page.
Also, as a result, the toc link from x86 intended to go to x86-paging/sample-code misses and opens on a separate page.
I don't know how to solve this besides always including scopes on every ID... This does however lead to ugly local IDs on individual pages which is a bit of a shame... oh cruel life.
We could also have two versions of every page, scoped and non-scoped, but things likely go exponential when we start dealing with subscopes.
This could mean that a lot of toplevel scope removal work will go to the trash! :-( But what can you do, it is the inevitable outcome of dynamic page fetch?
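Here is a minimal sketch of the x86 collision and of the "always include scopes" fix. It assumes that, without scopes, each header keeps only the last component of its full ID, which simplifies the real rendering. `domId` and `duplicateDomIds` are hypothetical helpers.

```javascript
// Hypothetical sketch: headers rendered together on one page, identified by
// their full scoped IDs. Without scopes, each header is assumed to keep only
// the last component of its ID.
function domId(fullId, { includeScope }) {
  return includeScope ? fullId : fullId.slice(fullId.lastIndexOf('/') + 1)
}

// Returns the DOM IDs that would appear more than once on the page.
function duplicateDomIds(fullIds, options) {
  const seen = new Set()
  const duplicates = new Set()
  for (const fullId of fullIds) {
    const id = domId(fullId, options)
    if (seen.has(id)) duplicates.add(id)
    seen.add(id)
  }
  return [...duplicates]
}

const headersOnX86Page = ['x86', 'sample-code', 'x86-paging', 'x86-paging/sample-code']
console.log(duplicateDomIds(headersOnX86Page, { includeScope: false })) // [ 'sample-code' ]
console.log(duplicateDomIds(headersOnX86Page, { includeScope: true }))  // []
```

The price is the one described above: IDs like x86-paging/sample-code also show up on pages where the scope is redundant.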
Happens on the CLI too, though it was first noticed, and matters most, on Web due to the ever-present user prefix.
Was already fully present on the previous deployment but we just completely missed it, e.g.: ourbigbook.com/cirosantilli/physics#physics-education-needs-more-focus-on-understanding-experiments-and-their-history
Minimal CLI example to reproduce:
= asdf
= qwer
Then in the rendering of subdir/qwer.html, the tag asdf appears twice.
The root cause is that scope resolution is finding the same thing twice: once as subdir/asdf, and then once again as just asdf (which is then correctly resolved).
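Here is a minimal sketch of scope resolution that cannot return the same target twice. It assumes lookup walks from the innermost scope outwards, like C++ unqualified name lookup, and stops at the first ID that exists. It also deduplicates the resolved tags. The function names are hypothetical, not the actual ourbigbook resolution code.

```javascript
// Hypothetical sketch: try "scope/id", then each enclosing scope, then the
// bare id, and stop at the first candidate that exists.
function resolveScoped(id, scope, existingIds) {
  const parts = scope === '' ? [] : scope.split('/')
  for (let n = parts.length; n >= 0; n--) {
    const candidate = [...parts.slice(0, n), id].join('/')
    if (existingIds.has(candidate)) return candidate
  }
  return undefined
}

// Resolve a list of tags, dropping repeats that resolve to the same ID.
function resolveTags(tags, scope, existingIds) {
  const resolved = new Set()
  for (const tag of tags) {
    const id = resolveScoped(tag, scope, existingIds)
    if (id !== undefined) resolved.add(id)
  }
  return [...resolved]
}

console.log(resolveTags(['asdf'], 'subdir', new Set(['asdf']))) // [ 'asdf' ]
```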
Maybe: stackoverflow.com/questions/10687099/how-to-test-if-a-url-string-is-absolute-or-relative
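One option for that check uses the WHATWG URL parser built into Node.js and browsers. Without a base URL, `new URL` only accepts absolute URLs, i.e. strings with a scheme, and throws on relative ones. `isAbsoluteUrl` is a hypothetical helper name.

```javascript
// Sketch: classify a link target as absolute (has a scheme) or relative.
function isAbsoluteUrl(str) {
  try {
    new URL(str) // throws a TypeError for relative URLs when no base is given
    return true
  } catch {
    return false
  }
}

console.log(isAbsoluteUrl('https://ourbigbook.com/cirosantilli')) // true
console.log(isAbsoluteUrl('cirosantilli/physics'))               // false
```

Note that protocol-relative URLs such as //ourbigbook.com count as relative with this check.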
Happens on some pages but not others, e.g. barack-obama/ciro-santilli.
OK: that simply happens due to invalid HTML constructs:
and we had invalid or implicitly self-closing HTML at: self links broken on /ciro-santilli starting at Budget transparency.
There is some kind of fundamentally wrong HTML content being rendered, not Web specific: cirosantilli.com/sponsor#budget-transparency
Resolution: it was due to a missing close tag that appeared when we used \Quote with a title. It was even valid HTML OMG, but with the wrong semantics. What a stack.
Likely locally too then, right? It will also be more uniform with h2, which now has a parent link.
It also seems like an empty line (no wiki) is showing: localhost:3000/barack-obama/x86-paging/sample-code
Both preview and render.
OK, was not the arguments in general, was {wiki} alone with which I was testing, thank God!
It is broken, and I'm too lazy to fix it now.
Can be fixed later at: word count on web.
Fixed at: aca09f9485bcbc6c8cd184d61871f02e8a602981




Database specific tasks, usually refactoring.


This tag is about handling non-OurBigBook files, notably related to using the \H file argument.






This tag is about ourbigbook --web uploading from the local filesystem to OurBigBook Web.

