TODO / Article create and update slow on web

ourbigbook.com/api/articles

For user cirosantilli, just pushed cirosantilli.github.io at aa60ccb934bf9646d548e6b761489d31aec1a341, which has almost 7k articles.

The POST to ourbigbook.com/api/articles is taking about 3s to 4s on ourbigbook.com "Waiting for server response", 3.44s seems like an common average exact value that often comes back.

This was after dynamic article tree, one suspicion is that it might be linked to maintaining the nested set state on a large set of articles. I really hope that's not it, as it would be hard to fix.

Doing it from the barack-obama test user which only has about 50 articles leads to the same result, but we believe it is because we are not indexing things by user properly (this was added later), so might still be due to nested set.

Locally on sqlite is only 800 ms to 900 ms.

Locally on postgresql with barack-obama user and no cirosantilli data, the POST is almost instantaneous however... 100ms or less!

Local postgresql with cirosantilli user after uploading cirosantilli.github.io with about 7k articles: usually around 2.4s, sometimes a bit less.

Same but with barack-obama: 2.16s! So the main slowdown is likely that we are not properly indexing things, as one users' article affects the other's!

Let's hack it up to see:

./bin/psql
alter table "Article" add "authorId" integer;
update "Article" as a set "authorId" = f."authorId" from "File" as f where a."fileId" = f.id
create index idx_nested ON "Article" using btree ("authorId", "nestedSetIndex");

and patch:

diff --git a/web/convert.js b/web/convert.js
index 59870ae3..ff834708 100644
--- a/web/convert.js
+++ b/web/convert.js
@@ -568,6 +568,7 @@ async function convertArticle({
         const rendered_output = extra_returns.rendered_outputs[outpath]
         const renderFull = rendered_output.full
         articleArgs.push({
+          authorId: file.authorId,
           depth: newDepth,
           fileId: file.id,
           h1Render: renderFull.substring(0, rendered_output.h1RenderLength),
@@ -599,6 +600,7 @@ async function convertArticle({
         articleArgs,
         {
           updateOnDuplicate: [
+            'authorId',
             'h1Render',
             'h2Render',
             'titleRender',
diff --git a/web/models/article.js b/web/models/article.js
index 40dbbdba..1c382acc 100644
--- a/web/models/article.js
+++ b/web/models/article.js
@@ -13,6 +13,10 @@ module.exports = (sequelize) => {
     'Article',
     {
       // E.g. `johnsmith/mathematics`.
+      authorId: {
+        type: DataTypes.INTEGER,
+        allowNull: false,
+      },
       slug: {
         type: DataTypes.TEXT,
         unique: {

Didin't help :-(

A cleaner benchmarking can now be done with:

OURBIGBOOK_POSTGRES=1 ./bin/generate-demo-data.js -C -u1 -a650 -i0 -c0

By this size, things are already unreasonably slow, and you can visibly see the latter renders being much slower than the early ones. We can then do a very minimal one off benchmark of a single slow article update with:

OURBIGBOOK_POSTGRES=1 ./bin/generate-demo-data.js -u1 -a1 -i0 -c0

OK, now the test case is clearer.

By also enabling SQL logging:

time DEBUG='*:sql:*' OURBIGBOOK_POSTGRES=1 ./bin/generate-demo-data.js -u1 -a1 -i0 -c0

which gave:

real    0m1.400s
user    0m0.643s
sys     0m0.068s

we will be able to see any slow queries. We reach the following two massively slow queries:

SELECT
  "Ref".*,
  "to"."id" AS "to.id",
  "to"."idid" AS "to.idid",
  "to"."path" AS "to.path",
  "to"."toplevel_id" AS "to.toplevel_id",
  "to"."ast_json" AS "to.ast_json",
  "to"."macro_name" AS "to.macro_name",
  "to"."createdAt" AS "to.createdAt",
  "to"."updatedAt" AS "to.updatedAt",
  "to->File"."id" AS "to.File.id",
  "to->File"."path" AS "to.File.path",
  "to->File"."toplevel_id" AS "to.File.toplevel_id",
  "to->File"."last_parse" AS "to.File.last_parse",
  "to->File"."last_render" AS "to.File.last_render",
  "to->File"."titleSource" AS "to.File.titleSource",
  "to->File"."bodySource" AS "to.File.bodySource",
  "to->File"."createdAt" AS "to.File.createdAt",
  "to->File"."updatedAt" AS "to.File.updatedAt",
  "to->File"."authorId" AS "to.File.authorId",
  "to->File->file"."id" AS "to.File.file.id",
  "to->File->file"."slug" AS "to.File.file.slug",
  "to->File->file"."topicId" AS "to.File.file.topicId",
  "to->File->file"."titleRender" AS "to.File.file.titleRender",
  "to->File->file"."titleSource" AS "to.File.file.titleSource",
  "to->File->file"."titleSourceLine" AS "to.File.file.titleSourceLine",
  "to->File->file"."render" AS "to.File.file.render",
  "to->File->file"."h1Render" AS "to.File.file.h1Render",
  "to->File->file"."h2Render" AS "to.File.file.h2Render",
  "to->File->file"."depth" AS "to.File.file.depth",
  "to->File->file"."score" AS "to.File.file.score",
  "to->File->file"."nestedSetIndex" AS "to.File.file.nestedSetIndex",
  "to->File->file"."nestedSetNextSibling" AS "to.File.file.nestedSetNextSibling",
  "to->File->file"."createdAt" AS "to.File.file.createdAt",
  "to->File->file"."updatedAt" AS "to.File.file.updatedAt",
  "to->File->file"."fileId" AS "to.File.file.fileId",
  "from"."id" AS "from.id",
  "from"."idid" AS "from.idid",
  "from"."path" AS "from.path",
  "from"."toplevel_id" AS "from.toplevel_id",
  "from"."ast_json" AS "from.ast_json",
  "from"."macro_name" AS "from.macro_name",
  "from"."createdAt" AS "from.createdAt",
  "from"."updatedAt" AS "from.updatedAt",
  "from->File"."id" AS "from.File.id",
  "from->File"."path" AS "from.File.path",
  "from->File"."toplevel_id" AS "from.File.toplevel_id",
  "from->File"."last_parse" AS "from.File.last_parse",
  "from->File"."last_render" AS "from.File.last_render",
  "from->File"."titleSource" AS "from.File.titleSource",
  "from->File"."bodySource" AS "from.File.bodySource",
  "from->File"."createdAt" AS "from.File.createdAt",
  "from->File"."updatedAt" AS "from.File.updatedAt",
  "from->File"."authorId" AS "from.File.authorId",
  "from->File->file"."id" AS "from.File.file.id",
  "from->File->file"."slug" AS "from.File.file.slug",
  "from->File->file"."topicId" AS "from.File.file.topicId",
  "from->File->file"."titleRender" AS "from.File.file.titleRender",
  "from->File->file"."titleSource" AS "from.File.file.titleSource",
  "from->File->file"."titleSourceLine" AS "from.File.file.titleSourceLine",
  "from->File->file"."render" AS "from.File.file.render",
  "from->File->file"."h1Render" AS "from.File.file.h1Render",
  "from->File->file"."h2Render" AS "from.File.file.h2Render",
  "from->File->file"."depth" AS "from.File.file.depth",
  "from->File->file"."score" AS "from.File.file.score",
  "from->File->file"."nestedSetIndex" AS "from.File.file.nestedSetIndex",
  "from->File->file"."nestedSetNextSibling" AS "from.File.file.nestedSetNextSibling",
  "from->File->file"."createdAt" AS "from.File.file.createdAt",
  "from->File->file"."updatedAt" AS "from.File.file.updatedAt",
  "from->File->file"."fileId" AS "from.File.file.fileId"
FROM
  (
    SELECT
      "Ref"."id",
      "Ref"."type",
      "Ref"."from_id",
      "Ref"."to_id",
      "Ref"."defined_at",
      "Ref"."defined_at_line",
      "Ref"."defined_at_col",
      "Ref"."inflected",
      "Ref"."to_id_index",
      "Ref"."createdAt",
      "Ref"."updatedAt"
    FROM
      "Ref" AS "Ref"
    WHERE
      "Ref"."to_id" = '@barack-obama/test-data'
      AND "Ref"."type" = 0
    LIMIT
      1
  ) AS "Ref"
  LEFT OUTER JOIN "Id" AS "to" ON "Ref"."to_id" = "to"."idid"
  LEFT OUTER JOIN "File" AS "to->File" ON "to"."idid" = "to->File"."toplevel_id"
  LEFT OUTER JOIN "Article" AS "to->File->file" ON "to->File"."id" = "to->File->file"."fileId"
  LEFT OUTER JOIN "Id" AS "from" ON "Ref"."from_id" = "from"."idid"
  LEFT OUTER JOIN "File" AS "from->File" ON "from"."idid" = "from->File"."toplevel_id"
  LEFT OUTER JOIN "Article" AS "from->File->file" ON "from->File"."id" = "from->File->file"."fileId";

+451ms

SELECT
  "Id".*,
  "File"."id" AS "File.id",
  "File"."path" AS "File.path",
  "File"."toplevel_id" AS "File.toplevel_id",
  "File"."last_parse" AS "File.last_parse",
  "File"."last_render" AS "File.last_render",
  "File"."titleSource" AS "File.titleSource",
  "File"."bodySource" AS "File.bodySource",
  "File"."createdAt" AS "File.createdAt",
  "File"."updatedAt" AS "File.updatedAt",
  "File"."authorId" AS "File.authorId",
  "File->file"."id" AS "File.file.id",
  "File->file"."slug" AS "File.file.slug",
  "File->file"."topicId" AS "File.file.topicId",
  "File->file"."titleRender" AS "File.file.titleRender",
  "File->file"."titleSource" AS "File.file.titleSource",
  "File->file"."titleSourceLine" AS "File.file.titleSourceLine",
  "File->file"."render" AS "File.file.render",
  "File->file"."h1Render" AS "File.file.h1Render",
  "File->file"."h2Render" AS "File.file.h2Render",
  "File->file"."depth" AS "File.file.depth",
  "File->file"."score" AS "File.file.score",
  "File->file"."nestedSetIndex" AS "File.file.nestedSetIndex",
  "File->file"."nestedSetNextSibling" AS "File.file.nestedSetNextSibling",
  "File->file"."createdAt" AS "File.file.createdAt",
  "File->file"."updatedAt" AS "File.file.updatedAt",
  "File->file"."fileId" AS "File.file.fileId"
FROM
  (
    SELECT
      "Id"."id",
      "Id"."idid",
      "Id"."path",
      "Id"."toplevel_id",
      "Id"."ast_json",
      "Id"."macro_name",
      "Id"."createdAt",
      "Id"."updatedAt"
    FROM
      "Id" AS "Id"
    WHERE
      "Id"."idid" = '@barack-obama'
    LIMIT
      1
  ) AS "Id"
  LEFT OUTER JOIN "File" AS "File" ON "Id"."idid" = "File"."toplevel_id"
  LEFT OUTER JOIN "Article" AS "File->file" ON "File"."id" = "File->file"."fileId"

+227m

Hmmm, so no slow updates, only selects. Surprising!

The first query is the:

const oldRef = await sequelize.models.Ref.findOne({

OK, we manually reduced the first query to a subset:

SELECT
  "Ref"."to_id",
  "Ref"."from_id",
  "From"."idid" AS "From.idid",
  "From->File"."titleSource" AS "From->File.titleSource"
FROM "Ref"
INNER JOIN "Id" AS "From"
ON
  "Ref"."from_id" = "From"."idid" AND
  "Ref"."to_id" = '@barack-obama/test-data' AND
  "Ref"."type" = 0
INNER JOIN "File" AS "From->File"
  ON "From"."idid" = "From->File"."toplevel_id"
INNER JOIN "Article" AS "From->Article"
  ON "From->File"."id" = "From->Article"."fileId"
;

which we were certain should not be slow, and then by commenting things out learnt that foreign keys are not automatically indexed, so the fileId finding was super slow!!! OMG.

That single one line change drops us down to half the creation time, amazing:

real    0m0.668s
user    0m0.638s
sys     0m0.035s

On heroku, after manually doing:

create index article_file_id ON "Article" using btree ("fileId");

new article time fell down to 400ms, which is amazing. One liner! Special thanks to sequelize for the timing info.

After this, the only DB activity that has more than 15ms is:

sequelize:sql:pg Executing (default): SELECT "id", "username", "ip", "displayName", "email", "image", "hash", "salt", "score", "followerCount", "admin", "verified", "verificationCode", "verificationCodeSent", "maxArticles", "maxArticleSize", "createdAt", "updatedAt" FROM "User" AS "User" WHERE "User"."username" = 'barack-obama'; +53ms

but we don't reproduce it in isolation, must be something else in play, e.g. something in parallel.

Table of contents 1k 1
- Article create and update slow on web update 1 Article create and update slow on web 154

 Ancestors (3)

 Incoming links (1)

Progress spinner after submit