Performance

How 2 make the masto run good.

Changes Made

These are the changes we have actually made from the default configuration; each is either described below or on a separate page:

  • Split out sidekiq queues into separate service files
  • Optimized postgres using pgtune

Archive

Olde changes that no longer apply:

  • Increase Sidekiq DB_POOL and -c values from 25 to 75
    • 23-11-25: Replaced with separate sidekiq service files

Sidekiq

Sidekiq processes the background tasks queued up by the Mastodon Rails app.

There are a few strategies in this post for scaling Sidekiq performance:

  • Increase the DB_POOL and -c (thread count) values in the default service file (below); see the sketch after this list
  • Make separate services for each of the queues
  • Make multiple processes for a queue (after making a separate service)
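
For example, the first strategy amounts to editing two lines of the default service file shown in the next section (75 matches the values we used before splitting the queues; pick a number your database can actually handle):

Environment="DB_POOL=75"
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 75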

Default Configuration

By default, the mastodon-sidekiq service is configured with 25 threads (-c 25) and a matching DB_POOL of 25; the full service file is as follows:

[Unit]
Description=mastodon-sidekiq
After=network.target

[Service]
Type=simple
User=mastodon
WorkingDirectory=/home/mastodon/live
Environment="RAILS_ENV=production"
Environment="DB_POOL=25"
Environment="MALLOC_ARENA_MAX=2"
Environment="LD_PRELOAD=libjemalloc.so"
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 25
TimeoutSec=15
Restart=always
# Proc filesystem
ProcSubset=pid
ProtectProc=invisible
# Capabilities
CapabilityBoundingSet=
# Security
NoNewPrivileges=true
# Sandboxing
ProtectSystem=strict
PrivateTmp=true
PrivateDevices=true
PrivateUsers=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectControlGroups=true
RestrictAddressFamilies=AF_INET
RestrictAddressFamilies=AF_INET6
RestrictAddressFamilies=AF_NETLINK
RestrictAddressFamilies=AF_UNIX
RestrictNamespaces=true
LockPersonality=true
RestrictRealtime=true
RestrictSUIDSGID=true
RemoveIPC=true
PrivateMounts=true
ProtectClock=true
# System Call Filtering
SystemCallArchitectures=native
SystemCallFilter=~@cpu-emulation @debug @keyring @ipc @mount @obsolete @privile>
SystemCallFilter=@chown
SystemCallFilter=pipe
SystemCallFilter=pipe2
ReadWritePaths=/home/mastodon/live

[Install]
WantedBy=multi-user.target

Separate Services

Even after increasing the number of worker threads to 75, we were still getting huge backlogs on our queues, particularly pull, which was filling up with link crawl workers; presumably the slower jobs were getting in the way of the faster ones, and everything piled up.

We want to split sidekiq into multiple processes using separate systemd service files. We want to a) keep the site responsive by processing high-priority queues quickly, but also b) use all our available resources by not letting processes sit idle. So we give each of the main queues one service file that has that queue as the top priority, and mix the other queues in as secondary priorities - sidekiq will try to process items from the first queue first, the second queue second, and so on.

So we allocate 25 threads (and 25 db connections) each to four service files with the following priority orders, plus two additional service files that give 5 threads each to the lower-priority queues. Note that we only do this after increasing the maximum postgres connections to 200, see https://hazelweakly.me/blog/scaling-mastodon/#db_pool-notes-from-nora's-blog


Service Name                 Queues                         Threads
mastodon-sidekiq-default     default, ingress, pull, push   25
mastodon-sidekiq-ingress     ingress, default, push, pull   25
mastodon-sidekiq-push        push, pull, default, ingress   25
mastodon-sidekiq-pull        pull, push, default, ingress   25
mastodon-sidekiq-mailers     mailers                        5
mastodon-sidekiq-scheduler   scheduler                      5

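As a rough sanity check on the connection budget (assuming each service's DB_POOL matches its thread count): 4 × 25 + 2 × 5 = 110 postgres connections for sidekiq, which leaves headroom under the max_connections = 200 set in the Postgresql section below for the web (puma) and streaming processes.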

Each service file is identical except for this part. (We didn't use systemd @.service template units because we couldn't find a nice way to pass a list of queues and a variable thread count to each instance.)

Environment="DB_POOL=25"
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -q push -q pull -q default -q ingress -c 25

Each file lives in /etc/systemd/system and is named after its primary queue (eg. /etc/systemd/system/mastodon-sidekiq-default.service).
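
For the two low-volume services, the same two lines look something like this (a sketch using the scheduler queue as the example, assuming DB_POOL is again matched to the thread count):

Environment="DB_POOL=5"
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -q scheduler -c 5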

Then we make one meta-service file mastodon-sidekiq.service that can control the others:

[Unit]
Description=mastodon-sidekiq
After=network.target
Wants=mastodon-sidekiq-default.service
Wants=mastodon-sidekiq-ingress.service
Wants=mastodon-sidekiq-mailers.service
Wants=mastodon-sidekiq-pull.service
Wants=mastodon-sidekiq-push.service
Wants=mastodon-sidekiq-scheduler.service

[Service]
Type=oneshot
ExecStart=/bin/echo "mastodon-sidekiq exists only to collectively start and stop mastodon-sidekiq-* instances"
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

and make the subsidiary service dependent on the main service

[Install]
WantedBy=multi-user.target mastodon-sidekiq.service
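
After creating the files, one possible sequence for bringing everything up (assuming the six service names from the table above):

sudo systemctl daemon-reload
sudo systemctl enable mastodon-sidekiq-default mastodon-sidekiq-ingress mastodon-sidekiq-push mastodon-sidekiq-pull mastodon-sidekiq-mailers mastodon-sidekiq-scheduler
sudo systemctl restart mastodon-sidekiq.service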

This lets sidekiq use all the available CPU (rather than having the queues pile up while the CPU is hovering around 50% usage), which may be good or bad, but it did drain the queues from ~20k to 0 in a matter of minutes.

Cooldown

sneakers.the.rat in #technical-wg, 24-01-09 04:53:38

so re: Stoplight and Cooldowns, Sidekiq#Cooldown tries to deliver something 16 times ( https://github.com/NeuromatchAcademy/mastodon/blob/eb24c0ad07c4137517e6bd37ebcc99d6e2b86797/app/workers/activitypub/delivery_worker.rb#L11 ) the delay rises exponentially (^4) with each retry. So eg by the 10th retry we're delaying an average of 208 minutes, and by 16 we're at 1365 (22 hours).

That delay uses sidekiq's `sidekiq_retry_in` method, which applies to each delivery task (ie. each status we're trying to push), but there is also an additional control flow tool Stoplight ( https://blog.bolshakov.dev/stoplight/ ) that applies per inbox URL (rather than per job). You start in a good (green) state. Each failure counts towards a threshold (10), after which it halts all jobs matching that inbox (red). After the cooldown period (60 seconds) it flips into a "yellow" state: if the next job succeeds, it flips back to green. If it fails, it goes immediately back to red.

That configuration seems sorta... pointless to me? if it only kicks in after 10 failures, then it'll only be halting after really long delays, right? it seems like that should be a high threshold with like a really long cooldown to me - if we haven't been able to deliver like 200 messages, then cooldown for like 6 hours (i'm not sure if receiving a message clears the stoplight)

Postgresql

PGTune

Following the advice of PGTune ( https://pgtune.leopard.in.ua/ ), postgres is configured like:

/etc/postgresql/15/main/postgresql.conf

# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 3 GB
# CPUs num: 4
# Connections num: 200
# Data Storage: ssd

max_connections = 200
shared_buffers = 768MB
effective_cache_size = 2304MB
maintenance_work_mem = 192MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 1966kB
huge_pages = off
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
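
These values are not arbitrary; roughly, pgtune appears to derive the memory settings from the inputs in the comment block above (a back-of-the-envelope reconstruction, not the tool's exact formulas):

shared_buffers       ~ RAM / 4                 = 3072MB / 4   = 768MB
effective_cache_size ~ RAM * 3/4               = 3072MB * 3/4 = 2304MB
maintenance_work_mem ~ RAM / 16                = 3072MB / 16  = 192MB
work_mem             ~ (RAM - shared_buffers) / (max_connections * 3) / max_parallel_workers_per_gather
                     = 2304MB / 600 / 2        ~ 1966kB

If the RAM, CPU count, or connection count changes, re-run pgtune rather than scaling these by hand, and restart postgres afterwards (e.g. sudo systemctl restart postgresql), since shared_buffers only takes effect on restart.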
