Search

Description Full-text search on neuromatch.social
Part Of Mastodon/Hacking
Contributors Jonny Saunders


Completion Status Stub
Active Status Active
Approval Status Draft


(You may also be looking for ElasticSearch, where config and maintenance info is kept. This page is about the search features in masto generally.)


Mastodon ElasticSearch

The basic masto full-text search uses ElasticSearch via Chewy.

Indexing

Summary

The Chewy indexes are defined in the app/chewy directory: https://github.com/NeuromatchAcademy/mastodon/tree/main/app/chewy

The actual work of creating the index seems to happen in

The Importers define how the individual objects to index are loaded and added to the search index, and the Scheduler runs the importers periodically. The Indexes themselves describe how Chewy handles each object.

When a relevant object is updated, it is added to the indexing queue using an update_index method (e.g. Status:update_index). The Scheduler then feeds each queued object to the importer.
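The queue, scheduler and importer flow described above can be sketched in plain Ruby. This is only a toy model of the idea, not Mastodon's or Chewy's code: the classes IndexingQueue, Importer and Scheduler and the method names drain, import and run_once are all made up for illustration, and a Hash stands in for both the database and the search index.

```ruby
require "set"

# Hypothetical stand-ins; none of these classes exist in Mastodon or Chewy.
class IndexingQueue
  def initialize
    @pending = Set.new
  end

  # Called when a record changes. Repeated updates to the same record
  # collapse into a single queue entry.
  def update_index(record_id)
    @pending << record_id
  end

  # Hands back everything queued so far and empties the queue.
  def drain
    ids = @pending.to_a
    @pending.clear
    ids
  end
end

class Importer
  attr_reader :index

  def initialize(records)
    @records = records # id => text, standing in for the database
    @index = {}        # id => text, standing in for the search index
  end

  # Loads each queued record and writes it to the index.
  def import(ids)
    ids.each { |id| @index[id] = @records.fetch(id) }
  end
end

class Scheduler
  def initialize(queue, importer)
    @queue = queue
    @importer = importer
  end

  # One periodic run: feed every queued id to the importer.
  def run_once
    @importer.import(@queue.drain)
  end
end

records = { 1 => "hello world", 2 => "full-text search" }
queue = IndexingQueue.new
importer = Importer.new(records)
scheduler = Scheduler.new(queue, importer)

queue.update_index(1)
queue.update_index(1) # a second update to the same record
queue.update_index(2)
scheduler.run_once
```

After run_once, importer.index holds both records and the queue is empty, so an object that changed twice between runs is still only imported once.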

Filters

Each importer has a set of rules that determine if something should be added to the index. When indexing, the importers also check to see if an object is searchable_by any accounts - described in the next section.

Note that because of these filters, far fewer posts are indexed than the server actually holds. Also important: by default, indexing does not respect noindex or nobot tags in profiles.

Those rules, summarized:

Status

  • Statuses that Mention a local account
  • Statuses that have been Favorited by a local account
  • Statuses that include a Poll that has been voted on by a local account
  • Statuses that were Boosted by a local account
  • Statuses that were Created by a local account
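As a rough illustration of the Status rules above, here is a minimal sketch that reports which rules a status satisfies. It assumes a status is indexed when at least one rule matches. ToyStatus, index_reasons and indexable? are hypothetical names, and each interaction list simply holds true for a local account and false for a remote one.

```ruby
# Hypothetical record: each list holds one boolean per interacting
# account, true for a local account and false for a remote one.
ToyStatus = Struct.new(:author_local, :mentions, :favourites,
                       :poll_votes, :boosts, keyword_init: true)

# Returns the names of the indexing rules that the status satisfies.
def index_reasons(status)
  rules = {
    mentions_local_account: status.mentions.any?,
    favourited_by_local: status.favourites.any?,
    poll_voted_by_local: status.poll_votes.any?,
    boosted_by_local: status.boosts.any?,
    created_by_local: status.author_local
  }
  rules.select { |_name, matched| matched }.keys
end

# Assumption: one matching rule is enough for the status to be indexed.
def indexable?(status)
  !index_reasons(status).empty?
end

# A remote post that one local account favourited.
remote_fav = ToyStatus.new(author_local: false, mentions: [false],
                           favourites: [true], poll_votes: [], boosts: [])

# A remote post that no local account has interacted with.
untouched = ToyStatus.new(author_local: false, mentions: [false],
                          favourites: [], poll_votes: [], boosts: [false])
```

Here indexable?(remote_fav) is true because of the local favourite, while indexable?(untouched) is false, which is why most remote posts never reach the index.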

Account

  • Accounts that are searchable - ie. that are not unapproved, suspended, or moved.

Tags

  • All hashtags! (except those that are in unlisted posts? -- Manisha)


Querying

The search service is located in app/services/search_service.rb.

It queries the indexed objects and, for full-text search, also applies the searchable_by filter defined in the Status model.

The searchable_by method returns a list of (local) account IDs that are capable of receiving a given status in a search. Those accounts include:

  • The local account that Created the status
  • Local accounts that are Mentioned in the status
  • Local accounts that have Favorited the status
  • Local accounts that have Boosted the status
  • Local accounts that have Bookmarked the status
  • Local accounts that have Voted on a Poll in the status
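The list above amounts to collecting every account that interacted with a status and keeping only the local ones. A minimal sketch of that idea follows. It is not the real method in the Status model: SketchAccount, SketchStatus and their fields are invented here, and the real implementation works on database associations rather than arrays.

```ruby
# Hypothetical plain-Ruby records standing in for the real models.
SketchAccount = Struct.new(:id, :local)
SketchStatus = Struct.new(:author, :mentions, :favourites, :boosts,
                          :bookmarks, :poll_votes, keyword_init: true)

# Returns the ids of local accounts that may find this status in search.
def searchable_by(status)
  candidates = [status.author] + status.mentions + status.favourites +
               status.boosts + status.bookmarks + status.poll_votes
  candidates.select(&:local).map(&:id).uniq
end

alice = SketchAccount.new(1, true)  # local
bob   = SketchAccount.new(2, false) # remote
carol = SketchAccount.new(3, true)  # local

status = SketchStatus.new(author: bob, mentions: [alice],
                          favourites: [alice, carol], boosts: [],
                          bookmarks: [], poll_votes: [])

ids = searchable_by(status) # => [1, 3]
```

Bob wrote the post but is remote, so he is left out. Alice appears once even though she was both mentioned and favourited the post.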

After searchable_by is calculated for a given status, the status can still be excluded from an account's search results if it fails the StatusFilter, which removes statuses for:

  • Accounts that are blocked, muted, or on a domain blocked by the post creator
  • Accounts that have been silenced by the instance and are not following the creator of the status.

StatusFilter also removes statuses that fail the StatusPolicy:show check. That check fails for:

  • Remote accounts if the post is local only
  • Accounts that are suspended
  • Accounts that are not mentioned if the visibility of the post is "direct" or "limited"
  • Accounts that are not mentioned and do not follow the posting account if the visibility is set to "private" (followers only)
  • Accounts that are blocked, or on a domain that is blocked by the creator of the status.
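The five StatusPolicy:show rules above can be read as a chain of early exits. The sketch below models only those five rules and nothing else (for example, it does not handle the author viewing their own post). Viewer, Post, can_show? and all field names are hypothetical, and a nil domain is assumed to mean a local account.

```ruby
# Hypothetical records; a nil domain is assumed to mean a local account.
Viewer = Struct.new(:id, :domain, :suspended, keyword_init: true) do
  def local?
    domain.nil?
  end
end

Post = Struct.new(:visibility, :local_only, :mentioned_ids, :follower_ids,
                  :blocked_ids, :blocked_domains, keyword_init: true)

# True when none of the five listed rules hides the post from the viewer.
def can_show?(post, viewer)
  return false if post.local_only && !viewer.local?
  return false if viewer.suspended
  return false if post.blocked_ids.include?(viewer.id)
  return false if post.blocked_domains.include?(viewer.domain)

  mentioned = post.mentioned_ids.include?(viewer.id)
  case post.visibility
  when :direct, :limited then mentioned
  when :private then mentioned || post.follower_ids.include?(viewer.id)
  else true # public and unlisted posts pass this step
  end
end

post = Post.new(visibility: :private, local_only: false,
                mentioned_ids: [], follower_ids: [1],
                blocked_ids: [9], blocked_domains: ["spam.example"])

follower = Viewer.new(id: 1, domain: nil, suspended: false)
stranger = Viewer.new(id: 2, domain: "other.example", suspended: false)
blocked  = Viewer.new(id: 9, domain: nil, suspended: false)
```

With this data, can_show?(post, follower) is true, while the stranger fails the followers-only rule and the blocked account fails the block rule.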

Other Implementations

See Also

References

Prior Conversations