Skip to content

Chinese search support – 中文搜索​支持

Insiders adds experimental Chinese language support for the built-in search plugin – a feature that has been requested for a long time given the large number of Chinese users.

After the United States and Germany, the third-largest country of origin of Material for MkDocs users is China. For a long time, the built-in search plugin didn't allow for proper segmentation of Chinese characters, mainly due to missing support in lunr-languages which is used for search tokenization and stemming. The latest Insiders release adds long-awaited Chinese language support for the built-in search plugin, something that has been requested by many users.

Material for MkDocs終於​支持​中文​了!文本​被​正確​分割​並且​更​容易​找到。

This article explains how to set up Chinese language support for the built-in search plugin in a few minutes.

Configuration

Chinese language support for Material for MkDocs is provided by jieba, an excellent Chinese text segmentation library. If jieba is installed, the built-in search plugin automatically detects Chinese characters and runs them through the segmenter. You can install jieba with:

pip install jieba

The next step is only required if you specified the separator configuration in mkdocs.yml. Text is segmented with zero-width whitespace characters, so it renders exactly the same in the search modal. Adjust mkdocs.yml so that the separator includes the \u200b character:

plugins:
  - search:
      separator: '[\s\u200b\-]'

That's all that is necessary.

Usage

If you followed the instructions in the configuration guide, Chinese words will now be tokenized using jieba. Try searching for 支持 to see how it integrates with the built-in search plugin.


Note that this is an experimental feature, and I, @squidfunk, am not proficient in Chinese (yet?). If you find a bug or think something can be improved, please open an issue.


Last update: April 14, 2023
Created: April 14, 2023