MongoDB Tuning Guide

For Ampere Altra Processors

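Overview

MongoDB is a popular, cross-platform, open-source, document-oriented NoSQL database. Its flexible data model supports storing unstructured data with full indexing support and replication. According to DB-Engines, MongoDB was the fifth most popular database as of January 2023. It is written in C++ and is designed to provide a scalable, high-performance data storage solution for web applications.

The purpose of this guide is to describe techniques for running MongoDB optimally on Ampere® Altra® processors.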

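Build Prerequisites

Running an application with high performance starts with building it correctly and with appropriate compiler flags. When building an application from source to run on Ampere® Altra® processors, we recommend GCC version 10 or newer. Newer compiler versions tend to have better support for new processor features and incorporate more advanced code generation techniques.

Our testing used CentOS Stream 8 as the operating system.

GCC 11 was installed from the SCL repository: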

yum -y install scl-utils scl-utils-build 
yum -y install gcc-toolset-11.aarch64 
scl enable gcc-toolset-11  bash

For other operating systems such as Ubuntu 22.04 LTS and Debian, GCC 11 is also available and can be installed directly from their respective repositories.
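As a quick sanity check before building, the active compiler version can be compared against the GCC 10 minimum recommended above. The following is a minimal sketch: the helper name gcc_major_at_least is hypothetical, and it assumes that "gcc -dumpversion" prints a version whose first dot-separated field is the major version.

```shell
# Hypothetical helper: succeed if a GCC version string has major >= minimum.
gcc_major_at_least() {
    local version="$1" minimum="$2"
    local major="${version%%.*}"      # "11.2.1" -> "11", "11" -> "11"
    [ "$major" -ge "$minimum" ]
}

# Usage: check the compiler currently on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    if gcc_major_at_least "$(gcc -dumpversion)" 10; then
        echo "GCC $(gcc -dumpversion) meets the GCC 10+ recommendation"
    else
        echo "GCC $(gcc -dumpversion) is older than GCC 10"
    fi
fi
```

On CentOS Stream 8 the check should be run inside the shell started by "scl enable gcc-toolset-11 bash", since that is where the newer compiler is on PATH.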

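Build and Install

MongoDB can be installed from the repositories provided by the operating system's package manager, or built directly from source. A comprehensive MongoDB installation guide can be found in the official documentation. We recommend installing from source for greater flexibility and for the ability to control and configure specific modules.

To build MongoDB optimized for the Ampere® Altra® processor family, additional compiler options that take advantage of hardware features can be added at compile time. The MongoDB source code can be obtained from the MongoDB download page. This guide uses the stable release MongoDB 6.0.3. Installing from source requires certain libraries and additional modules, which are compiled into the binaries.

Perform the following steps to install the dependencies.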

yum -y install libcurl-devel python39 python39-devel openssl-devel
yum -y install zlib-devel git wget xz-devel
yum -y groupinstall "Development Tools"

To support HTTPS connections, download the latest source code for these additional components from their respective git repositories.

git clone https://github.com/mongodb/mongo
cd mongo
git checkout -b myr6.0.3.rc2 r6.0.3-rc2

Python 3.7+ is required, and several Python modules must be installed. Run the command:

python3 -m pip install -r etc/pip/compile-requirements.txt

Compile

diff a/src/mongo/db/stats/counters.h b/src/mongo/db/stats/counters.h
224a225,226
>     static_assert(sizeof(decltype(_together)) <= stdx::hardware_constructive_interference_size,
>                   "cache line spill");

python3 buildscripts/scons.py supports many build options, such as CC and CFLAGS.

# get help of scons, such as define CXX=<g++ path>, CC=<gcc path> 
python3 buildscripts/scons.py -h
# Note: configure g++ and gcc path
# --force-jobs is CPU core number
python3 buildscripts/scons.py --force-jobs=8040 DESTDIR=<MongoDB_Install_Dir>

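MongoDB Configuration

In this guide, MongoDB is configured to use the WiredTiger storage engine, with snappy as the block and journal compressor. Refer to the mongodb.conf file shown in the appendix to configure the server.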

#start the server
$MongoDB_Install_Dir/bin/mongod --config mongod_conf --storageEngine wiredTiger

#stop the server
$MongoDB_Install_Dir/bin/mongod --config mongod_conf --shutdown

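Performance Tuning

There are hundreds of settings that can change MongoDB's behavior and performance. Those listed below are only some of the more commonly used tuning parameters. Refer to the MongoDB documentation for details on all settings.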

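cacheSizeGB

Defines the maximum size of the internal cache that WiredTiger uses for all data.

Increasing cacheSizeGB can reduce the impact of disk I/O and improve read and write performance.

Use the "db.serverStatus().wiredTiger.cache" command to check "maximum bytes configured" (the maximum cache size, set by cacheSizeGB or by the default) and "bytes currently in the cache" (the size of the data currently in the cache).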
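To judge whether the cache is close to its limit, the two counters above can be turned into a fill percentage. This is a minimal sketch rather than a monitoring tool: the helper name cache_fill_percent is hypothetical, and the byte counts in the example are made-up values standing in for "bytes currently in the cache" and "maximum bytes configured".

```shell
# Hypothetical helper: percentage of the configured WiredTiger cache in use.
# $1 = bytes currently in the cache, $2 = maximum bytes configured
cache_fill_percent() {
    local current="$1" maximum="$2"
    if [ "$maximum" -le 0 ]; then
        echo "maximum must be positive" >&2
        return 1
    fi
    echo $(( current * 100 / maximum ))   # integer percentage, rounded down
}

# Example with made-up numbers: 24 GiB held in a 30 GiB cache.
cache_fill_percent $(( 24 * 1024 * 1024 * 1024 )) $(( 30 * 1024 * 1024 * 1024 ))
```

A value that stays near the top of the range under load suggests that eviction is working hard and that a larger cacheSizeGB may be worth testing.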

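Eviction Tuning

When an application approaches the maximum cache size, WiredTiger begins evicting pages to keep memory usage from growing too large, following a least-recently-used (LRU) algorithm. "eviction=(threads_min=X)" is the minimum number of WiredTiger eviction worker threads to run. The value must be between 1 and 20.

"eviction=(threads_max=X)" is the maximum number of WiredTiger eviction worker threads to run. The value must be between 1 and 20, and it should be set consistently with the threads_min setting.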

#get
db.adminCommand({getParameter: 1, wiredTigerEngineRuntimeConfig: "eviction"})
{
  wiredTigerEngineRuntimeConfig: 'eviction=(threads_min=4,threads_max=8)',
  ok: 1
}
#set
db.adminCommand({setParameter: 1, wiredTigerEngineRuntimeConfig: "eviction=(threads_min=4,threads_max=8)"})
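Because both thread counts are limited to the range 1 to 20, it can help to validate them before building the configuration string. The sketch below uses a hypothetical helper named eviction_config; the extra check that threads_min does not exceed threads_max is an assumption added here for safety, not something stated in this guide.

```shell
# Hypothetical helper: validate eviction thread bounds and print the
# string to pass as wiredTigerEngineRuntimeConfig.
eviction_config() {
    local tmin="$1" tmax="$2" value
    for value in "$tmin" "$tmax"; do
        if [ "$value" -lt 1 ] || [ "$value" -gt 20 ]; then
            echo "thread counts must be between 1 and 20" >&2
            return 1
        fi
    done
    # Assumption: the minimum should never be larger than the maximum.
    if [ "$tmin" -gt "$tmax" ]; then
        echo "threads_min must not exceed threads_max" >&2
        return 1
    fi
    echo "eviction=(threads_min=${tmin},threads_max=${tmax})"
}

eviction_config 4 8
```

The printed string can then be pasted into the setParameter command shown above.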

concurrentTransactions

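WiredTiger uses tickets to control the number of read and write operations that the storage engine processes concurrently. The default is 128, which works well in most cases. If the number of available tickets drops to 0, all subsequent operations queue until a ticket becomes available. Long-running operations can reduce the number of available tickets and thereby lower the concurrency of the system. In such cases, increasing the configured number of tickets can increase concurrency.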

#read the current values
db.serverStatus().wiredTiger.concurrentTransactions
{
  write: { out: 0, available: 128, totalTickets: 128 },
  read: { out: 0, available: 128, totalTickets: 128 }
}

#change the values
db.adminCommand({setParameter: 1, wiredTigerConcurrentWriteTransactions: 256})
{ was: 0, ok: 1 }

db.adminCommand({setParameter: 1, wiredTigerConcurrentReadTransactions: 256})
{ was: 0, ok: 1 }
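The "available" and "totalTickets" fields reported above can be reduced to a simple status when sampling a server over time. This is an illustrative sketch only: the helper name ticket_status and the 10% threshold for "low" are assumptions made for the example, not MongoDB recommendations.

```shell
# Hypothetical helper: classify ticket availability.
# $1 = available tickets, $2 = totalTickets
ticket_status() {
    local available="$1" total="$2"
    if [ "$available" -le 0 ]; then
        echo "exhausted"                       # new operations will queue
    elif [ $(( available * 10 )) -lt "$total" ]; then
        echo "low"                             # under 10% left (assumed threshold)
    else
        echo "ok"
    fi
}

ticket_status 128 128   # the idle server from the sample output
ticket_status 0 128     # no tickets left
```

Seeing "exhausted" repeatedly for reads or writes is the situation in which raising the corresponding ticket setting may be worth testing.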

journalCompressor

Specifies the type of compression used for WiredTiger journal data. Compression consumes additional CPU resources but reduces storage consumption.

blockCompressor

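Specifies the default compression for collection data. This setting can be overridden on a per-collection basis when a collection is created. Compression consumes additional CPU resources but reduces storage consumption.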

64K PAGESIZE

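We recommend setting the kernel PAGESIZE to 64K. The current value can be checked with the command "getconf PAGESIZE". PAGESIZE is the size of a memory page in bytes and is configured when the kernel is compiled. Using larger pages reduces the hardware latency of translating virtual page addresses to physical page addresses. The latency drops because hardware translation caches, such as the translation lookaside buffer (TLB), become more effective. Because a hardware translation cache has only a limited number of entries, a larger page size increases the amount of virtual memory that each entry can translate. This increases the amount of memory an application can access without incurring hardware translation delays.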
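The effect of page size on TLB coverage is simple arithmetic: the memory a TLB can map is the number of entries multiplied by the page size. The sketch below illustrates this with a hypothetical helper, tlb_reach_mib; the entry count of 1024 is an illustrative figure, not a specification of any particular processor.

```shell
# Hypothetical helper: memory covered by a TLB, in MiB.
# $1 = number of TLB entries (illustrative), $2 = page size in bytes
tlb_reach_mib() {
    local entries="$1" page_bytes="$2"
    echo $(( entries * page_bytes / 1024 / 1024 ))
}

echo "4K pages:  $(tlb_reach_mib 1024 4096) MiB"
echo "64K pages: $(tlb_reach_mib 1024 65536) MiB"

# Report the page size of the running kernel, if getconf is available.
if command -v getconf >/dev/null 2>&1; then
    echo "Running kernel page size: $(getconf PAGESIZE) bytes"
fi
```

With the same number of entries, moving from 4K to 64K pages multiplies the covered memory by 16.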

Transparent Huge Pages

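Transparent Huge Pages (THP) is a Linux memory management feature that reduces the overhead of translation lookaside buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages. However, database workloads often perform poorly with THP enabled because they tend to have sparse rather than contiguous memory access patterns. When running MongoDB on Linux, THP should be disabled for best performance.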

echo never > /sys/kernel/mm/transparent_hugepage/enabled
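After changing the setting, the active mode can be read back from the same file, where the kernel marks the current choice with square brackets (for example "always madvise [never]"). The following is a minimal sketch with a hypothetical helper name, thp_active_mode. Note that a value written with echo does not persist across reboots.

```shell
# Hypothetical helper: extract the bracketed (active) mode from the
# contents of the THP "enabled" file.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(thp_active_mode "$(cat "$thp_file")")"
fi
```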

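Most Unix-like operating systems, including Linux and macOS, provide ways to limit and control the use of system resources such as threads, files, and network connections on a per-process and per-user basis. These limits ("ulimits") prevent a single user from consuming too many system resources. Sometimes the default values are low, which can cause a number of problems during normal MongoDB operation.

To configure ulimit values, raise the limits for the process either in a dedicated file such as /etc/security/limits.d/99-mongodb-nproc.conf or, as in the commands below, by appending the new values to /etc/security/limits.conf.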

echo "*  soft    fsize        unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  hard    fsize        unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  soft    cpu          unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  hard    cpu          unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  soft    as           unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  hard    as           unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  soft    memlock      unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  hard    memlock      unlimited" | sudo tee -a /etc/security/limits.conf
echo "*  soft    nofile       64000" | sudo tee -a /etc/security/limits.conf
echo "*  hard    nofile       64000" | sudo tee -a /etc/security/limits.conf
echo "*  soft    nproc        64000" | sudo tee -a /etc/security/limits.conf
echo "*  hard    nproc        64000" | sudo tee -a /etc/security/limits.conf
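Limits configured this way are applied when a new login session starts, so it is worth verifying them in the shell that will launch mongod. The sketch below checks the open-file limit against the 64000 value used above; the helper name limit_at_least is hypothetical.

```shell
# Hypothetical helper: succeed if a limit is "unlimited" or >= minimum.
limit_at_least() {
    local value="$1" minimum="$2"
    [ "$value" = "unlimited" ] || [ "$value" -ge "$minimum" ]
}

# Check the open-file limit of the current shell.
current_nofile="$(ulimit -n)"
if limit_at_least "$current_nofile" 64000; then
    echo "nofile=$current_nofile is sufficient"
else
    echo "nofile=$current_nofile is below 64000"
fi
```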

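Configure sufficient file handles (fs.file-max), kernel PID limit (kernel.pid_max), maximum threads per process (kernel.threads-max), and maximum number of memory map areas per process (vm.max_map_count) for your deployment. For large systems, the following values are a good starting point: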

sysctl -w fs.file-max=98000
sysctl -w kernel.pid_max=64000
sysctl -w kernel.threads-max=64000
sysctl -w vm.max_map_count=128000
sysctl -w net.core.somaxconn=65535
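Values set with "sysctl -w" last only until the next reboot, so it is useful to be able to read them back. Each sysctl key corresponds to a file under /proc/sys with the dots replaced by slashes. The sketch below uses a hypothetical helper, sysctl_proc_path, to print the current values of the keys discussed above.

```shell
# Hypothetical helper: map a sysctl key to its file under /proc/sys,
# e.g. "vm.max_map_count" -> "/proc/sys/vm/max_map_count".
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

for key in fs.file-max kernel.pid_max kernel.threads-max \
           vm.max_map_count net.core.somaxconn; do
    path="$(sysctl_proc_path "$key")"
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    fi
done
```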

Start tuned and use the throughput-performance profile

tuned-adm profile throughput-performance

Appendix

MongoDB conf file

processManagement:
   fork: true
net:
   bindIp: %SERVER%
   port: %PORT%
storage:
   dbPath: %DATA_ROOT%/%PORT%
   engine: wiredTiger
   journal:
      enabled: true
   wiredTiger:
      engineConfig:
         journalCompressor: snappy
         cacheSizeGB: 30
      collectionConfig:
         blockCompressor: snappy

systemLog:
   destination: file
   path: "%DATA_ROOT%/%PORT%/mongod.log"
   logAppend: true
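The configuration above is a template: %SERVER%, %PORT%, and %DATA_ROOT% must be replaced with real values before mongod can read it. One way to do this is with sed, as in the following sketch. The helper name render_mongod_conf and the example address, port, and path are hypothetical, and the values must not contain the "|" character used as the sed delimiter.

```shell
# Hypothetical helper: fill the %SERVER%, %PORT% and %DATA_ROOT%
# placeholders of a config template read from stdin.
render_mongod_conf() {
    local server="$1" port="$2" data_root="$3"
    sed -e "s|%SERVER%|${server}|g" \
        -e "s|%PORT%|${port}|g" \
        -e "s|%DATA_ROOT%|${data_root}|g"
}

# Example on two template lines, with illustrative values.
printf '%s\n' 'bindIp: %SERVER%' 'dbPath: %DATA_ROOT%/%PORT%' \
    | render_mongod_conf 127.0.0.1 27017 /data/mongodb
```

For a whole file, the template can be redirected into the function and the output written to the file passed to mongod with --config.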

Created At : May 23rd 2023, 8:30:19 am

Last Updated At : June 12th 2023, 8:59:19 pm

Ampere Computing

4655 Great America Parkway

Suite 601 Santa Clara, CA 95054
