The production ClickHouse cluster runs in a 4*2 layout: 4 shards, each with 2 replicas, using Replicated table engines.

One ClickHouse node in the cluster restarted unexpectedly after a server power fluctuation; afterwards ClickHouse failed to start, repeatedly reporting:

2021.12.23 15:14:17.461463 [ 2033 ] {} <Error> g*c.j*_local (8e95bb13-ec71-49cd-8e95-bb13ec71c9cd): Detaching broken part /data/clickhouse/store/8e9/8e95bb13-ec71-49cd-8e95-bb13ec71c9cd/20211223_5057_5057_0. If it happened after update, it is likely because of backward incompability. You need to resolve this manually
2021.12.23 15:14:17.623012 [ 2046 ] {} <Error> auto DB::MergeTreeData::loadDataParts(bool)::(anonymous class)::operator()() const: Code: 27. DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream. (CANNOT_PARSE_INPUT_ASSERTION_FAILED), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x9366e7a in /usr/bin/clickhouse
1. DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x93c1717 in /usr/bin/clickhouse
2. DB::NamesAndTypesList::readText(DB::ReadBuffer&) @ 0x1031c0f8 in /usr/bin/clickhouse
3. DB::IMergeTreeDataPart::loadColumns(bool) @ 0x1141a7bb in /usr/bin/clickhouse
4. DB::IMergeTreeDataPart::loadColumnsChecksumsIndexes(bool, bool) @ 0x11419c69 in /usr/bin/clickhouse
5. ? @ 0x114b827a in /usr/bin/clickhouse
6. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x93aabb8 in /usr/bin/clickhouse
7. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x93ac75f in /usr/bin/clickhouse
8. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x93a7e9f in /usr/bin/clickhouse
9. ? @ 0x93ab783 in /usr/bin/clickhouse
10. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
11. __clone @ 0xfe96d in /usr/lib64/libc-2.17.so
 (version 21.9.2.17 (official build))
2021.12.23 15:14:20.060892 [ 1869 ] {} <Error> Application: Caught exception while loading metadata: Code: 231. DB::Exception: Suspiciously many (15) broken parts to remove.: Cannot attach table `g*c`.`j*_local` from metadata file /data/clickhouse/store/d39/d39c2612-17b0-43e1-939c-261217b083e1/j*_local.sql from query ATTACH TABLE g*c.j*_local UUID 'f950b4dc-b0f6-4de8-b950-b4dcb0f6fde8' (...) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/g*c/j*_local', '{replica}') PARTITION BY toYYYYMMDD(data_time) ORDER BY (...) SETTINGS index_granularity = 8192: while loading database `g*c` from path /data/clickhouse/metadata/g*c. (TOO_MANY_UNEXPECTED_DATA_PARTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x9366e7a in /usr/bin/clickhouse
1. DB::MergeTreeData::loadDataParts(bool) @ 0x11465d60 in /usr/bin/clickhouse
2. DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, DB::StorageID const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::StorageInMemoryMetadata const&, std::__1::shared_ptr<DB::Context>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::MergeTreeData::MergingParams const&, std::__1::unique_ptr<DB::MergeTreeSettings, std::__1::default_delete<DB::MergeTreeSettings> >, bool, bool) @ 0x1119a498 in /usr/bin/clickhouse
3. ? @ 0x11687ba7 in /usr/bin/clickhouse
4. DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, std::__1::shared_ptr<DB::Context>, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 0x11100f61 in /usr/bin/clickhouse
5. DB::createTableFromAST(DB::ASTCreateQuery, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool) @ 0x105ed4c5 in /usr/bin/clickhouse
6. ? @ 0x105eb573 in /usr/bin/clickhouse
7. ? @ 0x105ec5ff in /usr/bin/clickhouse
8. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x93aabb8 in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x93ac75f in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x93a7e9f in /usr/bin/clickhouse
11. ? @ 0x93ab783 in /usr/bin/clickhouse
12. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
13. __clone @ 0xfe96d in /usr/lib64/libc-2.17.so
 (version 21.9.2.17 (official build))
To summarize, the key error messages are:
  • Detaching broken part
  • DB::Exception: Suspiciously many (15) broken parts to remove
  • DB::ParsingException: Cannot parse input: expected 'columns format version: 1\n' at end of stream
The solution, up front:

The official documentation, in the Data Replication section, describes the recovery procedure:

  • Method 1: create the node /path_to_table/replica_name/flags/force_restore_data in ZooKeeper; its value can be anything. Modifying ZooKeeper directly is a risky operation, so try Method 2 first.

  • Method 2: run the following command:

sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data

Note that the flags directory lives under whatever ClickHouse data root you chose at install time. Then restart the ClickHouse service, and it will restore the data from the other replica. This is ClickHouse's built-in failure recovery mechanism; it requires Replicated* table engines, and it essentially tells ClickHouse to force a data rebuild. This is the recommended method.
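For completeness, Method 1 can be sketched with ZooKeeper's CLI. This is a hypothetical fragment, not executed here: the host, shard number, database/table, and replica name are all placeholders; use the ZooKeeper path from your table's ReplicatedMergeTree definition.

```shell
# Hypothetical sketch of Method 1 (all names are placeholders).
# The node's value can be anything; the parent znodes must already exist.
zkCli.sh -server zk-host:2181 \
  create /clickhouse/tables/01/<database>/<table>/replicas/<replica>/flags/force_restore_data ""
```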

If the data is completely lost, note that ClickHouse applies no bandwidth limit during the restore; with many tables or a large data volume, plan ahead for the network pressure and the time the restore will take.
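The time estimate mentioned above can be roughed out with simple arithmetic. The data size and bandwidth below are made-up numbers for illustration; substitute your own:

```shell
# Back-of-the-envelope restore time estimate (assumed numbers):
data_bytes=$((2 * 1024 * 1024 * 1024 * 1024))   # 2 TiB to re-fetch
bandwidth_bytes_per_s=$((100 * 1024 * 1024))    # ~100 MiB/s effective network throughput
seconds=$(( data_bytes / bandwidth_bytes_per_s ))
echo "Estimated restore time: $(( seconds / 3600 ))h $(( seconds % 3600 / 60 ))m"
```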

Problem analysis:
  1. After hitting the error, I inspected ClickHouse's data directory and found the following:
ll -h /data/clickhouse/server/store/8e9/8e95bb13-ec71-49cd-8e95-bb13ec71c9cd/20211223_5057_5057_0
total 0
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 checksums.txt
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 columns.txt
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 count.txt
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 data.bin
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 data.mrk1
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 partition.dat
-rw-r----- 1 clickhouse clickhouse 0 Dec 23 16:32 primary.idx

Every file in the directory was empty (0 bytes). The root cause is hard to pin down; my assumption is that, because of the abnormal server-level restart, the data was still sitting in a buffer and never reached disk. Hence the "ParsingException" above: ClickHouse did not read the value it expected.
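One quick way to spot such truncated parts is to search the store directory for zero-byte files. The sketch below runs against a mock directory; in production, point STORE at /data/clickhouse/store instead:

```shell
# Mock a part directory to demonstrate the check; in production use
# STORE=/data/clickhouse/store rather than a temp directory.
STORE=$(mktemp -d)
mkdir -p "$STORE/8e9/20211223_5057_5057_0"
touch "$STORE/8e9/20211223_5057_5057_0/columns.txt"      # 0 bytes: broken
printf '42\n' > "$STORE/8e9/20211223_5057_5057_0/count.txt"  # non-empty: healthy
# Zero-byte files are the suspicious ones:
find "$STORE" -type f -size 0
```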

  2. Continuing through the error log, I found that the failures start in the DB::MergeTreeData::loadDataParts(bool) method. Searching the source for the error strings:

Detaching broken part

Suspiciously many broken parts to remove

This reveals ClickHouse's logic: at startup, it checks whether the local data set is consistent with the expected data set (the information in ZooKeeper). Minor inconsistencies are resolved by syncing data with a replica. If it detects broken parts (for example, files of the wrong size) or unrecognized parts (written to the filesystem but not recorded in ZooKeeper), it moves them to the 'detached' subdirectory (effectively a logical delete) and then restores those parts from another replica. Note, however, that there is a safety mechanism: if ClickHouse decides the number of broken parts exceeds a threshold (max_suspicious_broken_parts, the logic in the second source screenshot), i.e. "the local data set differs too much from the expected one", it refuses to repair automatically, throws an exception, and blocks startup; at that point you must perform the recovery manually.
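The threshold logic can be sketched as a small function. This is a simplification in shell, not the actual C++ code, using the numbers from the log above:

```shell
# Minimal sketch of the safety check in MergeTreeData::loadDataParts:
# broken parts are detached, but once their count exceeds
# max_suspicious_broken_parts the server refuses to start.
check_broken_parts() {
  broken=$1
  threshold=$2
  if [ "$broken" -gt "$threshold" ]; then
    echo "TOO_MANY_UNEXPECTED_DATA_PARTS: Suspiciously many ($broken) broken parts"
  else
    echo "OK: detaching $broken broken parts"
  fi
}
check_broken_parts 15 10   # the situation from the log: 15 > 10, startup blocked
check_broken_parts 3 10    # minor damage: parts are detached and re-fetched
```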

Querying the settings confirms that max_suspicious_broken_parts defaults to 10:

SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
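As a last-resort alternative to the flag-based restore, the server-wide default can be raised so startup proceeds despite the broken parts. This is a hypothetical configuration fragment, not executed here (the file name is a placeholder, and the `<yandex>` root tag matches configs of this era); it trades away the safety check, so the replica-based restore above remains the preferred path.

```shell
# Hypothetical config override (placeholder file name); restart the
# service afterwards. Raising the threshold weakens the safety check.
cat > /etc/clickhouse-server/config.d/broken_parts.xml <<'EOF'
<yandex>
  <merge_tree>
    <max_suspicious_broken_parts>20</max_suspicious_broken_parts>
  </merge_tree>
</yandex>
EOF
```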

(Query result: max_suspicious_broken_parts = 10)

Handling this incident reinforced my impression of ClickHouse as "a high-performance sports car with a manual transmission": like the legendary Ferrari in ESC-off "death mode", your fate rests entirely in your own hands. No surprise for a system open-sourced by the "fighting nation". When it comes to the surrounding tooling and safeguards, ClickHouse still has a long way to go.