Today we ran into an SSD bit flip, which showed up as a checksum failure when ClickHouse read the data:
ServerException: Code: 40. DB::Exception: Checksum doesn't match: corrupted data. Reference: 57c8cfd63fcca636b19fc5a16ab22cf8. Actual: 3cf7a87a4f6f7473958ce53e35a46203. Size of compressed block: 18726. The mismatch is caused by single bit flip in data block at byte 814, bit 7. This is most likely due to hardware failure. If you receive broken data over network and the error does not repeat every time, this can be caused by bad RAM on network interface controller or bad controller itself or bad RAM on network switches or bad CPU on network switches (look at the logs on related network switches; note that TCP checksums don't help) or bad RAM on host (look at dmesg or kern.log for enormous amount of EDAC errors, ECC-related reports, Machine Check Exceptions, mcelog; note that ECC memory can fail if the number of errors is huge) or bad CPU on host. If you read data from disk, this can be caused by disk bit rot. This exception protects ClickHouse from data corruption due to hardware failures.: (while reading column b1v): (while reading from part /home/clickhouse/store/68a/68a9d0fd-0900-4f68-a8a9-d0fd09009f68/20201015_285559_285788_4/ from mark 284 with max_rows_to_read = 509): While executing MergeTreeThread.
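This shows that at the scale ClickHouse faces in real deployments, bit flips do need to be taken into account. Still, a search turns up little material on the problem (https://github.com/marliotto/clickhouse-bitflip has a repair implementation in Go that is useful for learning the principle), which reflects how rarely bit flips occur.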
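The principle behind such a repair can be sketched as a brute-force search: flip each bit of the damaged block in turn and stop when the checksum matches the stored reference. This is only a minimal sketch and not the code of the Go tool linked above. ClickHouse's own block checksum (CityHash128, to my knowledge) is not in the Python standard library, so `stand_in_checksum` below is a placeholder; the function names and the bit numbering (bit 0 = least significant) are assumptions made for illustration.

```python
import hashlib
from typing import Callable, Optional, Tuple


def stand_in_checksum(block: bytes) -> bytes:
    # Placeholder 128-bit checksum. This is NOT ClickHouse's real checksum;
    # swap in the real function to work on actual parts.
    return hashlib.blake2b(block, digest_size=16).digest()


def find_single_bit_flip(
    block: bytes,
    reference: bytes,
    checksum: Callable[[bytes], bytes] = stand_in_checksum,
) -> Optional[Tuple[int, int]]:
    """Return (byte_index, bit) of the single flipped bit, or None if not found."""
    buf = bytearray(block)
    for byte_index in range(len(buf)):
        for bit in range(8):  # assumption: bit 0 is the least significant bit
            buf[byte_index] ^= 1 << bit      # try flipping this bit
            if checksum(bytes(buf)) == reference:
                return byte_index, bit
            buf[byte_index] ^= 1 << bit      # undo and keep searching
    return None


def repair(block: bytes, position: Tuple[int, int]) -> bytes:
    """Return a copy of block with the bit at position flipped back."""
    byte_index, bit = position
    fixed = bytearray(block)
    fixed[byte_index] ^= 1 << bit
    return bytes(fixed)
```

The search costs one checksum per bit, so a block of 18726 bytes like the one in the error needs about 150 thousand checksum evaluations, which is cheap. It only helps when exactly one bit is wrong.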
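After repeated attempts, we ruled out the following possibilities listed in the error message:
1. Bad RAM on the network interface controller
2. Bad RAM on the network switches
3. Bad CPU on the network switches
4. Bad RAM on the host (this server uses non-ECC memory)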
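So we are fairly confident that this is a storage problem.
Reference 1 describes the common SSD concepts in some detail, including BER.
The SSD holding this data is a consumer-grade aigo 2TB P7000 with 16TB written.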
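For comparison, a heavily loaded 500GB 970 EVO that was replaced recently has 29TB written and 97% of its life remaining, yet it has already used 4% of its Spare Space. Evidently the Samsung controller does not factor Spare Space into the life percentage. Replacing it decisively on instinct was, to some extent, the right call.
Compare also a 256GB TOSHIBA RC100 in the 2242 form factor with only 4TB written: it has just 76% of its life remaining (of course, this is related to Toshiba's warranty policy).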
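Figures like these can be collected in a script instead of being read off a screenshot. The sketch below assumes the JSON report produced by smartmontools (`smartctl -a -j <device>`) for an NVMe drive and assumes its `nvme_smart_health_information_log` section carries `data_units_written`, `percentage_used` and `available_spare`. The function name and the sample values are made up for illustration; the sample merely mirrors the 970 EVO numbers quoted in the text.

```python
# One NVMe "data unit" is 1000 units of 512 bytes.
NVME_DATA_UNIT_BYTES = 512 * 1000


def summarize_nvme_health(report: dict) -> dict:
    """Reduce a smartctl JSON report to the three numbers compared in the text."""
    log = report["nvme_smart_health_information_log"]
    return {
        # Host writes in decimal terabytes.
        "written_tb": log["data_units_written"] * NVME_DATA_UNIT_BYTES / 1e12,
        # percentage_used may exceed 100 on a worn drive, so clamp at 0.
        "life_remaining_pct": max(0, 100 - log["percentage_used"]),
        # available_spare is reported as a percentage of the original spare area.
        "spare_used_pct": 100 - log["available_spare"],
    }


# Hypothetical sample shaped like the assumed smartctl output.
sample_report = {
    "nvme_smart_health_information_log": {
        "data_units_written": 56_640_625,
        "percentage_used": 3,
        "available_spare": 96,
    }
}
summary = summarize_nvme_health(sample_report)
```

Reporting the spare usage next to the life percentage makes the mismatch described above visible at a glance: the two counters are tracked independently.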
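Back in 2017, this topic was already discussed in reference 3.
That discussion points to a Reddit post from 7 years ago.
The original Reddit post is reference 4.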
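I recall an article from long ago saying that during a RAID5 rebuild after a single-disk failure, the probability of a read error taking out another disk is very high. That was not alarmism.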
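The RAID5 claim can be checked with a back-of-the-envelope calculation. The sketch below treats every bit read as an independent trial with failure probability equal to the UBER, which is a simplification (real errors are per sector and correlated). The array layout and the 1/10^14 rate are illustrative assumptions, not measurements from this incident.

```python
import math


def p_unrecoverable_read(bytes_read: float, uber: float) -> float:
    """Probability of at least one unrecoverable error while reading bytes_read.

    Computes 1 - (1 - uber) ** bits using log1p/expm1 so that very small
    rates do not vanish in floating-point rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-uber))


# Hypothetical array: four 4 TB disks in RAID5, one failed.
# The rebuild has to read the three surviving disks in full.
rebuild_bytes = 3 * 4e12

p_consumer = p_unrecoverable_read(rebuild_bytes, 1e-14)    # assumed consumer-class rate
p_enterprise = p_unrecoverable_read(rebuild_bytes, 1e-17)  # rate quoted for the S4510

print(f"UBER 1e-14: {p_consumer:.1%}")
print(f"UBER 1e-17: {p_enterprise:.3%}")
```

Under these assumptions the rebuild is more likely than not to hit an unrecoverable read at 1/10^14, and has less than a 0.1% chance of doing so at 1/10^17.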
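I went through the datasheets of several common consumer products. They basically mention only TBW and say nothing about UBER.
Compare this with a common enterprise-grade drive, the Intel S4510 (reference 2).
So although both are TLC, the enterprise SSD is rated for 6.5 PBW of writes, 5 times that of the desktop products. It also states the UBER explicitly: 1/10^17.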
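To put a UBER figure in perspective, it can be converted into the average amount of data read per unrecoverable error. This is a minimal sketch that reads the UBER as errors per bit read; the helper name is made up for illustration.

```python
def bytes_read_per_error(uber: float) -> float:
    """Average number of bytes read per unrecoverable error at a given UBER."""
    return 1.0 / (uber * 8)


PB = 1e15  # decimal petabyte

per_error_s4510 = bytes_read_per_error(1e-17) / PB  # rate quoted for the S4510
per_error_title = bytes_read_per_error(1e-16) / PB  # rate used in this post's title

print(f"1/10^17: about {per_error_s4510:.2f} PB read per error")
print(f"1/10^16: about {per_error_title:.2f} PB read per error")
```

At 1/10^17 that works out to roughly 12.5 PB read per error, and at 1/10^16 to roughly 1.25 PB.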
References:
1. https://sspai.com/post/69074
2. https://ark.intel.com/content/www/cn/zh/ark/products/134924/intel-ssd-d3s4510-series-1-92tb-2-5in-sata-6gbs-3d2-tlc.html
3. https://exp.newsmth.net/topic/88886a8a4c45d9da9479aed96ec2048b/2
4. https://www.reddit.com/r/zfs/comments/3gpkm9/statistics_on_realworld_unrecoverable_read_error/
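When reposting, please keep the attribution; unlawful reposting will be pursued to the end: 进城务工人员小梅 » [Shocking] We ran into a 1/10^16 probability: a case of SSD bit flip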