The ARM Linux Project
Linux for all ARM based machines
Latest News
Aarch64 problems
Posted by Russell King on Tuesday 29 December 2020 13:55
Update: Friday 8 January 2021 - Fixed!

Since kernel version 5.4, my Aarch64 systems have become very unreliable, requiring regular reboots to keep them working. Worryingly, the symptoms so far point towards filesystem data corruption, which results in the root filesystem being marked read-only. This normally shows up as one of these messages:

EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #271688: comm mandb: iget: checksum invalid
[7478798.720368] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #157096: comm mandb: iget: checksum invalid
EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #173544: comm mandb: iget: checksum invalid
[365750.234472] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #166384: comm mandb: iget: checksum invalid
[4175456.231948] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:1004: inode #396582: comm find: Directory block failed checksum

The result is that the journal is aborted and the rootfs is remounted read-only.

The known facts so far:

  • it has not been seen on kernel 5.2 on Armada 8040 hardware (with an uptime of 560 days).
  • it has been seen on all mainline kernel versions from 5.4 to 5.9.
  • it occurs on several of my Armada 8040 and NXP LX2160A based systems, both of which use Cortex-A72 cores. I have all the errata workarounds enabled in the kernel.
  • it seems independent of the media; it has been seen on the rootfs of two different NVMes on two different platforms, uSD, and eMMC.
  • it occurs after anywhere between a week and three months of uptime, which makes attempting a bisection of the changes between 5.2 and 5.4 infeasible.
  • I've run xfstests (as suggested by tytso) on the LX2160A and generic/531 triggered the inode checksum error.
Investigation with debugfs sometimes shows that the inode checksum is invalid, but if the block device is flushed (via hdparm) and re-read from the media, the inode checksum is then correct. This implies that the data in memory/CPU caches does not match the data on the media, especially when the inode has not changed for days.

Below is a log of some of the recent instances:

29th February 2020

Error: [73729.556544] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid
Platform: NXP LX2160A
Media: XPG SX8200PNP NVMe
Kernel: 5.5
Uptime: 20 hours

Inode #917524 was /var/backups/dpkg.status.6.gz. Running e2fsck -n /dev/nvme0n1p2 without rebooting showed that the checksum was incorrect, so further investigation with debugfs was warranted:

debugfs:  id <917524>
0000  a481 0000 30ff 0300 3d3d 465e bd77 4f5e  ....0...==F^.wO^
0020  29ca 345e 0000 0000 0000 0100 0002 0000  ).4^............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 4000 0000 c088 3800  ........@.....8.
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 5fc4 cfb4 0000 0000 0000 0000  ...._...........
0160  0000 0000 0000 0000 0000 0000 af23 0000  .............#..
0200  2000 1cc3 ac95 c9c8 a4d2 9883 583e addf   ...........X>..
0220  3de0 485e b04d 7151 0000 0000 0000 0000  =.H^.MqQ........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
debugfs:  stat <917524>
Inode: 917524   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 3033515103    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 261936
File ACL: 0
Links: 1   Blockcount: 512
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5e4f77bd:c8c995ac -- Fri Feb 21 06:25:01 2020
 atime: 0x5e463d3d:dfad3e58 -- Fri Feb 14 06:25:01 2020
 mtime: 0x5e34ca29:8398d2a4 -- Sat Feb  1 00:45:29 2020
crtime: 0x5e48e03d:51714db0 -- Sun Feb 16 06:25:01 2020
Size of extra inode fields: 32
Inode checksum: 0xc31c23af
EXTENTS:
(0-63):3705024-3705087
This is, as I remember, operating on the in-memory data rather than the on-disk data, and the inode checksum of 0xc31c23af was incorrect. I rewrote the checksum using the debugfs "sif" command, which resulted in:
debugfs:  id <917524>
0000  a481 0000 30ff 0300 3d3d 465e bd77 4f5e  ....0...==F^.wO^
0020  29ca 345e 0000 0000 0000 0100 0002 0000  ).4^............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 4000 0000 c088 3800  ........@.....8.
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 5fc4 cfb4 0000 0000 0000 0000  ...._...........
0160  0000 0000 0000 0000 0000 0000 b61f 0000  ................
                                    ^^^^
0200  2000 aa15 ac95 c9c8 a4d2 9883 583e addf   ...........X>..
           ^^^^
0220  3de0 485e b04d 7151 0000 0000 0000 0000  =.H^.MqQ........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
With only that change, e2fsck then passed:
e2fsck -n /dev/nvme0n1p2
e2fsck 1.44.5 (15-Dec-2018)
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/nvme0n1p2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p2: 121163/2097152 files (0.1% non-contiguous), 1349227/8388608 blocks
The file seemed to be intact; being a gzip file, that's easy to verify since gzip files contain their own checksums, and if the data is invalid they won't be readable anyway.

6th June 2020

Error: EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #271688: comm mandb: iget: checksum invalid
Platform: NXP LX2160A
Media: XPG SX8200PNP NVMe

When I originally noticed the problem just after midnight, debugfs said that the inode did indeed have an incorrect checksum. However, by 11am, debugfs said the checksum was correct - the machine had not been rebooted and the rootfs was mounted read-only. This suggests that the on-media copy was in fact correct, but the in-memory copy was incorrect.

This is the dump of the inode after it had "self-healed":

debugfs:  id <271688>
0000  a481 0000 f108 0000 2518 fd5d 2518 fd5d  ........%..]%..]
0020  9f49 715c 0000 0000 0000 0100 0800 0000  .Iq\............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 0100 0000 ed19 1100  ................
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 b42f 4f06 0000 0000 0000 0000  ...../O.........
0160  0000 0000 0000 0000 0000 0000 c9cf 0000  ................
0200  2000 8d83 086d bebf 0000 0000 086d bebf   ....m.......m..
0220  2518 fd5d 086d bebf 0000 0000 0000 0000  %..].m..........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
debugfs:  stat <271688>
Inode: 271688   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 105852852    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 2289
File ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5dfd1825:bfbe6d08 -- Fri Dec 20 18:51:17 2019
 atime: 0x5dfd1825:bfbe6d08 -- Fri Dec 20 18:51:17 2019
 mtime: 0x5c71499f:00000000 -- Sat Feb 23 13:24:47 2019
 crtime: 0x5dfd1825:bfbe6d08 -- Fri Dec 20 18:51:17 2019
Size of extra inode fields: 32
Inode checksum: 0x838dcfc9
EXTENTS:
(0):1120749
# e2fsck -n /dev/nvme0n1p2
e2fsck 1.44.5 (15-Dec-2018)
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/nvme0n1p2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p2: 147476/2097152 files (0.1% non-contiguous), 1542719/8388608 blocks

12th July 2020

Error: [7478798.720368] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #157096: comm mandb: iget: checksum invalid
Platform: SolidRun Clearfog GT-8k (Armada 8040)
Media: eMMC
Uptime: 89 days

Inode #157096 is /usr/share/man/nl/man1/apt-transport-mirror.1.gz, for which debugfs gives:

  ctime: 0x5ebcd62f:ba34bf1c -- Thu May 14 06:25:03 2020
  atime: 0x5ebcd63b:a2906fa0 -- Thu May 14 06:25:15 2020
  mtime: 0x5eba730a:00000000 -- Tue May 12 10:57:30 2020
 crtime: 0x5ebcd62f:a25cccf4 -- Thu May 14 06:25:03 2020
 Inode checksum: 0x13fd5c3c (bad)
 Inode checksum: 0x600eba80 (good)
The checksum is the only difference that debugfs reports between the failing and corrected copies of the inode. So we seem to have an inode that has demonstrably not changed for over a month, with an incorrect checksum in memory but a good checksum on the media. This seems to mean that either the checksum in memory is wrong or the data in memory is wrong. The following rather confirms this.

Running e2fsck -n /dev/mmcblk0p1 without a reboot gave:

Inode 13755 passes checks, but checksum does not match inode.  Fix? no
Inode 157096 passes checks, but checksum does not match inode.  Fix? no
Simply flushing the buffer cache with "hdparm -f" made these errors go away; e2fsck then did not complain about the checksum failures.

The contents of the file are valid gzip, and the only thing that is wrong is the inode checksum.

16th August 2020

Error: EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #173544: comm mandb: iget: checksum invalid
Platform: SolidRun Macchiatobin Single-shot (Armada 8040)
Media: eMMC

This is another instance where the problem with inode #173544 has corrected itself.

30th August 2020

Error: [365750.234472] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #166384: comm mandb: iget: checksum invalid
Platform: SolidRun Clearfog GT-8K (Armada 8040)
Media: eMMC
Uptime: 4 days
Kernel: 5.8

I've added some debug code to ext4_inode_csum_verify() to dump out the inode contents and checksums when there is a checksum failure.

9th November 2020 failure

Error: (not recorded) iget: checksum invalid
Platform: SolidRun Clearfog GT-8K (Armada 8040)
Media: eMMC
Uptime: 70 days
Kernel: 5.8

After the previous instance, I added some debug code; after running for 70 days on kernel 5.8, the kernel spat out another inode checksum failure along with my debug output:

[6131696.234604] provided = ea2b60d5 calculated = 7929a3c0
[6131696.238402] inode(ffffff839e059500) = a4 81 00 00 46 0d 00 00 5c 92 88 5e 17 92 88 5e c6 56 f0 5b 00 00 00 00 00 00 01 00 08 00 00 00 00 00 08 00 01 00 00 00 0a f3 01 00 04 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 e4 14 0a 00

This translates to (ext4 data is little endian):

i_mode = 0x81a4
i_uid = 0x0000
i_size_lo = 0x00000d46
i_atime = 0x5e88925c (Sat Apr  4 14:57:48 2020 +0100)
i_ctime = 0x5e889217 (Sat Apr  4 14:56:39 2020 +0100)
i_mtime = 0x5bf056c6 (Sat Nov 17 17:58:30 2018 +0100)
i_dtime = 0x00000000
i_gid = 0x0000
i_links_count = 0x0001
i_blocks_lo = 0x00000008
i_flags = 0x00080000
l_i_version = 0x00000001
i_block = {
	0x0001f30a
	0x00000004
	0x00000000
	0x00000000
	0x00000001
	0x000a14e4
	...
}
Note that i_block serves many different purposes, and is 15 32-bit words long.

Further investigation reveals:

  • The data dump above (first 64 bytes) matches the on-media copy.
  • The access/modification times are months before the time when the checksum error has happened, suggesting that the inode has not been modified recently.
  • The "provided" checksum is correct for the data on the media, as confirmed with debugfs.
Unfortunately, this is not the complete 256 bytes of the inode, so there is no way to know why the checksum has failed - the dump doesn't even contain the stored checksums (which are stored as two separate 16-bit integers). I updated the debug code to print the full 256 bytes of the inode as per the patch below, rebooted the system into a 5.9 kernel, and waited for the problem to recur.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bf596467c234..f5d335452f1d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -98,6 +98,13 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw,
 	else
 		calculated &= 0xFFFF;
 
+if (provided != calculated) {
+  pr_err("provided = %08x calculated = %08x\n", provided, calculated);
+  pr_err("inode(%p)\n", raw);
+  print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1, raw, EXT4_INODE_SIZE(inode->i_sb), false);
+  pr_err("recalculated = %08x\n", ext4_inode_csum(inode, raw, ei));
+}
+
 	return provided == calculated;
 }
 
In this patch, I print the stored (provided) checksum and the calculated checksum, print the address of the inode, dump all 256 bytes of the inode, and then print a recalculated checksum (which should match the initial calculation). If it doesn't match the first calculation, it means that there's some bug in the CRC32c crypto code, or a problem with memory ordering/coherency.

28th December 2020 failure

Error: [4175456.231948] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:1004: inode #396582: comm find: Directory block failed checksum
Platform: SolidRun Clearfog GT-8K (Armada 8040)
Media: eMMC
Uptime: 48 days

This failure is different from the previous ones, as it did not produce the usual "iget: checksum invalid" but "Directory block failed checksum" instead, which had never been seen before. The directory concerned is "/var/cache/man/pt/cat8". However, as with many of the previous instances, the problem seems to have "self-healed" by the time I noticed it. The directory which failed its checksum is perfectly readable by the system (without rebooting it) - again suggesting that the in-memory copy was faulty but the on-media copy was fine.

Consequently, it means a reboot and restarting the three month wait for the next failure.

Some questions and answers:

  1. how does ext4 calculate the inode checksums?
    The checksums are calculated by calling out to the kernel's crypto shash crc32c code.
  2. how are ext4 inodes aligned on disk and memory?
    ext4 inodes on this media are 256 bytes in size, and are naturally aligned.
The corruption feels very much like a memory ordering bug or a cache coherence bug, but these systems are supposed to be cache coherent.

As this has been going on for so long, and there isn't a clear cause, it has completely eroded my confidence in Aarch64 as a viable architecture for running anything useful. This presents quite a problem: if the problem "vanishes" without an adequate explanation (e.g. after changing the compiler or filesystem type), how would I know that the systems are then stable? Would they be stable if they ran without problem for three months, six months, a year, a decade?

I have reported this problem a number of times on mailing lists but it has attracted very little interest - somewhat understandably so, given that it takes up to three months to appear.

4th January 2021

Today, I've been able to trigger an inode checksum failure a few times on the LX2160A:

provided = d06328dd calculated = 3ba43925
inode(ffffffa6d6782000)
00000000: a4 81 00 00 de 08 00 00 79 2b 05 5e 7e 2b 05 5e
00000010: 6a cb 45 5c 00 00 00 00 00 00 01 00 08 00 00 00
00000020: 00 00 08 00 01 00 00 00 0a f3 01 00 04 00 00 00
00000030: 00 00 00 00 00 00 00 00 01 00 00 00 30 a2 0a 00
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060: 00 00 00 00 24 42 a8 74 00 00 00 00 00 00 00 00
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 dd 28 00 00
00000080: 20 00 63 d0 5c 48 ce e5 00 00 00 00 00 00 00 00
00000090: 7e 2b 05 5e e0 1a a2 ab 00 00 00 00 00 00 00 00
000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
recalculated = d06328dd
EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #144001: comm md5sum: iget: checksum invalid

The dump of the inode data appears to be correct, and the recalculated checksum after the dump agrees with the checksum in the inode itself. This is, of course, the worst news, because it doesn't really narrow down what is going on. The same questions remain - was the data fed into the first checksum incorrect in some way (due to cache coherence or memory ordering), or was the checksum calculation faulty? There's no way to know from the above. What we can say is that dumping the data and recalculating the checksum gives the correct answer, and the first checksum is wrong for some reason.

I have disabled the ARM64 optimised CRC32 support (arch/arm64/lib/crc32.S) introduced in 7481cddf29ed ("arm64/lib: add accelerated crc32 routines"), which was part of v4.20, and as expected, the problem still exists.

5th January 2021

Having ruled out the CRC32 code yesterday, that left two possibilities - cache coherence or memory ordering. To work out which, I decided to add a mb() into ext4_inode_csum_verify() right before the checksum is initially calculated. Initial testing on the LX2160A platform seems to suggest that makes the inode checksum failure much less likely to happen. (I am not going to say "doesn't" since it's going to take at least three if not more months to even give a hint.)

Digging further, there was a change during the 5.4 merge window which changed the barriers - 22ec71615d82 ("arm64: io: Relax implicit barriers in default I/O accessors"). Will Deacon assures me that this is correct, and he spent a long time validating it. I have now reverted this commit, rebuilt the kernel and put it on the ARM64 platforms that I have been running 5.4+ kernels. Time (three to six months) will tell whether this has fixed the problem.

In the meantime, some further debugging - I've tried changing __iormb() and __iowmb() to use "dmb osh" rather than the load/store variants. I've now ended up with this from the LX2160A platform:

[   23.252955] provided = d22f8aab calculated = cac5d3d7
[   23.256697] inode(ffffffa6d9006f00)
[   23.258963] 00000000: a4 81 00 00 43 02 00 00 ec 56 f3 5f 2c 18 fd 5d
[   23.264104] 00000010: 7d c9 ff 5b 00 00 00 00 00 00 01 00 08 00 00 00
[   23.269246] 00000020: 00 00 08 00 01 00 00 00 0a f3 01 00 04 00 00 00
[   23.274389] 00000030: 00 00 00 00 00 00 00 00 01 00 00 00 76 91 20 00
[   23.279529] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.284671] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.289813] 00000060: 00 00 00 00 fd aa b9 50 00 00 00 00 00 00 00 00
[   23.294953] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 ab 8a 00 00
[   23.300097] 00000080: 20 00 2f d2 8c 79 5a c7 00 00 00 00 48 4f 1c 90
[   23.305239] 00000090: 2c 18 fd 5d 8c 79 5a c7 00 00 00 00 00 00 00 00
[   23.310378] 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.315520] 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.320659] 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.325798] 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.330940] 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.336079] 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.341218] recalculated = cac5d3d7
[   23.343397] EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #525808: comm md5sum: iget: checksum invalid

This is unexpected - note that the recalculated value is the same as the initial calculation. However, after extensive checking, the hexdump matches what is on disk, and debugfs and e2fsck are happy with that - as now is the kernel. So, d22f8aab is in fact the correct checksum, and the hexdump is correct. But calling ext4_inode_csum() after dumping the entire inode still produced the wrong result. This makes little sense.

Further testing - with __iormb() using dma_rmb() (dmb oshld) and __iowmb() using wmb() (dsb st), the problem remains:

provided = 2e5f9e28 calculated = 12ec3a0e
...
recalculated = 2e5f9e28
EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #272094: comm md5sum: iget: checksum invalid

However, testing with __iormb() as rmb() (dsb ld) and __iowmb() as dma_wmb() (dmb oshst) appears to pass tests.

7th January 2021

Finally, we have got to the bottom of this problem, with the help of Will Deacon and Arnd Bergmann. It appears to be a bug in mainline gcc-4.9 - Android and Linaro gcc-4.9 have the fix. The kernel tickles this bug in the ext4 checksum code when the stack protector is disabled. It exhibits itself with this code in ext4:

static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
                              const void *address, unsigned int length)
{
        struct {
                struct shash_desc shash;
                char ctx[4];
        } desc;

        BUG_ON(crypto_shash_descsize(sbi->s_chksum_driver)!=sizeof(desc.ctx));

        desc.shash.tfm = sbi->s_chksum_driver;
        *(u32 *)desc.ctx = crc;

        BUG_ON(crypto_shash_update(&desc.shash, address, length));

        return *(u32 *)desc.ctx;
}
generating:
0000000000000004 <ext4_chksum.isra.14.constprop.19>:
   4:   a9be7bfd        stp     x29, x30, [sp, #-32]!           <------
   8:   2a0103e3        mov     w3, w1
   c:   aa0203e1        mov     x1, x2
  10:   910003fd        mov     x29, sp                         <------
  14:   f9000bf3        str     x19, [sp, #16]
  18:   d10603ff        sub     sp, sp, #0x180                  <------
  1c:   9101fff3        add     x19, sp, #0x7f			<------
  20:   b9400002        ldr     w2, [x0]
  24:   9279e273        and     x19, x19, #0xffffffffffffff80   <------
  28:   7100105f        cmp     w2, #0x4
  2c:   540001a1        b.ne    60 <ext4_chksum.isra.14.constprop.19+0x5c>  // b.any
  30:   2a0303e4        mov     w4, w3
  34:   aa0003e3        mov     x3, x0
  38:   b9008264        str     w4, [x19, #128]			<------
  3c:   aa1303e0        mov     x0, x19
  40:   f9000263        str     x3, [x19]                       <------
  44:   94000000        bl      0 <crypto_shash_update>
                        44: R_AARCH64_CALL26    crypto_shash_update
  48:   350000e0        cbnz    w0, 64 <ext4_chksum.isra.14.constprop.19+0x60>
  4c:   910003bf        mov     sp, x29                         <======
  50:   b9408260        ldr     w0, [x19, #128]                 <======
  54:   f9400bf3        ldr     x19, [sp, #16]
  58:   a8c27bfd        ldp     x29, x30, [sp], #32
  5c:   d65f03c0        ret
  60:   d4210000        brk     #0x800
  64:   97ffffe7        bl      0 <ext4_chksum.isra.14.part.15>
The bug is the order of the instructions marked with "<======": "mov sp, x29" deallocates the stack containing the local variable "desc", and the following "ldr w0, [x19, #128]" then reads desc.ctx from the deallocated stack. If we receive an interrupt and context switch at that point, "desc" can be overwritten, and hence the checksum will be corrupted.

8 January 2021

It is a big relief that a definitive reason for the problem has finally been found. When you consider that merely upgrading the compiler would have made the bug vanish without explanation, you would be left not knowing whether the bug had been solved or merely masked by different instruction timings. This in turn means that you'd forever be wondering whether your filesystems would be corrupted, or your system would fail at some random point in the future - would you trust your data on such a system? Many would likely not.

Hence, it became very important to find the cause of this problem. As I have said, I got to the point of considering taking all my Aarch64 hardware down to the local recycling centre precisely because this bug had completely eroded my ability to trust Aarch64 as an architecture, and it was taking so long to track down the bug.

I am very grateful to Will Deacon and Arnd Bergmann for their time helping to track this down - which was really key. Will Deacon found a recipe that reproduced it more reliably than I had managed. Will also identified that 5.10 built with his kernel configuration did not exhibit it, but 5.9 built with my configuration did - that then gave me something to work with, to identify what change in the kernel configuration seemed to mask the bug. Thanks!

SSL Enabled
Posted by Russell King on Tuesday 15 January 2019 14:37
As part of the ongoing upgrades, we now have SSL enabled on this website! Links emailed out now use the SSL version, although the non-SSL version is still accessible (and won't, at the moment, bounce you to the SSL site). This will change in future.

At the moment, the certificate does not include the old "www.arm.linux.org.uk" address as an alternative name in the certificate, but this will be included when it comes up for renewal.

We're using the Linux Foundation's LetsEncrypt.org as the issuing CA, with Dehydrated to manage the regular certificate renewals as necessary.

As mentioned in the previous article, please report any breakages you may find.

Thanks.

News Archive...
Note: please do not use "off-line" site downloaders (eg, HTTrack) against this site.