After much heavy network activity, all networking stops, kernel_task spins at 100%, ifconfig and AirPort menu extra hang

Originator:mark
Number:rdar://32925139 Date Originated:2017-06-22
Status:Closed Resolved:2017-08-01
Product:macOS + SDK Product Version:10.13db2 17A291j
Classification:Serious Bug Reproducible:Always
 
Area:
Networking

Summary:
The Google Chrome team has a distributed compilation environment. It’s internal-only, but it’s conceptually similar to distcc.

On the macOS 10.13 betas (both beta 1 and beta 2 so far), when I attempt to use this service to build Chrome, all networking stops about 2/3 of the way into the build.

The immediately apparent symptom is that the build stops progressing. I can stop the build, but by then the damage is done: networking remains unusable and I can’t pass any traffic over the network. If I run “ifconfig”, it hangs partway through dumping the “p2p0” interface. If I click on the AirPort menu extra, nothing happens and a beach ball eventually appears over the menu extra. If I try to open the Network preference pane in System Preferences, nothing populates.

Running “top -ocpu”, I see kernel_task at 100%, indicating that something’s spinning in a tight loop in the kernel.

^C and SIGKILL do not recover the hung ifconfigs. It is impossible to shut the system down cleanly either via the Apple menu:Restart or via “sudo shutdown -r now”.

Steps to Reproduce:
I can’t provide solid reproduction steps because I’ve only been able to reproduce the problem using our internal distributed build service. This service functions similarly to distcc and does not run anything at elevated privileges.

Essentially, it’s:

% git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
[also get goma, our distributed compilation tool, and start it]
% PATH="${PATH}:$(pwd)/depot_tools:$(pwd)/goma_mac"
% mkdir chrome
% cd chrome
% fetch chrome
[wait]
% cd chrome
% gn gen out/debug --args="use_goma=true goma_dir=\"$(pwd)/goma_mac\""
% ninja -C out/debug chrome -j250

Expected Results:
The build should complete successfully.

Observed Results:
After building 21,000 or so files out of 29,000 or so, the build stops progressing. You’ll notice that networking is not working. Browsers can’t browse, ping shows no connectivity, etc. The AirPort menu extra and Network preference pane don’t work. kernel_task is using 100% CPU. Run “ifconfig” and it’ll hang irrecoverably partway through dumping the p2p0 interface. I’m including a snippet of that here because the sysdiagnose probably missed it, since ifconfig never completed.

litterbox@litterbox zsh% ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
	options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
	inet 127.0.0.1 netmask 0xff000000 
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
	nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
XHC20: flags=0<> mtu 0
XHC0: flags=0<> mtu 0
XHC1: flags=0<> mtu 0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	ether uu:vv:ww:xx:yy:zz 
	inet6 blahblahblah%en0 prefixlen 64 secured scopeid 0x7 
	inet6 blahblahblah prefixlen 64 autoconf secured 
	inet6 blahblahblah prefixlen 64 autoconf temporary 
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz 
	media: autoselect <full-duplex>
	status: inactive
en3: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz 
	media: autoselect <full-duplex>
	status: inactive
en2: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz 
	media: autoselect <full-duplex>
	status: inactive
en4: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz 
	media: autoselect <full-duplex>
	status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
	ether uu:vv:ww:xx:yy:zz 

Normally, ifconfig should have continued by printing p2p0’s status line, and several other interfaces. After a fresh reboot, those look like

	status: inactive
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
	ether uu:vv:ww:xx:yy:zz 
	inet6 blahblahblah%awdl0 prefixlen 64 scopeid 0xf 
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
ipsec0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 4096
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
ipsec1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 4096
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	inet6 blahblahblah%utun0 prefixlen 64 scopeid 0x12 
	nd6 options=201<PERFORMNUD,DAD>
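
Because a hung ifconfig never returns and can't be interrupted, checking for this state from a script needs a deadline. The following is a minimal sketch of one way to do that, not something used in the original report: probe_with_deadline is a hypothetical helper name, and the output path and timeout are arbitrary. It runs a command in the background, polls it once per second, and reports whether it finished in time. Output redirected to a file may be buffered, so the file can contain less than a terminal would have shown.

```shell
#!/bin/sh
# probe_with_deadline OUTFILE SECONDS CMD [ARGS...]
# Hypothetical helper (an assumption for illustration, not from the report).
# Runs CMD in the background with stdout and stderr sent to OUTFILE, then
# polls once per second. Prints "completed" if CMD exits within SECONDS;
# otherwise prints "hung" and leaves CMD running, since in the failure
# described above the process cannot be killed anyway.
probe_with_deadline() {
    out=$1
    deadline=$2
    shift 2
    "$@" >"$out" 2>&1 &
    pid=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            echo "hung"
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "completed"
}
```

For example, “probe_with_deadline /tmp/ifconfig.out 10 ifconfig” would be expected to print “completed” on a healthy system and “hung” once the system is in the state described above.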

Version:
10.13db2 17A291j with Xcode 9b2 9M137d.

I experienced this with 10.13db1 17A264c and Xcode 9b1 9M136h too.

I see this problem whether I’m building on an APFS or an HFS+ filesystem.

The system is a MacBook Pro (15-inch, 2016) (MacBookPro13,3).

Configuration:
We never had any trouble up to and including 10.12.5 16F73.

Comments

2017-08-08 14:18 UTC from Apple

Thanks for your feedback, we have reviewed it.

2017-08-07 21:05 UTC to Apple

This bug does indeed still occur. I refiled it as radar 33761055.

2017-08-01 15:50 UTC from Apple

Thanks for your update.

We are closing this report.

If you see this issue again on a current release, please file a new bug report with fresh diagnostics.

If you have further questions, please update your report again at:

http://bugreport.apple.com

2017-08-01 15:03 UTC to Apple

I can no longer reproduce this bug in 10.13db4 17A315i. Thanks.

2017-07-24 20:22 UTC from Apple

We believe this issue has been resolved in the latest macOS 10.13 beta.

Please test with the latest beta. If you still have issues, please update your bug report with any relevant logs or information that could help us investigate.

macOS https://developer.apple.com/download/

2017-07-13 17:37 UTC from Apple

A solution is under investigation. We will follow up with you again when it is available.

2017-06-22 16:12 UTC to Apple

Also, en0 normally has an inet address (IPv4), but I see that this is missing from the dump of “en0” in the ifconfig I show in “Observed Results”.

