After much heavy network activity, all networking stops, kernel_task spins at 100%, ifconfig and AirPort menu extra hang
Originator: mark
Number: rdar://32925139 | Date Originated: 2017-06-22
Status: Closed | Resolved: 2017-08-01
Product: macOS + SDK | Product Version: 10.13db2 17A291j
Classification: Serious Bug | Reproducible: Always
Area: Networking

Summary:
The Google Chrome team has a distributed compilation environment. It’s internal-only, but it’s conceptually similar to distcc. On the macOS 10.13 betas (both beta 1 and beta 2 so far), when I attempt to use this service to build Chrome, all networking stops about 2/3 of the way into the build.

The immediately apparent symptom is that the build stops progressing. I can stop the build, but by then the damage is done: networking remains unusable, and I can’t pass any traffic over the network. If I run “ifconfig”, it hangs partway through dumping the “p2p0” interface. If I click on the AirPort menu extra, nothing happens and a beach ball eventually appears over the menu extra. If I try to open the Network preference pane in System Preferences, nothing populates. Running “top -ocpu”, I see kernel_task at 100%, indicating that something’s spinning in a tight loop in the kernel. ^C and SIGKILL do not recover the hung ifconfigs. It is impossible to shut the system down cleanly, either via the Apple menu:Restart or via “sudo shutdown -r now”.

Steps to Reproduce:
I can’t provide solid reproduction steps because I’ve only been able to reproduce the problem using our internal distributed build service. This service functions similarly to distcc and does not run anything at elevated privileges. Essentially, it’s:

% git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
[also get goma, our distributed compilation tool, and start it]
% PATH="${PATH}:$(pwd)/depot_tools:$(pwd)/goma_mac"
% mkdir chrome
% cd chrome
% fetch chrome
[wait]
% cd chrome
% gn gen out/debug --args="use_goma=true goma_dir=\"$(pwd)/goma_mac\""
% ninja -C out/debug chrome -j250

Expected Results:
The build should complete successfully.

Observed Results:
After building 21,000 or so files out of 29,000 or so, the build stops progressing. You’ll notice that networking is not working. Browsers can’t browse, ping shows no connectivity, etc.
The AirPort menu extra and Networking preference pane don’t work. kernel_task is using 100% CPU. Run “ifconfig” and it’ll hang irrecoverably part of the way through dumping the p2p0 interface. I’m including a snippet of that here because the sysdiagnose probably missed it since ifconfig never completed.

litterbox@litterbox zsh% ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
	options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
	inet 127.0.0.1 netmask 0xff000000
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	nd6 options=201<PERFORMNUD,DAD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
XHC20: flags=0<> mtu 0
XHC0: flags=0<> mtu 0
XHC1: flags=0<> mtu 0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	ether uu:vv:ww:xx:yy:zz
	inet6 blahblahblah%en0 prefixlen 64 secured scopeid 0x7
	inet6 blahblahblah prefixlen 64 autoconf secured
	inet6 blahblahblah prefixlen 64 autoconf temporary
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect <full-duplex>
	status: inactive
en3: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect <full-duplex>
	status: inactive
en2: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect <full-duplex>
	status: inactive
en4: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
	options=60<TSO4,TSO6>
	ether uu:vv:ww:xx:yy:zz
	media: autoselect <full-duplex>
	status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
	ether uu:vv:ww:xx:yy:zz

Normally, ifconfig should have continued by printing p2p0’s status line, and several other interfaces.
After a fresh reboot, those look like

	status: inactive
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
	ether uu:vv:ww:xx:yy:zz
	inet6 blahblahblah%awdl0 prefixlen 64 scopeid 0xf
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect
	status: active
ipsec0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 4096
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
ipsec1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 4096
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	inet6 blahblahblah%utun0 prefixlen 64 scopeid 0x12
	nd6 options=201<PERFORMNUD,DAD>

Version:
10.13db2 17A291j with Xcode 9b2 9M137d. I experienced this with 10.13db1 17A264c and Xcode 9b1 9M136h too. I see this problem when I’m building on an APFS or HFS+ filesystem. The system is a MacBook Pro (15-inch, 2016) (MacBookPro13,3).

Configuration:
We never had any trouble up to and including 10.12.5 16F73.
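
The report notes that a hung ifconfig could not be recovered with ^C or SIGKILL, so any script that checks for this condition should avoid blocking on the child process. The following is a minimal sketch of one way to do that in a POSIX shell. The function name probe, its variable names, and the 5-second deadline in the usage comment are illustrative assumptions and are not part of the original report.

```shell
#!/bin/sh
# probe SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and polls for its
# completion instead of waiting on it, so the caller stays responsive even
# if COMMAND wedges in the kernel and ignores signals.
# Prints "finished" (status 0) or "hung" (status 1).
probe() {
    probe_deadline=$1
    shift
    "$@" >/dev/null 2>&1 &
    probe_pid=$!
    probe_elapsed=0
    # kill -0 delivers no signal; it only checks that the process still exists.
    while kill -0 "$probe_pid" 2>/dev/null; do
        if [ "$probe_elapsed" -ge "$probe_deadline" ]; then
            # Best effort only: per the report, SIGKILL may not take effect,
            # so we deliberately do not wait for the child afterwards.
            kill -KILL "$probe_pid" 2>/dev/null
            echo "hung"
            return 1
        fi
        sleep 1
        probe_elapsed=$((probe_elapsed + 1))
    done
    echo "finished"
    return 0
}

# Example (assumed deadline): probe 5 ifconfig
```

On an affected system, "probe 5 ifconfig" would be expected to print "hung" once networking has stopped, and "finished" after a fresh reboot. The command's output is discarded, since only liveness matters here.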
Comments
2017-08-08 14:18 UTC from Apple
Thanks for your feedback, we have reviewed it.
2017-08-07 21:05 UTC to Apple
This bug does indeed still occur. I refiled it as radar 33761055.
2017-08-01 15:50 UTC from Apple
Thanks for your update.
We are closing this report.
If you see this issue again on a current release, please file a new bug report with fresh diagnostics.
If you have further questions, please update your report again at:
http://bugreport.apple.com
2017-08-01 15:03 UTC to Apple
I can no longer reproduce this bug in 10.13db4 17A315i. Thanks.
2017-07-24 20:22 UTC from Apple
We believe this issue has been resolved in the latest macOS 10.13 beta.
Please test with the latest beta. If you still have issues, please update your bug report with any relevant logs or information that could help us investigate.
macOS https://developer.apple.com/download/
2017-07-13 17:37 UTC from Apple
A solution is under investigation. We will follow up with you again when it is available.
2017-06-22 16:12 UTC to Apple
Also, en0 normally has an inet address (IPv4), but I see that this is missing from the dump of “en0” in the ifconfig I show in “Observed Results”.
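
The missing IPv4 address can also be checked mechanically rather than by eye. As a rough sketch, assuming ifconfig-style output in which each interface stanza starts in the first column and its detail lines are indented, a small awk filter can report whether a given interface has an “inet” line. The function name has_ipv4 is hypothetical.

```shell
#!/bin/sh
# has_ipv4 IFACE
# Hypothetical helper: reads ifconfig-style text on stdin and succeeds
# (status 0) only if the stanza for IFACE contains an "inet" (IPv4) line.
has_ipv4() {
    awk -v ifname="$1" '
        # A stanza header starts in column 1, e.g. "en0: flags=...".
        /^[^ \t]/ { current = $1; sub(/:$/, "", current) }
        # Indented detail lines belong to the most recent header.
        # "inet6" does not match because the first field is compared exactly.
        current == ifname && $1 == "inet" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example: ifconfig | has_ipv4 en0 && echo "en0 has IPv4" || echo "en0 has no IPv4"
```

Note that this only helps while ifconfig itself still completes. Once the hang described in this report sets in, the pipeline would block along with ifconfig.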