Bug 3584 - Segfault when built with optimisations on macOS 13 (x86_64) with Xcode 14.3
Status: RESOLVED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh-keygen
Version: 9.3p1
Hardware: amd64 Mac OS X
Importance: P5 major
Assignee: Assigned to nobody
Reported: 2023-06-24 02:03 AEST by Carlo Cabrera
Modified: 2023-10-12 20:17 AEDT

Description Carlo Cabrera 2023-06-24 02:03:45 AEST
Building openssh 9.3p1 with `-Os` in CFLAGS on macOS 13 using Xcode 14 (with, e.g., `./configure && make install`) fails due to a segfault when `make` runs `ssh-keygen -A`:

```
/bin/bash: line 1: 13268 Segmentation fault: 11  ./ssh-keygen -A
```

Here's what I get out of lldb using the just-built `ssh-keygen`:
```
❯ lldb -- ./ssh-keygen -A
(lldb) target create "./ssh-keygen"
Current executable set to '/tmp/openssh-20230623-7195-4d1ep3/openssh-9.3p1/ssh-keygen' (x86_64).
(lldb) settings set -- target.run-args  "-A"
(lldb) r
Process 15308 launched: '/tmp/openssh-20230623-7195-4d1ep3/openssh-9.3p1/ssh-keygen' (x86_64)
Process 15308 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000300e ssh-keygen`main(argc=0, argv=0x0000000000000000) at ssh-keygen.c:3355:32 [opt]
   3352         /* Ensure that fds 0, 1 and 2 are open or directed to /dev/null */
   3353         sanitise_stdfd();
   3354
-> 3355         __progname = ssh_get_progname(argv[0]);
   3356
   3357         seed_rng();
   3358
Target 0: (ssh-keygen) stopped.
warning: ssh-keygen was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010000300e ssh-keygen`main(argc=0, argv=0x0000000000000000) at ssh-keygen.c:3355:32 [opt]
    frame #1: 0x00007ff80f3fb41f dyld`start + 1903
(lldb) fr v argv
(char **) argv = 0x0000000000000000
```

I haven't worked out why `argv` is a null pointer, but that seems to be what is happening.

Building openssh without any `-O` flags makes the segfault go away.

The segfault also does *not* occur on the following (even with `-Os`):
- macOS 13 on arm64 with Xcode 14.3
- macOS 12 on both x86_64 and arm64 with Xcode 14.2
- macOS 11 on both x86_64 and arm64 with Xcode 13.2
Comment 1 Damien Miller 2023-06-26 15:35:22 AEST
This really looks like a bad bug in Xcode/clang. It might be caused by an incompatibility between the options we set in configure.ac and -Os, which admittedly doesn't get a lot of test coverage.

Could you try rebuilding after "configure --without-hardening" and seeing if that helps?
Comment 2 Carlo Cabrera 2023-06-27 16:28:34 AEST
Yes, at Homebrew, we've also come to the conclusion that this is a compiler bug (likely in the backend). I'll try to find the time to report this to Apple.

Passing `--without-hardening` to `configure` also makes the segfault go away, even if we pass `-Os` to the compiler.

Do you have a recommendation on which workaround is better to adopt?
Comment 3 Darren Tucker 2023-06-27 18:06:08 AEST
(In reply to Carlo Cabrera from comment #2)
> Do you have a recommendation on which workaround is better to adopt?

IMO you'd be better off with the compiler hardening flags rather than -Os.  Things like -ftrapv could mitigate what would otherwise be a vulnerability.

If you want to investigate further, you could enumerate the flags added by --with-hardening (which will depend on what the compiler supports, you could diff Makefile generated with and without) and add them to CFLAGS one at a time along with -Os and see if you can narrow down which of them triggers the problem.

(I tried installing xcode 14.3 to reproduce but my test mac doesn't support a new enough OSX version to do that.)
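
The flag-by-flag narrowing suggested above could be sketched roughly as follows. This is only an illustration: `narrow_flags` and `build_and_test` are hypothetical helper names, the candidate flags would have to come from diffing the Makefiles generated with and without hardening, and the `ssh-keygen` invocation used as a crash probe is an assumption chosen because the crash happens at the very start of `main`.

```shell
#!/bin/sh
# Hypothetical helper: try each candidate flag together with -Os and report
# whether the supplied check command succeeds with it.
#   $1    a command that takes the full CFLAGS string as its only argument
#         and exits non-zero when the resulting build misbehaves
#   $2... the candidate flags to test one at a time
narrow_flags() {
    check_cmd=$1
    shift
    for flag in "$@"; do
        if "$check_cmd" "-Os $flag"; then
            printf 'ok %s\n' "$flag"
        else
            printf 'FAIL %s\n' "$flag"
        fi
    done
}

# Example check (assumption: run from the top of an OpenSSH source tree).
# Rebuilds without hardening plus the given CFLAGS, then runs the freshly
# built ssh-keygen once to see whether it crashes.
build_and_test() {
    keydir=$(mktemp -d) || return 1
    make distclean >/dev/null 2>&1
    ./configure --without-hardening CFLAGS="$1" >/dev/null 2>&1 &&
        make >/dev/null 2>&1 &&
        ./ssh-keygen -q -t ed25519 -N '' -f "$keydir/probe" >/dev/null 2>&1
    status=$?
    rm -rf "$keydir"
    return "$status"
}
```

It would be invoked along the lines of `narrow_flags build_and_test -ftrapv -fzero-call-used-regs=all`, with the real list of hardening flags substituted. Each candidate costs a full rebuild, so this is slow but needs no manual intervention.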
Comment 4 Carlo Cabrera 2023-06-27 19:06:10 AEST
> IMO you'd be better off with the compiler hardening flags rather
> than -Os.  Things like -ftrapv could mitigate what would otherwise
> be a vulnerability.

Ok, sounds good. We (Homebrew) recently had to rebuild our OpenSSH package to use OpenSSL 3 and shipped it without `-O` flags on macOS 13-x86_64, so we're not going to change that for now.

> If you want to investigate further, you could enumerate the flags
> added by --with-hardening (which will depend on what the compiler
> supports, you could diff Makefile generated with and without) and
> add them to CFLAGS one at a time along with -Os and see if you can
> narrow down which of them triggers the problem.

Thanks for the tip. I'll also try to find the time to do this.

> (I tried installing xcode 14.3 to reproduce but my test mac doesn't
> support a new enough OSX version to do that.)

GitHub provides free access to macOS runners for public repositories, and these have various versions of Xcode installed. This is what I'll probably end up using to investigate this problem further, but you might also be inclined to do the same.
Comment 5 Damien Miller 2023-06-28 09:02:54 AEST
Darren already answered your question but fwiw I didn't suggest --without-hardening as a workaround, but to determine whether the compiler bug is with -Os alone or when combined with other flags.
Comment 6 Darren Tucker 2023-06-28 12:32:55 AEST
(In reply to Carlo Cabrera from comment #4)
[...]
> GitHub provides free access to macOS runners for public
> repositories, and these have various versions of Xcode installed.

An interesting idea.  We already use these in our CI tests, eg
https://github.com/openssh/openssh-portable/actions/runs/5351378114
however we don't currently use anything except the default compilers.  How do you select specific xcode versions?

They're a bit inconvenient to interact with for debugging (short of hacks) but
better than nothing.
Comment 7 Carlo Cabrera 2023-06-29 22:06:57 AEST
> How do you select specific xcode versions?

You can use `xcode-select --switch /path/to/Xcode.app`. For example, to use Xcode 14.3.1 on a GitHub macos-13 runner [1], do
```
sudo xcode-select --switch /Applications/Xcode_14.3.1.app
```
You can also use `-s` instead of `--switch`.

[1] https://github.com/actions/runner-images/blob/main/images/macos/macos-13-Readme.md#xcode
Comment 8 Michael Cho 2023-06-30 16:22:00 AEST
Changing the optimization level only masked the issue during the build; the resulting binaries still caused segfaults and other problems for Homebrew users.

Based on my analysis, the issue appears to be that Xcode 14.3 (Apple Clang 14.0.3) is based on LLVM 15 and thus we hit the LLVM bug mentioned in configure.ac (ref: https://github.com/llvm/llvm-project/issues/59242, https://reviews.llvm.org/D139679)

Version info is a bit annoying with Apple Clang since its version numbers don't align with LLVM's. Also, the version string is formatted differently, so the configure.ac logic doesn't match it:
```
❯ clang -v 2>&1 | head -1
Apple clang version 14.0.3 (clang-1403.0.22.14.1)

❯ clang -v 2>&1 | awk '/clang version /{print $3}'
version
```
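
One way to extract the version regardless of the vendor prefix is to key on the text after "clang version" rather than on a fixed field position. The sketch below is not the logic in configure.ac; `clang_version_of` and `clang_vendor_of` are hypothetical names, and the vendor check only recognises the "Apple clang" prefix shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric version that follows
# "clang version" in the first line of `clang -v`, whatever precedes it.
clang_version_of() {
    printf '%s\n' "$1" |
        sed -n 's/^.*clang version \([0-9][0-9.]*\).*$/\1/p'
}

# Hypothetical helper: print "apple" for Apple's clang and "other" for
# everything else, so callers do not compare Apple's version numbers
# against upstream LLVM's.
clang_vendor_of() {
    case "$1" in
        "Apple clang"*) echo apple ;;
        *)              echo other ;;
    esac
}
```

For example, `clang_version_of "$(clang -v 2>&1 | head -1)"` would feed it the same line as the commands above. The vendor has to be checked alongside the number because Apple clang 14.0.3 corresponds to LLVM 15, as noted earlier in this comment.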

In Homebrew, I added a temporary workaround in https://github.com/Homebrew/homebrew-core/pull/135373 but it would be nice to improve the configure.ac logic.

The issue should go away with the Xcode 15 release, as Apple Clang 15.0.0 is based on LLVM 16.
Comment 9 Carlo Cabrera 2023-06-30 23:55:44 AEST
Yes, so it looks like `configure.ac` already knows to avoid `-fzero-call-used-regs=all` when compiling with `clang-15`, except that Apple clang uses a misleading version scheme.

Wikipedia is usually a pretty reliable reference for the corresponding LLVM version given the version string produced by `clang --version`, though: https://en.wikipedia.org/wiki/Xcode#Xcode_11.0_-_14.x_(since_SwiftUI_framework)_2
Comment 10 Darren Tucker 2023-09-10 15:56:59 AEST
I've added a check to configure for an Apple flavoured clang, and if found we'll use -fzero-call-used-regs=used instead of -fzero-call-used-regs=all regardless of apparent version.  Once there are releases known to work we can allowlist those.  This will be in the next release.

Could you please try either the current git version (you'll need to run "autoreconf") or tomorrow's snapshot (from https://www.mindrot.org/openssh_snap/)?
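
The decision described in this comment can be illustrated with a small shell sketch. It is not the code that was committed to configure.ac: `zero_all_policy` is a hypothetical name, and the function only models the two cases mentioned in this thread (any Apple clang, and upstream clang 15).

```shell
#!/bin/sh
# Hypothetical helper: given the first line of `cc -v`, print "avoid" when
# -fzero-call-used-regs=all should not be used and "allow" otherwise.
zero_all_policy() {
    case "$1" in
        # Any Apple clang, regardless of its apparent version number.
        "Apple clang version "*) echo avoid ;;
        # Upstream clang 15, affected by the LLVM bug referenced above.
        *"clang version 15."*)   echo avoid ;;
        *)                       echo allow ;;
    esac
}
```

Matching on the "Apple clang" prefix first means Apple's own numbering never reaches the upstream version comparison. Releases later confirmed to work could be allowlisted by adding a more specific pattern ahead of the generic Apple one.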
Comment 11 Darren Tucker 2023-09-10 16:03:22 AEST
From our GitHub CI it looks like the output format was not what I expected and did not match the older machine I have access to here.  (I picked the way I did the workaround so it still enables it, but the configure output doesn't include the version numbers.)  Could you please show me the output of "cc -v" from an affected machine so I can fix that up?  Thanks.
Comment 12 Darren Tucker 2023-10-12 20:16:54 AEDT
I got access to a machine running OS X Ventura with Xcode 15:

% cc --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: x86_64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

and confirmed that it passes all tests without any additional compiler flags, which means that https://github.com/openssh/openssh-portable/commit/41232d25532b4d2ef6c5db62efc0cf50a79d26ca did in fact fix this.  Removing from the 9.6 list and closing.