A Data Point for MSVC vs Clang Code Generation

I’m a Windows PC game developer using MSVC 2015 Update 1, working in C++.

More and more, I hear people talk about how great clang is, and that it generates much better code than MSVC, among other niceties.

I have been chalking it up to maybe just being a fad, but keeping my feelers out there to see if I can get some concrete comparative info.

Well, I’ve stumbled on some of that concrete comparative info and want to share it with anyone else who is wondering if clang really is better than MSVC. This is just one data point, but it feels like it’s just the tip of the iceberg.

On twitter, Jason Turner (@lefticus) had an interesting tweet:

I wasn’t really sure what he was getting at, so clicked the link to check it out (http://godbolt.org/g/WDpPYq).

It turned out to be very relevant to my interest, because his particular example is comparing a value to a bunch of compile time constants. That is basically the core of what I’ve been looking into with my last few posts asking whether code or data was faster!

This first particular example is comparing a compile time constant to other compile time constants, so the code completely melts away and at runtime just returns the compile time calculated result. That isn’t very interesting, but it is nice to see that clang did so much at compile time. FWIW MSVC was able to do this all at compile time as well, so they are even so far.
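To make the compile time case concrete, here is a minimal sketch. It is not the code behind the link above: one_of_ce is a hypothetical recursive constexpr variant, written that way only so the result can be checked with static_assert. If the asserts build, the answer was computed by the compiler and nothing is left to do at runtime.

```cpp
// Hypothetical constexpr variant of one_of (an assumption for illustration,
// not the code from the tweet).
// Base case: nothing left to compare against, so there is no match.
constexpr bool one_of_ce(int) { return false; }

// Recursive case: compare against the first candidate, then the rest.
template <typename... T>
constexpr bool one_of_ce(int u, int first, T... rest)
{
    return u == first || one_of_ce(u, rest...);
}

// Every argument is a compile time constant, so the compiler has to
// compute the answer during compilation for these to build at all.
static_assert(one_of_ce(3, 1, 2, 3, 4, 5), "3 is in the list");
static_assert(!one_of_ce(7, 1, 2, 3, 4, 5), "7 is not in the list");
```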

What is more interesting is what happens when you test a compile time unknown against compile time constants. Let’s check it out… (https://godbolt.org/g/dKBDSK)

The assembly subtracts 1 from the input and then compares the result against 5 to decide whether it’s in the group or not. The comparison is unsigned, so an input of 0 wraps around to the maximum unsigned value and fails the test. Clang realized the numbers were contiguous and so made a nice optimization.
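Written back out as C++, the trick looks roughly like the sketch below. The function names are hypothetical, and the functions are constexpr only so the behavior can be checked with static_assert; this is an illustration of the idea, not a transcription of clang’s output.

```cpp
// Straightforward version: five separate equality tests.
constexpr bool in_group_naive(int x)
{
    return x == 1 || x == 2 || x == 3 || x == 4 || x == 5;
}

// The trick: subtract 1 as unsigned so that 1..5 maps onto 0..4.
// 0 wraps around to the largest unsigned value and negative inputs also
// become huge, so a single unsigned "< 5" rejects everything else.
constexpr bool in_group_range(int x)
{
    return static_cast<unsigned>(x) - 1u < 5u;
}

static_assert(in_group_range(1) && in_group_range(5), "ends of the range match");
static_assert(!in_group_range(0) && !in_group_range(6), "neighbors do not");
static_assert(!in_group_range(-1), "negative inputs do not either");
```

Both functions give the same answer for every int; the second one just needs one subtraction and one compare instead of five compares.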

In this case, MSVC did similar in release x64:

#include <initializer_list>

template<typename U, typename ... T>
bool one_of(U&& u, T && ... t)
{
    bool match = false;
    (void)std::initializer_list<bool>{ (match = match || u == t)... };
    return match;
}

int main(int argc, char** argv)
{
00007FF62DB61000  lea         eax,[rcx-1]  
00007FF62DB61003  cmp         eax,4  
00007FF62DB61006  setbe       al  
    return one_of(argc, 1, 2, 3, 4, 5);
00007FF62DB61009  movzx       eax,al  
}
00007FF62DB6100C  ret  

But in x86 release it did a bunch of if/else if/else if’s!

#include <initializer_list>

template<typename U, typename ... T>
bool one_of(U&& u, T && ... t)
{
    bool match = false;
    (void)std::initializer_list<bool>{ (match = match || u == t)... };
    return match;
}

int main(int argc, char** argv)
{
00331002  in          al,dx  
    return one_of(argc, 1, 2, 3, 4, 5);
00331003  mov         eax,dword ptr [argc]  
00331006  cmp         eax,1  
00331009  je          main+26h (0331026h)  
0033100B  cmp         eax,2  
0033100E  je          main+26h (0331026h)  
00331010  cmp         eax,3  
00331013  je          main+26h (0331026h)  
00331015  cmp         eax,4  
00331018  je          main+26h (0331026h)  
0033101A  cmp         eax,5  
0033101D  je          main+26h (0331026h)  
0033101F  xor         al,al  
00331021  movzx       eax,al  
}
00331024  pop         ebp  
00331025  ret  
    return one_of(argc, 1, 2, 3, 4, 5);
00331026  mov         al,1  
00331028  movzx       eax,al  
}
0033102B  pop         ebp  
0033102C  ret  

You are probably asking “what does clang do in x86?” Well, it turns out it does the same thing as in x64; it doesn’t fall back to if/else if/else if like MSVC does (proof: add -m32 in godbolt: https://godbolt.org/g/khnrtO). One point to clang!

What if the numbers are not so contiguous though? It turns out clang can actually switch to using a binary search! (https://godbolt.org/g/iBkqja)
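For anyone who wants the idea spelled out, here is a minimal sketch of a binary search over the same twelve constants. It is not what clang emits (the names kSorted, kCount and one_of_sorted are made up for this example); it only shows how sorting the constants lets each comparison throw away half of the remaining candidates. The loop inside a constexpr function assumes C++14 or later.

```cpp
// The twelve constants from the example, sorted ascending.
constexpr int kSorted[] = { 5, 1039, 1471, 1534, 2453, 3615,
                            3817, 4635, 37819, 45361, 65534, 109378 };
constexpr int kCount = sizeof(kSorted) / sizeof(kSorted[0]);

// Hypothetical helper: binary search over the sorted constants.
// Each iteration halves the half-open range [lo, hi), so twelve values
// need at most four iterations instead of up to twelve equality tests.
constexpr bool one_of_sorted(int x)
{
    int lo = 0;
    int hi = kCount;
    while (lo < hi)
    {
        const int mid = lo + (hi - lo) / 2;
        if (kSorted[mid] == x)
            return true;
        if (kSorted[mid] < x)
            lo = mid + 1;   // x can only be in the upper half
        else
            hi = mid;       // x can only be in the lower half
    }
    return false;
}

static_assert(one_of_sorted(3817), "a value in the middle is found");
static_assert(!one_of_sorted(6), "a value between two constants is not");
```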

MSVC on the other hand just does a bunch of if/else if/else if tests, in both x86 release and x64 release.

#include <initializer_list>

template<typename U, typename ... T>
bool one_of(U&& u, T && ... t)
{
    bool match = false;
    (void)std::initializer_list<bool>{ (match = match || u == t)... };
    return match;
}

int main(const int argc, const char *[])
{
00007FF6C05A1000  cmp         ecx,1AB42h  
00007FF6C05A1006  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1008  cmp         ecx,40Fh  
00007FF6C05A100E  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1010  cmp         ecx,0B131h  
00007FF6C05A1016  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1018  cmp         ecx,93BBh  
00007FF6C05A101E  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1020  cmp         ecx,121Bh  
00007FF6C05A1026  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1028  cmp         ecx,0EE9h  
00007FF6C05A102E  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1030  cmp         ecx,0E1Fh  
00007FF6C05A1036  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1038  cmp         ecx,995h  
00007FF6C05A103E  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1040  cmp         ecx,5FEh  
00007FF6C05A1046  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1048  cmp         ecx,5BFh  
00007FF6C05A104E  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1050  cmp         ecx,5  
00007FF6C05A1053  je          main+63h (07FF6C05A1063h)  
00007FF6C05A1055  cmp         ecx,0FFFEh  
00007FF6C05A105B  je          main+63h (07FF6C05A1063h)  
    return one_of(argc, 1471, 2453, 3817, 45361, 37819, 109378, 1534, 4635, 1039, 3615, 5, 65534);
00007FF6C05A105D  xor         al,al  
00007FF6C05A105F  movzx       eax,al  
}
00007FF6C05A1062  ret  
    return one_of(argc, 1471, 2453, 3817, 45361, 37819, 109378, 1534, 4635, 1039, 3615, 5, 65534);
00007FF6C05A1063  mov         al,1  
00007FF6C05A1065  movzx       eax,al  
}
00007FF6C05A1068  ret  

Closing

This is just one data point about how clang is better than MSVC, but it is a data point. I’m betting there are more if we looked for them.

This makes me wonder how switch statements do in clang vs MSVC, and whether clang ever uses jump tables or more advanced data structures, either in switch statements or in other code that compares against a potentially large number of compile time constants. Those thoughts are driven by the things seen in this article: Something You May Not Know About the Switch Statement in C/C++
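If you want to try that experiment yourself, a switch based version of the same test could look like the sketch below. one_of_switch is a hypothetical name, and I’m making no claim about what either compiler generates for it; it is just a starting point to paste into a compiler and inspect. It is constexpr (which assumes C++14 or later for the switch) only so the logic can be checked with static_assert.

```cpp
// Hypothetical switch based equivalent of the one_of call with the
// twelve non-contiguous constants from the example above.
constexpr bool one_of_switch(int x)
{
    switch (x)
    {
        case 1471: case 2453: case 3817: case 45361:
        case 37819: case 109378: case 1534: case 4635:
        case 1039: case 3615: case 5: case 65534:
            return true;    // x matched one of the constants
        default:
            return false;   // anything else is not in the group
    }
}

static_assert(one_of_switch(45361), "a listed constant matches");
static_assert(!one_of_switch(45362), "an unlisted value does not");
```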

The examples above used C++14 level C++ to implement “one_of”. If you can use C++17 level C++, you can also implement it this way, which also does the binary search (also written by Jason Turner):
https://godbolt.org/g/RZgjRQ
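The linked code isn’t reproduced here. A C++17 version of this kind of function would typically use a fold expression, so the sketch below assumes that approach; it is a guess at the shape rather than a copy of the original, and one_of_fold is a hypothetical name.

```cpp
// Hypothetical C++17 version using a fold expression over ||.
// The fold expands to (u == t1) || (u == t2) || ... and is false
// when the list of candidates is empty.
template <typename U, typename... T>
constexpr bool one_of_fold(const U& u, const T&... t)
{
    return ((u == t) || ...);
}

static_assert(one_of_fold(3, 1, 2, 3, 4, 5), "3 is in the list");
static_assert(!one_of_fold(7, 1, 2, 3, 4, 5), "7 is not in the list");
```

Compared to the initializer_list trick, the fold expression needs no dummy list and no mutable match variable.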

PS: wouldn’t it be nice if godbolt supported MSVC so we could do this sort of analysis on MSVC code? It’s in the works, but I’m unsure when it’ll be available. Apparently licensing isn’t the issue, so let’s hope it comes sooner rather than later! If you want it, maybe ping @mattgodbolt and let him know how much you want that functionality (:

Have any other clang vs MSVC info? If so, I’d love to hear about it!

About Demofox

I'm a game and engine programmer at Blizzard Entertainment and have been making games since 1990 (starting out with QBasic and TI-85 games). My shipped titles include:

* Heroes of the Storm
* StarCraft II: Heart of the Swarm & Legacy of the Void
* Insanely Twisted Shadow Planet (PC)
* Gotham City Impostors (PC, 360, PS3)
* Line Rider (PC, Wii, DS)

I also like hiking, making music, learning cool new stuff and attempting the impossible.