Git Product home page Git Product logo

nativejit's Introduction

Build status Build status

NativeJIT

NativeJIT is an open-source cross-platform library for high-performance just-in-time compilation of expressions involving C data structures. The compiler is light weight and fast and it takes no dependencies beyond the standard C++ runtime. It runs on Linux, OSX, and Windows. The generated code is optimized with particular attention paid to register allocation.

The compiler was developed by the Bing team for use in the Bing search engine. One important use is scoring documents containing keywords that match a user's query. The scoring process attempts to gauge how well each document matches the user's intent, and as such, depends on the specifics of how each query was phrased. Bing formulates a custom expression for each query and then uses NativeJIT to compile the expression into x64 code that will be run on a large set of candidate documents spread across a cluster of machines.

We knew from the get go that throughput and latency would be essential when processing queries at scale, so we put a lot of effort in to making NativeJIT run fast.

Our design point was scenarios where

  • The expression isn't known until runtime.
  • The expression will be evaluated enough times to amortize the cost of compilation.
  • Latency and throughput demands require low cost for compilation.

Here's trivial "Hello, world" level example that computes the area of a circle:

#include "NativeJIT/CodeGen/ExecutionBuffer.h"
#include "NativeJIT/CodeGen/FunctionBuffer.h"
#include "NativeJIT/Function.h"
#include "Temporary/Allocator.h"

#include <iostream>

using NativeJIT::Allocator;
using NativeJIT::ExecutionBuffer;
using NativeJIT::Function;
using NativeJIT::FunctionBuffer;

int main()
{
    // Create allocator and buffers for pre-compiled and post-compiled code.
    ExecutionBuffer codeAllocator(8192);
    Allocator allocator(8192);
    FunctionBuffer code(codeAllocator, 8192);

    // Create the factory for expression nodes.
    // Our area expression will take a single float parameter and return a float.
    Function<float, float> expression(allocator, code);

    // Multiply input parameter by itself to get radius squared.
    auto & rsquared = expression.Mul(expression.GetP1(), expression.GetP1());

    // Multiply by PI.
    const float  PI = 3.14159265358979f;
    auto & area = expression.Mul(rsquared, expression.Immediate(PI));

    // Compile expression into a function.
    auto function = expression.Compile(area);

    // Now run our expression!
    float radius = 2.0;
    std::cout << "The area of a circle with radius " << radius
              << " is " << function(radius);

    return 0;
}

Here is the generated assembly code on Windows:

PI_CONSTANT:
   db 0f 49 40                              ; PI constant is stored in memory.
ENTRY_POINT:
  sub         rsp,8                         ; Standard function prologue.
  mov         qword ptr [rsp],rbp           ; Standard function prologue.
  lea         rbp,[rsp+8]                   ; Standard function prologue.
  mulss       xmm0,xmm0                     ; Multiply by radius parameter by itself.
  mulss       xmm0,dword ptr [29E2A580000h] ; Multiply by PI.
  mov         rbp,qword ptr [rsp]           ; Standard function epilogue.
  add         rsp,8                         ; Standard function epilogue.

This example shows an expression that multiplies a number by itself. We also support a wide variety of arithmetic and logical operations, pointer and array operations, conditionals, accessing structure fields, and calling out to C functions. See our preliminary API docs for more information and the Examples/ directory for more examples.

Dependencies

In order to build NativeJIT you will need CMake (2.8.11+), and a modern C++ compiler (gcc 5+, clang 3.4+, or VC 2015+). You can run CMake directly to generate the appropriate build setup for your platform. Alternately, we have some scripts that have the defaults that we use available.

*nix

For *nix platforms (including OS X),

./Configure_Make.sh
cd build-make
make
make test

Ubuntu

If you're on Ubuntu 15+, you can install dependencies with:

sudo apt-get install clang cmake

On Ubuntu 14 and below, you'll need to install a newer version of CMake. To install a new-enough CMake, see this link. If you're using gcc, you'll also need to make sure you have gcc-5 (sudo apt-get install g++-5).

To override the default compiler, set the CXX and CC environment variables. For example, if you have clang-3.8 installed as clang-3.8 and are using bash:

export CXX="clang++-3.8"
export CC="clang-3.8"

OS X

Install XCode and then run the following command to install required packages using Homebrew (http://brew.sh/):

brew install cmake

NativeJIT can be built on OS X using either standard *nix makefiles or XCode. In order to generate and build makefiles, in the root NativeJIT directory run:

If you want to create an Xcode project instead of using Makefiles, run:

./Configure_XCode.sh

Windows

Install the following tools:

You can get the free version of Visual Studio here. Note that if you're installing Visual Studio for the first time and select the default install options, you won't get a C++ compiler. To force the install of the C++ compiler, you need to either create a new C++ project or open an existing C++ project.

In order to configure solution for Visual Studio 2015 run the following commands from the root NativeJIT directory:

.\Configure_MSVC.bat

From now on you can use the generated solution build-msvc\NativeJIT.sln from Visual Studio or build from command line using cmake.

nativejit's People

Contributors

danluu avatar hausdorff avatar jondgoodwin avatar mikehopcroft avatar nabijaczleweli avatar nibroc avatar theantony avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

nativejit's Issues

Static initialization order fiasco

Register.h

    // TODO: Need to avoid "static initialization order fiasco" for register definitions.
    // See http://www.parashift.com/c++-faq/static-init-order.html.
    // Plan is to use constexpr when VS2013 becomes available to Bing build.

Constexpr is available for us.

Speed up Travis CI by removing sudo

We use trusty in order to get a new enough cmake without installing cmake. This presently requires sudo, which prevents us from using dockerized Travis, which is faster. This is ok for now since we have other priorities and low CI load, but we should fix this at some point.

FreeList displacement???

    // TODO: Perhaps this should return a Data*.
    // If nothing is displaced, returns nullptr.
    // Otherwise returns Data* for displaced.
    template <unsigned SIZE>
    void ExpressionTree::FreeList<SIZE>::Allocate(unsigned id)
    {

We believe this comment is irrelevant because this function doesn't displace anything. It's possible comment became obsolete when the concept of storage was added.

Use constexpr for register constants

inc/NativeJIT/Nodes/ParameterNode.h

        // Integer parameters are passed in RCX, RDX, R8, and R9.
        // Use constants to encode registers. See #31.
#ifdef NATIVEJIT_PLATFORM_WINDOWS
        const uint8_t idMap[] = {1, 2, 8, 9};
#else
        const uint8_t idMap[] = {7, 6, 2, 1, 8, 9};
#endif

See also #24

Replace test asserts with appropriate comparisons

We have, for example TestAssert(foo == bar), which should be replaced with TestEqual(foo, bar) so that we get better debug info on failures.

We should be able to do all of these replacements in one shot with a simple script. I'm not doing this right now because there's an outstanding PR that adds tests in a style that matches the "local" style in the file and I want to wait until that's merged so that we don't have a merge conflict.

Protect against orphan nodes

inc/NativeJIT/Nodes/Node.h

        // Increments the number of node's parents that will evaluate the node
        // through the Node<T>::CodeGen() method. The node will be evaluated
        // only once, but the result will also be stored in cache with a
        // matching number of references. The cache will be released once all
        // parents evaluate the node.
        // IMPORTANT: Currently, there's an assumption that if a node is created,
        // it must be placed inside the tree. TODO: Remove this assumption and
        // allow for optimizing away unused nodes.
        void IncrementParentCount();

Fix: maintain count of nodes created and count of nodes in tree. Assert fail if they don't match.

Note that not fixing this will only cause more code to get generated than is necessary. In our first pass, if we have an orphan root with children, code will get generated that can't be reached through any reasonable means.

Yet another bogus Travis CI failure

See travis-ci/travis-ci#6120 (comment) for a workaround. Heading out, will look at this later.

More generally, the majority (all?) of the Travis-CI errors I've caused so far have been due to bogus fails. Is there a real alternative to Travis CI we can use or is this as good as it gets if we don't set up our own CI infrastructure?

Set up CI

Filing bugs on our action items because our action items from the last two meetings have a very large intersection. We should just have bugs to refer to instead of having to refresh the list in meetings so it's not forgotten. Bug assignment is somewhat arbitrary. Please feel free to re-assign.

It sounded like we were going to look into Travis and Appveyor?

Remove sethi-ullman LabelSubtree

In CallNode.h

                // TODO: Need to account for the fact the extra registers may be used
                // when the specific parameter home is different than the one returned
                // duing code generation.

CI broken because of gcc 4.9

https://travis-ci.org/BitFunnel/NativeJIT/jobs/141203696

I grabbed the CI setup from BitFunnel because that has a fix for a problem here. However, it also reverts gcc from gcc5 to gcc 4.9, which causes this failure:

/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp: In member function ‘virtual void NativeJIT::FunctionBuffer::Reset()’:
/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp:243:27: error: missing initializer for member ‘RUNTIME_FUNCTION::BeginAddress’ [-Werror=missing-field-initializers]
         m_runtimeFunction = {};
                           ^
/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp:243:27: error: missing initializer for member ‘RUNTIME_FUNCTION::EndAddress’ [-Werror=missing-field-initializers]
/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp:243:27: error: missing initializer for member ‘RUNTIME_FUNCTION::UnwindData’ [-Werror=missing-field-initializers]

I think Alex inserted some junk in BitFunnel to fix this gcc 4.9 issue in BitFunnel and that we can probably do something similar here.

IosMiniStateRestorer

X64CodeGenerator.h:

    // TODO: Move this class to a separate header

Note that Ios stands for iostream

Inject LICENSEs

I'm personally not a fan of this idea, but since we agreed to do this I don't mind taking this one.

Signed testing

Should we have better coverage for signed? Most of our tests are against uint.

Packed should be moved

inc/NativeJIT/Packed.h

// TODO: This class and PackedMinMaxNode should probably move from NativeJIT to BitFunnel.Library.

Determine strategy for register allocation on conditionals

In ConditionalNode.h

        // TODO: Evaluating both expressions in advance of the test is
        // sub-optimal, but it is currently required to guarantee consistent
        // state: the execution in NativeJIT has a continuous flow regardless
        // of the outcome of the (runtime) condition whereas the generated x64
        // code has two branches and each of them can have independent impact on
        // allocated and spilled registers. To have each expression evaluated
        // only if its branch is hit, the NativeJIT code would have to ensure
        // that the state of the allocated/spilled registers (i.e. all Storages)
        // is consistent once those two x64 branches converge back.
        //
        // A related note: if this is addressed, then it may be possible to
        // also address the fact that common subexpression (CSE) code is
        // inefficient in some cases because some CSEs may not need to be
        // evaluated. For example:
        //   (v == 1)? a : ((v == 2) ? a + b : b + c)
        //
        // Depending on the value of v, either a or c may not need to be
        // evaluated. One possibility is to lazily evaluate CSEs *at runtime*
        // as they are needed.

Remove SuiteCpp

We should remove SuiteCpp and then rename all of the test asserts to use the GTest syntax (e.g. TestEqual will become ASSERT_EQ). Also remove any shims that allow SuiteCpp to be used in place of GTest.

Use COMBINE_FILE_LISTS in cmakelists.txt

This strategy is used in BitFunnel. Just need to define COMBINE_FILE_LISTS() macro in the top level CMakeLists.txt and then use in the lower level CMakeLists.txt files to replace

if (NATIVEJIT_PLATFORM_WINDOWS)
set(CPPFILES ${CPPFILES} ${WINDOWS_CPPFILES})
set(PUBLIC_HFILES ${PUBLIC_HFILES} ${WINDOWS_PUBLIC_HFILES})
set(PRIVATE_HFILES ${PRIVATE_HFILES} ${WINDOWS_PRIVATE_HFILES})
else (NATIVEJIT_PLATFORM_WINDOWS)
set(CPPFILES ${CPPFILES} ${POSIX_CPPFILES})
set(PUBLIC_HFILES ${PUBLIC_HFILES} ${POSIX_PUBLIC_HFILES})
set(PRIVATE_HFILES ${PRIVATE_HFILES} ${POSIX_PRIVATE_HFILES})
endif (NATIVEJIT_PLATFORM_WINDOWS)

Travis CI flakiness from failed apt-get install

https://travis-ci.org/BitFunnel/NativeJIT/builds/129207321

https://travis-ci.org/BitFunnel/NativeJIT/builds/130683483

https://travis-ci.org/BitFunnel/NativeJIT/builds/130904044

The 2nd of those three builds fails with:

E: Failed to fetch http://llvm.org/apt/precise/pool/main/l/llvm-toolchain-3.8/libllvm3.8_3.8~svn269657-1~exp1_amd64.deb  404  Not Found
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
apt-get.diagnostics
apt-get install failed

Despite no change in the travis config file.

Determine which VC++ warning removal pragmas we don't need

This code was originally written for an old version of VC++ which had a lot more quirks than 2015. Some of the warning removal pragmas can probably be removed, and many of the ones that can't be removed should probably have better documentation.

It shouldn't be that hard to write a script that removes the pragmas one-by-one to see which ones are no longer necessary.

Also, in some cases, we might be able to just change the code to work cross-platform without some kind of platform-specific #ifdef'd pragma.

Avoid extra MOV for Return(Immediate(foo))

It would be nice if ReturnNode::CompileAsRoot could generate into RAX instead of moving.

Consider example of a tree that is just Return(Immediate(5)). Right now this generates

mov r15, 5
mov rax, r15

Non-volatile register save & restore testing

Should we have better coverage?

        //   Test register save/restore across call.
        //      Caller save volatiles
        //   Test call register indirect.
        //   Test that tries all permutations of parameter from Function to Call.

Mismatched new[], delete

I've fixed a few of these in BitFunnel, but I don't think I was on the lookout for these with NativeJIT. valgrind can find these, though:

==13082== Mismatched free() / delete / delete []
==13082==    at 0x4C2F24B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13082==    by 0x538D3A: std::default_delete<char>::operator()(char*) const (unique_ptr.h:76)
==13082==    by 0x538AC2: std::unique_ptr<char, std::default_delete<char> >::~unique_ptr() (unique_ptr.h:236)
==13082==    by 0x538833: NativeJIT::Allocator::~Allocator() (Allocator.cpp:48)
==13082==    by 0x488005: NativeJIT::TestFixture::~TestFixture() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x487F94: NativeJIT::CodeGenUnitTest::CodeGen::~CodeGen() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x487DC4: NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test::~CodeGen_JCC_Test() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x487DE8: NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test::~CodeGen_JCC_Test() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x568B3A: testing::Test::DeleteSelf_() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x57E7C9: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/leah/dev/Nativ
eJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x568256: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2438)
==13082==    by 0x54D688: testing::TestInfo::Run() (gtest.cc:2661)
==13082==  Address 0x5cedf80 is 0 bytes inside a block of size 8,192 alloc'd
==13082==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13082==    by 0x53875B: NativeJIT::Allocator::Allocator(unsigned long) (Allocator.cpp:40)
==13082==    by 0x53826A: NativeJIT::TestFixture::TestFixture(unsigned int, unsigned int, std::ostream*) (TestSetup.cpp:52)
==13082==    by 0x5381B5: NativeJIT::TestFixture::TestFixture() (TestSetup.cpp:39)
==13082==    by 0x487F5E: NativeJIT::CodeGenUnitTest::CodeGen::CodeGen() (CodeGenTest.cpp:40)
==13082==    by 0x487F1E: NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test::CodeGen_JCC_Test() (CodeGenTest.cpp:46)
==13082==    by 0x487ECD: testing::internal::TestFactoryImpl<NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test>::CreateTest() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x57E8F9: testing::Test* testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, te
sting::Test* (testing::internal::TestFactoryBase::*)(), char const*) (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x5688A6: testing::Test* testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, testi
ng::Test* (testing::internal::TestFactoryBase::*)(), char const*) (gtest.cc:2438)
==13082==    by 0x54D61B: testing::TestInfo::Run() (gtest.cc:2647)
==13082==    by 0x54DE26: testing::TestCase::Run() (gtest.cc:2774)
==13082==    by 0x556093: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:4649)

It looks like we're newing up an array, putting a pointer to that array in a unique_ptr, which doesn't know to call delete[] on destruction.

See also BitFunnel/BitFunnel#57.

JumpTable::PatchCallSites() dangerous casts

    // WARNING: Non portable. Assumes little endian machine architecture.                                                                                                                
    // WARNING: Non portable. Assumes that fixup value is labelAddress - siteAddress - size.Size().                                                                                      
    void JumpTable::PatchCallSites()
    {                                                                                                                                                                            
        for (size_t i=0; i < m_callSites.size(); ++i)
        {                                                                                                                                                                        
            const CallSite& site = m_callSites[i];
            const uint8_t* labelAddress = AddressOfLabel(site.GetLabel());
            uint8_t* siteAddress = site.Site();
            size_t delta = labelAddress - siteAddress - site.Size();

            // TODO: Evaluate whether special cases for size == 2 and size == 4 actually improve performance.                                                                            
            size_t size = site.Size();
            if (size == 2)
            {
                *((int16_t*)siteAddress) = (int16_t)delta;
                siteAddress += size;
            }
            else if (size == 4)
            {
                *((int32_t*)siteAddress) = (int32_t)delta;
                siteAddress += size;
            }
            else
            {
                while (size > 0)
                {
                    *siteAddress++ = (uint8_t)delta;
                    delta = delta >> 8;
                    size--;
                }
            }
        }
    }

I'm enabling -Wold-style-cast (see BitFunnel/BitFunnel#37) and converting these to C++ style casts, but this function still seems dangerous.

Variadic templated `Function` (instead of only allowing 0-4 params)

This was originally implemented the way it's done in the existing code because variadic templates weren't available in VC++ when the project was started, and on Windows the first four params are passed as register, so adding more than four params is a new code path.

VC++ is now at least semi-modern so this should be possible now.

Register allocator uses unnecessary registers

Consider the following C++ code:

    Function<float, float> expression(allocator, code);
    auto & rsquared = expression.Mul(expression.GetP1(), expression.GetP1());
    const float  PI = 3.14159265358979f;
    auto & area = expression.Mul(rsquared, expression.Immediate(PI));
    auto function = expression.Compile(area);

On Windows this code generates something like

sub    rsp, 28h                     
mov    qword ptr[rsp], rbp
movaps xmmword ptr[rsp + 10h], xmm15
lea    rbp, [rsp + 28h]

movss  xmm15, xmm0                  
mulss  xmm15, xmm0                  
mulss  xmm15, dword ptr[PI_CONSTANT]
movss  xmm0, xmm15                  

movaps xmm15, xmmword ptr[rsp + 10h]
mov    rbp, qword ptr[rsp]
add    rsp, 28h
ret

This code is inefficient because it uses xmm15 when it could have used xmm0. Would have expected

mulss  xmm0, xmm0                  
mulss  xmm0, dword ptr[PI_CONSTANT]

Decide what to do with files in inc/Temporary

These are allocator interfaces that will also be used in BitFunnel. Options seem to be

  1. Maintain separate versions, one in the NativeJIT namespace, and one in the BitFunnel namespace.
  2. Make both NativeJIT and BitFunnel take a dependency on a 3rd repo that provides these interfaces.
  3. Make BitFunnel depend on NativeJIT for these interfaces.

Better documentation system

Our current system is a very simple manual hack. We should have some way to automatically extract known-good code and put it into our docs.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.