Memory Mapped Files

Penguins in the desert

Remember the old personal digital assistants, otherwise known as PDAs? One in particular, the Palm Pilot, had an interesting operating system. Once you created or opened an application or file, it was open forever. It was just a matter of navigating through the screens to find it again, and it was instantly usable again. Its 32-bit Motorola chip allowed it to address the entire device’s contents. All files resided in memory (at least that is what I surmised). This resulted in zippy performance and never worrying about saving a change.

A Palm Pilot

If only we could do that with a full size operating system. Now that we have 64 bit addressing, we can address a huge chunk of the planet’s data — but just a chunk of it. According to Cisco, and cited in Wikipedia, the planet entered the Zettabyte Era in 2012. We would need at least 70 addressing bits to address the entire planet’s data. Nevertheless, 64 bits allows the addressing of every byte on 16 million terabyte disk drives.


Of course, the modern CPUs in new machines can’t really directly address every byte of 16 million terabytes. They’re still limited by the number of physical address lines on their processor chips, so my little machine has only 64 GB of physical memory in it, not counting the extra memory for graphics.

Nevertheless, an immense number of problems can be solved entirely in memory that were previously solved using combinations of files and memory (and magnetic tapes and drums). Essentially, though, you still have the problem of reading data from the outside world into memory.

In processing large data files for signal process, I discovered (or re-discovered) that memory mapping a file was much faster than reading it. On the old VAX/VMS system I used back then, the memory mapped method was an order of magnitude faster. On more modern systems, such as Windows, Linux, and MacOS, memory mapping sometimes works many times faster:

50847534 primes read in
Read file in 0.503402 seconds.

50847534 primes scanned
Scanned memory map in 0.00546256 seconds

Memory read time is 92.155 times faster

The timings include the time to open the file, set up the mapping, and scanning the contents of the file, and closing the mapping and file.

To get this magical speed-up on POSIX like systems (OSX, Linux, AIX,…) start with the man page on mmap . On POSIX you basically open an existing file (to get a file descriptor), get the length of the file, and map it, and get a pointer to it.

On Windows, it’s slightly more complicated. Start with Microsoft’s documentation at https://docs.microsoft.com/en-us/windows/win32/memory/file-mapping. Open an existing file (to get a HANDLE), get the file length, map it, and do an additional step to get a FileView on it. You may change the FileView to get at different sections of the file. Evidently that is more efficient that just creating another mapping.

On POSIX-like systems with mmap you may create multiple mappings on the same file. POSIX mmap appears to be really cheap so you may close a mapping and make another one in short order to get a new view into the file.

Of course you can hide all the operating system specific details if you use the Boost shared mapping to map a file: https://www.boost.org/doc/libs/1_79_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file . With Boost you create a file mapping object given a file name, then get a pointer to the memory the size available with a mapping region created from the mapping object.

Generally, if you’re not using Boost, you’re wasting your time. Many of the features of C++11, 17, and 20, were first tried out in Boost. A lot of thought and review goes into the Boost libraries. As with all good rules of thumb and examples of group think, there are exceptions. Boosts attempt to isolate operating system dependent functions behind an operating system independent interface is an example of an implementation is just going to cause trouble — different operating systems have different implementation philosophies. In Windows the FileMap function just maps a section of a file into a section of memory, while in Linux, MacOS or OSX, and UNIX like systems mmap has many functions — mmap is the Swiss army knife of memory managment. The Boost interface provides only the file mapping, and attempts to emulate the file view of Windows on Linux, and none of the other functions of mmap.

For example, give mmap a file descriptor of -1, and a flag, MAP_ANON or MAP_ANONYMOUS, it will just give you a new chunk of memory. For really fast memory management, place a Boost Pool on the newly allocated memory with a C++ placement new operator.

For another example of why low-level access is handy, a file may be mapped in multiple processes. You may use this shared area for interprocess shared semaphores, condition variables, or just shared memory. If you use the MAP_PRIVATE flag, modifications are private to the processes. Changes cause a writable copy of the page to be created to contain the modification. The other process doesn’t see the change. MAP_SHARED, though, allows all changes to be shared between the processes.

Without further ado, here is the code that produced the benchmark above:

// The Main Program
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include "stopwatch.hpp"

auto fileReadMethod(const char*) -> unsigned int;
auto memoryMapMethod(const char*) -> unsigned int;


auto main(int argc, char* argv[]) -> int {
  int returnCode = EXIT_FAILURE;
  const char* input_file_name = argc < 2 ? "primes.dat" : argv[1];

  try {
    StopWatch stopwatch;
    std::cout << fileReadMethod(input_file_name) << " primes read in" << std::endl;
    auto fileReadTime = stopwatch.read();
    std::cout << "Read file in " << fileReadTime << " seconds." << std::endl;

    std::cout << std::endl;
    stopwatch.reset();
    std::cout << memoryMapMethod(input_file_name) << " primes scanned" << std::endl;
    auto memoryReadTime = stopwatch.read();
    std::cout << "Scanned memory map in " << memoryReadTime << " seconds" << std::endl;

    std::cout << std::endl;
    std::cout << "Memory read time is " << fileReadTime/memoryReadTime << " times faster"  << std::endl;

    returnCode = EXIT_SUCCESS;
  } catch (const std::exception& ex) {
    std::cerr << argv[0] << ": Exception: " << ex.what() << std::endl;
  }
  return returnCode;
}

// File reading method of scanning all the bytes in a file
#include <fstream>
auto fileReadMethod(const char* inputFileName) -> unsigned int {
    unsigned int census = 0;

    unsigned long prime;
    std::ifstream primesInput(inputFileName, std::ios::binary);
    while (primesInput.read(reinterpret_cast<char*>(&prime), sizeof(prime))) {
        ++census;
    }

    return census;
}

#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include "systemexception.hpp"

// Count the number of primes in the file by memory mapping the file
auto memoryMapMethod(const char* inputFilename) -> unsigned int  {

    int fd = ::open(inputFilename, O_RDONLY | O_CLOEXEC, 0);
    if (fd < 0) throw SystemException{};

    struct stat stats;  //NOLINT
    if (::fstat(fd, &stats) < 0) throw SystemException{};

    size_t len = stats.st_size;
    void* mappedArea = ::mmap(nullptr, len, PROT_READ, MAP_FILE | MAP_PRIVATE, fd, 0L);
    if (mappedArea == MAP_FAILED) throw SystemException{};
    auto* primes = static_cast<unsigned long*>(mappedArea);
    unsigned int countOfPrimes = len/sizeof(unsigned long);

    unsigned int census = 0;
    for (auto* p = primes; p != primes + countOfPrimes; ++p) {
        ++census;
    }
    if (countOfPrimes != census) throw std::runtime_error{"Number of mapped primes mismatch"};
    return countOfPrimes;
}

// Utility for stop watch timinh
#include "stopwatch.hpp"

using std::chrono::steady_clock;
using std::chrono::duration;

auto StopWatch::read() const -> double {
  steady_clock::time_point stopwatch_stop = steady_clock::now();
  steady_clock::duration time_span = stopwatch_stop - start;
  return duration_cast< duration<double> >(time_span).count();
}

And the stopwatch header ….

#ifndef STOPWATCH_HPP
#define STOPWATCH_HPP
#include <chrono>

class StopWatch {
 public:
  StopWatch() : start {std::chrono::steady_clock::now()} { }

  void reset() { start = std::chrono::steady_clock::now(); }

  [[nodiscard]] auto read() const -> double;

 private:
  std::chrono::steady_clock::time_point start;

};

Compile the code with C++20. Use it as you will. I’d appreciate some credit, but don’t insist on it. For the more legal types, apply this license:

@copyright 2022 Glen S. Dayton.  Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following
conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

Do not change the terms of this license, nor make it more restrictive.

Parenthetical Note

Power of 2ISO/IEC-8000-13Approximate Power of 10Prefix
121010241kibi10310001kilo
222010242mebi10610002mega
323010243gibi10910003giga
424010244tebi101210004tera
525010245pebi101510005peta
626010246exbi101810006exa
727010247zebi102110007zetta
828010248yobi102410008yotta
Prefixes

Memory has historically been measured in powers of 1024 (210), but disk space in disk space in powers of 1000 (103) — meaning a kilobyte of disk space is 1000 bytes but a kilobyte of memory is 1024 bytes. In 2008 ISO and IEC invented new prefixes for the binary powers — kibi..yobi. I have yet to see the ISO/IEC prefixes in any advertisements for memory or storage. Human language, especially English, is wonderful in its overlaying of meaning depending on context.