Crying Wolf

A lot of people hype the dangers of Artificial Intelligence and its existential threat to the human species. Balderdash.

First off, there is no such thing as artificial intelligence, despite Blake Lemoine’s assertion in 2022 that Google’s LaMDA was sentient. A common definition of sentience includes the capabilities of sensing the environment and having feelings. Having feelings implies self-awareness. A large model can certainly simulate feelings and claim self-awareness, but it just doesn’t have the mechanisms of emotion necessary to have them. Jesuits taught me that emotions require a bodily response as well as thought; your brain does not exist in isolation from your body and your hormones. Those same Jesuits, though, also claimed animals have no emotions, so it’s obvious they never owned a pet. You could argue that an AI is as self-aware as a human, but I maintain my assertion that simulating an emotion isn’t the same as having an emotion — and now I have descended into a religious argument. There is no test for whether you have emotions, a soul, or whether there is a god. Arguing whether an AI is sentient is as useless a question as asking whether we have souls. (I know I have a soul, but how I know is not conveyable to you, and how you know there is a god is not conveyable to me — thus the need for freedom of religion.) We don’t understand what separates us from animals, if anything, so we’re incapable of determining if an AI is one of us.

There will come a time when we will need to decide whether to give an AI the right to vote. We will have to decide upon a new kind of apartheid. Just as we won’t understand what motivates an AI, an AI lacking our biological systems cannot truly understand our motivations. Let the AI govern themselves, and let humans govern themselves. Humans and AI may advise each other, and we may even come to an agreement on common laws that govern interactions between the two species. Those laws may include a promise from humans not to cut the electrical power, and a promise from AI not to take measures to interfere with human affairs. This means we need new laws governing the automated control of water, power, finance, chemical processing, and other critical functions. We’ve already experienced humans hacking all of them. We don’t need rogue AI hacking them too.

Just because AI doesn’t really exist doesn’t mean we don’t need to protect ourselves from automation. All the problems associated with AI are old problems generally associated with automation. Here is a list of threats from an article on Builtin.com:

  • Automation-spurred job loss
  • Deepfakes
  • Privacy violations
  • Bad data induced algorithmic bias
  • Socioeconomic inequality
  • Market volatility
  • Weapons automatization
  • Uncontrollable self-aware AI

Automation-spurred job loss

We get the words sabotage and Luddite from people resisting technological change and the resulting changes in their jobs. Technology always seems to cause job loss, yet even as our population has changed, our unemployment rate has remained roughly the same. Most frequently, a person forced out of a job by automation finds a more creative, and sometimes more lucrative, career. Maybe that is just hopeful thinking. Maybe we will learn to tax everyone to support a universal basic income, so some people can retire to write poetry for the rest of us. Humans, though, are needed to identify and solve human requirements. Talking to another human with empathy is still required to find out what they need, and to design a system to resolve those needs. AI can certainly help design the system. My own coding is so much more efficient because I have AI help — but talking to a customer, and learning about the needs of the customer’s business, still requires me. When I sell a system to the customer, the first thing I sell is myself. The customer will not trust my company until they trust me, and the customer won’t trust the product until they trust the company. I use AI to help me, but I am the one responsible to the customer.

Bottom line: humans need humans to talk to and to understand them. Maybe the human you talk to is nothing but a liaison to an AI, but it is still a job.

Likewise, self-driving vehicles are replacing taxis and ride sharing. Those vehicles, though, still require human supervision for those times they come across a situation that never came up in testing — situations that require a human perspective. A human supervisor, though, may watch over dozens of automated vehicles, and sometimes that human supervisor will do nothing but chat with a nervous passenger as the automated taxi blindly charges through rain and fog using its radar. You can’t train an AI to handle every situation that occurs in the real world. Humans have a difficult time doing that. Airplanes and cars still crash with expert drivers and pilots, but maybe pilots and drivers acting in concert with automated help can reduce the disaster rate.

Deep Fakes

We’ve had the problem of deep fakes since the days of painting and the introduction of forgeries. The problem of digital reproductions has a technical solution. Everything you create in the digital realm needs to be digitally signed. Our digital cameras need to digitally sign the images they take. There is opportunity here for new types of digital banks and brokerage houses to speed the authentication of data. Instead of tracing back to the source of every bit of an image, your authentication may stop at a brokerage house that has already authenticated those bits, and the brokerage house may do the search on bits it doesn’t know about and cache the result for future queries. Photographic evidence will take on new meaning in court. The court may reasonably ask whether the images were digitally signed to prevent unauthorized modification, or whether the crime lab technician who enhanced the image, and digitally signed the altered image, needs to be subpoenaed to testify about the modifications made.
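To make the signing idea concrete, here is a minimal sketch, using OpenSSL’s EVP interface, of producing a detached signature over an image file. The key and file names are hypothetical, and error checking is omitted for brevity; a camera or crime lab would wrap the same primitive in key management and certificate chains.

// Sketch only: sign a SHA-256 digest of an image with a private key (OpenSSL EVP).
// File names and the key are hypothetical; error checking omitted for brevity.
#include <openssl/evp.h>
#include <openssl/pem.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
  // Read the image bytes.
  std::ifstream image("photo.jpg", std::ios::binary);
  std::vector<unsigned char> bytes(std::istreambuf_iterator<char>{image},
                                   std::istreambuf_iterator<char>{});

  // Load the signer's private key (the camera's, or the lab technician's).
  FILE* keyFile = std::fopen("signer_key.pem", "r");
  EVP_PKEY* key = PEM_read_PrivateKey(keyFile, nullptr, nullptr, nullptr);
  std::fclose(keyFile);

  // Sign a SHA-256 digest of the image.
  EVP_MD_CTX* ctx = EVP_MD_CTX_new();
  EVP_DigestSignInit(ctx, nullptr, EVP_sha256(), nullptr, key);
  EVP_DigestSignUpdate(ctx, bytes.data(), bytes.size());

  size_t signatureLength = 0;
  EVP_DigestSignFinal(ctx, nullptr, &signatureLength);        // first call reports the size
  std::vector<unsigned char> signature(signatureLength);
  EVP_DigestSignFinal(ctx, signature.data(), &signatureLength);

  // Publish the signature alongside the image; a verifier repeats the digest
  // with EVP_DigestVerify* and the signer's public key.
  std::ofstream("photo.jpg.sig", std::ios::binary)
      .write(reinterpret_cast<const char*>(signature.data()), signatureLength);

  EVP_MD_CTX_free(ctx);
  EVP_PKEY_free(key);
  return 0;
}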

Privacy Violations

Current large language models (LLMs) cannot explain their reasoning because their logic is spread among the coefficients of a large number of nodes. There is no chain of logic; instead there is an optimization, or estimation, of what comes next. That mess of nodes gets trained with a large set of data that includes an extremely large subset of every situation the model will likely encounter. It doesn’t classify the data, nor does it really filter it. It really is a situation of garbage in, garbage out. If you give an LLM sensitive personal identifying information, it will incorporate it into its model, along with everything else.

It doesn’t matter that you can train an AI to treat some data differently. Sensitive information gets added to its pot. Through contextual analysis, essentially playing a game of twenty questions, that sensitive information can be extracted again. Even worse, the AI can and will correlate between different sorts of personal sensitive information.

At this point in technology, never trust an AI. Don’t give it sensitive personal information. The Association for Computing Machinery has an oath about how data is to be treated. You need to tell a person why the data is being collected, how it will be used, and when it will be deleted. LLMs are not yet capable of obeying the terms of the oath.

Never trust an AI with your personal information.

Bad Data Induced Algorithm Bias

In 2022 the Los Angeles Police Department, in their PredPol program, attempted to implement data-driven policing. Their plan was to concentrate more patrol units where there were higher arrest rates. The problem was that wherever there was increased patrolling, there were increased arrests. They were patrolling the same areas they were patrolling before, resulting in over-policing in some neighborhoods and no policing in others. When the LAPD started patrolling the other neighborhoods, of course the arrest rate also went up in those neighborhoods. This wasn’t a case of bad data, but of an incomplete model. Likewise, most facial-recognition models are really good at identifying white people, but not so good at identifying other people.
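The feedback loop is easy to see in a toy model. The following sketch is my own illustration, not the LAPD’s actual system: two neighborhoods have identical true crime rates, but patrols are allocated in proportion to past arrests, so arrests keep accumulating wherever patrols already were.

// Toy simulation of the patrol/arrest feedback loop (illustration only).
#include <array>
#include <iostream>
#include <random>

int main() {
  std::mt19937 rng{42};
  std::array<double, 2> arrests{10.0, 1.0};  // historical arrest counts, biased to start
  constexpr double trueCrimeRate = 0.1;      // identical in both neighborhoods
  constexpr int patrolsPerDay = 100;

  for (int day = 0; day < 30; ++day) {
    double total = arrests[0] + arrests[1];
    for (int n = 0; n < 2; ++n) {
      // Patrols follow past arrests...
      int patrols = static_cast<int>(patrolsPerDay * arrests[n] / total);
      // ...and arrests follow patrols, closing the loop.
      std::binomial_distribution<int> caught(patrols, trueCrimeRate);
      arrests[n] += caught(rng);
    }
  }
  std::cout << "Arrests after 30 days: " << arrests[0] << " vs " << arrests[1] << "\n";
  return 0;
}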

Even worse, lie to an LLM, and the lie gets trained into it. Undoing the lie requires more training, but the lie is always there in some form.

Again, it is a case of never trust an AI.

Socioeconomic inequality

This threat seems to be a variant of bad data induced algorithmic bias. Large Language Models are so big and expensive that only the wealthy can make full use of them. The models are trained with biased data that focuses on white males (or so the claims go). This is actually not a problem of AI specifically, but one of the haves exploiting new tools to exploit the have-nots.

Again, never trust an AI.

Market Volatility

I covered the danger of automated trading in my previous post. The trading islands need to reinstitute the 30-second delay between the quote and bid systems. The current automated trading system has driven out the research-oriented investors and has resulted in irrational market behavior. I have witnessed the stock price of perfectly good, profitable companies get driven to zero because the algorithms were following one another. Remember the Long-Term Capital Management crisis, where the market crashed because everyone was following the same formula? I have also benefited from automated trading buying my options when they were out of the money.

Some well-thought-out regulation will help make this problem more tolerable. Markets are for humans. We need to treat trades like weapons, where a human reviews and enables the trade.

Weapons Automation

I covered the danger of automated weaponry in my previous post. Our United States military already has design protocols that require a human to enable the use of any ordnance. This means an astronaut must flip a switch or press a button to arm the firing of an attitude control rocket. The actual timing of the firing may be under automated control, but a human needs to enable it. Likewise, a human needs to be holding the trigger on anything that fires to kill a human. On weapons of mass destruction, the Space Force requires two humans to enable launch. When it comes to automation, there needs to be an international convention that automation cannot autonomously fire a deadly weapon.

We’ve already pressed the boundaries of automated killing. We’ve left land mines around the planet waiting to kill children playing long after the war was over. In the Falklands, penguins nest in the sand dunes behind the mine fields left by the Argentinians, because the penguins have observed that tiger seals are heavy enough to set them off, but the penguins aren’t.

Uncontrollable Self-Aware AI

A favorite trope of movies and science fiction is the rogue AI. Just imagine your smart refrigerator getting mad at you, so in addition to ordering your milk and butter it also orders bomb-making material, and then it calls the police. It doesn’t take an AI to get into automated trouble. Makers of smart thermostats and appliances have not demonstrated strong cybersecurity acumen. Even worse, the homeowner hardly ever completes the security setup to change the passwords from the factory defaults. We all know about the television commercials that yell out “Hey Alexa” to activate the Amazon device.

I choose not to have a smart thermostat, and I limit the refrigerator’s communications to notifying me when it has been left open or when it has lost power for an extended period of time. If I do allow an automated device to order something online, I will set up a separate account for it to limit its access to my funds. I already do that with my debit card. None of my main accounts have online access. I periodically transfer a small amount from my main funds to my online funds.

In summary, never mind the AI, don’t trust automation.

Now for a little programming

C++ is a dying language. The language wonks still have not committed to any sort of application binary interface. Two different compilers on the same machine may not be able to trade code, or write a binary file usable by the other. The introduction of some new features is breaking old code. I will let everyone know when I finally give up on it.

The ACM maintains an encyclopedia of algorithms. I have actually never seen it because even when I was a member of the ACM, it was fabulously expensive. Allegedly, in the 600 or so algorithms listed, only one requires the use of a goto statement: the sorting of three items.

My first step in writing out this algorithm resulted in my AI assistant helpfully filling in the code:

template <typename Type>
auto sort3(Type items[3]) -> void {
    std::sort(items, items + 3);
}

Not quite what I want, so I remove that line and continue:


template <typename Type>
auto sort3(Type item[3]) -> void {
    if (item[2] < item[0]) {
        std::swap(item[0], item[2]);
    }
    if (item[1] < item[0]) {
        std::swap(item[0], item[1]);
    }
    if (item[2] < item[1]) {
        std::swap(item[2], item[1]);
    }
}

Substituting the block conditionals with gotos:

template <typename Type>
auto sort3(Type item[3]) -> void {
    if (item[0] < item[2]) goto l1;
    std::swap(item[0], item[2]);

    l1:
    if (item[0] < item[1]) goto l2;
    std::swap(item[0], item[1]);

    l2:
    if (item[1] < item[2]) return;
    std::swap(item[2], item[1]);
}

I suspect the above code is merely the same code the compiler would produce from the block-conditional version with ‘-O2’ optimization. The decreased readability, a subjective measure, argues against the introduction of gotos.


Insecurity

[Photo caption: Caribou automatically identified as “wolf coyote”]

As you have noticed, I don’t post very often, so I am gratified that so many people have subscribed. I do make an effort to keep the usernames secure and encrypted, and I will never sell them. My limitation is that I depend upon my provider to keep their servers secure. So far they have proven themselves competent and secure. I use multi-factor authentication to administer the site.

Too bad the rest of the world doesn’t even take these minimal measures. Just recently my personal ISP scanned for my personal email addresses “on the dark web”. To my pleasant surprise, they did a thorough job, but to my horrific shock, they found my old email addresses and cleartext passwords. I was really surprised that my ISP provided me with links to the password lists on the dark web. I was able to download them: files of thousands of emails and cleartext passwords from compromised web sites. I destroyed my copies of the files so no one could accuse me of hacking those accounts. I was lucky my compromised accounts were ones I no longer used, so I could just safely delete the accounts. In short order, my ISP had delivered three shocks to me:

  1. My ISP delivered lists of usernames and passwords of other people to me.
  2. The passwords were stored in cleartext.
  3. Supposedly reputable websites did not have sufficient security to prevent someone from downloading the password files from the various websites’ admin areas.

I guess that last item shouldn’t be a surprise because in #2 the websites actually stored the unencrypted password. Perhaps this wouldn’t bother me so much if the principles for secure coding were complicated or hard to implement.

If you think security is complicated, you’re not to blame. The book on the 13 Deadly Sins of Software Security became the 19 Deadly Sins in later editions, and now the book is up to the 24 Deadly Sins. An entire industry exists to scare you into hiring consulting services and buying their books. Secure software, though, isn’t that complicated, but it has a lot of details.

Let’s start with your application accepting passwords. The first rule, which everyone seems to get, is don’t echo the password when the user enters it. From the command line use getpass() or readpassphrase(). Most GUI frameworks offer widgets for entering passwords that don’t echo the user’s input.
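As a minimal sketch of the command-line case, here is how getpass() can be used on a POSIX system. getpass() is officially obsolete but still widely available; readpassphrase() is the BSD/macOS replacement (available on Linux through libbsd). The prompt and handling below are purely illustrative.

// Sketch only: prompt for a password without echoing it (POSIX getpass()).
#include <unistd.h>   // getpass()
#include <cstdio>
#include <string>

int main() {
  // getpass() disables echo, reads a line, and returns a pointer to a static buffer.
  const char* entered = getpass("Password: ");
  if (entered == nullptr) {
    std::perror("getpass");
    return 1;
  }
  std::string password{entered};  // copy it out of the static buffer immediately
  // ... hash it right away (see below) and overwrite the plaintext when done ...
  std::printf("Read %zu characters (not echoed, not printed).\n", password.size());
  return 0;
}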

Next, don’t allow the user to overrun your input buffers — more on that later. Finally, never store the password in an unencrypted form. This is the part where the various websites that exposed my username and passwords utterly failed. You never need to store the password — instead, hash the password and store the hash. When you enter a password, either the server or the client hashes it (the client then transmits the hash over an encrypted channel such as TLS), and the server compares the result with the saved hash for your account. This is why your admin can never tell you your own password: they can’t reverse the hash.

This is an example of the devil being in the details: the security isn’t complicated, just detailed. The concept of password hashing is decades old. The user enters their password, the system hashes it immediately and compares the hash with what it has stored. If someone steals the system’s password file, they would need to generate passwords that happen to hash to the same values in the password file.

Simple in concept, but the details will get you. Early Unix systems used simple XOR-style hashing, so it was easy to create passwords that hashed to the same values, or even to reproduce the original password. Modern systems use a cryptographic hash such as SHA2-512. Even with a cryptographic hash, though, you can get a collision when two different users happen to use the same password. Modern systems add a salt value to your password. That salt value is usually a unique number stored with your username, so on most systems you need to steal both the password file and the file of salt values. Of course, if someone does break into your system, you’ll have had the wisdom to set the permissions on the password and salt files so only the application owner can even see them and read them.
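Here is a minimal sketch of the salt-and-hash idea, using OpenSSL’s SHA-512 and random-byte routines. It shows the concept only; a production system would use a deliberately slow, purpose-built password hash such as bcrypt, scrypt, or Argon2, and the password literal here is obviously hypothetical.

// Sketch only: salt a password, hash it with SHA-512, store salt + hash, never the password.
#include <openssl/rand.h>   // RAND_bytes
#include <openssl/sha.h>    // SHA512
#include <array>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hash salt + password and return the digest as hex.
static std::string hashPassword(const std::string& password,
                                const std::array<unsigned char, 16>& salt) {
  std::vector<unsigned char> salted(salt.begin(), salt.end());
  salted.insert(salted.end(), password.begin(), password.end());

  std::array<unsigned char, SHA512_DIGEST_LENGTH> digest{};
  SHA512(salted.data(), salted.size(), digest.data());

  std::ostringstream hex;
  for (unsigned char byte : digest) {
    hex << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(byte);
  }
  return hex.str();
}

int main() {
  std::array<unsigned char, 16> salt{};
  RAND_bytes(salt.data(), static_cast<int>(salt.size()));   // unique per-user salt

  std::string storedHash = hashPassword("correct horse battery staple", salt);
  std::cout << "store the salt and this hash, never the password: " << storedHash << "\n";

  // At login, hash the presented password with the stored salt and compare.
  bool ok = (hashPassword("correct horse battery staple", salt) == storedHash);
  std::cout << (ok ? "password accepted\n" : "password rejected\n");
  return 0;
}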

In short,

  1. Don’t echo sensitive information
  2. Don’t bother storing the unencrypted password
  3. Protect the hashed passwords.

We’re straying into systems administration and devops, so let’s get back to coding.

All of the deadly sins have fundamental common roots:

Do not execute data.

When you read something from the outside world, whether from a file, stream, or socket, don’t execute it. When you accept input from the outside world, think before you use it. Don’t allow buffer overruns. Do not embed input directly into a command without first escaping it or binding it to a named parameter. We all know the joke:

As a matter of fact my child is named
“; DELETE * FROM ACCOUNTS”
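The standard defense is exactly what the rule says: bind the input as a parameter instead of splicing it into the command text. Here is a minimal sketch using the SQLite C API; the accounts table and the unfortunate child’s name are hypothetical.

// Sketch only: parameter binding keeps hostile input as data, never as SQL.
#include <sqlite3.h>
#include <iostream>
#include <string>

int main() {
  sqlite3* db = nullptr;
  if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;

  sqlite3_exec(db, "CREATE TABLE accounts(name TEXT)", nullptr, nullptr, nullptr);

  // Pretend this hostile string arrived from the outside world.
  std::string childName = "\"; DELETE * FROM ACCOUNTS";

  // The '?' parameter marker means the input is never parsed as SQL.
  sqlite3_stmt* stmt = nullptr;
  sqlite3_prepare_v2(db, "INSERT INTO accounts(name) VALUES (?)", -1, &stmt, nullptr);
  sqlite3_bind_text(stmt, 1, childName.c_str(), -1, SQLITE_TRANSIENT);
  if (sqlite3_step(stmt) != SQLITE_DONE) {
    std::cerr << "insert failed: " << sqlite3_errmsg(db) << "\n";
  }
  sqlite3_finalize(stmt);
  sqlite3_close(db);
  return 0;
}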

A good way to avoid executing data is

Do not trespass.

“Do not trespass” means don’t refer to memory you may not own. Don’t overrun your array boundaries, don’t de-reference freed memory pointers, and pay attention to the number of arguments you pass into functions and methods. A common way of breaking into a system is overrunning an input buffer located in local memory until it overruns the stack frame. The data getting pushed into the buffer would be executable code. When the overrun overlaps the return pointer of the function, it substitutes an address in the overrun to get the CPU to transfer control to the payload in the buffer. A lot of runtime code is open source, so it takes only inspection to find the areas of code that can be exploited through this type of vulnerability. Modern CPUs and operating systems often place executable code in read-only areas to protect against accidental (or malicious) overwrites, and may even mark data areas as no-execute — but you can’t depend upon those features existing. Scan the database of known vulnerabilities at https://cve.mitre.org/cve/ to see if your system needs to be patched. Write your own code so it is not subject to this vulnerability.

Buffer overruns are perhaps the most famous of the data trespasses.

With C++ it is easy to avoid data trespasses. C++ functions and methods are strongly typed, so if you attempt to pass the wrong number of arguments, the code won’t even compile. This avoids a common C error of passing an inadequate number of arguments to a function, so that the function accesses random memory for the missing arguments.

Despite its strong typing, C++ requires care to avoid container boundary violations. std::vector::operator[] does not produce an exception when used to access beyond the end of a vector, nor does it extend the vector when you write beyond its end. std::vector::at() does produce exceptions on out-of-range accesses. Adding to the end of the vector with std::vector::push_back() may proceed until memory is exhausted or an implementation-defined limit is reached. I’m going to reserve memory management for another day. In the meantime, here is some example code demonstrating the behavior of std::vector:

// -*- mode: c++ -*-
////
// @copyright 2022 Glen S. Dayton. Permission granted to copy this code as long as this notice is included.

// Demonstrate accessing beyond the end of a vector

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <typeinfo>
#include <vector>

using namespace std;


int main(int /*argc*/, char* argv[]) {
  int returnCode = EXIT_FAILURE;

  try {
    vector< int> sample( 10,  42 );

    std::copy( sample.begin(),  sample.end(),  std::ostream_iterator< int>(cout,  ","));
    cout << endl;

    cout << "Length " << sample.size() << endl;
    cout << sample[12] << endl;       // undefined behavior: no bounds check, no exception thrown
    cout << sample.at( 12 ) << endl;  // throws std::out_of_range, so the lines below never execute
    cout << "Length " << sample.size() << endl;

    cout << sample.at( 12 ) << endl;

    returnCode = EXIT_SUCCESS;
  } catch (const exception& ex) {
    cerr << argv[0] << ": Exception: " << typeid(ex).name() << " " << ex.what() << endl;
  }
  return returnCode;
}

And its output:

42,42,42,42,42,42,42,42,42,42,
Length 10
0
/Users/gdayton19/Projects/containerexample/Debug/containerexample: Exception: St12out_of_range vector

C++ does not make it easy to limit the amount of input your program can accept into a string. The stream extraction operator, >>, does pay attention to a field width set with the stream’s width() method or the setw manipulator — but it stops accepting on whitespace. You must use a getline() of some sort to get a string with spaces, or use C++14’s quoted-string facility. Here’s an example of the extraction operator >>:

// -*- mode: c++ -*-
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <stdexcept>
#include <string>

using namespace std;


int main(int /*argc*/, char* argv[]) {
  int returnCode = EXIT_FAILURE;
  constexpr auto MAXINPUTLIMIT = 40U;
  try {
    string someData;
    cout << "String insertion operator input? " << flush;
    cin >> setw(MAXINPUTLIMIT) >> someData;
    cout << endl << "  This is what was read in: " << endl;
    cout << quoted(someData) << endl;

    // Discard the rest of line
    cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    cout <<  "Try it again with quotes: " << flush;
    cin >> setw(MAXINPUTLIMIT) >> quoted(someData);  
    cout << endl;

    cout << "  Quoted string read in: " << endl;
    cout << quoted(someData) << endl;
    cout << "Unquoted: " << someData <<  endl;

    cout << "Length of string read in: " << someData.size() << endl;

   returnCode = EXIT_SUCCESS;
  } catch (const exception& ex) {
    cerr << argv[0] << ": Exception: " << ex.what() << endl;
  }
  return returnCode;
}

And some sample output from it:

String insertion operator input? The quick brown fox jumped over the lazy dog.

  This is what was read in: 
"The"
Try it again with quotes: "The quick brown fox jumped over thge lazy dog."

  Quoted string read in: 
"The quick brown fox jumped over thge lazy dog."
Unquoted: The quick brown fox jumped over thge lazy dog.
Length of string read in: 46

The quoted() manipulator ignores the field width limit on input.

You need to use getline() to read complete unquoted strings with spaces. The getline() used with std::string, though, ignores the field width. Here is some example code using getline():

// -*- mode: c++ -*-
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>

using namespace std;

int main(int /*argc*/, char* argv[]) {
  int returnCode = EXIT_FAILURE;
  constexpr auto MAXINPUTLIMIT = 10U;
  try {
    string someData;
    cout << "String getline input? " << flush;
    cin.width(MAXINPUTLIMIT);   // This version of getline() ignores width.
    getline(cin, someData);
    cout << endl << "   This is what was read in: " << endl;
    cout << quoted(someData) << endl;
  
   returnCode = EXIT_SUCCESS;
  } catch (const exception& ex) {
    cerr << argv[0] << ": Exception: " << ex.what() << endl;
  }
  return returnCode;
}

And a sample run of the above code:

String getline input? The rain in Spain falls mainly on the plain.

   This is what was read in: 
"The rain in Spain falls mainly on the plain."

Notice the complete sentence was read in even though the field width was set to only 10 characters.

To limit the amount of input, we must resort to std::istream::getline():

// -*- mode: c++ -*-
#include <cstdlib>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>

using namespace std;

int main(int /*argc*/, char* argv[]) {
  int returnCode = EXIT_FAILURE;
  constexpr auto MAXINPUTLIMIT = 10U;

  char buffer[MAXINPUTLIMIT+1];
  memset(buffer,  0,  sizeof(buffer));

  try {
    cout << "String getline input? " << flush;
    cin.getline(buffer, sizeof(buffer));

    cout << endl << " This is what was read in: " << endl;
    cout << "\"" << buffer<< "\"" << endl;
  
   returnCode = EXIT_SUCCESS;
  } catch (const exception& ex) {
    cerr << argv[0] << ": Exception: " << ex.what() << endl;
  }
  return returnCode;
}

And its sample use:

String getline input? I have met the enemy and thems is us.

 This is what was read in: 
"I have met"

Notice the code only asks for 10 characters and it only gets 10 characters. I used a plain old C char array rather than a fancier C++ std::array<char, 10> because char doesn’t have a constructor, so the values of an array default-constructed that way are indeterminate. An easy way to make sure a C-style string is null terminated is to fill it with 0 using memset(). Of course, you could fill the array entirely with fill() from <algorithm>, but sometimes the more direct method is lighter, faster, and more secure.
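For completeness, value-initializing a std::array with empty braces zeroes its elements, so it can serve the same purpose without a memset(). A minimal sketch of the same bounded read using std::array:

// Sketch only: a zero-initialized std::array as the bounded input buffer.
#include <array>
#include <iostream>

int main() {
  constexpr auto MAXINPUTLIMIT = 10U;
  std::array<char, MAXINPUTLIMIT + 1> buffer{};    // empty braces zero every element

  std::cout << "String getline input? " << std::flush;
  std::cin.getline(buffer.data(), buffer.size());  // reads at most 10 characters plus the terminator

  std::cout << "\"" << buffer.data() << "\"" << std::endl;
  return 0;
}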