Several years ago, I wrote an article complaining about using bit flags in SQL. But one of the underlying problems has to do with language in answering simple yes/no questions. The first problem is that a question can be worded in a positive or negative fashion. For example, “do you have bananas?” is a positive question and “don’t you have any bananas?” is a negative wording.
In English, we always answer the question as if it was worded positively, instead of the way it’s written. Not everybody in the world does this. There’s a novelty song entitled “Yes, we have no bananas” which was popular in the early 1920s. It is supposedly based on a real incident when the songwriter heard a fruit stand owner answering a question. It sounded so funny to his English-speaking ears that he wrote a song about it.
Linguists define a five-way system of designing responses with a word or other linguistic device.
- Two-form languages: yes and no. This is how English and most European languages work today. If there are bananas at the fruit stand, then the answer is “yes”, no matter how the question was worded.
- Three-form languages: yes (to a positive question), yes (to a negative question) and no. Scandinavian languages have this construct for negatively phrased questions.
For example, let’s assume that we do have bananas. The question “do you have bananas?” would be answered, “Ja”, while the question “haven’t you any bananas?” would be answered “Jo” instead. If bananas do not exist, then both questions are answered “Nej” instead.
- Four-form languages: yea, yes, nay, no. This is an extension of the three-form language, but it includes a way of handling the cases of a negative or positive question and no bananas. This is the way that early modern English worked. The changes in simplification of English over the centuries is a good topic in itself, but it doesn’t have anything to do with data.
- Agreement languages: there are no words like yes and no, so you have to use a word, phrase or grammatical construction. In Japanese, you say “That is correct” (hai) versus “That is incorrect” (Iie) instead. This difference led to some problems with diplomats before World War II, negotiating surrender and war crimes trials at the end of World War II.
- Echo languages: no general words for yes or no, only positive or negative echo statements. You repeat the question back as a statement. Latin and Mandarin fall in this category. The advantage of this form is there’s no confusion as to what you’re affirming or denying.
In the real world, cultures are another huge problem. Some cultures don’t like to refuse anything to another person, so when they say “yes”, what they mean is “I heard you” and it has nothing to do with whether they agree or disagree with your statement. A horrible example of this mindset is watching some people install software in which they never answer “no” to an installation question. They are like dogs that have learned to shake hands, but they have no idea what they agree to when they do it.
And we haven’t even gotten the problem of people wanting “yes/no” answers to questions that are not “yes/no” questions! Yes, I have been listening to some of the Democratic Party candidate debates. You really can’t say “yes” or “no” to a question like “How does your health care plan work?” when asked by the moderator.
Finally, some languages put the negation in the verb form as opposed to a particle marker or separate word of some kind. In these languages, to do something and to not do something are both parts of the same verb.
Boolean functions should be familiar to the readers. Their input parameters are zeros and ones, and they return either a zero or one. This how we define logic circuits in computers. But the truth is that this is a more general tool; you can think of them as a series of yes/no questions that eventually return a yes or no output. If you’ve ever gone through a loan application or other questionnaire, you have seen a less “computer science-y” use of this tool.
Mathematicians talk about various properties of a Boolean function. The sensitivity of a Boolean function roughly measures the likelihood that flipping a single input bit will alter the output bit. Not all inputs carry equal weight. For example, if I tell my loan officer “My salary is over $500,000 per year”, then will I qualify for my loan, even though it’s a complete lie. The loan application is sensitive to that one question. But if I need to lie on seven or eight questions to get my loan, then the function is not as sensitive. Before you think this was a simple problem, it has stood for almost 30 years. Google “Sensitivity Conjecture of Nisan and Szegedy”, if you like to read math papers.
Sensitivity is usually one of the easiest complexity measures to compute, but it’s far from the only illuminating measure. For instance, instead of handing you a paper application, the loan application program could have started with a single question and then used your answer to determine what question to ask next. This is a decision tree. The largest number of questions the bank would ever need to ask before reaching a loan decision is the Boolean function’s query complexity.
This query complexity measure arises in a host of settings — for instance, a doctor might want to use as few tests as possible before reaching a diagnosis, or a machine learning expert might want an algorithm to examine as few features of an object as possible before classifying it. Low query complexity is usually a good thing, but it is not always possible.
There’s a whole bunch of theorems and algorithms for putting a Boolean expression into Conjunctive Normal Form (CNF) or Disjunctive Normal Form (DNF). In English, this means we join things with ANDs or with Ors. By putting Boolean expressions in one of these canonical forms, we can do calculations to look for properties much more easily. It’s also always possible to get a Boolean expression into one of these canonical forms.
Over the years, computer scientists have developed many ways to measure a given Boolean function. Each measure captures a different aspect of how the information in the input string determines the output bit. Each measure provides a unique window into the structure of the Boolean function. Yet computer scientists have found that nearly all these measures fit into a unified framework so that the value of any one of them is a rough gauge for the value of the others. Only one complexity measure that didn’t seem to fit in was sensitivity.
Now Hao Huang, a mathematician at Emory University, has proved the sensitivity conjecture with an ingenious but elementary two-page argument about the combinatorics of points on cubes. This just happened on 2019-07-01! This is pretty recent, and I don’t expect to use it in any applications immediately. But I do expect to see that it’s going to get incorporated into query optimizers in the next year or two.
Other measures involve looking for the simplest way to write the Boolean function as a mathematical expression (query optimization and re-writing) or calculating how many answers the banker would have to show a boss to prove they had made the right loan decision. There’s even a quantum physics version of query complexity in which the banker can ask a “superposition” of several questions at the same time. Figuring out how this measure relates to other complexity measures has helped researchers understand the limitations of quantum algorithms.
The Moral to The Story
There are a few quick morals to this article:
- It’s hard to ask a good clear question when you’re looking for data.
- It’s hard to give a good clear answer to a question, even when it seems simple.
- Complicated, complex questions will have complicated, complex solutions by their nature.