It has been said that Artificial Intelligence will define the next generation of software solutions. If you are even remotely involved with technology, you will almost certainly have heard the term with increasing regularity over the last few years. It is likely that you will also have heard different definitions for Artificial Intelligence offered, such as:
“The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.” – Encyclopedia Britannica
“Intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans.” – Wikipedia
How useful are these definitions? What exactly are “tasks commonly associated with intelligent beings”? For many people, such definitions can seem too broad or nebulous. After all, there are many tasks that we can associate with human beings! What exactly do we mean by “intelligence” in the context of machines, and how is this different from the tasks that many traditional computer systems are able to perform, some of which may already seem to have some level of intelligence in their sophistication? What exactly makes the Artificial Intelligence systems of today different from sophisticated software systems of the past?
It could be argued that any attempt to try to define “Artificial Intelligence” is somewhat futile, since we would first have to properly define “intelligence”, a word which conjures a wide variety of connotations. Nonetheless, this article attempts to offer a more accessible definition for what passes as Artificial Intelligence in the current vernacular, as well as some commentary on the nature of today’s AI systems, and why they might be more aptly referred to as “intelligent” than previous incarnations.
Firstly, it is interesting and important to note that the technical difference between what was referred to as Artificial Intelligence more than 20 years ago and traditional computer systems is close to zero. Prior attempts to create intelligent systems, known at the time as expert systems, involved the complex implementation of exhaustive rules that were intended to approximate intelligent behavior. For all intents and purposes, these systems did not differ from traditional computers in any drastic way other than having many thousands more lines of code. The problem with trying to replicate human intelligence in this way was that it required far too many rules and ignored something fundamental to the way intelligent beings make decisions, which is very different from the way traditional computers process information.
Let me illustrate with a simple example. Suppose I walk into your office and I say the words “Good Weekend?” Your immediate response is likely to be something like “yes” or “fine thanks”. This may seem like very trivial behavior, but in this simple action you will have immediately demonstrated a behavior that a traditional computer system is completely incapable of. In responding to my question, you have effectively dealt with ambiguity by making a prediction about the correct way to respond. It is not certain that by saying “Good Weekend” I actually intended to ask you whether you had a good weekend. Here are just a few possible intents behind that utterance:
◈ Did you have a good weekend?
◈ Weekends are good (generally).
◈ I had a good weekend.
◈ It was a good football game at the weekend, wasn’t it?
◈ Will the coming weekend be a good weekend for you?
And more.
The most likely intended meaning may seem obvious, but suppose that when you responded with “yes”, I had replied with “No, I mean it was a good football game at the weekend, wasn’t it?”. It would have been a surprise, but without even thinking, you would absorb that information into a mental model, correlate the fact that there was an important game last weekend with the fact that I said “Good Weekend?”, and adjust the probability of the expected response accordingly, so that you respond correctly the next time you are asked the same question. Granted, those aren’t the thoughts that will pass through your head! You happen to have a neural network (aka “your brain”) that will absorb this information automatically and learn to respond differently next time.
The key point is that even when you do respond next time, you will still be making a prediction about the correct way in which to respond. As before, you won’t be certain, but if your prediction fails again, you will gather new data, which leads to my suggested definition of Artificial Intelligence, as it stands today:
“Artificial Intelligence is the ability of a computer system to deal with ambiguity, by making predictions using previously gathered data, and learning from errors in those predictions in order to generate newer, more accurate predictions about how to behave in the future”.
This is a somewhat appropriate definition of Artificial Intelligence because it is exactly what AI systems today are doing, and more importantly, it reflects an important characteristic of human beings which separates us from traditional computer systems: human beings are prediction machines. We deal with ambiguity all day long, from very trivial scenarios such as the above, to more convoluted scenarios that involve playing the odds on a larger scale. This is in one sense the essence of reasoning. We very rarely know whether the way we respond to different scenarios is absolutely correct, but we make reasonable predictions based on past experience.
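Abstractly, the loop captured in this definition can be sketched in a few lines of R. The sketch below is a toy illustration, not the model built later in this article: it predicts the most frequent past response to a stimulus and falls back on a default when it has no data.

history <- data.frame(stimulus=character(), response=character(), stringsAsFactors=FALSE)
respond <- function(stimulus) {
  seen <- history$response[history$stimulus == stimulus]
  if (length(seen) == 0) return("yes")   # no data yet, so fall back on a default prediction
  names(which.max(table(seen)))          # otherwise predict the most frequent past response
}
# After each exchange, record the response that turned out to be correct,
# so that the next prediction reflects the updated data:
history <- rbind(history, data.frame(stimulus="Good Weekend?", response="Go England!!"))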
Just for fun, let’s illustrate the earlier example with some code in R! If you are not familiar with R, the comments in the code below should be enough to follow along. First, let’s start with some data that represents information in your mind about when a particular person has said “good weekend?” to you.
In this example, we are saying that GoodWeekendResponse is our score label (i.e. it denotes the appropriate response that we want to predict). For modelling purposes, there have to be at least two possible values; in this case they are “yes” and “no”. For brevity, the response in most cases is “yes”.
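The data file itself is not reproduced here, so for illustration assume that c:/AI/greetings.csv contains something along these lines (the column names are taken from the code that follows; the rows are made up):

FootballGamePlayed,WorldCup,EnglandPlaying,GoodWeekendResponse
No,No,No,Yes
Yes,No,No,Yes
No,No,No,Yes
Yes,Yes,No,Yes
No,No,No,No
No,No,No,Yes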
We can fit the data to a multinomial logistic regression model:
library(VGAM)  # provides vglm() for fitting multinomial models
greetings <- read.csv('c:/AI/greetings.csv', header=TRUE)  # the historical exchanges
fit <- vglm(GoodWeekendResponse~., family=multinomial, data=greetings)  # predict the response from all other columns
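As a quick sanity check (an addition for illustration, not part of the original walkthrough), we can ask the fitted model for its prediction on an ordinary weekend; the input values here are made up:

ordinary <- data.frame(FootballGamePlayed="No", WorldCup="No", EnglandPlaying="No")
predict(fit, ordinary, type="response")  # one probability per possible response; "Yes" should dominate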
Now what happens if we try to make a prediction with that model, where the expected response is different from anything we have previously recorded? In this case, I am expecting the response to be “Go England!!”. Below is some more code to add the prediction; for illustration we just hardcode the new input data, and the output follows the code:
# The new observation: a football weekend where the correct response turns out to be "Go England!!"
response <- data.frame(FootballGamePlayed="Yes", WorldCup="Yes", EnglandPlaying="Yes", GoodWeekendResponse="Go England!!")
greetings <- rbind(greetings, response)  # fold the actual response back into the data
fit <- vglm(GoodWeekendResponse~., family=multinomial, data=greetings)  # refit on the updated data
prediction <- predict(fit, response, type="response")  # probabilities for each possible response
prediction
index <- which.max(prediction)  # pick the most probable response...
df <- colnames(prediction)
df[index]  # ...and look up its label
            No Yes Go England!!
1 3.901506e-09 0.5          0.5
> index <- which.max(prediction)
> df <- colnames(prediction)
> df[index]
[1] "Yes"
The initial prediction “yes” was wrong, but note that in addition to predicting against the new data, we also incorporated the actual response back into our existing model. Also note that the new response value “Go England!!” has been learnt, with a probability of 50 percent based on the current data. If we run the same piece of code again, the probability that “Go England!!” is the right response based on prior data increases, so this time our model chooses to respond with “Go England!!”, because it has finally learnt that this is most likely the correct response!
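To be concrete, “running the same piece of code again” simply appends a second copy of the new observation and refits the model:

greetings <- rbind(greetings, response)  # a second sighting of the same exchange
fit <- vglm(GoodWeekendResponse~., family=multinomial, data=greetings)
prediction <- predict(fit, response, type="response")
prediction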
            No       Yes Go England!!
1 3.478377e-09 0.3333333    0.6666667
> index <- which.max(prediction)
> df <- colnames(prediction)
> df[index]
[1] "Go England!!"
Do we have Artificial Intelligence here? Well, clearly there are different levels of intelligence, just as there are with human beings, and there is a good deal of nuance missing here. Nonetheless, this very simple program is able to react, with limited accuracy, to data coming in related to one very specific topic, as well as learn from its mistakes and make adjustments based on predictions, without the need to develop exhaustive rules to account for the different responses that are expected for different combinations of data. This is the same principle that underpins many AI systems today, which, like human beings, are mostly sophisticated prediction machines. The more sophisticated the machine, the more it is able to make accurate predictions based on the complex array of data used to train its models, and the most sophisticated AI systems of all are able to continually learn from faulty assertions in order to improve the accuracy of their predictions, thus exhibiting something approximating human intelligence.
Machine learning
You may be wondering, based on this definition, what the difference is between machine learning and Artificial Intelligence. After all, isn’t this exactly what machine learning algorithms do: make predictions based on data using statistical models? This very much depends on the definition of machine learning, but ultimately most machine learning algorithms are trained on static data sets to produce predictive models, so machine learning algorithms only facilitate part of the dynamic in the definition of AI offered above. Additionally, machine learning algorithms, much like the contrived example above, typically focus on specific scenarios, rather than working together to create the ability to deal with ambiguity as part of an intelligent system. In many ways, machine learning is to AI what neurons are to the brain: a building block of intelligence that can perform a discrete task, but that may need to be part of a composite system of predictive models in order to really exhibit the ability to deal with ambiguity across an array of behaviors that approximate intelligent behavior.
Practical applications
There are a number of practical advantages to building AI systems, but as discussed and illustrated above, many of them pivot around “time to market”. AI systems enable the embedding of complex decision making without the need to build exhaustive rules, which traditionally can be very time consuming to procure, engineer and maintain. Developing systems that can “learn” and “build their own rules” can significantly accelerate organizational growth.
Microsoft’s Azure cloud platform offers an array of discrete and granular services in the AI and Machine Learning domain that allow AI developers and Data Engineers to avoid reinventing wheels and to consume reusable APIs. These APIs allow AI developers to build systems which display the type of intelligent behavior discussed above.
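As a purely hypothetical sketch of what consuming such an API from R might look like, the endpoint path, key, and request body below are placeholders rather than a real Azure contract; consult the current Azure documentation for actual endpoints and request formats:

library(httr)  # a general-purpose HTTP client for R

# Placeholder endpoint and key: substitute the values from your own Azure resource
endpoint <- "https://<your-resource>.cognitiveservices.azure.com/<service-path>"
result <- POST(endpoint,
               add_headers(`Ocp-Apim-Subscription-Key` = "<your-key>"),
               body = list(documents = list(list(id = "1", text = "Good weekend?"))),
               encode = "json")
content(result)  # the service's prediction, parsed from the JSON response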