A brief history of the three paradigms in the field of AI

Lei Feng Press: the author is Tomasz Malisiewicz, a CMU PhD. This article introduces the three paradigms of the AI field: logic, probabilistic methods, and deep learning.

Today we look back at the three paradigms that have shaped the field of artificial intelligence (AI) over the past 50 years: logic, probabilistic methods, and deep learning. Big data and deep learning, both empirical, "data-driven" approaches, are now deeply rooted in people's minds, but it was not always this way. Many early AI methods were based on logic, and the transition from logic to data-driven methods was heavily influenced by probability theory. Let's walk through that process.

This article proceeds in chronological order: it first reviews logic and probabilistic graphical models, and then offers some predictions about the future direction of artificial intelligence and machine learning.

Figure 1: Probabilistic graphical models course, image from Coursera

I. Logic and algorithms (common-sense "thinking" machines)

Many early artificial intelligence tasks were concerned with logic, automatic theorem proving, and manipulating symbols of all kinds. John McCarthy's seminal 1959 paper "Programs with Common Sense" is representative of this line of work.

If we open one of the most popular AI textbooks in the world, Artificial Intelligence: A Modern Approach (AIMA), we immediately notice that the book opens with search, constraint satisfaction, first-order logic, and planning. The cover of the third edition (see the image below) looks like a giant chessboard (chess being a symbol of human intellect), and among its images symbolizing wisdom it features photos of Alan Turing (the father of computing theory) and Aristotle (one of the greatest classical philosophers).

Figure 2: The cover of AIMA, the canonical textbook for undergraduate AI courses in computer science

However, logic-based AI sidesteps the problem of perception, and I argued long ago that understanding how perception works is the golden key to unlocking the mystery of intelligence. Perception is one of those things that is easy for humans yet hard for machines. (Further reading: "Computer Vision is Artificial Intelligence," the author's 2011 blog post.) Logic is pure, and the traditional chess-playing robot is purely algorithmic, but the real world is ugly, dirty, and full of uncertainty.

I think most contemporary artificial intelligence researchers believe that logic-based AI is dead. A world in which everything can be perfectly observed without measurement error is not the real world in which robots and big data live. We live in the era of machine learning, where numerical techniques triumph over first-order logic. Standing in 2015, I can only pity those who stubbornly cling to the former while turning their backs on gradient descent.

Logic is clean and well suited to the classroom, and I suspect that once enough perception problems become "essentially solved," we will see a resurgence of logic. There will be plenty of open perception problems in the future, and thus plenty of scenarios in which the community no longer needs to worry about perception and can begin revisiting these classical ideas. Perhaps around 2020.

II. Probability, statistics, and graphical models ("measuring" machines)

Probabilistic methods in artificial intelligence arose from the need to deal with uncertainty. The middle part of Artificial Intelligence: A Modern Approach is devoted to "uncertain knowledge and reasoning" and gives a vivid introduction to these methods. If you are picking up AIMA for the first time, I suggest you start with this section. And if you are a student just getting into AI, do not skimp on the math.

Figure 3: Probability theory and mathematical statistics, from a Pennsylvania State University PDF

When probabilistic methods come up, most people think of counting, and it is easy for laypeople to assume that probabilistic methods are just fancy ways of counting. Let us briefly review the two rival schools of statistical thinking.

The frequentist approach is heavily empirical: these methods are data-driven and make inferences solely from data. The Bayesian approach is more sophisticated: it combines a data-driven likelihood with a prior. These priors often come from first principles or "intuition," and the Bayesian approach is good at combining data with heuristic thinking to build smarter algorithms, a beautiful blend of the rationalist and empiricist worldviews.
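To make the contrast concrete, here is a minimal sketch (my own illustration, not from the original article) of the two estimates for a coin-flip experiment: the frequentist estimate uses only the data, while the Bayesian estimate combines the data with a Beta prior. The specific flips and prior parameters are made-up assumptions.

```python
import numpy as np

# Observed coin flips: 1 = heads, 0 = tails (toy data, purely illustrative).
flips = np.array([1, 1, 1, 0, 1])

# Frequentist estimate: rely solely on the data.
freq_estimate = flips.mean()  # 4/5 = 0.8

# Bayesian estimate: combine the likelihood with a Beta(2, 2) prior,
# which encodes the "intuition" that the coin is probably close to fair.
alpha_prior, beta_prior = 2.0, 2.0
alpha_post = alpha_prior + flips.sum()                 # prior + observed heads
beta_post = beta_prior + len(flips) - flips.sum()      # prior + observed tails
bayes_estimate = alpha_post / (alpha_post + beta_post) # posterior mean

print(f"frequentist: {freq_estimate:.2f}, Bayesian: {bayes_estimate:.2f}")
# With only 5 flips the prior pulls the estimate toward 0.5;
# as more data accumulates, the two estimates converge.
```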

What is most exciting, and what came after the frequentist-versus-Bayesian debate, is something called probabilistic graphical models. This class of techniques comes from computer science, and although machine learning is now a core part of both CS and statistics, its real power is unleashed when statistics and computation are combined.

Probabilistic graphical models are a marriage of graph theory and probabilistic methods, and in the mid-2000s they became wildly popular among machine learning researchers. When I was in graduate school (2005-2011), variational methods, Gibbs sampling, and belief propagation were drilled into the brain of every CMU graduate student and gave us an excellent mental framework for thinking about machine learning problems. Most of what I know about graphical models comes from Carlos Guestrin and Jonathan Huang. Carlos Guestrin is now the CEO of GraphLab (now Dato), a company that builds large-scale products for machine learning on graphs. Jonathan Huang is now a senior researcher at Google.
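As a small taste of what those inference algorithms look like in practice, below is a minimal Gibbs-sampling sketch (my own illustration, not from the article) for a toy two-variable model; the joint table and all numbers are invented for the example.

```python
import random

# A tiny graphical model over two binary variables (x, y).
# The joint is given by an unnormalized table -- purely illustrative numbers.
joint = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}

def sample_conditional(fixed_var, fixed_val):
    """Sample the free variable given the other one, e.g. p(x | y)."""
    if fixed_var == "y":
        weights = [joint[(0, fixed_val)], joint[(1, fixed_val)]]
    else:
        weights = [joint[(fixed_val, 0)], joint[(fixed_val, 1)]]
    return random.choices([0, 1], weights=weights)[0]

def gibbs(num_samples=10000, burn_in=1000):
    x, y = 0, 0
    samples = []
    for t in range(num_samples + burn_in):
        x = sample_conditional("y", y)  # resample x given the current y
        y = sample_conditional("x", x)  # resample y given the current x
        if t >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs()
# The empirical frequency of (1, 1) should approach 4/10 = 0.4.
print(sum(1 for s in samples if s == (1, 1)) / len(samples))
```

Belief propagation and variational methods trade this sampling loop for deterministic message passing or optimization, but the underlying graphical-model formulation is the same.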

Although the video below is an overview of GraphLab, it also perfectly illustrates "graph thinking" and how modern data scientists can put it to use. Carlos is an excellent lecturer, and his talk is not limited to the company's products; it is more about ideas for the next generation of machine learning systems.

Figure 4: Computational methods for probabilistic models, presented by Dato CEO Prof. Carlos Guestrin

If you think deep learning can solve all of your machine learning problems, you really should watch the video above. If you are building a recommendation system, a health-data analytics platform, a new trading algorithm, or a next-generation search engine, graphical models are a perfect starting point.

III. Deep learning and machine learning (data-driven machines)

Machine learning is about learning from examples, and today's state-of-the-art recognition techniques require lots of training data, a deep neural network, and patience. Deep learning emphasizes the network architecture behind today's successful machine learning algorithms: these methods are based on "deep" multilayer neural networks with many hidden layers. Note that what I want to emphasize is that, as of today (2015), deep architectures are no longer anything new. Just look at the following "deep" architecture paper from 1998.
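To make "deep, multilayer, with many hidden layers" concrete, here is a minimal forward-pass sketch (my own illustration, not LeCun's architecture); the layer sizes and the ReLU activation are arbitrary assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, layers):
    """Forward pass through a stack of fully connected layers (ReLU on hidden layers)."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)   # hidden layer: affine transform + nonlinearity
    W_out, b_out = layers[-1]
    return h @ W_out + b_out  # output layer: raw class scores

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 64, 10]  # input, three hidden layers, output (arbitrary)
layers = [(rng.normal(scale=0.01, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, 784))    # a dummy "image" flattened into a vector
print(forward(x, layers).shape)  # -> (1, 10)
```

LeNet-5 itself replaces the fully connected layers nearest the input with convolution and subsampling layers, which is what made training on images practical in 1998.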

Figure 5: LeNet-5, from Yann LeCun's groundbreaking paper "Gradient-Based Learning Applied to Document Recognition"

When you read the LeNet model tutorial, you come across the following disclaimer:

To run this example on a GPU, you first need a good GPU with at least 1GB of memory. More may be needed if your monitor is connected to the GPU.

When the GPU is connected to the monitor, there is a limit of a few seconds per GPU function call. This is necessary because current GPUs cannot keep serving the display while doing computation; without this limit, the screen would freeze for so long that it would look as if the computer had crashed. This example hits that limit on medium-quality GPUs. When the GPU is not connected to a monitor, there is no time limit. You can lower the batch size to fix the timeout problem.

I'm really curious how Yann managed to train some of his deep models as far back as 1998. Not surprisingly, it took the rest of us another decade or so to digest this work.

Update: Yann mentioned (in a Facebook comment) that the ConvNet work dates back to 1989. "It had about 400K connections and took about 3 weeks to train on the USPS dataset (8000 training samples) on a SUN4 machine." - LeCun

Figure 6: A deep network, Yann's work at Bell Labs, 1989

Note: Around the same time (circa 1998), two crazy guys in California were trying to cache the entire Internet on their computers (they founded a company whose name starts with a G). I don't know how they did it, but I suppose sometimes you have to do things that don't scale before you can achieve something great. Eventually the world catches up.

Conclusion

I don't see traditional first-order logic making a quick comeback. Despite all the hype behind deep learning, the impact of distributed systems and "graph thinking" on data science is likely to be far more profound than CNNs alone. There is no reason deep learning cannot be combined with a GraphLab-style architecture, and major breakthroughs in machine learning over the coming decades may well come from combining these two components.

Lei Feng Note: This article was translated by a reader and is published with the authorization of Leifeng.com (search for the "Lei Feng Net" public account). For reprints, please contact the site for authorization, retain the source and author, and do not delete any content.
