Speaker 1 (00:05):
This short presentation will give you an overview of When Deep Learning is Used and discuss when you actually need to build deep learning models from scratch. Deep learning takes its name from Deep Neural Networks. Artificial Neural Networks were inspired by progress in brain research and have been discussed since the 1950s. Here is a picture of a biological neuron, and this is an analogous picture of an artificial neuron. It has some input connections with different weights: each input is multiplied by its weight, the products are summed, and the sum is passed through a function called an "activation function" to produce the output.
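To make the "inputs times weights, summed, passed through an activation function" description concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy; the weights, bias, and sigmoid activation are illustrative choices, not values from the presentation.

```python
import numpy as np

def sigmoid(z):
    # A common activation function that squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(inputs, weights, bias):
    # Multiply each input by its connection weight, sum everything up,
    # then pass the result through the activation function.
    weighted_sum = np.dot(inputs, weights) + bias
    return sigmoid(weighted_sum)

# Example: three input connections with illustrative weights.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.5])
print(artificial_neuron(x, w, bias=0.1))
```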
This is a Single Layer Perceptron. The idea dates back to the late 1950s, and Frank Rosenblatt was the person who created it. This particular machine had only 20 by 20 photocells and a single layer, so 20 by 20 means a 400-pixel image. To change the weights on the connections, it used electric motors to adjust mechanical variable resistors (potentiometers). So this is a Single Layer Perceptron, and this is a Deep Neural Network: it has some inputs, it has outputs, and in the middle it has hidden layers. This is the basic structure of a feedforward neural network.
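As a rough sketch of the "inputs, hidden layers, outputs" structure just described, here is a small feedforward network written in PyTorch; the layer sizes (a 400-pixel input, two hidden layers, 10 outputs) are illustrative choices, not taken from the presentation.

```python
import torch
import torch.nn as nn

# A small feedforward (fully connected) network: input layer,
# two hidden layers, and an output layer. Layer sizes are illustrative.
model = nn.Sequential(
    nn.Linear(400, 128),   # e.g. a flattened 20x20 image -> 128 hidden units
    nn.ReLU(),
    nn.Linear(128, 64),    # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer, e.g. 10 classes
)

x = torch.randn(1, 400)    # one fake 400-pixel input
print(model(x).shape)      # torch.Size([1, 10])
```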
There has been a lot of progress over the last 30 years. Yann LeCun, a very talented person, had a big breakthrough. Currently he's the main AI person at Facebook, and he's also a professor at New York University. I think he worked with Hinton at some point, but this work was done in New Jersey at AT&T Labs, where he created Convolutional Neural Networks, which were used to recognize handwritten digits. They were used in banks after that, and it was a great success. Computers were very slow, and he used specialized computer equipment. Since then, a lot of progress has been made at the University of Toronto, Canada, in the laboratory of Dr. Geoffrey Hinton. In 2006, there was a seminal work in which a deep neural network, fully connected between layers, was trained. It's called the Deep Belief Network.
Several years later, while working in Hinton's lab, Alex Krizhevsky created a new version of LeNet. LeNet is the network by Yann LeCun; that's why it's called LeNet, after LeCun, and Alex called his network AlexNet. The big difference is that he used GPUs, video cards, which made the whole thing about a thousand times faster and became the major breakthrough and the main approach. After that, everybody started using GPUs and specialized chips like Google's TPUs, tensor processing units, and so on. Progress developed very quickly after that. We had tremendous growth and acceptance, and people started doing AI, which had now become possible. (When people say "AI," it is the same deep learning and machine learning, but dealing with activities that are usually associated with human-level intelligence and human abilities, like the ability to read, understand speech, understand images and video, drive cars, extract information from texts, and so on.)
There are multiple architectures for how neural networks can be built, and they have become big over the years. In 2020, the biggest network was GPT-3. It's a language model from OpenAI, and it has 175 billion parameters. If each parameter is a four-byte float, that's about 700 GB, close to a terabyte of data, just to store the parameters, never mind how many thousands of computers and millions of dollars you would need to train the network, deploy it, and use it. So, multiple network architectures were designed over the years. All major clouds, including Google Cloud, Microsoft Azure, and AWS, provide models. More than that, they all provide architectures that are pre-trained, so you don't have to train your network on millions of images. You can just use a model that is already pre-trained and maybe fine-tune it on a couple hundred images that you have for your specific task.
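To illustrate the "pre-trained model plus a little fine-tuning" idea, here is a minimal transfer-learning sketch assuming PyTorch and a recent torchvision; the five-class head, the frozen layers, and the optimizer settings are illustrative assumptions, not part of the presentation.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network that was already pre-trained on millions of images (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained layers so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for your task (say, 5 classes).
num_classes = 5  # illustrative value
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new layer's parameters go to the optimizer; fine-tuning on
# a couple hundred labeled images of your own would happen here.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```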
This makes models very accessible, affordable, and practical, and it's very fast to start using them. Also, many models are provided as part of so-called cognitive services, where they're already trained and you get API access. For example, you can send an audio file and receive back a text transcription. You don't have to train anything; you just use models that are already trained and ready to go. Very rarely do you need to do the heavy lifting and train models from scratch.
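To illustrate the "send an audio file, receive a transcription" pattern, here is a hedged sketch using Python's requests library against a hypothetical speech-to-text endpoint; the URL, header, and response format are placeholders, since every cloud provider has its own API.

```python
import requests

# Hypothetical endpoint and API key -- each provider has its own URL,
# authentication scheme, and response format.
ENDPOINT = "https://speech.example.com/v1/transcribe"
API_KEY = "your-api-key"

with open("meeting.wav", "rb") as audio_file:
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"audio": audio_file},
    )

response.raise_for_status()
print(response.json())  # e.g. {"text": "..."} -- the transcription
```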
When people talk about Machine Learning, the most common data science tasks actually deal with tabular data. Think about Excel spreadsheets or relational databases: you have rows and columns, and there is a lot of data science that can be done with this kind of data. There is predictive analytics. People love to make predictions in business: risk assessment, forecasting, maybe seasonal changes, trends, investments, optimization of processes, cutting costs, recommender systems, and so on. All of this deals with tabular data. A lot of mathematical progress has been made on how to actually solve these problems, and they are usually solved by classical Machine Learning methods, like linear regression, logistic regression, and decision-tree ensembles such as Random Forests and boosted trees like XGBoost.
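As a concrete picture of this classical tabular workflow, here is a minimal scikit-learn sketch with a Random Forest; the built-in breast cancer dataset is only a stand-in for your own rows and columns.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Tabular data: rows are samples, columns are features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A classical ensemble of decision trees -- no deep learning needed.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```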
There are many tasks that can be done using cognitive services. These are ready-to-use services. The problem, though, is that they may be costly. You can try them first and see whether it is possible to do what you want to do. You can see, for example, whether the quality of your data is good enough to do something with. But if the final result is not suitable for production because it's too expensive, you can use open source tools on Linux servers. You can prototype them on the cloud, or you can go to your own data center and your own private cloud. When you're dealing with Deep Learning, it's very important to have GPU-enabled servers, which speed up the calculations a hundred to a thousand times. A go-to approach is to use open source Machine Learning software: Kubeflow, developed at Google, which is a framework to run Machine Learning and Deep Learning models; TensorFlow, also from Google; Scikit-Learn, which is a good, independent library; or PyTorch, which is from Facebook. There are many others.
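Because GPU-enabled servers come up as a requirement here, below is a small PyTorch snippet showing the usual pattern of checking for a GPU and placing a model and data on it; it is a generic sketch, not tied to Kubeflow or any particular cloud.

```python
import torch
import torch.nn as nn

# Use the GPU if the server has one; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("training on:", device)

model = nn.Linear(100, 10).to(device)         # move the model to the GPU
batch = torch.randn(32, 100, device=device)   # and the data as well
output = model(batch)
print(output.shape)
```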
So, we're talking about using a private cloud or private data center with GPU-enabled servers. Also, there are certain areas where custom models are still required, and you still need to train custom models, possibly from scratch. Examples may be related to detection of defects in manufacturing, fraud detection, or anomaly detection. In most of these cases, it's possible to start from pre-trained models, which makes training cheaper and faster, but you still need to train them. In conclusion, most data science tasks are based on tabular data and don't require Deep Learning at all. For the tasks where deep learning is needed, it can be prototyped very quickly on major clouds using pre-trained models and transfer learning. For cost savings, the Deep Learning models can be moved to a private cloud. Thank you.