How good is your fit? - Ep. 21 (Deep Learning SIMPLIFIED)

A good model follows the “Goldilocks” principle of data fitting: a model that underfits the data will have poor accuracy, while a model that overfits it will fail to generalize to new data. A model that is “just right” avoids both problems.

Deep Learning TV on
Facebook: https://www.facebook.com/DeepLearningTV/
Twitter: https://twitter.com/deeplearningtv

Suppose you are trying to classify big cats based on features such as claw size, sex, body dimensions, bite strength, color, speed, and the presence of a mane. Due to various deficiencies in the training process and data set, the resultant model may fail to fully differentiate the various types of cats. For example, a rule-based model may predict that any cat with a mane that roars is a lion, while ignoring that if such a cat is female, it is technically referred to as a lioness. Since this model does not take all necessary features into account while performing classification, it is said to underfit the data. The problem of underfitting is solved by adding more detail to the model to ensure that it properly captures the differences between classes.
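To make the underfitting example concrete, here is a minimal, purely illustrative Python sketch (the feature names and sample records are hypothetical, chosen to mirror the lion/lioness example above, and are not from the episode):

```python
# Hypothetical rule-based classifier that underfits: it only looks at
# two features (mane and roar) and ignores sex entirely.
def underfit_rule(cat):
    if cat["has_mane"] and cat["roars"]:
        return "lion"
    return "not a lion"

samples = [
    {"has_mane": True, "roars": True, "sex": "male"},    # a lion
    {"has_mane": True, "roars": True, "sex": "female"},  # technically a lioness
]

for s in samples:
    print(s["sex"], "->", underfit_rule(s))
# Both cats are labeled "lion" because the rule never consults the
# "sex" feature: the model is too simple to capture that distinction.
```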

On the other hand, a model may also consider every possible detail and develop very specific, complex rules for classification. For example, if one data point represents a lion that is 3.5 feet tall, weighs 305 pounds, and has 2.9-inch claws, the model may develop a rule that classifies every 3.5-foot-tall, 305-pound cat with 2.9-inch claws as a lion. Such rules will accurately classify the training data, but will generalize poorly to new samples. A model that develops these kinds of rules is said to overfit the data. In other words, the model has failed to identify the true patterns that differentiate the classes. As a separate example, if the data only contained tigers that grew up in a zoo, the model may have difficulty classifying tigers that grew up in the wild. So while improving data collection helps prevent this problem, the model must also be designed to identify the most important patterns that define a class, so that new samples can be properly classified.
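A similarly hypothetical sketch of the overfitting case, where the “model” has effectively memorized one training record (the exact measurements below come from the example in the text, not from real data):

```python
# Hypothetical over-specific rule: it matches one memorized training
# record exactly, so it cannot recognize any other lion.
def overfit_rule(cat):
    if (cat["height_ft"] == 3.5
            and cat["weight_lb"] == 305
            and cat["claw_in"] == 2.9):
        return "lion"
    return "unknown"

train_lion = {"height_ft": 3.5, "weight_lb": 305, "claw_in": 2.9}
new_lion   = {"height_ft": 3.6, "weight_lb": 298, "claw_in": 2.8}

print(overfit_rule(train_lion))  # "lion"    - perfect on the training data
print(overfit_rule(new_lion))    # "unknown" - fails to generalize
```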

With regard to neural networks, overfitting typically stems from too many input features or an overly complicated network configuration. If the input count is too large, the training process may start to assign weights to features that either aren't needed or add unnecessary complexity to the model. An overly complicated configuration may lead to the development of specific rules that improperly relate many different features, resulting in poor generalization.
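As a rough sketch of what an overly complicated configuration can look like in practice, the snippet below (assuming TensorFlow/Keras and the hypothetical seven-feature big-cat data set described earlier) contrasts an over-parameterized network with a much smaller one; the layer sizes are arbitrary illustrations, not recommendations from the episode:

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features = 7  # claw size, sex, body dimensions, bite strength, color, speed, mane

# Overly complicated: far more parameters than a small data set can
# support, so training is likely to memorize noise in the samples.
overly_complex = tf.keras.Sequential([
    layers.Dense(1024, activation="relu", input_shape=(n_features,)),
    layers.Dense(1024, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# A simpler configuration whose capacity is closer to the problem size.
simpler = tf.keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(n_features,)),
    layers.Dense(1, activation="sigmoid"),
])

print(overly_complex.count_params(), "parameters vs", simpler.count_params())
```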

Overfitting is a common problem in data science. One popular way to reduce it is to use a cross-validation data set along with parameter averaging. For neural networks, a common method is regularization. There are different types, such as L1 and L2, but each follows the same principle – penalize the model for letting weights and biases become too large. Another method is Max Norm constraints, which directly add a size limit to the weights and biases. A different approach is dropout, which randomly switches off certain neurons in the network, preventing the model from becoming too dependent on a particular set of neurons and their associated weights and biases. While these methods are applied broadly across the model rather than targeting specific problem patterns, they have been shown to reduce and sometimes prevent overfitting.
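These techniques can be combined in a single small network. The sketch below (again assuming TensorFlow/Keras; the layer sizes, penalty strength, dropout rate, and max-norm value are illustrative assumptions, not values from the episode) shows an L2 weight penalty, a Max Norm constraint, and dropout together:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers, constraints

model = tf.keras.Sequential([
    # L2 regularization: adds a penalty to the loss when weights grow large.
    layers.Dense(64, activation="relu", input_shape=(7,),
                 kernel_regularizer=regularizers.l2(0.01)),
    # Dropout: randomly switches off half of these neurons at each training step.
    layers.Dropout(0.5),
    # Max Norm: directly caps the size of each neuron's incoming weight vector.
    layers.Dense(64, activation="relu",
                 kernel_constraint=constraints.MaxNorm(3.0)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```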

Credits
Nickey Pickorita (YouTube art) -
https://www.upwork.com/freelan....cers/~0147b8991909b2
Isabel Descutner (Voice) -
https://www.youtube.com/user/IsabelDescutner
Dan Partynski (Copy Editing) -
https://www.linkedin.com/in/danielpartynski
Marek Scibior (Prezi creator, Illustrator) -
http://brawuroweprezentacje.pl/
Jagannath Rajagopal (Creator, Producer and Director) -
https://ca.linkedin.com/in/jagannathrajagopal
