<!DOCTYPE html> <html lang="en-US"> <head> <meta charset="UTF-8" />

   <style type="text/css">

     .mpcs-classroom .nav-back i,
     .mpcs-classroom .navbar-section a.btn,
     .mpcs-classroom .navbar-section a,
     .mpcs-classroom .navbar-section button {
       color: rgba(255, 255, 255) !important;
     }

     .mpcs-classroom .navbar-section .dropdown .menu a {
       color: rgba(44, 54, 55) !important;
     }

     .mpcs-classroom .mpcs-progress-ring {
       background-color: rgba(29, 166, 154) !important;
     }

     .mpcs-classroom .mpcs-course-filter .dropdown .btn span,
     .mpcs-classroom .mpcs-course-filter .dropdown .btn i,
     .mpcs-classroom .mpcs-course-filter .input-group .input-group-btn,
     .mpcs-classroom .mpcs-course-filter .input-group .mpcs-search,
     .mpcs-classroom .mpcs-course-filter .input-group input[type=text],
     .mpcs-classroom .mpcs-course-filter .dropdown a,
     .mpcs-classroom .pagination,
     .mpcs-classroom .pagination i,
     .mpcs-classroom .pagination a {
       color: rgba(44, 54, 55) !important;
       border-color: rgba(44, 54, 55) !important;
     }

     /* body.mpcs-classroom a{
       color: rgba();
     } */

     #mpcs-navbar,
     #mpcs-navbar button#previous_lesson_link,
     #mpcs-navbar button#previous_lesson_link:hover {
       background: rgba(44, 54, 55);
     }

     .course-progress .user-progress,
     .btn-green,
     #mpcs-navbar button:not(#previous_lesson_link){
       background: rgba(29, 166, 154, 0.9);
     }

     .btn-green:hover,
     #mpcs-navbar button:not(#previous_lesson_link):focus,
     #mpcs-navbar button:not(#previous_lesson_link):hover{
       background: rgba(29, 166, 154);
     }

     .btn-green{border: rgba(29, 166, 154)}

     .course-progress .progress-text,
     .mpcs-lesson i.mpcs-circle-regular {
       color: rgba(29, 166, 154)
     }

     #mpcs-main #bookmark, .mpcs-lesson.current{background: rgba(29, 166, 154, 0.3)}

     .mpcs-instructor .tile-subtitle{
       color: rgba(29, 166, 154, 1)
     }

   </style>
    <style media="screen">

.simplesocialbuttons.simplesocialbuttons_inline .ssb-fb-like { margin: ; } /*inline margin*/

.simplesocialbuttons.simplesocialbuttons_inline.simplesocial-simple-icons button{ margin: ; }

/*margin-digbar*/

</style>

The weights of the model are initialized to small random values and updated via an optimization algorithm in response to estimates of error on the training dataset.

Given the use of small weights in the model and the use of error between predictions and expected" /> <meta property="og:url" content="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" /> <meta property="og:site_name" content="Machine Learning Mastery" /> <meta property="og:image" content="https://machinelearningmastery.com/wp-content/uploads/2018/11/Box-and-Whisker-Plots-of-Mean-Squared-Error-With-Unscaled-Normalized-and-Standardized-Input-Variables-for-the-Regression-Problem.png" /> <meta name="twitter:card" content="summary_large_image" /> <meta name="twitter:description" content="Deep learning neural networks learn how to map inputs to outputs from examples in a training dataset.

The weights of the model are initialized to small random values and updated via an optimization algorithm in response to estimates of error on the training dataset.

Given the use of small weights in the model and the use of error between predictions and expected" /> <meta name="twitter:title" content="How to use Data Scaling Improve Deep Learning Model Stability and Performance - Machine Learning Mastery" /> <meta property="twitter:image" content="https://machinelearningmastery.com/wp-content/uploads/2018/11/Box-and-Whisker-Plots-of-Mean-Squared-Error-With-Unscaled-Normalized-and-Standardized-Input-Variables-for-the-Regression-Problem.png" />

</style>

logo .site-title, #logo .site-description { display:none; }

body {background-repeat:no-repeat;background-position:top left;background-attachment:scroll;border-top:0px solid #000000;}

header {background-repeat:no-repeat;background-position:left top;margin-top:0px;margin-bottom:0px;padding-top:10px;padding-bottom:10px;border:0px solid ;}
logo .site-title a {font:bold 40px/1em "Helvetica Neue", Helvetica, sans-serif;color:#222222;}
logo .site-description {font:normal 13px/1em "Helvetica Neue", Helvetica, sans-serif;color:#999999;}

body, p { font:normal 14px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#555555; } h1 { font:bold 28px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#222222; }h2 { font:bold 24px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#222222; }h3 { font:bold 20px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#222222; }h4 { font:bold 16px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#222222; }h5 { font:bold 14px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#222222; }h6 { font:bold 12px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#222222; } .page-title, .post .title, .page .title {font:bold 28px/1.1em "Helvetica Neue", Helvetica, sans-serif;color:#222222;} .post .title a:link, .post .title a:visited, .page .title a:link, .page .title a:visited {color:#222222} .post-meta { font:normal 12px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#999999; } .entry, .entry p{ font:normal 15px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#555555; } .post-more {font:normal 13px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:;border-top:0px solid #e6e6e6;border-bottom:0px solid #e6e6e6;}

post-author, #connect {border-top:1px solid #e6e6e6;border-bottom:1px solid #e6e6e6;border-left:1px solid #e6e6e6;border-right:1px solid #e6e6e6;border-radius:5px;-moz-border-radius:5px;-webkit-border-radius:5px;background-color:#fafafa}

.nav-entries a, .woo-pagination { font:normal 13px/1em "Helvetica Neue", Helvetica, sans-serif;color:#888; } .woo-pagination a, .woo-pagination a:hover {color:#888!important} .widget h3 {font:bold 14px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#555555;border-bottom:1px solid #e6e6e6;} .widget_recent_comments li, #twitter li { border-color: #e6e6e6;} .widget p, .widget .textwidget { font:normal 13px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#555555; } .widget {font:normal 13px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#555555;border-radius:0px;-moz-border-radius:0px;-webkit-border-radius:0px;}

tabs .inside li a, .widget_woodojo_tabs .tabbable .tab-pane li a { font:bold 12px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#555555; }
tabs .inside li span.meta, .widget_woodojo_tabs .tabbable .tab-pane li span.meta { font:300 11px/1.5em "Helvetica Neue", Helvetica, sans-serif;color:#999999; }
tabs ul.wooTabs li a, .widget_woodojo_tabs .tabbable .nav-tabs li a { font:300 11px/2em "Helvetica Neue", Helvetica, sans-serif;color:#999999; }

@media only screen and (min-width:768px) { ul.nav li a, #navigation ul.rss a, #navigation ul.cart a.cart-contents, #navigation .cart-contents #navigation ul.rss, #navigation ul.nav-search, #navigation ul.nav-search a { font:bold 15px/1.2em "Helvetica Neue", Helvetica, sans-serif;color:#ffffff; } #navigation ul.rss li a:before, #navigation ul.nav-search a.search-contents:before { color:#ffffff;}

navigation ul.nav > li a:hover, #navigation ul.nav > li:hover a, #navigation ul.nav li ul li a, #navigation ul.cart > li:hover > a, #navigation ul.cart > li > ul > div, #navigation ul.cart > li > ul > div p, #navigation ul.cart > li > ul span, #navigation ul.cart .cart_list a, #navigation ul.nav li.current_page_item a, #navigation ul.nav li.current_page_parent a, #navigation ul.nav li.current-menu-ancestor a, #navigation ul.nav li.current-cat a, #navigation ul.nav li.current-menu-item a { color:#eeeeee!important; }
navigation ul.nav > li a:hover, #navigation ul.nav > li:hover, #navigation ul.nav li ul, #navigation ul.cart li:hover a.cart-contents, #navigation ul.nav-search li:hover a.search-contents, #navigation ul.nav-search a.search-contents + ul, #navigation ul.cart a.cart-contents + ul, #navigation ul.nav li.current_page_item a, #navigation ul.nav li.current_page_parent a, #navigation ul.nav li.current-menu-ancestor a, #navigation ul.nav li.current-cat a, #navigation ul.nav li.current-menu-item a{background-color:#84abc7!important}
navigation ul.nav li ul, #navigation ul.cart > li > ul > div { border: 0px solid #dbdbdb; }
navigation ul.nav > li:hover > ul { left: 0; }
navigation ul.nav > li { border-right: 0px solid #dbdbdb; }#navigation ul.nav > li:hover > ul { left: 0; }
navigation { box-shadow: none; -moz-box-shadow: none; -webkit-box-shadow: none; }#navigation ul li:first-child, #navigation ul li:first-child a { border-radius:0px 0 0 0px; -moz-border-radius:0px 0 0 0px; -webkit-border-radius:0px 0 0 0px; }
navigation {background:#84abc7;border-top:0px solid #dbdbdb;border-bottom:0px solid #dbdbdb;border-left:0px solid #dbdbdb;border-right:0px solid #dbdbdb;border-radius:0px; -moz-border-radius:0px; -webkit-border-radius:0px;}
top ul.nav li a { font:normal 12px/1.6em "Helvetica Neue", Helvetica, sans-serif;color:#ddd; }

}

footer, #footer p { font:normal 13px/1.4em "Helvetica Neue", Helvetica, sans-serif;color:#999999; }
footer {border-top:1px solid #dbdbdb;border-bottom:0px solid ;border-left:0px solid ;border-right:0px solid ;border-radius:0px; -moz-border-radius:0px; -webkit-border-radius:0px;}

.magazine #loopedSlider .content h2.title a { font:bold 24px/1em Arial, sans-serif;color:#ffffff; } .wooslider-theme-magazine .slide-title a { font:bold 24px/1em Arial, sans-serif;color:#ffffff; } .magazine #loopedSlider .content .excerpt p { font:300 13px/1.5em Arial, sans-serif;color:#cccccc; } .wooslider-theme-magazine .slide-content p, .wooslider-theme-magazine .slide-excerpt p { font:300 13px/1.5em Arial, sans-serif;color:#cccccc; } .magazine .block .post .title a {font:bold 18px/1.2em Helvetica Neue, Helvetica, sans-serif;color:#222222; }

loopedSlider.business-slider .content h2 { font:bold 24px/1em Arial, sans-serif;color:#ffffff; }
loopedSlider.business-slider .content h2.title a { font:bold 24px/1em Arial, sans-serif;color:#ffffff; }

.wooslider-theme-business .has-featured-image .slide-title { font:bold 24px/1em Arial, sans-serif;color:#ffffff; } .wooslider-theme-business .has-featured-image .slide-title a { font:bold 24px/1em Arial, sans-serif;color:#ffffff; }

wrapper #loopedSlider.business-slider .content p { font:300 13px/1.5em Arial, sans-serif;color:#cccccc; }

.wooslider-theme-business .has-featured-image .slide-content p { font:300 13px/1.5em Arial, sans-serif;color:#cccccc; } .wooslider-theme-business .has-featured-image .slide-excerpt p { font:300 13px/1.5em Arial, sans-serif;color:#cccccc; } .archive_header { font:bold 18px/1em Arial, sans-serif;color:#222222; } .archive_header {border-bottom:1px solid #e6e6e6;} .archive_header .catrss { display:none; } </style>

logo img {

  max-width: 100%;
  height: auto;

} </style>

.display-posts-listing.image-left .image { float: left; margin: 0 10px 0 0; }

.display-posts-listing.image-left .attachment-thumbnail { height: auto; width: auto; max-width: 50px; max-height: 50px; border-radius: 50%; }

.display-posts-listing.image-left .title { display: block; }

.display-posts-listing.image-left .excerpt-dash { display: none; }

.display-posts-listing.image-left { margin: 0 0 40px 0; } </style> <noscript><style id="rocket-lazyload-nojs-css">.rll-youtube-player, [data-lazy-src]{display:none !important;}</style></noscript> <script type="f3886dae12b0536ad361ea93-text/javascript"> !function(f,b,e,v,n,t,s) {if(f.fbq)return;n=f.fbq=function(){n.callMethod? n.callMethod.apply(n,arguments):n.queue.push(arguments)}; if(!f._fbq)f._fbq=n;n.push=n;n.loaded=!0;n.version='2.0'; n.queue=[];t=b.createElement(e);t.async=!0; t.src=v;s=b.getElementsByTagName(e)[0]; s.parentNode.insertBefore(t,s)}(window, document,'script', 'https://machinelearningmastery.com/wp-content/cache/busting/facebook-tracking/fbpix-events-en_US-2.9.5.js'); fbq('init', '834324500844861'); fbq('track', 'PageView'); </script> <noscript><img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=834324500844861&ev=PageView&noscript=1" /></noscript> </head> <body class="post-template-default single single-post postid-6939 single-format-standard chrome alt-style-default two-col-left width-960 two-col-left-960">

 	 <a href="/super-bundle/?utm_campaign=Machine%20Learning%20Mastery%20Super%20Bundle&utm_source=website&utm_medium=banner">Click to get the 20-book Super Bundle! (Save $250)</a>

<a href="#navigation">Navigation</a>

<a href="https://machinelearningmastery.com/" title="Making developers awesome at machine learning"><img width="480" height="80" src="https://machinelearningmastery.com/wp-content/uploads/2019/09/Header_smaller_text_better-1.png" alt="Machine Learning Mastery" /></a> <a href="https://machinelearningmastery.com/">Machine Learning Mastery</a> Making developers awesome at machine learning

   <form method="get" class="searchform" action="https://machinelearningmastery.com/" >
       <input type="text" class="field s" name="s" value="Search..." onfocus="if (!window.__cfRLUnblockHandlers) return false; if (this.value == 'Search...') {this.value = ;}" onblur="if (!window.__cfRLUnblockHandlers) return false; if (this.value == ) {this.value = 'Search...';}" data-cf-modified-f3886dae12b0536ad361ea93-="" />

       <button type="submit" class="fa fa-search submit" name="submit" value="Search"></button>
   </form>

</header> <nav id="navigation" class="col-full" role="navigation">

Main Menu

</section>

<a href="#top" class="nav-close">Return to Content</a>

</nav>

                       <section id="main">

How to use Data Scaling Improve Deep Learning Model Stability and Performance

</header>

By <a href="https://machinelearningmastery.com/author/jasonb/" title="Posts by Jason Brownlee" rel="author">Jason Brownlee</a> on February 4, 2019 in <a href="https://machinelearningmastery.com/category/better-deep-learning/" title="View all items in Deep Learning Performance">Deep Learning Performance</a>

<button class="ssb_tweet-icon" data-href="https://twitter.com/share?text=How+to+use+Data+Scaling+Improve+Deep+Learning+Model+Stability+and+Performance&url=https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" rel="nofollow" onclick="if (!window.__cfRLUnblockHandlers) return false; javascript:window.open(this.dataset.href, , 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;" data-cf-modified-f3886dae12b0536ad361ea93-=""> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 72 72"><path fill="none" d="M0 0h72v72H0z"/><path class="icon" fill="#fff" d="M68.812 15.14c-2.348 1.04-4.87 1.744-7.52 2.06 2.704-1.62 4.78-4.186 5.757-7.243-2.53 1.5-5.33 2.592-8.314 3.176C56.35 10.59 52.948 9 49.182 9c-7.23 0-13.092 5.86-13.092 13.093 0 1.026.118 2.02.338 2.98C25.543 24.527 15.9 19.318 9.44 11.396c-1.125 1.936-1.77 4.184-1.77 6.58 0 4.543 2.312 8.552 5.824 10.9-2.146-.07-4.165-.658-5.93-1.64-.002.056-.002.11-.002.163 0 6.345 4.513 11.638 10.504 12.84-1.1.298-2.256.457-3.45.457-.845 0-1.666-.078-2.464-.23 1.667 5.2 6.5 8.985 12.23 9.09-4.482 3.51-10.13 5.605-16.26 5.605-1.055 0-2.096-.06-3.122-.184 5.794 3.717 12.676 5.882 20.067 5.882 24.083 0 37.25-19.95 37.25-37.25 0-.565-.013-1.133-.038-1.693 2.558-1.847 4.778-4.15 6.532-6.774z"/></svg>Tweet </button> <button class="ssb_fbshare-icon" target="_blank" data-href="https://www.facebook.com/sharer/sharer.php?u=https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" onclick="if (!window.__cfRLUnblockHandlers) return false; javascript:window.open(this.dataset.href, , 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;" data-cf-modified-f3886dae12b0536ad361ea93-=""> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" class="_1pbq" color="#ffffff"><path fill="#ffffff" fill-rule="evenodd" class="icon" d="M8 14H3.667C2.733 13.9 2 13.167 2 12.233V3.667A1.65 1.65 0 0 1 3.667 2h8.666A1.65 1.65 0 0 1 14 3.667v8.566c0 .934-.733 1.667-1.667 1.767H10v-3.967h1.3l.7-2.066h-2V6.933c0-.466.167-.9.867-.9H12v-1.8c.033 0-.933-.266-1.533-.266-1.267 0-2.434.7-2.467 2.133v1.867H6v2.066h2V14z"></path></svg> Share </button> <button class="ssb_linkedin-icon" data-href="https://www.linkedin.com/cws/share?url=https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" onclick="if (!window.__cfRLUnblockHandlers) return false; javascript:window.open(this.dataset.href, , 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;" data-cf-modified-f3886dae12b0536ad361ea93-=""> <svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="15px" height="14.1px" viewBox="-301.4 387.5 15 14.1" enable-background="new -301.4 387.5 15 14.1" xml:space="preserve"> <g id="XMLID_398_"> <path id="XMLID_399_" fill="#FFFFFF" d="M-296.2,401.6c0-3.2,0-6.3,0-9.5h0.1c1,0,2,0,2.9,0c0.1,0,0.1,0,0.1,0.1c0,0.4,0,0.8,0,1.2 c0.1-0.1,0.2-0.3,0.3-0.4c0.5-0.7,1.2-1,2.1-1.1c0.8-0.1,1.5,0,2.2,0.3c0.7,0.4,1.2,0.8,1.5,1.4c0.4,0.8,0.6,1.7,0.6,2.5 c0,1.8,0,3.6,0,5.4v0.1c-1.1,0-2.1,0-3.2,0c0-0.1,0-0.1,0-0.2c0-1.6,0-3.2,0-4.8c0-0.4,0-0.8-0.2-1.2c-0.2-0.7-0.8-1-1.6-1 c-0.8,0.1-1.3,0.5-1.6,1.2c-0.1,0.2-0.1,0.5-0.1,0.8c0,1.7,0,3.4,0,5.1c0,0.2,0,0.2-0.2,0.2c-1,0-1.9,0-2.9,0 C-296.1,401.6-296.2,401.6-296.2,401.6z"/> <path id="XMLID_400_" fill="#FFFFFF" d="M-298,401.6L-298,401.6c-1.1,0-2.1,0-3,0c-0.1,0-0.1,0-0.1-0.1c0-3.1,0-6.1,0-9.2 c0-0.1,0-0.1,0.1-0.1c1,0,2,0,2.9,0h0.1C-298,395.3-298,398.5-298,401.6z"/> <path id="XMLID_401_" fill="#FFFFFF" d="M-299.6,390.9c-0.7-0.1-1.2-0.3-1.6-0.8c-0.5-0.8-0.2-2.1,1-2.4c0.6-0.2,1.2-0.1,1.8,0.2 c0.5,0.4,0.7,0.9,0.6,1.5c-0.1,0.7-0.5,1.1-1.1,1.3C-299.1,390.8-299.4,390.8-299.6,390.9L-299.6,390.9z"/> </g> </svg> Share </button>

Last Updated on August 25, 2020

Deep learning neural networks learn how to map inputs to outputs from examples in a training dataset.

The weights of the model are initialized to small random values and updated via an optimization algorithm in response to estimates of error on the training dataset.

Given the use of small weights in the model and the use of error between predictions and expected values, the scale of inputs and outputs used to train the model are an important factor. Unscaled input variables can result in a slow or unstable learning process, whereas unscaled target variables on regression problems can result in exploding gradients causing the learning process to fail.

Data preparation involves using techniques such as the normalization and standardization to rescale input and output variables prior to training a neural network model.

In this tutorial, you will discover how to improve neural network stability and modeling performance by scaling data.

After completing this tutorial, you will know:

Data scaling is a recommended pre-processing step when working with deep learning neural networks.
Data scaling can be achieved by normalizing or standardizing real-valued input and output variables.
How to apply standardization and normalization to improve the performance of a Multilayer Perceptron model on a regression predictive modeling problem.

Kick-start your project with my new book <a href="https://machinelearningmastery.com/better-deep-learning/">Better Deep Learning</a>, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

How to Improve Neural Network Stability and Modeling Performance With Data Scaling
Photo by <a href="https://www.flickr.com/photos/javiersanp/14202569306/">Javier Sanchez Portero</a>, some rights reserved.

Tutorial Overview

This tutorial is divided into six parts; they are:

The Scale of Your Data Matters
Data Scaling Methods
Regression Predictive Modeling Problem
Multilayer Perceptron With Unscaled Data
Multilayer Perceptron With Scaled Output Variables
Multilayer Perceptron With Scaled Input Variables

The Scale of Your Data Matters

Deep learning neural network models learn a mapping from input variables to an output variable.

As such, the scale and distribution of the data drawn from the domain may be different for each variable.

Input variables may have different units (e.g. feet, kilometers, and hours) that, in turn, may mean the variables have different scales.

Differences in the scales across input variables may increase the difficulty of the problem being modeled. An example of this is that large input values (e.g. a spread of hundreds or thousands of units) can result in a model that learns large weight values. A model with large weight values is often unstable, meaning that it may suffer from poor performance during learning and sensitivity to input values resulting in higher generalization error.

One of the most common forms of pre-processing consists of a simple linear rescaling of the input variables.

— Page 298, <a href="https://amzn.to/2S8qdwt">Neural Networks for Pattern Recognition</a>, 1995.

A target variable with a large spread of values, in turn, may result in large error gradient values causing weight values to change dramatically, making the learning process unstable.

Scaling input and output variables is a critical step in using neural network models.

In practice it is nearly always advantageous to apply pre-processing transformations to the input data before it is presented to a network. Similarly, the outputs of the network are often post-processed to give the required output values.

— Page 296, <a href="https://amzn.to/2S8qdwt">Neural Networks for Pattern Recognition</a>, 1995.

Scaling Input Variables

The input variables are those that the network takes on the input or visible layer in order to make a prediction.

A good rule of thumb is that input variables should be small values, probably in the range of 0-1 or standardized with a zero mean and a standard deviation of one.

Whether input variables require scaling depends on the specifics of your problem and of each variable.

You may have a sequence of quantities as inputs, such as prices or temperatures.

If the distribution of the quantity is normal, then it should be standardized, otherwise the data should be normalized. This applies if the range of quantity values is large (10s, 100s, etc.) or small (0.01, 0.0001).

If the quantity values are small (near 0-1) and the distribution is limited (e.g. standard deviation near 1) then perhaps you can get away with no scaling of the data.

Problems can be complex and it may not be clear how to best scale input data.

If in doubt, normalize the input sequence. If you have the resources, explore modeling with the raw data, standardized data, and normalized data and see if there is a beneficial difference in the performance of the resulting model.

If the input variables are combined linearly, as in an MLP [Multilayer Perceptron], then it is rarely strictly necessary to standardize the inputs, at least in theory. […] However, there are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima.

— <a href="ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std">Should I normalize/standardize/rescale the data? Neural Nets FAQ</a>

Scaling Output Variables

The output variable is the variable predicted by the network.

You must ensure that the scale of your output variable matches the scale of the activation function (transfer function) on the output layer of your network.

If your output activation function has a range of [0,1], then obviously you must ensure that the target values lie within that range. But it is generally better to choose an output activation function suited to the distribution of the targets than to force your data to conform to the output activation function.

— <a href="ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std">Should I normalize/standardize/rescale the data? Neural Nets FAQ</a>

If your problem is a regression problem, then the output will be a real value.

This is best modeled with a linear activation function. If the distribution of the value is normal, then you can standardize the output variable. Otherwise, the output variable can be normalized.

Want Better Results with Deep Learning?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

<a href="https://machinelearningmastery.lpages.co/leadbox/1433e7773f72a2%3A164f8be4f346dc/5764144745676800/" target="_blank" style="background: rgb(255, 206, 10); color: rgb(255, 255, 255); text-decoration: none; font-family: Helvetica, Arial, sans-serif; font-weight: bold; font-size: 16px; line-height: 20px; padding: 10px; display: inline-block; max-width: 300px; border-radius: 5px; text-shadow: rgba(0, 0, 0, 0.25) 0px -1px 1px; box-shadow: rgba(255, 255, 255, 0.5) 0px 1px 3px inset, rgba(0, 0, 0, 0.5) 0px 1px 3px;" rel="noopener noreferrer">Download Your FREE Mini-Course</a>

Data Scaling Methods

There are two types of scaling of your data that you may want to consider: normalization and standardization.

These can both be achieved using the scikit-learn library.

Data Normalization

Normalization is a rescaling of the data from the original range so that all values are within the range of 0 and 1.

Normalization requires that you know or are able to accurately estimate the minimum and maximum observable values. You may be able to estimate these values from your available data.

A value is normalized as follows:

1	y = (x - min) / (max - min)

Where the minimum and maximum values pertain to the value x being normalized.

For example, for a dataset, we could guesstimate the min and max observable values as 30 and -10. We can then normalize any value, like 18.8, as follows:

y = (x - min) / (max - min) y = (18.8 - (-10)) / (30 - (-10)) y = 28.8 / 40

y = 0.72</textarea>

1

2

3

4

y = (x - min) / (max - min)

y = (18.8 - (-10)) / (30 - (-10))

y = 28.8 / 40

y = 0.72

You can see that if an x value is provided that is outside the bounds of the minimum and maximum values, the resulting value will not be in the range of 0 and 1. You could check for these observations prior to making predictions and either remove them from the dataset or limit them to the pre-defined maximum or minimum values.

You can normalize your dataset using the scikit-learn object <a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html">MinMaxScaler</a>.

Good practice usage with the MinMaxScaler and other scaling techniques is as follows:

Fit the scaler using available training data. For normalization, this means the training data will be used to estimate the minimum and maximum observable values. This is done by calling the fit() function.
Apply the scale to training data. This means you can use the normalized data to train your model. This is done by calling the transform() function.
Apply the scale to data going forward. This means you can prepare new data in the future on which you want to make predictions.

The default scale for the MinMaxScaler is to rescale variables into the range [0,1], although a preferred scale can be specified via the “feature_range” argument and specify a tuple including the min and the max for all variables.

create scaler

scaler = MinMaxScaler(feature_range=(-1,1))</textarea>

1 2	# create scaler scaler = MinMaxScaler(feature_range=(-1,1))

If needed, the transform can be inverted. This is useful for converting predictions back into their original scale for reporting or plotting. This can be done by calling the inverse_transform() function.

The example below provides a general demonstration for using the MinMaxScaler to normalize data.

demonstrate data normalization with sklearn

from sklearn.preprocessing import MinMaxScaler

load data

data = ...

create scaler

scaler = MinMaxScaler()

fit scaler on data

scaler.fit(data)

apply transform

normalized = scaler.transform(data)

inverse transform

inverse = scaler.inverse_transform(normalized)</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

# demonstrate data normalization with sklearn

from sklearn.preprocessing import MinMaxScaler

# load data

data = ...

# create scaler

scaler = MinMaxScaler()

# fit scaler on data

scaler.fit(data)

# apply transform

normalized = scaler.transform(data)

# inverse transform

inverse = scaler.inverse_transform(normalized)

You can also perform the fit and transform in a single step using the fit_transform() function; for example:

demonstrate data normalization with sklearn

from sklearn.preprocessing import MinMaxScaler

load data

data = ...

create scaler

scaler = MinMaxScaler()

fit and transform in one step

normalized = scaler.fit_transform(data)

inverse transform

inverse = scaler.inverse_transform(normalized)</textarea>

1

2

3

4

5

6

7

8

9

10

# demonstrate data normalization with sklearn

from sklearn.preprocessing import MinMaxScaler

# load data

data = ...

# create scaler

scaler = MinMaxScaler()

# fit and transform in one step

normalized = scaler.fit_transform(data)

# inverse transform

inverse = scaler.inverse_transform(normalized)

Data Standardization

Standardizing a dataset involves rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1. It is sometimes referred to as “whitening.”

This can be thought of as subtracting the mean value or centering the data.

Like normalization, standardization can be useful, and even required in some machine learning algorithms when your data has input values with differing scales.

Standardization assumes that your observations fit a Gaussian distribution (bell curve) with a well behaved mean and standard deviation. You can still standardize your data if this expectation is not met, but you may not get reliable results.

Standardization requires that you know or are able to accurately estimate the mean and standard deviation of observable values. You may be able to estimate these values from your training data.

A value is standardized as follows:

<textarea wrap="soft" class="urvanov-syntax-highlighter-plain print-no" data-settings="dblclick" readonly style="-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;"> y = (x - mean) / standard_deviation</textarea>

1	y = (x - mean) / standard_deviation

Where the mean is calculated as:

<textarea wrap="soft" class="urvanov-syntax-highlighter-plain print-no" data-settings="dblclick" readonly style="-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;"> mean = sum(x) / count(x)</textarea>

1	mean = sum(x) / count(x)

And the standard_deviation is calculated as:

<textarea wrap="soft" class="urvanov-syntax-highlighter-plain print-no" data-settings="dblclick" readonly style="-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;"> standard_deviation = sqrt( sum( (x - mean)^2 ) / count(x))</textarea>

1	standard_deviation = sqrt( sum( (x - mean)^2 ) / count(x))

We can guesstimate a mean of 10 and a standard deviation of about 5. Using these values, we can standardize the first value of 20.7 as follows:

y = (x - mean) / standard_deviation y = (20.7 - 10) / 5 y = (10.7) / 5

y = 2.14</textarea>

1

2

3

4

y = (x - mean) / standard_deviation

y = (20.7 - 10) / 5

y = (10.7) / 5

y = 2.14

The mean and standard deviation estimates of a dataset can be more robust to new data than the minimum and maximum.

You can standardize your dataset using the scikit-learn object <a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html">StandardScaler</a>.

demonstrate data standardization with sklearn

from sklearn.preprocessing import StandardScaler

load data

data = ...

create scaler

scaler = StandardScaler()

fit scaler on data

scaler.fit(data)

apply transform

standardized = scaler.transform(data)

inverse transform

inverse = scaler.inverse_transform(standardized)</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

# demonstrate data standardization with sklearn

from sklearn.preprocessing import StandardScaler

# load data

data = ...

# create scaler

scaler = StandardScaler()

# fit scaler on data

scaler.fit(data)

# apply transform

standardized = scaler.transform(data)

# inverse transform

inverse = scaler.inverse_transform(standardized)

You can also perform the fit and transform in a single step using the fit_transform() function; for example:

demonstrate data standardization with sklearn

from sklearn.preprocessing import StandardScaler

load data

data = ...

create scaler

scaler = StandardScaler()

fit and transform in one step

standardized = scaler.fit_transform(data)

inverse transform

inverse = scaler.inverse_transform(standardized)</textarea>

1

2

3

4

5

6

7

8

9

10

# demonstrate data standardization with sklearn

from sklearn.preprocessing import StandardScaler

# load data

data = ...

# create scaler

scaler = StandardScaler()

# fit and transform in one step

standardized = scaler.fit_transform(data)

# inverse transform

inverse = scaler.inverse_transform(standardized)

Regression Predictive Modeling Problem

A regression <a href="https://machinelearningmastery.com/gentle-introduction-to-predictive-modeling/">predictive modeling</a> problem involves predicting a real-valued quantity.

We can use a standard regression problem generator provided by the scikit-learn library in the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html">make_regression() function</a>. This function will generate examples from a simple regression problem with a given number of input variables, statistical noise, and other properties.

We will use this function to define a problem that has 20 input features; 10 of the features will be meaningful and 10 will not be relevant. A total of 1,000 examples will be randomly generated. The <a href="https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/">pseudorandom number generator</a> will be fixed to ensure that we get the same 1,000 examples each time the code is run.

generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)</textarea>

1 2	# generate regression dataset X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

Each input variable has a Gaussian distribution, as does the target variable.

We can demonstrate this by creating histograms of some of the input variables and the output variable.

regression predictive modeling problem

from sklearn.datasets import make_regression from matplotlib import pyplot

generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

histograms of input variables

pyplot.subplot(211) pyplot.hist(X[:, 0]) pyplot.subplot(212) pyplot.hist(X[:, 1]) pyplot.show()

histogram of target variable

pyplot.hist(y)

pyplot.show()</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

# regression predictive modeling problem

from sklearn.datasets import make_regression

from matplotlib import pyplot

# generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

# histograms of input variables

pyplot.subplot(211)

pyplot.hist(X[:, 0])

pyplot.subplot(212)

pyplot.hist(X[:, 1])

pyplot.show()

# histogram of target variable

pyplot.hist(y)

pyplot.show()

Running the example creates two figures.

The first shows histograms of the first two of the twenty input variables, showing that each has a Gaussian data distribution.

Histograms of Two of the Twenty Input Variables for the Regression Problem

The second figure shows a histogram of the target variable, showing a much larger range for the variable as compared to the input variables and, again, a Gaussian data distribution.

Histogram of the Target Variable for the Regression Problem

Now that we have a regression problem that we can use as the basis for the investigation, we can develop a model to address it.

Multilayer Perceptron With Unscaled Data

We can develop a Multilayer Perceptron (MLP) model for the regression problem.

A model will be demonstrated on the raw data, without any scaling of the input or output variables. We expect that model performance will be generally poor.

The first step is to split the data into train and test sets so that we can fit and evaluate a model. We will generate 1,000 examples from the domain and split the dataset in half, using 500 examples for the train and test datasets.

split into train and test

n_train = 500 trainX, testX = X[:n_train, :], X[n_train:, :]

trainy, testy = y[:n_train], y[n_train:]</textarea>

1

2

3

4

# split into train and test

n_train = 500

trainX, testX = X[:n_train, :], X[n_train:, :]

trainy, testy = y[:n_train], y[n_train:]

Next, we can define an MLP model. The model will expect 20 inputs in the 20 input variables in the problem.

A single hidden layer will be used with 25 nodes and a rectified linear activation function. The output layer has one node for the single target variable and a linear activation function to predict real values directly.

define model

model = Sequential() model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))

model.add(Dense(1, activation='linear'))</textarea>

1

2

3

4

# define model

model = Sequential()

model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))

model.add(Dense(1, activation='linear'))

The mean squared error loss function will be used to optimize the model and the stochastic gradient descent optimization algorithm will be used with the sensible default configuration of a learning rate of 0.01 and a momentum of 0.9.

compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))</textarea>

1 2	# compile model model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

The model will be fit for 100 training epochs and the test set will be used as a validation set, evaluated at the end of each training epoch.

The mean squared error is calculated on the train and test datasets at the end of training to get an idea of how well the model learned the problem.

evaluate the model

train_mse = model.evaluate(trainX, trainy, verbose=0)

test_mse = model.evaluate(testX, testy, verbose=0)</textarea>

1

2

3

# evaluate the model

train_mse = model.evaluate(trainX, trainy, verbose=0)

test_mse = model.evaluate(testX, testy, verbose=0)

Finally, learning curves of mean squared error on the train and test sets at the end of each training epoch are graphed using line plots, providing learning curves to get an idea of the dynamics of the model while learning the problem.

plot loss during training

pyplot.title('Mean Squared Error') pyplot.plot(history.history['loss'], label='train') pyplot.plot(history.history['val_loss'], label='test') pyplot.legend()

pyplot.show()</textarea>

1

2

3

4

5

6

# plot loss during training

pyplot.title('Mean Squared Error')

pyplot.plot(history.history['loss'], label='train')

pyplot.plot(history.history['val_loss'], label='test')

pyplot.legend()

pyplot.show()

Tying these elements together, the complete example is listed below.

mlp with unscaled data for the regression problem

from sklearn.datasets import make_regression from keras.layers import Dense from keras.models import Sequential from keras.optimizers import SGD from matplotlib import pyplot

generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

split into train and test

n_train = 500 trainX, testX = X[:n_train, :], X[n_train:, :] trainy, testy = y[:n_train], y[n_train:]

define model

model = Sequential() model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(1, activation='linear'))

compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

fit model

history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

evaluate the model

train_mse = model.evaluate(trainX, trainy, verbose=0) test_mse = model.evaluate(testX, testy, verbose=0) print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

plot loss during training

pyplot.title('Mean Squared Error') pyplot.plot(history.history['loss'], label='train') pyplot.plot(history.history['val_loss'], label='test') pyplot.legend()

pyplot.show()</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

# mlp with unscaled data for the regression problem

from sklearn.datasets import make_regression

from keras.layers import Dense

from keras.models import Sequential

from keras.optimizers import SGD

from matplotlib import pyplot

# generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

# split into train and test

n_train = 500

trainX, testX = X[:n_train, :], X[n_train:, :]

trainy, testy = y[:n_train], y[n_train:]

# define model

model = Sequential()

model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))

model.add(Dense(1, activation='linear'))

# compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

# fit model

history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

# evaluate the model

train_mse = model.evaluate(trainX, trainy, verbose=0)

test_mse = model.evaluate(testX, testy, verbose=0)

print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

# plot loss during training

pyplot.title('Mean Squared Error')

pyplot.plot(history.history['loss'], label='train')

pyplot.plot(history.history['val_loss'], label='test')

pyplot.legend()

pyplot.show()

Running the example fits the model and calculates the mean squared error on the train and test sets.

In this case, the model is unable to learn the problem, resulting in predictions of NaN values. The <a href="https://machinelearningmastery.com/exploding-gradients-in-neural-networks/">model weights exploded</a> during training given the very large errors and, in turn, error gradients calculated for weight updates.

<textarea wrap="soft" class="urvanov-syntax-highlighter-plain print-no" data-settings="dblclick" readonly style="-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;"> Train: nan, Test: nan</textarea>

1	Train: nan, Test: nan

This demonstrates that, at the very least, some data scaling is required for the target variable.

A line plot of training history is created but does not show anything as the model almost immediately results in a NaN mean squared error.

Multilayer Perceptron With Scaled Output Variables

The MLP model can be updated to scale the target variable.

Reducing the scale of the target variable will, in turn, reduce the size of the gradient used to update the weights and result in a more stable model and training process.

Given the Gaussian distribution of the target variable, a natural method for rescaling the variable would be to standardize the variable. This requires estimating the mean and standard deviation of the variable and using these estimates to perform the rescaling.

It is best practice is to estimate the mean and standard deviation of the training dataset and use these variables to scale the train and test dataset. This is to avoid any data leakage during the model evaluation process.

The scikit-learn transformers expect input data to be matrices of rows and columns, therefore the 1D arrays for the target variable will have to be reshaped into 2D arrays prior to the transforms.

reshape 1d arrays to 2d arrays

trainy = trainy.reshape(len(trainy), 1)

testy = testy.reshape(len(trainy), 1)</textarea>

1

2

3

# reshape 1d arrays to 2d arrays

trainy = trainy.reshape(len(trainy), 1)

testy = testy.reshape(len(trainy), 1)

We can then create and apply the StandardScaler to rescale the target variable.

created scaler

scaler = StandardScaler()

fit scaler on training dataset

scaler.fit(trainy)

transform training dataset

trainy = scaler.transform(trainy)

transform test dataset

testy = scaler.transform(testy)</textarea>

1

2

3

4

5

6

7

8

# created scaler

scaler = StandardScaler()

# fit scaler on training dataset

scaler.fit(trainy)

# transform training dataset

trainy = scaler.transform(trainy)

# transform test dataset

testy = scaler.transform(testy)

Rescaling the target variable means that estimating the performance of the model and plotting the learning curves will calculate an MSE in squared units of the scaled variable rather than squared units of the original scale. This can make interpreting the error within the context of the domain challenging.

In practice, it may be helpful to estimate the performance of the model by first inverting the transform on the test dataset target variable and on the model predictions and estimating model performance using the root mean squared error on the unscaled data. This is left as an exercise to the reader.

The complete example of standardizing the target variable for the MLP on the regression problem is listed below.

mlp with scaled outputs on the regression problem

from sklearn.datasets import make_regression from sklearn.preprocessing import StandardScaler from keras.layers import Dense from keras.models import Sequential from keras.optimizers import SGD from matplotlib import pyplot

generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

split into train and test

n_train = 500 trainX, testX = X[:n_train, :], X[n_train:, :] trainy, testy = y[:n_train], y[n_train:]

reshape 1d arrays to 2d arrays

trainy = trainy.reshape(len(trainy), 1) testy = testy.reshape(len(trainy), 1)

created scaler

scaler = StandardScaler()

fit scaler on training dataset

scaler.fit(trainy)

transform training dataset

trainy = scaler.transform(trainy)

transform test dataset

testy = scaler.transform(testy)

define model

model = Sequential() model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(1, activation='linear'))

compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

fit model

history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

evaluate the model

train_mse = model.evaluate(trainX, trainy, verbose=0) test_mse = model.evaluate(testX, testy, verbose=0) print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

plot loss during training

pyplot.title('Loss / Mean Squared Error') pyplot.plot(history.history['loss'], label='train') pyplot.plot(history.history['val_loss'], label='test') pyplot.legend()

pyplot.show()</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

# mlp with scaled outputs on the regression problem

from sklearn.datasets import make_regression

from sklearn.preprocessing import StandardScaler

from keras.layers import Dense

from keras.models import Sequential

from keras.optimizers import SGD

from matplotlib import pyplot

# generate regression dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

# split into train and test

n_train = 500

trainX, testX = X[:n_train, :], X[n_train:, :]

trainy, testy = y[:n_train], y[n_train:]

# reshape 1d arrays to 2d arrays

trainy = trainy.reshape(len(trainy), 1)

testy = testy.reshape(len(trainy), 1)

# created scaler

scaler = StandardScaler()

# fit scaler on training dataset

scaler.fit(trainy)

# transform training dataset

trainy = scaler.transform(trainy)

# transform test dataset

testy = scaler.transform(testy)

# define model

model = Sequential()

model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))

model.add(Dense(1, activation='linear'))

# compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

# fit model

history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

# evaluate the model

train_mse = model.evaluate(trainX, trainy, verbose=0)

test_mse = model.evaluate(testX, testy, verbose=0)

print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

# plot loss during training

pyplot.title('Loss / Mean Squared Error')

pyplot.plot(history.history['loss'], label='train')

pyplot.plot(history.history['val_loss'], label='test')

pyplot.legend()

pyplot.show()

Running the example fits the model and calculates the mean squared error on the train and test sets.

Note: Your <a href="https://machinelearningmastery.com/different-results-each-time-in-machine-learning/">results may vary</a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, the model does appear to learn the problem and achieves near-zero mean squared error, at least to three decimal places.

<textarea wrap="soft" class="urvanov-syntax-highlighter-plain print-no" data-settings="dblclick" readonly style="-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;"> Train: 0.003, Test: 0.007</textarea>

1	Train: 0.003, Test: 0.007

A line plot of the mean squared error on the train (blue) and test (orange) dataset over each training epoch is created.

In this case, we can see that the model rapidly learns to effectively map inputs to outputs for the regression problem and achieves good performance on both datasets over the course of the run, neither overfitting or underfitting the training dataset.

Line Plot of Mean Squared Error on the Train a Test Datasets for Each Training Epoch

It may be interesting to repeat this experiment and normalize the target variable instead and compare results.

Multilayer Perceptron With Scaled Input Variables

We have seen that data scaling can stabilize the training process when fitting a model for regression with a target variable that has a wide spread.

It is also possible to improve the stability and performance of the model by scaling the input variables.

In this section, we will design an experiment to compare the performance of different scaling methods for the input variables.

The input variables also have a Gaussian data distribution, like the target variable, therefore we would expect that standardizing the data would be the best approach. This is not always the case.

We can compare the performance of the unscaled input variables to models fit with either standardized and normalized input variables.

The first step is to define a function to create the same 1,000 data samples, split them into train and test sets, and apply the data scaling methods specified via input arguments. The get_dataset() function below implements this, requiring the scaler to be provided for the input and target variables and returns the train and test datasets split into input and output components ready to train and evaluate a model.

prepare dataset with input and output scalers, can be none

def get_dataset(input_scaler, output_scaler): # generate dataset X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1) # split into train and test n_train = 500 trainX, testX = X[:n_train, :], X[n_train:, :] trainy, testy = y[:n_train], y[n_train:] # scale inputs if input_scaler is not None: # fit scaler input_scaler.fit(trainX) # transform training dataset trainX = input_scaler.transform(trainX) # transform test dataset testX = input_scaler.transform(testX) if output_scaler is not None: # reshape 1d arrays to 2d arrays trainy = trainy.reshape(len(trainy), 1) testy = testy.reshape(len(trainy), 1) # fit scaler on training dataset output_scaler.fit(trainy) # transform training dataset trainy = output_scaler.transform(trainy) # transform test dataset testy = output_scaler.transform(testy)

return trainX, trainy, testX, testy</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

# prepare dataset with input and output scalers, can be none

def get_dataset(input_scaler, output_scaler):

# generate dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

# split into train and test

n_train = 500

trainX, testX = X[:n_train, :], X[n_train:, :]

trainy, testy = y[:n_train], y[n_train:]

# scale inputs

if input_scaler is not None:

# fit scaler

input_scaler.fit(trainX)

# transform training dataset

trainX = input_scaler.transform(trainX)

# transform test dataset

testX = input_scaler.transform(testX)

if output_scaler is not None:

# reshape 1d arrays to 2d arrays

trainy = trainy.reshape(len(trainy), 1)

testy = testy.reshape(len(trainy), 1)

# fit scaler on training dataset

output_scaler.fit(trainy)

# transform training dataset

trainy = output_scaler.transform(trainy)

# transform test dataset

testy = output_scaler.transform(testy)

return trainX, trainy, testX, testy

Next, we can define a function to fit an MLP model on a given dataset and return the mean squared error for the fit model on the test dataset.

The evaluate_model() function below implements this behavior.

fit and evaluate mse of model on test set

def evaluate_model(trainX, trainy, testX, testy): # define model model = Sequential() model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(1, activation='linear')) # compile model model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9)) # fit model model.fit(trainX, trainy, epochs=100, verbose=0) # evaluate the model test_mse = model.evaluate(testX, testy, verbose=0)

return test_mse</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

# fit and evaluate mse of model on test set

def evaluate_model(trainX, trainy, testX, testy):

# define model

model = Sequential()

model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))

model.add(Dense(1, activation='linear'))

# compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

# fit model

model.fit(trainX, trainy, epochs=100, verbose=0)

# evaluate the model

test_mse = model.evaluate(testX, testy, verbose=0)

return test_mse

Neural networks are trained using a stochastic learning algorithm. This means that the same model fit on the same data may result in a different performance.

We can address this in our experiment by repeating the evaluation of each model configuration, in this case a choice of data scaling, multiple times and report performance as the mean of the error scores across all of the runs. We will repeat each run 30 times to ensure the mean is statistically robust.

The repeated_evaluation() function below implements this, taking the scaler for input and output variables as arguments, evaluating a model 30 times with those scalers, printing error scores along the way, and returning a list of the calculated error scores from each run.

evaluate model multiple times with given input and output scalers

def repeated_evaluation(input_scaler, output_scaler, n_repeats=30): # get dataset trainX, trainy, testX, testy = get_dataset(input_scaler, output_scaler) # repeated evaluation of model results = list() for _ in range(n_repeats): test_mse = evaluate_model(trainX, trainy, testX, testy) print('>%.3f' % test_mse) results.append(test_mse)

return results</textarea>

1

2

3

4

5

6

7

8

9

10

11

# evaluate model multiple times with given input and output scalers

def repeated_evaluation(input_scaler, output_scaler, n_repeats=30):

# get dataset

trainX, trainy, testX, testy = get_dataset(input_scaler, output_scaler)

# repeated evaluation of model

results = list()

for _ in range(n_repeats):

test_mse = evaluate_model(trainX, trainy, testX, testy)

print('>%.3f' % test_mse)

results.append(test_mse)

return results

Finally, we can run the experiment and evaluate the same model on the same dataset three different ways:

No scaling of inputs, standardized outputs.
Normalized inputs, standardized outputs.
Standardized inputs, standardized outputs.

The mean and standard deviation of the error for each configuration is reported, then box and whisker plots are created to summarize the error scores for each configuration.

unscaled inputs

results_unscaled_inputs = repeated_evaluation(None, StandardScaler())

normalized inputs

results_normalized_inputs = repeated_evaluation(MinMaxScaler(), StandardScaler())

standardized inputs

results_standardized_inputs = repeated_evaluation(StandardScaler(), StandardScaler())

summarize results

print('Unscaled: %.3f (%.3f)' % (mean(results_unscaled_inputs), std(results_unscaled_inputs))) print('Normalized: %.3f (%.3f)' % (mean(results_normalized_inputs), std(results_normalized_inputs))) print('Standardized: %.3f (%.3f)' % (mean(results_standardized_inputs), std(results_standardized_inputs)))

plot results

results = [results_unscaled_inputs, results_normalized_inputs, results_standardized_inputs] labels = ['unscaled', 'normalized', 'standardized'] pyplot.boxplot(results, labels=labels)

pyplot.show()</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

# unscaled inputs

results_unscaled_inputs = repeated_evaluation(None, StandardScaler())

# normalized inputs

results_normalized_inputs = repeated_evaluation(MinMaxScaler(), StandardScaler())

# standardized inputs

results_standardized_inputs = repeated_evaluation(StandardScaler(), StandardScaler())

# summarize results

print('Unscaled: %.3f (%.3f)' % (mean(results_unscaled_inputs), std(results_unscaled_inputs)))

print('Normalized: %.3f (%.3f)' % (mean(results_normalized_inputs), std(results_normalized_inputs)))

print('Standardized: %.3f (%.3f)' % (mean(results_standardized_inputs), std(results_standardized_inputs)))

# plot results

results = [results_unscaled_inputs, results_normalized_inputs, results_standardized_inputs]

labels = ['unscaled', 'normalized', 'standardized']

pyplot.boxplot(results, labels=labels)

pyplot.show()

Tying these elements together, the complete example is listed below.

compare scaling methods for mlp inputs on regression problem

from sklearn.datasets import make_regression from sklearn.preprocessing import StandardScaler from sklearn.preprocessing import MinMaxScaler from keras.layers import Dense from keras.models import Sequential from keras.optimizers import SGD from matplotlib import pyplot from numpy import mean from numpy import std

prepare dataset with input and output scalers, can be none

def get_dataset(input_scaler, output_scaler): # generate dataset X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1) # split into train and test n_train = 500 trainX, testX = X[:n_train, :], X[n_train:, :] trainy, testy = y[:n_train], y[n_train:] # scale inputs if input_scaler is not None: # fit scaler input_scaler.fit(trainX) # transform training dataset trainX = input_scaler.transform(trainX) # transform test dataset testX = input_scaler.transform(testX) if output_scaler is not None: # reshape 1d arrays to 2d arrays trainy = trainy.reshape(len(trainy), 1) testy = testy.reshape(len(trainy), 1) # fit scaler on training dataset output_scaler.fit(trainy) # transform training dataset trainy = output_scaler.transform(trainy) # transform test dataset testy = output_scaler.transform(testy) return trainX, trainy, testX, testy

fit and evaluate mse of model on test set

def evaluate_model(trainX, trainy, testX, testy): # define model model = Sequential() model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(1, activation='linear')) # compile model model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9)) # fit model model.fit(trainX, trainy, epochs=100, verbose=0) # evaluate the model test_mse = model.evaluate(testX, testy, verbose=0) return test_mse

evaluate model multiple times with given input and output scalers

def repeated_evaluation(input_scaler, output_scaler, n_repeats=30): # get dataset trainX, trainy, testX, testy = get_dataset(input_scaler, output_scaler) # repeated evaluation of model results = list() for _ in range(n_repeats): test_mse = evaluate_model(trainX, trainy, testX, testy) print('>%.3f' % test_mse) results.append(test_mse) return results

unscaled inputs

results_unscaled_inputs = repeated_evaluation(None, StandardScaler())

normalized inputs

results_normalized_inputs = repeated_evaluation(MinMaxScaler(), StandardScaler())

standardized inputs

results_standardized_inputs = repeated_evaluation(StandardScaler(), StandardScaler())

summarize results

print('Unscaled: %.3f (%.3f)' % (mean(results_unscaled_inputs), std(results_unscaled_inputs))) print('Normalized: %.3f (%.3f)' % (mean(results_normalized_inputs), std(results_normalized_inputs))) print('Standardized: %.3f (%.3f)' % (mean(results_standardized_inputs), std(results_standardized_inputs)))

plot results

results = [results_unscaled_inputs, results_normalized_inputs, results_standardized_inputs] labels = ['unscaled', 'normalized', 'standardized'] pyplot.boxplot(results, labels=labels)

pyplot.show()</textarea>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

# compare scaling methods for mlp inputs on regression problem

from sklearn.datasets import make_regression

from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import MinMaxScaler

from keras.layers import Dense

from keras.models import Sequential

from keras.optimizers import SGD

from matplotlib import pyplot

from numpy import mean

from numpy import std

# prepare dataset with input and output scalers, can be none

def get_dataset(input_scaler, output_scaler):

# generate dataset

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

# split into train and test

n_train = 500

trainX, testX = X[:n_train, :], X[n_train:, :]

trainy, testy = y[:n_train], y[n_train:]

# scale inputs

if input_scaler is not None:

# fit scaler

input_scaler.fit(trainX)

# transform training dataset

trainX = input_scaler.transform(trainX)

# transform test dataset

testX = input_scaler.transform(testX)

if output_scaler is not None:

# reshape 1d arrays to 2d arrays

trainy = trainy.reshape(len(trainy), 1)

testy = testy.reshape(len(trainy), 1)

# fit scaler on training dataset

output_scaler.fit(trainy)

# transform training dataset

trainy = output_scaler.transform(trainy)

# transform test dataset

testy = output_scaler.transform(testy)

return trainX, trainy, testX, testy

# fit and evaluate mse of model on test set

def evaluate_model(trainX, trainy, testX, testy):

# define model

model = Sequential()

model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))

model.add(Dense(1, activation='linear'))

# compile model

model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

# fit model

model.fit(trainX, trainy, epochs=100, verbose=0)

# evaluate the model

test_mse = model.evaluate(testX, testy, verbose=0)

return test_mse

# evaluate model multiple times with given input and output scalers

def repeated_evaluation(input_scaler, output_scaler, n_repeats=30):

# get dataset

trainX, trainy, testX, testy = get_dataset(input_scaler, output_scaler)

# repeated evaluation of model

results = list()

for _ in range(n_repeats):

test_mse = evaluate_model(trainX, trainy, testX, testy)

print('>%.3f' % test_mse)

results.append(test_mse)

return results

# unscaled inputs

results_unscaled_inputs = repeated_evaluation(None, StandardScaler())

# normalized inputs

results_normalized_inputs = repeated_evaluation(MinMaxScaler(), StandardScaler())

# standardized inputs

results_standardized_inputs = repeated_evaluation(StandardScaler(), StandardScaler())

# summarize results

print('Unscaled: %.3f (%.3f)' % (mean(results_unscaled_inputs), std(results_unscaled_inputs)))

print('Normalized: %.3f (%.3f)' % (mean(results_normalized_inputs), std(results_normalized_inputs)))

print('Standardized: %.3f (%.3f)' % (mean(results_standardized_inputs), std(results_standardized_inputs)))

# plot results

results = [results_unscaled_inputs, results_normalized_inputs, results_standardized_inputs]

labels = ['unscaled', 'normalized', 'standardized']

pyplot.boxplot(results, labels=labels)

pyplot.show()

Running the example prints the mean squared error for each model run along the way.

After each of the three configurations have been evaluated 30 times each, the mean errors for each are reported.

Note: Your <a href="https://machinelearningmastery.com/different-results-each-time-in-machine-learning/">results may vary</a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that as we expected, scaling the input variables does result in a model with better performance. Unexpectedly, better performance is seen using normalized inputs instead of standardized inputs. This may be related to the choice of the rectified linear activation function in the first hidden layer.

... >0.010 >0.012 >0.005 >0.008 >0.008 Unscaled: 0.007 (0.004) Normalized: 0.001 (0.000)

Standardized: 0.008 (0.004)</textarea>

1

2

3

4

5

6

7

8

9

...

>0.010

>0.012

>0.005

>0.008

Unscaled: 0.007 (0.004)

Normalized: 0.001 (0.000)

Standardized: 0.008 (0.004)

A figure with three box and whisker plots is created summarizing the spread of error scores for each configuration.

The plots show that there was little difference between the distributions of error scores for the unscaled and standardized input variables, and that the normalized input variables result in better performance and more stable or a tighter distribution of error scores.

These results highlight that it is important to actually experiment and confirm the results of data scaling methods rather than assuming that a given data preparation scheme will work best based on the observed distribution of the data.

Box and Whisker Plots of Mean Squared Error With Unscaled, Normalized and Standardized Input Variables for the Regression Problem

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Normalize Target Variable. Update the example and normalize instead of standardize the target variable and compare results.
Compared Scaling for Target Variable. Update the example to compare standardizing and normalizing the target variable using repeated experiments and compare the results.
Other Scales. Update the example to evaluate other min/max scales when normalizing and compare performance, e.g. [-1, 1] and [0.0, 0.5].

If you explore any of these extensions, I’d love to know.

Summary

In this tutorial, you discovered how to improve neural network stability and modeling performance by scaling data.

Specifically, you learned:

Data scaling is a recommended pre-processing step when working with deep learning neural networks.
Data scaling can be achieved by normalizing or standardizing real-valued input and output variables.
How to apply standardization and normalization to improve the performance of a Multilayer Perceptron model on a regression predictive modeling problem.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Develop Better Deep Learning Models Today!

Train Faster, Reduce Overftting, and Ensembles

...with just a few lines of python code

Discover how in my new Ebook:
<a href="/better-deep-learning/" rel="nofollow">Better Deep Learning</a>

It provides self-study tutorials on topics like:
weight decay, batch normalization, dropout, model stacking and much more...

Bring better deep learning to your projects!

Skip the Academics. Just Results.

<a href="/better-deep-learning/" class="woo-sc-button red" >See What's Inside</a>

<button class="ssb_tweet-icon" data-href="https://twitter.com/share?text=How+to+use+Data+Scaling+Improve+Deep+Learning+Model+Stability+and+Performance&url=https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" rel="nofollow" onclick="if (!window.__cfRLUnblockHandlers) return false; javascript:window.open(this.dataset.href, , 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;" data-cf-modified-f3886dae12b0536ad361ea93-=""> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 72 72"><path fill="none" d="M0 0h72v72H0z"/><path class="icon" fill="#fff" d="M68.812 15.14c-2.348 1.04-4.87 1.744-7.52 2.06 2.704-1.62 4.78-4.186 5.757-7.243-2.53 1.5-5.33 2.592-8.314 3.176C56.35 10.59 52.948 9 49.182 9c-7.23 0-13.092 5.86-13.092 13.093 0 1.026.118 2.02.338 2.98C25.543 24.527 15.9 19.318 9.44 11.396c-1.125 1.936-1.77 4.184-1.77 6.58 0 4.543 2.312 8.552 5.824 10.9-2.146-.07-4.165-.658-5.93-1.64-.002.056-.002.11-.002.163 0 6.345 4.513 11.638 10.504 12.84-1.1.298-2.256.457-3.45.457-.845 0-1.666-.078-2.464-.23 1.667 5.2 6.5 8.985 12.23 9.09-4.482 3.51-10.13 5.605-16.26 5.605-1.055 0-2.096-.06-3.122-.184 5.794 3.717 12.676 5.882 20.067 5.882 24.083 0 37.25-19.95 37.25-37.25 0-.565-.013-1.133-.038-1.693 2.558-1.847 4.778-4.15 6.532-6.774z"/></svg>Tweet </button> <button class="ssb_fbshare-icon" target="_blank" data-href="https://www.facebook.com/sharer/sharer.php?u=https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" onclick="if (!window.__cfRLUnblockHandlers) return false; javascript:window.open(this.dataset.href, , 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;" data-cf-modified-f3886dae12b0536ad361ea93-=""> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" class="_1pbq" color="#ffffff"><path fill="#ffffff" fill-rule="evenodd" class="icon" d="M8 14H3.667C2.733 13.9 2 13.167 2 12.233V3.667A1.65 1.65 0 0 1 3.667 2h8.666A1.65 1.65 0 0 1 14 3.667v8.566c0 .934-.733 1.667-1.667 1.767H10v-3.967h1.3l.7-2.066h-2V6.933c0-.466.167-.9.867-.9H12v-1.8c.033 0-.933-.266-1.533-.266-1.267 0-2.434.7-2.467 2.133v1.867H6v2.066h2V14z"></path></svg> Share </button> <button class="ssb_linkedin-icon" data-href="https://www.linkedin.com/cws/share?url=https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/" onclick="if (!window.__cfRLUnblockHandlers) return false; javascript:window.open(this.dataset.href, , 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;" data-cf-modified-f3886dae12b0536ad361ea93-=""> <svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" width="15px" height="14.1px" viewBox="-301.4 387.5 15 14.1" enable-background="new -301.4 387.5 15 14.1" xml:space="preserve"> <g id="XMLID_398_"> <path id="XMLID_399_" fill="#FFFFFF" d="M-296.2,401.6c0-3.2,0-6.3,0-9.5h0.1c1,0,2,0,2.9,0c0.1,0,0.1,0,0.1,0.1c0,0.4,0,0.8,0,1.2 c0.1-0.1,0.2-0.3,0.3-0.4c0.5-0.7,1.2-1,2.1-1.1c0.8-0.1,1.5,0,2.2,0.3c0.7,0.4,1.2,0.8,1.5,1.4c0.4,0.8,0.6,1.7,0.6,2.5 c0,1.8,0,3.6,0,5.4v0.1c-1.1,0-2.1,0-3.2,0c0-0.1,0-0.1,0-0.2c0-1.6,0-3.2,0-4.8c0-0.4,0-0.8-0.2-1.2c-0.2-0.7-0.8-1-1.6-1 c-0.8,0.1-1.3,0.5-1.6,1.2c-0.1,0.2-0.1,0.5-0.1,0.8c0,1.7,0,3.4,0,5.1c0,0.2,0,0.2-0.2,0.2c-1,0-1.9,0-2.9,0 C-296.1,401.6-296.2,401.6-296.2,401.6z"/> <path id="XMLID_400_" fill="#FFFFFF" d="M-298,401.6L-298,401.6c-1.1,0-2.1,0-3,0c-0.1,0-0.1,0-0.1-0.1c0-3.1,0-6.1,0-9.2 c0-0.1,0-0.1,0.1-0.1c1,0,2,0,2.9,0h0.1C-298,395.3-298,398.5-298,401.6z"/> <path id="XMLID_401_" fill="#FFFFFF" d="M-299.6,390.9c-0.7-0.1-1.2-0.3-1.6-0.8c-0.5-0.8-0.2-2.1,1-2.4c0.6-0.2,1.2-0.1,1.8,0.2 c0.5,0.4,0.7,0.9,0.6,1.5c-0.1,0.7-0.5,1.1-1.1,1.3C-299.1,390.8-299.4,390.8-299.6,390.9L-299.6,390.9z"/> </g> </svg> Share </button>

</section>

About Jason Brownlee

Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.

<a href="https://machinelearningmastery.com/author/jasonb/"> View all posts by Jason Brownlee → </a>

</aside>

</article>

<a href="https://machinelearningmastery.com/greedy-layer-wise-pretraining-tutorial/" rel="prev"> How to Use Greedy Layer-Wise Pretraining in Deep Learning Neural Networks</a>

<a href="https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/" rel="next">How to Avoid Exploding Gradients With Gradient Clipping </a>

125 Responses to How to use Data Scaling Improve Deep Learning Model Stability and Performance

<img alt= src='https://secure.gravatar.com/avatar/5f4a8c800d8bc7448b290942ea3679f6?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/5f4a8c800d8bc7448b290942ea3679f6?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Wonbin February 13, 2019 at 6:03 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-468125" title="Direct link to this comment">#</a>

Thank you for this helpful post for beginners!

Could you please provide more details about the steps of “using the root mean squared error on the unscaled data” to interpret the performance in a specific domain?

Would it be like this??
———————————————————–
1. Finalize the model (based on the performance being calculated from the scaled output variable)
2. Make predictions on test set
3. Invert the predictions (to convert them back into their original scale)
4. Calculate the metrics (e.g. RMSE, MAPE)
———————————————————–

Waiting for your reply! Cheers mate!

<a rel='nofollow' class='comment-reply-link' href='#comment-468125' data-commentid="468125" data-postid="6939" data-belowelement="comment-468125" data-respondelement="respond" data-replyto="Reply to Wonbin" aria-label='Reply to Wonbin'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 14, 2019 at 8:39 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-468241" title="Direct link to this comment">#</a>
  
  Correct.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-468241' data-commentid="468241" data-postid="6939" data-belowelement="comment-468241" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/ec47f14bc90cbce4c89a445adfd2b7a8?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/ec47f14bc90cbce4c89a445adfd2b7a8?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  ajebulon April 30, 2019 at 2:44 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-483399" title="Direct link to this comment">#</a>
  
  Really nice article! I got Some quick questions,
  
  If I have multiple input columns, each has different value range, might be [0, 1000] or even a one-hot-encoded data, should all be scaled with same method, or it can be processed differently?
  
  For example:
  – input A is normalized to [0, 1],
  – input B is normalized to [-1, 1],
  – input C is standardized,
  – one-hot-encoded data is not scaled
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-483399' data-commentid="483399" data-postid="6939" data-belowelement="comment-483399" data-respondelement="respond" data-replyto="Reply to ajebulon" aria-label='Reply to ajebulon'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 1, 2019 at 6:58 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-483508" title="Direct link to this comment">#</a>
    
    Yes, typically it is a good idea to scale all columns to have the same range. Perhaps start with [0,1] and compare others to see if they result in an improvement.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-483508' data-commentid="483508" data-postid="6939" data-belowelement="comment-483508" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/fda5f1787e764f8a4c0b7ba80e7be001?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/fda5f1787e764f8a4c0b7ba80e7be001?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

mk123qwe February 19, 2019 at 5:38 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-469330" title="Direct link to this comment">#</a>

we want standardized inputs, no scaling of outputs,but outputs value is not in (0,1).Are the predictions inaccurate?

<a rel='nofollow' class='comment-reply-link' href='#comment-469330' data-commentid="469330" data-postid="6939" data-belowelement="comment-469330" data-respondelement="respond" data-replyto="Reply to mk123qwe" aria-label='Reply to mk123qwe'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 20, 2019 at 7:51 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-469398" title="Direct link to this comment">#</a>
  
  I don’t follow, are what predictions accurate?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-469398' data-commentid="469398" data-postid="6939" data-belowelement="comment-469398" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/716e4d0c9a6ea39edb7c9e05126eb3a0?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/716e4d0c9a6ea39edb7c9e05126eb3a0?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

yingxiao kong February 28, 2019 at 8:17 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-471145" title="Direct link to this comment">#</a>

Hi Jason,

Your experiment is very helpful for me to understand the difference between different methods, actually I have also done similar things. I always standardized the input data. I have compared the results between standardized and standardized targets. The plots shows that with standardized targets, the network seems to work better. However, here I have a question: suppose the standard deviation of my target is 300, then I think the MSE will be strongly decreased after you fixed the standard deviation to 1. So shall we multiply the original std to the MSE in order to get the MSE in the original target value space?

<a rel='nofollow' class='comment-reply-link' href='#comment-471145' data-commentid="471145" data-postid="6939" data-belowelement="comment-471145" data-respondelement="respond" data-replyto="Reply to yingxiao kong" aria-label='Reply to yingxiao kong'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 28, 2019 at 2:32 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-471186" title="Direct link to this comment">#</a>
  
  You can invert the standardization, by adding the mean and multiplying by the stdev.
  
  I also have an example here using the sklaern:
  <a href="https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/" rel="nofollow ugc">https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-471186' data-commentid="471186" data-postid="6939" data-belowelement="comment-471186" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/0243aa151ffd1cb2f529425448cfd9c2?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/0243aa151ffd1cb2f529425448cfd9c2?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Beato March 11, 2019 at 3:51 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-473491" title="Direct link to this comment">#</a>

Hi Jason,

My data includes categorical and continued data. Could I transform the categorical data with 1,2,3…into standardized data and put them into the neural network models to make classification? Or do I need to transformr the categorical data with with one-hot coding(0,1)? I have been confused about it. Thanks

<a rel='nofollow' class='comment-reply-link' href='#comment-473491' data-commentid="473491" data-postid="6939" data-belowelement="comment-473491" data-respondelement="respond" data-replyto="Reply to Beato" aria-label='Reply to Beato'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 11, 2019 at 6:53 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-473523" title="Direct link to this comment">#</a>
  
  Yes, perhaps try it and compare results?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-473523' data-commentid="473523" data-postid="6939" data-belowelement="comment-473523" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/49db595c12abdfab10272020470cfc8d?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/49db595c12abdfab10272020470cfc8d?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Bart March 16, 2019 at 5:23 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-474594" title="Direct link to this comment">#</a>

Hi Jason, I have a specific Question regarding the normalization (min-max scaling) of the output value. Usually you are supposed to use normalization only on the training data set and then apply those stats to the validation and test set. Otherwise you would feed the model at training time certain information about the world it shouldn’t have access to. (The Elements of Statistical Learning: Data Mining, Inference, and Prediction p.247)

But for instance, my output value is a single percentage value ranging [0, 100%] and I am using the ReLU activation function in my output layer. I know for sure that in the “real world” regarding my problem statement, that I will get samples ranging form 60 – 100%. But my training sample size is to small and does not contain enough data points including all possible output values. So here comes my question: Should I stay with my initial statement (normalization only on training data set) or should I apply the maximum possible value of 100% to max()-value of the normalization step? The latter would contradict the literature. Best Regards Bart

<a rel='nofollow' class='comment-reply-link' href='#comment-474594' data-commentid="474594" data-postid="6939" data-belowelement="comment-474594" data-respondelement="respond" data-replyto="Reply to Bart" aria-label='Reply to Bart'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 16, 2019 at 8:02 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-474629" title="Direct link to this comment">#</a>
  
  Correct.
  
  I would recommend a sigmoid activation in the output.
  
  I would then recommend interpreting the 0-1 scale as 60-100 prior to model evaluation.
  
  Does that help?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-474629' data-commentid="474629" data-postid="6939" data-belowelement="comment-474629" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/49db595c12abdfab10272020470cfc8d?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/49db595c12abdfab10272020470cfc8d?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Bart March 17, 2019 at 1:37 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-474728" title="Direct link to this comment">#</a>
    
    I’m not quite sure what you mean by your second recommendation. How would I achieve that?
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-474728' data-commentid="474728" data-postid="6939" data-belowelement="comment-474728" data-respondelement="respond" data-replyto="Reply to Bart" aria-label='Reply to Bart'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 17, 2019 at 6:24 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-474757" title="Direct link to this comment">#</a>
      
      You can project the scale of 0-1 to anything you want, such as 60-100.
      
      First rescale to a number between 0 and 40 (value * 40) then add the min value (+ 60)
      
      result = value * 40 + 60
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-474757' data-commentid="474757" data-postid="6939" data-belowelement="comment-474757" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/0884dd13d296d711d439bc0b6287cda4?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/0884dd13d296d711d439bc0b6287cda4?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Mike March 25, 2019 at 1:04 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-476186" title="Direct link to this comment">#</a>

Dear Jason, thank you for the great article.

I am wondering if there is any advantage using StadardScaler or MinMaxScaler over scaling manually. I could calculate the mean, std or min, max of my training data and apply them with the corresponding formula for standard or minmax scaling.

Would this approach produce the same results as the StadardScaler or MinMaxScaler or are the sklearn scalers special?

<a rel='nofollow' class='comment-reply-link' href='#comment-476186' data-commentid="476186" data-postid="6939" data-belowelement="comment-476186" data-respondelement="respond" data-replyto="Reply to Mike" aria-label='Reply to Mike'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 25, 2019 at 6:46 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-476218" title="Direct link to this comment">#</a>
  
  Yes, it is reliable bug free code all wrapped up in a single class – making it harder to introduce new bugs.
  
  Same results as manual, if you coded the manual scaling correctly.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-476218' data-commentid="476218" data-postid="6939" data-belowelement="comment-476218" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/d2d0dadc3e4cd520f14f70264094f8f1?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/d2d0dadc3e4cd520f14f70264094f8f1?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Magnus May 9, 2019 at 8:32 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-484599" title="Direct link to this comment">#</a>

Dear Jason,

I have a few questions from section “Data normalization”. You mention that we should estimate the max and min values, and use that to normalize the training set to e.g. [-1,1]. But what if the max and min values are in the validation or test set? Then I might get values e.g. [-1.2, 1.3] in the validation set. Do you consider this to be incorrect or not?

Another approach is then to make sure that the min and max values for all parameters are contained in the training set. What are your thoughts on this? Is this the way to do it? Or should we use the max and min values for all data combined (training, validation and test sets) when normalizing the training set?

For the moment I use the MinMaxScaler and fit_transform on the training set and then apply that scaler on the validation and test set using transform. But I realise that some of my max values are in the validation set. I suppose this is also related to network saturation.

<a rel='nofollow' class='comment-reply-link' href='#comment-484599' data-commentid="484599" data-postid="6939" data-belowelement="comment-484599" data-respondelement="respond" data-replyto="Reply to Magnus" aria-label='Reply to Magnus'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 10, 2019 at 8:16 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-484663" title="Direct link to this comment">#</a>
  
  Perhaps estimate the min/max using domain knowledge. If new data exceeded the limits, snap to known limits, or not – test and see how the model is impacted.
  
  Regardless, the training set must be representative of the problem.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-484663' data-commentid="484663" data-postid="6939" data-belowelement="comment-484663" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/bce58c4b2459db4a5d3793f7ecfded68?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/bce58c4b2459db4a5d3793f7ecfded68?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

youssef May 30, 2019 at 9:33 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-487085" title="Direct link to this comment">#</a>

Hello Jason, I am a huge fan of your work! Thank you so much for your insightful tutorials. You are a life saver! I have a small question if i may:

I am trying to fit spectrograms in a cnn in order to do some classification tasks. Unfortunately each spectrogram is around (3000,300) array. Is there a way to reduce the dimensionality without losing so much information?

<a rel='nofollow' class='comment-reply-link' href='#comment-487085' data-commentid="487085" data-postid="6939" data-belowelement="comment-487085" data-respondelement="respond" data-replyto="Reply to youssef" aria-label='Reply to youssef'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 30, 2019 at 2:50 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-487103" title="Direct link to this comment">#</a>
  
  Ouch, perhaps start with simple downsampling and see what effect that has?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-487103' data-commentid="487103" data-postid="6939" data-belowelement="comment-487103" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/bfc82179cc284608a467f9bd6338e8a3?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/bfc82179cc284608a467f9bd6338e8a3?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Muktamani July 4, 2019 at 8:54 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-491581" title="Direct link to this comment">#</a>
Hi Jason,
It was always good and informative to go through your blogs and your interaction with comments by different people all across the globe.
I have question regarding the scaling techniques.

As you explained about scaling :
Case1:
1. created scaler
scaler = StandardScaler()
1. fit scaler on training dataset
scaler.fit(trainy)
1. transform training dataset
trainy = scaler.transform(trainy)
1. transform test dataset
testy = scaler.transform(testy)

in this case mean and standard deviation for all train and test remain same.

What i approached is:
case2
1. created scaler
scaler_train = StandardScaler()
1. fit scaler on training dataset
scaler_train.fit(trainy)
1. transform training dataset
trainy = scaler_train.transform(trainy)

# created scaler
scaler_test = StandardScaler()
1. fit scaler on training dataset
scaler_test.fit(trainy)
1. transform test dataset
testy = scaler_test.transform(testy)

Here the mean and standard deviation in train data and test data are different.so model may find the test data completely unknown and new .rather in first case where mean and standard deviation is same on train and test data that may leads to providing the known test data to model (known in term of same mean and standard deviation treatment).

Jason,can you guide me if my logics is good to go with case2 or shall i consider case1 .
or if logic is wrong you can also say that and explain.
(Also i applied Same for min-max scaling i.e normalization, if i choose this then)
Again thanks Jason for such a nice work !

Happy Learning !!

<a rel='nofollow' class='comment-reply-link' href='#comment-491581' data-commentid="491581" data-postid="6939" data-belowelement="comment-491581" data-respondelement="respond" data-replyto="Reply to Muktamani" aria-label='Reply to Muktamani'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 5, 2019 at 8:06 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-491669" title="Direct link to this comment">#</a>
  
  I recommend fitting the scaler on the training dataset once, then apply it to transform the training dataset and test set.
  
  If you fit the scaler using the test dataset, you will have data leakage and possibly an invalid estimate of model performance.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-491669' data-commentid="491669" data-postid="6939" data-belowelement="comment-491669" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/0a3cc60781e8ac093426c5c4d8d4acd7?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/0a3cc60781e8ac093426c5c4d8d4acd7?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

ICHaLiL July 6, 2019 at 12:13 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-491776" title="Direct link to this comment">#</a>

Hi Jason,

I’m working on sequence2sequence problem. Input’s max and min points are around 500-300, however output’s are 200-0. If I want to normalize them, should I use different scalers? For example:

scx = MinMaxScaler(feature_range = (0, 1))
scy = MinMaxScaler(feature_range = (0, 1))

trainx = scx.fit_transform(trainx)
trainy = scy.fit_transform(trainy)

or should I scale them with same scale like below?

sc = MinMaxScaler(feature_range = (0, 1))

trainx = sc.fit_transform(trainx)
trainy = sc.fit_transform(trainy)

<a rel='nofollow' class='comment-reply-link' href='#comment-491776' data-commentid="491776" data-postid="6939" data-belowelement="comment-491776" data-respondelement="respond" data-replyto="Reply to ICHaLiL" aria-label='Reply to ICHaLiL'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 6, 2019 at 8:40 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-491833" title="Direct link to this comment">#</a>
  
  Yes, use a separate transform for inputs and outputs is a good idea. Otherwise have them all as separate columns in the same matrix and use one scaler, but the column order for transform/inverse_transform will always have to be consistent.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-491833' data-commentid="491833" data-postid="6939" data-belowelement="comment-491833" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/1a0f3d8c68c551f8c48172336f96e584?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/1a0f3d8c68c551f8c48172336f96e584?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Brent July 12, 2019 at 6:55 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-492643" title="Direct link to this comment">#</a>

Hi Jason,

Confused about one aspect, I have a small NN with 8 independent variables and one dichotomous dependent variable. I have standardized the input variables (the output variable was left untouched). I have both trained and created the final model with the same standardized data. However, the question is, if I want to create a user interface to receive manual inputs, those will no longer be in the standardized format, so what is the best way to proceed?

<a rel='nofollow' class='comment-reply-link' href='#comment-492643' data-commentid="492643" data-postid="6939" data-belowelement="comment-492643" data-respondelement="respond" data-replyto="Reply to Brent" aria-label='Reply to Brent'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 13, 2019 at 6:53 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-492702" title="Direct link to this comment">#</a>
  
  You must maintain the objects used to prepare the data, or the coefficients used by those objects (mean and stdev) so that you can prepare new data in an identically way to the way data was prepared during training.
  
  Does that help
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-492702' data-commentid="492702" data-postid="6939" data-belowelement="comment-492702" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/1a0f3d8c68c551f8c48172336f96e584?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/1a0f3d8c68c551f8c48172336f96e584?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Brent July 15, 2019 at 10:58 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-492925" title="Direct link to this comment">#</a>
    
    Thank you, that makes perfect sense.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-492925' data-commentid="492925" data-postid="6939" data-belowelement="comment-492925" data-respondelement="respond" data-replyto="Reply to Brent" aria-label='Reply to Brent'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/c59c98d4d3a9300927a6776535a99ce1?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/c59c98d4d3a9300927a6776535a99ce1?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

cgv July 21, 2019 at 11:27 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-493673" title="Direct link to this comment">#</a>

Hi Jason,

I have built an ANN model and scaled my inputs and outputs before feeding to the network. I measure the performance of the model by r2_score. My output variable is height. My r2_score when the output variable is in metres is .98, but when my output variable is in centi-metres , my r2_score is .91. I have scaled my output too before feeding to the network, why is there a difference in r2_score even because the output variable is scaled before feeding to the network.

Thanks in advance

<a rel='nofollow' class='comment-reply-link' href='#comment-493673' data-commentid="493673" data-postid="6939" data-belowelement="comment-493673" data-respondelement="respond" data-replyto="Reply to cgv" aria-label='Reply to cgv'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 22, 2019 at 8:27 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-493717" title="Direct link to this comment">#</a>
  
  Good question, this is why it is important to test different scaling approaches in order to discover what works best for a given dataset and model combination.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-493717' data-commentid="493717" data-postid="6939" data-belowelement="comment-493717" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/f8d373791418b8a89df9877140f5e930?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/f8d373791418b8a89df9877140f5e930?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  madhuri August 29, 2019 at 3:44 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-498877" title="Direct link to this comment">#</a>
  
  Hi Jason,
  I am working on sequence to data prediction problem wherein i am performing normalization on input and output both.
  Once model is trained then to get the actual output in real-time, I have to perform the de-normalization and when I will perform the denorm then error will increase by the same factor I have used for normalization.
  Lets consider, norm predicted output is 0.1 and error of the model is 0.01 .
  denorm predicted output become 0.1*100 = 10 and after de-normalizing the error will be 0.01*100= 1
  So, what will be solution to this eliminate this kind of problem in regression.
  
  Thanks
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-498877' data-commentid="498877" data-postid="6939" data-belowelement="comment-498877" data-respondelement="respond" data-replyto="Reply to madhuri" aria-label='Reply to madhuri'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> August 29, 2019 at 6:16 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-498916" title="Direct link to this comment">#</a>
    
    What problem exactly?
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-498916' data-commentid="498916" data-postid="6939" data-belowelement="comment-498916" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/f8d373791418b8a89df9877140f5e930?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/f8d373791418b8a89df9877140f5e930?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      madhuri August 29, 2019 at 8:38 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-498975" title="Direct link to this comment">#</a>
      
      The problem is after de-normalization of the output, the error difference between actual and predicted output is scaled up by the normalization factor (max-min) So, I want to know what can be done to make the error difference same for both de-normized as well as normalized output.
      
      Thanks
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-498975' data-commentid="498975" data-postid="6939" data-belowelement="comment-498975" data-respondelement="respond" data-replyto="Reply to madhuri" aria-label='Reply to madhuri'>Reply</a>
      - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
        
        <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> August 30, 2019 at 6:18 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-499030" title="Direct link to this comment">#</a>
        
        I don’t understand, sorry.
<img alt= src='https://secure.gravatar.com/avatar/55e911fe89990c37d91f85e294e0b5b1?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/55e911fe89990c37d91f85e294e0b5b1?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

joshBorrison October 7, 2019 at 5:08 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-504531" title="Direct link to this comment">#</a>

Hi Jason,

Do I have to use only one normalization formula for all inputs?

For example: I have 5 inputs [inp1, inp2, inp3, inp4, inp5] where I can estimate max and min only for [inp1, inp2]. So can I use

y = (x – min) / (max – min)

for [inp1, inp2] and

y = x/(1+x)

for [inp3, inp4, inp5]?

<a rel='nofollow' class='comment-reply-link' href='#comment-504531' data-commentid="504531" data-postid="6939" data-belowelement="comment-504531" data-respondelement="respond" data-replyto="Reply to joshBorrison" aria-label='Reply to joshBorrison'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> October 8, 2019 at 7:53 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-504621" title="Direct link to this comment">#</a>
  
  Yes, it is applied to each input separately – assuming they have different units.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-504621' data-commentid="504621" data-postid="6939" data-belowelement="comment-504621" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/29eba53be478753830b44a7192949154?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/29eba53be478753830b44a7192949154?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

shiva November 13, 2019 at 4:17 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510336" title="Direct link to this comment">#</a>

Hi Jason

what if I scale the word vectors(glove) for exposing to LSTM?

would it affect the accuracy of results or it maintains the semantic relations of words?

Thank you a lot.

<a rel='nofollow' class='comment-reply-link' href='#comment-510336' data-commentid="510336" data-postid="6939" data-belowelement="comment-510336" data-respondelement="respond" data-replyto="Reply to shiva" aria-label='Reply to shiva'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 13, 2019 at 5:53 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510377" title="Direct link to this comment">#</a>
  
  I don’t think so. Try it and see?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-510377' data-commentid="510377" data-postid="6939" data-belowelement="comment-510377" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/6893f70f68043ccb6fa48815e9962258?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/6893f70f68043ccb6fa48815e9962258?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Murilo Souza November 14, 2019 at 12:35 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510516" title="Direct link to this comment">#</a>

Hello, i was trying to normalize/inverse transoformation in my data, but i got one error that i think its due to the resize i did in my input data. Here’s my code:

import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import time as time
import matplotlib.pyplot as plt
import pydot
import csv as csv
import keras.backend as K
from sklearn.preprocessing import MinMaxScaler

# Downloading data
!wget <a href="https://raw.githubusercontent.com/sibyjackgrove/CNN-on-Wind-Power-Data/master/MISO_power_data_classification_labels.csv" rel="nofollow ugc">https://raw.githubusercontent.com/sibyjackgrove/CNN-on-Wind-Power-Data/master/MISO_power_data_classification_labels.csv</a>
!wget <a href="https://raw.githubusercontent.com/sibyjackgrove/CNN-on-Wind-Power-Data/master/MISO_power_data_input.csv" rel="nofollow ugc">https://raw.githubusercontent.com/sibyjackgrove/CNN-on-Wind-Power-Data/master/MISO_power_data_input.csv</a>

# Trying normalization
batch_size = 1
valid_size = max(1,np.int(0.2*batch_size))
df_input = pd.read_csv(‘./MISO_power_data_input.csv’,usecols =[‘Wind_MWh’,’Actual_Load_MWh’], chunksize=24*(batch_size+valid_size),nrows = 24*(batch_size+valid_size),iterator=True)
df_target = pd.read_csv(‘./MISO_power_data_classification_labels.csv’,usecols =[‘Mean Wind Power’,’Standard Deviation’,’WindShare’],chunksize =batch_size+valid_size,nrows = batch_size+valid_size, iterator=True)
for chunk, chunk2 in zip(df_input,df_target):
InputX = chunk.values
InputX = np.resize(InputX,(batch_size+valid_size,24,2,1))
print(InputX)
InputX.astype(‘float32’, copy=False)
InputY = chunk2.values
InputY.astype(‘float32’, copy=False)
print(InputY)

# create scaler
scaler = MinMaxScaler() # Define limits for normalize data
normalized_input = scaler.fit_transform(InputX) # Normalize input data
normalized_output = scaler.fit_transform(InputY) # Normalize output data
print(normalized_input)
print(normalized_output)
inverse_output = scaler.inverse_transform(normalized_output) # Inverse transformation of output data
print(inverse_output)

The error:

“ValueError: Found array with dim 4. MinMaxScaler expected <= 2."

Do you have any idea how can i fix this? I really didn't wish to change the resize command at the moment.

<a rel='nofollow' class='comment-reply-link' href='#comment-510516' data-commentid="510516" data-postid="6939" data-belowelement="comment-510516" data-respondelement="respond" data-replyto="Reply to Murilo Souza" aria-label='Reply to Murilo Souza'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 14, 2019 at 8:04 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510565" title="Direct link to this comment">#</a>
  
  Perhaps this will help:
  <a href="https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/" rel="nofollow ugc">https://machinelearningmastery.com/machine-learning-data-transforms-for-time-series-forecasting/</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-510565' data-commentid="510565" data-postid="6939" data-belowelement="comment-510565" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/6893f70f68043ccb6fa48815e9962258?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/6893f70f68043ccb6fa48815e9962258?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Murilo Souza November 15, 2019 at 12:44 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510656" title="Direct link to this comment">#</a>
    
    Is there anyway i can do the inverse transform inside the model itself? Because, for example, my MSE reported at the end of each epoch would be in the “wrong” scale.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-510656' data-commentid="510656" data-postid="6939" data-belowelement="comment-510656" data-respondelement="respond" data-replyto="Reply to Murilo Souza" aria-label='Reply to Murilo Souza'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 15, 2019 at 7:53 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510694" title="Direct link to this comment">#</a>
      
      Yes, you could wrap the model in a sklearn pipeline.
      
      Or wrap the model in your own wrapper class.
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-510694' data-commentid="510694" data-postid="6939" data-belowelement="comment-510694" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
      - <img alt= src='https://secure.gravatar.com/avatar/9ebba87d298788ebb9631a55f842a0d0?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/9ebba87d298788ebb9631a55f842a0d0?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
        
        Mariana Costa April 29, 2021 at 11:11 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-607456" title="Direct link to this comment">#</a>
        
        Does it improve the net to do this?
      - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
        
        <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> April 30, 2021 at 6:06 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-607501" title="Direct link to this comment">#</a>
        
        No, but it may be helpful when coding.
<img alt= src='https://secure.gravatar.com/avatar/9eaa036e1e8a4d28f1f88518e1991d30?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/9eaa036e1e8a4d28f1f88518e1991d30?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Jules Damji November 14, 2019 at 6:55 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510545" title="Direct link to this comment">#</a>

Hey Jason,

I love this tutorial. I was wondering if I can get your permission to use this tutorial, convert all its experimentation and tracking using MLflow, and include it in my tutorials I teach at conferences.

It’s a fitting example of how you can use MLFlow to track different experiments and visually compare the outcomes.

All the credit will be given to you as the source and inspiration. You can see some of the examples here: <a href="https://github.com/dmatrix/spark-saturday/tree/master/tutorials/mlflow/src/python" rel="nofollow ugc">https://github.com/dmatrix/spark-saturday/tree/master/tutorials/mlflow/src/python</a>.

<a rel='nofollow' class='comment-reply-link' href='#comment-510545' data-commentid="510545" data-postid="6939" data-belowelement="comment-510545" data-respondelement="respond" data-replyto="Reply to Jules Damji" aria-label='Reply to Jules Damji'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 14, 2019 at 8:07 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510572" title="Direct link to this comment">#</a>
  
  Thanks!
  
  No problem as long as you clearly cite and link to the post.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-510572' data-commentid="510572" data-postid="6939" data-belowelement="comment-510572" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/9eaa036e1e8a4d28f1f88518e1991d30?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/9eaa036e1e8a4d28f1f88518e1991d30?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

jules Damji November 14, 2019 at 2:45 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510615" title="Direct link to this comment">#</a>

Thanks, I will certainly put the original link and plug your book too, along with your site and an excellent resource of tutorials and examples to learn from.

Cheers
Jules

<a rel='nofollow' class='comment-reply-link' href='#comment-510615' data-commentid="510615" data-postid="6939" data-belowelement="comment-510615" data-respondelement="respond" data-replyto="Reply to jules Damji" aria-label='Reply to jules Damji'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 15, 2019 at 7:42 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-510683" title="Direct link to this comment">#</a>
  
  Thanks Jules.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-510683' data-commentid="510683" data-postid="6939" data-belowelement="comment-510683" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/47703d72c09f5d8df5925defa68b82b3?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/47703d72c09f5d8df5925defa68b82b3?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Hanser November 28, 2019 at 8:13 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-512747" title="Direct link to this comment">#</a>

Amazing content Jason! I was wondering if it is possible to apply different scalers to different inputs given based on their original characteristics? I am asking you that because as you mentioned in the tutorial “Differences in the scales across input variables may increase the difficulty of the problem being modeled” Therefore, if I use standard scaler in one input and normal scaler in another it could be bad for gradient descend.

<a rel='nofollow' class='comment-reply-link' href='#comment-512747' data-commentid="512747" data-postid="6939" data-belowelement="comment-512747" data-respondelement="respond" data-replyto="Reply to Hanser" aria-label='Reply to Hanser'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 28, 2019 at 8:16 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-512749" title="Direct link to this comment">#</a>
  
  Thanks!
  
  Yes, perhaps try it and compare the results to using one type of scaling for all inputs.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-512749' data-commentid="512749" data-postid="6939" data-belowelement="comment-512749" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/06742781eee91a4b33cd5636540afa46?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/06742781eee91a4b33cd5636540afa46?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Riyaz Pasha December 9, 2019 at 9:26 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-514349" title="Direct link to this comment">#</a>

Hi Jason,
I am solving the Regression problem and my accuracy after normalizing the target variable is 92% but I have the doubt about scaling the target variable. So can you elaborate about scaling the Target variable?

<a rel='nofollow' class='comment-reply-link' href='#comment-514349' data-commentid="514349" data-postid="6939" data-belowelement="comment-514349" data-respondelement="respond" data-replyto="Reply to Riyaz Pasha" aria-label='Reply to Riyaz Pasha'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> December 10, 2019 at 7:30 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-514403" title="Direct link to this comment">#</a>
  
  You cannot calculate accuracy for regression. You must calculate error.
  
  More details here:
  <a href="https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression" rel="nofollow ugc">https://machinelearningmastery.com/faq/single-faq/how-do-i-calculate-accuracy-for-regression</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-514403' data-commentid="514403" data-postid="6939" data-belowelement="comment-514403" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/392a28317334d0cdf35969ebce45afd6?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/392a28317334d0cdf35969ebce45afd6?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

FAIZ December 30, 2019 at 7:34 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-516662" title="Direct link to this comment">#</a>

Hi Jason Sir!
My data range is variable, e.g. -1500000, 0.0003456, 2387900,23,50,-45,-0.034, what should i do? i want to use MLP, 1D-CNN and SAE.
THANKS

<a rel='nofollow' class='comment-reply-link' href='#comment-516662' data-commentid="516662" data-postid="6939" data-belowelement="comment-516662" data-respondelement="respond" data-replyto="Reply to FAIZ" aria-label='Reply to FAIZ'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> December 31, 2019 at 7:31 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-516700" title="Direct link to this comment">#</a>
  
  Perhaps try normalizing the data first?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-516700' data-commentid="516700" data-postid="6939" data-belowelement="comment-516700" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/392a28317334d0cdf35969ebce45afd6?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/392a28317334d0cdf35969ebce45afd6?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Faiz January 2, 2020 at 12:28 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-516827" title="Direct link to this comment">#</a>
    
    i tried different type of normalization but got data type errors, i used “MinMaxScaler ” and also (X-min(X))/ (max(X)-min(X)), but it can’t process. I want to know about the tf.compat.v1.keras.utils.normalize() command, what it actually do? thanks
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-516827' data-commentid="516827" data-postid="6939" data-belowelement="comment-516827" data-respondelement="respond" data-replyto="Reply to Faiz" aria-label='Reply to Faiz'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> January 2, 2020 at 6:42 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-516854" title="Direct link to this comment">#</a>
      
      I don’t have a tutorial on that, perhaps check the source code?
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-516854' data-commentid="516854" data-postid="6939" data-belowelement="comment-516854" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/13394f97aa5b4cb680aa557b82cfe3b7?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/13394f97aa5b4cb680aa557b82cfe3b7?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

BNB January 30, 2020 at 3:32 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-519745" title="Direct link to this comment">#</a>

Hi Jason

I have a question about the normalization of data. Samples from the population may be added to the dataset over time, and the attribute values for these new objects may then lie outside those you have seen so far. One possibility to handle new minimum and maximum values is to periodically renormalize the data after including the new values. Is there any normalization approach without renormalization?

Thanks,

<a rel='nofollow' class='comment-reply-link' href='#comment-519745' data-commentid="519745" data-postid="6939" data-belowelement="comment-519745" data-respondelement="respond" data-replyto="Reply to BNB" aria-label='Reply to BNB'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> January 30, 2020 at 6:56 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-519769" title="Direct link to this comment">#</a>
  
  Yes, re-normalizing is one approach.
  
  Clipping values to historical limits is another.
  
  Perhaps try a few methods and see what makes sense for your project?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-519769' data-commentid="519769" data-postid="6939" data-belowelement="comment-519769" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/a032a57b03c559f83d394587e12353bf?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a032a57b03c559f83d394587e12353bf?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

<a href='http://None.' rel='external nofollow ugc' class='url'>Tajik</a> February 19, 2020 at 1:15 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-522214" title="Direct link to this comment">#</a>

Hi Jason

Should we use “standard_deviation = sqrt( sum( (x – mean)**2 ) / count(x))” instead of “standard_deviation = sqrt( sum( (x – mean)^2 ) / count(x))”?

Does “^” sign represent square root in Python and is it fine not to subtract count (x) by 1 (in order to make it std of sample distribution, unless we have 100% observation of a population)?

Thank you

Best
Tajik

<a rel='nofollow' class='comment-reply-link' href='#comment-522214' data-commentid="522214" data-postid="6939" data-belowelement="comment-522214" data-respondelement="respond" data-replyto="Reply to Tajik" aria-label='Reply to Tajik'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 19, 2020 at 8:06 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-522269" title="Direct link to this comment">#</a>
  
  ^ means superscript (e.g. exponent) in latex and excel.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-522269' data-commentid="522269" data-postid="6939" data-belowelement="comment-522269" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/2c010c2a8cb9a42b1e486859a52ff1c8?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/2c010c2a8cb9a42b1e486859a52ff1c8?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Peter February 22, 2020 at 6:18 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-522703" title="Direct link to this comment">#</a>

Hi Jason,

Very helpful post as always! I am slightly confused regarding the use of the scaler object though. In my scenario…

If I have a set of data that I split into a training set and validation set, I then scale the data as follows:

scaler = MinMaxScaler()
scaledTrain = scaler.fit_transform(trainingSet)
scaledValid = scaler.transform(validationSet)

I then use this data to train a deep learning model.

My question is, should I use the same scaler object, which was created using the training set, to scale my new, unseen test data before using that test set for predicting my model’s performance? Or should I create a new, separate scaler object using the test data?

Thanks in advance

Michael

<a rel='nofollow' class='comment-reply-link' href='#comment-522703' data-commentid="522703" data-postid="6939" data-belowelement="comment-522703" data-respondelement="respond" data-replyto="Reply to Peter" aria-label='Reply to Peter'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 22, 2020 at 6:40 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-522738" title="Direct link to this comment">#</a>
  
  Yes.
  
  Any data given to your model MUST be prepared in the same way. You are defining the expectations for the model based on how the training set looks.
  
  Use the same scaler object – it knows – from being fit on the training dataset – how to transform data in the way your model expects.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-522738' data-commentid="522738" data-postid="6939" data-belowelement="comment-522738" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/2c010c2a8cb9a42b1e486859a52ff1c8?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/2c010c2a8cb9a42b1e486859a52ff1c8?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Peter February 22, 2020 at 7:24 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-522746" title="Direct link to this comment">#</a>
    
    Awesome! Thanks so much for the quick response and clearing that up for me.
    
    Very best wishes.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-522746' data-commentid="522746" data-postid="6939" data-belowelement="comment-522746" data-respondelement="respond" data-replyto="Reply to Peter" aria-label='Reply to Peter'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 23, 2020 at 7:19 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-522840" title="Direct link to this comment">#</a>
      
      You’re welcome.
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-522840' data-commentid="522840" data-postid="6939" data-belowelement="comment-522840" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/f9d8a0f5294efbd30ff2bce83feec60e?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/f9d8a0f5294efbd30ff2bce83feec60e?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Mike March 10, 2020 at 2:21 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-525070" title="Direct link to this comment">#</a>

Hi Jason,

Thank you for the tutorial. A question about the conclusion: I find it surprising that standardization did not yield better performance compared to the model with unscaled inputs. Shouldn’t standardization provide better convergence properties when training neural networks? It’s also surprising that min-max scaling worked so well. If all of your inputs are positive (i.e between [0, 1] in this case), doesn’t that mean ALL of your weight updates at each step will be the same sign, which leads to inefficient learning?

<a rel='nofollow' class='comment-reply-link' href='#comment-525070' data-commentid="525070" data-postid="6939" data-belowelement="comment-525070" data-respondelement="respond" data-replyto="Reply to Mike" aria-label='Reply to Mike'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 11, 2020 at 5:16 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-525143" title="Direct link to this comment">#</a>
  
  Not always. It really depends on the problem and the model.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-525143' data-commentid="525143" data-postid="6939" data-belowelement="comment-525143" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/b667b30b21ddb5dd97602fdd1511b7ac?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/b667b30b21ddb5dd97602fdd1511b7ac?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Zeynep newby May 15, 2020 at 8:59 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-534865" title="Direct link to this comment">#</a>

Hi Jason,

I am an absolute beginner into neural networks and I appreciate your helpful website. In the lecture, I learned that when normalizing a training set, one should use the same mean and standard deviation from training for the test set. But I see in your codes that you’re normalizing training and test sets individually. Is that for a specific reason?

<a rel='nofollow' class='comment-reply-link' href='#comment-534865' data-commentid="534865" data-postid="6939" data-belowelement="comment-534865" data-respondelement="respond" data-replyto="Reply to Zeynep newby" aria-label='Reply to Zeynep newby'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 16, 2020 at 6:09 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-534911" title="Direct link to this comment">#</a>
  
  The example correctly fits the transform on the training set then applies the transform to train and test sets.
  
  If we don’t do it this way, it will result in data leakage and in turn an optimistic estimate of model performance.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-534911' data-commentid="534911" data-postid="6939" data-belowelement="comment-534911" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/b667b30b21ddb5dd97602fdd1511b7ac?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/b667b30b21ddb5dd97602fdd1511b7ac?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Zeynep newby May 15, 2020 at 9:04 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-534866" title="Direct link to this comment">#</a>

Hi again,

since I saw another comment having the same question like me, I noticed that you acutally have done exactly the same thing as I expected. Since I am not familiar with the syntax yet, I got it wrong. Thanks very much!

<a rel='nofollow' class='comment-reply-link' href='#comment-534866' data-commentid="534866" data-postid="6939" data-belowelement="comment-534866" data-respondelement="respond" data-replyto="Reply to Zeynep newby" aria-label='Reply to Zeynep newby'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 16, 2020 at 6:10 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-534912" title="Direct link to this comment">#</a>
  
  No problem!
  
  Ask questions anyway, even if you’re not sure. The tutorials are really just the starting point in a conversation.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-534912' data-commentid="534912" data-postid="6939" data-belowelement="comment-534912" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/019c11b37908413095fb30111b8fa83b?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/019c11b37908413095fb30111b8fa83b?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Isaac May 17, 2020 at 3:25 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-535093" title="Direct link to this comment">#</a>

Hai Jaison, I am a beginner in ML and I am having an issue with normalizing..
I am developing a multivariate regression model with three inputs and three outputs.
The three inputs are in the range of [700 1500] , [700-1500] and [700 1500]
The three outputs are in the range of [-0.5 0.5] , [-0.5 0.5] and [700 1500]
I have normalized everything in the range of [-1 1].
The loss at the end of 1000 epoch is in the order of 1e-4, but still, I am not satisfied with the fit of the model. Since the loss function is based on normalized target variables and normalized prediction, its value id very less from the first epoch itself.

Is there a way to bring the cost further down?

<a rel='nofollow' class='comment-reply-link' href='#comment-535093' data-commentid="535093" data-postid="6939" data-belowelement="comment-535093" data-respondelement="respond" data-replyto="Reply to Isaac" aria-label='Reply to Isaac'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 18, 2020 at 6:08 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-535157" title="Direct link to this comment">#</a>
  
  Yes, the suggestions here will help you improve your model:
  <a href="https://machinelearningmastery.com/start-here/#better" rel="nofollow ugc">https://machinelearningmastery.com/start-here/#better</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-535157' data-commentid="535157" data-postid="6939" data-belowelement="comment-535157" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/ff1ca8052a3c188d4610d96d1b1d0e2b?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/ff1ca8052a3c188d4610d96d1b1d0e2b?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Victor Yu June 9, 2020 at 11:43 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-538831" title="Direct link to this comment">#</a>

Hi Jason,

I wonder how you apply scaling to batch data? Say we batch load from tfrecords, for each batch we fit a scaler? If so, then the final scaler is on the last batch, which will be used for test data? Also in batch data, if the batch is small, then it seems the scaler is volatile, especially for MaxMin. Would like to hear your thoughts since in a lot of practices it’s nearly impossible to load huge data into driver to do scaling.

Thanks!

<a rel='nofollow' class='comment-reply-link' href='#comment-538831' data-commentid="538831" data-postid="6939" data-belowelement="comment-538831" data-respondelement="respond" data-replyto="Reply to Victor Yu" aria-label='Reply to Victor Yu'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> June 10, 2020 at 6:16 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-538889" title="Direct link to this comment">#</a>
  
  Scaling is fit on the training set, then applied to all data, e.g. train, test, val.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-538889' data-commentid="538889" data-postid="6939" data-belowelement="comment-538889" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/ff1ca8052a3c188d4610d96d1b1d0e2b?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/ff1ca8052a3c188d4610d96d1b1d0e2b?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Victor Yu June 10, 2020 at 12:10 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-538935" title="Direct link to this comment">#</a>
    
    The entire training set? What if the entire training set is too big to load in the memory? Even doing batch training, you still do scaling on the entire training set first then do batch training? That seems pretty inefficient
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-538935' data-commentid="538935" data-postid="6939" data-belowelement="comment-538935" data-respondelement="respond" data-replyto="Reply to Victor Yu" aria-label='Reply to Victor Yu'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> June 10, 2020 at 1:25 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-538949" title="Direct link to this comment">#</a>
      
      You can use a generator to load the data step by step, only keep in memory what you can/need.
      
      More suggestions here:
      <a href="https://machinelearningmastery.com/faq/single-faq/how-to-i-work-with-a-very-large-dataset" rel="nofollow ugc">https://machinelearningmastery.com/faq/single-faq/how-to-i-work-with-a-very-large-dataset</a>
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-538949' data-commentid="538949" data-postid="6939" data-belowelement="comment-538949" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
      - <img alt= src='https://secure.gravatar.com/avatar/ff1ca8052a3c188d4610d96d1b1d0e2b?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/ff1ca8052a3c188d4610d96d1b1d0e2b?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
        
        Victor Yu June 11, 2020 at 10:53 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-539080" title="Direct link to this comment">#</a>
        
        Yes, that’s my question. When doing batch training, do you fit (or re-fit) a scaler on each batch? If so, it seems the final scaler that will be used for scoring is fit on the final batch. Do you see any issue with that especially when batch is small? Thanks
      - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
        
        <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> June 11, 2020 at 1:31 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-539088" title="Direct link to this comment">#</a>
        
        You could, this is what batch norm does.
        
        Or you can estimate the coefficients used in scaling up front from a sample of training data. Or some other way you prefer.
<img alt= src='https://secure.gravatar.com/avatar/019dc51ce56a4d8653eca7fafb8d6777?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/019dc51ce56a4d8653eca7fafb8d6777?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Najeh June 19, 2020 at 6:32 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-540105" title="Direct link to this comment">#</a>

Hi Jason,

In deep learning as machine learning, data should be transformed into a tabular format? if yes or no why?

<a rel='nofollow' class='comment-reply-link' href='#comment-540105' data-commentid="540105" data-postid="6939" data-belowelement="comment-540105" data-respondelement="respond" data-replyto="Reply to Najeh" aria-label='Reply to Najeh'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> June 19, 2020 at 1:08 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-540131" title="Direct link to this comment">#</a>
  
  Input data must be vectors or matrices of numbers, this covers tabular data, images, audio, text, and so on.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-540131' data-commentid="540131" data-postid="6939" data-belowelement="comment-540131" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/89fde3b7ab677bddd5b849b15143f92c?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/89fde3b7ab677bddd5b849b15143f92c?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Julie June 24, 2020 at 10:29 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-540919" title="Direct link to this comment">#</a>

Hello Jason,

I used your method (i did standardized my outputs and normalized my inputs with MinMaxScaler()) but i keep having the same issue : when i train my neural network with 3200 and validate with 800 everything alright, i have R2 = 99% but when i increase the training / validation set, R2 decreases which is weird, it should be even higher ? Do you think it has something to do with the scaling of the data ?
Thank you !

<a rel='nofollow' class='comment-reply-link' href='#comment-540919' data-commentid="540919" data-postid="6939" data-belowelement="comment-540919" data-respondelement="respond" data-replyto="Reply to Julie" aria-label='Reply to Julie'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> June 25, 2020 at 6:17 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-540974" title="Direct link to this comment">#</a>
  
  It might be interesting to perform a sensitivity analysis on model performance vs train or test set size to understand the relationship.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-540974' data-commentid="540974" data-postid="6939" data-belowelement="comment-540974" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/6d7de1da9f2b4c1eb0e8b60cf3258437?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/6d7de1da9f2b4c1eb0e8b60cf3258437?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Munaf February 23, 2021 at 4:30 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-598424" title="Direct link to this comment">#</a>
    
    Sir how can I normalize real-time data and scale them between -150 to 150? The data are coming every 5 min interval.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-598424' data-commentid="598424" data-postid="6939" data-belowelement="comment-598424" data-respondelement="respond" data-replyto="Reply to Munaf" aria-label='Reply to Munaf'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> February 23, 2021 at 6:25 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-598446" title="Direct link to this comment">#</a>
      
      Perhaps use the minmaxscaler if you’re having trouble:
      <a href="https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/" rel="nofollow ugc">https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/</a>
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-598446' data-commentid="598446" data-postid="6939" data-belowelement="comment-598446" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/db93172ca2e4548c6328be19ab0761e1?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/db93172ca2e4548c6328be19ab0761e1?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

David July 4, 2020 at 1:29 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-542748" title="Direct link to this comment">#</a>

Hi sir,

I have a NN with 6 input variables and one output , I employed minmaxscaler for inputs as well as outputs . My approach was applying the scaler to my whole dataset then splitting it into training and testing dataset, as I dont know the know-hows so is my approach wrong .
Currently the problem I am facing is my actual outputs are positive values but after unscaling the NN predictions I am getting negative values. I tried changing the feature range, still NN predicted negative values , so how can i solve this?

Y1=Y1.reshape(-1, 1)
Y2=Y2.reshape(-1, 1)
TY1=TY1.reshape(-1, 1)
TY2=TY2.reshape(-1, 1)
scaler1 = MinMaxScaler(feature_range=(0, 1))
rescaledX= scaler1.fit_transform(X)
rescaledTX=scaler1.fit_transform(TX)
scaler2 = MinMaxScaler(feature_range=(0, 2))
rescaledY1 = scaler2.fit_transform(Y1)

scaler3 = MinMaxScaler(feature_range=(0, 2))

rescaledY2 = scaler3.fit_transform(Y2)

<a rel='nofollow' class='comment-reply-link' href='#comment-542748' data-commentid="542748" data-postid="6939" data-belowelement="comment-542748" data-respondelement="respond" data-replyto="Reply to David" aria-label='Reply to David'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 5, 2020 at 6:54 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-542829" title="Direct link to this comment">#</a>
  
  First, perhaps confirm that there is no bug in your code.
  
  Second, it is possible for the model to predict values that get mapped to a value out of bounds. You could use a n if-statement to snap them to the required range or use a model that forces predictions to the required range.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-542829' data-commentid="542829" data-postid="6939" data-belowelement="comment-542829" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/d3ca990d016251679e2fe27de19d6af2?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/d3ca990d016251679e2fe27de19d6af2?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

TAMER A. FARRAG July 30, 2020 at 9:42 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-546517" title="Direct link to this comment">#</a>

Thanks a lot,
My question is:

I finish training my model and I use normalized data for inputs and outputs.
my problem now is when I need to use this model I do the following:
1- I load the model
2- normalize the inputs
3- use model to get the outputs (predicted data)

how to denormalized the output of the model ??? I don’t have the MinMaxScaler for the output ??

<a rel='nofollow' class='comment-reply-link' href='#comment-546517' data-commentid="546517" data-postid="6939" data-belowelement="comment-546517" data-respondelement="respond" data-replyto="Reply to TAMER A. FARRAG" aria-label='Reply to TAMER A. FARRAG'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 30, 2020 at 1:45 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-546534" title="Direct link to this comment">#</a>
  
  You can call inverse_transform() on the scaler object for the predictions to get the data back to the original scale.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-546534' data-commentid="546534" data-postid="6939" data-belowelement="comment-546534" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/d3ca990d016251679e2fe27de19d6af2?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/d3ca990d016251679e2fe27de19d6af2?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    TAMER A. FARRAG July 30, 2020 at 7:21 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-546563" title="Direct link to this comment">#</a>
    
    Thanks for fast replay,
    
    I think my question is not clear to you.
    
    imagine than I finish the training phase and save the trained model named “model1”.
    
    I send the “model1” file to a friend and he tries to use it, he will normalize the inputs and get the outputs. In this case, he doesn’t have the scaler object to recover the original values using inverse_transform().
    
    my problem is similar to: <a href="https://stackoverflow.com/questions/37595891/how-to-recover-original-values-after-a-model-predict-in-keras" rel="nofollow ugc">https://stackoverflow.com/questions/37595891/how-to-recover-original-values-after-a-model-predict-in-keras</a>
    but the answer don’t use the scaler object. It depends on manual normalization and normalization process
    
    Thanks for your time
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-546563' data-commentid="546563" data-postid="6939" data-belowelement="comment-546563" data-respondelement="respond" data-replyto="Reply to TAMER A. FARRAG" aria-label='Reply to TAMER A. FARRAG'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 31, 2020 at 6:15 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-546618" title="Direct link to this comment">#</a>
      
      Save the scaler object as well:
      <a href="https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/" rel="nofollow ugc">https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/</a>
      
      You are developing a “modeling pipeline”, not just a predictive model.
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-546618' data-commentid="546618" data-postid="6939" data-belowelement="comment-546618" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/694ca2a3a45788c5d034de77503eff26?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/694ca2a3a45788c5d034de77503eff26?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Mel August 15, 2020 at 9:09 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-549367" title="Direct link to this comment">#</a>

Hi Jason,

Do you know of any textbooks or journal articles that address the input scaling issue as you’ve described it here, in addition to the Bishop textbook? I’m struggling so far in vain to find discussions of this type of scaling, when different raw input variables have much different ranges. Instead I’m finding plenty of mentions in tutorials and blog posts (of which yours is one of the clearest), and papers describing the problems of scale (size) variance in neural networks designed for image recognition.

Thanks!

<a rel='nofollow' class='comment-reply-link' href='#comment-549367' data-commentid="549367" data-postid="6939" data-belowelement="comment-549367" data-respondelement="respond" data-replyto="Reply to Mel" aria-label='Reply to Mel'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> August 15, 2020 at 1:26 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-549393" title="Direct link to this comment">#</a>
  
  Not really, practical issues are not often discussed in textbooks/papers.
  
  Maybe “neural smithing”? Maybe Bishops later book?
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-549393' data-commentid="549393" data-postid="6939" data-belowelement="comment-549393" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/be0a9890d7813eeca82228a828048fa0?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/be0a9890d7813eeca82228a828048fa0?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Munisha Bansal September 29, 2020 at 5:44 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-565174" title="Direct link to this comment">#</a>

Hi Jason,

Thank you very much for the article. I wanted to understand the following scenario

I have mix of categorical and numerical inputs. I can normalize/standardize the numerical inputs and the output numerical variable.
But in the categorical variables I have high number of categories ~3000. So I use label encoder (not one hot coding) and then I use embedding layers. How can I achieve scaling in this case.

<a rel='nofollow' class='comment-reply-link' href='#comment-565174' data-commentid="565174" data-postid="6939" data-belowelement="comment-565174" data-respondelement="respond" data-replyto="Reply to Munisha Bansal" aria-label='Reply to Munisha Bansal'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> September 30, 2020 at 6:24 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-565223" title="Direct link to this comment">#</a>
  
  You can separate the columns and scale them independently, then aggregate the results.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-565223' data-commentid="565223" data-postid="6939" data-belowelement="comment-565223" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/3c1d6c60d6feecafba80283ef8eed1d4?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/3c1d6c60d6feecafba80283ef8eed1d4?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Hamed October 22, 2020 at 1:58 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-569739" title="Direct link to this comment">#</a>

Hi Jason,

I really enjoyed reading your article. My CNN regression network has binary image as input which the background is black, and foreground is white. The ground truth associated with each input is an image with color range from 0 to 255 which is normalized between 0 and 1.

The network can almost detect edges and background but in foreground all the predicted values are almost same. Do you have any idea what is the solution?

I appreciate in advance.

<a rel='nofollow' class='comment-reply-link' href='#comment-569739' data-commentid="569739" data-postid="6939" data-belowelement="comment-569739" data-respondelement="respond" data-replyto="Reply to Hamed" aria-label='Reply to Hamed'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> October 22, 2020 at 6:45 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-569783" title="Direct link to this comment">#</a>
  
  Thanks.
  
  Perhaps these tips will help you improve the performance of your model:
  <a href="https://machinelearningmastery.com/start-here/#better" rel="nofollow ugc">https://machinelearningmastery.com/start-here/#better</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-569783' data-commentid="569783" data-postid="6939" data-belowelement="comment-569783" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/893fb72ba4861e6b85b47d6575d13777?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/893fb72ba4861e6b85b47d6575d13777?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

walid November 5, 2020 at 11:50 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-573097" title="Direct link to this comment">#</a>
Hi jason, how are you?

i have data with input X (matrix with real values) and output y (matrix real values).
i tried to normalize X and y :

scaler1 = Normalizer()
X = scaler1.fit_transform(X)
scaler2 = Normalizer()
y = scaler2.fit_transform(y)

i get a good result with the transform normalizer as shown by: <a href="https://ibb.co/bQYCkvK" rel="nofollow ugc">https://ibb.co/bQYCkvK</a>

at the end i tried to get the predicted values: yhat = model.predict(X_test)

the problem here yhat is not the original data, it’s a transformed data and there is no inverse for normalizer.

I tried to use the minmaxScalar an order to do the inverse operation (invyhat = scaler2.inverse_transform(yhat)) but i get a big numbers compared to the y_test values that i want.

I tried to normalize just X, i get a worst result compared to the first one.

could you please help me.

example of X values : 1006.808362,13.335140,104.536458 …..
289.197205,257.489613,106.245104,566.941857…..
.

example of y values: 0.50000, 250.0000
0.879200,436.000000
.
.

this is my code:

X = dataset[:,0:20]
y = dataset[:,20:22]

scaler1 = Normalizer()
X = scaler1.fit_transform(X)
scaler2 = Normalizer()
y = scaler2.fit_transform(y)

X_train = X[90000:,:]
X_test= X[:90000,:]
y_train =y[90000:,:]
y_test=y[:90000,:]

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
1. define the keras model
model = Sequential()
1. input layer
model.add(Dense(20, input_dim=20,activation=’relu’,kernel_initializer=’normal’))
1. hidden layer
model.add(Dense(7272,activation=’relu’,kernel_initializer=’normal’))
model.add(Dropout(0.8))
1. output layer
model.add(Dense(2, activation=’linear’))
opt =Adadelta(lr=0.01)
1. compile the keras model
model.compile(loss=’mean_squared_error’, optimizer=opt, metrics=[‘mse’])
1. fit the keras model on the dataset
history=model.fit(X_train, y_train, validation_data=(X_test, y_test),epochs=20,verbose=0)
1. evaluate the model
_, train_mse = model.evaluate(X_train, y_train, verbose=0)
_, test_mse = model.evaluate(X_test, y_test, verbose=0)
print(‘Train: %.3f, Test: %.3f’ % (train_mse, test_mse))
yhat = model.predict(X_test)
1. plot loss during training
pyplot.title(‘Loss / Mean Squared Error’)
pyplot.plot(history.history[‘loss’], label=’train’)
pyplot.plot(history.history[‘val_loss’], label=’test’)
pyplot.legend()
pyplot.show()

<a rel='nofollow' class='comment-reply-link' href='#comment-573097' data-commentid="573097" data-postid="6939" data-belowelement="comment-573097" data-respondelement="respond" data-replyto="Reply to walid" aria-label='Reply to walid'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 6, 2020 at 5:57 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-573152" title="Direct link to this comment">#</a>
  
  Sorry to hear that you’re having trouble, perhaps some of these tips will help:
  <a href="https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code" rel="nofollow ugc">https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-573152' data-commentid="573152" data-postid="6939" data-belowelement="comment-573152" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/6616000cf7519211f351670ba2f44684?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/6616000cf7519211f351670ba2f44684?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Carlos November 17, 2020 at 9:18 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-575881" title="Direct link to this comment">#</a>

Hi Jason, first thanks for the wonderful article. I have a little doubt. By normalizing my data and then dividing it into training and testing, all samples will be normalized. But in the case of a real application, where I have an input given by the user, do I need to put it together with all the data and normalize it so that it has the same pattern as the other data? What would be the best alternative?

<a rel='nofollow' class='comment-reply-link' href='#comment-575881' data-commentid="575881" data-postid="6939" data-belowelement="comment-575881" data-respondelement="respond" data-replyto="Reply to Carlos" aria-label='Reply to Carlos'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> November 17, 2020 at 12:56 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-575928" title="Direct link to this comment">#</a>
  
  Good question.
  
  No, save the scaler object or coefficients used for scaling along with the model and use them to prepare new data in the future. More here:
  <a href="https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/" rel="nofollow ugc">https://machinelearningmastery.com/how-to-save-and-load-models-and-data-preparation-in-scikit-learn-for-later-use/</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-575928' data-commentid="575928" data-postid="6939" data-belowelement="comment-575928" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/02c42cbeb380c2b5c7be98988003ca06?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/02c42cbeb380c2b5c7be98988003ca06?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

<a href='http://www.iqvia.com' rel='external nofollow ugc' class='url'>Chris</a> December 3, 2020 at 2:41 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-580301" title="Direct link to this comment">#</a>

Hi Jason, what is the best way to scale NANs when you need the model to generate them? I am creating a synthetic dataset where NANs are critical part. In one case we have people with no corresponding values for a field (truly missing) and in another case we have missing values but want to replicate the fact that values are missing. I tried filling the missing values with the negative sys.max value, but the model tends to spread values between the real data negative limit and the max limit, instead of treating the max value as an outlier. In another case, it seems to ignore that value and always generates values with the real data range, resulting in no generated NANs. I enjoyed your book and look forward to your response.

<a rel='nofollow' class='comment-reply-link' href='#comment-580301' data-commentid="580301" data-postid="6939" data-belowelement="comment-580301" data-respondelement="respond" data-replyto="Reply to Chris" aria-label='Reply to Chris'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> December 3, 2020 at 8:21 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-580371" title="Direct link to this comment">#</a>
  
  You cannot scale a NaN, you must replace it with a value, called imputation.
  
  If you want to mark missing values with a special value, mark and then scale, or remove the rows from the scale process, and impute after scale. The latter sounds better to me.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-580371' data-commentid="580371" data-postid="6939" data-belowelement="comment-580371" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/13b5071fa396d975a63419f7285e76e3?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/13b5071fa396d975a63419f7285e76e3?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Luke Mao January 6, 2021 at 4:13 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-591809" title="Direct link to this comment">#</a>

Thanks Jason for the blog post.
One question:
is it necessary to apply feature scaling for linear regression models as well as MLP’s?

<a rel='nofollow' class='comment-reply-link' href='#comment-591809' data-commentid="591809" data-postid="6939" data-belowelement="comment-591809" data-respondelement="respond" data-replyto="Reply to Luke Mao" aria-label='Reply to Luke Mao'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> January 6, 2021 at 6:32 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-591844" title="Direct link to this comment">#</a>
  
  Yes, it is a good idea to scale input data prior to modeling for models that use a weighted sum of input, like neural nets and regression models.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-591844' data-commentid="591844" data-postid="6939" data-belowelement="comment-591844" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/13b5071fa396d975a63419f7285e76e3?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/13b5071fa396d975a63419f7285e76e3?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Lu Mao January 6, 2021 at 9:00 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-591999" title="Direct link to this comment">#</a>
    
    Thanks Jason. May I ask a follow up question, what is your view on if it is wrong to only scale the input, not scale the output?.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-591999' data-commentid="591999" data-postid="6939" data-belowelement="comment-591999" data-respondelement="respond" data-replyto="Reply to Lu Mao" aria-label='Reply to Lu Mao'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> January 7, 2021 at 6:16 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-592055" title="Direct link to this comment">#</a>
      
      It depends on the data and model.
      
      Do whatever results in the best performance for your prediction problem.
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-592055' data-commentid="592055" data-postid="6939" data-belowelement="comment-592055" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/e42eb5aaa2a16c0da0eb985043120cc1?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/e42eb5aaa2a16c0da0eb985043120cc1?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Nisarg Patel January 25, 2021 at 1:23 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-594313" title="Direct link to this comment">#</a>

sir, i have a 1 problem

When normalizing a dataset, the resulting data will have a minimum value of 0 and a
maximum value of 1. However, the dataset we work with in data mining is typically a
sample of a population. Therefore, the minimum and maximum for each of the attributes
in the population are unknown.
Samples from the population may be added to the dataset over time, and the attribute
values for these new objects may then lie outside those you have seen so far. One
possibility to handle new minimum and maximum values is to periodically renormalize
the data after including the new values. Your task is to think of a normalization scheme
that does not require you to renormalize all of the data. Your normalization approach has
to fulfill all of the following requirements:
– all values (old and new) have to lie in the range between 0 and 1
– no transformation or renormalization of the old values is allowed
Describe your normalization approach.

<a rel='nofollow' class='comment-reply-link' href='#comment-594313' data-commentid="594313" data-postid="6939" data-belowelement="comment-594313" data-respondelement="respond" data-replyto="Reply to Nisarg Patel" aria-label='Reply to Nisarg Patel'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> January 25, 2021 at 1:31 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-594316" title="Direct link to this comment">#</a>
  
  Perhaps you can use domain knowledge to estimate a broader min and max range prior to scaling.
  
  Perhaps you can clip values to a pre-defined range prior to scaling.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-594316' data-commentid="594316" data-postid="6939" data-belowelement="comment-594316" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/c81ade4d31cc6f6be6d5015e33a80777?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/c81ade4d31cc6f6be6d5015e33a80777?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

J March 10, 2021 at 5:43 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-600398" title="Direct link to this comment">#</a>

Hi Jason!
Thank you so much for this great post 🙂

I have one question I hope you could help with:
Why do we need to conduct 30 model runs in particular? I do understand the idea, but i mean why 30 exactly?

<a rel='nofollow' class='comment-reply-link' href='#comment-600398' data-commentid="600398" data-postid="6939" data-belowelement="comment-600398" data-respondelement="respond" data-replyto="Reply to J" aria-label='Reply to J'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 10, 2021 at 6:28 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-600404" title="Direct link to this comment">#</a>
  
  30 is often used to create a large enough sample that we can use statistical methods and that the estimated stats like mean and stev are not too noisy.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-600404' data-commentid="600404" data-postid="6939" data-belowelement="comment-600404" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/7e738f045a3bc45ac8cf4d42d8618ebe?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/7e738f045a3bc45ac8cf4d42d8618ebe?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Maha March 16, 2021 at 7:14 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-601009" title="Direct link to this comment">#</a>

Thanks Jason
I have some confused questions
If the scaling to input data done on the all data set or done to each sample of the data set seperately?

the scalling is done after dividing data to training and test, yes?

If I done normalizations manual to inputs and output, so I should save the max and min values to normalization inputs and denormalization outputs in future prediction?

If I have the outputs containing two differerent range of variables , is same normalization is effective or I should do further things,for example two different normalization?

Thanks in advance

<a rel='nofollow' class='comment-reply-link' href='#comment-601009' data-commentid="601009" data-postid="6939" data-belowelement="comment-601009" data-respondelement="respond" data-replyto="Reply to Maha" aria-label='Reply to Maha'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 16, 2021 at 7:59 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-601015" title="Direct link to this comment">#</a>
  
  Data scaling, and all data pre-processing should be fit on the training set and applied to the training set, validation set and test sets in order to avoid data leakage. You can learn more about this here:
  <a href="https://machinelearningmastery.com/data-preparation-without-data-leakage/" rel="nofollow ugc">https://machinelearningmastery.com/data-preparation-without-data-leakage/</a>
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-601015' data-commentid="601015" data-postid="6939" data-belowelement="comment-601015" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/7e738f045a3bc45ac8cf4d42d8618ebe?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/7e738f045a3bc45ac8cf4d42d8618ebe?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    Maha March 16, 2021 at 9:20 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-601019" title="Direct link to this comment">#</a>
    
    Many thanks for that, I hd read your mentioned article and understood to avoid data leakage. as I should split data first and got the scale from the trainning set .
    
    but i had another Q:
    my data set , for example contain four vectors [ x1 x2 x3 x4 ], where for example each had 100 values ., x1= [value1……………………..value100], x2=[value1…….value100],……
    then my traing data may be 400Xnumberof sumples.
    
    but the range of values to these is varying , x1 , x2 and x3 had values in range [ -04], forexample [ – 4.7338e-04 to – 1.33-04 ] and the x4 has values in range of [-02], forexample[ -1.33e-02 to 3.66e-02 ]
    
    the same the output has values some in range [-0.0698 to 0.06211] and other in range [-3.1556 to 3.15556]
    
    sorry for long discription , but , what suitable scaling you recommend me to do, if normalization(max, min ) to input and outs can be suitable , or I had to do any other prepation
    
    many thanks to you
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-601019' data-commentid="601019" data-postid="6939" data-belowelement="comment-601019" data-respondelement="respond" data-replyto="Reply to Maha" aria-label='Reply to Maha'>Reply</a>
    - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
      
      <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> March 17, 2021 at 5:55 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-601133" title="Direct link to this comment">#</a>
      
      I recommend starting with normalization. Perhaps try standardization if the variables look like they have a gaussian probability distribution.
      
      <a rel='nofollow' class='comment-reply-link' href='#comment-601133' data-commentid="601133" data-postid="6939" data-belowelement="comment-601133" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/5c8b1fab861a9605f3124497cc3b3113?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/5c8b1fab861a9605f3124497cc3b3113?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Maha April 3, 2021 at 3:27 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-603221" title="Direct link to this comment">#</a>

If normalization and standarization is done of the whole data or each row of the samples , for example , in standardization , we got the mean of the whole data set and substract from each element in data set , or we treat each row in the data set separately and got its mean ,..?

<a rel='nofollow' class='comment-reply-link' href='#comment-603221' data-commentid="603221" data-postid="6939" data-belowelement="comment-603221" data-respondelement="respond" data-replyto="Reply to Maha" aria-label='Reply to Maha'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> April 3, 2021 at 5:35 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-603243" title="Direct link to this comment">#</a>
  
  No, data preparation is typically fit on the training set and applied to the train and test dataset.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-603243' data-commentid="603243" data-postid="6939" data-belowelement="comment-603243" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/76a3b3b0611d93f19dabacbb58a53773?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/76a3b3b0611d93f19dabacbb58a53773?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Carlos May 2, 2021 at 11:12 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-607743" title="Direct link to this comment">#</a>

Hi Jason,
I have a question.. I hope you have time to answer it…

If I scale/normalize the input data… The output label (calculated) will be generated “scalated/normalized” also..correct…
and in order to calculate the output error the expected label should be scalated also..
Correct??
In other words.. I should scalate both..data and labels??

<a rel='nofollow' class='comment-reply-link' href='#comment-607743' data-commentid="607743" data-postid="6939" data-belowelement="comment-607743" data-respondelement="respond" data-replyto="Reply to Carlos" aria-label='Reply to Carlos'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 3, 2021 at 4:55 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-607817" title="Direct link to this comment">#</a>
  
  Scaling input is a good idea, depending on the data and choice of model.
  
  If the target is numeric (e.g. regression), then scaling the target is a good idea, depending on the data and choice of model.
  
  If the target was sealed, then the scaling must be inverted on the prediction and the test data before calculating an error metic.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-607817' data-commentid="607817" data-postid="6939" data-belowelement="comment-607817" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/aa14c08ab4ab406e94848ea6539cd2cb?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/aa14c08ab4ab406e94848ea6539cd2cb?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

Israel May 6, 2021 at 10:03 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-608235" title="Direct link to this comment">#</a>

Hi Jason,

I’m new to deep learning. I tried to implement a CNN regression model with multiple impute image chips of 31 channels(Raster image/TIFF format), and a numeric target variable. But the result I got is quite weird cos its giving me 100% accuracy (r2_score). I also noticed that during training, the loss/val loss output values were all zeros and the training was pretty fast considering feeding over 5000 images into the network. so I feel the network isn’t training anything passé.

I want to ask if this could be as a result of data scaling? My image chips pixel values are in decimals (float) between 0 and 1 (all the image chips are less than 1), while my target variable are a continuous variable between 0 and 160 (integer).

Do you think i need to perform some sort of normalization or standardization of my data?

<a rel='nofollow' class='comment-reply-link' href='#comment-608235' data-commentid="608235" data-postid="6939" data-belowelement="comment-608235" data-respondelement="respond" data-replyto="Reply to Israel" aria-label='Reply to Israel'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 7, 2021 at 6:26 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-608282" title="Direct link to this comment">#</a>
  
  Perhaps try scaling the data and see if it makes a difference.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-608282' data-commentid="608282" data-postid="6939" data-belowelement="comment-608282" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/38fad5c19b9f971dbd2b25b324bda3f8?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/38fad5c19b9f971dbd2b25b324bda3f8?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

<a href='https://acehl.org/' rel='external nofollow ugc' class='url'>JG</a> May 9, 2021 at 6:00 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-608515" title="Direct link to this comment">#</a>

Hi Jason,

Great Tutorial! Thank you very much.
very clear explanation of scaling inputs and output necessity !

I am introducing your tutorial to a friend of mine who is very interested in following you.

regards

<a rel='nofollow' class='comment-reply-link' href='#comment-608515' data-commentid="608515" data-postid="6939" data-belowelement="comment-608515" data-respondelement="respond" data-replyto="Reply to JG" aria-label='Reply to JG'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> May 10, 2021 at 6:18 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-608599" title="Direct link to this comment">#</a>
  
  Thanks!
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-608599' data-commentid="608599" data-postid="6939" data-belowelement="comment-608599" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/e46bacdd9def9b26d935ea5f69c08d42?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/e46bacdd9def9b26d935ea5f69c08d42?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  Phil July 16, 2021 at 4:00 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-616693" title="Direct link to this comment">#</a>
  
  Hi Jason,
  
  Im currently training an MLP and I have 9 metric features and 3 binary coded to 0/1.
  So I have decided only to standardize the 9 metric feautes and leave the binary features untouched.
  Is this approch okay or should standardize the binary features as well – so they have an mean neat to zero and sd of 1
  
  Cheers
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-616693' data-commentid="616693" data-postid="6939" data-belowelement="comment-616693" data-respondelement="respond" data-replyto="Reply to Phil" aria-label='Reply to Phil'>Reply</a>
  - <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
    
    <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> July 16, 2021 at 5:29 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-616721" title="Direct link to this comment">#</a>
    
    It sounds strange to me that you would standardize binary features. Often they would be excluded from any scaling operation.
    
    <a rel='nofollow' class='comment-reply-link' href='#comment-616721' data-commentid="616721" data-postid="6939" data-belowelement="comment-616721" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>
<img alt= src='https://secure.gravatar.com/avatar/64dea7ac4d545d5f96926180a6c7ed54?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/64dea7ac4d545d5f96926180a6c7ed54?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>

voloddia August 2, 2021 at 1:01 pm <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-619500" title="Direct link to this comment">#</a>

“You must ensure that the scale of your output variable matches the scale of the activation function (transfer function) on the output layer of your network.”

I don’t understand this point.
First, the output layer often has no activation function, or in other words, identity activation function which has arbitrary scale.
Second, normalization and standardization are only linear transformations.
Therefore, is it true that normalization/standardization of output is almost always unnecessary? If not, why?

<a rel='nofollow' class='comment-reply-link' href='#comment-619500' data-commentid="619500" data-postid="6939" data-belowelement="comment-619500" data-respondelement="respond" data-replyto="Reply to voloddia" aria-label='Reply to voloddia'>Reply</a>
- <img alt= src='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=40&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/a0942b56b07831ac15d4a168a750e34a?s=80&d=mm&r=g 2x' class='avatar avatar-40 photo' height='40' width='40' loading='lazy'/>
  
  <a href='http://MachineLearningMastery.com' rel='external nofollow ugc' class='url'>Jason Brownlee</a> August 3, 2021 at 4:49 am <a href="https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#comment-619602" title="Direct link to this comment">#</a>
  
  This was critical in the olden days of sigmoid and tanh. These days, normalizing or standardizing is sufficient.
  
  It’s critical because large inputs cause large weights which leads to an unstable network, in general.
  
  <a rel='nofollow' class='comment-reply-link' href='#comment-619602' data-commentid="619602" data-postid="6939" data-belowelement="comment-619602" data-respondelement="respond" data-replyto="Reply to Jason Brownlee" aria-label='Reply to Jason Brownlee'>Reply</a>

Whitening and standardization

Contents

<a href="#navigation">Navigation</a>

Main Menu

How to use Data Scaling Improve Deep Learning Model Stability and Performance

Tutorial Overview

The Scale of Your Data Matters

Scaling Input Variables

Scaling Output Variables

Want Better Results with Deep Learning?

Data Scaling Methods

Data Normalization

Data Standardization

Regression Predictive Modeling Problem

Multilayer Perceptron With Unscaled Data

Multilayer Perceptron With Scaled Output Variables

Multilayer Perceptron With Scaled Input Variables

Extensions

Further Reading

Posts

Books

API

Articles

Summary

Develop Better Deep Learning Models Today!

Train Faster, Reduce Overftting, and Ensembles

Bring better deep learning to your projects!

About Jason Brownlee

125 Responses to How to use Data Scaling Improve Deep Learning Model Stability and Performance

Leave a Reply <a rel="nofollow" id="cancel-comment-reply-link" href="/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/#respond" style="display:none;">Click here to cancel reply.</a>

Never miss a tutorial:

Picked for you:

Loving the Tutorials?

Navigation menu

Search