A neural network was given eight noisy position measurements from the first quarter of a projectile's flight — only the ascending phase, never any data showing the apex or the descent. Asked to predict the rest of the trajectory, the conventional model continued upward in a smooth, monotonically increasing curve. It had no idea the projectile was supposed to come down. The same architecture, trained on the same eight points but with Newton's second law and the aerodynamic drag force embedded directly into its loss function, produced the full trajectory — apex, asymmetric descent, ground impact — within centimeters of the truth. The two models differed only in what they knew about the world.
Key point: A pure data-driven neural network and a physics-informed neural network with identical architecture, trained on the same sparse, noisy data, produced extrapolation errors that differed by 49× horizontally and 53× vertically. The only difference was the loss function.
That headline result comes from a brief I just published in the netrii Wisdom Library — a controlled study extending physics-informed neural networks (PINNs) from the well-behaved 1D damped oscillator to a nonlinear, coupled, two-dimensional system: a sphere moving through air under gravity and quadratic aerodynamic drag, with no closed-form analytical solution. The full technical detail, equations, and figures are there. This post is for the operator who needs to know when this lever applies and what it changes about how you think about ML in engineering domains.
The brittle-curve-fitting problem
The conventional neural network in this study was not undertrained, badly architected, or unlucky. Four hidden layers, 64 neurons each, tanh activations, 2000 epochs of Adam with a cosine-annealed learning rate. It fit the eight training points beautifully. The failure was not in the training region. The failure was the moment it stepped outside.
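For concreteness, here is a minimal sketch of that baseline in PyTorch. The depth, width, activation, optimizer, epoch count, and schedule match the description above; the learning rate, the variable names, and the placeholder data tensors are illustrative assumptions, not the brief's actual code.

```python
import torch
import torch.nn as nn

# Baseline: maps time t -> position (x, y).
# Four hidden layers of 64 tanh units, as described above.
model = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 2),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=2000)

# t_data: (8, 1) measurement times; xy_data: (8, 2) noisy positions.
# Placeholders -- the brief's eight real measurements would go here.
t_data, xy_data = torch.rand(8, 1), torch.rand(8, 2)

for epoch in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(t_data), xy_data)  # data loss only
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Nothing in this loop references the physics. The loss sees only the eight measured points, which is why the model has nothing real to extrapolate with.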
This is the failure mode every engineer who has tried to deploy a data-driven model into a physical system has eventually run into. Inside the training distribution, the model looks brilliant. Outside it, the model has no opinion grounded in anything real, because nothing in its loss function ever told it that the world has structure. Introduce a measurement gap, change an operating regime, ask the model to predict beyond its sampled domain — and the smooth tanh-shaped curves it learned do whatever extrapolation looks locally plausible. Plausible to a curve, not plausible to nature.
For pure-data domains — language, vision, recommender systems — there is no governing equation to embed, and we live with the brittleness by collecting more data. For physical systems, that response is not just expensive. It is often impossible. The data is sparse because measurements are expensive, sensors are limited, regimes are rare, or the regime you care about is the one you have not seen yet. The faster horse here is "collect more data." The automobile is "tell the network what physics already knows."
What the physics residual actually buys you
The PINN is not a different network. It is the same network with a different loss. Three terms instead of one:
- Data loss. The mean-squared error against the eight noisy measurements, exactly as before.
- Physics residual. At a few hundred collocation points sampled across the entire flight time — including all the times for which there is no measurement — the network's predictions are differentiated using PyTorch's autograd, and the resulting acceleration is checked against m·d²x/dt² = -b·|v|·dx/dt and m·d²y/dt² = -m·g - b·|v|·dy/dt. Any violation is squared and added to the loss.
- Initial-condition loss. A penalty if the network does not start the trajectory at the origin.
That is the whole trick. The derivatives are computed exactly by automatic differentiation, not approximated by finite differences. If the residual is zero everywhere, the network's output satisfies Newton's second law exactly. The optimizer is now searching a much smaller manifold — the space of physically admissible trajectories — instead of the full space of curves.
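As a sketch, assuming the same model maps t to (x, y), the three terms translate into a few lines of PyTorch. The mass m, drag coefficient b, equal weighting of the terms, and the collocation tensor `t_col` are illustrative placeholders, not the brief's exact formulation.

```python
import torch

def pinn_loss(model, t_data, xy_data, t_col, m=1.0, b=0.1, g=9.81):
    """Three-term PINN loss: data + physics residual + initial condition.
    m, b, and the equal weighting of the terms are placeholder assumptions."""
    # 1. Data loss: MSE against the eight noisy measurements, exactly as before.
    data_loss = torch.mean((model(t_data) - xy_data) ** 2)

    # 2. Physics residual at collocation points spanning the full flight time.
    t = t_col.clone().requires_grad_(True)
    xy = model(t)
    x, y = xy[:, 0:1], xy[:, 1:2]
    ones = torch.ones_like(x)
    # Exact derivatives via autograd, not finite-difference approximations.
    dx = torch.autograd.grad(x, t, ones, create_graph=True)[0]
    dy = torch.autograd.grad(y, t, ones, create_graph=True)[0]
    ddx = torch.autograd.grad(dx, t, ones, create_graph=True)[0]
    ddy = torch.autograd.grad(dy, t, ones, create_graph=True)[0]
    speed = torch.sqrt(dx**2 + dy**2 + 1e-12)  # eps guards the sqrt gradient
    # Residuals of m*x'' = -b|v|x'  and  m*y'' = -m*g - b|v|y'.
    res_x = m * ddx + b * speed * dx
    res_y = m * ddy + m * g + b * speed * dy
    physics_loss = torch.mean(res_x**2 + res_y**2)

    # 3. Initial-condition loss: the trajectory must start at the origin.
    ic_loss = torch.mean(model(torch.zeros(1, 1)) ** 2)

    return data_loss + physics_loss + ic_loss
```

The training loop is unchanged from the baseline; only the loss call differs, which is exactly the point of the key result above.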
Key point: The physics residual is an inductive bias — it constrains the optimizer to solutions consistent with the governing equations. This is mathematically analogous to Tikhonov regularization, but grounded in physical law rather than an arbitrary smoothness prior. When you actually know the law, that distinction is decisive.
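Written out, the parallel is direct (the λ and α weights below are notational placeholders, not values from the brief):

$$
\mathcal{L}_{\text{PINN}} = \mathcal{L}_{\text{data}} + \lambda_{f}\,\mathcal{L}_{\text{physics}} + \lambda_{0}\,\mathcal{L}_{\text{IC}}
\qquad \text{vs.} \qquad
\mathcal{L}_{\text{Tikhonov}} = \lVert Ax - b \rVert^{2} + \alpha\,\lVert \Gamma x \rVert^{2}
$$

In Tikhonov regularization, the operator Γ encodes a generic preference such as smoothness. In the PINN loss, the physics residual plays the same structural role but encodes Newton's second law.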
A subtle consequence emerges in the noise robustness experiment. We re-ran the study with five times heavier measurement noise (σ = 1.5 m). The conventional network overfit to the scatter and produced a foreshortened, distorted flight profile. The PINN refused to. The physics residual penalized trajectories that fit the noise but violated Newton's law — so the optimizer was steered toward the physically consistent path even when the noisy data suggested otherwise. The known physics functioned as a regularizer that the noisy data could not overpower. That is something no amount of dropout, weight decay, or smoothness prior can replicate, because none of them know what's true.
Where this lever actually applies
A 49×–53× extrapolation improvement is the kind of number that invites overgeneralization. Let me be precise about where this matters and where it does not.
The lever applies cleanly when all of the following are true:
- The governing equations are known. Conservation laws, transport equations, constitutive relations, ODEs/PDEs that describe how the system has to behave. Newton's second law in this study; the heat equation, Navier–Stokes, drift-diffusion, magnetization dynamics, Maxwell's equations in others.
- Data is sparse, noisy, or expensive. If you can collect a million labeled examples cheaply, the data-driven baseline will close the gap on its own. PINNs earn their keep when each measurement costs a wafer, an hour of beam time, a destructive test, or a regulatory cycle.
- You need to predict outside the sampled regime. Apex prediction from ascending data only is the toy version of this. The real version is predicting yield in process windows you have not run, fatigue beyond the test envelope, or device behavior in operating regimes you have not characterized.
- No closed-form analytical solution exists or it is computationally prohibitive. If a fast analytical or numerical solver already handles the problem, use it. PINNs win where the equations are known but solving them is harder than learning a network that satisfies them.
These are precisely the conditions in manufacturing process control, semiconductor yield modeling, magnetic recording physics, fluid dynamics in design, and physics-aware optimization in quantum systems — the domains where I have spent most of my career, and the domains where most of the sparse-data, expensive-measurement, governing-equation-known problems actually live.
The lever does not apply to language modeling, vision, recommendation, or any domain where the "law" is statistical regularity in human-generated data rather than a differential equation. There is no Newton's second law of customer churn. Use PINNs where physics rules; use data-driven ML where it does not.
A clean operating contrast
| Dimension | Data-only neural network | Physics-informed neural network |
| --- | --- | --- |
| Behavior inside training data | Excellent fit | Excellent fit |
| Behavior outside training data | Brittle; extrapolates as a curve | Constrained to physically admissible trajectories |
| Sensitivity to measurement noise | Fits the noise | Penalized by physics residual; resists overfitting |
| Data volume required | Large for the regime of interest | Small if equations are known |
| Applicable when no closed form exists | Yes, but unreliable | Yes, and the headline use case |
| Applicable when no governing equation exists | Yes | No — there is nothing to embed |
The asymmetry matters. The PINN is not strictly better — it is better exactly when you can name the law. That is the strategic decision: not "should we use AI here," but "do we know enough physics to get an order of magnitude more out of the same data?"
The forward question
The result in this brief is not new in spirit — Raissi and colleagues laid out the PINN framework in 2019, and there is now a growing literature on scientific machine learning. What is useful about this study is that it is a clean, controlled, reproducible demonstration on a problem with no analytical solution, and it shows the noise-robustness behavior in an experiment small enough to fit in five pages and grasp immediately.
The operating implication for any organization doing engineering ML is this: before you spend the next quarter collecting more data, ask which of your problems have governing equations you have not put into the model. If the answer is "most of them," the leverage is not in more data. It is in writing the loss function correctly.
The harder question — and the one I am most interested in working through with operators — is which of your specific systems are PINN-shaped, and which look like they should be but quietly are not (because the governing equations are too uncertain, the system is too coupled with unmodeled effects, or the constraints conflict with the data in ways that destabilize training rather than regularize it). That is a conversation worth having before the architecture choice, not after.
This post is a companion essay to the technical brief Physics-Informed Neural Networks for Projectile Trajectory Prediction Under Quadratic Aerodynamic Drag (S. Batra, University of Minnesota ECE, 2026), now available in the netrii Wisdom Library. The brief contains the full equations, training-loss formulation, and figures.
