
I have written about differential forms before („What are differential forms“), but there I never addressed the elephant in the room: why we should consider \(df\) a covector field. Neither have I given any explanation of what \(dx\) could mean, much less what \(\frac{df}{dx}\) could be.

In this plot, we can see the typical representation of how one can calculate the derivative of a function \(f\) using rise over run. By taking the limit as the run \(h\) approaches \(0\), we get the traditional derivative \[ \frac{df}{dx}\Bigg|_{x_0} := \lim_{h\to 0} \frac{f(x_0 + h) - f(x_0)}{h} \tag{1}\]
Implicitly, we have evaluated our derivative at the position \(x_0\). By varying this \(x_0\), we get the derivative as it is often used in practice, which may also be written \(\frac{df}{dx}(x)\).
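As a quick numerical illustration of Equation 1 (my own sketch, not part of the original derivation), we can watch the rise-over-run quotient settle as \(h\) shrinks; the function \(f(x) = \sin(x) + 1\) is the one used further below.

```python
import math

def f(x):
    return math.sin(x) + 1  # the example function used later in the text

def difference_quotient(f, x0, h):
    # Rise over run: (f(x0 + h) - f(x0)) / h, before any limit is taken.
    return (f(x0 + h) - f(x0)) / h

# As h shrinks, the quotient approaches cos(x0), the derivative of sin(x) + 1.
x0 = 0.5
for h in (1e-1, 1e-3, 1e-6):
    print(difference_quotient(f, x0, h))
```
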
But how does this relate to \(dx\) being some covector field? As a reminder, a covector field is just a mapping from points of an underlying topological space to covectors – also known as linear functionals or forms (remember that bilinear forms are just linear functions in two arguments, so the name form does not seem too out of place here).
Well, let's take a step back and analyze Equation 1 again. If we take a look at Table 1, we can see that \(df|_{x_0}\) corresponds to a small nudge in the \(f\) direction starting at \(x_0\), which is \(f(x_0 + h) - f(x_0)\) in the limit above.
How does this relate to \(dx\)? Here the first misunderstanding might occur. After all, \(x\) is a real number, not a function like \(f\), so how can \(d\) take both real numbers and functions as a parameter? The solution is that standard notation overloads the term “\(x\)”. The \(x\) in the definition \(f(x) = \sin(x) + 1\) is a parameter of \(f\) and therefore lives in a different scope than the \(x\) used in \(dx\). The \(x\) in \(dx\) actually stands for a particular family of projections:
\[ \begin{align*} x &: \mathbb{R} \to \mathbb{R} & x(v_1) = v_1 \\ &: \mathbb{R}^2 \to \mathbb{R} & x\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = v_1 \\ &: \mathbb{R}^3 \to \mathbb{R} & x\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = v_1 \\ & \vdots \\ &: \mathbb{R}^d \to \mathbb{R} & x\begin{pmatrix} v_1 \\ \vdots \\ v_d \end{pmatrix} = v_1 \\ \end{align*} \]
where we may pick the appropriate one depending on the current context. If we would like to be pedantic, though, we could also formalize the signature of \(x\) as follows
\[ x : \bigsqcup_{i=1}^{\infty} \mathbb{R}^i \to \mathbb{R}, \quad (v_i)_{i=1, \dots, d} \mapsto v_1 \]
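This overloaded family of projections can be sketched as a single Python function that accepts a vector of any dimension and returns its first coordinate (the hypothetical dynamic-typing check stands in for picking the right member of the family):

```python
# Sketch of the family of projections x: pick the right member of the
# family based on the shape of the argument and return v1.
def x(v):
    if isinstance(v, (int, float)):
        return v       # d = 1: x is the identity on R
    return v[0]        # d >= 2: first coordinate of the vector

print(x(3.0))          # d = 1
print(x((1, 2)))       # d = 2
print(x((1, 2, 3)))    # d = 3
```
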
By doing this, we get some consistency between \(df|_{x_0}\), which corresponds to \[\lim_{h\to 0} f(x_0 + h) - f(x_0)\] and \(dx|_{x_0}\), which corresponds to \[\lim_{h\to 0} x(x_0 + h) - x(x_0) = \lim_{h\to 0} x_0 + h - x_0 = \lim_{h\to 0} h\]
Now we run into quite the conundrum, though. If we were to simply define \(df|_{x_0} = \lim_{h\to 0} f(x_0 + h) - f(x_0)\), we would run into the problem that we calculate our limits too eagerly. We would then get \[\frac{df|_{x_0}}{dx|_{x_0}} = \frac{\lim_{h\to 0} f(x_0 + h) - f(x_0)}{\lim_{h\to 0} h} = \frac{0}{0},\] where the denominator is zero, making the result undefined. To solve this problem, we wish to “defer” the evaluation of our limits until the end. We wish to “extract” the limits from any of the algebraic manipulations we apply to \(df\) and \(dx\) until we are ready to take them. This leads us to our first definition of the differential \(d\):
\[ \begin{align*} d &: \underbrace{C^0(\mathbb{R})}_{f} \to \underbrace{\mathbb{R}}_{x_0} \to (\underbrace{\mathbb{R}}_{h} \to \mathbb{R}) \\ df|_{x_0} &= h \mapsto f(x_0 + h) - f(x_0) \end{align*} \]
We use the notation of currying here for simplicity. Now we can postpone any limits until the end: \[ \frac{df}{dx}\Bigg|_{x_0} = h \mapsto \frac{f(x_0 + h) - f(x_0)}{x(x_0+h) - x(x_0)} \] And we may use coercion to simplify our notation, so that \(\frac{df}{dx}\bigg|_{x_0}\) and \(\lim_{h \to 0}\frac{df}{dx}\bigg|_{x_0}(h)\) may be used interchangeably (the notation for that would be \(\uparrow\frac{df}{dx}\bigg|_{x_0} = \lim_{h \to 0}\frac{df}{dx}\bigg|_{x_0}(h)\)). This means that for any formula, the limit may only be taken as a “post-processing” step applied to it.
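The curried definition of \(d\) translates almost literally into closures. The sketch below (names of my own choosing) builds \(df|_{x_0}\) and \(dx|_{x_0}\) as plain functions of the nudge \(h\), forms their quotient algebraically, and only at the very end "takes the limit" by plugging in a small \(h\):

```python
import math

# Curried differential: d(f)(x0) is a function of the nudge h, so no
# limit is evaluated until we explicitly choose to do so.
def d(f):
    return lambda x0: (lambda h: f(x0 + h) - f(x0))

def f(x):
    return math.sin(x) + 1

def x(v):
    return v  # the 1-dimensional projection, i.e. the identity

x0 = 0.5
df = d(f)(x0)  # h ↦ f(x0 + h) - f(x0)
dx = d(x)(x0)  # h ↦ h

# The quotient is well-defined for every h != 0 -- no 0/0 problem here.
quotient = lambda h: df(h) / dx(h)

# "Post-processing": only now approximate the limit with a small h.
print(quotient(1e-6))  # close to cos(0.5)
```
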
Because both \(f\) and \(x\) (or any differentiable function \(g\)) must be differentiable at each point \(x_0\), meaning that the limit above must exist there, we can in fact regard \(dg\) as a covector field:
\[ \begin{align*} dg &: \mathbb{R} \to \mathbb{R}^* \\ x_0 &\mapsto \left(\lambda \mapsto \lambda \cdot \underbrace{\left(\lim_{h \to 0} \frac{g(x_0 + h) - g(x_0)}{h}\right)}_{\text{scalar, because } g \text{ is differentiable at } x_0}\right) \end{align*} \]
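This covector-field reading can be sketched numerically as well (again my own illustration): each point \(x_0\) is mapped to a linear map \(\lambda \mapsto \lambda \cdot g'(x_0)\), where the scalar slope is approximated by a small difference quotient.

```python
import math

# Sketch: dg as a covector field. Each point x0 is sent to a linear
# functional on R, i.e. a covector, scaling its argument by g'(x0).
def dg(g, x0, h=1e-8):
    slope = (g(x0 + h) - g(x0)) / h  # approximately the scalar g'(x0)
    return lambda lam: lam * slope   # a linear map R -> R

covector = dg(math.sin, 0.0)  # at x0 = 0 the slope is cos(0) = 1
print(covector(2.0))          # roughly 2 * cos(0) = 2
```
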
After this, we can now generalize \(d\) to the exterior derivative on real manifolds \(\Omega\), by considering smooth functions on \(\Omega\) as the \(0\)-forms \(\bigwedge^0 \Omega\) and extending the above definition to work on \(\Omega\) as well and not just on \(\mathbb{R}\).
Additionally, this might give some insight into what \[ \int_a^b f(x)\, dx \] could mean. With the Riemann integral, we like to think about summing up increasingly small rectangles of height \(f(x)\) and width \(dx\). Here we again take the approach of wrapping the sum in a lazily evaluated formula, to which we apply the limit post-processing that we get from \(dx\) for free.
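The lazy Riemann sum can be sketched in the same spirit (a sketch under my own naming, using the number of rectangles \(n\) as the deferred parameter): the sum over rectangles is wrapped in a closure, and refining \(n\) plays the role of the post-processing limit.

```python
import math

# Lazy Riemann sum: return a function of the resolution n instead of a
# number, so the limit n -> infinity is deferred until the very end.
def riemann_sum(f, a, b):
    def at_resolution(n):
        dx = (b - a) / n  # the width contributed by dx at this resolution
        return sum(f(a + i * dx) * dx for i in range(n))
    return at_resolution  # no limit taken yet

# Example: the integral of sin(x) from 0 to pi equals 2.
lazy = riemann_sum(math.sin, 0.0, math.pi)
for n in (10, 100, 10000):
    print(lazy(n))  # approaches 2 as n grows
```
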