Current Density Operator and Polarization
The current-density operator can be written in first quantization as
\hat{J}(x) = \sum_{n=1}^N \hat{j}(x, \hat{r}_n), \qquad \hat{j}(x, \hat{r}) = \tfrac{1}{2} \left\{ \tfrac{\hat{p}}{m}, \, \delta(x - \hat{r}) \right\}, \tag{1}
where \hat{p} = -i\hbar \nabla_r, and x is a parameter indicating the position.
Classically, the current density of a particle with position \mathbf{r}(t) and momentum \mathbf{p}(t) is
\mathbf{j}(\mathbf{x},t) = \frac{\mathbf{p}(t)}{m}\,\delta(\mathbf{x}-\mathbf{r}(t)).
That is, the particle velocity \mathbf{v} = \mathbf{p}/m multiplied by a delta function locating the particle.
In quantum mechanics, \mathbf{r} and \mathbf{p} are promoted to operators \hat{\mathbf{r}}, \hat{\mathbf{p}}. But since, \hat{\mathbf{p}} and \delta(\mathbf{x}-\hat{\mathbf{r}}) do not commute, one must be careful with operator ordering.
To resolve this ambiguity, the current operator is defined by symmetrizing the classical expression:
\hat{j}(\mathbf{x}, \hat{\mathbf{r}}) = \frac{1}{2m}\Big\{ \hat{\mathbf{p}}, \, \delta(\mathbf{x} - \hat{\mathbf{r}}) \Big\}.
This definition ensures:
Hermiticity: the operator is an observable and must be Hermitian.
Continuity equation: combined with the density operator \hat{\rho}(\mathbf{x}) = \delta(\mathbf{x} - \hat{\mathbf{r}}), one finds
\frac{\partial \hat{\rho}(\mathbf{x})}{\partial t} + \nabla \cdot \hat{\mathbf{j}}(\mathbf{x}) = 0,
which is the operator version of particle-number conservation.
The anticommutator form is therefore not arbitrary.
It arises because we need a symmetrized (Hermitian) operator that (i) reduces to the classical expression in the appropriate limit, and (ii) ensures the correct continuity relation between density and current.
Thus,
\hat{j}(x,\hat{r}) = \frac{1}{2} \left\{ \frac{\hat{p}}{m}, \, \delta(x - \hat{r}) \right\}
is the natural definition of the current-density operator.
Remember that a one-body operator
\hat{O}_1 = \sum_{n=1}^N \hat{o}_n,
is represented in second quantization as
\hat{O}_1 = \sum_{\mu,\nu} \langle \mu | \hat{o} | \nu \rangle \, \hat{c}^\dagger_\mu \hat{c}_\nu. \tag{2}
Current Density as the Time Derivative of Polarization?
The polarization field \mathbf{P}(\mathbf{x},t) is defined as the electric dipole moment per unit volume:
\mathbf{P}(\mathbf{x},t) = \frac{1}{V} \sum_i q_i \, \mathbf{r}_i(t) \, \delta(\mathbf{x}-\mathbf{r}_i(t)).
It measures how charges are displaced from equilibrium.
The microscopic current density is given by
\mathbf{j}(\mathbf{x},t) = \sum_i q_i \, \dot{\mathbf{r}}_i(t) \, \delta(\mathbf{x}-\mathbf{r}_i(t)).
Here \dot{\mathbf{r}}_i(t) is the velocity of each charge.
Taking the time derivative of polarization gives
\frac{\partial \mathbf{P}(\mathbf{x},t)}{\partial t} = \frac{1}{V} \sum_i q_i \, \dot{\mathbf{r}}_i(t) \, \delta(\mathbf{x}-\mathbf{r}_i(t)) + \cdots,
where the first term is exactly the microscopic current density. The omitted terms come from differentiating the delta function; these reorganize into divergence terms consistent with charge conservation. Thus, in the macroscopic sense,
\mathbf{j}(\mathbf{x},t) = \frac{\partial \mathbf{P}(\mathbf{x},t)}{\partial t}.
Physically…
- Polarization \mathbf{P} measures how much charge has been displaced.
- Current density \mathbf{j} measures how fast that displacement changes in time.
Therefore, current is naturally the time derivative of polarization.
In Maxwell’s equations with matter, this appears explicitly:
\mathbf{J}_{\text{bound}} = \frac{\partial \mathbf{P}}{\partial t} + \nabla \times \mathbf{M},
where \mathbf{M} is the magnetization.
In the absence of magnetization, the polarization current reduces to
\mathbf{J}_{\text{bound}} = \frac{\partial \mathbf{P}}{\partial t}.
Let point charges q_i have positions \mathbf{r}_i(t) and velocities \mathbf{v}_i(t)=\dot{\mathbf{r}}_i(t).
Define the microscopic charge density and polarization density by
\rho(\mathbf{x},t)=\sum_i q_i\,\delta(\mathbf{x}-\mathbf{r}_i(t)), \qquad \mathbf{P}(\mathbf{x},t)=\sum_i q_i\,\mathbf{r}_i(t)\,\delta(\mathbf{x}-\mathbf{r}_i(t)).
The microscopic current density is
\mathbf{j}(\mathbf{x},t)=\sum_i q_i\,\mathbf{v}_i(t)\,\delta(\mathbf{x}-\mathbf{r}_i(t)).
We now take the time derivative of \mathbf{P}(\mathbf{x},t). Using the product and chain rules,
\begin{aligned} \frac{\partial \mathbf{P}(\mathbf{x},t)}{\partial t} &= \sum_i q_i \frac{d}{dt}\big[ \mathbf{r}_i(t)\,\delta(\mathbf{x}-\mathbf{r}_i(t))\big] \\[6pt] &= \sum_i q_i \big[ \dot{\mathbf{r}}_i(t)\,\delta(\mathbf{x}-\mathbf{r}_i(t)) + \mathbf{r}_i(t)\,\frac{d}{dt}\delta(\mathbf{x}-\mathbf{r}_i(t))\big]. \end{aligned}
The derivative of the delta follows from the chain rule:
\frac{d}{dt}\delta(\mathbf{x}-\mathbf{r}_i(t)) = -\dot{\mathbf{r}}_i(t)\cdot\nabla_{\!x}\,\delta(\mathbf{x}-\mathbf{r}_i(t)) = -\mathbf{v}_i(t)\cdot\nabla\,\delta(\mathbf{x}-\mathbf{r}_i(t)).
Substitute this back:
\begin{aligned} \frac{\partial \mathbf{P}}{\partial t} &= \sum_i q_i \big[ \mathbf{v}_i\,\delta(\mathbf{x}-\mathbf{r}_i) - \mathbf{r}_i \,(\mathbf{v}_i\cdot\nabla)\delta(\mathbf{x}-\mathbf{r}_i)\big] \\ &= \underbrace{\sum_i q_i\,\mathbf{v}_i\,\delta(\mathbf{x}-\mathbf{r}_i)}_{\mathbf{j}(\mathbf{x},t)} \;-\; \sum_i q_i\,\mathbf{r}_i \,(\mathbf{v}_i\cdot\nabla)\delta(\mathbf{x}-\mathbf{r}_i). \end{aligned}
Now use the vector identity (\mathbf{v}\cdot\nabla)\delta = \nabla\cdot(\mathbf{v}\,\delta) because \mathbf{v}_i(t) depends on time (and particle index) but not on the spatial variable \mathbf{x} — therefore \nabla\cdot\mathbf{v}_i=0 with respect to \mathbf{x}. Thus
(\mathbf{v}_i\cdot\nabla)\delta(\mathbf{x}-\mathbf{r}_i) = \nabla\cdot\big(\mathbf{v}_i\,\delta(\mathbf{x}-\mathbf{r}_i)\big).
Apply this to the second term:
\begin{aligned} \frac{\partial \mathbf{P}}{\partial t} &= \mathbf{j}(\mathbf{x},t) \;-\; \sum_i q_i\,\mathbf{r}_i\,\nabla\cdot\big(\mathbf{v}_i\,\delta(\mathbf{x}-\mathbf{r}_i)\big) \\ &= \mathbf{j}(\mathbf{x},t) \;-\; \nabla\cdot\underbrace{\Big(\sum_i q_i\,\mathbf{r}_i\otimes\mathbf{v}_i\,\delta(\mathbf{x}-\mathbf{r}_i)\Big)}_{\mathbf{Q}(\mathbf{x},t)}. \end{aligned}
Here \mathbf{Q}(\mathbf{x},t) is a second-rank tensor field with components Q_{\alpha\beta}=\sum_i q_i r_{i,\alpha} v_{i,\beta}\,\delta(\mathbf{x}-\mathbf{r}_i), and \nabla\cdot\mathbf{Q} denotes the divergence on the second index producing a vector.
Rewriting,
\boxed{\qquad \mathbf{j}(\mathbf{x},t) = \frac{\partial \mathbf{P}(\mathbf{x},t)}{\partial t} \;+\; \nabla\cdot\mathbf{Q}(\mathbf{x},t).\qquad}
So microscopically the current equals the time-derivative of the polarization plus the divergence of the tensor \mathbf{Q} built from \mathbf{r}_i\otimes\mathbf{v}_i (this is the explicit term coming from differentiating the deltas).
Coarse-graining → the macroscopic relation
If we average (coarse-grain) the microscopic fields over a small volume V large compared to atomic scale but small on macroscopic scales, surface contributions from derivatives of delta functions become negligible in the interior. Formally, integrate both sides over a small volume V and use the divergence theorem:
\int_V \mathbf{j}\,d^3x = \int_V \partial_t\mathbf{P}\,d^3x + \int_V \nabla\cdot\mathbf{Q}\,d^3x = \partial_t\!\int_V \mathbf{P}\,d^3x + \oint_{\partial V} \mathbf{Q}\cdot d\mathbf{S}.
If V contains many microscopic charges, the surface integral \oint_{\partial V}\mathbf{Q}\cdot d\mathbf{S} is a surface (boundary) term; when forming a smoothly varying macroscopic polarization field by averaging, these boundary terms either cancel between neighboring averaging volumes or become negligible compared with the volume terms. After coarse-graining we therefore obtain the familiar macroscopic relation (pointwise for the coarse-grained fields):
\boxed{\qquad \mathbf{J}_{\text{mac}}(\mathbf{x},t) = \partial_t\mathbf{P}_{\text{mac}}(\mathbf{x},t). \qquad}
(If magnetization is present, the standard decomposition of the total bound current is \mathbf{J}_{\text{bound}}=\partial_t\mathbf{P}+\nabla\times\mathbf{M}. The \nabla\times\mathbf{M} term arises when internal circulations of charge (magnetic moments) are present; it is a different rearrangement of the microscopic \mathbf{Q}-type terms.)
Summary
- Differentiating \mathbf{P}=\sum_i q_i \mathbf{r}_i\delta(\mathbf{x}-\mathbf{r}_i) produces an explicit term containing derivatives of the delta, which can be rewritten as -\nabla\cdot\mathbf{Q} where \mathbf{Q}=\sum_i q_i \mathbf{r}_i\otimes\mathbf{v}_i\delta(\mathbf{x}-\mathbf{r}_i).
- Microscopically: \mathbf{j}=\partial_t\mathbf{P}+\nabla\cdot\mathbf{Q}.
- After coarse-graining (averaging over volumes containing many charges) the divergence term becomes a boundary/surface contribution and the local macroscopic relation reduces to \mathbf{j}_{\text{mac}}=\partial_t\mathbf{P}_{\text{mac}}.
In classical theory the polarization density is
\mathbf{P}(\mathbf{x},t) = \sum_i q_i\, \mathbf{r}_i(t)\,\delta(\mathbf{x}-\mathbf{r}_i(t)),
and the current density is
\mathbf{j}(\mathbf{x},t) = \sum_i q_i\, \dot{\mathbf{r}}_i(t)\,\delta(\mathbf{x}-\mathbf{r}_i(t)).
Differentiating \mathbf{P} gives \mathbf{j} plus divergence terms, which after coarse-graining reduce to the clean macroscopic relation \hat{\mathbf{j}} = \frac{d}{dt}\hat{\mathbf{P}}.
In quantum mechanics, however, positions \hat{\mathbf{r}}_i and momenta \hat{\mathbf{p}}_i are operators satisfying
[\hat{r}_{i,\alpha}, \hat{p}_{j,\beta}] = i\hbar\,\delta_{ij}\,\delta_{\alpha\beta}.
This non-commutativity complicates the definition of operator-valued polarization.
Polarization operator
A natural definition is
\hat{\mathbf{P}}(\mathbf{x}) = \sum_i q_i\, \hat{\mathbf{r}}_i \,\delta(\mathbf{x}-\hat{\mathbf{r}}_i).
But here the delta function is an operator-valued distribution. Differentiating it in time using the Heisenberg equation,
\frac{d\hat{O}}{dt} = \frac{i}{\hbar}[\hat{H},\hat{O}] + \frac{\partial \hat{O}}{\partial t},
produces extra terms where operator ordering matters. For instance, \hat{\mathbf{r}}_i\otimes \hat{\mathbf{v}}_i and \hat{\mathbf{v}}_i\otimes \hat{\mathbf{r}}_i are not equal.
Current operator
In quantum mechanics the current density operator is defined via the continuity equation for the charge density operator \hat{\rho}(\mathbf{x}) = \sum_i q_i \,\delta(\mathbf{x}-\hat{\mathbf{r}}_i)
The correct Hermitian form is given below and its anticommutator structure ensures Hermiticity and guarantees charge conservation.
\hat{\mathbf{j}}(\mathbf{x}) = \frac{1}{2m} \sum_i q_i \left[ \delta(\mathbf{x}-\hat{\mathbf{r}}_i)\,\hat{\mathbf{p}}_i + \hat{\mathbf{p}}_i \,\delta(\mathbf{x}-\hat{\mathbf{r}}_i) \right].
Thus \hat{\mathbf{j}} is not simply \frac{d}{dt}\hat{\mathbf{P}}.
Let the charge-density operator be \hat{\rho}(\mathbf{x}) = \sum_i q_i \,\delta(\mathbf{x}-\hat{\mathbf{r}}_i). We require the operator continuity equation \frac{\partial \hat{\rho}(\mathbf{x},t)}{\partial t} + \nabla\cdot \hat{\mathbf{j}}(\mathbf{x},t) = 0.
Work in the Heisenberg picture so \partial_t\hat{O} = \frac{i}{\hbar}[\hat{H},\hat{O}] (we assume no explicit time dependence in \hat{\rho}).
For a system of non-relativistic particles with Hamiltonian
\hat{H}=\sum_i \frac{\hat{\mathbf{p}}_i^2}{2m_i} + \hat{V},
the dominant contribution to \partial_t\hat{\rho} comes from the kinetic part (the potential \hat{V}, if it depends only on positions, commutes with \delta(\mathbf{x}-\hat{\mathbf{r}}_i) or produces only contact terms handled similarly). Consider the commutator with the kinetic term for particle (i):
\frac{\partial \hat{\rho}(\mathbf{x})}{\partial t} = \frac{i}{\hbar}\sum_i \left[\frac{\hat{\mathbf{p}}_i^2}{2m_i},\, q_i\delta(\mathbf{x}-\hat{\mathbf{r}}_i)\right] + \cdots
Use the basic commutator identity (component-wise) and [\hat{p}_{i,\alpha}, f(\hat{\mathbf{r}}_i)] = -i\hbar\,\partial_{x_\alpha} f(\hat{\mathbf{r}}_i) \quad\text{(acting on the $x$ variable of the delta).}
Compute
\begin{aligned}
\left[\hat{\mathbf{p}}_i^2,\, \delta(\mathbf{x}-\hat{\mathbf{r}}_i)\right]
&= \hat{\mathbf{p}}_i\,[\hat{\mathbf{p}}_i,\delta] + [\hat{\mathbf{p}}_i,\delta]\,\hat{\mathbf{p}}_i \\
&= -i\hbar\left(\hat{\mathbf{p}}_i\cdot\nabla\delta + \nabla\delta\cdot\hat{\mathbf{p}}_i\right)
\\
&= -i\hbar\;\Big\{\hat{\mathbf{p}}_i,\;\nabla\delta(\mathbf{x}-\hat{\mathbf{r}}_i)\Big\},
\end{aligned}
where \nabla differentiates with respect to the spatial argument \mathbf{x}.
Therefore
\begin{aligned}
\frac{i}{\hbar}\left[\frac{\hat{\mathbf{p}}_i^2}{2m_i},\, q_i\delta(\mathbf{x}-\hat{\mathbf{r}}_i)\right]
&= \frac{q_i}{2m_i}\left(-\hat{\mathbf{p}}_i\cdot\nabla\delta - \nabla\delta\cdot\hat{\mathbf{p}}_i\right) \\
&= -\nabla\cdot\left(\frac{q_i}{2m_i}\big\{\hat{\mathbf{p}}_i,\delta(\mathbf{x}-\hat{\mathbf{r}}_i)\big\}\right).
\end{aligned}
Summing over particles gives the continuity equation with the current operator \boxed{\qquad \hat{\mathbf{j}}(\mathbf{x}) \;=\; \sum_i \frac{q_i}{2m_i}\,\big\{\hat{\mathbf{p}}_i,\; \delta(\mathbf{x}-\hat{\mathbf{r}}_i)\big\}. \qquad}
This is exactly the Hermitian (anticommutator) form used in the literature: the anticommutator resolves ordering ambiguities and yields a manifestly Hermitian operator that satisfies the continuity equation.
But when does the relation return?
If we take expectation values in states and coarse-grain over space, operator-ordering subtleties wash out (see below for why). Then we recover
\langle \hat{\mathbf{j}}(\mathbf{x},t) \rangle = \frac{\partial}{\partial t}\langle \hat{\mathbf{P}}(\mathbf{x},t) \rangle.
This is the macroscopic relation used in electrodynamics of media.
Consider the integrated (total) current, \int d^3x\,\hat{\mathbf{j}}(\mathbf{x}) = \sum_i \frac{q_i}{2m_i}\int d^3x\; \big\{\hat{\mathbf{p}}_i,\; \delta(\mathbf{x}-\hat{\mathbf{r}}_i)\big\}.
Because the delta is a distribution in \mathbf{x}, the spatial integral collapses: \int d^3x\;\delta(\mathbf{x}-\hat{\mathbf{r}}_i)=\hat{\mathbb{I}} (the identity operator), and therefore \int d^3x\; \big\{\hat{\mathbf{p}}_i,\; \delta(\mathbf{x}-\hat{\mathbf{r}}_i)\big\} = \big\{\hat{\mathbf{p}}_i,\; \int d^3x\,\delta(\mathbf{x}-\hat{\mathbf{r}}_i)\big\} = \{\hat{\mathbf{p}}_i,\; \hat{\mathbb{I}}\} = 2\hat{\mathbf{p}}_i.
Hence the integrated current becomes \int d^3x\,\hat{\mathbf{j}}(\mathbf{x}) = \sum_i \frac{q_i}{2m_i}\, 2\hat{\mathbf{p}}_i = \sum_i \frac{q_i}{m_i}\,\hat{\mathbf{p}}_i.
Now compare with the time derivative of the total dipole (total polarization moment) \hat{\mathbf{D}} := \sum_i q_i \,\hat{\mathbf{r}}_i, \qquad \frac{d\hat{\mathbf{D}}}{dt} = \frac{i}{\hbar}[\hat{H},\hat{\mathbf{D}}] = \sum_i \frac{q_i}{m_i}\,\hat{\mathbf{p}}_i, where we used [\hat{\mathbf{p}}_i^2/(2m_i),\hat{\mathbf{r}}_i]= -i\hbar\,\hat{\mathbf{p}}_i/m_i (and potential terms that depend only on positions either commute or give contact terms that vanish for the total).
Therefore \boxed{\qquad \int d^3x\,\hat{\mathbf{j}}(\mathbf{x}) \;=\; \frac{d\hat{\mathbf{D}}}{dt}. \qquad}
Key point: the spatial integral removes the \mathbf{x}-dependent delta, replacing it by the identity operator. That collapse kills all the operator-ordering issues that were tied to the non-commutativity of \hat{\mathbf{p}}_i and \delta(\mathbf{x}-\hat{\mathbf{r}}_i). After integration the anticommutator simply gives 2\hat{\mathbf{p}}_i and the usual velocity operator \hat{\mathbf{p}}_i/m_i appears without ambiguity.
Intuition / summary
- Locally (pointwise in space) we must keep an anticommutator \{\hat{\mathbf{p}}_i,\delta(\mathbf{x}-\hat{\mathbf{r}}_i)\} to obtain a Hermitian current operator and to satisfy the operator continuity equation.
- Globally (after integrating over all space) the delta integrates out and the anticommutator reduces to 2\hat{\mathbf{p}}_i, so ordering issues vanish: the integrated current equals the time derivative of the total dipole moment operator.
- Equivalently, expectation values and coarse-graining (which effectively average over space) also wash out these local ordering subtleties, which is why the classical/macroscopic statement \langle\hat{\mathbf{j}}\rangle = \partial_t\langle\hat{\mathbf{P}}\rangle holds.
If you’d like, I can (a) add the same derivation in component/index notation for full rigor, or (b) produce a short Quarto snippet showing the corresponding derivation for a system with identical masses (m) and charges (q) to be copy-pasted into your notes. Which would you prefer?
Summary
- Classically: \mathbf{j} = \dot{\mathbf{P}} + \nabla\cdot\mathbf{Q}, reducing to \mathbf{j} = \dot{\mathbf{P}} after coarse-graining.
- Quantum mechanically: \hat{\mathbf{j}} must be defined through the continuity equation with symmetrization, not as \dot{\hat{\mathbf{P}}}.
- At the level of expectation values, the simple relation re-emerges and matches the macroscopic classical result.
We can find a spatially integrated version of the relation and the Heisenberg derivation We work in the Heisenberg picture. The polarization operator in the Schrödinger representation is \hat{\mathbf P}(\mathbf r)=\sum_i e_i\,\hat{\mathbf r}_i\,\Lambda(\mathbf r-\hat{\mathbf r}_i), where \Lambda(\mathbf r-\hat{\mathbf r}_i) is a smoothed delta (or a localization kernel that reduces to \delta in the limit). In the Heisenberg picture \hat{\mathbf P}(\mathbf r,t)=e^{\,i\hat H t/\hbar}\,\hat{\mathbf P}(\mathbf r)\,e^{-i\hat H t/\hbar}. Hence \frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t} =\frac{i}{\hbar}\,[\hat H,\; \hat{\mathbf P}(\mathbf r,t)].
1. Start from the (local) current operator and integrate it against \mathbf A
The standard Hermitian current operator (for particle i) is \hat{\mathbf j}(\mathbf r)=\sum_i \frac{e_i}{2m_i}\,\big\{\hat{\mathbf p}_i,\;\Lambda(\mathbf r-\hat{\mathbf r}_i)\big\}, so the spatially integrated coupling to a (classical) vector potential \mathbf A(\mathbf r,t) is \begin{aligned} \int d\mathbf r\;\hat{\mathbf j}(\mathbf r)\!\cdot\!\mathbf A(\mathbf r,t) &= \sum_i \frac{e_i}{2m_i}\int d\mathbf r\; \big\{\hat{\mathbf p}_i,\;\Lambda(\mathbf r-\hat{\mathbf r}_i)\big\}\!\cdot\!\mathbf A(\mathbf r,t) \\ &= \sum_i \frac{e_i}{2m_i}\; \big\{\hat{\mathbf p}_i,\; \underbrace{\int d\mathbf r\;\Lambda(\mathbf r-\hat{\mathbf r}_i)\,\mathbf A(\mathbf r,t)}_{\,\mathbf A(\hat{\mathbf r}_i,t)\,}\big\}. \end{aligned}
In the last step the integral collapses the kernel \Lambda onto the operator position \hat{\mathbf r}_i, i.e. \int d\mathbf r\;\Lambda(\mathbf r-\hat{\mathbf r}_i)\,\mathbf A(\mathbf r,t)=\mathbf A(\hat{\mathbf r}_i,t), so \boxed{\qquad \int d\mathbf r\;\hat{\mathbf j}(\mathbf r)\!\cdot\!\mathbf A(\mathbf r,t) = \sum_i \frac{e_i}{2m_i}\,\big\{\hat{\mathbf p}_i,\;\mathbf A(\hat{\mathbf r}_i,t)\big\}. \qquad}
Thus the operator -\int d\mathbf r\,\hat{\mathbf j}\cdot\mathbf A appearing in minimal-coupling interactions equals the negative of the anticommutator form on the right above.
2. Compute the spatial integral of \partial_t\hat{\mathbf P}\cdot\mathbf A via Heisenberg equation
Using the Heisenberg equation, \frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t} = \frac{i}{\hbar}\,[\hat H,\; \hat{\mathbf P}(\mathbf r,t)]. Focus on the kinetic term in the Hamiltonian, \hat H_{\text{kin}}=\sum_i \frac{\hat{\mathbf p}_i^2}{2m_i}, since this gives the dominant contribution that produces the current-like terms. Evaluate the commutator piece (drop explicit time dependence on operators to keep notation concise):
\begin{aligned} \int d\mathbf r\;\frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t}\cdot\mathbf A(\mathbf r,t) &= \frac{i}{\hbar}\int d\mathbf r\; \big[\hat H_{\text{kin}},\;\hat{\mathbf P}(\mathbf r,t)\big]\!\cdot\!\mathbf A(\mathbf r,t) + \cdots \\ &= \frac{i}{\hbar}\sum_i \int d\mathbf r\; \bigg[\frac{\hat{\mathbf p}_i^2}{2m_i},\; e_i\,\hat{\mathbf r}_i\,\Lambda(\mathbf r-\hat{\mathbf r}_i,t)\bigg]\!\cdot\!\mathbf A(\mathbf r,t) + \cdots, \end{aligned} which is the operator form used in your equation (6.1.21) (we have pulled out charges and masses and written the commutator explicitly).
Write the \alpha component to match your notation: \frac{\partial \hat P_\alpha(\mathbf r,t)}{\partial t} = \frac{i}{\hbar}\sum_i \frac{e_i}{2m_i}\Big[\hat{\mathbf p}_i^2,\;\hat r_{i\alpha}\,\Lambda(\mathbf r-\hat{\mathbf r}_i)\Big] + \cdots, so (contracting with A_\alpha and integrating \mathbf r) reproduces the structure of (6.1.21): \boxed{\; \int d\mathbf r\;\frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t}\!\cdot\!\mathbf A(\mathbf r,t) = \frac{i}{\hbar}\sum_i \frac{e_i}{2m_i}\int d\mathbf r\; \Big[\hat{\mathbf p}_i^2,\; \hat{\mathbf r}_i\,\Lambda(\mathbf r-\hat{\mathbf r}_i)\Big]\!\cdot\!\mathbf A(\mathbf r,t) + \cdots \; }.
This is the integrated Heisenberg form that underlies your equation (6.1.21). (The omitted terms indicate contributions from potentials or explicit time dependence of \Lambda; they are handled in the same way and either vanish or produce contact pieces.)
3. Manipulate the commutator to recover the anticommutator/current form
Evaluate the commutator inside the integral using component indices and the basic identity [\hat p_{i\beta},\, f(\hat{\mathbf r}_i)] = -\,i\hbar\,\partial_{r_{i\beta}} f(\hat{\mathbf r}_i).
Compute \begin{aligned} \big[\hat{\mathbf p}_i^2,\; \hat r_{i\alpha}\,\Lambda(\mathbf r-\hat{\mathbf r}_i)\big] &= \hat p_{i\beta}\,[\hat p_{i\beta},\, \hat r_{i\alpha}\Lambda] + [\hat p_{i\beta},\, \hat r_{i\alpha}\Lambda]\,\hat p_{i\beta} \\ &= -i\hbar\left(\hat p_{i\beta}\,\partial_{r_{i\beta}}\big(\hat r_{i\alpha}\Lambda\big) + \partial_{r_{i\beta}}\big(\hat r_{i\alpha}\Lambda\big)\,\hat p_{i\beta}\right) \\ &= -i\hbar\; \big\{ \hat{\mathbf p}_i,\; \nabla_{r_i}\big(\hat r_{i\alpha}\Lambda(\mathbf r-\hat{\mathbf r}_i)\big) \big\}. \end{aligned}
Now insert this into the integrated expression and move the spatial derivative by parts (the derivative acts on the function of \mathbf r through the kernel \Lambda). After integrating over \mathbf r the gradient acting on \Lambda(\mathbf r-\hat{\mathbf r}_i) pulls the spatial dependence onto \mathbf A(\mathbf r,t) (integration by parts), generating terms of the form \{\hat{\mathbf p}_i,\;\mathbf A(\hat{\mathbf r}_i,t)\} (the boundary terms vanish for localized kernels or rapidly decaying fields). Concretely one finds
\int d\mathbf r\; \Big[\hat{\mathbf p}_i^2,\; \hat{\mathbf r}_i\,\Lambda(\mathbf r-\hat{\mathbf r}_i)\Big]\!\cdot\!\mathbf A(\mathbf r,t) = -\,i\hbar\; \int d\mathbf r\; \big\{\hat{\mathbf p}_i,\; \Lambda(\mathbf r-\hat{\mathbf r}_i)\big\}\!\cdot\!\mathbf A(\mathbf r,t),
and thus (using the collapse property as in section 1) \begin{aligned} \int d\mathbf r\;\frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t}\!\cdot\!\mathbf A(\mathbf r,t) &= \frac{i}{\hbar}\sum_i \frac{e_i}{2m_i}\left(-i\hbar\right)\int d\mathbf r\; \big\{\hat{\mathbf p}_i,\; \Lambda(\mathbf r-\hat{\mathbf r}_i)\big\}\!\cdot\!\mathbf A(\mathbf r,t) \\ &= \sum_i \frac{e_i}{2m_i}\int d\mathbf r\; \big\{\hat{\mathbf p}_i,\; \Lambda(\mathbf r-\hat{\mathbf r}_i)\big\}\!\cdot\!\mathbf A(\mathbf r,t) \\ &= \int d\mathbf r\;\hat{\mathbf j}(\mathbf r)\!\cdot\!\mathbf A(\mathbf r,t). \end{aligned}
Hence \boxed{\qquad \int d\mathbf r\;\frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t}\!\cdot\!\mathbf A(\mathbf r,t) = \int d\mathbf r\;\hat{\mathbf j}(\mathbf r)\!\cdot\!\mathbf A(\mathbf r,t). \qquad}
Moving the minus sign to match your equation (6.1.20) gives - \int d\mathbf r\;\hat{\mathbf j}(\mathbf r)\!\cdot\!\mathbf A(\mathbf r,t) = - \int d\mathbf r\;\frac{\partial \hat{\mathbf P}(\mathbf r,t)}{\partial t}\!\cdot\!\mathbf A(\mathbf r,t), which is exactly (6.1.20).
Remarks / interpretations
- The manipulations above are formal but standard: the commutator [\hat p^2,\; \hat r\,\Lambda] produces gradients of the localization kernel \Lambda, and integrating by parts transfers those gradients onto the test field \mathbf A(\mathbf r,t). Collapsing the kernel afterwards yields the anticommutator \{\hat p,\; \mathbf A(\hat r)\} (i.e. the integrated current).
- The key technical steps are (i) using [\hat p, f(\hat r)]=-i\hbar\nabla f(\hat r), (ii) integrating by parts in \mathbf r, and (iii) using the collapse \int d\mathbf r\,\Lambda(\mathbf r-\hat{\mathbf r}_i)\,\mathbf A(\mathbf r)=\mathbf A(\hat{\mathbf r}_i).
- This shows why the macroscopic-looking relation between \mathbf J and \partial_t\mathbf P does not hold as a local operator identity pointwise, but does hold after spatial integration (and, equivalently, for total dipole / integrated current). The ordering subtleties are eliminated by the spatial integral and by the integration-by-parts step that moves derivatives off operator-valued kernels onto the c-number field \mathbf A.