We are implementing FanPT. The basic idea is in the recent paper.
Recall that the FanCI equations are
$$
\braket{\mathbf{m}|\hat{H}|\Psi(\mathbf{c})} = E \braket{\mathbf{m}|\Psi(\mathbf{c})} \qquad \qquad \forall \mathbf{m} \in \mathcal{P}
$$
where $\mathcal{P}$ is the projection space (user-specified), $\mathbf{c}$ are the wavefunction parameters, and the nonlinear FanCI equations are usually overdetermined and solved in a least-squares sense. The set of determinants $\mathbf{n} \in \mathcal{S}$ is defined as the space of determinants that are linked to determinants in $\mathcal{P}$ by the Hamiltonian, typically just single- and double-replacements of the occupied orbitals.
We write the Hamiltonian as a zeroth-order part (for which the FANCI equations have been solved) and a first-order part,
$$
\hat{H}(\lambda) = \hat{H}(\lambda_0) + (\lambda - \lambda_0) \hat{H}'(\lambda_0) + \tfrac{1}{2} (\lambda - \lambda_0)^2 \hat{H}''(\lambda_0) +\cdots
$$
Going through some math, we get the equation
$$
\sum_{\mathbf{n} \in \mathcal{S}} f_\mathbf{n}(\lambda_0,\mathbf{c}) E'(\lambda_0) + \sum_{\mathbf{n} \in \mathcal{S}} \left(\braket{\mathbf{m}|\hat{H}(\lambda_0)|\mathbf{n}} - E(\lambda_0) \delta_{\mathbf{m} \mathbf{n}}\right) \frac{d f_{\mathbf{n}}(\mathbf{c})}{d \mathbf{c}(\lambda_0)} \cdot \mathbf{c}'(\lambda_0) = \sum_{\mathbf{n} \in \mathcal{S}} \braket{\mathbf{m}|\hat{H}'(\lambda_0) |\mathbf{n}} f_\mathbf{n}(\lambda_0,\mathbf{c})
$$
The key ingredients are the Hamiltonian matrix elements. The right hand side is easy to evaluate in terms of transition-2DMs, which appear on the right-hand-side,
$$
\text{RHS} = \text{Tr}[\hat{H}'(\lambda_{0}){pqrs} \Gamma{pqrs}(\lambda_{0})]
$$
My tendency would be to implement this as some functions that build the linear equations to solve, and then solve them. I realize that the (old) implementation is somewhat more general, but we may also want to focus on a specific one or two flavors if it makes the code a lot simpler/faster. Ramon, what do you think the best flavors are? (It will depend on how much computational complexity and code complexity we add by writing a general treatment.)