This should make the build-time tests a bit more robust by using the -noreset option, which avoids a race condition (see #981201).
With these changes, the flaky/known-failing tests are no longer installed as installed-tests at all, so remove them from the autopkgtest metadata.
This will let us distinguish between "fails because of small differences caused by rounding/i387" and "completely different result" without having to move the whole build system to Meson, which seems like one variation too many during a transition.
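
For illustration only, the distinction between the two failure classes could be sketched roughly as below. This is not the test suite's actual comparison logic: the function name classify_difference, the flat list of 0-255 channel values, and the default tolerance of 1 are all assumptions made up for this sketch.

```python
def classify_difference(expected, actual, tolerance=1):
    """Classify how two sequences of 0-255 channel values differ.

    Hypothetical helper: the name, input layout and tolerance are
    assumptions for illustration, not taken from the real tests.
    """
    # A size mismatch can never be explained by rounding.
    if len(expected) != len(actual):
        return "different"

    # Largest per-channel deviation between the two results.
    max_delta = max((abs(e - a) for e, a in zip(expected, actual)), default=0)

    if max_delta == 0:
        return "identical"
    if max_delta <= tolerance:
        # Small deviations of this kind are what rounding/i387 would produce.
        return "rounding"
    return "different"
```

A caller could then treat "rounding" as a known, tolerated failure and "different" as a real regression.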