If you want to measure pulling power, I would think you should use a fish-scale type of apparatus.
I think there should be some additional "rules":
The same "fish scale" should be used for all tests.
The same type of tangent track should be used for all tests, i.e. the same brand of track. Track with flat-topped rails would be best, with no rail joints.
The rail should be on a flat/level surface.
The transformer or power supply should have enough capacity to supply whatever current the loco draws during the test.
The "stop point" for the test should be the final steady-state volts and amps that the loco can reach without slipping.
Each loco should be weighed. For steam locos, engines and tenders should be weighed separately.
Locos with traction tires should be tested separately from locos with no traction tires. Traction tires should be new and not contaminated. (Sounds like "new" or unrun locos only?)
If it is easy to determine the gear ratio/final drive ratio of each loco, it should be recorded.
Each loco tested should be lubricated (all axles and pickups), and ideally have approximately the same number of break-in hours/miles.
The track voltage should be measured. Some loco motors will work best with more (or less) voltage than others, so both volts and amps should be recorded with digital meters.
All parasitic loads like smoke and sound should be turned off. How about non-LED headlights?
Steam locos should be tested with their tender, since steam locos (other than tank engines or fireless locos) never operated without one.
Diesel testing should be limited to powered units only. So if an "A" unit that is part of a set is not powered, it should be removed for the test, OR two powered "A" units should be used for the test.
A test run in this manner might be destructive, so I suggest that magazine reviewers do this type of work!
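For what it's worth, here is a rough sketch of how the readings from the rules above could be logged and compared. It is only an illustration: the field names, the ounce units, and the idea of comparing locos by pull divided by engine weight are my assumptions, and the numbers are made up, not measured. Volts times amps is only a rough power figure (on AC it is apparent power).

```python
from dataclasses import dataclass


@dataclass
class PullTest:
    """One pulling-power reading; every field name here is hypothetical."""
    loco: str
    engine_weight_oz: float   # weight of the loco (engine only for steam)
    pull_oz: float            # steady scale reading at the stop point
    volts: float              # track voltage at the stop point
    amps: float               # current draw at the stop point
    traction_tires: bool

    def pull_ratio(self) -> float:
        # Pull as a fraction of engine weight (an assumed comparison metric).
        return self.pull_oz / self.engine_weight_oz

    def watts(self) -> float:
        # Rough electrical input: volts times amps.
        return self.volts * self.amps


def group_by_tires(tests):
    """Split results so traction-tire locos are compared separately."""
    groups = {True: [], False: []}
    for t in tests:
        groups[t.traction_tires].append(t)
    return groups


# Made-up numbers purely for illustration, not measured data.
tests = [
    PullTest("loco A", 64.0, 16.0, 12.0, 1.5, False),
    PullTest("loco B", 80.0, 32.0, 14.0, 2.0, True),
]
for t in tests:
    print(f"{t.loco}: pull/weight={t.pull_ratio():.2f}, input={t.watts():.1f} W")
```

Keeping the traction-tire group separate follows the rule above; gear ratio and break-in hours could be added as extra fields in the same way.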