Serious testing takes time and energy.
A-B-A testing is the ONLY method of properly quantifying the possible gains or losses from a "fuel-saving" (or power-producing) device.
The absolute best method is usually not available to us: a dyno, a rolling road, and a test lab with a fully controlled environment. That setup would remove most of the variables that on-road tests are prone to.
On-road tests with some form of data collection (e.g. ScanGauge, SuperMID, data acquisition computer/software) are always at the mercy of the elements. Temperature, humidity, wind, and (most of all) the driver are variables whose effects on a given test cannot be fully determined.
However, a calm, warm day and a relatively flat road (or predetermined "track") will provide enough data to at least make educated assumptions. At the very least it puts theories in the "plausible" category.
For the sake of reference I will call the chosen roadway the "track".
"A" will refer to the test vehicle in "stock" or "current" (if already modified) condition.
"B" will refer to the test vehicle with the "device" or "additive" installed.
Weather and track conditions must be recorded before each run. For optimum results the test vehicle should be warmed to operating temperature and then driven on the track numerous times in its "A" condition to establish a baseline and/or margin of error between runs. I personally feel that multiple runs are necessary, since environment and driving style can cause fluctuations in fuel economy from run to run.
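If you want to keep that log in a file, something like the Python sketch below works. The field names and the one sample row are just my own placeholders, not any standard format; record whatever conditions matter to your test.

```python
# A minimal run log written out as CSV. Field names and the sample row
# are placeholders; substitute the conditions you actually record.
import csv

FIELDS = ["run", "condition", "mpg", "ambient_temp_f", "wind", "track_notes"]
runs = [
    {"run": 1, "condition": "A", "mpg": 42.5,
     "ambient_temp_f": 75, "wind": "calm", "track_notes": "dry pavement"},
]

with open("aba_test_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(runs)
```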
Example: Test vehicle in "A" condition
Run1: 42.5 mpg
Run2: 43.3 mpg
Run3: 42.1 mpg
Run4: 42.9 mpg
This establishes a baseline for what the test vehicle is capable of on this particular day at this particular time. The margin of error between the best and worst runs is 1.2 mpg.
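In code terms it's nothing fancier than an average and best-minus-worst; here's a quick Python sketch using the example runs above:

```python
# Baseline and margin of error from the "A" runs above.
a_runs = [42.5, 43.3, 42.1, 42.9]  # mpg, vehicle in "A" condition

baseline = sum(a_runs) / len(a_runs)   # average of the baseline runs
margin = max(a_runs) - min(a_runs)     # best run minus worst run

print(f"baseline: {baseline:.2f} mpg, margin of error: {margin:.1f} mpg")
# baseline: 42.70 mpg, margin of error: 1.2 mpg
```

Now, let's attach the device/additive in question and do more runs (it may even be best to drive the test vehicle normally for a set amount of time before performing each set of runs).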
Example: Test vehicle in "B" condition
Run1: 43.5 mpg
Run2: 42.8 mpg
Run3: 43.1 mpg
Run4: 42.5 mpg
In this particular example I would personally consider this "device" busted. All of the numbers fall within the previously established margin of error, and I would feel no particular need to continue testing.
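Put as a quick sketch, the decision rule here is my own reading of the logic above: the worst "B" run has to clear the best "A" run by more than the margin of error before I'd call it plausible.

```python
# Busted vs. plausible: treat the "A" spread as noise, and only call the
# device plausible if even the worst "B" run clears that noise.
def verdict(a_runs, b_runs):
    noise = max(a_runs) - min(a_runs)   # run-to-run margin of error
    gain = min(b_runs) - max(a_runs)    # worst "B" minus best "A"
    return "plausible" if gain > noise else "busted"

a = [42.5, 43.3, 42.1, 42.9]
print(verdict(a, [43.5, 42.8, 43.1, 42.5]))  # busted: inside the noise
```

However, what if the results had been like this: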
Example: Test vehicle in "B" condition
Run1: 45.5 mpg
Run2: 45.7 mpg
Run3: 44.9 mpg
Run4: 46.1 mpg
In this case every run is higher than any run with the car in "A" condition. Better still, even the worst "B" run beats the best "A" run by more than the 1.2 mpg margin of error we calculated earlier (the verdict() sketch above would call this one "plausible"). Even so, the test cannot be considered conclusive yet. First, the test vehicle must be returned to its "A" condition and the runs repeated to make certain it returns to figures close to the original baseline. If it does not, then something changed the test conditions and the results must be scrapped.
If it does return to its baseline, the vehicle must be placed back in "B" condition and the runs repeated once more. This is called "repeatability". If the numbers improve once again, we know the device is "working".
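Chaining the whole sequence together might look like this (again only a sketch; the "close to baseline" tolerance is my own choice, not an established rule):

```python
# The full A-B-A-B sequence, reusing the same decision rule as above.
def spread(runs):
    return max(runs) - min(runs)

def verdict(a_runs, b_runs):
    return "plausible" if min(b_runs) - max(a_runs) > spread(a_runs) else "busted"

def returned_to_baseline(a_first, a_again):
    # "Figures close to the original baseline": I take that to mean the
    # repeat average lands within the original run-to-run spread.
    avg = lambda r: sum(r) / len(r)
    return abs(avg(a_again) - avg(a_first)) <= spread(a_first)

def aba_test(a1, b1, a2, b2):
    if not returned_to_baseline(a1, a2):
        return "scrapped: something changed the test conditions"
    if verdict(a1, b1) == verdict(a2, b2) == "plausible":
        return "working (on this vehicle, on this day)"
    return "busted"
```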
However, this still is not an absolute test. For one, only a single vehicle is being tested. For another, the test should be repeated on a different day. Testing different vehicles on different days would help confirm that the device is indeed causing the improvement in mileage. If it really works, we should see slight-to-obvious gains every time the device is added, regardless of vehicle or environment.
Sorry for being so long and wordy with this. Maybe I should have put it in the "articles" section...