When There Are Too Many Experiments (Tests) To Do! Part 2: Orthogonal Arrays

The simple standalone C++ code to generate orthogonal arrays that goes with this post can be found at https://github.com/Atrix256/OrthogonalArray

The last blog post talked about a way to design a series of experiments with N combinations that let you do only N/2 of the experiments, or N/4, or fewer. It also had a formalized framework for understanding what data you could and could not take from the results. That post is at: https://blog.demofox.org/2023/10/17/fractional-factorial-experiment-design-when-there-are-too-many-experiments-to-do/

This post is also about reducing the number of experiments needed, but uses orthogonal arrays, and is based on work by Genichi Taguchi (https://en.wikipedia.org/wiki/Genichi_Taguchi).

Let’s motivate this with a tangible situation. Imagine we are trying to find the best brownie recipe, and we have three binary (yes/no) parameters. The first parameter “A” determines whether or not to add an extra egg. A value of 1 in a test means add the egg; a value of 0 means don’t add the egg. The second parameter “B” determines whether to double the butter. The third parameter “C” determines whether we put nuts on top.

Here are the 8 combinations of options, which you may recognize as counting to 7 in binary.

        A (extra egg)  B (double butter)  C (add nuts)
Test 0  0              0                  0
Test 1  0              0                  1
Test 2  0              1                  0
Test 3  0              1                  1
Test 4  1              0                  0
Test 5  1              0                  1
Test 6  1              1                  0
Test 7  1              1                  1
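
If you wanted to generate that list in code, here is a minimal C++ sketch (an illustration, not the code from the repo linked above) that produces the table by counting in binary:

    #include <cstdio>

    int main()
    {
        // The full factorial list for 3 binary parameters is just counting from 0 to 7,
        // where bit 2 is A, bit 1 is B, and bit 0 is C.
        for (int test = 0; test < 8; ++test)
        {
            int A = (test >> 2) & 1;
            int B = (test >> 1) & 1;
            int C = test & 1;
            printf("Test %d  %d  %d  %d\n", test, A, B, C);
        }
        return 0;
    }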

Now we want to reduce the number of experiments down to 4, because we only have 4 brownie pans and also don’t want to get too fat trying to figure this out. Here are the 4 experiments we are going to do, along with a “result” column where we had our friends come over and rate the brownies from 1-10, and we took the average for each version of the recipe.

        A  B  C  Result
Test 0  0  0  0  8
Test 1  0  1  1  6
Test 2  1  0  1  9
Test 3  1  1  0  7

At first glance the tests chosen seem arbitrary, and the results don’t seem to give us much information, but there is hidden structure here. Let’s mark the tests where option A is false with an asterisk, and leave the tests where it is true unmarked.

          A  B  C  Result
Test 0 *  0  0  0  8
Test 1 *  0  1  1  6
Test 2    1  0  1  9
Test 3    1  1  0  7

Notice that we are doing two tests where A is false, and between those two tests, B is true in one and false in the other. The same goes for C: it’s true in one test and false in the other. This essentially makes the effects of the B and C options cancel out in these two tests, if we average the results.

Looking at the two tests for when A is true, the same thing happens. One test has B being true, the other false. One test has C being true, the other false. Once again this makes the effects of B and C cancel out, if we average the results.

Averaging the results when A is false gives us 7. Averaging the results when A is true gives us 8. This seems to indicate that adding an extra egg gives a small boost to the tastiness of our brownies. In our tests, due to the effects of B and C canceling out, we can be more confident that we really are seeing the results of the primary effect “A” (Adding an extra egg), and not some secondary or tertiary effect.

More amazingly, we can mark the tests where B is false and see that once again, in those two tests, A and C each take both possible values and cancel out. The same is true for the tests where B is true.

          A  B  C  Result
Test 0 *  0  0  0  8
Test 1    0  1  1  6
Test 2 *  1  0  1  9
Test 3    1  1  0  7

The same is true if we were to mark the tests where C is false. Every parameter is “isolated” from the others by this property of the other parameters canceling out across the tests.

Let’s make a table showing the average scores for when A, B, and C are false vs true.

                   No   Yes
A (extra egg)      7    8
B (double butter)  8.5  6.5
C (nuts on top)    7.5  7.5

From this table we can see that doubling the butter has the biggest effect on the score, and it’s a negative one: the batches without doubled butter scored 2 points higher on average, so we should skip the extra butter. Adding another egg helps, but only a little. Lastly, people don’t seem to care whether there are nuts on top of the brownies or not.
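
To make the arithmetic concrete, here is a small sketch that computes the No/Yes table from the four test results. The settings and scores are hard coded from the tables above:

    #include <cstdio>

    int main()
    {
        // The four tests we ran: the A, B, C settings and the average taste score.
        const int   settings[4][3] = { {0,0,0}, {0,1,1}, {1,0,1}, {1,1,0} };
        const float results[4]     = { 8.0f, 6.0f, 9.0f, 7.0f };

        // For each parameter, average the results where it was off vs. where it was on.
        const char* names[3] = { "A (extra egg)", "B (double butter)", "C (nuts on top)" };
        for (int param = 0; param < 3; ++param)
        {
            float sum[2] = { 0.0f, 0.0f };
            int count[2] = { 0, 0 };
            for (int test = 0; test < 4; ++test)
            {
                int value = settings[test][param];
                sum[value] += results[test];
                count[value]++;
            }
            printf("%s: No=%0.1f Yes=%0.1f\n", names[param],
                sum[0] / float(count[0]), sum[1] / float(count[1]));
        }
        return 0;
    }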

It’s pretty neat that we were able to do this analysis and make these conclusions by only testing 4 out of the 8 possibilities, but we still have missing information – what we called aliases in the last blog post. Software that deals with fractional factorial experiment design (Minitab is one) is able to show the aliasing structure of these “Taguchi Orthogonal Arrays”, but I wasn’t able to find how to calculate exactly what the aliases are. If someone figures that out, leave a comment!

I just spilled the beans, in that the table of tests to do is called an “Orthogonal Array”. More specifically it’s an orthogonal array of strength 2.

For an orthogonal array of strength 2, you can take any 2 columns, and find all possible combinations of the parameter values an equal number of times. Here is the testing table again, along with the values of “AB”, “AC”, and “BC” to show how they each have all possible values (00, 01, 10, 11) exactly once.

        A  B  C  AB  AC  BC
Test 0  0  0  0  00  00  00
Test 1  0  1  1  01  01  11
Test 2  1  0  1  10  11  01
Test 3  1  1  0  11  10  10
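
If you want to check the strength 2 property in code, here is a sketch of a verifier. It assumes each column holds values 0 to levels-1, and checks that every column pair contains every value combination the same number of times:

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Returns true if every pair of columns contains every possible value
    // combination an equal number of times - the strength 2 property.
    bool IsStrength2(const std::vector<std::vector<int>>& rows)
    {
        size_t numColumns = rows[0].size();

        // Count the number of levels in each column, assuming values 0 to levels-1.
        std::vector<int> levels(numColumns, 0);
        for (const auto& row : rows)
            for (size_t col = 0; col < numColumns; ++col)
                levels[col] = std::max(levels[col], row[col] + 1);

        for (size_t col1 = 0; col1 < numColumns; ++col1)
        {
            for (size_t col2 = col1 + 1; col2 < numColumns; ++col2)
            {
                // Every combination must show up rows.size() / numCombos times.
                size_t numCombos = size_t(levels[col1]) * size_t(levels[col2]);
                if (rows.size() % numCombos != 0)
                    return false;
                size_t expected = rows.size() / numCombos;

                std::vector<size_t> counts(numCombos, 0);
                for (const auto& row : rows)
                    counts[row[col1] * levels[col2] + row[col2]]++;
                for (size_t count : counts)
                    if (count != expected)
                        return false;
            }
        }
        return true;
    }

    int main()
    {
        // The 4 test orthogonal array from above.
        std::vector<std::vector<int>> rows = { {0,0,0}, {0,1,1}, {1,0,1}, {1,1,0} };
        printf("strength 2: %s\n", IsStrength2(rows) ? "yes" : "no");
        return 0;
    }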

Orthogonal arrays give this isolation / cancellation property that allows you to do a reduced number of tests, and still make well founded conclusions from the results.

A higher strength orthogonal array means that a larger group of columns can be taken, and the same property will hold. This table isn’t a strength 3 orthogonal array, because taking all three columns at once, some of the 8 possible values are missing. The full factorial test on these three binary parameters would be a strength 3 orthogonal array, which gives better results, but is full factorial, so doesn’t save us any time.

Unlike the last blog post, parameters don’t have to be limited to binary (2 level) values with these arrays. We could turn parameter A (add an extra egg) into something that had three values: 0 = regular amount of eggs, 1 = an extra egg, 2 = two extra eggs.

The full factorial number of tests would then become 12 instead of 8:

         A (3 level)  B (2 level)  C (2 level)
Test 0   0            0            0
Test 1   0            0            1
Test 2   0            1            0
Test 3   0            1            1
Test 4   1            0            0
Test 5   1            0            1
Test 6   1            1            0
Test 7   1            1            1
Test 8   2            0            0
Test 9   2            0            1
Test 10  2            1            0
Test 11  2            1            1

Unfortunately, in this case there is no strength 2 orthogonal array that is smaller than the full 12 tests. The reason is that an orthogonal array needs each pair of columns to contain each possible value combination an equal number of times. The column pairs AB and AC each have 6 possible value combinations, while the column pair BC has 4. If we did only 4 tests, the AB and AC column pairs would each be missing 2 of their 6 possible combinations. If we did 6 tests, the column pair BC would have 2 extra combinations, which can also be seen as being 2 combinations short of having all values twice.

To formalize this, the minimum number of tests you can do is the least common multiple of the number of value combinations in each column pair. In this example, AB and AC have 6, while BC has 4. The least common multiple of 6 and 4 is 12, so you need to do a multiple of 12 experiments with this setup to make an orthogonal array. The full factorial array is size 12, and in fact is the orthogonal array that we are talking about.
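
Here is what that calculation looks like as a quick sketch, using std::lcm from C++17:

    #include <cstdio>
    #include <numeric>
    #include <vector>

    // The minimum number of rows a strength 2 orthogonal array can have: the least
    // common multiple of the number of value combinations in each column pair.
    // Note this is a lower bound - as mentioned later in the post, a valid array
    // of exactly this size doesn't always exist.
    int MinimumRows(const std::vector<int>& levels)
    {
        int result = 1;
        for (size_t i = 0; i < levels.size(); ++i)
            for (size_t j = i + 1; j < levels.size(); ++j)
                result = std::lcm(result, levels[i] * levels[j]);
        return result;
    }

    int main()
    {
        printf("%d\n", MinimumRows({ 3, 2, 2 }));    // prints 12: lcm of 6, 6, 4
        printf("%d\n", MinimumRows({ 3, 2, 2, 2 })); // prints 12 as well (see below)
        return 0;
    }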

If you add another 2 level factor (binary parameter), the minimum number of experiments stays at 12, but the full factorial becomes 24, so we are able to beat the full factorial in that case.

         A (3 level)  B (2 level)  C (2 level)  D (2 level)
Test 0   0            0            0            1
Test 1   0            0            1            0
Test 2   0            1            0            1
Test 3   0            1            1            0
Test 4   1            0            0            1
Test 5   1            0            1            0
Test 6   1            1            0            0
Test 7   1            1            1            1
Test 8   2            0            0            0
Test 9   2            0            1            1
Test 10  2            1            0            0
Test 11  2            1            1            1

Robustness Against Uncontrolled Variables

I’ve only read a little about Taguchi, but what I have read has really impressed me. One thing that really stood out to me is nearly a social issue. Taguchi believed that producing consistent, precise results was the correct ultimate goal of a factory. He believed erratic, low quality results harmed both the consumer and the company, and demonstrated how the harm to the consumer ultimately came back to affect the company. From Wikipedia:

Taguchi has made a very influential contribution to industrial statistics. Key elements of his quality philosophy include the following:

  1. Taguchi loss function, used to measure financial loss to society resulting from poor quality;
  2. The philosophy of off-line quality control, designing products and processes so that they are insensitive (“robust”) to parameters outside the design engineer’s control;
  3. Innovations in the statistical design of experiments, notably the use of an outer array for factors that are uncontrollable in real life, but are systematically varied in the experiment.

I’d like to highlight his work in making processes robust against uncontrolled variables.

What you do is run the tests while the uncontrolled variable is in one state, then run them again when it’s in another state. You might do this for several states.

Then, for each combination of the settings you do have control over, you take the standard deviation of the results across the runs where the uncontrolled variable changed. Whichever combination of controllable settings gives you the lowest standard deviation (the square root of variance) is the one least affected by the uncontrolled variable.
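
Here is a minimal sketch of that calculation, with hypothetical scores. Each row is one combination of controllable settings, and each column is a run with the uncontrolled variable in a different state:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main()
    {
        // results[row][run]: hypothetical scores for 4 setting combinations, 3 runs each.
        std::vector<std::vector<float>> results = {
            { 8.0f, 6.0f, 7.0f },
            { 9.5f, 7.0f, 9.0f },
            { 5.0f, 9.0f, 7.0f },
            { 7.5f, 7.0f, 7.5f },
        };

        int bestRow = -1;
        float bestStdDev = 0.0f;
        for (size_t row = 0; row < results.size(); ++row)
        {
            // Mean of this row's results.
            float mean = 0.0f;
            for (float value : results[row])
                mean += value;
            mean /= float(results[row].size());

            // Standard deviation of this row's results.
            float variance = 0.0f;
            for (float value : results[row])
                variance += (value - mean) * (value - mean);
            variance /= float(results[row].size());
            float stdDev = std::sqrt(variance);

            printf("Row %d: mean = %0.2f, stddev = %0.2f\n", int(row), mean, stdDev);
            if (bestRow < 0 || stdDev < bestStdDev)
            {
                bestRow = int(row);
                bestStdDev = stdDev;
            }
        }
        printf("Most consistent settings: row %d\n", bestRow);
        return 0;
    }

With these made up numbers, row 1 has the best average score, but row 3 is by far the most consistent.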

It may not be the BEST result, but it lets you have a CONSISTENT result, in uncontrollable conditions. Sometimes consistency is more important than getting the best score possible. FPS in a video game is an example of this. A smooth frame rate feels and appears a lot better perceptually than a faster, but erratic frame rate.

Note that you can use this technique with CONTROLLABLE variables too, to see if there are any variables that don’t have a strong impact on the result. You can use this knowledge to your advantage, for example by removing those variables from future tests.

Here is another example of where that could be useful. Imagine you were making a video game, and you provided the end user with a set of graphics options to help them control the game’s performance on their machine. If you run performance tests for different graphics settings on different machines, you might find that certain settings don’t help performance much AT ALL for your specific game. If this is the case, you can remove the graphics setting from the UI and set it to a constant value. Simplifying the settings helps you, the developer, by leaving fewer code paths to maintain; it helps QA by leaving fewer code paths to test; and it helps the user, by only presenting meaningful graphics options.

An Algorithm For Orthogonal Array Generation

I couldn’t find a single, general orthogonal array generation algorithm on the net. From what I read, there are different ways of constructing different types of orthogonal arrays, but no generalized method. However, it’s easy enough to craft a greedy algorithm with backtracking, and I did just that. The source code can be found at https://github.com/Atrix256/OrthogonalArray.

Let’s think of the orthogonal array as a matrix where each row is an experiment, and each column lists the values for a parameter across all of the experiments. The goal we are after is to have each column pair contain each possible value combination an equal number of times – in the smallest case, exactly once.

Assuming for a moment that all columns have the same number of possible values (for example, they are all binary and can be either 0 or 1), we can see that filling up any single column pair with all of its value combinations will also fill up all of the other column pairs. As a concrete example, if we have 3 binary columns A, B, C, we know that by filling AB with all the values 00, 01, 10, 11 by choosing experiments from the “full factorial list”, there will also be four values in the column pairs AC and BC. So, we only need to find 4 rows (specific experiments) in that case. We can just fill up the first column pair, and the other column pairs will get filled up too.

We can do this by choosing any experiment that has 00 in AB, then 01, 10, and finally 11.

However, the values that fill up the other column pairs may not cover the full set of combinations, and there may be repeats, even if we disallow repeats of specific experiments. Column pair AC may end up being 00, 00, 11, 10 for instance, where 00 shows up twice and 01 doesn’t show up at all.

So, when we choose an experiment (row) that matches the value we want in the AB column pair, we also have to make sure it doesn’t duplicate a value in any of the other column pairs.

Doing this, we may end up in a spot where we have values left to fill in, but none of our options are valid. At this point, we take a step back, remove the last row we added, and try a new one. This is the backtracking. We may have to backtrack several steps before we find the right solution. We may also exhaustively try all solutions and find no solution at all!
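
Here is a condensed sketch of that idea, for the 3 binary parameter example from earlier. It isn’t the code from the repo (which handles mixed levels and repeated value quotas), just the core greedy choice plus backtracking:

    #include <cstdio>
    #include <vector>

    constexpr int c_numColumns = 3; // three binary parameters: A, B, C
    constexpr int c_targetRows = 4; // the number of experiments we want to keep

    // Rows are stored as integers, one bit per column. Two rows conflict if they
    // share the same value combination in any column pair.
    bool Conflicts(const std::vector<int>& chosen, int candidate)
    {
        for (int row : chosen)
            for (int col1 = 0; col1 < c_numColumns; ++col1)
                for (int col2 = col1 + 1; col2 < c_numColumns; ++col2)
                    if (((row >> col1) & 1) == ((candidate >> col1) & 1) &&
                        ((row >> col2) & 1) == ((candidate >> col2) & 1))
                        return true;
        return false;
    }

    // Greedily add rows from the full factorial list, backtracking when stuck.
    bool Solve(std::vector<int>& chosen)
    {
        if (int(chosen.size()) == c_targetRows)
            return true;
        for (int candidate = 0; candidate < (1 << c_numColumns); ++candidate)
        {
            if (Conflicts(chosen, candidate))
                continue;
            chosen.push_back(candidate);
            if (Solve(chosen))
                return true;
            chosen.pop_back(); // backtrack and try the next candidate
        }
        return false;
    }

    int main()
    {
        std::vector<int> chosen;
        if (Solve(chosen))
            for (int row : chosen)
                printf("%d%d%d\n", (row >> 2) & 1, (row >> 1) & 1, row & 1);
        return 0;
    }

Running this prints the same 4 row orthogonal array used in the brownie example above.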

Generalizing this further, our columns may not all have the same number of possible values – like in our examples where we mixed 3 level and 2 level variables. As I mentioned before, some column pairs will have 6 possible value combinations, while others will have 4. The only way to make an orthogonal array in that situation is to use a number of rows that is a multiple of the least common multiple of 6 and 4, which is 12. The 6 combination column pairs will contain each combination twice, and the 4 combination column pairs will contain each combination three times.

Lastly, there are situations where no solution can be found by repeating the values this minimal number of times, but if you do twice as many rows (or more), a solution can be found, and it can still be smaller than the full factorial set of experiments. An example of this is five 3 level factors. You can’t make an orthogonal array that is only 9 rows long, but you can make one that is 18 rows long. That is still a lot smaller than the full factorial number of experiments, which is 3^5 = 243.

Closing and Links

From the number of reads of the previous blog post, this seems to be a topic that really resonates with people. Who doesn’t want to do less work and still get great results? (Tangent: That is also the main value of blue noise in real time rendering!)

In these two posts, we’ve only scratched the surface. Apparently there is still research coming out on this family of techniques – doing a smaller number of tests and gleaning the most relevant data you can from them.

I think a faster algorithm than I described could be created using Algorithm X / Dancing Links (https://blog.demofox.org/2022/10/30/rapidly-solving-sudoku-n-queens-pentomino-placement-and-more-with-knuths-algorithm-x-and-dancing-links/).

The wikipedia article links orthogonal arrays to latin squares, mutually orthogonal latin squares, sudoku and more (https://en.wikipedia.org/wiki/Orthogonal_array#Latin_squares,_Latin_cubes_and_Latin_hypercubes). Interestingly, that also links it to “Interleaved Gradient Noise” in a way, which is an imprecise solution to a generalized sudoku that has too many constraints to exactly solve (https://blog.demofox.org/2022/01/01/interleaved-gradient-noise-a-different-kind-of-low-discrepancy-sequence/).

When starting on this journey, I thought it must surely have applications to sampling in Monte Carlo integration, and indeed it does! There is a paper from 2019 that I saw, but didn’t understand at the time, which uses orthogonal arrays for sampling (https://cs.dartmouth.edu/~wjarosz/publications/jarosz19orthogonal.html).

The rabbit hole has become quite deep, so I think I’m going to stop here for now. If you have more to contribute here, please leave a comment! Thanks for reading!

