How to Turn a Horrible Spreadsheet Into Clean Calculation Code
Loïc Gelle9 min read
Implementing complex calculations in a project can be quite tricky, especially when your only calculation model is a gigantic spreadsheet full of horrible formulas. The aim of this article is to guide you through the construction of a clean and functional calculation code starting from such a spreadsheet, and to show you some tips to handle calculations without any pain.
To keep it simple, our working language in this article will be Javascript, but we will design abstract patterns that can be adapted to any language.
Let’s get started!
Basic Structure and Code Pattern
Let’s describe the example spreadsheet. It aims at forecasting the month-by-month evolution of a set of parameters - temperature, species distribution, air quality,… - in an ecological niche depending on given the initial state of the environment and some calculation parameters. This evolution can be calculated according to a model that describes the interaction between the different species and between the species and their environment.
Basically, we can identify two areas in the spreadsheet: an area where we can specify the calculation parameters, and the resulting data table that has a fixed number of columns to compute - the data that we want to compute - and a number of lines that depends on the forecast duration. The general structure is quite simple in this case: we will be doing a line-by-line computation, where the values in a line only depend on the past, thus on the previously computed lines.
We could be tempted to imitate the spreadsheet’s apparent data structure by putting the result values in a matrix. I strongly advise you against doing that, as you will end up with an unreadable code full of myMatrix[i][j-1]
that will be horrible to maintain. Our setup will be:
- an object for the input parameters;
- a table of objects for the results, each line corresponding to a calculation row.
This allows you to label the object’s values to clarify your code. Here is an example of parameters object:
var calculationParams = {
forecastDurationInYears: 8,
foodResources: {
percentageForA: 10,
percentageForB: 13
},
predation: {
rateAB: 0.1,
rateBA: 0,
considerAB: true,
considerBA: false
},
...
initialState: {
month: 0,
temperature: 28,
speciesA: {
population: 10000,
...
},
speciesB: {
population: 700,
...
},
...
}
};
You can provide new lines to your results table using a function:
function initLine() {
const emptyLine = {
month: 0,
temperature: 0,
speciesA: {
population: 0,
...
},
speciesB: {
population: 0,
...
}
};
return emptyLine;
}
Now the calculations structurally boil down to adding and filling new lines to your results table. Let’s write the main calculation function:
function computeResults(calculationParams) {
const monthsToCompute = calculationParams.forecastDurationInYears * 12;
var resultsTable = [];
resultsTable.push(calculationParams.initialState);
for(var i=1; i<monthsToCompute; i++) {
computeLine(calculationParams, resultsTable, i);
}
return resultsTable;
}
and the line calculator:
function computeLine(calculationParams, resultsTable, index) {
var newLine = initLine();
resultsTable.push(newLine);
computePopulationEvolutionA(calculationParams, resultsTable, index);
computePopulationEvolutionB(calculationParams, resultsTable, index);
// and so on!
}
What we just did looks like nothing, but the pattern we created allows you to handle the formulas and the calculation logic separately. You can now write the core calculation functions to compute the columns.
Writing the Formulas
What I suggest is to take a few minutes to analyse the formulas in your spreadsheet and to write them down in a documentation file in understandable terms. This extra step gives you a file to keep track of the formulas - essential if you want someone else to understand your code without the spreadsheet - and it accelerates development. Indeed, there is nothing more annoying than starting to code a feature without being 100% sure of what it will look like down to the last detail. To convince yourself of the importance of this step, try to imagine what inspires you the most between
=IF($B$4;IF($E50>$B$11;($E$2-$G$3*SUMIFS($H$10:$H49;$A$10:$A49;">$A50-36"))*$F50;0);IF($F50>0;($I$5-$I$6*(SUMIFS($H$10:$H49;$X50;true)-SUM($I$10:$I49)));0))
and
IF take predation A -> B into account
IF population A > predation threshold
populationEvolutionB = (reproductionRateA - predationRateAB * (sum on all the births for species B on the past 3 years)) * population B
ELSE
populationEvolutionB = 0
ELSE
IF population B > 0
populationEvolutionB = (birth rate B - death rate B) * ((sum on all the births of B in viable past months) - (sum on all the deaths of B over the past months))
ELSE
populationEvolutionB = 0
When coding, the second option is way better! When turning this formula into a function it gives us, without trying to refactor at first:
function computePopulationEvolutionB(calculationParams, resultsTable, index) {
var line = resultsTable[index];
line.speciesB.evolution = 0;
var sumBirths3Years = 0;
var sumBirthsViable = 0;
var sumDeaths = 0;
for (var i=0; i < index; i++) { var loopLine = resultsTable[i]; if (loopLine.month >= line.month - 36)
sumBirths3Years += loopLine.speciesB.births;
if (loopline.speciesB.viableBirth)
sumBirthsViable += loopLine.speciesB.births;
sumDeaths += loopLine.speciesB.deaths;
}
if (calculationParams.predation.considerAB)
if (line.speciesA.population > calculationParams.predation.threshold)
line.speciesB.evolution = line.speciesB.population * (calculationParams.reproduction.rateA - calculationParams.predation.rateAB * sumBirths3Years);
else
if (line.speciesB.population > 0)
line.speciesB.evolution = (line.speciesB.birthRate - line.speciesB.deathRate) * (sumBirthsViable - sumDeaths);
}
which is not so obvious when you take a look at the spreadsheet formula…
Avoiding Redundant Calculations
We drew the most general code pattern for this kind of line-by-line calculations. Of course, the calculations themselves can be optimized, but this is specific to your formulas. If you look carefully at my previous implementation of computePopulationEvolutionB
, you can notice that the calculation of the sums inside the function is completely inefficient, as I will need to recompute the sum every time I add a line to my results table.
An optimization workaround could be to create a helper object to keep track of the intermediate results:
function computeResults(calculationParams) {
const monthsToCompute = calculationParams.forecastDurationInYears * 12;
var resultsTable = [];
resultsTable.push(calculationParams.initialState);
var calculationHelper = {
sumBirthsViableB: 0,
sumDeathsB: 0,
sumBirths3YearsB: 0
};
for(var i=0; i<monthsToCompute; i++) {
computeLine(calculationParams, resultsTable, calculationHelper, i);
}
return resultsTable;
}
function computeLine(calculationParams, resultsTable, calculationHelper, index) {
var newLine = initLine();
resultsTable.push(newLine);
computePopulationEvolutionA(calculationParams, resultsTable, index);
computePopulationEvolutionB(calculationParams, resultsTable, index);
// and so on!
updateCalculationHelper(calculationParams, resultsTable, calculationHelper, index);
}
and to update them accordingly using a new function. You end up with a clearer and more optimized code:
function computePopulationEvolutionB(calculationParams, resultsTable, calculationHelper, index) {
var line = resultsTable[index];
line.firstColumn = 0;
if (calculationParams.predation.considerAB)
if (line.speciesA.population > calculationParams.predation.threshold)
line.speciesB.evolution = line.speciesB.population * (calculationParams.reproduction.rateA - calculationParams.predation.rateAB * calculationHelper.sumBirths3YearsB);
else
if (line.speciesB.population > 0)
line.speciesB.evolution = (line.speciesB.birthRate - line.speciesB.deathRate) * (calculationHelper.sumBirthsViableB - calculationHelper.sumDeathsB);
}
Handling Calculations Order
There is one last important thing I didn’t adress in the previous pattern. How do I know in what order I am supposed to compute the columns?
Indeed, the value A in a line can require some values B and C on the same line to be calculated first, but B may need a value D that is also a prerequisite to compute A… Argh! You could choose to draw a dependency graph out of the spreadsheet, but it is easy to make a mistake and this solution is not sustainable, for the formulas can evolve. I’ve even seen spreadsheets in which the order in which to compute the values depends on the line!
So how can you handle such a mess like spreadsheet softwares do?
One solution is to promisify your calculation functions. I will not give more details about this solution which is too specific to Javascript and which makes your code an immediate mess in this case.
I propose to choose a custom-made dependency resolver. Here is what we will do: we will store each computation function in an object, along with a state - computed or to compute and a depencency table that indicated the needed values before the calculation can be performed. In our case, it looks like
function returnComputeFunctions() {
const computeFunctions = {
evolutionB: {
compute: (calculationParams, resultsTable, calculationHelper, index) => {
var line = resultsTable[index];
line.firstColumn = 0;
if (calculationParams.predation.considerAB)
if (line.speciesA.population > calculationParams.predation.threshold)
line.speciesB.evolution = line.speciesB.population * (calculationParams.reproduction.rateA - calculationParams.predation.rateAB * calculationHelper.sumBirths3YearsB);
else
if (line.speciesB.population > 0)
line.speciesB.evolution = (line.speciesB.birthRate - line.speciesB.deathRate) * (calculationHelper.sumBirthsViableB - calculationHelper.sumDeathsB);
},
computed: false,
getDependencies: (calculationParams) => {
if (calculationParams.predation.considerAB)
return [computeFunctions.populationB, computeFunctions.birthsB];
else
return [computeFunctions.birthViableB, computeFunctions.deathsB];
}
},
populationB: {
...
},
...
};
}
Now let’s write a dependency resolver function:
function resolveCalcDependencies(valueToCompute, calculationParams, resultsTable, calculationHelper, index) {
if (!valueToCompute.computed) {
valueToCompute.getDependencies(calculationParams).forEach((dependencyValue) => {
resolveCalcDependencies(dependencyValue, paramsWrapper);
});
valueToCompute.compute(calculationParams, resultsTable, calculationHelper, index);
valueToCompute.computed = true;
}
}
As we are doing the calculations recursively, we just need to launch the process by telling the dependency resolver to compute the interesting values.
function computeLine(calculationParams, resultsTable, calculationHelper, index) {
var newLine = initLine();
resultsTable.push(newLine);
var computeFunctions = returnComputeFunctions();
resolveCalcDependencies(computeFunctions.evolutionB);
updateCalculationHelper(calculationParams, resultsTable, calculationHelper, index);
}
The resolver will pull all the dependencies step by step! Amazing, isn’t it? The computed param is here to make sure that the function terminates, and that each computation is only performed once.
Going further: some tips
At this point, your calculations should be working and you may want to weep with joy. But now is also the time of refactoring and testing to make your code prettier and more sustainable! Here is what you can do:
- Make your code modular. You may want to separate the value assignment from the formulas in different functions: it makes your functions shorter and testable.
- Refactor. You don’t want all your functions to have the same four parameters all along the code. Wrap these four parameters into a unique variable that you can pass to all the functions.
- Test. If you made your code modular by creating functions specific to the formulas, you can easily make unit tests on them to make sure that the returned values are correct. A tested code is a rock-solid code.
Conclusion
Some points to finish with:
- This article does not claim to be an absolute guide to how to run calculations in a project, but it shows you global patterns and a way to reach them. Do not hesitate to adapt them to your project!
- Cut your work into smaller pieces! Your first goal should be to display an empty results table with the index filled - it allows you to draw and code the general pattern -, then you can proceed column by column. The optimization steps can be included in future iterations.
- If the spreadsheet is way too complex or if you have any doubt, don’t hesitate to contact your client or the spreadsheet’s author. A 30-minute long workshop on the spreadsheet will help you develop your calculation feature faster.