I rewrote the pruning algorithm yesterday to use a method that’s slightly less accurate, but much quicker. After an exhaustive overnight test on a data set of 356 sales/fails from my main business, the new version gave a correct judgement 72.66% of the time.
Let’s say you get 200 leads a week, but only have time to follow up on 100 of them before the next week starts and the next 200 come in.
Let’s say your usual sale rate is 50%.
That means that every week, if you follow the leads one by one, you will make 50 sales.
But what if you could order them so that the more likely sales were first?
Because the SalePredict engine is right 73% of the time, the 100 leads it ranks highest should contain roughly 73 genuine sales – so you’d average 73 sales a week instead of 50.
That’s a 46% increase in sales.
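To sanity-check that arithmetic, here’s a rough simulation – my sketch, not the actual SalePredict code. The 200 leads, 50% sale rate and 73% accuracy are the figures above; the scoring model is assumed.

```python
import random

random.seed(1)
WEEKS = 1000
total = 0
for _ in range(WEEKS):
    # 200 leads, 100 of which would really convert (50% sale rate)
    leads = [1] * 100 + [0] * 100
    random.shuffle(leads)
    # a 73%-accurate predictor: right 73% of the time, with a random
    # tiebreaker so ranking within a class is arbitrary
    scored = [(y if random.random() < 0.73 else 1 - y, random.random(), y)
              for y in leads]
    scored.sort(reverse=True)          # follow the 100 highest-scored leads
    total += sum(y for _, _, y in scored[:100])
print(total / WEEKS)   # ≈ 73 sales a week, vs 50 without ranking
```

In other words, the gain comes purely from ordering: the same 100 follow-ups, just pointed at the leads the engine likes most.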
Two days ago, I wrote a pruning algorithm that was able to take a network with hundreds of inputs, and gradually prune it down to just the inputs that made the biggest difference in success.
However, the algorithm took about 6 hours to run against 350 inputs.
This is because for every prune, it would train the network using a full set of inputs minus the one being tested for. That’s 350 neural net trainings for the first round, 349 for the next, 348 for the next. It was a lot!
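In outline, that brute-force loop looks like this – a sketch only, since `train()` and `score()` stand in for the real neural-net training and evaluation, which aren’t shown in this post:

```python
def prune_brute_force(inputs, train, score, target_size):
    """Drop inputs one at a time, each round removing the one whose
    absence hurts accuracy least. train() and score() are placeholders
    for the real training and evaluation; every candidate removal
    costs one full training run, which is why this is so slow."""
    inputs = list(inputs)
    trainings = 0
    while len(inputs) > target_size:
        best_acc, most_useless = float("-inf"), None
        for candidate in inputs:
            rest = [i for i in inputs if i != candidate]
            acc = score(train(rest))   # one full training per candidate
            trainings += 1
            if acc > best_acc:
                best_acc, most_useless = acc, candidate
        inputs.remove(most_useless)    # dropping it hurt the least
    return inputs, trainings

# toy stand-ins: a "model" is just its feature list, and score() is
# how much signal the kept features carry (numbers made up for the demo)
signal = {"visits": 5, "font": 0, "pages": 3, "colour": 1}
kept, runs = prune_brute_force(signal, train=lambda s: s,
                               score=lambda m: sum(signal[f] for f in m),
                               target_size=2)
print(kept, runs)   # the two strongest features survive; 4 + 3 = 7 trainings
```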
I realised I could short-cut this by making an assumption – that the most useless input of any particular network was the one that had the lowest standard deviation in its connection weights to other neurons.
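If the input-to-hidden weights are held as a 2-D array of shape (inputs × hidden neurons) – my assumption about the layout – the heuristic itself is just:

```python
import numpy as np

def least_useful_input(weights):
    """Index of the input whose outgoing connection weights have the
    lowest standard deviation - the heuristic's guess at the most
    useless input (assumed weight layout: rows = inputs)."""
    return int(np.argmin(np.std(weights, axis=1)))

# toy example: input 1's weights barely vary, so it gets pruned first
w = np.array([[ 0.9, -1.2,  0.4],
              [ 0.1,  0.1,  0.1],
              [-0.7,  0.8, -0.5]])
print(least_useful_input(w))   # 1
```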
This meant that instead of (350+1)*(350/2) - (18+1)*(18/2) = 61254 network trainings, I just needed to do 350-18 = 332. That’s about 184 times quicker. And obviously even quicker the more inputs that need to be pruned.
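For the record, the counting can be checked in a couple of lines:

```python
n_inputs, keep = 350, 18
# brute force: one training per remaining input, per pruning round
brute_force = sum(range(keep + 1, n_inputs + 1))
# heuristic: one training per round, full stop
heuristic = n_inputs - keep
print(brute_force, heuristic, brute_force / heuristic)   # 61254 332 184.5
```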
Tonight, I’ll run the test again – I think I can get it even more accurate.
At the moment, the test assumes that if a website has a calculated value of more than 50%, it is a sale, and if lower than 50%, it is a fail. That’s not right, though – if it’s around 50%, then we should really say that there is simply not enough information available to state either way.
So I’ll run the test again, but this time record the calculated value of each website in a spreadsheet, so I can find out which “band” of values produces the best accuracy in the test.
I mean: of the websites the system says are >90% likely to be a sale, exactly how accurate is it, compared to the websites it puts at 80–90%?
I don’t know, and I want to know.
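In code, that banding analysis might look something like this – a rough sketch, with the data format assumed (value/outcome pairs), since the spreadsheet doesn’t exist yet:

```python
from collections import defaultdict

def accuracy_by_band(rows, band_width=0.1):
    """rows: (calculated_value, actual_outcome) pairs, outcome 1=sale, 0=fail.
    Returns {band_start: (accuracy, count)}, where the prediction in every
    band is 'sale' if the calculated value is above 0.5, else 'fail'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for value, outcome in rows:
        # clamp so value == 1.0 lands in the top band
        band = min(int(value / band_width), int(1 / band_width) - 1) * band_width
        predicted = 1 if value > 0.5 else 0
        hits[band] += (predicted == outcome)
        totals[band] += 1
    return {b: (hits[b] / totals[b], totals[b]) for b in sorted(totals)}

rows = [(0.95, 1), (0.92, 1), (0.91, 0),   # >90% band: 2 of 3 right
        (0.85, 1), (0.82, 0),              # 80-90% band: 1 of 2 right
        (0.48, 0), (0.52, 1)]              # the murky middle
for band, (acc, n) in accuracy_by_band(rows).items():
    print(f"{band:.1f}-{band + 0.1:.1f}: {acc:.0%} over {n} sites")
```

Run over the real recorded values, a table like that should show directly whether the >90% band earns its confidence, and how wide the “not enough information” band around 50% really needs to be.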