Before there was “data science”, as most of us will recall, there was simply “data”. It was straightforward.
Since then, the blossoming discipline of data science has emerged. Here are ten phrases you are likely to hear in this hot new field.
1. Do we have an alpha?
Finding an “alpha” mostly refers to finding a formula for anticipating the ups and downs of some stocks, bonds or currencies. An alpha is built from sets of data and selected algorithms.
The purpose of identifying this formula is to buy or sell at the optimal time. Such advantages are usually temporary, because financial markets tend to adjust to them.
The same search for an edge applies to many other fields, such as spotting the best customers, the best executives to contact, or the best time to contact them (for example, following a recent promotion) for sales or other purposes.
2. Should we cluster?
When observing thousands of data points, grouping them by similarity into a manageable number of sets, usually between ten and fifty, is called “clustering”.
Interpreting and leveraging the data then becomes much easier. Algorithms are instrumental here: they optimize similarity by minimizing the distances between the data points within each group.
Understanding the similarities within a population of 75,000 companies is a Herculean task. Using an algorithm to unveil the 10 to 12 most typical corporate organizations can yield many insights.
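The article does not name a specific clustering algorithm. k-means is one common choice that works as described: it alternately assigns each point to its nearest cluster centre and moves each centre to the mean of its members, which reduces the distances inside each cluster. The sketch below is a minimal plain-Python version under stated assumptions. The function name `kmeans`, the naive initialisation (the first k points become the starting centres) and the toy 2-D points are all illustrative. Real company data would have many more features per company and a k closer to 10 or 12.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Minimal k-means (Lloyd's algorithm): returns (labels, centroids).

    Hypothetical helper for illustration; it naively initialises the
    centroids with the first k points instead of a smarter scheme.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Toy data (invented): two visibly separate groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster index for each point
print(centroids)  # centre of each cluster
```

Production libraries add better initialisation and several restarts, because plain k-means can settle on a poor grouping depending on where it starts.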
3. How fast will the algorithm learn?
Algorithms are trained on an initial set of data. As more data and more feedback are collected, the algorithm is retrained periodically to provide better and better results.
Some algorithms become excellent after a few loops; others keep improving over thousands of loops as they solve more and more cases.
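As a rough illustration of what a "loop" can mean, the sketch below fits a straight line to a small toy data set with gradient descent and measures the error after more and more passes. The data, the learning rate of 0.01 and the helper names `train_linear` and `mean_squared_error` are assumptions chosen for the example. Real retraining on fresh data and feedback is considerably more involved.

```python
def train_linear(xs, ys, loops, learning_rate=0.01):
    """Fit y ≈ w*x + b with `loops` steps of batch gradient descent (hypothetical helper)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(loops):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Take a small step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(w, b, xs, ys):
    """Average squared gap between predictions and the true values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]  # invented toy data generated from y = 2x + 1

errors = {}
for loops in (1, 10, 100, 1000):
    w, b = train_linear(xs, ys, loops)
    errors[loops] = mean_squared_error(w, b, xs, ys)
    print(f"{loops:>5} loops: error = {errors[loops]:.6f}")
```

With a learning rate this small, every extra loop lowers the error a little; a rate that is too large can make the error grow instead.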
4. How many layers do we have?
Neural networks usually have fewer than ten layers. With only three layers, one can often mimic very complex sets of rules that would otherwise be impossible to formulate.
Some neural networks now have more than 100 layers, which shows how computationally demanding these networks have become.
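To make the idea of layers concrete, the sketch below builds a tiny network with an input layer, one hidden layer and an output layer (counting conventions differ; some authors would call this a two-layer network). It computes "exclusive or" (XOR), a rule that a single threshold unit without a hidden layer cannot express. The weights are hand-picked assumptions for illustration, not learned by training, and the helper names are hypothetical.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its weighted input is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then the activation."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (not trained) weights, for illustration only.
HIDDEN_W = [[1, 1], [1, 1]]
HIDDEN_B = [-0.5, -1.5]  # hidden neuron 1 acts as OR, hidden neuron 2 as AND
OUTPUT_W = [[1, -1]]
OUTPUT_B = [-0.5]        # output fires for "OR and not AND", which is XOR


def network(x1, x2):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, network(x1, x2))
```

In practice the weights are found by training rather than by hand, and smooth activations replace the hard threshold so that gradients can flow.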
5. Have you found some good libraries?
Many advanced algorithms are openly available. Many sets of structured maps and dictionaries, for instance, are also available. Knowing these libraries is a fantastic way to benefit from debugged, tested and verified code and data sets.
Data science has largely become a matter of assembling and gluing together the right blocks. Selecting the best blocks is an art, and a key step toward success.
6. Do you speak Python?
The most common programming languages include Python, R, Lisp, Prolog and Java. Each of them has its “raison d’être”.
Yet Python has become the most widely used of them in recent years, and its ongoing development will most likely make it even easier to use over time.
7. What are the options to vectorize?
Algorithms are great at handling big vectors with hundreds of figures and large matrices. As a result, a large practical problem is usually recast mathematically as a set of “vectors”.
There are many ways to transform a picture, a text, traffic or a directory into a set of numbers that can subsequently be handled and transformed computationally.
Finding one that works is sometimes a daunting task, but it is a prerequisite for success.
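One of the simplest ways to turn text into numbers is a "bag of words": list every word that appears in a collection of documents, then describe each document by how often it uses each word. The sketch below is an assumption-laden illustration, not the article's method. The helper names `tokenize`, `build_vocabulary` and `vectorize`, the crude regular-expression tokeniser and the sample sentences are all hypothetical. Richer options such as TF-IDF weighting or learned embeddings exist.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude tokeniser: lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct word to a fixed position, in alphabetical order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]  # dicts keep insertion order


documents = ["The red tomato", "The green bean", "Red bean, red tomato"]
vocabulary = build_vocabulary(documents)
print(list(vocabulary))
for doc in documents:
    print(vectorize(doc, vocabulary), doc)
```

Once every document is a vector of the same length, standard numerical tools (distances, clustering, neural networks) can be applied to them directly.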
8. Can the algorithm be interpreted?
Algorithms are sometimes so complex that they look like a magic black box.
We love them because they work well, but partly also because we have not yet encountered the cases in which they fail.
As a decision-maker, you may want to understand the principles on which an algorithm works, to confirm that they make sense and are safe. Encouragingly, interpretability is a growing trend in data science.
9. What is the expected false positive rate?
When deciding whether to hire an applicant, the decision is a simple yes or no.
But the challenge is to understand the “true positive” (you hire and were right to hire), the “false positive” (you hire but were wrong to hire), the “true negative” (you do not hire and were right not to), and the “false negative” (you do not hire and miss a great applicant).
Algorithms can be fine-tuned to favor the compromise between these outcomes that best suits the business decision at hand.
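A common way to tune that compromise is to have a model score each applicant and then choose the score threshold above which you say "hire". The sketch below counts the four outcomes for a given threshold and computes the false positive rate (false positives divided by all applicants who were in fact not suitable). Everything in it is an assumption for illustration. The scores and outcomes are invented, not real hiring data, and the helper names are hypothetical.

```python
def confusion_counts(scores, outcomes, threshold):
    """Count (TP, FP, TN, FN) when everyone scoring >= threshold is hired."""
    tp = fp = tn = fn = 0
    for score, good_hire in zip(scores, outcomes):
        hired = score >= threshold
        if hired and good_hire:
            tp += 1  # hired, and right to hire
        elif hired:
            fp += 1  # hired, but wrong to hire
        elif good_hire:
            fn += 1  # not hired, missed a great applicant
        else:
            tn += 1  # not hired, and right not to
    return tp, fp, tn, fn


def false_positive_rate(fp, tn):
    """Share of unsuitable applicants who were nevertheless hired."""
    return fp / (fp + tn) if fp + tn else 0.0


def true_positive_rate(tp, fn):
    """Share of suitable applicants who were actually hired."""
    return tp / (tp + fn) if tp + fn else 0.0


# Invented model scores and (in hindsight) whether each person was a good hire.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
outcomes = [True, True, False, True, False, True, False, False]

for threshold in (0.5, 0.85):
    tp, fp, tn, fn = confusion_counts(scores, outcomes, threshold)
    print(f"threshold {threshold}: TP={tp} FP={fp} TN={tn} FN={fn}, "
          f"false positive rate={false_positive_rate(fp, tn):.2f}, "
          f"true positive rate={true_positive_rate(tp, fn):.2f}")
```

In this invented data, raising the threshold from 0.5 to 0.85 removes both false positives but increases the missed good applicants (false negatives) from 1 to 2. Choosing the threshold is exactly the business compromise described above.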
10. Do we generate weird results?
If your algorithm is designed to optimize water, heat, light and fertilizer so that tomatoes grow until they are a gorgeous red, that is a great algorithm for growing red tomatoes.
But it is unlikely to help if you want your green beans to turn red.
Algorithms excel at dedicated tasks. But human judgment ultimately is irreplaceable.