So, you want to build some fancy ML software. Or maybe you are planning the next unicorn, built on a foundation of ML.
Before you begin, familiarize yourself with the core realities of building and utilizing ML.
Machine learning software is extremely expensive to build.
All machine learning systems produce error. Often, it’s more error than you would expect.
A single ML model must be built and rebuilt dozens of times during initial development, as various options are tuned to produce the right output.
Results may change at any time when external data or external dependencies (such as cloud services or ML APIs) change. Each such change may require a new model, new feature extractors, and a newly tuned system.
Required resources are specialized and expensive. ML components must be maintained by ML software engineers with years or decades of experience, at a cost ~50% higher than non-ML software engineers.
There is a better, faster and more efficient way to build ML products and systems that navigates the realities above.
These are the four central tenets of LeanML, an extension of the Lean Startup methodology. If you’re not familiar with Lean Startup, we’d recommend starting there first.
All Lean Startups know that you figure out the user need first, then select the technology required. But with ML, it goes a step further: the human team must be able to conduct the entire ML task by hand before the ML is built. If you, the human, can’t produce the output you’re asking an ML system for, you’re not going to be successful.
There’s error in any ML system. In fact, there’s often more error than you would expect. Your choice of ML solution will be driven by the balance between different types of error produced by different technologies, and your user’s tolerance for error given the product use case. How much error will the user tolerate? What type of errors will the user tolerate?
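To make the trade-off between error types concrete, here is a minimal sketch. The scores, labels, and cost numbers are all invented for illustration, and `count_errors` and `cheapest_threshold` are hypothetical helper names rather than part of any library. The sketch counts false positives and false negatives at a few candidate cut-offs, then picks the cut-off that is cheapest given how much each kind of mistake hurts your user.

```python
# Hypothetical (model score, true label) pairs, invented for illustration.
SCORED = [
    (0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1),
    (0.60, 0), (0.40, 1), (0.30, 0), (0.10, 0),
]


def count_errors(scored, threshold):
    """Return (false_positives, false_negatives) when predicting 1 for score >= threshold."""
    false_positives = sum(1 for s, y in scored if s >= threshold and y == 0)
    false_negatives = sum(1 for s, y in scored if s < threshold and y == 1)
    return false_positives, false_negatives


def cheapest_threshold(scored, fp_cost, fn_cost, candidates):
    """Pick the candidate threshold with the lowest total error cost."""
    def total_cost(threshold):
        fp, fn = count_errors(scored, threshold)
        return fp * fp_cost + fn * fn_cost

    return min(candidates, key=total_cost)


candidates = [0.2, 0.5, 0.85]
# A user who hates missed cases (false negatives) pushes the threshold down...
lenient = cheapest_threshold(SCORED, fp_cost=1, fn_cost=10, candidates=candidates)
# ...while a user who hates false alarms (false positives) pushes it up.
strict = cheapest_threshold(SCORED, fp_cost=10, fn_cost=1, candidates=candidates)
```

The same model gives a different "best" setting depending on which error the user tolerates, which is why the tolerance question has to be answered before the technology is chosen.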
Let us repeat: all ML systems have error! But that’s okay, because you can plan for it. Once you’ve determined your user’s tolerance for error, design your process or product to handle that error. For example, do you need two different ML systems, one to generate and another to rank? Do you need a human verification step in the process before showing results?
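One way to picture a human verification step is as a simple routing rule. This is a sketch under assumptions, not a prescribed design: `route` is a hypothetical function, the confidence value is assumed to come from whatever model you use, and the 0.9 cut-off is an arbitrary placeholder you would set from your user’s tolerance for error.

```python
def route(prediction, confidence, auto_threshold=0.9):
    """Show confident predictions directly; queue the rest for a person to check.

    auto_threshold is an assumed, illustrative cut-off, not a recommended value.
    """
    if confidence >= auto_threshold:
        return ("show", prediction)
    return ("human_review", prediction)


# Hypothetical model outputs: (prediction, confidence).
outputs = [("invoice", 0.97), ("receipt", 0.62), ("invoice", 0.90)]
decisions = [route(label, conf) for label, conf in outputs]
```

Raising the threshold sends more work to people and lets fewer errors reach users. Lowering it does the opposite.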
Worldwide communities of thousands of computer scientists spend their entire careers developing machine learning algorithms, studying how they behave on different types of data, and producing open-source libraries for fast and accurate implementation. In addition, commercial platforms wrap these libraries in easy-to-use interfaces, complete with a few models to match your data. Your ML task isn’t unique: there is an entire body of previous research that will guide you in how you should and shouldn’t build your solution, and save you enormous amounts of time and resources.
There’s a wealth of resources available to help make your ML product or company a success. Geared towards the non-technical audience, they’ll help you think about your product in a brand new light.