IBM: The 5 Attributes of Useful AI

Source: 数据观 | Published: 2018-04-26 17:29:21 | Author: Dinesh Nirmal (vice president of analytics development at IBM)

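A few weeks ago, a dejected CTO told me that his team had needed three weeks to build a machine learning model. I told him that a model in just three weeks sounded quite good, and he agreed. So why the long face? Because 11 months later, the model was still sitting on a shelf.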

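As AI and machine learning come into contact with the real world, the gap between promising AI prototypes and AI in production is becoming a common theme. The reason is... well, there are many reasons, and we could pick out several to examine, but beneath all of them lies one fundamental fact: data does not stand still and never will.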

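The world keeps changing, and data changes with it. Building an AI or machine learning model means building a way of looking at the world. As the world and the data change, the model has to adapt. The CTO I met was starting to realize that building an excellent model is only the first step.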

Image: IBM's Watson headquarters in Manhattan

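A model on its own is too brittle for the real world. It needs to live inside a larger system, and that system has to be fluid. So how do we make AI systems fluid? By building them with five attributes in mind: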

1. Managed

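For AI and machine learning to do real and lasting work, they need thoughtful, durable, and transparent infrastructure. That starts with identifying the data pipelines and correcting problems with bad or missing data. It also means integrated data governance and version control for models. The version of each model (and you may be running thousands of them at once) indicates its inputs. You will want to know, and so will regulators.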
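One way to picture a model version that "indicates its inputs" is a small registry that stores a content fingerprint of every dataset used for each version. The sketch below is only an illustration under stated assumptions: `ModelRegistry`, `ModelVersion`, and `fingerprint` are hypothetical names invented here, not part of any IBM product or of the original article, and a real system would persist these records rather than keep them in memory.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(rows):
    """Return a stable SHA-256 hash of a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record tying one model version to its input data."""
    name: str
    version: int
    inputs: dict  # dataset name -> content fingerprint


class ModelRegistry:
    """Hypothetical in-memory registry; illustration only."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion

    def register(self, name, datasets):
        """Record a new version of `name` trained on `datasets`."""
        version = len(self._versions.get(name, [])) + 1
        inputs = {key: fingerprint(rows) for key, rows in datasets.items()}
        record = ModelVersion(name, version, inputs)
        self._versions.setdefault(name, []).append(record)
        return record

    def inputs_for(self, name, version):
        """Answer the auditor's question: what data fed this version?"""
        return self._versions[name][version - 1].inputs
```

With a record like this, two versions trained on different data carry different fingerprints, so both the team and a regulator can tell which inputs produced which model.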

2. Resilient

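Being fluid means accepting from the outset that AI models fall out of sync. This "drift" can happen quickly or slowly, depending on what is changing in the real world. Do the data science equivalent of regression testing, and do it often, but without letting it eat up your time.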

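That takes a system that lets you set accuracy thresholds and automatic alerts, so you know when a model needs attention. Will you need to retrain the model on old data, acquire new data, or re-engineer your features from scratch? The answer depends on the data and the model, but the first step is knowing that there is a problem.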
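The accuracy threshold and automatic alert described here can be sketched in a few lines. This is a minimal illustration, not the author's system: `check_drift` and the `alert` callback are hypothetical names, and plain accuracy stands in for whatever metric a real team would monitor.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def check_drift(predictions, labels, threshold, alert):
    """Call `alert` with a message when accuracy falls below `threshold`.

    `alert` is any callable taking a string (assumption: in practice it
    might send an email or page someone; here it is left to the caller).
    Returns True when the model needs attention.
    """
    score = accuracy(predictions, labels)
    drifted = score < threshold
    if drifted:
        alert(f"accuracy {score:.2f} fell below threshold {threshold:.2f}")
    return drifted
```

Running this on each fresh batch of labeled data is the "regression test" in miniature: it says only that there is a problem, and leaves the choice between retraining, new data, or new features to the team.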

3. Performant

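Most AI is computationally intensive, both during training and after deployment. Most models need to score transactions in milliseconds, not minutes, in order to prevent fraud or seize a fleeting opportunity. Ideally, you train models on GPUs and then deploy them on high-performance CPUs with enough memory for real-time scoring.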

And of course, wherever you deploy (on-prem, cloud, or multicloud), you want everything to run fast and without errors.
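To make "milliseconds, not minutes" concrete, a team might time each scoring call and compare a high percentile of the latencies against a budget. The sketch below assumes hypothetical helper names (`timed_score`, `within_budget`) and a simple nearest-rank percentile; the budget and quantile values are placeholders, not figures from the article.

```python
import time


def timed_score(score_fn, transaction):
    """Score one transaction and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = score_fn(transaction)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def within_budget(latencies_ms, budget_ms, quantile=0.99):
    """Check that the given quantile of latencies stays within the budget.

    Uses a simple nearest-rank style index into the sorted latencies.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency measurement")
    ordered = sorted(latencies_ms)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index] <= budget_ms
```

Checking a percentile rather than the average is a deliberate choice in this sketch: a fraud check that is usually fast but occasionally slow still fails the transactions that hit the slow path.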

4. Measurable

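For now, budgets for AI and machine learning projects are generous, but they will dry up if data science teams cannot deliver concrete results. Think from the outset about how you will quantify and visualize what you are learning and how it changes: improvements in data access and data volume, improvements in model accuracy, and ultimately improvements to the bottom line.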

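Do not think only about what you need to measure now; think also about what you will want to measure later, as your data science work matures. Is the system fluid enough to track those long-term goals?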

5. Continuous

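I began by pointing out that data does not stand still. The fifth and final aspect of fluid AI is continuous learning as the world changes. Be sure to use tools such as Jupyter and Zeppelin notebooks, which can plug into processes for scheduling evaluations and retraining models.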
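A scheduled evaluate-then-retrain step, of the kind a notebook job might run, could look roughly like the following. Everything here is an assumption made for illustration: `evaluation_cycle` is a hypothetical name, the "model" is a toy majority-class predictor, and the scheduling itself (cron, a notebook scheduler, or similar) is left out.

```python
from collections import Counter


def retrain(train_batch):
    """Toy 'training': the model is simply the most common label."""
    labels = [label for _, label in train_batch]
    return Counter(labels).most_common(1)[0][0]


def evaluate(model, eval_batch):
    """Accuracy of the constant-prediction toy model on held-out data."""
    return sum(label == model for _, label in eval_batch) / len(eval_batch)


def evaluation_cycle(model, train_batch, eval_batch, threshold):
    """One scheduled cycle: keep the model if it is still good enough,
    otherwise retrain on fresh data and re-evaluate.

    Returns (model, score, retrained).
    """
    score = evaluate(model, eval_batch)
    if score >= threshold:
        return model, score, False
    new_model = retrain(train_batch)
    return new_model, evaluate(new_model, eval_batch), True
```

For example, a model that always predicts 0 is replaced once the held-out data is mostly 1s:

```python
train = [("a", 1), ("b", 1), ("c", 0)]
held_out = [("d", 1), ("e", 1), ("f", 0)]
model, score, retrained = evaluation_cycle(0, train, held_out, threshold=0.6)
```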

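At the same time, expect your own learning to grow and evolve as you absorb the strengths and limitations of various algorithms, languages, datasets, and tools. Fluid AI demands continuous improvement of data, tools, and systems, and also continuous improvement from everyone doing the work.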

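Data science is a journey. A cliché, but true. Pay attention to these five attributes and you will bring focus to each moment and push yourself toward a clearer view of the future.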

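Note: This article was compiled by 数据观 from the VentureBeat website. Author: Dinesh Nirmal. Translator: 黄玉叶. Images are from the original article. When reposting, be sure to credit the source, origin, and author.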

The original English article follows:

IBM outlines the 5 attributes of useful AI

A few weeks ago, a dejected CTO told me it took his team three weeks to build a machine learning model. I told him a model in just three weeks sounded great, and he agreed. So why the long face? Because 11 months later, the model was still sitting on a shelf.

That gap between great AI prototypes and AI in operation is starting to be a common theme as AI and machine learning make contact with the real world. The reason is … Actually, there are a lot of reasons, and we can look at a bunch of them, but underneath all the other reasons is the fact that data doesn’t sit still and never will.

Data changes as the world changes. Building an AI or machine learning model means building a way of looking at the world. But as the world and the data change, the models need to adapt. The CTO I met was realizing that building a great model is only the first step.

A model on its own is too brittle for the real world. It needs to live as a larger system that’s actually fluid. So how do we make AI systems that are fluid? By building them with five attributes in mind:

1. Managed

For AI and machine learning to do real and lasting work, they need thoughtful, durable, and transparent infrastructure. That starts with identifying the data pipelines and correcting issues with bad or missing data. It also means integrated data governance and version control for models. The version of each model — and you might use thousands of them concurrently — indicates its inputs. You’ll want to know, and so will regulators.

2. Resilient

Being fluid means accepting from the outset that AI models fall out of sync. That “drift” can happen quickly or slowly depending on what’s changing in the real world. Do the data science equivalent of regression testing, and do the testing frequently, but without burning up your time.

That takes a system that allows you to set accuracy thresholds and automatic alerts to let you know when models need attention. Will you need to retrain the model on old data, acquire new data, or re-engineer your features from scratch? The answer depends on the data and the model, but the first step is knowing there’s a problem.

3. Performant

Most AI is computationally intense — both during training and after deployment. And most models need to score transactions in milliseconds, not minutes, to prevent fraud or leverage some fleeting opportunity. Ideally, you can train models on GPUs and then deploy them on high-performance CPUs, along with enough memory for real-time scoring.

And of course you want everything to run fast and error-free regardless of where you deploy: on-prem, cloud, or multicloud.

4. Measurable

For the moment, budgets for AI and machine learning projects are generous, but those budgets will dry up if data science teams can’t deliver concrete results. Think from the outset about how you’ll quantify and visualize what you’re learning and how it changes: improvements in data access and data volume, improvements in model accuracy, and ultimately improvements to the bottom line.

Don’t just think about what you need to measure now but also about what you’ll want to measure in the future as your data science work matures. Is the system fluid enough to track those long-term goals?

5. Continuous

I started by pointing out that data doesn’t sit still. The fifth and final aspect of fluid AI is about continuous learning as the world changes. Make sure to use tools like Jupyter and Zeppelin notebooks that can plug into processes for scheduling evaluations and retraining models.

At the same time, expect your own learning to grow and evolve as you absorb the advantages and limitations of various algorithms, languages, datasets, and tools. Fluid AI demands continuous improvement for data, tools, and systems, but also continuous improvement from everybody doing the work.

Data science is a journey. Cheesy, but true. Pay attention to these five attributes and you’ll bring focus to each moment and force yourself to find clarity about the future.