Getting AI Data Ready
In AI projects, terms such as data readiness and data quality are used often, particularly in the marketing material of AI tool vendors. Clearly, AI and ML technologies are completely dependent on data, but what does “good quality” data mean and how should an organisation verify its data readiness? Cyber Smart Consulting’s Lead Data and AIS Consultant, Shirley O’Sullivan, explains:
Company executives may have been persuaded of the benefits of AI, believing it can infer patterns and predict outcomes beyond the potential of human analysis. That is a good thing. High level commitment to AI means the project should receive the budget and attention it deserves.
However, this optimism will rapidly come unstuck if AI implementation is seen simply as a technology acquisition, as though buying the right tools will somehow guarantee an easy route to bringing AI into the organisation. A successful AI project demands that the business is AI data ready; this is the single most critical aspect of the development process, and a lack of data readiness is the most commonly cited reason for AI project failure. Here we explain what it means to be AI data ready and show how our AI Data Readiness Service can help you achieve it.
Why Is AI Data Readiness Misunderstood?
Artificial Intelligence (AI) and Machine Learning (ML) can be a compelling pitch. Early adopters and vendors can demonstrate models with outstanding abilities to learn from data, predict outcomes and recommend actions. This is not sleight of hand, or smoke and mirrors. The technology is genuinely capable of extraordinary analysis involving pattern finding and learning. In demonstrations, however, what sits behind the curtain, metaphorically, is a standardised, clean data library. Refined data of this quality is rarely available at the start of an AI implementation.
In the late 1980s, businesses began to embrace a technology known as Data Warehousing. Although databases such as Oracle and SQL Server were highly efficient at processing transactions, their data structures were not suitable for reporting and analysis. Users of these databases were calling for customisable reports and analytical tools that could “slice and dice” multi-dimensional data to gain insights.
This landscape is still in place for many businesses:
- A data warehouse, probably in a proprietary format, structured for business intelligence reports and drilling down into financial results.
- A repository of database tables, often in departmental silos and optimised for transactions.
- Engineering information in impenetrable, supplier-dependent formats.
- Finally, inevitably, Excel spreadsheets.
Some organisations have made progress in collecting this data into a single common location, known as a “data lake” – often in the Cloud. This helps to impart an understanding of the quantity of data, where it originates and how often the authoring applications make updates. Where the process stops, however, is in cleaning and standardising.
Becoming AI data ready means implementing a refining process which can clean and transform the relevant parts of the data lake into a standard format that AI and ML can understand and learn from. The process also must carry out this transformation as an ongoing activity, predictably and repeatedly.
If that isn’t enough of a challenge, there are two more curve balls. Firstly, the process has to handle change management, as new authoring apps are added to the mix, extending the shape and size of the data. Secondly, analysis often turns up two (or more) applications that generate very similar data which then has to be reconciled.
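As a rough illustration of the refining and reconciliation steps described above, the sketch below maps rows from two authoring applications onto one common schema and keeps the most recent record for each customer. The source names (`crm`, `billing`), field names, sample rows and the "latest update wins" rule are all assumptions made for the example; a real process would take its rules from the organisation's own data.

```python
from datetime import datetime

# Hypothetical raw rows from two authoring applications that describe
# the same customers using different field names and date formats.
CRM_ROWS = [
    {"CustomerID": " 001 ", "Name": "Acme Ltd", "Updated": "2023-01-15"},
    {"CustomerID": "002", "Name": "  Borealis plc", "Updated": "2023-03-02"},
]
BILLING_ROWS = [
    {"cust_id": "1", "cust_name": "ACME LTD", "last_change": "20/02/2023"},
]


def standardise(row, id_key, name_key, date_key, date_format, source):
    """Map one source row onto the common schema (assumed for this sketch)."""
    return {
        "customer_id": int(row[id_key].strip()),
        # Collapse stray whitespace and normalise capitalisation.
        "name": " ".join(row[name_key].split()).title(),
        "updated": datetime.strptime(row[date_key], date_format).date(),
        "source": source,
    }


def reconcile(records):
    """Keep the most recently updated record for each customer_id."""
    latest = {}
    for record in records:
        current = latest.get(record["customer_id"])
        if current is None or record["updated"] > current["updated"]:
            latest[record["customer_id"]] = record
    return sorted(latest.values(), key=lambda r: r["customer_id"])


def refine():
    """Clean both sources, then reconcile overlapping records."""
    records = [
        standardise(r, "CustomerID", "Name", "Updated", "%Y-%m-%d", "crm")
        for r in CRM_ROWS
    ] + [
        standardise(r, "cust_id", "cust_name", "last_change", "%d/%m/%Y", "billing")
        for r in BILLING_ROWS
    ]
    return reconcile(records)
```

Because `refine()` is a plain function with no manual steps, it can be re-run whenever the sources change, which is the "predictably and repeatedly" property the process needs. Adding a new authoring application would mean adding one more `standardise` mapping.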
What’s The Difference Between Training Data, Validation Data, & Testing Data?
In supervised learning, training an ML model requires two artefacts: an ML algorithm, or “learning algorithm”, and a set of training data with labels to indicate the desired answer. The algorithm learns to predict the desired answer, or “target attribute”, by finding patterns in the training data that map the other attributes to the target. The result of this process is an ML model that contains the patterns found.
Validation data is used subsequently to check whether the model correctly handles data it has not seen before or is “overfitting”, meaning that it has fitted the training data so closely that it performs poorly on new data. Finally, testing data, held back until the end, is used to verify that the final model is accurate against the target. The ML model is then applied to live data, can continue to learn from it, and serves as a foundation for the AI products.
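The three datasets are usually carved out of a single labelled dataset. The following is a minimal sketch of one way to do that, assuming a simple random split; the function name `split_dataset`, the 70/15/15 proportions and the fixed seed are illustrative choices, not a recommendation for any particular project.

```python
import random


def split_dataset(examples, train_frac=0.7, validation_frac=0.15, seed=42):
    """Shuffle labelled examples and split them into three disjoint sets."""
    if not 0 < train_frac + validation_frac < 1:
        raise ValueError("train_frac + validation_frac must be between 0 and 1")
    shuffled = list(examples)
    # A fixed seed makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    train_end = round(len(shuffled) * train_frac)
    validation_end = train_end + round(len(shuffled) * validation_frac)
    return (
        shuffled[:train_end],                # used to fit the model
        shuffled[train_end:validation_end],  # used to check for overfitting
        shuffled[validation_end:],           # held back for the final check
    )


# Hypothetical labelled examples in the form (feature value, label).
examples = [(i, i % 2) for i in range(100)]
train, validation, test = split_dataset(examples)
```

Keeping the three sets disjoint is the point of the exercise: a model evaluated on examples it was trained on will look more accurate than it really is.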
How Can A Business Become AI Data Ready?
Given the importance of data to the AI project and the range of challenges that entails, it is unsurprising that businesses struggle to gain a foothold in formulating a data strategy. In any project where the team is facing uncharted territory, a methodology is essential, together with specialist tools to carry out unfamiliar tasks.
Our AI Data Readiness Assessment service, together with our industry-leading proprietary tools, will guide the organisation through a series of essential stages in acquiring and processing data, ready to start the development of algorithms and prototypes. The project team will establish rules for how data flows through the business, understanding where it is sourced, how it is processed en route, how it is managed and controlled and which areas of the business utilise it. The team works from a set of business functions, or use cases, analysing each one to shape the required dataset, resulting in data that can be used to train models reliably.
Completing the AI Data Readiness process means designing a data structure and balanced dataset that will:
- support the access methods for systems that utilise the data.
- meet defined objectives for compliance, resilience and speed of processing.
- produce accurately labelled training data.
- deliver highly accessible data.
- include processes for data management and safeguards, for example, security of sensitive or personal data.
- store data in the optimal, most accessible structure to promote re-use.
- address requirements around ethics, privacy, data quality and ownership.
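One way to read "balanced" in the list above is that no label dominates the training data. Under that assumption, a simple check might look like the sketch below; the function names and the 10% tolerance are hypothetical and would need to be set for each use case.

```python
from collections import Counter


def label_balance(labels):
    """Return each label's share of the dataset as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {label: count / total for label, count in counts.items()}


def is_balanced(labels, tolerance=0.1):
    """True when every label's share is within `tolerance` of an even split."""
    shares = label_balance(labels)
    even_share = 1 / len(shares)
    return all(abs(share - even_share) <= tolerance for share in shares.values())
```

For example, `is_balanced(["fraud"] * 10 + ["ok"] * 90)` returns `False`, flagging a dataset in which one label makes up only a tenth of the examples.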
What’s Involved In An AI Data Readiness Engagement?
Our AI Data Readiness service consists of key analysis and design steps, each contributing to a plan that will prepare the organisation to start AI development. These are:
- Evaluation – Data is evaluated in respect of behaviours and features. Each behaviour is analysed, each feature is predicted and the contextual data is obtained and evaluated.
- Robustness – The data is assessed for inherent strengths and weaknesses. These are identified, along with any vulnerabilities in the data which may be exploited.
- Datasets – Preparing to train and test the model requires building datasets that can be interpreted, and which are reproducible and traceable. Various datasets are built and labelled.
- Provenance – Data provenance is established, accounting for its origin and lifecycle.
- Quality – Data ownership and requirements for privacy and ethics are established.
- Processing – Rules are defined regarding how the data is processed in order to develop algorithms and prototypes.
These aspects are collectively rolled into a strategy for acquisition, processing and governance of data.
What Are The Benefits Of Being AI Data Ready?
Carrying out an AI Data Readiness Assessment means the organisation is in a great position to develop and implement AI systems. By using our proprietary tools, the technical team will have a thorough understanding of the data landscape and be ready to take on the challenge of designing AI systems.
- The business will be able to accelerate the process of data acquisition and its transition for use in ML and AI.
- The organisation will own a key artefact, a labelled dataset of high-quality representative data that can be used to train ML models. Further, the technical team will understand the process of developing and extending such datasets.
- There will now be an established set of rules around the data, governing access control, permissions and usage.
- Sensitive data will be protected by internal rules and external safeguards.
- Up-to-date data management procedures will ensure conformance to regulations such as GDPR.
- An established methodology for data collection will ensure that there is little or no bias in the data.
Many organisations are planning to adopt AI technology. Although they may say they are AI data ready, this could be a long way from the real story on the ground. Preparing for AI development means putting in the legwork on AI data acquisition, putting together a labelled training dataset and establishing rules for data governance. We can help your business to be ready for AI and ready to gain the benefits in productivity and decision making.