Unstructured Data: A Hurdle in Embracing AI Technology

KEY TAKEAWAYS

Artificial intelligence is already proving its worth in tasks such as trend analysis and medical diagnosis support. However, its efficacy hinges on structured data, which can be a weakness due to data scarcity. AI's reliance on structured data poses hurdles when faced with unstructured or challenging-to-parse data, constraining its abilities. Encouragingly, efforts are underway to enhance AI's capacity to handle unstructured data.

Artificial intelligence (AI) is capable of doing things that were previously unimaginable.

It can distinguish between a pedestrian and a road sign to guide a self-driving car, review the tone of an article and provide feedback, provide helpful patient data to a doctor, and perform a thousand other time-saving and thoughtful jobs.

However, to do what it does, AI often depends on structured data, and that dependency can become its Achilles heel.

Sources of Unstructured Data

AI systems are fed data of all types from various sources, structured or unstructured. Examples of unstructured sources include:

  • Text data from social media, blog posts, tweets, documents, web pages, news articles, and community forums. Text on web pages is usually bound by stylesheets, tags, and scripts. Text from these sources seldom follows any standard guidelines or structure.
  • Audio data from recordings, videos, and podcasts. These data are obtained after converting the audio to text through speech-to-text converters. Depending on the quality of the converters and the input, the quality of the output varies.
  • Visual data from images, videos, diagrams, screenshots, and infographics that the AI system must parse to understand.
  • Sensor data from various IoT devices, for instance temperature changes in the deep freezer in the kitchen of a big hotel based on the types of raw food stored.
  • Geospatial data obtained from various systems and tools like GPS, smartphones, and compasses.
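
To illustrate the point above about web page text being bound by stylesheets, tags, and scripts, here is a minimal sketch that pulls only the visible text out of an HTML snippet using Python's standard html.parser module. The class name, function name, and sample markup are hypothetical, and real pages usually need more careful handling than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes while skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_visible_text(html):
    """Return the human-readable text of an HTML string (hypothetical helper)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page, used only for illustration.
sample = (
    "<html><head><style>p {color: red;}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Hello</h1><p>AI and <b>data</b></p></body></html>"
)
print(extract_visible_text(sample))
```

Even after the markup is stripped, the remaining text still follows no standard structure, which is the harder part of the problem.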

Limitations of Unstructured Data

AI systems need a consistent data format, at least for large-scale tasks, but applying uniformity is a challenge when data from different sources are stubbornly varied and difficult to fit into a structure.

Pulling the data into shape requires pre-processing, such as removing errors, unwanted spaces, and outliers, and this is time-consuming.
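
As a rough sketch of what such pre-processing can look like, the Python snippet below cleans a list of freezer temperature readings: it trims stray spaces, drops entries that cannot be parsed as numbers, and removes outliers. The function name, the sample readings, and the outlier rule (distance from the median measured in median absolute deviations, with a threshold of 3.5) are assumptions chosen for illustration, not a prescribed method.

```python
import math
import statistics


def clean_readings(raw_values, max_deviations=3.5):
    """Trim, parse, and de-outlier raw readings (hypothetical helper)."""
    # Step 1: strip unwanted spaces and drop entries that are not numbers.
    parsed = []
    for value in raw_values:
        try:
            number = float(str(value).strip())
        except ValueError:
            continue  # erroneous entry such as "n/a" or an empty string
        if math.isfinite(number):
            parsed.append(number)

    # Too few points to judge outliers reliably.
    if len(parsed) < 3:
        return parsed

    # Step 2: drop values that sit far from the median.
    median = statistics.median(parsed)
    mad = statistics.median([abs(v - median) for v in parsed])
    if mad == 0:
        return parsed
    return [v for v in parsed if abs(v - median) / mad <= max_deviations]


# Hypothetical raw sensor export with spaces, bad entries, and one outlier.
raw = [" -18.2", "-17.9 ", "n/a", "-18.4", "", "-18.0", "42.0", "-18.1"]
print(clean_readings(raw))
```

Each of these steps encodes a judgment call, for example whether 42.0 is a sensor fault or a real event such as a door left open, which is part of why pre-processing takes so long in practice.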

Data can also arrive in various formats, fed in through APIs, JSON files, or spreadsheets. New data formats emerge over time, which complicates the problem further.
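
The sketch below shows one way the format problem might be handled: the same kind of record arrives once as JSON and once as a spreadsheet export (CSV), with different field names, and both are mapped onto a single schema. The field names, the alias table, and the sample payloads are all hypothetical.

```python
import csv
import io
import json

# Hypothetical mapping from a common schema to the names each source uses.
FIELD_ALIASES = {
    "sensor_id": ("sensor_id", "sensorId", "Sensor ID"),
    "temperature_c": ("temperature_c", "tempC", "Temperature (C)"),
}


def normalize(record):
    """Map one source record onto the common schema."""
    out = {}
    for target, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in record:
                out[target] = record[alias]
                break
        else:
            raise KeyError(f"missing field for {target!r}")
    out["sensor_id"] = str(out["sensor_id"]).strip()
    out["temperature_c"] = float(out["temperature_c"])
    return out


def records_from_json(text):
    """Parse a JSON array of objects into normalized records."""
    return [normalize(item) for item in json.loads(text)]


def records_from_csv(text):
    """Parse CSV text with a header row into normalized records."""
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical payloads from two different sources.
json_text = '[{"sensorId": "freezer-1", "tempC": -18.2}]'
csv_text = "Sensor ID,Temperature (C)\nfreezer-2,-17.9\n"

print(records_from_json(json_text) + records_from_csv(csv_text))
```

Every new source or format means another entry in the alias table and often another parser, which is how the maintenance burden grows over time.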

Data confidentiality can also add to the complexity, and providers must be extremely cautious to prevent data leaks.

A Case Study: Using AI in Patient Care

Let’s use AI and medical imaging to understand how unstructured data hinders AI adoption, using X-rays, CT scans, and MRIs as test cases.

Ideally, AI should analyze imaging reports and enable radiographers and doctors to accurately and quickly diagnose the illness. However, the following factors severely limit AI’s ability to correctly interpret the imaging outputs:

  • Imaging variability

Variability in quality, angle, lighting, and patient positioning makes it difficult for AI to interpret the imaging and can lead to erroneous output.

  • Anatomical variation

Anatomy varies from patient to patient, and this variation is a challenge for AI systems. AI loves uniformity and is still coming to terms with diversity in human anatomy.

  • Lack of annotations

Annotations help AI understand the imaging better. Without them, AI has to interpret the images on its own, with no reference to guide it, which is a challenge.

  • Rare or uncommon cases

AI requires uniformity and consistency of data, so imaging of uncommon or rare medical conditions severely limits its ability to process the data. Understanding such conditions requires AI systems to learn as they go.

  • Noise and artifacts

Imaging can contain noise, artifacts, and distortions caused by factors such as machine problems, non-compliance with imaging protocols, or changes in the patient's body position. These problems make the data less consistent and harder for AI to interpret.
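
As a small illustration of the last factor, the sketch below applies a 3x3 median filter, a common generic denoising technique, to a tiny grayscale grid. This technique is brought in here for illustration and is not something the case study prescribes; the function name and pixel values are hypothetical, and real medical imaging pipelines are considerably more involved.

```python
import statistics


def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    `image` is a non-empty list of equal-length rows of numbers.
    Pixels beyond the border are approximated by the nearest edge pixel.
    """
    height = len(image)
    width = len(image[0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            neighbours = [
                image[min(max(r + dr, 0), height - 1)][min(max(c + dc, 0), width - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            row.append(statistics.median(neighbours))
        result.append(row)
    return result


# Hypothetical 3x3 patch with a single bright noise pixel in the centre.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
print(median_filter_3x3(noisy))
```

A filter like this removes isolated speckles, but it can also blur fine detail that matters for diagnosis, so cleaning up noise does not by itself make the imaging easy for AI to interpret.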

The Bottom Line

AI has a long way to go in solving multiple use cases due to a dependence on structured data. Meanwhile, for organizations, providing structured data is still a costly and time-consuming task.

Data provisioning and parsing need to improve to unlock the full potential of AI. At the same time, a lot of work is needed to equip AI systems to handle unstructured data.

Kaushik Pal

Kaushik is a technical architect and software consultant, having over 23 years of experience in software analysis, development, architecture, design, testing and training industry. He has an interest in new technology and innovation areas. He focuses on web architecture, web technologies, Java/J2EE, open source, WebRTC, big data and semantic technologies. He has demonstrated his expertise in requirement analysis, architecture design & implementation, technical use case preparation, and software development. His experience has spanned different domains like insurance, banking, airlines, shipping, document management and product development, etc. He has worked with a wide variety of technologies starting from mainframe (IBM S/390),…