Concept Encoder Aiming to Improve Life Sciences Based on Evidence

The Ministry of Health, Labour and Welfare set out its vision of "next generation health and medical systems" utilizing ICT, together with the actions needed to build them, in the proposal of the "ICT Utilization Promotion Council in the Health and Medical Field" published in October 2016. The proposal declares that the tentatively named "next-generation healthcare management system", which uses AI to analyze the latest evidence and medical care data and supports optimal medical care in the field, will be in full-scale operation by 2025. In other words, by analyzing medical data with AI, the aim is to realize medical care that enables quick and accurate examination, diagnosis, and treatment according to an individual's signs and symptoms, even for diseases that are currently difficult to diagnose and treat.

Numerical and image data in medicine are analyzed using AI all over the world. However, the use of AI on electronic medical records and other data written as natural-language text has made limited progress, even though such text contains a significant amount of valuable information. The challenge has been that natural-language text varies in descriptive format and content between facilities and individuals, and is therefore difficult to handle as homogeneous data for analysis.
The AI engine "Concept Encoder", developed by FRONTEO Life Sciences AI Business Headquarters, treats language as vectors and can therefore bring the "statistical methods" indispensable for evidence-based medicine (EBM) into natural language analysis. Language data can also be analyzed together with non-language data. In other words, Concept Encoder effectively analyzes and utilizes accumulated medical data, as well as data that will continue to be added in the future, all of which contain natural-language text that can serve as evidence for the improvement of life sciences. (Patent registration number: Patent No. 6343667)

Concept Encoder's AI Technology Strengths and Examples of Utilization

Vector AI engine Concept Encoder

The FRONTEO Life Sciences AI Business Headquarters focuses its research and development particularly on "natural language analysis (text analysis)" using natural language processing (NLP).

To perform evidence-based text analysis in the life sciences field, it is necessary to quantify the characteristics of text so that statistical analysis becomes possible. NLP offers several methods for quantifying text information, such as "morpheme (word) analysis: evaluating the frequency with which words appear across multiple sentences" and "syntax analysis: evaluating and quantifying the dependency relations within sentences". Concept Encoder builds on morphological analysis and uses the "vectorization of words and documents" to quantify the characteristics of text. "Word and document vectorization" is a method of decomposing natural-language sentences into words and then assigning multidimensional variables (vectors) to the words and documents and optimizing them.
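As a rough illustration of what placing words and documents in one vector space can look like, the following Python sketch averages word vectors to obtain a document vector. This is not Concept Encoder's patented method: the vector values are invented, whitespace tokenization stands in for real morphological analysis, and the names WORD_VECTORS, tokenize, and document_vector are hypothetical.

```python
# Toy word vectors; the numbers are invented for illustration only.
# A real system would learn them from a large corpus.
WORD_VECTORS = {
    "fever": [0.9, 0.1, 0.0],
    "cough": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.9, 0.2],
    "cast": [0.1, 0.8, 0.3],
}
DIMENSIONS = 3


def tokenize(text):
    """Split on whitespace (an assumption; Japanese text needs a morphological analyzer)."""
    return text.lower().split()


def document_vector(text):
    """Average the vectors of the known words so the document shares the word space."""
    vectors = [WORD_VECTORS[t] for t in tokenize(text) if t in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIMENSIONS
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIMENSIONS)]


print(document_vector("Fever cough"))
```

Averaging is only one simple way to obtain a document vector; it is used here because it keeps words and documents directly comparable.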

Features of Concept Encoder

By "vectorizing words and documents", Concept Encoder extracts more information when analyzing natural-language sentences, which gives a multifaceted view of each document's characteristics. In addition, because the data is quantified, similarities and relationships between words and documents can be analyzed with various statistical methods while the amount of extracted information is maintained.

For the user, Concept Encoder is an easy-to-use AI engine that applies the desired statistical analysis to the targeted text data. Here are five examples of how Concept Encoder can be applied:

1. You can compare documents

This capability is related to a machine learning method called "Word2Vec (word-to-vector)", which is also used in automatic translation. Word2Vec represents each word as a vector of elements (a distributed representation) and compares features between words. Word2Vec can only compare words with each other, but Concept Encoder, which performs the "vectorization of words and documents", has the advantage of being able to compare not only words with words but also words with entire documents. Concept Encoder usually analyzes distributed representations of 300 to 1,000 dimensions.
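Once words and documents share a vector space, a comparison can be as simple as measuring the angle between two vectors. The sketch below uses cosine similarity, a common choice for distributed representations; the source does not state which measure Concept Encoder uses, and the three-dimensional vectors are invented stand-ins for the 300 to 1,000 dimensions mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vectors: one word and two documents in the same space.
word_fever = [0.9, 0.1, 0.0]
doc_respiratory = [0.85, 0.15, 0.05]
doc_orthopedic = [0.05, 0.85, 0.25]

sim_respiratory = cosine_similarity(word_fever, doc_respiratory)
sim_orthopedic = cosine_similarity(word_fever, doc_orthopedic)
print(sim_respiratory > sim_orthopedic)
```

With these toy values the word vector lies closer to the respiratory document than to the orthopedic one, which is the kind of word-to-document comparison described above.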

2. Vector calculation allows you to calculate concepts

In Concept Encoder, a vector behaves as if it contains the range of meanings of the word it represents. This property is also found in words vectorized with Word2Vec, but Concept Encoder extends it through the "vectorization of words and documents". As a result, it is possible to perform "addition" and "subtraction" across words and documents.
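The classic Word2Vec illustration of this kind of arithmetic is "king - man + woman ≈ queen". The sketch below reproduces it with hand-made two-dimensional vectors; the values and the helper names (add, subtract, nearest_word) are assumptions for illustration and say nothing about Concept Encoder's internals.

```python
# Invented 2-D vectors: first axis ~ "royalty", second axis ~ "gender".
VECTORS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "princess": [0.7, -1.0],
}


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def nearest_word(target, exclude):
    """Return the word whose vector is closest (squared Euclidean distance) to target."""
    candidates = [w for w in VECTORS if w not in exclude]
    return min(
        candidates,
        key=lambda w: sum((x - y) ** 2 for x, y in zip(VECTORS[w], target)),
    )


concept = add(subtract(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])
result = nearest_word(concept, exclude={"king", "man", "woman"})
print(result)
```

The same add and subtract helpers would work unchanged on document vectors, as long as documents and words live in the same space.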

3. Language feature distribution can also be evaluated by clustering

Concept Encoder clusters words and documents specific to each classification to enable an efficient “sorting” process.
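To make the idea of clustering vectorized documents concrete, here is a minimal k-means sketch in plain Python. The source does not say which clustering algorithm Concept Encoder uses, so k-means is an assumption, as are the toy two-dimensional document vectors and the fixed starting centroids.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, centroids):
    """Index of the centroid closest to the given vector."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vector, centroids[i]))


def mean(vectors):
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def kmeans(vectors, centroids, iterations=10):
    """Plain k-means: assign each vector to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            clusters[nearest_index(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    labels = [nearest_index(v, centroids) for v in vectors]
    return labels, centroids


# Invented document vectors forming two obvious groups.
documents = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(documents, centroids=[documents[0], documents[2]])
print(labels)
```

Documents that receive the same label can then be reviewed or sorted together.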

4. Transfer and share personal knowledge

Concept Encoder enables you to transfer knowledge effectively since data is shared and accessible by anyone in the group, thereby allowing research and development to proceed efficiently without wasted effort.

5. You can explore ideas

Concept Encoder can search for documents that resemble your own ideas. After reading relevant documents from PubMed and similar sources, you can write your research idea into Concept Encoder and retrieve related documents in descending order of similarity to your description. In addition, keywords considered important can be extracted automatically from the literature. Note that an automatic sentence summarization function is currently under research and development.
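Ranking documents in descending order of similarity to a free-text idea can be sketched as follows. For simplicity this example uses word-count vectors and cosine similarity in place of learned vectors; it is not Concept Encoder's interface, and the function names, document IDs, and texts are made up.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a stand-in for learned vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(idea, documents):
    """Return document IDs sorted from most to least similar to the idea text."""
    idea_vector = bag_of_words(idea)
    scored = [(cosine(idea_vector, bag_of_words(text)), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


documents = {
    "doc_a": "gene expression in liver cancer cells",
    "doc_b": "bone fracture healing after surgery",
    "doc_c": "liver cancer gene therapy trial",
}
ranking = rank_documents("gene therapy for liver cancer", documents)
print(ranking)
```

Word counts only match exact tokens; vectors learned from a corpus could also surface documents that express the same idea in different words.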

Can be widely used in the Life Sciences Industry

FRONTEO Life Sciences AI Business Headquarters proposes improving work efficiency by utilizing the AI engine Concept Encoder. Concept Encoder has gone into operation across a wide range of life sciences areas, such as "diagnosis support", "life sciences business support", and "pharmaceutical industry support". The text data to be analyzed ranges from medical information searched in English (papers, genetic information, clinical trial information, etc.) to data written in Japanese in electronic medical records. FRONTEO Life Sciences AI Business Headquarters will contribute to the development of life sciences by making full use of natural language and statistical analysis, conducting research based on medical and scientific evidence, developing products, and providing solutions.
