Prototypes for Content-Based Image Retrieval in Clinical Practice

Depeursinge, Adrien; Fischer, Benedikt; Müller, Henning; Deserno, Thomas M

Adrien Depeursinge*,¹,², Benedikt Fischer³, Henning Müller¹,², Thomas M Deserno³

¹ Business Information Systems, University of Applied Sciences Western Switzerland (HES–SO), TechnoArk 3, 3960 Sierre, Switzerland

² Service of Medical Informatics, University and University Hospitals of Geneva (HUG), Rue Gabrielle–Perret–Gentil 4, 1211 Geneva 14, Switzerland

³ Department of Medical Informatics, RWTH Aachen University, Pauwelsstr. 30, D-52057 Aachen, Germany

Article Information

Identifiers and Pagination:

Year: 2011
Volume: 5
Issue: Suppl 1
First Page: 58
Last Page: 72
Publisher Id: TOMINFOJ-5-58
DOI: 10.2174/1874431101105010058

Article History:

Received Date: 16/5/2011
Revision Received Date: 20/5/2011
Acceptance Date: 20/5/2011
Electronic publication date: 27/7/2011
Collection year: 2011

© Depeursinge et al.; Licensee Bentham Open.

open-access license: This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.

* Address correspondence to this author at the Business Information Systems, University of Applied Sciences Western Switzerland (HES–SO), TechnoArk 3, 3960 Sierre, Switzerland; Tel: +41 27 606 9023; Fax: +41 27 606 9000; E-mail: adrien.depeursinge@hevs.ch

Content-based image retrieval (CBIR) has been proposed as key technology for computer-aided diagnostics (CAD). This paper reviews the state of the art and future challenges in CBIR for CAD applied to clinical practice.

We define applicability to clinical practice as the CBIR system having recently been demonstrated at one of the CAD demonstration workshops held at international conferences, such as SPIE Medical Imaging, CARS, SIIM, RSNA, and IEEE ISBI. From 2009 to 2011, the programs of CADdemo@CARS and the CAD Demonstration Workshop at SPIE Medical Imaging were searched for the keyword “retrieval” in the title. The systems identified were analyzed and compared according to the hierarchy of gaps for CBIR systems.

In total, 70 software demonstrations were analyzed. Five systems were identified that met the criteria. The fields of application are (i) bone age assessment, (ii) bone fractures, (iii) interstitial lung diseases, and (iv) mammography. The gaps of semantics, feature extraction, feature structure, and evaluation have been addressed most frequently.

In specific application domains, CBIR technology is available for clinical practice. While system development has mainly focused on bridging content and feature gaps, performance and usability have become increasingly important. The evaluation must be based on a larger set of reference data, and workflow integration must be achieved before CBIR-CAD is really established in clinical practice.

Keywords: Content-based image retrieval, medical image retrieval, diagnosis aid, prototypes.

1. INTRODUCTION

1.1. History of Medical CBIR

In the early 1990s, the query by image content (QBIC) system of IBM was one of the first approaches to content-based image retrieval (CBIR), and the query by image example (QBE) paradigm has since become established [1]. Images are represented by numerical features (a signature), and relevant images are identified by comparing the signature of an example image with all signatures in a repository. Initially, CBIR was applied to images from the Internet or to large collections of photographs [2]. The signatures were obtained from color, texture, and shape. Once color had been identified as the most relevant feature for CBIR, the semantic gap was recognized. It describes the difference between image similarity at the high level of human perception and at the low level of a few numerical values describing a mean color.
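The query-by-example idea can be illustrated with a minimal sketch. It assumes the crudest possible signature, a mean RGB color, and a Euclidean distance between signatures; the toy images, their names, and both helper functions are hypothetical and are not taken from QBIC or any other system cited here.

```python
import math

def mean_color_signature(pixels):
    """Return the mean (R, G, B) of an image given as a flat list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def rank_by_signature(query_sig, repository):
    """Rank repository entries (name -> signature) by Euclidean distance to the query."""
    scored = [(math.dist(query_sig, sig), name) for name, sig in repository.items()]
    return [name for _, name in sorted(scored)]

# Hypothetical toy images, each a flat list of RGB pixels.
reddish = [(200, 10, 10), (220, 30, 20)]
bluish = [(10, 10, 200), (20, 30, 240)]
purple = [(127.5, 0, 127.5), (127.5, 0, 127.5)]   # uniform purple
striped = [(255, 0, 0), (0, 0, 255)]              # red and blue stripes

repository = {
    "reddish": mean_color_signature(reddish),
    "bluish": mean_color_signature(bluish),
    "purple": mean_color_signature(purple),
}

# Query by example: the striped image is used as the query.
ranking = rank_by_signature(mean_color_signature(striped), repository)
print(ranking)
```

In this toy repository the striped query is matched best by the uniform purple image, because both share the same mean color even though a human observer would not consider them similar. This is a small instance of the semantic gap described above.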

A comprehensive review of CBIR systems in medical applications is given by Müller et al. [3]. To narrow the semantic gap, the first medical CBIR approaches to computer-aided diagnosis (CAD) focused on a particular modality and application domain, such as microscopy, photography of the skin, radiographs of the spine, teeth, and breast, computed tomography (CT) of the lungs, and magnetic resonance imaging (MRI) of the head (Table 1). According to Tagare et al., medical image information further contains spatial data, and a large part of image information is geometric [4]. Accordingly, initial attempts at bridging the semantic gap were also based on local image signatures referring to pre-segmented regions of interest (ROI) and relative positions of relevant objects (Table 1).

Table 1. Early Medical CAD-CBIR Systems in 2000

Table 2. Field of Engineering for CBIR in Clinical Practice

Table 3. Statistics of CAD Demonstration Workshops

Table 4. Comparison of CBIR Systems

Table 5. Gap Classification of Prototypes According to [20]

With the automatic search and selection engine with retrieval tools (ASSERT) system, for instance, the physician manually delineates the pathology-bearing ROI and a set of anatomical landmarks when an image is entered into the database [5]. However, given the ever-increasing amount of medical image data acquired directly in digital form in today’s clinical practice, manual annotation is time-consuming, imprecise, irreproducible, and simply impracticable.

The development of rather general approaches such as I-Browse and KMeD began about ten years ago. The established frameworks for medical CBIR systems today are the medical GNU image finding tool (MedGIFT) and the image retrieval in medical applications (IRMA) project [15,16], which started in 2002 and 2001, respectively.

1.2. Fields of Engineering

Simple CBIR prototypes can be developed quickly by computer scientists from the fields of image processing and machine learning, which are central to the characterization of image content. To build large-scale CBIR systems, however, expertise from several fields of engineering must be combined to fulfil the needs of specialized user groups, including a high level of interaction (Table 2) [2]. Initially, raw, low-level information is extracted from the original images using image processing techniques. These features usually characterize color, shape and texture, either globally or within an image region (locally). Then, computer vision, machine learning as well as knowledge from human vision and psychology are required to aggregate, optimize and map the low-level features to high-level semantic concepts driven by the user’s intents and visual perception of the image collection of concern [17]. Another important aspect is to maximize human-computer interaction with appropriate interfaces for query formulation and result visualization. In particular, allowing physicians to efficiently draw a sketch or quickly mark a volume of interest in 3D or in volumes captured at multiple time points (3D+time) is still an insufficiently solved problem. Studies in information retrieval have also shown that involving the user in the loop for query refinement enables quicker convergence between the user’s intents and the retrieved results [18]. On the technical side, efficient database management and high-performance computing are both required to optimize the retrieval quality offline (e.g., parameter optimization over large image collections) and to ensure quick online system response [19].

1.3. CBIR vs CAD

The most popular approach to image-based CAD aims at providing automated interpretation of image examinations as a second opinion to radiologists. These systems have proved to be particularly useful for analysing large amounts of data containing easily detectable lesions but with low prevalence [22]. When compared to human readers, computers can efficiently and exhaustively analyse large numbers of images with high reproducibility. In the literature, two types of CAD systems are distinguished:

Computer-aided detection (CADe) aims at pre-analyzing images and automatically annotates suspicious regions in order to support the radiologist in reading. No classification of the ROIs is done.
Computer-aided diagnostics (CADx) aims at deriving a diagnostic decision by adding a classification step, where the identified ROIs are analysed, classified, and a semantic conclusion is drawn, which might serve the radiologist as second opinion.

The initial attempts at CADx were carried out more than forty years ago in chest X-ray imaging [23,24]. These systems aimed at replacing radiologists, as their originators relied on the assumption that computers were better at performing certain tasks than human beings. However, it quickly became clear that physicians and radiologists have to take the final decision and that the outputs of CAD systems must be used as second opinions and information providers [25]. Recently, CADe systems have been used in clinical practice in the rather mature field of cancer screening in mammograms, where they have improved the detection of non-palpable cancerous masses [26]. Other notable examples are systems for the assisted detection of lung nodules in chest radiographs, such as Riverain Medical’s OnGuard¹, which have obtained approval from the United States Food and Drug Administration (FDA). Furthermore, CADe is expected to be introduced into clinical routine for several other domains such as the chest, colon, brain, liver, skeletal and vascular systems [22].

In summary, medical CBIR systems are well aligned with the conclusions of the early 1990s that CAD should be used as a second opinion and information provider [25], rather than as an independent automatic diagnostic system. However, despite almost 20 years of intensive research in academia, CBIR has not moved beyond research labs and, to our knowledge, no commercially available medical CBIR system exists yet. This is all the more surprising because the techniques in image processing and machine learning used for CADe and CBIR are structurally similar, and the major disparities between the two lie in graphical user interfaces (GUI), clinical workflows and integration.

In this work, we try to answer the question of why CBIR systems have not yet reached clinical practice. We provide a detailed analysis of CBIR systems that are close to being integrated and analyse their strengths and pitfalls. The corresponding unaddressed gaps are identified and, from these, future directions are derived in the hope of fostering the adoption of CBIR systems in clinical radiology.

2. MATERIALS AND METHODS

2.1. State of the Art in Reviewing Systems

Reviewing medical CBIR systems is an often-discussed issue, with the first paper having appeared in 1997 [4]. Since then, reviews have usually specialized in a certain medical or technological application domain, such as forensics [27] or Web 2.0 [28], respectively. More general reviews have been published by Müller et al. [3] and Akgul et al. [29], covering CBIR in radiology in terms of current status, clinical benefits, and future directions. With 187 and 77 references, respectively, the inclusion and exclusion criteria of the two reviews are not fully clear, which limits the impact of such work. However, a somewhat more systematic methodology for classifying CBIR systems has also been proposed [20,30].

2.1.1. Defining the Characteristics of Medical CBIR Systems

In [20], for instance, a set of 14 so-called gaps is defined to classify medical CBIR systems, enriched with 7 additional characteristics. The gaps were identified as being responsible for potential pitfalls and inadequacies of current medical CBIR systems. For instance, the “semantic gap” describes the discrepancy between the high level of semantics in human image perception and understanding and the simple numerical signature that is extracted by a machine in terms of color, texture and shape. More systematically, the authors define four clusters of gaps:

Content: The content gaps address the level of image understanding (1 – semantic gap) as well as the imaging and/or clinical context in which a CBIR system may be used (2 – use context gap). Obviously, designing a medical CBIR system for a broad use is more challenging, since the level of image details being relevant for the retrieval, the type of image data (modality) to handle, and other system preferences are highly variable.
Feature: The feature gaps address the automation of feature extraction (3 – extraction gap), the granularity or dimensionality of the structure of image objects recognized by the system (4 – structure gap), of the visual details in the image processed by the system (5 – scale gap), of the spatial and time inputs actually used to compute the signature (6 – space & time dimension gap), and of the channel inputs actually used to compute the signature (7 – channel dimension gap).
Performance: The performance gaps describe the level of actual implementation of the system (8 – application gap), of integration into patient care information systems (9 – integration gap), of support for fast database searching (10 – indexing gap), and the degree to which the validity of the system’s retrieval has been evaluated (11 – evaluation gap).
Usability: The usability gaps address the degree to which the user may use and combine text and visual queries (12 – query gap), to which the system helps the user to understand query results (13 – feedback gap), and to which the system enables the user to refine and improve query results (14 – refinement gap).

These gaps highlight the need for contributions from several fields of engineering in order to successfully design, implement, and integrate medical CBIR systems into clinical practice (Table 2). With respect to our intention, the integration gap is of particular relevance. As early as 1999, Eakins & Graham explicitly stated with respect to the non-medical CBIR application domain that “the experience of all commercial vendors of CBIR software is that system acceptability is heavily influenced by the extent to which image retrieval capabilities can be embedded within users’ overall work tasks” [32]. This is entirely true for medical applications as well.
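For readers who want to tabulate systems against this taxonomy, the four clusters and 14 gaps listed above can be written down as a small lookup structure. The sketch below is only one possible encoding: the names `GAP_CLUSTERS`, `cluster_of`, and `summarize` are hypothetical, and the example system is invented for illustration.

```python
# The 14 gaps grouped into the four clusters described in the text.
GAP_CLUSTERS = {
    "content": {1: "semantic", 2: "use context"},
    "feature": {
        3: "extraction",
        4: "structure",
        5: "scale",
        6: "space & time dimension",
        7: "channel dimension",
    },
    "performance": {8: "application", 9: "integration", 10: "indexing", 11: "evaluation"},
    "usability": {12: "query", 13: "feedback", 14: "refinement"},
}

def cluster_of(gap_number):
    """Return the cluster name that a gap number (1 to 14) belongs to."""
    for cluster, gaps in GAP_CLUSTERS.items():
        if gap_number in gaps:
            return cluster
    raise KeyError(f"unknown gap number: {gap_number}")

def summarize(addressed_gaps):
    """Count, per cluster, how many of the given gap numbers a system addresses."""
    counts = {cluster: 0 for cluster in GAP_CLUSTERS}
    for gap_number in set(addressed_gaps):
        counts[cluster_of(gap_number)] += 1
    return counts

# Hypothetical system addressing the semantic, extraction, structure and evaluation gaps.
print(summarize([1, 3, 4, 11]))
```

Such a per-cluster count makes it easy to see at a glance whether a prototype concentrates on content and feature gaps or also covers performance and usability.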

2.1.2. Defining the Review Methodology

According to previous work in general image retrieval [33], Long et al. proposed to formalize the review methodology to retrospectively assess the state of the art and future directions of medical CBIR systems [30]. Different criteria were defined:

Journals: The journals were identified using informal selection criteria, but with the goal of providing a broad representation of the major publications reporting medical image retrieval research results. Ten journals from engineering and medicine were included.
Date: Aiming at reviewing recent work, the date of publication was set between 2001 and 2010. Defining an ending date may be useful since the processes of paper writing, reviewing, and publishing may take up to five years [34].
Database: Several bibliometric databases and search tools are available. For instance, Pubmed (National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA), Google Scholar (Google Corp., Mountain View, CA, USA), and the ISI Web of Science (Thomson Reuters Corp., New York, NY, USA) are the most widely recognized. However, the visibility of medical informatics in these databases may differ with respect to coverage and completeness [35].
Terms: The search phrases must also be determined in order to produce reproducible results. In the work of Long et al., the terms [“medical image retrieval” AND search_phrase] were used, where search_phrase was one of eleven CBIR-related phrases including, for instance, “content-based image retrieval”, “Indexing”, “Performance”, or “Relevance Feedback” [30].
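The term and date criteria above lend themselves to a short sketch of how such a search could be made reproducible. The code below only builds query strings and checks the publication window; it does not call any bibliographic database. Only the four search phrases named in the text are included, since the remaining seven are not listed here, and the exact quoting syntax of the query string is an assumption.

```python
BASE_TERM = '"medical image retrieval"'

# Only the four phrases named in the text; the other seven of the
# eleven phrases are not listed here and are deliberately left out.
NAMED_PHRASES = [
    "content-based image retrieval",
    "Indexing",
    "Performance",
    "Relevance Feedback",
]

# Publication window stated in the text.
YEAR_RANGE = (2001, 2010)

def build_queries(phrases):
    """Combine the fixed base term with each search phrase using AND."""
    return [f'{BASE_TERM} AND "{phrase}"' for phrase in phrases]

def in_date_window(year, window=YEAR_RANGE):
    """Return True if a publication year falls inside the review window."""
    first, last = window
    return first <= year <= last

for query in build_queries(NAMED_PHRASES):
    print(query)
```

Keeping the phrases and the date window as explicit constants is what makes the selection repeatable by another reviewer.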

2.1.3. Evaluations with Clinical Practitioners

Another approach to assessing the “user readiness” of CBIR systems was proposed by Antani et al. [31]. The authors applied a set of usability evaluation methods known from quantitative and qualitative research to evaluate, as an example, a CBIR system supporting access to 100,000 cervigrams and related, anonymized patient data. These methods include:

Questionnaire: Designed for the purpose of gathering information from respondents, a series of questions and other prompts or scales is presented either on paper or electronically. Such questionnaires are filled in by the user and usually analyzed statistically.
Structured Interview: Similar to paper-based surveys (questionnaires), the users are presented with exactly the same questions in the same order to support data aggregation. Such interviews are also referred to as researcher-administered surveys.
Focus Group: A group of people is asked about their perceptions, opinions, beliefs, and attitudes towards a system. The questions are asked in an interactive setting where participants are free to talk with other group members.
Think-aloud Method: Participants are encouraged to voice their thoughts on the system as they perform given tasks, while an expert facilitator guides the session and comments on the voice recordings.

The methods also help to identify obstacles that hamper the practical use of such systems. In the example given above, for instance, the users uncovered several problems: it was (i) challenging to obtain a clear understanding of the purpose and functionality of the tool without any training on the tool’s capabilities, (ii) difficult to discern how to properly formulate a visual query, which is a critical component in CBIR systems, and (iii) almost impossible to use the interactive drawing tools successfully [31].

2.2. Definition for CBIR in Clinical Practice

However, complex terminology schemes do not guarantee that relevant work is located unambiguously and included completely. Although a well-defined ontology is most important to science, applying its terms correctly may remain ambiguous, since relevant information may be missing from the reports published as scientific articles. The same holds for a detailed search strategy. Including or excluding systems on the basis of user evaluation studies delivers the most objective assessment of readiness for clinical practice, but is too costly in terms of both systems and manpower.

In order to identify content-based image retrieval systems that are really close to clinical practice, we assume that they have been demonstrated at one of the recent workshops on CAD. Such workshops may be organized by:

CARS: The Computer-Assisted Radiology and Surgery private initiative, Kuessaberg, Germany, along with the annual CARS Conference,
RSNA: The Radiological Society of North America, Oak Brook, IL, USA, along with its annual meeting,
SIIM: The Society for Imaging Informatics in Medicine, Leesburg, VA, USA, along with its annual meeting,
IEEE: The Institute of Electrical and Electronics Engineers, Piscataway, NJ, USA, along with the International Symposium on Biomedical Imaging (ISBI),
SPIE: The Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, USA, along with its annual International Symposium on Medical Imaging,
MICCAI: The Medical Image Computing and Computer Assisted Intervention Society, Minnesota, USA, along with the annual MICCAI Conference.

The workshops held between 2009 and 2011 were analysed. We selected workshops with “Computer-aided Diagnosis” in the workshop title and, within them, software presentations with “retrieval” in the presentation title.
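The selection rule stated above amounts to a simple filter over workshop programs. The following sketch shows one way to express it, assuming each program entry is available as a (workshop, year, title) tuple; the function name and the sample program are hypothetical, and all but the fracture-retrieval title are invented placeholders rather than real presentations.

```python
def select_retrieval_demos(program, keyword="retrieval", years=range(2009, 2012)):
    """Keep presentations from the given years whose title contains the keyword.

    The match is case-insensitive, which is an assumption; the text only
    states that the keyword had to appear in the title.
    """
    selected = []
    for workshop, year, title in program:
        if year in years and keyword.lower() in title.lower():
            selected.append((workshop, year, title))
    return selected

# Hypothetical program excerpt for illustration only.
program = [
    ("CADdemo@CARS", 2010, "Case-based visual retrieval of fractures"),
    ("CADdemo@CARS", 2010, "Placeholder title: automated nodule detection"),
    ("SPIE MI CAD workshop", 2009, "Placeholder title: Retrieval-based decision support"),
    ("SPIE MI CAD workshop", 2008, "Placeholder title: image retrieval before 2009"),
]

for entry in select_retrieval_demos(program):
    print(entry)
```

A title-based filter is cheap and reproducible, but it will miss systems that contain CBIR components without naming them in the title.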

2.3. Applying Criteria

Among the six targeted conferences, two have dedicated sessions for live demonstrations of CAD systems. The events are:

CADdemo@CARS in years 2009 and 2010 (2011 had not happened yet at the time when this paper was written),
SPIE Medical Imaging (MI) CAD workshop in years 2009, 2010 and 2011.

Other targeted conferences (i.e., RSNA, SIIM, IEEE ISBI and MICCAI) may have live software demonstrations in the context of commercial exhibitions, but these are not dedicated to CAD systems and very few technical details can be found for commercial systems.

In total, 4 systems presented in the five workshops (CADdemo@CARS 2009-2010 and SPIE MI CAD workshops 2009-2011) and containing “retrieval” in the title were analyzed. Some other CADx systems that did not contain “retrieval” in the title may contain CBIR features but were not in the main focus of the developments. At the CADdemo@CARS 2010 in Geneva, three CBIR-based CAD systems were presented:

A platform for bone age assessment, developed by the Rheinisch-Westfälische Technische Hochschule (RWTH) in Aachen, Germany entitled “Web-based bone age assessment using case-based reasoning and content-based image retrieval”.
Retrieval of fracture cases for operation planning developed at the University Hospitals of Geneva (HUG), Switzerland entitled “Case-based visual retrieval of fractures”.
Analysis and retrieval of high-resolution CT (HRCT) images from patients affected with interstitial lung diseases (ILDs) developed by the HUG entitled “Content-based retrieval and analysis of HRCT images from patients with interstitial lung diseases: a comprehensive diagnostic aid framework”.

At the SPIE MI CAD workshops 2009 and 2010, two CAD systems for the characterization of breast masses in mammograms were found with “retrieval” in the title:

“Expert-guided content–based mammographic mass retrieval system”, developed by Georgetown University Medical Center, Washington DC, USA.
“Content-based image retrieval (CBIR) CADx system for characterization of breast masses”, developed by University of Michigan Medical Center, USA.

The collaboration between the two research groups resulted in a publication at SPIE MI 2011 [36], which was used for the description of the CBIR system in Section 3.1.4. These four systems are described in detail in Section 3.1, compared in Section 3.2, and analyzed in terms of gap identification in Section 3.3.

3. RESULTS

The programs of 6 conferences were analyzed. In the years 2009-2011, workshops and special sessions dedicated to live demonstrations of CAD systems were found at the CARS and SPIE MI conferences. As CAD@CARS 2011 had not yet taken place at the time this paper was written, the programs of five workshops (CAD@CARS 2009-2010 and SPIE MI CAD demo 2009-2011) were searched for CBIR systems. In total, 70 CAD systems were presented, and 5 of them (7%) contained “retrieval” in the title (Table 3).

3.1. System Descriptions

3.1.1. Bone Age Assessment

With its underlying flexible structure of image processing and image retrieval algorithms, the IRMA framework was adjusted to enrich CAD in the context of bone age assessment (BAA). A live demonstration of this system was presented at the CADdemo@CARS 2010 in Geneva.

Background and Objective

Bone age assessment based on hand radiographs is a frequent and time-consuming task in which radiologists estimate the maturity of patients. Relating bone age to chronological age and the current status of growth allows the adult height of pediatric subjects to be estimated, and endocrine disorders or pediatric syndromes to be diagnosed and tracked [37]. Clinically, the methods by Greulich & Pyle (GP) [38] or Tanner & Whitehouse (TW3) [39] are applied. In the former method, radiologists compare all bones of the hand to those shown in radiographs from the standard atlas. In the latter, a certain subset of bones is examined.

Several approaches have been taken to (partially) automate the BAA-process, and recently, a commercial application was reported [40]. In general, all existing approaches rely on computation and measurements of image- or region-related features, which are usually incomprehensible to the user. Providing more transparency, the IRMA-based BAA application aims at merging CBIR with case-based reasoning (CBR). This is achieved by retrieving similar radiographs with validated ages from a case database, and subsequently presenting these to a radiologist along with a suggested bone age deduced from similarity and bone age of previous cases.

Methods and Application

The development of the hand bones is most evident in certain image regions, namely the epiphyses, the carpal bones, as well as the distal radius and ulna. Therefore, ROIs are subjected to CBIR queries rather than the entire radiograph. In the current IRMA-BAA prototype, only the epiphyses are considered, while the other bones are planned for inclusion in future releases.

For a new hand radiograph, the processing pipeline consists of four steps (Fig. 1):

Fig. (1). CBIR approach to bone-age assessment.

Fig. (2). Result display for the query image indicated at the top left.

Fig. (3). A screen shot of the GUI for visual retrieval of fracture cases.

Fig. (4). A screen shot of the GUI for the 3D categorization of the lung tissue.

Fig. (5). Left: the query interface for clinical parameters. Right: a ranked list of retrieved cases.

Fig. (6). Two examples of a 3D query and retrieved results.

Fig. (7). Left: the query interface allowing for online selection of specific image features to be used for retrieval and result visualization. Right: a detail view of the mass region and its localization in the mammogram.

Center localization: At first, the centers of the epiphyses are localized. This can be attempted automatically, but a manual localization was proven to be more reliable.
Region extraction: A bounding box is oriented automatically around these centers, scaled and extracted, yielding the query epiphysial ROIs (eROIs) for the CBIR part of the IRMA-BAA.
Case comparison: With each extracted eROI, a query by example is sent to the case database, which contains complete radiographs, corresponding eROIs, and meta-information such as gender, ethnic origin, chronological age, and the validated bone age from expert readings. For each query, the K most similar eROIs are returned with similarity scoring and validated bone age. Currently, the similarity is determined by cross-correlation, the image distortion model, and the Tamura texture features [41].
Age assessment: The overall bone age is predicted from the similarities of the K retrieved eROIs and the validated bone ages:

(Eq.1)

Here, R is the number of eROIs used for the query, age^validated(r,k) is the validated bone age of the k-th most similar eROI to the r-th query eROI, and ∂(r, k) provides the corresponding similarity. In (Eq.1), only corresponding bones are compared, e.g., an eROI of the distal phalanx of the index finger is compared solely to other eROIs of distal phalanges of the index finger.
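Since the formula of (Eq.1) is not reproduced in this text, the following is only a sketch of one form that is consistent with the definitions above: a similarity-weighted average of the validated bone ages over all R query eROIs and their K retrieved neighbours. The weighting scheme, the function names, the bone labels, and the sample values are all assumptions for illustration and should not be read as the authors' exact equation.

```python
def top_k_same_bone(query_bone, scored_cases, k):
    """Keep the k most similar database eROIs that show the same bone as the query.

    scored_cases: iterable of (bone_label, similarity, validated_age) tuples.
    """
    same = [(sim, age) for bone, sim, age in scored_cases if bone == query_bone]
    return sorted(same, reverse=True)[:k]

def estimate_bone_age(neighbours):
    """Similarity-weighted mean of validated ages (assumed form, not the paper's Eq. 1).

    neighbours: list of R lists, each holding K (similarity, validated_age) pairs.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for per_region in neighbours:
        for similarity, validated_age in per_region:
            weighted_sum += similarity * validated_age
            weight_total += similarity
    if weight_total == 0:
        raise ValueError("all similarities are zero; no estimate possible")
    return weighted_sum / weight_total

# Hypothetical scored database eROIs for one query eROI (K = 2).
scored = [
    ("distal_index", 0.9, 10.0),
    ("distal_middle", 0.95, 12.0),   # different bone, must be ignored
    ("distal_index", 0.6, 11.0),
    ("distal_index", 0.2, 14.0),
]
region_1 = top_k_same_bone("distal_index", scored, k=2)
region_2 = [(0.8, 10.5), (0.7, 9.5)]   # hypothetical second query eROI

print(estimate_bone_age([region_1, region_2]))
```

With equal similarities the estimate reduces to the plain mean of the retrieved ages, so the similarity values act purely as weights.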

Since the IRMA system already provides mechanisms for the inclusion of CBIR into web-based interfaces [42], the BAA-application makes use of these features and a research prototype is available as an online demo². The prototype allows BAA for a radiograph of the demo database or the analysis of a user-uploaded image. The integration into clinical information systems can be achieved by Digital Imaging and Communications in Medicine (DICOM) Hosted Applications and DICOM Structured Reporting.

The result of the CBIR-based age estimation is presented to the user (Fig. 2). Query image and extracted eROIs are shown at the top-most area of the web-based interface. Their most similar counterparts retrieved from the database are shown below (scrollable) in decreasing similarity and with the validated bone age. The estimated bone age is shown below the query image. If the query image is contained in the demo database, the validated bone age is also provided. A click on one of the thumbnails opens the full resolution image. The display mode can also be switched to show the hands belonging to the retrieved eROIs.

Validation

In order to estimate the potential for clinical use, the research prototype was evaluated in terms of the mean absolute prediction error of the estimated bone age in comparison to the validated bone age. As ground truth, the publicly available hand atlas provided by the University of Southern California (USC) is used, providing 1,102 radiographs with gender, ethnic origin (Caucasian, African American, Asian, Hispanic), and bone age readings by two experienced radiologists. The mean of the two readings is defined as the validated bone age in (Eq. 1). In leave-one-out experiments, a mean absolute error of 0.97 years and a variance of 0.63 are observed over all ages and regardless of gender.
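The leave-one-out protocol can be sketched as follows. Each case is removed from the reference set in turn, its age is estimated from the remaining cases, and the errors are aggregated. The estimator used here is a hypothetical stand-in (the age of the nearest case on a single scalar feature), not the IRMA retrieval pipeline, and the four toy cases are invented.

```python
import math

def leave_one_out_errors(cases, estimator):
    """Return signed errors (estimate minus validated age) for every held-out case.

    cases: list of (features, validated_age).
    estimator: callable (query_features, reference_cases) -> estimated age.
    """
    errors = []
    for i, (features, validated_age) in enumerate(cases):
        reference = cases[:i] + cases[i + 1:]   # all cases except the held-out one
        errors.append(estimator(features, reference) - validated_age)
    return errors

def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_square_error(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def nearest_age(features, reference):
    """Hypothetical stand-in estimator: age of the case with the closest feature value."""
    return min(reference, key=lambda case: abs(case[0] - features))[1]

# Invented toy cases: (scalar feature, validated bone age in years).
cases = [(1.0, 5.0), (2.0, 6.0), (3.6, 9.0), (5.0, 9.5)]
errors = leave_one_out_errors(cases, nearest_age)
mae = mean_absolute_error(errors)
rmse = root_mean_square_error(errors)
print(errors, mae, rmse)
```

On the same set of errors the root mean square error is never smaller than the mean absolute error, so it helps to compute both when results are to be set against figures reported with a different metric.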

Summary and Perspective

Although the CBIR-CAD for bone age assessment is less accurate than the commercial application BoneXpert, for which a root mean square error of 0.61 years is reported [40], it provides the radiologist not only with the estimated age as a number but also with the images to compare with. Further developments in the similarity computation, the inclusion of carpal bones and a gender-specific retrieval are expected to improve the performance. From a CBIR perspective, relevance feedback on the results should be established, allowing the radiologist to express their level of agreement with the retrieved images and to restart the CBIR cycle (query refinement) to improve the age estimation.

3.1.2. Bone Fractures

The application described in this section is a case-based CBIR system for surgical planning of bone fractures developed at the HUG [43]. A live demonstration of this system was presented at the CADdemo@CARS 2010 in Geneva.