Sunday, June 2, 2019
Limitations Of Text Based Image Retrieval Psychology Essay
Sometimes a relevant image is left out owing to the absence of specific keywords. Often there is no relevant text surrounding the images or videos even though they themselves are relevant. In fact, there may be images or videos whose surrounding text has nothing to do with them. In these cases, the returned results may be irrelevant and have nothing in common with the required images and videos.

The other approach uses annotation of the images and videos and is often a manual task. The text-based technique first annotates the images and videos with text, and then uses text-based retrieval techniques to perform image and video retrieval. Annotation lets the user attach to the image the text (metadata) that is considered relevant. The text can be time, event, location, participants or whatever the user finds relevant.

2.6.1.1. Limitations of Text-Based Image Retrieval

Nevertheless, there exist two major difficulties, especially when the image collection is large, with hundreds of thousands of samples. One is the huge amount of human labor required for manual image and video annotation, which is very time-consuming. Text-based retrieval also cannot capture perceptually significant visual features such as color, shape and texture (Bimbo 1999). The other difficulty, which is more essential, comes from the rich content of images and the subjectivity of human perception. The annotation of images and videos depends completely on the annotator's interpretation (Enser et al. 1993), i.e. different people may perceive the same image differently, as shown in Figure 3. Perception subjectivity and annotation imprecision may cause unrecoverable mismatches in later retrieval processes.
To retrieve the required data, the user constructs a query consisting of keywords that describe the desired image or video. Text-based retrieval systems have benefited from traditionally successful information retrieval algorithms and techniques, but the limitations above remain.

Figure 3: Multiple interpretations of the same image of a park, such as Tree, Sky, Horse, People, Riding, Delightful Day, Outdoor

Critics of the text-based approach argue that accurate image annotation must be automated. Automatic annotation, however, is limited by its inability to extract semantic information from images and videos. Automatic annotation combined with pure text-based image retrieval will therefore be inadequate. The available metadata is mostly restricted to technical information about the image or video, such as its time, resolution and name.

Users may find it difficult to use text to query for some portion of the content of an image or video. Text-based retrieval techniques are strictly limited to searching the metadata that is tagged to the image or video. If the query text does not match a tag attached to the image or video, the data is not returned. This means that if a particular part of the image or video is of interest, it must be explicitly included in the metadata. If the desired object is not a main part of the image or video, it may not be described in the metadata at all and hence cannot be retrieved by a query describing that portion of the image or video.

One of the disadvantages of text-based image retrieval is that a word can have different meanings. This problem is best illustrated with an example: searching for images or videos of "jaguar" or "apple". The system cannot tell whether the user is looking for the Jaguar car or the jaguar animal, as shown in Figure 4.
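The two failure modes described above, a relevant image missed because a tag is absent and an ambiguous keyword matching unrelated images, can be illustrated with a minimal sketch. The annotation store, file names and tags below are hypothetical and invented purely for illustration; real systems use far richer indexing.

```python
# Hypothetical annotation store: image id -> set of manually assigned tags.
ANNOTATIONS = {
    "img_001.jpg": {"jaguar", "car", "road"},
    "img_002.jpg": {"jaguar", "animal", "forest"},
    # Assume a horse is visible in this image but the annotator never tagged it.
    "img_003.jpg": {"park", "tree", "people"},
}


def search_by_tags(query, annotations):
    """Return the ids of images whose tag set contains every query keyword."""
    keywords = {word.lower() for word in query.split()}
    return sorted(
        image_id for image_id, tags in annotations.items() if keywords <= tags
    )


# Polysemy: the car and the animal are both returned for the same keyword.
jaguar_hits = search_by_tags("jaguar", ANNOTATIONS)
# Missing tag: the park image is not returned although it shows a horse.
horse_hits = search_by_tags("horse", ANNOTATIONS)

print(jaguar_hits)
print(horse_hits)
```

Adding a second keyword (for example "jaguar car") narrows the result only when the annotator happened to supply that extra tag, which is exactly the dependence on annotation quality discussed in this section.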
The two senses of a word such as "jaguar" share the same name but carry entirely different semantics. Retrieval systems do not have reliable ways to separate the concepts. These problems are present even in systems with automatic synonym lists or thesaurus capabilities (Schank et al. 2004). Several text-based image retrieval services exist today; Google is the largest player but still faces the same problem.

Figure 4: Same name, different semantics

Attempts have been made to make the tags attached to images or videos more flexible by attaching a vast number of descriptive words. Thesaurus-based or knowledge-based annotation has gained much attention from researchers (Tring et al. 2000). Recent development in video retrieval has focused on models that combine several modalities for indexing and retrieval.

Considering these demands, researchers concluded that visual features play a crucial role in the effective retrieval of digital data. This led to the development of content-based image and video retrieval (Venters et al. 2000).

2.6.2. Content-Based Image Retrieval

The need to manage these images and locate target images in response to user queries has become a significant problem. One way to solve this problem would be to describe each image by keywords. The keyword-based approach has the bottleneck of manually annotating and classifying the images and videos, which is impractical for overwhelmingly large corpora. The subjectivity of human perception may also affect the performance of the retrieval system.

Current commercial image and video search engines retrieve data mainly based on keyword annotations or on other data attached to it, such as the file name and surrounding text. This leaves the actual image or video more or less ignored and has the following limitations.
First, the manual annotation of images requires significant effort and thus may not be practical for large image collections. Second, as the complexity of the images increases, capturing image content by text alone becomes increasingly difficult.

In seeking to overcome these limitations, content-based retrieval (CBR) was proposed in the early 1990s (Baeza-Yates et al. 1999). "Content-based" means that the technology makes direct use of the content of the image or video rather than relying on human annotation of metadata with keywords. CBR research endeavors to devise a retrieval system that exploits digital content in the retrieval process in a manner that is ultimately independent of manual work. CBR is an umbrella term for content-based multimedia retrieval (CBMR), content-based visual information retrieval (CBVIR), content-based image retrieval (CBIR), content-based video retrieval (CBVR) and content-based audio retrieval (CBAR). CBR may also be termed multimedia information retrieval (MIR).

Content-based retrieval extracts the features of the image or video itself and uses them for retrieval rather than the user-generated metadata. CBR uses primitive features of the image or video such as color, shape, texture and motion (Sharmin et al. 2002). Content-based systems index the images and videos automatically by applying different techniques to their visual contents.

For the computer, a video is merely a group of frames with a temporal dimension, where each frame is basically an image. The computer treats each image as a combination of pixels characterized by low-level color, shape and texture. CBR represents these features in the form of vectors called the descriptors of the image or video. CBR extracts these primitive features using automated techniques and then uses them for searching and retrieval.
The extraction of low-level visual features from images and videos has thus motivated much research in CBR (Veltkamp et al. 2000).

A typical CBIR system should be able to interpret the content of the images in a query and in a collection, compare the similarity between them, and rank the images in the collection according to their degree of relevance to the user's query (Tamura et al. 1984). Figure 5 shows a typical content-based retrieval system. Retrieval deals with the problem of finding the relevant data in a collection of images or videos according to the user's request. The request may be textual or take the form of a query by example. It is relatively easy to extract the low-level features from the images and videos in the query as well as in the collection and then compare them.

Figure 5: Typical architecture of content-based retrieval

The paramount objective of CBR is efficiency during image and video search and retrieval, thereby reducing the need for human intervention. Using CBR techniques, a computer can retrieve images and videos from a large corpus without human intervention. The extracted low-level features represent the image or video and are later used for similarity comparison with the other images or videos in the corpus. These features serve as a signature for the image or video. Images and videos are compared using different similarity comparison techniques, which calculate the dissimilarity between the components of one descriptor and those of another.

The CBR approach gives good results for queries such as "show me the images or videos with the color red" or "show me the image with blue above green".
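The compare-and-rank step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the descriptors are made-up three-component vectors, the file names are hypothetical, and plain Euclidean distance stands in for whichever dissimilarity measure a real system would use.

```python
import math


def euclidean_distance(a, b):
    """Dissimilarity between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_descriptor, collection):
    """Return (image_id, distance) pairs, most similar image first."""
    scored = [
        (euclidean_distance(query_descriptor, descriptor), image_id)
        for image_id, descriptor in collection.items()
    ]
    scored.sort()
    return [(image_id, distance) for distance, image_id in scored]


# Hypothetical collection: image id -> precomputed low-level descriptor.
COLLECTION = {
    "sunset.jpg": [0.8, 0.3, 0.1],
    "forest.jpg": [0.1, 0.7, 0.2],
    "beach.jpg": [0.7, 0.4, 0.2],
}

# Query by example: the query image is described by the same kind of vector.
ranking = rank_by_similarity([0.8, 0.3, 0.1], COLLECTION)
for image_id, distance in ranking:
    print(image_id, round(distance, 3))
```

Because the ranking depends only on numeric descriptors, it needs no human annotation, but it also knows nothing about what the images depict.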
Automated CBR techniques handle low-level color queries elegantly but fail to cope with high-level semantic queries such as "show me the images or videos of people in the park", "people on the beach" or "a car on the road". Queries of this type cannot be tackled successfully by CBR systems; they require more sophisticated techniques to extract the semantics they contain. Related work in CBR from the perspective of images can be found in the overview studies of Rui et al. 1999, Smeulders et al. 2000, Vasconcelos et al. 2001, Eakins 2002, Kherfi et al. 2004, Datta et al. 2005, Chen et al. 2004, Dunckley 2003, Santini 2001, Santini et al. 2001, Lew et al. 2001, and Bimbo et al. 1999.

CBIR has received considerable research interest in the last decade (Vasconcelos et al. 2001) and has evolved and matured into a distinct research field. CBIR mainly comprises two steps: feature extraction and similarity measurement. These key technical components of a CBIR system are introduced in the following sections.

2.6.2.1. Feature Extraction

Images are described by visual words just as text is defined by textual words. In fact, to a computer an image or a video frame is merely a rectangular grid of colored pixels, and it means nothing unless the computer is told how to interpret it. Image and video descriptors are intended for the purpose of image or video retrieval. Descriptors seek to capture the characteristics of an image or video in such a way that it is easy for the retrieval system to identify how similar two images or videos are according to the user's interest. A CBR system indexes images or videos using low-level features of the images and videos themselves, such as color (Catch et al. 1998, Smith et al. 1996a, Swain et al. 1991), texture (Manjunath et al. 1996, Sheikholeslami et al. 1994, Smith et al. 1996b), shape (Safar et al. 2000, Shahabi et al. 1999, Tao et al. 1999), and structure features (Pickering et al. 2003, Howarth et al. 2005). Color, shape and texture are the principal features of images. The visual contents of images and videos are then represented as a feature vector of floating-point numbers. For example, the color, texture and shape features extracted from an image form an N-dimensional feature vector, which can be written as $(n_1, n_2, n_3)$, where each $n_i$ is a vector of its own, and $n_1$ is the color, $n_2$ is the texture and $n_3$ is the shape. For video there is an additional vector $n_4$, where $n_4$ is the motion. In the following section, we introduce the visual features to give an impression of how images and video frames can be converted into a representation that the retrieval system can work with.

2.6.2.1.1. Color

A very common way to look at images is to analyze the colors they contain. Color is the most prominent visual feature in CBIR since it correlates well with human visual perception of objects in an image. A digital color image is represented as an array of pixels, where each pixel contains a tuple of three or four color components in numerical form. The abstract mathematical representation of colors that computers are able to use is known as the color model.

The similarity between images or videos is measured using color histogram values. The histogram captures the specific values of the pixels inside the image or video frame. Current color-based retrieval techniques divide the image into regions according to color proportion. The color-based technique does not depend on the size or orientation of an image.

Since the 1980s various color-based retrieval algorithms have been proposed (Smith et al. 1996c). The most basic form of color retrieval involves specifying color values that are then used for retrieval. Indeed, Google's image search and Picasa 3.0 also let the user search for images with a homogeneous color composition.
The most common representations of color information are the color histogram and the color moment. The color anglogram (Zhou X.S. et al. 2002), the correlogram (Huang J. et al. 1997) and the color co-occurrence matrix (CCM) (Shim S. et al. 2003) are some of the other feature representations for color.

Figure 6: Color-based image interpretation

2.6.2.1.1.1. Color Spaces

There are many color spaces designed for different systems and standards, but most of them can be converted into one another by a simple transformation.

i. RGB (Red-Green-Blue): Digital images are normally represented in the RGB color space, the most commonly used color space in computers. It is a device-dependent color space, used in CRT monitors.
ii. CMY (Cyan-Magenta-Yellow), CMYK (CMY-Black): A subtractive color space for printing that models the effect of colored ink on white paper. The black component is used to enhance the effect of black.
iii. HSB (Hue, Saturation, Brightness) or HSV (Hue, Saturation, Value): Used to model the properties of human perception. It is an additive color model. However, it is inconvenient for calculating color distance because hue is discontinuous at 360°.
iv. YIQ, YCbCr, YUV: Used in television broadcast standards. Y is the luminance component, kept for backward compatibility with monochrome television, and the other components carry chrominance. These spaces are also used in some image compression standards (e.g. JPEG) that process luminance and chrominance separately.

Figure 7: The additive color model HSV

2.6.2.1.1.2. Color Models

A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers, typically three or four values or color components. When this model is associated with a precise description of how the components are to be interpreted (viewing conditions, etc.), the resulting set of colors is called a color space. A color model is a formalized system for composing different colors from a set of primary colors.
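The remark that most color spaces can be converted by a simple transformation can be made concrete with Python's standard `colorsys` module, which provides RGB to HSV and RGB to YIQ conversions on components in the range 0 to 1. The CMY conversion below is the idealized complement (1 minus each RGB component), an assumption that ignores real inks and devices, and `hue_distance_degrees` is a hypothetical helper showing one way to work around the hue discontinuity at 360°.

```python
import colorsys


def rgb_to_cmy(r, g, b):
    """Idealized subtractive complement of an RGB triple in [0, 1]."""
    return (1.0 - r, 1.0 - g, 1.0 - b)


def hue_distance_degrees(h1, h2):
    """Shortest angular distance between two hues given in degrees."""
    diff = abs(h1 - h2) % 360.0
    return min(diff, 360.0 - diff)


red = (1.0, 0.0, 0.0)

# colorsys returns hue as a fraction of a full turn, so scale it to degrees.
h, s, v = colorsys.rgb_to_hsv(*red)
hue_degrees = h * 360.0

# Y is the luminance component; I and Q carry the chrominance.
y, i, q = colorsys.rgb_to_yiq(*red)

cmy = rgb_to_cmy(*red)

print("HSV:", hue_degrees, s, v)
print("YIQ:", y, i, q)
print("CMY:", cmy)
# Hues of 350 and 10 degrees are close on the color wheel even though
# their plain numeric difference is large.
print("hue distance:", hue_distance_degrees(350.0, 10.0))
```

A plain subtraction of hue values would report 340 for the last pair, which is why hue has to be treated as an angle when computing color distance in HSV.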
There are two types of color models, subtractive and additive.

An additive color model uses light emitted directly from a source. It typically uses the primary colors red, green and blue light to produce the other colors. Combining any two of these additive primary colors in equal amounts produces the additive secondary colors, which are the primary colors of the subtractive model: cyan, magenta and yellow. Combining all three RGB colors in equal intensities produces white, as shown in Figure 8(a).

Figure 8(a): RGB additive color for light-emitting computer monitors. Each colored light adds to the previous colored lights.

A subtractive color model describes the blending of paints, dyes and natural colorants to produce a full range of colors, each generated by subtracting (absorbing) some wavelengths of light and reflecting the others. Colors observed in subtractive models are due to reflected light; different wavelengths of light constitute different colors. The CMYK (Cyan-Magenta-Yellow-blacK) model is a subtractive model. Combining any two of the subtractive primary colors (cyan, magenta, yellow) produces a primary color of the additive model, i.e. a secondary color of the subtractive model (red, green or blue), and combining all three produces black, as shown in Figure 8(b).

Figure 8(b): CMYK subtractive colors for printers. Each color added to the first color blocks the reflection of color and thus subtracts color.

For some concepts, such as forest, sky, tree, grass or sea, color helps in achieving suitable results, and the color descriptor will help retrieve accurate results. For categories such as car, house or road, however, color descriptors cannot play a vital role. The color descriptor fails, for example, when the same car appears in different colors, as shown in Figure 9. For color-based retrieval, the two most frequently used representations are the color histogram and the color moment.
These representations are presented in the section below.

Figure 9: Same car with different color composition

a. Color Histogram

A histogram provides a summary of the distribution of a set of data. A color histogram provides a comprehensive overview of an image or video frame in terms of color: for a color image, it describes the distribution of intensity values of the colors found in the image. The histogram aims to capture the number of times each color appears in an image or video frame. Statistically, it relies on the property that images with similar contents should have a similar color distribution. One simple approach is to count the number of pixels of each color and plot the counts as a histogram. The histogram H of an image I is represented as

$H(I) = (p_1, p_2, \ldots, p_N)$

where $p_i$ is the percentage of the i-th color in the color space and N is the number of colors in the color space. To make the histogram invariant to scaling, its sum is normalized to 1. The percentage is proportional to the number of pixels in the image.

Figure 10: The color histogram

Most commercial CBR systems, such as Query-By-Image-Content, use the color histogram as one of the features for retrieval. Colors are normally grouped into bins, so that every occurrence of a color contributes to the overall score of the bin it belongs to. A bin captures a range of intensities of the primary colors, i.e. the quantity of red, green or blue; it does not record the individual color of each pixel. Histograms are usually normalized so that images of different sizes can be fairly compared. The color histogram is the most commonly and effectively used color feature in CBIR (Swain et al. 1991, Faloutsos et al. 1994, Stricker et al. 1995, Deselaers et al. 2008, Chakravarti et al. 2009 and Smeulders et al. 2000).
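A normalized color histogram of the kind defined above can be sketched as follows. The sketch assumes 8-bit RGB pixels given as a flat list of tuples and a uniform quantization into `bins_per_channel` bins per channel; the function name, the parameter and the bin layout are illustrative choices, not taken from any particular CBR system.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return the normalized histogram (p_1, ..., p_N) of 8-bit RGB pixels.

    N equals bins_per_channel ** 3 and the returned values sum to 1.
    """
    if not pixels:
        raise ValueError("the image must contain at least one pixel")
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 component to a bin index 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        counts[index] += 1
    total = len(pixels)
    return [count / total for count in counts]


# A tiny made-up image: three red pixels and one blue pixel.
small_image = [(255, 0, 0)] * 3 + [(0, 0, 255)]
# The same content at twice the size.
large_image = small_image * 2

small_hist = color_histogram(small_image, bins_per_channel=2)
large_hist = color_histogram(large_image, bins_per_channel=2)

print(small_hist)
print(small_hist == large_hist)
```

Because the counts are divided by the number of pixels, the two images yield the same histogram even though one has twice as many pixels, which is the size invariance that normalization is meant to provide.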
Retrieving images by color is widely used because the technique does not depend on image size or orientation.

The most common method of creating a color histogram is to split the range of RGB intensity values into equal-sized bins. For example, a 24-bit RGB color space contains 2^24 possible (RGB) values. Since this gives approximately 16.8 million bins, it is too large to be dealt with efficiently. We therefore need to quantize the feature space to a smaller number of bins in order to reduce memory size and processing time; for example, Stricker et al. 1995 and Swain et al. 1991 have proposed techniques for color space quantization. Once the bins are defined, the pixels of the image that fall into each bin are counted. A color histogram can describe the distribution of RGB intensity values for a whole image, known as a global color histogram, or for specific regions of an image, known as a local color histogram. For a local color histogram, the image is divided into several regions and a color histogram is created for each region.

A histogram refinement strategy for comparing images has been proposed by Pass (Pass et al. 1996). Histogram refinement splits the pixels in a given bucket into several classes based on some local property. Within a given bucket, only pixels in the same class are compared. The authors describe a split histogram called a color coherence vector (CCV), which partitions each histogram bucket based on spatial coherence. Han et al. 2002 proposed a new color histogram representation, called the fuzzy color histogram (FCH), which considers the color similarity of each pixel's color to all the histogram bins through a fuzzy-set membership function.
The fuzzy color histogram approach proves very fast and is further exploited in image indexing and retrieval applications.

The color histogram paradigm works on the assumption that all images or video frames with a similar color composition are similar (Jain et al. 1995). It retrieves all the data whose color composition is similar to that of the given query. This holds in some cases, but color composition alone cannot establish the identity of an image or of an object inside it.

b. Color Moment

The color moment approach was proposed by Stricker et al. 1995. It is a very compact representation of the color feature. The mathematical basis of this approach is that any color distribution can be characterized by its moments. Moreover, since most of the information is concentrated in the low-order moments, only the first moment and the second and third central moments (mean, variance and skewness) are extracted as the color feature representation. Color similarity can be measured by weighted Euclidean distance. Owing to its simplicity and good performance, the color histogram technique is nevertheless widely used in color-based retrieval systems.

Color is a property of human visual perception. Humans initially discriminate images or objects on the basis of color. Color can be extracted easily from digital data, and automated, effective functions are available for calculating the similarity between the query and the data corpus. Color features are used effectively for indexing and searching color images in a corpus.

Existing CBIR techniques can typically be categorized by the feature they use for retrieval, i.e. color, shape, texture or a combination of them. Color is an extensively used visual attribute that plays a vital role in retrieving similar images (Low et al. 1998). It has been observed that, even though color plays a crucial role in image retrieval, combining it with other visual attributes yields much better results (Hsu et al. 1995).
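The color moment descriptor and the weighted Euclidean comparison described above can be sketched as follows. This follows the essay's wording (mean, second central moment, third central moment per channel) rather than any specific published formulation; the third moment is left unnormalized, and the weights are hypothetical values chosen only for the example.

```python
import math


def channel_moments(values):
    """Mean, second central moment and third central moment of one channel."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, second, third]


def color_moments(pixels):
    """Nine-component descriptor: three moments for each of R, G and B."""
    descriptor = []
    for channel in range(3):
        descriptor.extend(channel_moments([pixel[channel] for pixel in pixels]))
    return descriptor


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two descriptors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Two tiny made-up images given as flat lists of RGB tuples.
image_a = [(0, 0, 0), (2, 0, 0)]
image_b = [(1, 0, 0), (1, 0, 0)]

moments_a = color_moments(image_a)
moments_b = color_moments(image_b)

# Hypothetical weights: equal importance for every moment.
weights = [1.0] * 9
distance = weighted_euclidean(moments_a, moments_b, weights)

print(moments_a)
print(moments_b)
print(distance)
```

The two example images have the same mean red value but different spread, so the descriptors differ only in the second moment of the red channel. Nine numbers per image is far more compact than a histogram with hundreds of bins, which is the main appeal of the representation.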
Combining features helps because two images with similar content may have different color compositions, and two images may have the same color composition without being similar, as shown in Figure 11. Hence something that looks similar in color is not necessarily semantically similar. The color composition of both images in Figure 11 is the same, but they depict entirely different semantic ideas. When both images are analyzed using color-based retrieval techniques, they are judged similar.
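The weakness described above can be reproduced with a toy example. The two small images below are invented for illustration: they contain exactly the same proportions of red and green pixels but arrange them differently, so a comparison based purely on color composition cannot tell them apart.

```python
from collections import Counter

RED = (255, 0, 0)
GREEN = (0, 255, 0)

# Hypothetical image A: a red block on the left, a green block on the right.
image_a = [
    [RED, RED, GREEN, GREEN],
    [RED, RED, GREEN, GREEN],
]
# Hypothetical image B: the same pixels arranged as a checkerboard.
image_b = [
    [RED, GREEN, RED, GREEN],
    [GREEN, RED, GREEN, RED],
]


def color_composition(image):
    """Fraction of pixels of each color, ignoring where the pixels are."""
    pixels = [pixel for row in image for pixel in row]
    counts = Counter(pixels)
    return {color: count / len(pixels) for color, count in counts.items()}


same_composition = color_composition(image_a) == color_composition(image_b)
same_layout = image_a == image_b

print("same color composition:", same_composition)
print("same pixel layout:", same_layout)
```

A global color descriptor treats the two images as identical, which is why color is usually combined with spatial, shape or texture information.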