Description of User Requirements and Interface for Program Mining


Chapter 5
Description of User Requirements and Interface for Program Mining


If we compare program mining with the process of problem solving, it is a process of finding out the corresponding service composition by analyzing the user requirements, so as to meet them. During this process, the first step, also the critical step, is how to make the program mining system understand the user requirements accurately, and describe them in a formalized way, which enables the computer system to analyze and process them, and to find the results meeting the requirements.

It is mentioned in Chapter 1 that user requirements are developing in intelligent, personalized, and comprehensive directions. Thus, it is imaginable that to the program mining system, the inputting of user requirements is diversified with great fuzziness and uncertainty.

Besides the great diversity in contents, there are multiple input modes: natural language input, icon navigation, voice input, and so on.

This diversity in the contents and input modes of user requirements imposes very high demands on the human–machine interface and the analysis ability of the program mining system. Looking at the development history of computers and networks, we find that the development of human–machine interface technology aims at providing computer users with an increasingly natural mode of interacting with computers. Similarly, the design of the human–machine interface in the program mining system aims at providing users with an input mode that is as natural as possible. Thus, it can correctly process the user requirements and describe them precisely with formalization methods, so as to provide proper objects and contents for the subsequent transactions of the program mining system, such as function decomposition, component retrieval, and composition.


In program mining, the analysis and acquisition of user requirements go through the following steps.

First, users deliver their requirements to the program mining system through various natural input methods, such as text, voice, and video.

After that, the corresponding module in the interface of the program mining system analyzes and retrieves the input information according to the input method. Generally speaking, it is relatively easy to analyze and retrieve information that is input through the command line or icon navigation, since the supporting tools of the operating system and Web sites are able to understand such input themselves. In contrast, when users input information in natural language, video, or audio, analyzing and retrieving the users' information becomes much more difficult. So far, the analysis and understanding of natural language remains an important problem not yet fully solved in the field of AI. Suppose that the user's input is a finite set of a given natural language. The understanding and analyzing module then extracts the keywords describing the user requirements, a step called Information Extraction (IE), and thereby determines the domain of the user requirements and the keyword table needed for searching for the related components.

Let us take text input in natural language as an example to introduce requirements analysis in program mining. Figure 5.1 illustrates the process of requirements analyzing and function decomposing when the input is text in a natural language. If the input is in another form, the only thing we need to do is replace the natural language understanding and analyzing module with the corresponding module.

As shown in Figure 5.1, the user requirements are input as a finite set of a natural language. Having received the input of the user requirements, the system first calls the natural language understanding and analyzing module and then extracts from the user requirements a keyword table consisting of the keywords and their hierarchical relationships. The keyword table serves as the foundation for further requirements analyzing and component retrieving.

User domain keywords are an important part of the user requirement keyword table. Confirming the user's domain limits the subsequent requirement analysis to a finite domain range, which improves understanding accuracy and cuts down ambiguity while also narrowing the range of component retrieval and raising computing efficiency.

The dictionary of the application domain is used in the process of confirming the keywords of the user's domain. The words stored in the application domain dictionary are the feature words, and their parasynonyms, that can represent the domain. The entries in this dictionary are extracted by the developers from the attributes of the related components during the process of building the component warehouse. In addition, to help the human–machine interface of program mining retrieve keywords and build up the keyword table, we define an attribute word table made up of related attributes of components.

Both the keyword table of the user requirements and attribute word table are described using XML.

XML is used to describe these word tables for two purposes. One is to make it convenient to organize the component warehouse and search for components, that is, to search the components in the component warehouse according to the attribute word table and keyword table described in XML. The other is to provide users with a friendly interface for interactively retrieving keywords and clearly depicting the required functions.

The keyword table of the user requirements only reflects the elementary functions of the user requirements. On many occasions, however, the required functions cannot be described clearly with the keywords in the table alone. In other words, we might not be able to obtain components that exactly match the required functions using keywords from the keyword table. In that case, the functions need to be subdivided according to these keywords so as to get a more detailed function-decomposing scheme and more corresponding keywords. In the decomposing scheme, the overall function of the user requirements is subdivided into many subfunctions, and each subfunction needs to be accomplished by at least one component. To ensure that at least one component can be found in the component warehouse to implement each subfunction, the system provides users in advance with the function sets of components to choose from. The final decomposition scheme of the user requirements is decided through the interaction between users and the system.

We refer to the keyword table obtained from the first round of analysis as the First Level Keyword Table, and to the keyword table obtained after function decomposing based on the First Level Keyword Table as the Second Level Keyword Table.



XML (eXtensible Markup Language) is a subset as well as a simplified version of SGML (Standard Generalized Markup Language).

XML is a meta-markup language and also a semantic/structured language, describing the structure and semantics of the document. For large-scale complex documents, XML is an ideal markup language, as it allows us to not only specify the words in the documents but also to specify the logic structures of the documents through Document Type Definition (DTD).

In the following parts, we will introduce the document grammar of XML first and then the Document Type Definition of XML.

  1. XML Grammar. An XML document has a text-based grammar format similar to that of HTML. It is made up of “elements”, which are logical tag fragments; each tag begins with a starting symbol (“<”) and ends with an ending symbol (“>”). An element may contain other elements, parsed character data (PCDATA), or simply be empty. Attributes can be used in the beginning tag, and the values of attributes may be put inside single or double quotation marks. An element is a logical unit of information, while the attributes are features of that information. Generally, an attribute may be bound to a predefined list of enumerated values, and a default value of the attribute may be specified. The character string is the most used attribute type.
    Consider an XML file, PMUsers.xml, which describes the user information of program mining. It is a typical XML document, which can be divided into two parts: the document preface and the document body. The first line of the document is the standard XML preface. This preface tells the processing program (interpreter and browser) that this is an XML document. The item “version” is mandatory, as it indicates the standard version of XML that the document uses, which is XML 1.0 here. The item “encoding” indicates the type of character encoding used in the XML document.
    The second line names the DTD file that the XML document needs to match. Here, the name of this DTD file is PMUser.dtd, in which the root element type is PMUsers.
    The third line is a comment on the XML document. The comment format is much the same as that of HTML, except that all beginning tags must match their end tags in XML. Comments can occur at any place in the XML file, beginning with <!-- and ending with -->.
    The fourth line defines the XML display format. In XML, the content and display formats are separated. This line indicates that the display format of the content is defined by the cascading style sheet (CSS) “SimpleSample.css.”
    The document body is delimited by the beginning tag <PMUsers> and the ending tag </PMUsers>; the element “PMUsers” is called the root element of the XML document. There is one and only one root element in an XML document, which contains all other elements and texts in the document. The definition format of other elements is the same as that of the root element. Elements must not overlap; in addition, every element must be closed with an end tag, and element names are case sensitive.
    The root element <PMUsers> represents a list of all users in the program mining system. Under it, we use the element <PMUser> to define the detailed information of each user, which consists of the elements <name>, <age>, <sex>, <address>, and so on, corresponding respectively to the user's name, age, sex, address, and so on.
  2. DTD Grammar. An XML document, according to whether it satisfies the XML grammar regulation and/or the specified logic structure, may be classified into one of the following three types:
    1. Invalid Document—not abiding by the grammar rules defined in XML regulation;
    2. Well Formed Document—abiding by the XML grammar, but not abiding by the specified logic structure;
    3. Valid Document—abiding by the XML grammar as well as the specified logic structure.
    The Document Type Definitions (DTDs) are used to define the logical structures of XML documents and so decide how XML texts are organized. XML processors may use a DTD to verify the validity of an XML document's logical structure at run time.
    The DTD uses a grammar different from that of XML to declare the element types and attributes of XML documents. According to its location relative to the XML document, a DTD is classified into one of two kinds: an exterior DTD (the XML document calls a DTD stored outside the document) and an interior DTD (the DTD is defined directly within the XML document). The exterior DTD has the advantage that it allows multiple XML documents to share one unified and well-defined DTD.
    A DTD consists of tag declarations, parameter entity reference, and beginning and ending tags. The tag declaration of DTD includes element type declaration, element attribute declaration, entity declaration, comment declaration, and so on. The following is a brief introduction to element type declaration and element attribute declaration.
    1. Element Type Declaration. Element type declaration is constituted of the keyword ELEMENT, element name and contents. The contents of the element can be texts, other elements, or EMPTY.
    2. Element Attribute Declaration. Element attribute declaration is an important part of DTD, which makes the contents of XML documents much richer. An element may have no attributes or multiple attributes. The element attribute declaration uses the keyword ATTLIST and includes three parts: Name, Type, and Feature.
      Among these, Name shows the name of the attribute; Type represents the type of the attribute, for example, CDATA means that the attribute only contains character data; Feature indicates other natures of the attribute:
      • #REQUIRED: All instances of this element must have the value of this attribute.
      • #IMPLIED: The attribute is optional; it is ignored in instances of this element that assign it no value.
      • #FIXED: The value of this attribute is a fixed one.
      • Defaultvalue: The attribute value will be this character string if there is no assigned attribute value.
      For instance, in PMUsers.xml, the element PMUser has one attribute named number. Its type is CDATA, which indicates that the attribute is character data, and #REQUIRED means that number cannot be omitted.
    Consider the DTD file PMusers.dtd, the DTD file of the previous XML document PMUsers.xml. The root element type PMUsers contains the subelement type PMUser, where the “*” means that the subelement type PMUser may be absent or appear many times. Other marks also indicate the number of times an element appears: “?” means no appearance or exactly one; “+” means one or many; no mark means exactly one occurrence. For instance, FirstName and LastName, the subelement types of the element type “name”, each appear exactly once. The element types FirstName, LastName, age, sex, city, street, and zipcode are defined as #PCDATA, meaning that they can contain text but not markup. The element “hobby” has an attribute, type. To depict XML documents more clearly, this book uses graphic-based methods to define XML document structures.
    Taking the document structure of PMUsers.xml as an example, as shown in Figure 5.2, the root element <PMUsers> consists of zero or more <PMUser> elements. Each <PMUser> consists of the elements <name>, <age>, <sex>, and <address>. The element <name> is further defined by the subelements <FirstName> and <LastName>, and the element <address> by <city>, <street>, and <zipcode>.
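The structure described for PMUsers.xml can be illustrated with a small sketch. The document text below is a hypothetical reconstruction: the element names, the number attribute, the DTD name, and the style sheet name come from the description above, while every data value and the comment text are invented sample data. Python's standard xml.etree.ElementTree only checks that the document is well formed; it does not validate against a DTD, so the structure check here is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of PMUsers.xml; all values are invented samples.
PMUSERS_XML = """<?xml version="1.0"?>
<!DOCTYPE PMUsers SYSTEM "PMUser.dtd">
<!-- Sample user list for program mining -->
<?xml-stylesheet type="text/css" href="SimpleSample.css"?>
<PMUsers>
  <PMUser number="001">
    <name><FirstName>Ada</FirstName><LastName>Smith</LastName></name>
    <age>30</age>
    <sex>female</sex>
    <address>
      <city>Springfield</city><street>1 Main Street</street><zipcode>00000</zipcode>
    </address>
  </PMUser>
</PMUsers>
"""

# Expected child elements, in order, for each non-leaf element (per Figure 5.2).
EXPECTED_CHILDREN = {
    "PMUser": ["name", "age", "sex", "address"],
    "name": ["FirstName", "LastName"],
    "address": ["city", "street", "zipcode"],
}


def children_ok(element):
    """Recursively compare an element's children with the expected structure."""
    expected = EXPECTED_CHILDREN.get(element.tag)
    if expected is None:
        return True  # leaf element that only holds text (#PCDATA)
    if [child.tag for child in element] != expected:
        return False
    return all(children_ok(child) for child in element)


def matches_structure(root):
    """Check root <PMUsers> with zero or more <PMUser number="..."> children."""
    if root.tag != "PMUsers":
        return False
    return all(
        user.tag == "PMUser" and "number" in user.attrib and children_ok(user)
        for user in root
    )


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(PMUSERS_XML)
```

Here is_well_formed separates documents that break the XML grammar from well formed ones, and matches_structure plays the role a DTD-aware validator would play in telling well formed documents from valid ones.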

In program mining, the communication between various parts is implemented with XML and the general interfaces of the architecture are also XML-based. Therefore, in the process of requirement acquisition and decomposition, we use XML to describe the application domain word table, the user requirements keyword table, and related description information. In this section, we give a brief introduction to the XML document structures that express the First Level Keyword Table and Second Level Keyword Table.


After the user requirements described in natural language are analyzed and processed, the information is extracted as plain text character strings, and the results are placed inside XML markers labeled with the assigned message type, which forms the keyword document of the user requirements. The keyword document is made up of a group of keyword entities that describe the user requirements and the relationships between the entities.

Different keywords describe the user requirements from different angles, illustrating the services and function sets that the user needs. Figure 5.3 shows the content of the keyword document produced when the human–machine interface of program mining performs first-level understanding and keyword extraction on a user's service requirement.

In Figure 5.3, each field represents the content as follows:

  • RequestID. The unique identifier the system allocates to the user, used to identify the user who puts requirements into the program mining system.
  • Name. Name of the service required by the users, which may not be necessarily identical with the service name as defined by the systems.
  • Classification. Classifications of user requirements, such as domain, function, and so on.
  • Description. Detailed description of the user requirements: a pointer to the area where the original text of the user requirement is stored.
  • KeywordList. Keyword list retrieved from the user requirements description.
  • Keyword. Keywords in the keyword table, corresponding to subfunctions. If the keyword needs to be decomposed further, then it corresponds to a certain field in the Second Level Keyword document.
  • Relations. Function sequence table of subfunctions corresponding to the keywords in the keyword table.
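As a rough illustration of these fields, the following sketch assembles a First Level Keyword document with Python's standard xml.etree.ElementTree. The field names follow the list above; the root element name, the shape of the Relation entries, and all sample values are assumptions made for this example, since the exact layout of Figure 5.3 is not reproduced in the text.

```python
import xml.etree.ElementTree as ET


def build_first_level_document(request_id, name, classification,
                               description, keywords, relations):
    """Build a First Level Keyword document as an ElementTree element.

    relations is a list of (earlier, later) keyword pairs; this pair
    representation is a hypothetical choice, not taken from Figure 5.3.
    """
    root = ET.Element("FirstLevelKeywordDocument")  # hypothetical root name
    ET.SubElement(root, "RequestID").text = request_id
    ET.SubElement(root, "Name").text = name
    ET.SubElement(root, "Classification").text = classification
    ET.SubElement(root, "Description").text = description

    keyword_list = ET.SubElement(root, "KeywordList")
    for keyword in keywords:
        ET.SubElement(keyword_list, "Keyword").text = keyword

    relations_element = ET.SubElement(root, "Relations")
    for earlier, later in relations:
        ET.SubElement(relations_element, "Relation",
                      {"from": earlier, "to": later})
    return root


# Sample values are invented for illustration.
doc = build_first_level_document(
    request_id="R001",
    name="multimedia player",
    classification="Multimedia",
    description="I need a multimedia player.",
    keywords=["multimedia", "player"],
    relations=[],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

The resulting xml_text is a single-line XML string that a later stage could parse again to drive component retrieval.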

During the process of retrieving the First Level Keyword Table, the most crucial question is how to identify the keywords. The identification of users' keywords is related to the following two factors. The first one is the domain dictionary and attribute word table obtained from component attributes. According to the dictionary and the table, the human–machine interface of the program mining system can use various natural language analyzing methods to extract the keywords that are similar to those in the dictionary and the table. The second one is the user who submits the requirements. When users find that their anticipated keywords do not appear in the keyword table produced by the system, they can input the required keywords interactively.


The result of the user requirements analysis is to subdivide the functions the user anticipates into many relatively independent function modules, retrieve the corresponding keywords and restriction conditions from these function sets, and use these restriction conditions (such as precondition, postcondition, and invariant) to describe the calling and dependency relations between modules. Generally, the user requirements can be subdivided into several sets of subfunction modules using different strategies. Each set corresponds to a decomposing strategy, and the calling relations and the granularity of the submodules differ between strategies. Figure 5.4 shows the Second Level Keyword document structure produced by the function decomposing module.

The content of the fields in Figure 5.4 is as follows:

  • Subfunction. Comes from the First Level Keyword Document; here it represents the subfunctions associated with a certain keyword.
  • ID. The unique identifier of the subfunction module.
  • RequestID. The ID of the user to whom the subfunction belongs.
  • KeywordList. The list of the keywords extracted from the subfunctions.
  • Relations. The execution relation between different subfunctions.
  • Signature->InputMessage. The subfunction's input message.
  • Signature->OutputMessage. The subfunction's output message.
  • Restriction->PreCondition. PreCondition refers to the previous subfunction in the execution order, identified by its subfunction ID.
  • Restriction->PostCondition. PostCondition refers to the next subfunction in the execution order, identified by its subfunction ID.

Similar to the retrieval of the first level keywords, the extraction of a subfunction's keywords is also related to the domain dictionary and attribute word table. The system can extract the keywords from the submitted inputs using various natural language analyzing methods, or interactively, by letting the user input the keywords.

In addition, if the extracted keywords still cannot satisfy the demands of component retrieval, the user may subdivide the subfunction and extract further keywords. The retrieval process and document structure are the same as those on the second level. Here, it is necessary to point out that each keyword must correspond to at least one subfunction, while one subfunction may correspond to multiple keywords. Keywords without subfunctions are empty keywords.
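The rule that every keyword must correspond to at least one subfunction can be checked mechanically. The sketch below models a Second Level subfunction record with the fields listed for Figure 5.4 and reports empty keywords; the Python attribute names, the use of lists for the restrictions, and the sample records are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Subfunction:
    """One subfunction module of the Second Level Keyword document."""
    id: str                       # ID: unique identifier of the module
    request_id: str               # RequestID: user the subfunction belongs to
    keywords: list = field(default_factory=list)        # KeywordList
    input_message: str = ""       # Signature->InputMessage
    output_message: str = ""      # Signature->OutputMessage
    pre_condition: list = field(default_factory=list)   # IDs of previous subfunctions
    post_condition: list = field(default_factory=list)  # IDs of next subfunctions


def empty_keywords(keywords, subfunctions):
    """Return the keywords that no subfunction covers (empty keywords)."""
    covered = {kw for sf in subfunctions for kw in sf.keywords}
    return [kw for kw in keywords if kw not in covered]


# Invented sample data: two subfunctions, three keywords.
subfunctions = [
    Subfunction(id="SF01", request_id="R001", keywords=["player"],
                post_condition=["SF02"]),
    Subfunction(id="SF02", request_id="R001", keywords=["mp3"],
                pre_condition=["SF01"]),
]
uncovered = empty_keywords(["player", "mp3", "avi"], subfunctions)
```

In this sample, "avi" is reported as an empty keyword, which signals that a further round of decomposition or user interaction is still needed for it.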


As shown in the section above, the key factor for the program mining system to successfully extract components satisfying user requirements is whether it is able to convert user requirements into proper keywords. How does one extract keywords from the requirements inputted by users? In this section, we will take the natural language input as an example to introduce several commonly used processing methods of natural language.

In the semantic understanding of natural language, intelligent word segmentation is a basic step, which extracts the core words composing sentences for semantic analysis. The human–machine interface of program mining may use intelligent segmentation technologies to extract core words from the input natural language texts and compare these core words with the words in the domain dictionary and attribute word table to acquire keywords.

Due to the specific features of the language, the segmentation of English words is relatively easy. Here, we mainly discuss the segmentation methods of Chinese words.

At present, there are many Automatic Segmentation methods for Chinese words. Among these, the most frequently used are: Positive Direction Maximum Matching (PDMM), Converse Direction Maximum Matching, Bi-direction Maximum Matching, Segmentation Sign Tagging, Best Matching, and Mechanic Segmentation plus Ambiguity Correction. All these methods can be used in the human–machine interface of the program mining system. Considering the matching with domain dictionary and attribute word table, we adopt the word-table-based segmentation—Positive Direction Maximum Matching.

The flow chart of PDMM is shown in Figure 5.5. The segmentation process with PDMM is as follows:

Step 1: Let String be the text to be segmented and MaxLen the largest word length in the domain dictionary and attribute word table. Initially let LEN = MaxLen, and take the character string “str” of length LEN from String in sequence;

Step 2: If “str” is empty, the algorithm ends; otherwise, enter the next step;

Step 3: Match the “str” with the words in the domain dictionary and attribute word table;

Step 4: If the matching succeeds, the “str” is a keyword. Then, put this keyword into the keyword table and assign the values according to the XML text structure shown in Figure 5.3. The pointer that points to the text to be segmented moves forward for LEN Chinese characters; return to Step 1;

Step 5: If the matching fails and LEN > 1, decrease LEN by 1, take a character string “str” of length LEN from the text to be segmented, and return to Step 2; if LEN = 1, “str” is a single-character word of length 1, so put it into the word table for acknowledgment, move the pointer that points to the text to be segmented forward by one Chinese character, and return to Step 1.


In Step 1, if the length of the remaining text to be segmented, StringLen, is less than MaxLen, then take the whole remaining text as “str”.

In Step 5, the system puts the unmatched words into the word table for acknowledgment, to be identified by the user. After the automatic segmentation ends, the system returns the word table for acknowledgment to the user for interactive segmentation. If the user specifies one of the words in the word table for acknowledgment as a keyword, then the system puts the keyword into the keyword table and assigns values to each item according to the XML text structure in Figure 5.3. At the same time, the word is put into the domain dictionary and attribute word table after the user's acknowledgment.
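The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the domain dictionary and attribute word table are merged into one set named vocabulary, the function name is hypothetical, and the interactive acknowledgment of unmatched words is reduced to returning them in a second list.

```python
def pdmm_segment(text, vocabulary):
    """Positive Direction Maximum Matching over a word table.

    Returns (keywords, unmatched): words found in the vocabulary, and the
    single characters that matched nothing (the word table for
    acknowledgment in the steps above).
    """
    vocabulary = set(vocabulary)
    # MaxLen: the longest word in the dictionary and attribute word table.
    max_len = max((len(word) for word in vocabulary), default=1)

    keywords, unmatched = [], []
    pos = 0  # pointer into the text to be segmented
    while pos < len(text):
        # Step 1: start with LEN = MaxLen, or the remaining length if shorter.
        length = min(max_len, len(text) - pos)
        # Steps 3 and 5: shrink the candidate until it matches or LEN = 1.
        while length > 1 and text[pos:pos + length] not in vocabulary:
            length -= 1
        candidate = text[pos:pos + length]
        if candidate in vocabulary:
            keywords.append(candidate)   # Step 4: a keyword was found
        else:
            unmatched.append(candidate)  # Step 5: single character, LEN = 1
        pos += length                    # move the pointer forward
    return keywords, unmatched


# Sample input: "I need a multimedia player" in Chinese.
# Vocabulary: 需要 (need), 多媒体 (multimedia), 播放器 (player).
keywords, unmatched = pdmm_segment("我需要多媒体播放器", {"需要", "多媒体", "播放器"})
```

With this sample vocabulary, the three dictionary words are returned as keywords and the character 我 lands in the list for user acknowledgment. Because matching always tries the longest candidate first, adding 多媒体播放器 to the vocabulary would make it win over the two shorter words.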


When the core words extracted from natural language do not match the elements in the domain dictionary and attribute word table, and the user tasks require finer subdivision, the system goes on with function decomposition of the service requests. This function-decomposing process may be carried out automatically or interactively with the users. Generally, system-supported interactive decomposition is easier to implement than automatic decomposition, and the accuracy of the interactive method is also higher than that of the automatic one.

The process of function decomposition of the user requirements can be briefly introduced as follows:

First, using the domain dictionary, the system matches the keywords that need further decomposing, finds the dictionary entries identical or similar to those keywords, and then determines the corresponding domains of the keywords interactively with users. Second, according to the determined domains and the system-given attribute word table, the system decomposes the subfunctions and acquires the keywords, again through interaction between the system and users. After the keyword domains corresponding to the functions for decomposition are specified, the system records and generates the subfunction sequence list after decomposition, so that it can be used in composing the related components after retrieval.

The function of each subfunction module obtained after decomposing should be implemented by a component. If the module still cannot find the corresponding component, the user can require the system to decompose the function further or require the related intelligent agent to find the new component on the Internet.


As mentioned above, the user's keyword table described with XML can be generated by applying intelligent automatic segmentation, namely PDMM against the domain dictionary and attribute word table. To speed up requirements decomposing and improve the accuracy of requirements understanding, the first thing to do is to determine the application domains the user requirements belong to.

The specific method of determining the domain is to compute the similarity between the keyword table of the user requirements and the word table of each application domain in the application domain dictionary. We assume that the domain whose entries have the highest similarity is the domain of the user requirements.

How does one determine the similarity between these entries? We can either use the research results from artificial intelligence, such as the LSI algorithm, or employ the interaction between users and the system to determine the keyword entries.
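As a simple stand-in for the similarity computation, the sketch below scores each domain with the Jaccard coefficient between the user's keyword table and the domain's word table. Jaccard is a swapped-in technique chosen for brevity, not the LSI algorithm mentioned above, and the function names and sample dictionary are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_domain(keywords, domain_dictionary):
    """Return (domain with the highest similarity, all scores).

    domain_dictionary maps a domain name to its word table. Returns
    (None, {}) when the dictionary is empty.
    """
    scores = {
        domain: jaccard(keywords, words)
        for domain, words in domain_dictionary.items()
    }
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Invented sample dictionary with two application domains.
domain_dictionary = {
    "Multimedia": {"multimedia", "player", "mp3", "avi", "mpeg"},
    "Finance": {"bank", "account", "payment"},
}
domain, scores = best_domain({"multimedia", "player"}, domain_dictionary)
```

When two domains score equally, or the best score is low, the interactive route described above remains the fallback: the system can present the top candidates and let the user choose.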


With the domain of the user requirements determined as the foundation, the system can decompose the requirements and determine the related subfunctions, and the keywords corresponding to them, by retrieving related entries in the attribute word table and interacting with the user.

The process of determining subfunctions is demonstrated in Figure 5.6. Having determined the user requirements domain, the system extracts the entries related to the domains, inserts them into a candidate keyword table and presents the list to users through the human–machine interface. By interacting with the system, users choose one keyword from the candidate keyword list and acquire the function description of the keyword from the attribute word table. Then, users check whether the function description is consistent with the function decomposing; if yes, the system adds the keyword into the corresponding function set after confirmation from the users; if not, the user chooses the next keyword to continue the function decomposing. After the users' function decomposing is completed, the final function decomposition scheme is generated.
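The interaction loop of Figure 5.6 can be sketched as follows. The layout of the attribute word table (a list of records with domain, keyword, and description) and the confirm callback that stands in for the user's decision at the human–machine interface are assumptions made for this example.

```python
def determine_subfunctions(domain, attribute_word_table, confirm):
    """Walk the candidate keyword list and keep what the user confirms.

    confirm(keyword, description) is a callback returning True when the
    function description is consistent with the intended decomposition.
    """
    # Extract the entries related to the domain: the candidate keyword table.
    candidates = [e for e in attribute_word_table if e["domain"] == domain]

    function_set = []
    for entry in candidates:
        # Show the keyword with its function description and ask the user.
        if confirm(entry["keyword"], entry["description"]):
            function_set.append(entry["keyword"])
    return function_set  # the final function decomposition scheme


# Invented sample table; descriptions are placeholders.
attribute_word_table = [
    {"domain": "Multimedia", "keyword": "mp3", "description": "MP3 audio"},
    {"domain": "Multimedia", "keyword": "avi", "description": "AVI video"},
    {"domain": "Multimedia", "keyword": "wav", "description": "WAV audio"},
    {"domain": "Finance", "keyword": "payment", "description": "payments"},
]
wanted = {"mp3", "avi"}  # simulated user choices
scheme = determine_subfunctions(
    "Multimedia", attribute_word_table, lambda kw, desc: kw in wanted
)
```

In a real interface the callback would display the description and wait for the user's answer; here a set of simulated choices replaces that step.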

Example 5.1 An example of function decomposition

Suppose the input of the user is “multimedia player”; the first round of retrieval yields the keyword table: {multimedia player}.

The keyword “Multimedia” corresponds to many entries in the domain dictionary and attribute word table; in other words, the subfunction module of the keyword is uncertain and it needs further decomposition. The system will interact with the user to decompose the function of the service requirement. The process is as follows:

First, the system confirms the domain of the user requirements of “Multimedia” by interacting with the user according to the retrieved keywords.

Second, the system retrieves all entries under the “Multimedia” domain and presents them to the user via the human–machine interface. In the human–machine interface of program mining, the system extracts the entries related to the domain from the attribute word table, performs a function composition according to the function relation between the entries, and then provides a candidate keyword list for the user, where each keyword corresponds to a subfunction, which is shown in Figure 5.7.

Then, the user selects the keywords one by one from the candidate keyword list, compares them, and finds the subfunction modules corresponding to the keyword “Multimedia.” Here, the human–machine interface of program mining ought to provide enough function information corresponding to the keywords, as well as intelligent interactive ways, to help the user determine the function decomposing scheme. For example, when the user chooses a keyword, the system should automatically retrieve the corresponding function description from the attribute word table.

Finally, by interaction, the user determines that the function decomposing scheme corresponding to the keyword “Multimedia” is “mp3”, “avi”, “mpeg”; then, the keyword list after function decomposing is:

{mp3, avi, Mpeg, Multimedia, Player}.


After the subfunction subsets and related keywords are determined, another problem of function decomposition is to determine the sequence of the subfunctions and the relations between them. Only after the sequence of these subfunctions has been determined is it possible to compose the retrieved components according to the sequence and complete the service function requested by users.

The calling sequence and relations between subfunction sets or subfunctions cannot be specified arbitrarily. They must meet the overall requirement of the service sets input by the user and satisfy the inherent restriction relations between the function subsets. In addition, the calling sequence and relations must respect the restriction relations between the corresponding components; otherwise, component retrieving and composing will fail.


Suppose that the user requirement input in natural language is: “I need a multimedia player.” Then, the procedure of user requirement analyzing is as follows:

  1. Retrieving the keyword table of user requirements. The system uses the PDMM to segment the user's input. After interaction with the user, the system gets the following keyword table:
            User Requirement Keyword Table: {multimedia, player}
  2. Determining the user requirement domain. According to the user requirement keywords, the system interacts with the user and determines the service domain to be the “Multimedia” domain.
  3. Decomposing the user requirement. The system extracts the entries related to “Multimedia” from the attribute word table and presents them to the user through the human–machine interface. After interaction, the system determines the subfunction module corresponding to the keyword “Multimedia.”
    Finally, with interaction, the user determines that the final function decomposing schemes corresponding to the keyword “Multimedia” are “mp3”, “avi”, “mpeg”; then, the keyword table after function decomposing is as follows:
            Keyword List after Function Decomposing:
            {mp3, avi, Mpeg, Multimedia, Player}
    Here, each keyword corresponds to one subfunction module. For convenience of expression, we assign each subfunction module an ID number. For example, the function of the subfunction module with ID number SF01 is to complete the multimedia player interface function and the module with ID number SF02 is to complete the MP3 encoding function.
  4. Determining the sequence table of subfunction modules. The keyword table after function decomposing, whose keywords correspond to subfunction modules, can be used to retrieve components. However, to compose the components, the function sequence table between subfunction modules must also be determined.

In this example, the calling relations between several subfunction modules are shown in Figure 5.8.

As shown in Figure 5.8, there are four subfunction modules derived from user requirement decomposition; among these, the subfunction modules AVI, MPEG, and MP3 correspond to decoder components for different media formats, while the “player” corresponds to the multimedia player interface component that the user requires. They form calling relations. Correspondingly, the sequence of subfunction modules that the four keywords correspond to is as follows:

PreCondition(“avi”) = “player”;
PreCondition(“mpeg”) = “player”;
PreCondition(“mp3”) = “player”;
PostCondition(“player”) = “avi”;
PostCondition(“player”) = “mpeg”;
PostCondition(“player”) = “mp3”;
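
One possible in-memory form of this sequence table is sketched below. It assumes that the calling relations contain no cycles and that each module has at most one caller, as in this example; the function names `invert` and `call_order` are hypothetical. Because “player” has three post-conditions, the post-condition side is stored as a list per module.

```python
from collections import defaultdict

# PreCondition entries from the example: each decoder is called by the player.
pre_condition = {"avi": "player", "mpeg": "player", "mp3": "player"}


def invert(pre):
    """Derive PostCondition (caller -> callees) from PreCondition (callee -> caller)."""
    post = defaultdict(list)
    for callee, caller in pre.items():
        post[caller].append(callee)
    return dict(post)


def call_order(pre):
    """List modules so that every caller appears before the modules it calls.

    Assumes the calling relations are acyclic.
    """
    order = []

    def visit(module):
        if module in order:
            return
        caller = pre.get(module)
        if caller is not None:
            visit(caller)
        order.append(module)

    for module in list(pre) + list(pre.values()):
        visit(module)
    return order


post_condition = invert(pre_condition)
print(post_condition)
print(call_order(pre_condition))
```

Storing only the pre-conditions and deriving the post-conditions keeps the two views consistent, which matters when the table is later used to compose the retrieved components.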

According to the analysis above, we get the final XML document of the user requirements as follows:


The human–machine interface of the program mining system is mainly used for interaction with users. It receives users' inputs, delivers them to the processing modules, and presents the optional functions and function decomposition schemes to the users.

The human–machine interface of the program mining system mainly serves two types of users. One is the developers of Internet services, who use the program mining system to simplify service development. The other is the end users, who use services through the interface and, when the services do not meet their requirements, input their service requirements through the interface again.

In the following, we first introduce the design of the end-user-oriented interface modules, the user behavior memory and interest module, and the visualized method for requirement analysis; on this basis, we then introduce the design of the developer-oriented interface in program mining.


Figure 5.9 is the function module diagram of the human–machine interface in a program mining system.

As shown in the figure, the human–machine interface of the program mining system is divided into the following modules.

  1. User requirements input module. Considering the diversity of the contents and input methods of user requirements in program mining, the human–machine interface provides different input modules, such as the natural language processing module, icon navigation processing module, video processing module, and so on, to handle the different service requests from users. With the help of the requirement retrieving module, the users input their own service requirements into the program mining system with various input methods, including natural language input, icon navigation, voice input, and so on.
  2. Keyword extracting module. The user requirement input module receives the user's service requests and submits them to the keyword extracting module. The keyword extracting module first extracts the useful requirement information from the user's service requests, mainly the keyword table of the user requirements, so that the system can analyze the concrete contents and meaning of the user requirements. Section 5.3 has already introduced the extraction of user requirement keywords, using natural language input as the example. For the icon input method, keyword extraction is relatively simple: it only needs to transform the icon information the user selected into text that the system can recognize.
  3. User domain determination module. The main task of the user domain determination module is to determine the domain of the user requirements according to the keyword table extracted by the keyword extracting module. On the one hand, this restricts the subsequent requirement analysis to a limited domain, which improves the accuracy of understanding and reduces ambiguity; on the other hand, it narrows the retrieval range in the component warehouse and thus raises search efficiency.
    In the keyword extracting module and user domain determination module, the application domain dictionary will be used. Words stored in the application domain dictionary are the feature words and their parasynonyms that can represent various fields. The entries in the application domain dictionary are retrieved by the developers from the attributes of the related components when building the component warehouse.
  4. Function decomposing module. According to the keywords extracted from users' requirements, the module does further function decomposing within the determined domain range. After the determination of the corresponding subfunctions of the function to be decomposed, the function decomposing module will generate the keyword table and the subfunction sequence table. The keyword table is used for retrieving components, while the subfunction sequence table is used to compose the retrieved components.

After this, the human–machine interface of the program mining system will submit the keywords to the component retrieving module to extract the component from the Local Component Resource Warehouse (LCRW), and the subfunction sequence table to the component composing module to compose the components found.
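
As an illustration of how the user domain determination module might use the application domain dictionary, the sketch below scores each domain by the number of requirement keywords found among its feature words. The dictionary contents, the scoring rule, and the name `rank_domains` are assumptions made for this example; the ranked list would then be presented to the user for confirmation.

```python
# Hypothetical application domain dictionary: domain -> feature words and near-synonyms.
DOMAIN_DICTIONARY = {
    "Multimedia": {"multimedia", "player", "audio", "video", "mp3"},
    "Office": {"document", "editor", "spreadsheet", "word"},
}


def rank_domains(keywords):
    """Score each domain by how many requirement keywords it contains."""
    scores = {}
    for domain, words in DOMAIN_DICTIONARY.items():
        hits = sum(1 for kw in keywords if kw.lower() in words)
        if hits:
            scores[domain] = hits
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


candidates = rank_domains(["multimedia", "player"])
print(candidates)
```

Restricting later steps to the top-ranked domain is what narrows both the requirement analysis and the search range in the component warehouse.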


In program mining, to better satisfy a single user's specific requirements, the system acquires and memorizes information such as the user's frequently used services and usage habits, sets up a user profile configuration file for each user, and stores it in the knowledge repository as the foundation for identifying the user and supplying personalized services. The information needed to establish the user profile may be provided by the user at first login. It may also be collected by an agent that automatically records the user's selections and use of services, so as to gradually build and improve the user profile.

By establishing user profiles, the system can provide past cases of user requirement input and of service selection, access, and implementation. When users put forward new requirements, the system can also provide help during the interaction according to the usage habits and interests recorded in the user's profile. Thus, personalized user requirements can be better satisfied.
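
A user profile of this kind could be represented, in a minimal form, as a record of past requirements together with a count of selected services. The class name `UserProfile`, its fields, and its methods are hypothetical and are meant only to show the idea of recording selections and suggesting frequent ones.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user record of past requirements and selected services."""
    user_id: str
    service_counts: Counter = field(default_factory=Counter)
    past_requirements: list = field(default_factory=list)

    def record(self, requirement, service):
        """Remember one interaction: what was asked and which service was chosen."""
        self.past_requirements.append(requirement)
        self.service_counts[service] += 1

    def suggest(self, n=3):
        """Return the n most frequently selected services."""
        return [service for service, _ in self.service_counts.most_common(n)]


profile = UserProfile("alice")
profile.record("I need a multimedia player", "mp3")
profile.record("play a movie", "avi")
profile.record("play a song", "mp3")
print(profile.suggest(2))
```

An agent that observes the user's sessions would call `record` after each selection, and the interface could call `suggest` to preselect likely options when a new requirement arrives.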


Besides using the natural language processing module to implement requirement analyzing, which we have introduced in Section 5.3, the human–machine interface of the program mining system also provides the method of visualized service navigation to the end users to help in requirement analyzing and function decomposing.

In program mining systems, user requirements are classified by domain. After the user domain determination module has determined the requirement domain, the system promptly loads the service navigation module with information related to that domain, such as the domain description, the entries related to the domain in the attribute word table, the function descriptions of the subfunction modules corresponding to those entries, and so on. It then presents this information in a user-friendly way, helping the user submit requests more exactly and guiding the user in gradually refining the requirement expression. Finally, the system generates an accurate description of the user requirements, which provides a basis for the subsequent component retrieval, selection, and composition.

The service navigation module first classifies the candidate services and programs according to application domain and function. Each class is shown as an icon in the navigation interface, combined with text notes, so that a multilevel navigation directory of graphics and text is established. The user can gradually refine the required services or programs by selecting icons.

A visualized navigation interface makes it convenient for the user to submit requirements and, at the same time, guides and restricts those requirements, thus reducing their fuzziness and uncertainty. In addition, the classification of application domains and functions in service navigation may adopt the same strategy as the classification of components in the warehouse, so as to establish the relations between user requirements and components, to create conditions for choosing available components within the corresponding range, and to improve the pertinence of component retrieval and acquisition.
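
The multilevel navigation directory described above can be pictured as a tree whose nodes carry an icon and a text label. The following sketch assumes a hypothetical `NavNode` type and icon file names; it shows how a sequence of icon selections could be turned into requirement keywords.

```python
from dataclasses import dataclass, field


@dataclass
class NavNode:
    """One icon in the navigation directory (names and fields are hypothetical)."""
    label: str
    icon: str = ""
    children: list = field(default_factory=list)

    def child(self, label):
        """Return the child with the given label, or raise KeyError."""
        for node in self.children:
            if node.label == label:
                return node
        raise KeyError(label)


def select_path(root, labels):
    """Follow the user's icon selections and return the keywords along the path."""
    node, keywords = root, []
    for label in labels:
        node = node.child(label)
        keywords.append(node.label)
    return keywords


root = NavNode("Services", children=[
    NavNode("Multimedia", "multimedia.png", [
        NavNode("Player", "player.png",
                [NavNode("mp3"), NavNode("avi"), NavNode("mpeg")]),
    ]),
])

print(select_path(root, ["Multimedia", "Player", "mp3"]))
```

Because the user can only choose among existing children at each level, the selections are restricted to entries the system knows, which is how navigation limits fuzzy or uncertain input.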


Program mining is aimed at customized, on-demand computing that follows user requirements in a network environment. It is therefore necessary to design different human–machine interfaces for end users and developers, according to their different requirement features. For end users, the interface should be simple and easy to use, while for developers, it should introduce domain information and expert information, aiming at retrieving the user requirements as correctly as possible and enlarging the range of service objects.

The human–machine interface of the developer-oriented program mining system is shown in Figure 5.10. Through the human–machine interface, the developers can use the same methods as the end user to input their own service requirements. At the same time, the system also supports the direct input method of the requirement keywords so as to let the developer conveniently interact with the system and determine his/her own service requirements.

In end-user requirement analysis, keyword extraction, user domain determination, and function decomposition are based on the domain dictionary, whose entries are retrieved by the developer from the attributes of related components when building the component warehouse.

Compared with the situation of end users, the requirement analyzing of developers can be completed either on the basis of the domain dictionary or directly based on the component warehouse. This makes the requirement analyzing more accurate and enables the subfunction modules to correspond directly to the components. If the decomposed subfunction module still cannot find a corresponding component, then the developer may execute further function decomposition or ask the related intelligent agent to find the new component on the Internet.
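
The fallback described here (match a component, otherwise decompose further, otherwise delegate to an agent) can be sketched as a small recursive lookup. The warehouse contents, the decomposition table, and the function name `resolve` are hypothetical; the agent search is represented only by collecting the unresolved names.

```python
# Hypothetical component warehouse: subfunction keyword -> component identifier.
WAREHOUSE = {"mp3": "Mp3Decoder", "avi": "AviDecoder", "player": "PlayerUI"}

# Hypothetical further decompositions a developer might choose.
DECOMPOSITIONS = {"multimedia": ["mp3", "avi", "mpeg"]}


def resolve(subfunction, found, missing):
    """Map a subfunction to components, decomposing further when none matches."""
    if subfunction in WAREHOUSE:
        found[subfunction] = WAREHOUSE[subfunction]
    elif subfunction in DECOMPOSITIONS:
        for part in DECOMPOSITIONS[subfunction]:
            resolve(part, found, missing)
    else:
        # Nothing local and no finer decomposition: leave it for an agent search.
        missing.append(subfunction)


found, missing = {}, []
for name in ["multimedia", "player"]:
    resolve(name, found, missing)

print(found)
print(missing)
```

In this run “mpeg” ends up in `missing`, which corresponds to the case where the developer asks an intelligent agent to look for a new component on the Internet.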

Accordingly, the developer-oriented human–machine interface provides a programming interface for component composition for the convenience of the developer to directly compose the components found.

At the same time, the developer-oriented human–machine interface also provides maintenance tools for the application domain dictionary and user behavior repository.

Supported by these tools, service developers can use the program mining system for service development to better support requests from end users.


