Ultsch, A. & Korus, D.: "Automatic Acquisition of Symbolic Knowledge from Subsymbolic Neural Networks", Proc. 3rd European Congress on Intelligent Techniques and Soft Computing EUFIT'95, Aachen, Germany, Aug. 28-31, 1995, Vol. I, pp. 326-331.

Automatic Acquisition of Symbolic Knowledge from Subsymbolic Neural Networks

Alfred Ultsch, Dieter Korus
FG Informatik, University of Marburg
Hans Meerwein Str. (Lahnberge)
D-35032 Marburg/Lahn, Germany
phone +49 - 6421 - 28 - 21 85
fax +49 - 6421 - 28 - 89 02
Abstract

Knowledge acquisition is a bottleneck in AI applications. Neural learning offers a new perspective on knowledge acquisition. In our approach we have extended Kohonen's self-organizing feature maps (SOFM) with the U-matrix method for the discovery of structures or classes. We have developed a machine learning algorithm, called SIG*, which automatically extracts rules out of SOFM that are trained to classify high-dimensional data. SIG* selects significant attributes and constructs appropriate conditions for them in order to characterize each class. SIG* also generates differentiating rules, which distinguish classes from each other. The algorithm has been tested on many different data sets with promising results. The framework of using SIG* integrated in a system which automatically acquires knowledge from learned SOFM is also presented. An additional approach to extract fuzzy rules out of a SOFM will be developed.

Keywords: Knowledge Acquisition, Machine Learning, Neural Networks, Fuzzy Rules

1. Introduction

Knowledge acquisition is often a bottleneck in AI applications. Many expert systems use knowledge in symbolic form (e.g. rules, frames, etc.). For human experts it is, however, difficult to formulate their knowledge in these formalisms. Different approaches to the problem of knowledge acquisition have been proposed, for instance interviews with experts conducted by knowledge engineers.
These approaches often concentrate on how to interact with the experts in order to get a formulation of their knowledge in symbolic form. Here we follow a different approach: experts have gained their expertise through experience, i.e. by dealing with cases. In order to get the experts' knowledge into an expert system we propose to process the case data in an attempt to learn the particularities of the domain. In this paper we use artificial neural networks (ANN) for the first step of processing the data. ANN with unsupervised learning can adapt to structures inherent in a data set, i.e. the internal structure of the ANN reflects structural features of the data [Ultsch/92]. Suitable ANN exhibit the property of building up their structure during learning by the integration (overlay) of many case data. This is often termed processing subsymbolic data.
Kohonen's self-organizing feature maps (SOFM) [Kohonen/89] have the property that the neighbourhood among the training data, perhaps in a high-dimensional space, is reflected in the neighbourhood of the units on the generated feature map, practically in a 1-, 2-, or 3-dimensional space. We can make use of this property of SOFM to discover structures in high-dimensional data and map them into a lower-dimensional space. For SOFM we have developed a method, called the U-matrix method (UMM), to detect and display the structures learned from the data [Ultsch/90]. Using the UMM a trained feature map is transformed into a landscape with hills or walls separating the different regions where cases are located [Ultsch/91a]. All cases that lie in a common basin are considered to have a strong similarity, i.e. to have some common structural properties. With the algorithm presented in the sequel we attempt to extract a symbolic description of these similarities from the trained SOFM, i.e. to come to a symbolic general description of the cases.
An inductive machine learning algorithm called SIG* [Ultsch/91a] takes the training data, with the classification detected through the learned SOFM, as input and generates rules for characterizing and differentiating the classes of the data. We have developed a system, called REGINA, which uses SIG* as a knowledge acquisition tool for a diagnosis expert system while using SOFM as a neural classifier [Ultsch/92].
The examples for learning may be incomplete or even inconsistent. Therefore the extracted rules should also be fault tolerant. A promising approach to this is to use fuzzy set calculus [Enbutsu/91] [Mukaidono/92] [Weber/91] [Yi/92]. We have developed an alternative approach to SIG* that generates fuzzy membership functions and rules out of a SOFM [Ultsch/91b].
In section 2 the system REGINA is briefly depicted. The idea of SIG* and the way SIG* works are described with an example in section 3. Section 4 describes an alternative approach to extract fuzzy membership functions and rules out of a neural classification. Finally, a summary of applications and conclusions gives an overview of this work and suggests future work on SIG*.

2. Overview of REGINA

The system REGINA consists of the following major modules:
• neural classifier
• analysing tools
• rule extraction
• inference
In REGINA the raw data are first preprocessed such that they can be used to train Kohonen's self-organizing feature maps (SOFM). After the SOFM has learned, the neighbourhood structure among the training data is implicitly represented on the SOFM. Using analysing tools, in particular the U-matrix method [Ultsch/91a], the neighbourhood structure on the learned SOFM can be visually recognized. The training data are transferred to rule extraction.
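The U-matrix idea used above can be illustrated by a simplified sketch: for each map unit we average the distance from its weight vector to the weight vectors of its 4-neighbours, so that class borders appear as ridges of high values and classes as basins of low values. This is our own minimal variant for illustration, not the exact published UMM (which inserts separate distance cells between the units):

```python
import math

def u_matrix(weights):
    # weights: 2-D grid (list of lists) of unit weight vectors (lists of floats).
    # For each unit, average the Euclidean distance to its 4-neighbours;
    # high entries form the "walls", low entries the "basins".
    rows, cols = len(weights), len(weights[0])
    um = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(math.dist(weights[r][c], weights[rr][cc]))
            um[r][c] = sum(dists) / len(dists)
    return um
```

On a map whose units have settled into two groups of similar weight vectors, the units on the border between the groups receive the largest values, which is exactly the wall structure the UMM visualizes.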
SIG* takes the training data, with the classification detected through the SOFM, as input and generates symbolic rules. The extracted rules, the information in the neural classifier and the associative memory, as well as the experts' own rules, are employed in inference.

3. Rule Generation with SIG*

SIG* has been developed in the context of medical applications [Ultsch/91a]. In this domain other rule-generating algorithms such as ID3 [Quinlan/83], for example, fail to produce suitable rules. SIG* takes a data set in the space R^n that has been classified by SOFM/UMM as input and produces descriptions of the classes in the form of decision rules. For each class an essential rule, called the characterizing rule, is generated, which describes that class. Additional rules that distinguish between different classes are also generated. These are called differentiating rules. This models the typical differential-diagnosis approach of medical experts, but it is a very common approach in other domains as well. The rules generated by SIG* take, in particular, the significance of the different structural properties of the classes into account. If only a few properties account for most of the cases of a class, the rules are kept very simple.
Two central problems are addressed by the SIG* algorithm:
1. how to decide which attributes of the data are significant enough to characterize each class,
2. how to formulate apt conditions for each selected significant attribute.
In order to solve the first problem, each attribute of a class is associated with a significance value. The significance value can be obtained, for example, by means of statistical measures. For the second problem we can make use of the distribution properties of the attributes of a class. In the following we use an example to describe the SIG* algorithm. The complete and formal description can be found in [Ultsch/91a].

3.1. Selecting Significant Attributes for a Class

As an example, we assume a data set of case vectors with five attributes Attr1, Attr2, Attr3, Attr4, Attr5. Let SOFM/UMM distinguish four classes Cl1, Cl2, Cl3, Cl4 in the example. Let SV_ij denote the significance value of Attr_i in class Cl_j. The matrix SM = (SV_ij)_5x4 we call the significance matrix. For our example the significance matrix may be given as follows:

  SM       Cl1      Cl2      Cl3      Cl4
  Attr1    1.5      4        6 *      3.1
  Attr2    3.1      3.2      20 *     6.4
  Attr3    5                 1.8
  Attr4    6        8.3 *    5.7      2.7
  Attr5    8        9.5 *    6.2      7.3

In this matrix the largest value in each row is marked with an asterisk (*).
In order to detect the attributes that are most characteristic for the description of a class, the significance values of the attributes are normalized as percentages of the total sum of the significance values of the class. These normalized values are then sorted in decreasing order. For Cl1 and Cl3, for example, the ordered attributes are:

  Cl1       percentual significance   cumulative
  Attr5     33.89%                     33.89%
  Attr4     25.42%                     59.31%
  Attr3     21.19%                     80.50%
  Attr2     13.14%                     93.64%
  Attr1      6.36%                    100.00%

  Cl3       percentual significance   cumulative
  Attr2 *   50.38%                     50.38%
  Attr5     15.62%                     66.00%
  Attr1 *   15.11%                     81.11%
  Attr4     14.36%                     95.47%
  Attr3      4.53%                    100.00%

As significant attributes for the description of a class, the attributes with the largest significance values in the ordered sequence are taken until the cumulative percentage equals or exceeds a given threshold value. For a threshold value of 50% in the above example, Attr5 and Attr4 would be selected for class Cl1. For Cl3 only Attr2 would be considered. For this class, however, there are attributes that have been marked with an asterisk (see above): Attr2 and Attr1.
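As we read it, the selection procedure — threshold on the cumulative normalized significance, followed by the union with all attributes whose row maximum lies in the class — can be sketched as follows. The function and variable names are ours, and Attr3 is left out of the example data because two of its values are not given in the matrix above:

```python
# Sketch of SIG*'s significant-attribute selection (our reading of the
# description; Attr3 is omitted since its full row is not available).
SM = {
    "Attr1": {"Cl1": 1.5, "Cl2": 4.0, "Cl3": 6.0,  "Cl4": 3.1},
    "Attr2": {"Cl1": 3.1, "Cl2": 3.2, "Cl3": 20.0, "Cl4": 6.4},
    "Attr4": {"Cl1": 6.0, "Cl2": 8.3, "Cl3": 5.7,  "Cl4": 2.7},
    "Attr5": {"Cl1": 8.0, "Cl2": 9.5, "Cl3": 6.2,  "Cl4": 7.3},
}

def select_attributes(sm, cls, threshold=50.0):
    total = sum(row[cls] for row in sm.values())
    # order attributes by decreasing normalized significance
    ordered = sorted(sm, key=lambda a: sm[a][cls], reverse=True)
    selected, cum = [], 0.0
    for attr in ordered:
        selected.append(attr)
        cum += 100.0 * sm[attr][cls] / total
        if cum >= threshold:
            break
    # add every "asterisked" attribute, i.e. one whose row maximum
    # falls in this class, if it is not selected already
    for attr, row in sm.items():
        if max(row, key=row.get) == cls and attr not in selected:
            selected.append(attr)
    return selected

print(select_attributes(SM, "Cl1"))  # ['Attr5', 'Attr4']
print(select_attributes(SM, "Cl3"))  # ['Attr2', 'Attr1']
```

With Attr3 removed the percentages shift slightly, but the selected sets agree with the worked example: Attr5 and Attr4 for Cl1, and Attr2 plus the asterisked Attr1 for Cl3.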
If there are any marked attributes that have not been considered so far, as in our example Attr1, they are also included for a sensible description of the given class. So the descriptive attributes for our example would be: for Cl1: Attr5 and Attr4; for Cl3: Attr2 and Attr1.
The same algorithm is performed for all classes and all attributes and yields for each class the set of significant attributes to be used in a meaningful but not overly detailed description of the class. If one attribute is exceedingly more significant than all others (consider for example Attr2 for Cl3), only very few attributes are selected. On the other hand, if almost all attributes possess the same significance, considerably more attributes are taken into account. The addition of all asterisked attributes assures that those attributes are considered for which the given class is the most significant.

3.2. Constructing Conditions for the Significant Attributes of a Class

A class is described by a number of conditions on the attributes selected by the algorithm described above. If these conditions are too strong, many cases may not be correctly diagnosed. If the conditions are too soft, cases that do not belong to a certain class are erroneously subsumed under that class. The main problem is to estimate correctly the distributions of the attributes of a class. If no assumption on the distribution is made, the minimum and maximum over all those vectors that belong, according to SOFM/UMM, to a certain class may be taken as the limits of the attribute values. In this case a condition on the i-th attribute in the j-th class can look like

  attribute_ij IN [min_ij, max_ij].

But this kind of formulation of the conditions is likely to result in erroneous subsumptions. If a normal distribution is assumed for a certain attribute, we know from statistics that 95% of the attribute values are captured within the limits [mean_ij - 2*dev, mean_ij + 2*dev], where dev is the standard deviation of the attribute. For other assumptions about the distribution, two parameters low and hi may be given to SIG*. In this case the conditions generated are as follows:

  attribute_ij IN [mean_ij + low*dev, mean_ij + hi*dev].

3.3. Characterizing Rules and Differentiating Rules

The algorithm described in 3.1 and 3.2 produces the essential description of a class. If the intersection of such descriptions of two classes A and B is nonempty, i.e. a case may belong to both classes, a finer description of the borderline between the two overlapping classes is necessary. To the characterizing rule of each class a condition is added that is tested by a differentiating rule. A rule that differentiates between the classes A and B is generated by an algorithm analogous to that for the characterizing rules. The significance values, however, may be measured between the particular classes A and B. The conditions are typically set stronger than in the case of characterizing rules. To compensate for this, the conditions of the differentiating rules are connected by a logical OR.

4. Alternative Approach to Extract Fuzzy Rules

If one wants to get fuzzy rules out of the data instead of crisp rules, one approach is to first generate membership functions out of a SOFM. We get a first approximation of the membership functions by computing a histogram for each attribute and each class that was discovered by the SOFM.
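A minimal sketch of this first approximation, under our own assumptions about the bin count and scaling (the text does not fix them): the attribute values of one class are binned, and the bin counts are scaled so that the most frequent bin receives membership 1.0.

```python
def membership_approx(values, n_bins=7):
    # First approximation of a fuzzy membership function for one attribute
    # of one class: a histogram of the attribute values within the class,
    # scaled so the highest bin gets membership 1.0. Returns the bin
    # midpoints (later connected into a frequency polygon) and memberships.
    # n_bins=7 is an illustrative choice, not taken from the paper.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant attribute
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    peak = max(counts)
    mids = [lo + (i + 0.5) * width for i in range(n_bins)]
    return mids, [c / peak for c in counts]
```
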
The middle points of the intervals are connected into a frequency polygon. In a next step, additional data vectors for each interval of each attribute and each class are generated and classified by the previously learned SOFM. With the help of these additional classified vectors we obtain a second, better approximation of the membership functions [Ultsch/91b].
To make the rules that are to be developed communicable, we transform the membership functions into linguistic variables. As the result of a poll we obtained seven linguistic reference variables. For each of them we designed a reference membership function. To transform the generated membership functions into a linguistic description, the degree of correspondence of each membership function to each of the reference functions was computed. With the help of these linguistic descriptions we can formulate a complete and comprehensible rule for each class by considering all attributes. To obtain rules that can be better understood by the domain expert, we removed in a last step the attributes that are not relevant for the conclusion [Ultsch/91b].

5. Applications and Conclusion

We have tested the system REGINA on many data sets from different domains. These include medical and environmental problems as well as industrial processes. Up to now the results have been very promising. In some cases knowledge that had not been known to us, but was verified by the domain experts, has been extracted.
In most cases the performance of the generated rules was in the range of 80 to 90 percent.
Our approach has three advantages:
(1) the integration of unsupervised neural learning and inductive machine learning in automated knowledge acquisition,
(2) a flexible, domain-dependent decision criterion for selecting significant attributes instead of the usual predetermined minimal decision criterion in rule generation,
(3) the possibility of constructing rule conditions from various points of view.
The extracted fuzzy rules do not perform as well as the rules generated by SIG*. In the near future we will combine the two approaches.
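As a concluding illustration, the interval conditions of section 3.2 could be generated as in the following sketch. Only the parameter names low and hi come from the paper; the function names and the use of the population standard deviation are our assumptions:

```python
import statistics

def make_condition(values, low=-2.0, hi=2.0):
    # Build the interval condition  attribute IN [mean + low*dev, mean + hi*dev]
    # for one significant attribute of one class; low=-2, hi=2 gives the
    # ~95% interval under the normality assumption of section 3.2.
    mean = statistics.fmean(values)
    dev = statistics.pstdev(values)
    return (mean + low * dev, mean + hi * dev)

def satisfies(value, interval):
    # Test one attribute value of a case against a generated condition.
    lo, hi = interval
    return lo <= value <= hi
```

A characterizing rule for a class is then the conjunction of such conditions over its significant attributes, while the paper's differentiating rules combine their (stronger) conditions by a logical OR.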