This is essentially a copy of a draft of a paper I am tentatively writing for submission to the Journal of the American Medical Informatics Association. As such, it is divided into sections.
This is indeed an explicit goal of the Digital Anatomist Foundational Model (FM) of anatomy. Its modeling and representation not only of concepts but, even more, of their relationships form a powerful body of knowledge that enables the creation of higher-level applications with potentially greater practical use in both educational and clinical settings.
This report presents one such application. It is both a graphical user interface to the FM database and a reasoning agent that manipulates and processes the information provided by the FM in order to answer questions that go beyond simple data retrieval.
With this in mind, the interface used to query the database has the basic structure of a sentence of the form "Subject Relation Object". The valid domain of each of the three components (subject, relation, and object) is the data entered into the FM database. We therefore considered it best to provide the user with a list of values that the program accepts as a valid subject, relation, or object, rather than something higher-level and more difficult to process, such as a natural language query. The list of possible entries is displayed as a data tree for ease of navigation; once a query is processed, it appears below the tree along with its results.
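A minimal sketch of how such a three-slot query might be represented. The `Query` class, the `UNKNOWN` sentinel, and the `kind` method are hypothetical names chosen for illustration, not the program's actual code; the sketch only checks that at least two components are supplied and reports which one, if any, is being asked for.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN = None  # hypothetical stand-in for the "Unknown" menu entry


@dataclass(frozen=True)
class Query:
    """A "Subject Relation Object" sentence; at most one slot may be UNKNOWN."""
    subject: Optional[str]
    relation: Optional[str]
    obj: Optional[str]

    def kind(self) -> str:
        """Name the query type by the slot left unknown, or "boolean" if none."""
        slots = (("subject", self.subject), ("relation", self.relation), ("object", self.obj))
        missing = [name for name, value in slots if value is UNKNOWN]
        if len(missing) > 1:
            raise ValueError("at least two of the three components must be given")
        return f"find {missing[0]}" if missing else "boolean"


print(Query(UNKNOWN, "is part of", "Esophagus").kind())  # find subject
```

A fully specified sentence is classified as a Boolean query, which matches the distinction drawn in the next paragraph.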
The three-component sentence structure lends itself intuitively to the creation of queries. Specifically, any sentence in which at least two of the three components are known is logically enough to specify a valid query and to return every answer the unknown could represent. The following section details the different types of queries possible, all of which are variations on this idea: supply at least two of the three components of the basic "Subject Relation Object" sentence, and receive back either the third component or, as we shall see, a Boolean reply.
Omitting the "(directly)" suffix indicates transitive closure. Submitting the query "Unknown is part of Esophagus" prompts the program not only to retrieve everything entered directly as a part of Esophagus, but also to retrieve everything that is a part of those direct parts, and so forth until no more concepts can be retrieved. Thus, a concept in the answer set could be several concepts removed from Esophagus, yet connected to it through a chain of "is part of" relations.
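The difference between the direct and the transitive form can be sketched as follows. The `HAS_PART` dictionary stands in, hypothetically, for the has-part fields of a few concept frames (illustrative entries, not FM data), and the closure is computed with a simple breadth-first worklist that stops when no new concepts appear.

```python
from collections import deque

# Hypothetical has-part fields for a few frames (illustrative, not FM data).
HAS_PART = {
    "Esophagus": ["Wall of esophagus", "Lumen of esophagus"],
    "Wall of esophagus": ["Muscle layer of esophagus", "Mucosa of esophagus"],
}


def direct_parts(concept):
    """'Unknown is part of(directly) concept': a single field look-up."""
    return set(HAS_PART.get(concept, []))


def all_parts(concept):
    """'Unknown is part of concept': transitive closure of has part."""
    found, frontier = set(), deque([concept])
    while frontier:
        for part in HAS_PART.get(frontier.popleft(), []):
            if part not in found:  # skip concepts already retrieved
                found.add(part)
                frontier.append(part)
    return found
```

The `found` check also keeps the loop from revisiting a concept that is reachable along more than one has-part chain.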
Almost every relation the user sees actually conceals two relations, because almost every relation has a distinct directionality. "Is bounded by" is not the same relation as "is boundary of", yet there is clearly a connection between them. The difference lies in the placement of the subject and object around the relation. The esophagus is bounded by the external surface of the esophagus; swap the subject (esophagus) and the object (external surface of the esophagus), and we use the other relation: the external surface of the esophagus is boundary of the esophagus. We therefore treat such pairs of relations as inverses of each other and, to simplify the interface, include only one of them explicitly in the list of valid relations. The other is accessed through the placement of the unknown variable. "Unknown is boundary of Esophagus" causes a look-up of Esophagus' "is bounded by" field, whereas "Esophagus is boundary of Unknown" causes a look-up of Esophagus' "is boundary of" field. All components of the query remain the same, but the different positions of subject and object cause the program to ask a different query, a detail hidden from the user in the spirit of data abstraction.
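One way to realize this hidden inversion is sketched below, assuming each concept frame stores both directions as separate fields. `FRAMES`, `INVERSE`, and `lookup` are hypothetical names; only the keys of `INVERSE` would appear in the relation menu.

```python
# Hypothetical frames: each concept stores both directions as separate fields.
FRAMES = {
    "Esophagus": {"is bounded by": {"External surface of esophagus"}},
    "External surface of esophagus": {"is boundary of": {"Esophagus"}},
}

# Menu relation -> hidden inverse; only the keys are shown to the user.
INVERSE = {"is boundary of": "is bounded by"}


def lookup(subject, relation, obj):
    """Answer a query in which exactly one of subject/obj is None (Unknown)."""
    if subject is None and obj is not None:
        # "Unknown R obj": read the object's field for the inverse relation.
        return set(FRAMES.get(obj, {}).get(INVERSE[relation], ()))
    if obj is None and subject is not None:
        # "subject R Unknown": read the subject's own field for R.
        return set(FRAMES.get(subject, {}).get(relation, ()))
    raise ValueError("exactly one of subject/obj must be Unknown")
```

With these toy frames, `lookup(None, "is boundary of", "Esophagus")` reads Esophagus' "is bounded by" field, while `lookup("Esophagus", "is boundary of", None)` reads its (here empty) "is boundary of" field, mirroring the two cases in the text.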
This feature exists with a view toward answering queries such as "Heart has part X, X contains Y", where Y is the desired answer and X represents the same concept in both sentences. In practice, this problem can be solved by breaking it into two queries. The user first asks "Unknown is part of Heart", which generates a variable Ux representing all the concepts that are part of the heart. The user then takes this variable Ux and formulates the query "Unknown is contained in Ux". The concepts returned by this query answer the original query "Heart has part X, X contains Y".
Using a Ux as the subject or object and Unknown as the other component generates another Ux, since "Unknown" appears in the query. This in turn can be used in a query, and so forth. Queries generated from a Ux, called composite queries, retain the information of the generating Ux in a data tree, whose leaves are the answers and whose parent nodes are the concepts that produced those answers. In the previous example, we would know the intermediate X that produced each particular Y, because the information of such composite queries is represented in a tree. This data representation, and the general ability of output to become input, lends itself to multiple nested calls that allow complex, highly specific data sets to be built up.
The following illustrates the repeated use of composite queries and their data representation. "Unknown is part of Esophagus" provides all concepts in the transitive closure of Esophagus' has-part network. That query also generates a variable named U1 that represents all those concepts. We then use U1 as if it were itself a concept and ask "U1 is continuous with(directly) Unknown". The answer retains the information of which member of U1 generated each answer concept. Thus, the wall of the stomach and the wall of the pharynx are both continuous with the wall of the esophagus, while the muscle layer of the stomach, which is also an answer, is continuous with the muscle layer of the esophagus. These answers themselves generate a variable named U2, which can in turn be used in a query such as "U2 is kind of Unknown". Feeding back the answer to the previous query adds another layer of depth to the data tree representing the answer. The tree again shows the intermediate nodes through which the program passed to arrive at the answers: the actual answers are the leaves, and the parents of the leaves are the concepts that produced them. Thus, the wall of the pharynx is a kind of subdivision of the pharynx, the wall of the stomach a kind of subdivision of the stomach, and the muscle layer of the stomach a kind of organ component of the stomach.
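A minimal sketch of how a Ux variable could keep this provenance, assuming each member is stored as a path whose last element is the answer (a leaf) and whose earlier elements are the concepts that produced it. The slots contain only the facts quoted in the example above; `composite_query` and `answers` are hypothetical helper names.

```python
# Hypothetical slots, populated only with the facts quoted in the example.
CONTINUOUS_WITH = {
    "Wall of esophagus": ["Wall of stomach", "Wall of pharynx"],
    "Muscle layer of esophagus": ["Muscle layer of stomach"],
}
KIND_OF = {
    "Wall of stomach": ["Subdivision of stomach"],
    "Wall of pharynx": ["Subdivision of pharynx"],
    "Muscle layer of stomach": ["Organ component of stomach"],
}


def composite_query(ux, slot):
    """'Ux <relation> Unknown' over every member of a Ux variable.

    A Ux is a list of provenance paths: path[-1] is an answer (a leaf) and
    the earlier entries are the concepts that produced it. Each path is
    extended by one step per answer, so the result is again a valid Ux.
    """
    return [path + (answer,) for path in ux for answer in slot.get(path[-1], [])]


def answers(ux):
    """The leaves of the tree, i.e. the answer concepts themselves."""
    return {path[-1] for path in ux}


# U1 would come from "Unknown is part of Esophagus"; two members shown here.
U1 = [("Wall of esophagus",), ("Muscle layer of esophagus",)]
U2 = composite_query(U1, CONTINUOUS_WITH)   # "U1 is continuous with(directly) Unknown"
U3 = composite_query(U2, KIND_OF)           # "U2 is kind of Unknown"
```

Storing whole paths rather than a nested dictionary is a design choice for brevity; the same tree can be rebuilt by grouping paths on their shared prefixes.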
This record-keeping of the intermediate steps used to obtain an answer set might be annoying if the user does not particularly care how the answer was computed. In this case, the user has the option of creating a composite relation and using it in a single query to find concepts several relations removed. Clicking the "Make relation" button opens a window in which the user can select and string together any number of simple relations that pass through unspecified, never-disclosed intermediate concepts. (see some figure) To duplicate the results of the previous example, we construct the composite relation "is continuous with(directly) X is part of" and use it in the query "Unknown is continuous with(directly) X is part of Esophagus". This is intrinsically the same as the first two queries made previously: "Unknown (U1) is part of Esophagus" and "U1 is continuous with(directly) Unknown". The composite relation, however, gives no information at all about the intermediate concepts that produced the answers, which is fine if the user does not care to know.
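A composite relation can be sketched as function composition over sets of concepts, assuming each simple relation is a step that maps a set of concepts to the set reachable through it. Unlike the provenance paths of composite queries, only the final set survives. `direct`, `closure`, `make_relation`, and the toy slots are hypothetical; the steps are applied outward from the known concept (Esophagus), that is, right to left in the sentence.

```python
from collections import deque

# Hypothetical slots (illustrative only), matching the earlier example.
HAS_PART = {
    "Esophagus": ["Wall of esophagus", "Lumen of esophagus"],
    "Wall of esophagus": ["Muscle layer of esophagus"],
}
CONTINUOUS_WITH = {
    "Wall of esophagus": ["Wall of stomach", "Wall of pharynx"],
    "Muscle layer of esophagus": ["Muscle layer of stomach"],
}


def direct(slot):
    """A step following one relation exactly once (the '(directly)' form)."""
    return lambda concepts: {n for c in concepts for n in slot.get(c, [])}


def closure(slot):
    """A step following a relation any positive number of times."""
    def step(concepts):
        found, frontier = set(), deque(concepts)
        while frontier:
            for n in slot.get(frontier.popleft(), []):
                if n not in found:
                    found.add(n)
                    frontier.append(n)
        return found
    return step


def make_relation(*steps):
    """Chain steps into one relation; intermediate concepts are discarded."""
    def relation(concept):
        current = {concept}
        for step in steps:
            current = step(current)
        return current
    return relation


# "Unknown is continuous with(directly) X is part of Esophagus", evaluated
# outward from Esophagus: first every part X, then what each X is continuous with.
continuous_with_part_of = make_relation(closure(HAS_PART), direct(CONTINUOUS_WITH))
```

Calling `continuous_with_part_of("Esophagus")` returns the same three concepts as the two-step composite query, but with no trace of which part of the esophagus produced each one.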
Ux variables may also be used in Boolean queries to create composite Boolean queries. The answer, either "Yes" or "No", is again located at the leaves, with its parent being the concept to which the Boolean answer applies. A composite Boolean query does not itself generate a new Ux variable: no component of a query accepts "Yes" or "No" as valid input, so a Ux holding such values could never be used.
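A short sketch of a composite Boolean query, assuming a hypothetical `is_part_of` test over a toy set of (part, whole) pairs. The result maps each Ux member (the parent) to "Yes" or "No" (the leaf) and is deliberately returned as a plain dictionary rather than as a new Ux.

```python
# Hypothetical direct "is part of" facts, stored as (part, whole) pairs.
PART_OF = {("Wall of esophagus", "Esophagus"), ("Wall of stomach", "Stomach")}


def is_part_of(subject, obj):
    """Boolean query 'subject is part of obj' against the toy facts."""
    return (subject, obj) in PART_OF


def boolean_over_ux(ux, holds, obj):
    """'Ux R obj?' for every member; leaves are "Yes"/"No", parents are members.

    The result is terminal: it is not wrapped as a new Ux, because no query
    component accepts "Yes" or "No" as input.
    """
    return {member: "Yes" if holds(member, obj) else "No" for member in ux}


tree = boolean_over_ux(["Wall of esophagus", "Wall of stomach"], is_part_of, "Esophagus")
```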
This option, however, is not as simple to implement as the previous ones, because the unknown is the very direction in which to proceed from a concept. In the previous cases, given at least a subject or an object and a relation, the program had in essence a known starting point and direction, and finding answers was merely a matter of accessing the starting concept's field corresponding to the relation. With the direction itself unknown, however, we have the starting and ending points but no clue as to which direction to proceed, and thus must search almost all possible relation routes.
The search algorithm has three parts. The first line of attack is to check each relation separately, both as a direct relationship and as its transitive closure, by iterating through all possible relations, constructing a Boolean query with each, and using the program's Boolean query option internally. As soon as the Boolean query function returns "Yes", the current relation is output as the unknown relation.
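The first part of the search might be sketched as below. The `SLOTS` graph, the `holds` Boolean check, and `find_single_relation` are hypothetical, and trying the direct form before the transitive closure for each relation is an assumed ordering, not one stated in the text.

```python
from collections import deque

# Hypothetical graph: relation name -> {concept: [related concepts]}.
SLOTS = {
    "has part": {"Heart": ["Right atrium"], "Right atrium": ["Cavity of right atrium"]},
    "contains": {"Cavity of right atrium": ["Blood"]},
}


def holds(relation, subject, obj, transitive):
    """Boolean query 'subject relation obj', directly or via transitive closure."""
    slot = SLOTS[relation]
    if not transitive:
        return obj in slot.get(subject, [])
    seen, frontier = {subject}, deque([subject])
    while frontier:
        for nxt in slot.get(frontier.popleft(), []):
            if nxt == obj:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def find_single_relation(subject, obj):
    """Part one: try every relation, first directly, then as its closure."""
    for relation in SLOTS:
        for transitive in (False, True):
            if holds(relation, subject, obj, transitive):
                return f"{relation} (transitive)" if transitive else relation
    return None
```

With this toy data, "Heart" and "Blood" are not linked by any single relation or its closure, which is exactly the case the next two parts of the algorithm handle.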
The second part checks certain composite relations, such as "(has parts)*, contains", which discussions with University of Washington anatomists have shown to be useful. Following standard regular expression notation, this relation allows any number of "has part" relations to be traversed, followed by a single "contains" relation. Thus, it is a valid relation between "Heart" and "Blood", in that "Heart" has part "Right atrium", which has part "Cavity of right atrium", which contains "Blood". This relation is intuitively meaningful: if some part of a concept contains another concept, we naturally consider the second concept to be "inside" the first, although the relation is strictly neither "has part" nor "contains".
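The "(has parts)*, contains" check can be sketched as a search over the has-part closure that succeeds as soon as some reached concept contains the target. This assumes, as in regular expressions, that the star also matches zero steps, so the starting concept is itself a candidate. The slots reproduce only the Heart/Blood example, and the function name is hypothetical.

```python
# Hypothetical slots reproducing only the Heart/Blood example from the text.
HAS_PART = {"Heart": ["Right atrium"], "Right atrium": ["Cavity of right atrium"]}
CONTAINS = {"Cavity of right atrium": ["Blood"]}


def has_parts_star_contains(subject, obj):
    """True if subject (has part)* X and X contains obj.

    The star is read as in regular expressions, so zero has-part steps are
    allowed and subject itself is a candidate X.
    """
    seen, stack = {subject}, [subject]
    while stack:
        x = stack.pop()
        if obj in CONTAINS.get(x, []):
            return True
        for part in HAS_PART.get(x, []):
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return False
```

A depth-first stack is enough here because the check only asks whether such a chain exists, not which one is shortest.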
This issue of meaningfulness governs the entire relation query search algorithm and influences, in particular, its third and last part. If no relation is found either through separate inquiries of each relation or through special composite relations known to be meaningful, then, as a last resort, a brute-force method searches several layers of all possible relations, on the assumption that the fewer relations needed to travel from one concept to another, the better.
The FM knowledge base is naturally modeled as a directed graph, with each concept a vertex and each relation a directed edge. For instance, "Heart" and "Right atrium" can be considered two vertices of a graph, with a directed edge labeled "has part" pointing from "Heart" to "Right atrium". Finding the shortest distance from one concept to another is then theoretically equivalent to the shortest-path problem of graph theory and theoretical computer science. A well-known algorithm for this problem is Dijkstra's algorithm, which allows edges to be weighted; the shortest path from one vertex to another is the path whose edges have the minimum combined weight. In the current situation, every edge, which represents a relation, has equal weight; the relation "has part" is not considered more significant or meaningful than "is continuous with", although that might change in a more sophisticated version of the program. With all edge weights equal, Dijkstra's algorithm reduces in essence to a breadth-first search rooted at one of the concepts. This search visits all vertices one edge away before those two edges away, which in turn are visited before those three edges away from the root, and so forth. Therefore, if two valid relation chains connect one vertex to another, the shorter one is discovered, and thus output, before the longer one. The concept of meaningfulness intrudes, however, because the data set being modeled is the human body, in which everything is ultimately connected to everything else. In graph-theoretic terms, the graph representing the data set is strongly connected (every vertex is reachable from every other vertex), so a relation will always be found if the breadth-first search runs long enough.
The relation found, however, might have no significant meaning if the path is dozens or even hundreds of relations long, and the computational cost becomes considerable: every concept reached has several concepts reachable from it, each of which has a few more concepts connected to it, and so on. The number of concepts the program must consider thus grows exponentially with each level, so there is little practical value in spending much computational time and resources on a shortest path that might not be meaningful in any case. The breadth-first search is therefore depth-limited: vertices are searched in breadth-first order, but beyond a user-defined depth the program stops and reports that no relation was found. The current value is four; any relation chain containing more than four relations is not found by the program.
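The depth-limited breadth-first search described above might look like the following sketch. `EDGES`, `MAX_DEPTH`, and `shortest_relation_chain` are hypothetical names. Edges are unweighted, as in the current program, so the first time the goal is reached yields a shortest chain, and vertices already at the depth limit are never expanded.

```python
from collections import deque

# Hypothetical labeled edges: concept -> [(relation, neighbor), ...].
EDGES = {
    "Heart": [("has part", "Right atrium"), ("has part", "Left atrium")],
    "Right atrium": [("has part", "Cavity of right atrium")],
    "Cavity of right atrium": [("contains", "Blood")],
}

MAX_DEPTH = 4  # user-defined; longer chains are reported as "no relation"


def shortest_relation_chain(start, goal, max_depth=MAX_DEPTH):
    """Depth-limited breadth-first search for the shortest relation chain.

    Returns a list of (relation, concept) steps, or None if no chain of at
    most max_depth relations exists.
    """
    if start == goal:
        return []
    parent = {start: None}          # concept -> (previous concept, relation)
    frontier = deque([(start, 0)])
    while frontier:
        concept, depth = frontier.popleft()
        if depth == max_depth:
            continue                # do not expand past the depth limit
        for relation, nxt in EDGES.get(concept, []):
            if nxt in parent:
                continue            # already reached by a chain at least as short
            parent[nxt] = (concept, relation)
            if nxt == goal:
                chain = []
                while parent[nxt] is not None:
                    prev, rel = parent[nxt]
                    chain.append((rel, nxt))
                    nxt = prev
                return chain[::-1]
            frontier.append((nxt, depth + 1))
    return None
```

Lowering `max_depth` to 2 makes the Heart-to-Blood query fail even though a three-relation chain exists, which is the intended trade-off between meaningfulness and cost.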
The user performs such set operations graphically using the table of queries. Every query is recorded there, so the table provides a handy reference to the answer sets. Beside each query is a NOT checkbox, which may be checked to take the complement of that query's answer set, and the LOGIC column is a pull-down menu from which the user may select either logical AND for intersection or logical OR for union.
An example is the question "What are all the parts of the esophagus that are not a kind of subdivision of the esophagus?" The question has two parts: the unknown must be part of the esophagus, and the unknown must not be a kind of subdivision of the esophagus. We therefore ask these two queries, and the answer must lie in the intersection of the first query's answer set and the complement of the second's. By checking the NOT box for the second query and then selecting both queries, we see that the only answer in the intersection is "Lumen of the esophagus".
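The table's NOT and LOGIC semantics can be sketched as a left-to-right fold over the recorded answer sets. Two assumptions are worth stating: complements are taken relative to a hypothetical `universe` of all concepts, and rows are combined strictly in table order with no operator precedence. The toy sets below merely reproduce the outcome of the esophagus example; `combine` is a hypothetical name.

```python
def combine(universe, rows):
    """Combine recorded answer sets as in the query table.

    rows: list of (answer_set, negate, logic), where negate mirrors the NOT
    checkbox and logic ("AND" or "OR") joins the row to the running result
    (it is ignored for the first row).
    """
    result = None
    for answers, negate, logic in rows:
        current = universe - answers if negate else set(answers)
        if result is None:
            result = current
        elif logic == "AND":
            result &= current
        elif logic == "OR":
            result |= current
        else:
            raise ValueError(f"unknown LOGIC value: {logic!r}")
    return result if result is not None else set()


# Toy answer sets (illustrative only).
UNIVERSE = {"Esophagus", "Wall of esophagus", "Lumen of esophagus", "Stomach"}
PARTS = {"Wall of esophagus", "Lumen of esophagus"}   # "Unknown is part of Esophagus"
SUBDIVISIONS = {"Wall of esophagus"}                  # "Unknown is kind of Subdivision of esophagus"

answer = combine(UNIVERSE, [(PARTS, False, None), (SUBDIVISIONS, True, "AND")])
```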
This capability of performing set operations in essence lets the user combine and filter answer sets more precisely. Such customized sets can also be saved to a file and imported into later sessions of the program for use as input to future queries.
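Saving and re-importing a set could be as simple as the sketch below. The JSON layout (a name plus a sorted list of concept names) and the `save_set`/`load_set` helpers are assumptions for illustration, not the program's actual file format; an in-memory buffer stands in for a real file.

```python
import io
import json


def save_set(name, answers, fp):
    """Write a named answer set as JSON (hypothetical file format)."""
    json.dump({"name": name, "concepts": sorted(answers)}, fp, indent=2)


def load_set(fp):
    """Read a set written by save_set; returns (name, set of concepts)."""
    data = json.load(fp)
    return data["name"], set(data["concepts"])


# Round trip through an in-memory file instead of a real one.
buffer = io.StringIO()
save_set("U4", {"Lumen of esophagus"}, buffer)
buffer.seek(0)
name, concepts = load_set(buffer)
```

Sorting the concepts before writing keeps saved files stable across sessions, which makes them easier to compare.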
This program also explores some of the reasoning possible with the FM, going beyond simple data retrieval. The time savings and usefulness of such processing are obvious: chasing answers through multiple concept frames and relations over and over again is exactly the sort of repetitive, structured data processing at which computers far surpass humans. This reasoning engine enables a higher level of abstraction and allows a user to retrieve anatomical information not directly entered into the database much more efficiently.
The program may find use in clinical and educational settings, among others. For example, an actual book of sample medical examination questions contains the following problem:
The brachiocephalic trunk divides into two arteries:
This question at its core deals with the "branch of" relationship (because the brachiocephalic trunk is an artery; for veins the corresponding relationship is "tributary of"), and its query becomes "Unknown is branch of brachiocephalic artery". Based on the query results, the second option, B, is the right answer.
In the medical condition known as pulmonary embolism, a blood clot that forms in a leg vein can travel up to the lung and lodge there, fatally obstructing oxygen flow. This program may be used to trace, with a series of queries, the path the blood clot followed on its way to the lung. To find all the veins of the leg that could potentially develop a blood clot, we begin with the query "Unknown (U1) is kind of Vein of free lower limb". We can then take all the answers, labeled U1, and find all veins of which they are a tributary: "U1 is tributary of Unknown (U2)". Suppose that, as rusty anatomists, we know we want the vein that ends up continuous with some part of the heart, but have forgotten its name. We then construct a composite relation (see some figure) and use it in another query: "Unknown (U3) is continuous with X is part of Heart". This returns all structures that are continuous with some concept that is part of the heart. Taking the intersection of U2 and U3 leaves only those concepts that have a vein of the leg as a tributary and are also continuous with a part of the heart; the only concept left is the inferior vena cava. We might want to pursue the blood clot's path further and investigate what the inferior vena cava is continuous with, taking the transitive closure of this relation on the reasoning that anything continuous at any level with the inferior vena cava is a physically viable location to which the blood clot may travel. This should pull up the right atrium, right ventricle, and pulmonary artery. Since the pulmonary artery is the furthest along the transitive closure of the "is continuous with" relation, it is logically the furthest location to which we have so far traced the blood clot.
We hope, based on our rusty anatomical knowledge, that this is close enough to the lung to try the depth-limited relation query to find the shortest path between them: "Pulmonary artery Unknown Lung". This should pull up something meaningful, such as "Pulmonary artery has branch (or perhaps has part) Intrapulmonary part of pulmonary artery, which is part of Lung". This provides the complete path of the blood clot as it travels from any number of leg veins to the lung itself, through a series of queries that would have taken much longer had a user merely employed Protégé as the search tool. Although these data are not presently entered in the database, the example serves to illustrate the potential use of the program.
Of course, this example also shows that using the program profitably requires an adequate understanding of the different types of queries supported, and some experience and savvy in breaking a query into its component questions and then using their answer sets. Outside anatomical knowledge is a definite plus in formulating queries as well. All these considerations point to an obvious next step: just as this program takes higher-level queries and, through reasoning, reduces them to the simple data-retrieval queries Protégé can answer, another program could take even higher-level questions and translate them into ones this program can understand. In a future setting, then, the program could be more useful as a query engine in its own right than as a graphical user interface, functioning as an API for the next version of a graphical user interface. The previous pulmonary embolism example highlights this area of further development: the blood clot path question is demonstrably answerable by the current program through a series of clever questions and a bit of outside knowledge (the venous tree ultimately connects to the heart, the heart is continuous with various structures, etc.). A more sophisticated program could remove the need for such savoir-faire and allow the user to pose, in some form, the general question "What is the path of the blood from a vein in the leg to the lung?" That future program would then generate the aforementioned queries, feed them into the present program, and know how to string together the answers to the separate parts so as to list, in proper anatomical order, the series of anatomical entities the blood clot passes through. This principle of building programs that translate a high-level query into a lower-level one answerable by another program could, we hope, be applied again and again.
Eventually, a future program could accept a very high-level query, potentially in natural language, and output the correct answer. This would of course be of great use to medical education and to clinical diagnosis and treatment, if any question concerning anatomy could be submitted to the program and answered quickly and accurately. It would also validate the need for, and usefulness of, the Foundational Model as a knowledge base much richer than any currently available, and be a wonderful illustration of its potential power.