Knowledge Machine Research Group
Kiev, Ukraine, E-mail: gkm-ekp at users.sf.net
Intellectual activity, knowledge, information, data... An attempt to define it in an applicable way
Intellect Modeling Discussion Paper
By Konstantin M Golubev
Revised 9-Dec-1999, 1-Sep-2008, 15-Oct-2010, October-2017
"We are drowning in information but starved for knowledge."
(John Naisbitt, author of Megatrends)
If you have no problems, simply enjoy your life! You do not need any papers on intellectual activity, knowledge and so on. This rather complex paper is intended for those who have problems, want to solve them and don't know how. It can also be helpful for those whose business is to help others solve their problems and find opportunities.
What is intellectual activity?
As we know, the main tool that humans use to survive is intellect. Without intellect all other tools are useless and can even do harm instead of good. Let's try to understand what the intellectual activity of a person is and how to make it fruitful.
"He possesses two out of the three qualities necessary for the ideal detective. He has the power of observation and that of deduction. He is only wanting in knowledge, and that may come in time." (Mr Sherlock Holmes)
THE SIGN OF FOUR, p.91. Sir Arthur Conan Doyle. The Penguin Complete Sherlock Holmes. With a preface by Christopher Morley. Penguin Books, 1981.
We would like to mention that the Sherlock Holmes stories were written by Sir Arthur Conan Doyle to illustrate the methods of intellectual activity of brilliant experts like Dr. Joseph Bell of the Edinburgh Infirmary (see the preface by Christopher Morley). Sir Arthur Conan Doyle was a known expert too. Therefore we believe that he described intellectual activity correctly.
Following Mr Sherlock Holmes, we can identify the following steps of intellectual activity:
1. Observation - getting data and information
2. Producing propositions, based on the knowledge
3. Selection and verification of the most appropriate propositions
4. Memorizing - converting data to information and new knowledge item creation
5. Abstraction finding – building artificial objects representing group of real objects, featuring typical signs of group
1st step. Observation
The first step is the collection of data and information. Without it all other steps are senseless. We propose to treat as 'data' everything that can be perceived by a person: text, sound, pictures, multimedia etc. The part of the data that is directly connected to the knowledge possessed by that person we propose to call 'information'. It is the part that is really involved in problem solving. For example, imagine that you are listening to a very interesting lecture and the speaker sometimes uses a language that you don't understand. The part of the lecture in the foreign language is simply 'data' for you, just like music, while the part in your own language may contain valuable information that helps you. Obviously this part varies with the experience of the person: what is valuable for one person may be useless for another.
2nd step. Producing propositions based on the knowledge
Knowledge is a simplified model of a certain human intellect that can help to answer questions like “What is it and what is it called?” and “What should we do in the present situation?” Elementary items of knowledge have the form of "If Situation Then Action" rules, sometimes called productions, as introduced by the famous Artificial Intelligence experts Allen Newell and Herbert Simon, developers of the General Problem Solver.
Taking this definition into account, we propose to define knowledge elements as 3-part stable memory patterns. Each pattern contains:
1. Description of a problem - data, memorized at the time of a
2. Name of a problem - should be unique text
3. Description of a problem solution - the actions needed for a solution verification and application, expected result
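The three-part pattern above can be sketched as a small data structure. This is only a minimal illustration under our own assumptions: the names KnowledgeElement and KnowledgeBase, the representation of a description as a set of idea texts, and the car example are hypothetical and are not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeElement:
    """One three-part memory pattern (hypothetical representation)."""
    description: frozenset  # ideas describing the problem
    name: str               # unique name of the problem
    solution: str           # actions to verify and apply, expected result


class KnowledgeBase:
    """Stores elements and enforces that every problem name is unique."""

    def __init__(self):
        self._by_name = {}

    def add(self, element):
        if element.name in self._by_name:
            raise ValueError(f"duplicate problem name: {element.name!r}")
        self._by_name[element.name] = element

    def get(self, name):
        return self._by_name[name]

    def __len__(self):
        return len(self._by_name)


# Illustrative content only.
kb = KnowledgeBase()
kb.add(KnowledgeElement(
    description=frozenset({"The engine does not start.",
                           "The fuel gauge shows empty."}),
    name="Empty fuel tank",
    solution="Check the fuel level; refuel; expect the engine to start.",
))
```

Keeping the name as a unique key is the only constraint the sketch enforces; how descriptions and solutions are best encoded is left open.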
How to describe a problem and its solution?
People, as a rule, use words to describe their problems. There are a great many words, and the number of their combinations is countless. It may seem impossible to find sense in such a huge amount of data. Fortunately, what really matters is ideas. Words are just like the clothes that people wear: the same person can wear different garments and remain unchanged as a personality. Therefore we believe that a description may be transformed into a set of standard ideas. We propose to define "idea's text" as a standard text unequivocally defining a specific side of a situation, i.e. representing a stable structure in the right part of the brain responsible for images of the world. We think that intellectual activity is based on ideas as images of the world, not on the specific words representing them. This text should not include any excessive words, and the words included should always have the right sense. For example, people may say: "It looks so green to me"; "I think it's a greenish stuff"; "It reminds me a fresh grass." The idea's text should be: "The color is green." Note that people can express the same idea with absolutely different words.
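The mapping from many wordings to one idea's text can be sketched as a simple lookup. This is a minimal sketch, assuming that a human expert has already written the table of phrasings by hand; the names IDEA_PHRASINGS and to_idea_text are hypothetical, and the code only looks wordings up, it does not discover ideas by itself.

```python
import re

# Written by a human expert: each standard idea text with the wordings
# that express it. The single entry repeats the example from the text.
IDEA_PHRASINGS = {
    "The color is green.": [
        "It looks so green to me",
        "I think it's a greenish stuff",
        "It reminds me a fresh grass",
    ],
}


def _normalize(text):
    """Lower-case the text and drop punctuation and extra spaces."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(text.split())


# Reverse index: normalized wording -> idea text.
PHRASING_TO_IDEA = {
    _normalize(phrase): idea
    for idea, phrases in IDEA_PHRASINGS.items()
    for phrase in phrases
}


def to_idea_text(statement):
    """Return the standard idea text for a statement, or None if unknown."""
    return PHRASING_TO_IDEA.get(_normalize(statement))
```

A statement that is not in the table returns None, which marks the place where a person has to decide whether it is a new idea or a synonym of an existing one.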
And what could the number of such ideas be?
The famous American psychologist Mr Cattell, in his work "Universal Index of Source Traits", attempted to propose a list of items for describing human personality features. The preliminary list included 4,550 different items used by many authors. After synonyms were excluded, only 171 were left. We have got the same result from our own experience (medicine, art, banking, business etc). The number of ideas used to describe problems and their solutions in a particular area of knowledge that can be learned by one person never exceeded several hundred. For example, we have found that Oriental Acupuncture (a medical theory) is based on fewer than 500 ideas, and yet it is a great body of knowledge. It seems that this limit on the number of ideas may be a restraint of the individual human brain. Take into account that the human brain is a multi-floor building and what we see is only the upper floor. We think, though, that some areas are complex by nature. For example, the World Health Organization's total list of possible diseases exceeds several tens of thousands of items, absolutely beyond the capacity of any expert.
We should note that only humans have the ability to define ideas and to exclude synonyms. It is a highly creative intellectual activity.
An expert produces propositions based on his own knowledge:
"As a rule, when I have heard some slight indication of the course of events, I am able to guide myself by the thousands of other similar cases which occur to my memory." (Mr Sherlock Holmes)
THE RED-HEADED LEAGUE, p.176
3rd step. Selection and verification of the most appropriate propositions
"...you now pretend to deduce this knowledge I could only say what was the balance of probability. I did not at all expect to be so accurate." (Mr Sherlock Holmes)
THE SIGN OF FOUR, p.93
"For example, observation shows me that you have been to the Wigmore Street Post-Office this morning, but deduction lets me know that when there you dispatched a telegram... The rest is deduction... Why, of course I knew that you had not written a letter, since I sat opposite to you all morning. I see also in your open desk that you have a sheet of stamps and a thick bundle of postcards. What could you go into the post-office for, then, but to send a wire? Eliminate all other factors, and the one which remains must be the truth." (Mr Sherlock Holmes)
THE SIGN OF FOUR, p.91
4th step. Memorizing - converting data to information and new knowledge item creation
If the problem was solved, with proven success or proven failure, a new memory element should appear that includes the problem's description, the problem's name and the description of the problem's solution. Data is transformed into information, and any new problem with a similar description will cause a possible solution to be proposed on the basis of this knowledge element.
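Memorizing and later proposing a solution for a similar description can be sketched as follows. This is a minimal sketch under our own assumptions: descriptions are sets of idea texts, similarity is measured with the Jaccard index, and the threshold of 0.5 is arbitrary. The paper does not prescribe any of these choices, and the names Memory, memorize and propose are hypothetical.

```python
def jaccard(a, b):
    """Share of ideas that two descriptions have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class Memory:
    def __init__(self):
        self.elements = []  # (description, name, solution) triples

    def memorize(self, description, name, solution):
        """Store a solved problem (success or failure) as a new element."""
        self.elements.append((frozenset(description), name, solution))

    def propose(self, description, threshold=0.5):
        """Return (similarity, name, solution) for stored problems whose
        description is similar enough, most similar first."""
        scored = [
            (jaccard(description, stored), name, solution)
            for stored, name, solution in self.elements
        ]
        similar = [item for item in scored if item[0] >= threshold]
        return sorted(similar, key=lambda item: item[0], reverse=True)


# Illustrative content only.
memory = Memory()
memory.memorize({"no sound", "cable unplugged"},
                "Unplugged speaker",
                "Plug the cable in; expect sound.")
proposals = memory.propose({"no sound", "cable unplugged", "power light on"})
```

Here the new description shares two of three ideas with the stored one, so the stored solution is offered as a proposition; an unrelated description yields no proposition at all.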
5th step. Abstraction finding – building artificial objects representing group of real objects, featuring typical signs of group
Abstraction analysis is intended to find groups of similar objects and the regularities they are based on. The central point is the ability of the human brain to find descriptions of similar objects in memory. The brain can do it almost instantly.
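Building an artificial object that carries the typical signs of a group can be sketched as a frequency count. This is a minimal sketch, assuming that the group of similar objects has already been found and that a sign counts as typical when at least three quarters of the members have it; the function name typical_signs, the share of 0.75 and the bird example are our own illustrative choices.

```python
from collections import Counter


def typical_signs(descriptions, min_share=0.75):
    """Ideas present in at least min_share of the group's descriptions."""
    descriptions = [set(d) for d in descriptions]
    if not descriptions:
        return set()
    counts = Counter(idea for d in descriptions for idea in d)
    needed = min_share * len(descriptions)
    return {idea for idea, n in counts.items() if n >= needed}


# Illustrative group of four real objects.
group = [
    {"has wings", "has feathers", "can fly"},
    {"has wings", "has feathers", "can fly", "sings"},
    {"has wings", "has feathers", "can swim"},
    {"has wings", "has feathers", "can fly"},
]
abstract_object = typical_signs(group)
```

The resulting abstract object keeps "has wings", "has feathers" and "can fly" and drops the signs that only single members show.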
Data Processing, Information and Knowledge Management
The following are quotations from Datamation
What is knowledge management?
Knowledge management is "a set of practices that includes identifying and mapping intellectual assets within organizations, generating new knowledge ..., making vast amounts of corporate information accessible, sharing best practices, and applying management strategies and technology that support all of the above." --CAP Ventures, http://www.capv.com/index.html
It is "a business activity with two primary aspects:
Treating the knowledge component of business activities as an explicit concern of business reflected in strategy, policy, and practice at all levels of the organization;
Making a direct connection between an organization's intellectual assets-both explicit (recorded) and tacit (personal know-how)--and positive business results."
-"Knowledge at Work," an on-line publication, http://www.knowledge-at-work.com
It is "a business practice that refers to the concept of harnessing information and knowledge, and making it effortlessly available to all employees to help them do their jobs more effectively." --Doculabs http://www.doculabs.com
Building a knowledge base
At Broderbund Software of Novato, Calif., internal and Web-based Inference and similar products create casebases from unstructured data, much as database management systems create databases from structured data. These casebases house 7,000 reported problems and solutions for 700 products, says Jim Wilmott, Broderbund's product-support manager. Half of all users' problems are resolved by using the company's Web-based question-and-answer casebase, and nearly three-quarters of the users prefer on-line help to a free phone call, Wilmott says.
The following are quotations from Fulcrum White Paper (www.fulcrum.com).
... Gartner Group notes that:
Information is being created in today’s enterprises at a rate that staggers the imagination of all but IT professionals. Some is being stored in individual "silos" with access only by those directly involved in the function such as departmental staff. Other information is stored in thousands of linear miles of file cabinets in offices, shared departmental areas, libraries and corporate repositories. There is little representation of currency (timeliness) or other indicator of value…(and information often goes) unsighted because those in need are unaware of its existence…
The Knowledge Management Brief Knew Language for New Leverage concludes:
(There is an) overwhelming necessity for adopting a corporate knowledge management strategy that has been created by the pervasiveness of computers and the huge amounts of information they help us create – and, specifically, the networked organization that makes global access and widespread sharing possible. Too much information is almost as bad as not enough. You have to identify what’s relevant, important, and effective.
Knowledge organizations have been characterized as enterprises in which the key asset is knowledge. Their competitive advantage comes from having and effectively using knowledge. Examples include the law office, accounting firm, marketing firm, software company, most government agencies, universities, the military, and significant parts of most manufacturing companies, whether they make cookies or cars.
Montague Institute White Paper (www.montague.com)
What is a knowledge base?
Many companies are using a knowledge base to capture and deploy their intellectual capital. For example, at consulting firm Booz, Allen, and Hamilton, a knowledge base contains the following kinds of information:
Searchable database with links to job histories, resumes, etc.;
Listing of calendar items, business news, and personnel information;
Informative and interactive training materials;
"Forums" or discussion groups;
Links to departmental Web pages;
Searchable database of consulting specialties
Ideas and examples for clients and external groups (e.g. the media).
Why are knowledge bases important?
Knowledge bases are key to creating, preserving, and deploying intellectual capital -- the know how of employees and the databases, reports, and other intellectual assets they produce. The information they contain is:
less expensive to store and disseminate than paper reports;
easier to use for problem solving and decision making than individual computer databases;
easier to search than physical libraries.
What is knowledge base publishing?
Knowledge base publishing is a term we use to describe the process of creating, maintaining, and "promoting" a knowledge base. It's a combination of print and electronic formats as well as a new system of relationships among authors and readers.
While traditional publishing is linear and segmented, knowledge base publishing is weblike and interconnected. The difference is most striking when different departments load existing documents onto a corporate Web site, when they use electronic press releases instead of hard copy releases, or when they give customer access to internal databases, such as ordering and shipping information.
Dataware White Paper (www.dataware.com)
Executives in large organizations know that they must develop better techniques to manage knowledge, which is increasingly becoming their greatest asset. Organizations currently create and maintain knowledge in isolated systems targeted at specific workgroups. For users outside of the workgroup, that knowledge is virtually invisible. Their only options are to spend time looking for it, recreate it, or do their job without it. Each of these options has a price: time, energy and bad decisions.
Brainstorming tools help inspire creative thinking and convert tacit into explicit knowledge. These end user applications help categorize, organize and identify knowledge resources and are therefore useful knowledge creation tools.
"In the rosy future I envision, categorization and organization of knowledge will be a core competence for every firm. This will require strategic thinking about what knowledge is important; development of a knowledge vocabulary (and a thesaurus to accommodate near misses); prolific creation of indices, search tools and navigation aids; and constant refinement and pruning of knowledge categories. Knowledge editors will have to combine sources and add context to transform information into knowledge."
There are many attempts to define knowledge management.
One part is connected with general asset management theory: if you have assets, you should manage them, have warehouses for them etc. It results in the development of a super-database containing all possible kinds of data sources, with a super search and retrieval engine accessed through a universal type of client software, mostly an Internet browser. These tools are intended for knowledge consumers.
The other part is connected with knowledge ecology, virtual team development and communities of practice, providing knowledge exteriorization and possibilities for learning. This part is for knowledge producers.
We think that there should be a part whose goal is to pass a significant part of intellectual work to machines. Why? The ability of a person to learn, and therefore to apply knowledge, is limited: remember school, university etc. The amount of knowledge is growing, but who will learn and apply it? Intranet/Internet and the emerging e-knowledge systems methodology may allow all people to learn in an adaptive way and to use great knowledge immediately. We think that the advantages are obvious, which is why our research group is developing e-knowledge publishing. It is working; you may test it at http://gkm-ekp.sf.net
An attempt to apply definitions made in the paper
We think that any external data source contains the following parts:
I. Data
It is everything that can be perceived by a person: text, sound, pictures, multimedia. We believe that the main task on which Information Technologies (IT) are now oriented is data management. All kinds of hardware and software are well suited to capturing and distributing data. But who needs this data? If it is collected without regard to the people using it, it is senseless. Ordinary users don't need raw data and cannot do such highly skilled work as data mining.
II. Information
That part of data which is directly connected to the knowledge possessed by the perceiving person. Obviously this part varies with the experience of each person.
Usually what users expect from IT is information that helps them to solve their problems. For example, if you want to fly by plane, you would like to get a flight schedule. But the data that you get from computers becomes information only when it is relevant to your experience, to your own knowledge. If you don't know what an airplane is, a flight schedule will be useless to you. Therefore the main difficulty for users is how to find data that will become information. That's why search and categorization systems are an extremely important part of IT. Users are not very interested in where the needed data resides: in databases, document management systems, HTML files etc. They would prefer to have a single access point to all sources, including human experts. We see from the quotations above that understanding of this fact leads many companies to implement Knowledge Management (KM) systems. But it is a rather complicated task to find information with search engines that are based on morphological retrieval indexes and not on searching for ideas. Try, for example, to use Internet search engines to find the definition of an airplane if you don't know what it is. Categorization is easier to use, but it is too rigid and relatively poor for a good description of data sources.
III. Explicit knowledge (external)
From our point of view this is the constant and most valuable part of a data source, not depending on any person. And it is the part that we learn from.
We have already proposed to define knowledge elements as 3-part stable memory patterns:
1. Description of a problem
2. Name of a problem
3. Description of a problem solution
There is 1st level knowledge, concrete, which was developed while solving concrete problems, and 2nd level knowledge, abstract, which was developed on the basis of concrete knowledge: typical situations and solutions, general rules etc.
IV. Individual knowledge (tacit)
Types of tacit knowledge include hands-on skills, special know-how, intuitions, and the like. Michael Polanyi, the first to distinguish tacit from explicit knowledge, stated "We can know more than we can tell."
Certainly, almost any kind of knowledge is initially tacit. All intellectual activity of a person goes on in the subconscious and therefore does not need words. Words appear at the level of consciousness (co-knowledge), which is many times poorer than the subconscious.
Any kind of data will be useless if a person has very little knowledge of his own. Therefore people need to learn in order to work effectively, and in many cases IT can help. For example, you can use distance learning... But evidently it is a very time-consuming activity and, therefore, the amount of knowledge learned is small compared with all existing knowledge.
There is a great amount of applicable knowledge in the world. Before using it, a person should learn it. Learning all of it is a task far beyond the capacity of any person. We are becoming richer in knowledge but cannot use our treasures. Odd, isn't it?
We see the following solution. There is a need to develop a machine that is able to accept explicit knowledge found in printed and spoken sources (books, articles, databases) as knowledge elements (3-part stable memory patterns) and transform it into machine-simulated tacit knowledge in the form of intellectual activity support systems. We call it a knowledge machine. It was developed as an alternative to traditional Artificial Intelligence. The goal is to assist human intellect at every step of its activity, accept human knowledge and develop new knowledge together with people. The activity of IMK can be verified by a human expert at every stage. These e-knowledge systems can be used both for adaptive learning and on-line consulting.
The machines should assist during the 5 steps of intellectual activity:
1. Observation - getting data and information
2. Producing propositions, based on the knowledge
3. Selection and verification of the most appropriate propositions
4. Memorizing - converting data to information and new knowledge item creation
5. Abstraction finding – building artificial objects representing group of real objects, featuring typical signs of group
People may be given access to these systems through the Internet/Intranet. Since these machines have no human restrictions on the volume of knowledge, it will be possible to input all existing knowledge into them, and all people will be able to use it immediately for adaptive learning and on-line consulting.
All explicit knowledge could be converted by e-knowledge developers into 3-part knowledge elements based on ideas. Those thick manuals and extensive knowledge bases will then turn into small lists of hundreds of ideas.
The system assisting a human expert’s activity should comply with the following requirements, described by Arthur Conan Doyle in the Sherlock Holmes stories.
We would call it a knowledge machine.
Step 1 - Observation
1. A knowledge machine should have maximum possible information about a case before a judgment.
Step 2 - Producing propositions, based on knowledge
2. A knowledge machine should possess maximum possible knowledge in a sphere of implementation.
3. A knowledge machine should possess no excessive knowledge, should have nothing but the tools which may help in doing work.
4. Getting indication of the course of events, a knowledge machine should be able to guide itself by other similar cases which occur to its memory.
5. A knowledge machine should have an ability to take into account not only descriptions of situations in its memory but results as well, providing a possibility to reconstruct a description from a result, i.e. if you told it a result, it would be able to evolve what the steps were which led up to that result.
6. Possessing information about the great number of cases, a knowledge machine should have an ability to find a strong family resemblance about them, i.e. to find templates of typical cases.
7. A knowledge machine should have an ability to explain the grounds of its conclusion.
8. A knowledge machine should arrive at the conclusion within a few seconds after getting a description of a case.
9. A knowledge machine should focus on the most unusual in descriptions of situations.
Step 3 - Elimination of impossible propositions
10. A knowledge machine should have an ability to point out all impossible propositions.
Step 4 - Selection and verification of the most appropriate propositions
11. A knowledge machine should estimate the level of confidence of its propositions.
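Requirements 7, 10 and 11 can be illustrated together with the post-office example quoted earlier. This is only a sketch under our own assumptions: a stored case lists the signs that support it and the observations that rule it out, confidence is simply the share of supporting signs observed, and impossible propositions are pointed out with their reasons rather than deleted. The names Case, Proposition and evaluate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    signs: frozenset                      # ideas that support the case
    excluded_by: frozenset = frozenset()  # ideas that rule the case out


@dataclass
class Proposition:
    name: str
    confidence: float   # share of the case's signs that were observed
    matched: list       # grounds of the conclusion (requirement 7)
    ruled_out_by: list  # reasons why it is impossible (requirement 10)


def evaluate(observed, cases):
    """Score every case; possible propositions come first."""
    observed = set(observed)
    propositions = []
    for case in cases:
        matched = sorted(case.signs & observed)
        ruled_out = sorted(case.excluded_by & observed)
        confidence = len(matched) / len(case.signs) if case.signs else 0.0
        propositions.append(
            Proposition(case.name, confidence, matched, ruled_out))
    propositions.sort(key=lambda p: (bool(p.ruled_out_by), -p.confidence))
    return propositions


visit = frozenset({"visited the post office"})
cases = [
    Case("Posted a letter", visit, frozenset({"wrote no letter this morning"})),
    Case("Bought stamps", visit, frozenset({"has stamps in the desk"})),
    Case("Bought postcards", visit, frozenset({"has postcards in the desk"})),
    Case("Sent a telegram", visit),
]
observed = {"visited the post office", "wrote no letter this morning",
            "has stamps in the desk", "has postcards in the desk"}
results = evaluate(observed, cases)
```

The first result is "Sent a telegram"; the other three remain in the list, each with the observation that rules it out, so a person can check the reasoning and make the final decision.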
AI technologies such as expert systems and neural networks don’t comply with these requirements. This is one reason why human-AI interaction is complicated at present: people can hardly trust AI propositions.
An expert system is based on the idea of a decision tree: with every answer to the program's question, the direction of movement through the tree changes until a final leaf (decision) is reached.
So not all possible questions will be asked, and not the maximum of information will be received.
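The point about questions that are never asked can be seen in a toy decision tree. This is a minimal sketch with an invented tree about a car; it does not reproduce any particular expert system shell, and the names TREE, consult and all_questions are hypothetical.

```python
# A tiny hand-made decision tree: inner nodes ask a yes/no question,
# leaves are decisions. The content is illustrative only.
TREE = {
    "question": "Does the engine start?",
    "yes": {"question": "Is there an unusual noise?",
            "yes": "Inspect the drive belt",
            "no": "No fault found"},
    "no": {"question": "Do the lights work?",
           "yes": "Check the fuel supply",
           "no": "Charge the battery"},
}


def consult(tree, answers):
    """Walk the tree; return the decision and the questions actually asked."""
    asked = []
    node = tree
    while isinstance(node, dict):
        asked.append(node["question"])
        node = node["yes"] if answers[node["question"]] else node["no"]
    return node, asked


def all_questions(tree):
    """Collect every question that the tree contains."""
    if not isinstance(tree, dict):
        return set()
    return ({tree["question"]}
            | all_questions(tree["yes"])
            | all_questions(tree["no"]))


answers = {"Does the engine start?": False,
           "Do the lights work?": True,
           "Is there an unusual noise?": True}
decision, asked = consult(TREE, answers)
never_asked = all_questions(TREE) - set(asked)
```

With these answers the tree reaches "Check the fuel supply" after two questions, and the question about the unusual noise is never asked, although the user had an answer to it.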
The key elements are decision rules, not knowledge itself. There is not a word about the thousands of other similar cases or about typical cases.
As we see, expert systems were originally designed to be deduction machines. But it is not very reliable to entrust a machine with deciding what is absolutely impossible. We think that a more fruitful approach is to show the reasons for considering some hypotheses impossible. And only a human should make the final decision.
A neural network is based, as we know, on the idea of teaching a set of elements (neurons) by controlling the conductivity between them.
A neural network cannot explain the reasons for its own conclusion in terms that people can understand. So it is very hard to verify its activity and, therefore, to trust it.
An expert system is an example of the 'top-down' approach, in which particular instances of intelligent behavior are selected and an attempt is made to design machines that can replicate that behavior. A neural network is an example of the 'bottom-up' approach, in which there is an attempt to study the biological mechanisms that underlie human intelligence and to build machines that work on similar principles.
IMK technology complies with all 11 requirements and unites the 'top-down' and 'bottom-up' approaches. Any written or spoken human knowledge can be uploaded to IMK directly by any expert, even one not familiar with software coding. The IMK components are designed to create a ready-to-use software application from simple text files edited by people. IMK assists intellectual activity but does not replace people.
Expert systems | E-knowledge systems
Intended to replace human experts | Intended to assist human intellect
Based primarily on mathematics | Based on neurophysiology, psychology, knowledge management theory and mathematics
It is practically impossible to transform external knowledge sources directly into expert systems | A further advancement of traditional publishing; external knowledge sources (books, articles etc) may be easily transformed into e-knowledge systems
Based on the decision rules concept | Based on the general knowledge concept
The more complex an expert system is, the worse it works | The more complex an e-knowledge system is, the better it works
Development has many stages and is very expensive | Development has one stage and is relatively inexpensive
It is relatively hard to incorporate an expert system into other information systems because of the sequential nature of its data input and output | An e-knowledge system may be easily incorporated into any kind of information system because it supports a wide range of data input and output sources
It is practically impossible to use expert systems for learning, because they are not based on human knowledge | May be easily used as a front end for Distance Learning information systems, providing Adaptive Learning based on the Just-In-Time Knowledge concept
May not be used for new knowledge creation | May be used for new knowledge creation
You are welcome to knowledge that really helps you.
General Knowledge Machine Research Group