Data-centric knowledge is an approach that explicitly utilizes [[Data Science]] concepts and [[PKC|modern data manipulation instruments]] to organize knowledge. The main driver for organizing knowledge in a data-centric manner is [[Moore's Law]], which points out the causal connection between the physical dimensions of data manipulation instruments and their impact on socio-technical dynamics. Phenomenologically, [[Moore's Law]] established a functional relationship between dimensionless/scale-free data and the observable speed and scale of decision-making in commodity devices. Moore's insight enabled a very large range of application areas to utilize data processing instruments in ways that were not possible before, including the management and acquisition of human knowledge. [[MU]] intends to help participants understand the principles of [[Data Science]] and use [[popularly-available data processing instruments]] to interact with the content data in their own domains of interest.


=Data-Centric Knowledge in the Context of [[MU]]=
Knowledge is represented as a special kind of data: it is grounded in raw data and computed from previously established information content under the unifying context of [[MU]] data operations. Every piece of knowledge must pass through the following stages to be given a representable handle for the ongoing integration of knowledge content (a minimal data-structure sketch follows the list):
# Grounding Raw Data: This data set is collected from widely deployed user terminals or certified data sensors, and should always be annotated with timestamps and spatial tags that explicitly specify who collected the data, and when and where it was collected. This raw data content, especially the timestamps and the location/account that provided the data, is later used as a reference to determine the authenticity of the data.
# Inferred Information: Information content is ordered and prioritized by filtering it against the previously mentioned raw data. This filtering procedure is carried out by a set of computational inference tools whose source code is version-controlled under [[MU]]-compliant rules. Computational procedures specified using neural networks, Bayesian belief networks, System Dynamics models, and other data-intensive inference mechanisms have their training data sets included as part of the version-controlled data content.
# Action of Acknowledgement: Knowledge is represented as a set of causal relations that are explicitly coded as executable programs/contracts defined in [[MU]]-compliant [[PKC]]s. An action of acknowledgement can be triggered automatically by verified raw data and programmatically computed information content, or semi-automatically through human-in-the-loop authorization. The event of acknowledgement can be represented as a piece of authenticated data that possesses pragmatic value, such as a token of appreciation, an honor badge, or a cash payment. It is this event-of-acknowledgement data that [[MU]] registers as the representation of knowledge content.
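The three stages can be pictured as a small data pipeline. Below is a minimal sketch in Python; the class names (<code>RawRecord</code>, <code>InferredInfo</code>, <code>Acknowledgement</code>) and the use of a SHA-256 fingerprint for authenticity are illustrative assumptions, not part of any [[MU]] specification.

<syntaxhighlight lang="python">
# A minimal sketch of the three-stage pipeline described above.
# All names are illustrative, not part of any MU specification.
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RawRecord:
    """Stage 1: grounding raw data, annotated with who/when/where."""
    account: str      # who provided the data
    timestamp: str    # when it was collected (ISO 8601)
    location: str     # where it was collected
    payload: str      # the observed content itself

    def fingerprint(self) -> str:
        """Content hash used later as an authenticity reference."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class InferredInfo:
    """Stage 2: information filtered from raw data by a
    version-controlled inference tool."""
    source_fingerprints: tuple   # raw records it was computed from
    tool_version: str            # version-controlled inference code
    content: str


@dataclass(frozen=True)
class Acknowledgement:
    """Stage 3: an authenticated event-of-acknowledgement that
    carries pragmatic value (badge, token, payment, ...)."""
    info: InferredInfo
    acknowledged_by: str   # automated rule or human-in-the-loop
    value: str             # e.g. "honor badge" or "cash payment"
    at: str                # timestamp of the acknowledgement event


raw = RawRecord("alice", "2022-03-08T03:05:00Z", "lab-42", "reading: 17.3")
info = InferredInfo((raw.fingerprint(),), "inference-tool@v1.2", "within range")
ack = Acknowledgement(info, "bob", "honor badge", "2022-03-08T04:00:00Z")
</syntaxhighlight>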
==Intellectual Framework==
Data-centric knowledge management differs from traditional knowledge management by representing all knowledge content in a logically coherent symbol system and machine-processable language, or [[meta language]]. This language is inherently independent of domain knowledge and can be customized to satisfy the representational needs of a wide range of domain-specific knowledge. The feasibility of using a small set of symbols to represent and reason about a wide range of systems has been demonstrated in the following publications<ref>{{:Thesis/A Meta-language for Systems Architecting}}</ref><ref>{{:Paper/Algebra of Systems}}</ref><ref>{{:Thesis/The Algebra of Open and Interconnected Systems}}</ref>.
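As a toy illustration of such a domain-independent symbol system, statements from arbitrary domains can be stored and queried as uniform (subject, relation, object) triples. The vocabulary below is invented for the example and is not drawn from the cited publications.

<syntaxhighlight lang="python">
# A minimal sketch of a domain-independent symbol system: every statement,
# from any domain, is stored as a (subject, relation, object) triple.
Statement = tuple[str, str, str]

knowledge: list[Statement] = [
    ("water", "boils_at", "100C"),          # physics
    ("invoice-7", "owed_by", "acme-corp"),  # accounting
    ("lemma-3", "depends_on", "axiom-1"),   # mathematics
]

def about(subject: str, kb: list[Statement]) -> list[Statement]:
    """Query the same small symbol system across all domains."""
    return [s for s in kb if s[0] == subject]

print(about("water", knowledge))  # [('water', 'boils_at', '100C')]
</syntaxhighlight>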
==Universal Knowledge Representation==
Data-centric knowledge subscribes to a central thesis: that all knowledge content can be universally represented by one data type composed of [[ordered relation]]s. Its universal applicability is based on the mathematical claim of [[Kan Extension]]. Kan extension states that all concepts and idealized knowledge are [[representable]] through [[functor]]s, which are a kind of [[directed relation]]. This means that [[knowledge]] of any kind can be [[Representable Functor|represented]] using [[ordered relation|ordered entries]] of concrete data points stored in scalable databases.
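One standard way to state this claim precisely is the pointwise formula for a left Kan extension. For functors <math>K : \mathbf{C} \to \mathbf{D}</math> and <math>F : \mathbf{C} \to \mathbf{E}</math>, and when the relevant colimits exist,

<math>(\mathrm{Lan}_K F)(d) \;\cong\; \int^{c \in \mathbf{C}} \mathbf{D}(Kc, d) \cdot Fc,</math>

where <math>\mathbf{D}(Kc, d)</math> is a hom-set of arrows (directed relations) and <math>\cdot</math> is the copower: every value of the extended functor is assembled from nothing more than arrows and previously stored values.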
===Composability and Universality===
Given the universal assumption of knowledge content representability, content can always be composed from these [[arrow]]-like universal components. The direct connection between composability and universality is a crucial insight for learning and teaching knowledge based on [[Data Science]]. One may consider [[Data Science]] an extension of [[quantum mechanics]], in the sense that it is a scientific language explicitly designed to encompass data at all possible physical scales and in all forms. All scientific hypotheses of [[quantum mechanics]] are grounded in observable data and in the interpretive mechanisms applied to those data. Therefore, instead of focusing only on the physical meaning of observable data, we can use the science of data interpretation as a generalized tool for all other areas of intellectual work. In any case, the logic of data is the grounding currency of science, and all data interpretation must follow a [[consistency|consistent]] set of logical rules. Even if one tries to extend the scope of certain logical assertions, that scope is itself denoted in a [[soundness|logically sound]] data set.
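To make the arrow-like composition concrete, here is a minimal Python sketch in which knowledge content is stored as ordered relations (sets of pairs) and new content is derived purely by composing them. The sensor data is invented for the example.

<syntaxhighlight lang="python">
# Knowledge content as ordered relations (sets of pairs) that compose
# like arrows. Names and data are illustrative only.
Relation = set[tuple[str, str]]

def compose(r: Relation, s: Relation) -> Relation:
    """Relational composition: (a, c) is in the result iff some b links them."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

# Two pieces of "knowledge" stored as ordered entries:
observed_by = {("reading-17", "sensor-A"), ("reading-18", "sensor-B")}
located_at  = {("sensor-A", "lab-42"), ("sensor-B", "lab-7")}

# Composition derives a new ordered relation from existing ones:
print(compose(observed_by, located_at))
# {('reading-17', 'lab-42'), ('reading-18', 'lab-7')}  (order may vary)
</syntaxhighlight>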
===Rosetta Stone of Knowledge Content===
At its core, leveraging the power of data science rests on the [[Curry-Howard-Lambek Correspondence]], an abstract form of [[Rosetta Stone]] that provides a framework for automated knowledge management. This mathematical observation shows that data is the exchange currency across three areas of knowledge representation: the logical, the algorithmic, and the categorical. This cross-referencing mechanism allows one to invoke an adequate model of computation on the available data and provides a unifying strategy for teasing out knowledge using the computational resources at hand. It thereby enables knowledge processing automation at scale, leveraging existing commodity tools for data computation and communication, and broadens and deepens both the scope and the speed of knowledge management.
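A small, self-contained Python illustration of the correspondence: the logical equivalence between <math>A \wedge B \to C</math> and <math>A \to (B \to C)</math> appears in code as currying, and categorically as the exponential adjunction. The function names are the conventional ones, not drawn from this wiki.

<syntaxhighlight lang="python">
# One fact read three ways: a program transformation (currying), a proof
# transformer between two forms of an implication, and the exponential
# adjunction of category theory.
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def curry(f: Callable[[A, B], C]) -> Callable[[A], Callable[[B], C]]:
    """From a proof of A ∧ B → C, derive a proof of A → (B → C)."""
    return lambda a: lambda b: f(a, b)

def uncurry(g: Callable[[A], Callable[[B], C]]) -> Callable[[A, B], C]:
    """And back again: the two proposition forms are interchangeable."""
    return lambda a, b: g(a)(b)

add: Callable[[int, int], int] = lambda x, y: x + y
assert curry(add)(2)(3) == add(2, 3) == 5
assert uncurry(curry(add))(2, 3) == 5
</syntaxhighlight>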
===[[Effective Data Processing]]===
With a unifying data representation and complementary models of computation for dealing with data, it is likely that a significant number of pragmatic problems can find effective data processing algorithms or strategies to digest data and thereby present useful information for decision-making. [[Data-centric knowledge]] is about exploiting this principle in all areas of data collection, data refinement, and data publication to maximize the opportunities for identifying valuable content in available data.
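Below is a minimal Python sketch of the collect/refine/publish loop; the readings and the outlier threshold are illustrative assumptions.

<syntaxhighlight lang="python">
# A minimal sketch of the collect -> refine -> publish loop.
from statistics import mean

def collect() -> list[float]:
    """Collection: gather raw measurements (stubbed here)."""
    return [17.3, 17.9, 41.0, 18.1, 17.7]

def refine(readings: list[float], limit: float = 30.0) -> list[float]:
    """Refinement: filter out implausible outliers before analysis."""
    return [r for r in readings if r <= limit]

def publish(readings: list[float]) -> str:
    """Publication: digest the data into a decision-ready summary."""
    return f"{len(readings)} readings, mean {mean(readings):.2f}"

print(publish(refine(collect())))  # "4 readings, mean 17.75"
</syntaxhighlight>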
==Observability of Data-Centric Knowledge==
[[MU]] is about bringing the [[power of data]] to both individual and organizational awareness. This means that data of many kinds will be continuously processed and reported to support project and resource management in general. Making data assets observable in adequate reporting formats profoundly improves the quality and quantity of human and organizational activity.
===[[Intuitive Data Presentation]]===
The goal of learning is to condition the mind to cognitively handle the information presented. Therefore, presenting data through adequate user interfaces, and allowing decision makers to intuitively navigate the data content space, is a crucial part of knowledge management. For instance, modern browsers ship with rich media rendering, programmable interactivity, and even machine learning algorithms that can provide context-sensitive interactive experiences to users. This is an essential part of [[data-centric knowledge]] management.
===Report Generation as a Means to an End===
It is necessary to note that observability is just a means to an end. The goal is not to generate reports; the goal is to elevate awareness through data reporting in context. Therefore, the format and frequency of report generation and data visualization is a form of art, and it needs to be integrated extensively with [[UI/UX]] design efforts.
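For example, the same data asset can be rendered at different frequencies and granularities for different audiences; the edit counts below are invented for the sketch.

<syntaxhighlight lang="python">
# One data asset, two reporting formats chosen for context.
daily = {"2022-03-06": 12, "2022-03-07": 30, "2022-03-08": 9}

def detailed_report(counts: dict[str, int]) -> str:
    """High-frequency view for operators."""
    return "\n".join(f"{day}: {n} edits" for day, n in sorted(counts.items()))

def summary_report(counts: dict[str, int]) -> str:
    """Low-frequency view for decision makers: awareness, not raw numbers."""
    total = sum(counts.values())
    peak = max(counts, key=counts.get)
    return f"{total} edits this period; activity peaked on {peak}."

print(summary_report(daily))
# 51 edits this period; activity peaked on 2022-03-07.
</syntaxhighlight>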
===[[High Availability]]===
Data services must be available at the time of use. This requires constant maintenance and regular exercise. One should therefore not only practice provisioning, backup, and restore procedures as knowledge management skills, but also constantly use various data/knowledge content services to learn about availability issues and the potential pitfalls that lead to system downtime. More importantly, making one's own computable knowledge content highly available incrementally becomes a reputation-establishing process. It can also be considered a community service: when one keeps content routinely updated and refined, the overall perceived value of the knowledge content also increases.
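As one concrete rehearsal target, SQLite's online backup API (available in Python's standard <code>sqlite3</code> module) lets one practice the backup/restore cycle without taking a service offline. The file names are illustrative.

<syntaxhighlight lang="python">
import sqlite3

def backup(src_path: str, dst_path: str) -> None:
    """Copy a live SQLite database to a backup file while it stays online."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # standard-library online backup API (Python 3.7+)
    finally:
        src.close()
        dst.close()

# Rehearse the full cycle, not just the happy path:
#   1. provision: create knowledge.db and load content
#   2. back up:   backup("knowledge.db", "knowledge.bak")
#   3. restore:   backup("knowledge.bak", "knowledge.db"), then verify queries
</syntaxhighlight>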
=Areas of Tasks regarding Data Management=
Anshul Tiwari, a data engineer, stated that there are at least 11 areas of data management activity<ref>Tiwari, Anshul (Feb 16, 2022). ''What are the 11 key areas of Data Management and specific data roles?''. IT k Funde.</ref>:
# Data Governance
# Data Architecture
# Data Modeling and Design
# Data Storage and Operations
# Data Security
# Data Integration
# Documentation and Content
# Master Data Management
# Data Warehouse and Business Intelligence
# Meta Data Management
# Data Quality
=References=
<references/>

=Related Pages=
[[Category:Kan Extension]]
[[Category:Representable]]
[[Category:Smart Contract]]
[[Category:Knowledge Representation]]
[[Category:Report Generation]]
