Data-centric knowledge
Data-centric knowledge is an approach that explicitly applies Data Science concepts and modern data manipulation instruments to organize knowledge. The main motivation for organizing knowledge in a data-centric manner comes from Moore's Law, which points to a causal connection between the physical dimensions of data manipulation instruments and their impact on socio-technical dynamics. Phenomenologically, Moore's Law relates dimensionless, scale-free data to the observable speed and scale of decision-making in commodity devices. Moore's insight opened data processing instruments to a very wide range of application areas that were not feasible before, including the acquisition and management of human knowledge. MU intends to help participants understand the principles of Data Science and use the latest data processing instruments to interact with the content data in their own domains of interest.
Data-Centric Knowledge in the Context of MU
Knowledge is represented as a special kind of data that is based on raw data and computed from previously established information content, under the unifying context of MU data operations. Every piece of knowledge needs to go through the following stages to be given a representable handle for ongoing integration of knowledge content:
- Grounding Raw Data: This data set is collected from widely deployed user terminals or certified data sensors. Each record should always be annotated with timestamps and spatial tags that explicitly specify who collected the data, when, and where. This raw data content, especially the timestamps and the location or account that provided the data, will be used as a reference to determine the authenticity of the data.
- Inferred Information: Information content is ordered and prioritized by filtering it against the raw data described above. This filtering procedure is conducted by a set of computational inference tools whose source code is version-controlled under MU-compliant rules. Computational procedures specified using neural networks, Bayesian belief networks, System Dynamics models, and other data-intensive inference mechanisms will have their training data sets included in the version-controlled data content.
- Action of Acknowledgement: Knowledge is represented as a set of causal relations that are explicitly coded as executable programs or contracts defined in MU-compliant PKCs. An action of acknowledgement can be triggered automatically by verified raw data and programmatically computed information content, or semi-automatically through human-in-the-loop authorization of the action. The event of acknowledgement can be represented as a piece of authenticated data that possesses pragmatic value, such as a token of appreciation, an honor badge, or a cash payment. It is the data on the event of acknowledgement that we register to represent knowledge content in MU.
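The three stages above can be sketched as a small data pipeline. This is a minimal illustration rather than an MU or PKC implementation: the class names (`RawRecord`, `Inference`, `Acknowledgement`), the averaging rule, the threshold, and the `"honor-badge"` reward are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RawRecord:
    """Grounding raw data: a value tagged with who, when, and where."""
    account: str          # who provided the data
    timestamp: datetime   # when it was collected
    location: str         # where it was collected
    value: float


@dataclass(frozen=True)
class Inference:
    """Inferred information computed from raw records by a versioned tool."""
    tool_version: str     # hypothetical version tag of the inference code
    score: float
    sources: tuple        # the raw records the score was computed from


@dataclass(frozen=True)
class Acknowledgement:
    """The event of acknowledgement, registered as knowledge content."""
    inference: Inference
    approved_by: Optional[str]  # None means the action was triggered automatically
    reward: str                 # hypothetical reward label


def infer(records, tool_version="v0.1"):
    """Toy inference step (an assumption): average the record values."""
    records = tuple(records)
    score = sum(r.value for r in records) / len(records)
    return Inference(tool_version, score, records)


def acknowledge(inference, threshold=0.5, approver=None):
    """Emit an acknowledgement only when the score reaches the threshold."""
    if inference.score < threshold:
        return None
    return Acknowledgement(inference, approver, "honor-badge")


records = [
    RawRecord("alice", datetime(2022, 2, 16, tzinfo=timezone.utc), "sensor-7", 0.8),
    RawRecord("bob", datetime(2022, 2, 17, tzinfo=timezone.utc), "sensor-9", 0.4),
]
ack = acknowledge(infer(records), approver="carol")
```

Passing `approver=None` models the fully automatic case, while a named approver models human-in-the-loop authorization. Because each `Acknowledgement` keeps a reference to its `Inference`, and through it to the raw records, the provenance chain stays inspectable.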
Universal Knowledge Representation
Data-centric knowledge subscribes to a central thesis: all knowledge content can be universally represented by one data type, composed of ordered relations. Its claim to universal applicability rests on the mathematical notion of the Kan extension. On this reading, all concepts and idealized knowledge are representable through functors, which are a kind of directed relation. This means that knowledge of any kind can be represented using ordered entries of concrete data points stored in scalable databases.
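As a minimal sketch of what "ordered entries of concrete data points" might look like in practice, the example below stores directed relations as rows in an in-memory SQLite table. The table name `relation`, its columns, and the sample facts are assumptions made for illustration. The example shows only the storage idea and makes no claim about Kan extensions themselves.

```python
import sqlite3

# One table holds every fact as an ordered (source, label, target) entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (source TEXT, label TEXT, target TEXT)")

# Hypothetical sample facts, each a directed relation from source to target.
facts = [
    ("sensor-7", "reported", "temperature=21C"),
    ("temperature=21C", "supports", "room is occupied"),
    ("room is occupied", "triggers", "ventilation on"),
]
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", facts)


def targets_of(source):
    """Return every target reachable from `source` in one step, in insertion order."""
    rows = conn.execute(
        "SELECT target FROM relation WHERE source = ? ORDER BY rowid", (source,)
    )
    return [row[0] for row in rows]


print(targets_of("sensor-7"))
```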
Composability and Universality
The assumption of universal representability implies that content can always be composed from these arrow-like universal components. The direct connection between composability and universality is a crucial insight for learning and teaching knowledge based on Data Science. Here Data Science may be considered an extension of quantum mechanics, or as including it: a scientific language explicitly designed to encompass data at all possible physical scales and in all forms. All scientific hypotheses of quantum mechanics are grounded in observable data and in the mechanisms for interpreting that data. Therefore, instead of focusing only on the physical meaning of observable data, we can use the science of data interpretation as a generalized tool for all other areas of intellectual work. In any case, the logic of data is the grounding currency of science, and all data interpretation must follow a consistent set of logical rules. Even if one tries to extend the scope of certain logical assertions, that scope must itself be denoted in a logically sound data set.
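Composition of arrow-like components can be illustrated with ordinary functions. The sketch below assumes that each arrow is a one-argument Python function. The helper names (`compose`, `identity`) and the three sample arrows are hypothetical. It checks on a sample input the two properties that make composition well behaved: associativity and the existence of an identity arrow.

```python
def compose(g, f):
    """Return the arrow 'f, then g'."""
    return lambda x: g(f(x))


def identity(x):
    """The identity arrow leaves its input unchanged."""
    return x


# Three hypothetical arrows: text -> int -> int -> text.
parse = int
double = lambda n: n * 2
describe = lambda n: f"value={n}"

# Associativity: the grouping of compositions does not change the result.
left = compose(describe, compose(double, parse))
right = compose(compose(describe, double), parse)

print(left("21"), right("21"))
```

Since grouping does not matter, long chains of arrows can be assembled piece by piece, which is what allows larger units of content to be built from smaller ones.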
Rosetta Stone of Deep Knowledge
To leverage the power of data science at its core, the Curry-Howard-Lambek correspondence serves as a crucial Rosetta Stone of knowledge management. The correspondence shows that data is the exchange currency across the three areas of knowledge representation: logical, algorithmic, and categorical. This insight provides a unifying strategy for representing knowledge and converting knowledge content into whichever form is most convenient for computation. Hence, it enables knowledge processing automation at scale, leveraging existing commodity tools for data computation and communication. This approach broadens and deepens the scope and speed of knowledge management.
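A small sketch can make the three readings concrete. Under the correspondence, a type is read as a proposition, a function of type `A -> B` as a proof that A implies B, and function composition as composition of morphisms in a category. The function names below are hypothetical. Python does not enforce type hints at run time, so the logical reading only holds once a static type checker has verified the annotations.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def modus_ponens(implication: Callable[[A], B], proof_of_a: A) -> B:
    """Logic: from 'A implies B' and A, conclude B. Program: function application."""
    return implication(proof_of_a)


def syllogism(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """Logic: from 'A implies B' and 'B implies C', conclude 'A implies C'.
    Category: composition of morphisms. Program: function composition."""
    return lambda a: g(f(a))


def and_elim_left(pair: Tuple[A, B]) -> A:
    """Logic: from 'A and B', conclude A. Program: first projection of a pair."""
    return pair[0]


# Usage: `len` plays the role of a proof that str implies int.
print(modus_ponens(len, "abc"))
print(syllogism(len, str)("abcd"))
print(and_elim_left((1, "x")))
```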
Observability of Data-Centric Knowledge
MU is about bringing the power of data to both individual and organizational awareness. This means that data of different kinds will be continuously processed and reported to support project and resource management in general. Making data assets observable in adequate reporting formats can significantly improve the quality and quantity of human and organizational activities.
Report Generation as a Means to an End
It is necessary to note that observability is only a means to an end. The goal is not to generate reports but to raise awareness through data reporting in context. Therefore, the format and frequency of report generation and data visualization are a form of art and need to be integrated extensively with UI/UX design efforts.
Task Areas in Data Management
Anshul Tiwari, a data engineer, states that there are at least 11 areas of data management activities[1]:
- Data Governance
- Data Architecture
- Data Modeling and Design
- Data Storage and Operations
- Data Security
- Data Integration
- Documentation and Content
- Master Data Management
- Data Warehouse and Business Intelligence
- Metadata Management
- Data Quality
References
- [1] Tiwari, Anshul (Feb 16, 2022). What are the 11 key areas of Data Management and specific data roles?. IT k Funde.