Data-centric knowledge

From PKC
Revision as of 08:56, 18 February 2022 by Benkoo (talk | contribs)

Data-centric knowledge is a formalized mapping of concepts to data points. Its claim to universal applicability rests on a representability assumption associated with the Kan extension. Under this assumption, all concepts and idealized knowledge are representable through functors from a domain of complex data types to uniquely identifiable data entries in set-theoretic form. It follows that knowledge of any kind can be stored or represented using concrete data points held in databases.
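The idea of mapping a concept to a uniquely identifiable data entry can be illustrated with a minimal sketch. It is not a Kan extension and not part of PKC; the names `entry_id` and `EntryStore`, and the choice of a SHA-256 content hash as the unique identifier, are assumptions made only for illustration.

```python
import hashlib
import json


def entry_id(record: dict) -> str:
    """Derive a stable identifier from the record's content (assumed scheme)."""
    # Sorting keys makes the identifier independent of key order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class EntryStore:
    """Hypothetical in-memory stand-in for a database of data entries."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, record: dict) -> str:
        # Store the record under its content-derived identifier.
        key = entry_id(record)
        self._entries[key] = record
        return key

    def get(self, key: str) -> dict:
        return self._entries[key]


store = EntryStore()
# Map a concept name to the identifier of the concrete entry that represents it.
concept_index = {
    "room temperature": store.put({"unit": "celsius", "value": 21.5}),
}
```

Looking up `store.get(concept_index["room temperature"])` returns the stored record, so the concept is reachable through a concrete, uniquely identified data point.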

Data-Centric Knowledge in the context of MU

Knowledge is considered to be a property derived from data assets collected in the context of MU data operations. Every piece of knowledge must pass through the following stages before it is given a representable handle for ongoing integration of knowledge content:

  1. Grounding Raw Data: This data set is collected from widely deployed user terminals or certified data sensors. It should always be annotated with timestamps and spatial tags that explicitly specify who collected the data, when, and where. This raw data content, especially the timestamps and the location or account that provided the data, is used as a reference to determine the authenticity of the data.
  2. Inferred information: The ordering and semantic implications of the data content further define the information content supported by the Grounding Raw Data. This information is derived by a set of computational inference tools. For example, a Bayesian belief network or a trained neural network can infer the probability distribution of certain events, and so present information content beyond the raw data. However, the authenticity of these decision algorithms and their training data sets must be verified as part of the version control system of the MU certified PKC.
  3. Knowledge content: A set of causal relations grounded in raw data and inferred information content. It is automatically presented to the user when the raw data and inferred information content determine that certain causal relations are immediately relevant to the user at the point of system operation.
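The three stages above can be sketched as a small data pipeline. This is an illustration under stated assumptions, not the MU or PKC implementation: the class names, the `model_version` field standing in for a verified and version-controlled model, and the threshold-based "inference" (used here in place of a Bayesian belief network or neural network) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class RawDatum:
    """Stage 1: a grounded reading tagged with who, when and where."""
    account: str
    timestamp: datetime
    location: str
    value: float


@dataclass(frozen=True)
class InferredInfo:
    """Stage 2: an event probability derived from raw data."""
    event: str
    probability: float
    model_version: str  # assumed stand-in for a verified, versioned model
    sources: Tuple[RawDatum, ...]


@dataclass(frozen=True)
class CausalRelation:
    """Stage 3: a cause/effect statement with a relevance threshold."""
    cause: str
    effect: str
    min_probability: float


def infer_overheating(data: List[RawDatum], limit: float = 30.0) -> InferredInfo:
    # Toy inference: the fraction of readings above `limit`.
    hits = [d for d in data if d.value > limit]
    probability = len(hits) / len(data) if data else 0.0
    return InferredInfo("overheating", probability, "toy-v1", tuple(data))


def relevant_knowledge(info: InferredInfo,
                       relations: List[CausalRelation]) -> List[CausalRelation]:
    # Present only the relations whose cause matches the inferred event
    # and whose threshold the inferred probability meets.
    return [r for r in relations
            if r.cause == info.event and info.probability >= r.min_probability]


when = datetime(2022, 2, 18, 8, 56, tzinfo=timezone.utc)
readings = [RawDatum("sensor-a", when, "room-1", v) for v in (28.0, 31.5, 33.0, 35.0)]
info = infer_overheating(readings)
relations = [
    CausalRelation("overheating", "equipment failure", 0.5),
    CausalRelation("overheating", "fire", 0.9),
]
shown = relevant_knowledge(info, relations)
```

Each `InferredInfo` keeps a reference to the raw data it was derived from, so any presented relation can be traced back to timestamped, located readings when its authenticity is questioned.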