A computable framework for accountable data assets

From PKC
Revision as of 02:55, 1 December 2022

Synopsis

This article argues that complex web services can always be composed of simple data types, such as key-value pairs. Methodically utilizing the universal[1][2] properties of key-value pairs can significantly reduce the cost and development effort of increasingly feature-rich web-based services. This systematic approach improves the following three areas of web service development and operations:

  1. Universal Abstraction (Sound and Complete): Representing domain-neutral knowledge content in a key-value pair-based programming model (a functional and declarative model) that allows flexible composition of data assets to create new instances of data assets, such as web pages, digital files, and web services.
  2. Accountability (Terminable responsibility tracing): Enforcing a formalized web service maintenance workflow by assigning accountability for changes to three account types. The accountability for content changes rests with Externally Owned Accounts. The business logic of workflows is specified in formalized workflow description languages. Each version of the workflow specification is uniquely bound to a Contract Account. Every initiation of a workflow is reflected in the creation of a Project Account, which captures all relevant information content related to the execution effects of the workflow. The definition of Project Account extends the technical specification of Smart Contract as specified in the design of Ethereum[3][4][5].
  3. Self-documenting (Semantic precision[6]): PKC is a documentation-driven DevOps practice. It integrates human-readable documents using MediaWiki's hyperlinks and Special pages to represent the knowledge or state of the microservice at work. Self-documenting is achieved by relating semantic labeling technology to the industry-standard Application Programming Interfaces (APIs) of Git and Docker/Kubernetes to reflect and report the status and historical trail of the PKC system at work. The real-time aspect of this software architecture is accomplished by managing timestamp information of the system using blockchain-based public timing services so that data content changes will be recorded by globally shared clocks.

The novelty of this key-value pair-centric approach to data asset management is achieving the simplest possible system representation without oversimplification[7]. It allows arbitrary functions, data content, and machine-processable causal relations to be represented uniformly in terms of key-value pairs. This key-value pair data primitive can also be used as a generic measuring metric to enable a sound, precise, and terminable framework for modeling complex web services.

Background and Introduction

This article shows that the complexity of web service development and maintenance operations can be significantly reduced by adopting a different data modeling mindset. The novelty of this approach lies in manipulating data assets in a concrete data type (key-value pair) while abstractly treating them as algebraic entities[8][9], so that system complexity and data integrity concerns can always be reasoned about through a set of computable/decidable operations[10][11]. To deal with growth in functionality and data volume methodically, the accountability of system changes is delegated to three types of accounts: Externally Owned Accounts, Contract Accounts, and Project Accounts. Respectively, these three account types map accountability to human participants, behavioral specification, and project-specific operational anecdotal evidence. In other words, this data management framework represents system evolution possibilities in exact terms of human identities, version-controlled source code, and project-specific execution traces, and nothing else, thereby using technical means to guarantee non-repudiation, transparency, and context awareness. Finally, this article is written using a prototypical data management tool, namely the PKC web service package, so that readers and potential editors who browse this article using the PKC web service will directly experience the algebraic properties and data-centric nature being realized in an operational tool that can derive data-driven improvements in a self-referential way.

Universality: an Axiomatic Assumption in Data Science

It is necessary to axiomatically assume that all information content can be approximately represented using a finite set of pre-defined symbols. The universe of symbols simply means the complete collection of all admissible symbols. The notion of logical universality[12], or rules that exhaustively apply to all admissible symbols in the symbol universe, is the intellectual foundation of logical proofs, therefore providing the scientific foundation of data integrity. Without this integrity assumption, data cannot have rigorous meaning.

In this article, we will treat key-value pairs as the universal component that serves as the unifying data and function representation device, so that we can reduce the learning curve and system maintenance complexity. Based on the axiomatic assumption, it is well known that a special kind of key-value pair, known as the Lambda calculus (a.k.a. S-expression), may approximate any computational task.

Lambda Calculus: A recursive data structure that can represent all decision procedures

According to the universality assumption, all finite-length decision procedures can be represented as Lambda Calculus[13][14] programs. We know this statement is true because Lambda calculus is known to be Turing complete[15], meaning that it can model all possible computing/decision procedures. More technically, Turing completeness reveals the following insight:

All decision procedures can be recursively mapped onto a nested structure of switching (If-Then-Else) statements.

To test this idea, one may observe that Lambda calculus is a three-branch switching statement that represents three types (α, β, and η) of computational abstractions. We consider each type of abstraction computational because the values of variables and the interpretation results of expressions are determined dynamically.

A table representing Lambda calculus and its abstraction types.
Admissible data types | Symbolic representation | Description
Variable (α-conversion) | x | A character or string representing a parameter or mathematical/logical value.
Substitution (β-reduction) | (λx.M)(value to be bound to x) | This expression specifies how a function is defined by replacing values of the bound variable x in the lambda (λ) expression M.
Composition (η-reduction) | (M N) | Specifies the sequential composition of multiple lambda expressions such as M and N.

As shown in the table above, all three admissible data types can be symbolically represented as textual expressions, occasionally annotated by dedicated symbols, such as λ. In any case, all three data types are considered admissible forms of functional expressions. In compiler literature, this representational form of functions is called an S-expression, short for symbolic expression. It is well established that S-expressions (often denoted in Backus-Naur form) can be used to represent any computing procedure and can also encode any digitized data content. To maximize representational efficiency, different kinds of data content should be encoded using different formats. Based on the universality assumption, all data content can be thought of as sequentially composed symbols.

Decision Procedure represented in a Switch Statement

To illustrate that all decision procedures can be represented in nothing but key-value pairs, we first start with the notion of control structure in terms of switching statements. A switching statement is simply a lookup table. Given a certain value, it switches to the defined procedure labeled with the matching value.

Using the built-in magic word[16] of MediaWiki, the code and the MediaWiki displayed result can be shown in the following table:

Wiki source code:

{{#switch: {{#expr: 3+2*1}} | 1 = one | 2 = two | 3|4|5 = any of 3–5 | 6 = six | 7 = {{uc:sEveN}} <!--uppercase--> | #default = other }}

Rendered result:

any of 3–5


Based on the example shown above, it should be evident that #switch, as a function, takes an input expression {{#expr: 3+2*1}}, which evaluates to the numerical value 5. The #switch function then searches its key-value pairs for the matching key 5, which shares a branch with 3 and 4, and returns the assigned value, any of 3–5.
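The lookup-table reading of #switch can also be sketched outside the wiki. The Python fragment below is only an illustration under stated assumptions, not how MediaWiki implements the parser function: the names SWITCH_TABLE, DEFAULT, and switch are hypothetical, and the grouped case 3|4|5 is expanded into three keys that share one value.

```python
# Hypothetical sketch: the #switch example expressed as plain key-value pairs.
SWITCH_TABLE = {
    1: "one",
    2: "two",
    3: "any of 3–5",  # 3|4|5 share a single branch in the wiki example
    4: "any of 3–5",
    5: "any of 3–5",
    6: "six",
    7: "sEveN".upper(),  # stands in for {{uc:sEveN}}
}
DEFAULT = "other"  # stands in for the #default branch


def switch(value):
    """Return the value stored under the matching key, or the default branch."""
    return SWITCH_TABLE.get(value, DEFAULT)


print(switch(3 + 2 * 1))  # the input expression from the wiki example
print(switch(42))         # no matching key, so the default branch is used
```

Counting distinct values plus the default gives the six possible outputs of this particular table.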

The If-Then Control Structures as the minimal switch statement

Given the example above, it should be obvious that this switching statement can produce a total of only 6 outputs, since the admissible cases have a total of 6 alternative branches (in this case, 3, 4, and 5 are considered to be one branch). In other words, #switch is a generalized function that allows programmers to define an arbitrary number of branches. In contrast, the #ifexpr function is a hard-wired branching statement with exactly two possible branches, where two is the minimal number of branches required for a switch statement. Since there are only two options, the relative sequential positions of the two branches become the implicit keys (position indices). For the function #ifexpr, the first branch is selected if the input expression evaluates to 1 (true), and the second branch is selected if it evaluates to 0 (false).

The first If example:

Wiki source code:

'''Wrapper wikicode or text''' {{#ifexpr: 3<5 | This expression is {{#expr: 0=0}} | This expression is evaluated to {{#expr: 9>9}} }} '''Wrapper wikicode'''.

Rendered result:

Wrapper wikicode or text This expression is 1 Wrapper wikicode.

The second If example:

Wiki source code:

{{#ifexpr: 7=3 | {{#expr: 3+2=5}} RESULT | some text representing {{#expr: 1<1}} result }}

Rendered result:

some text representing 0 result

The pair of #ifexpr examples shown above intentionally demonstrates the notion of selecting execution paths. In the first example, not only is the expression 3<5 evaluated to be true, which selects the first branch, but the selected string is also rewritten from its original form, This expression is {{#expr: 0=0}}, to This expression is 1. This demonstration reveals the basic behavior of the expression rewriting process, which is how Lambda calculus works. (Interested readers can find examples of transcluding these code samples, and of determining how and which of the transcluded code is rendered, on the page Demo:CodeWrapper.)
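The idea that an If-Then structure is a two-entry lookup keyed by position can be sketched as follows. This is a hedged illustration, not MediaWiki's implementation: the name if_expr is hypothetical, and unlike the wiki parser this Python version evaluates both branch strings before one is selected.

```python
def if_expr(condition, then_branch, else_branch):
    """Two-branch switch: the truth value (0 or 1) is the implicit key."""
    branches = (else_branch, then_branch)  # index 0 = false, index 1 = true
    return branches[int(bool(condition))]


# First example: 3<5 is true, so the first branch is selected and its
# embedded expression 0=0 is rewritten to 1.
first = if_expr(
    3 < 5,
    f"This expression is {int(0 == 0)}",
    f"This expression is evaluated to {int(9 > 9)}",
)

# Second example: 7=3 is false, so the second branch is selected.
second = if_expr(
    7 == 3,
    f"{int(3 + 2 == 5)} RESULT",
    f"some text representing {int(1 < 1)} result",
)

print(first)
print(second)
```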

Lambda Calculus as a three branch recursive switching statement

Given the general case of switching (#switch) and the special case of switching (#ifexpr), it can now be revealed that Lambda calculus is nothing but a three-branch switching structure:

<λexp> ::= <var> | λ <var> . <λexp> | ( <λexp> <λexp> )

Seeing the Backus-Naur form implementation of Lambda calculus, it should be obvious that this Turing-complete language is completely implemented in key-value pairs. One may reflect on the argument presented so far:

Key-value pair is the foundational building block for constructing decision procedures (computational processes).
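To make the three-branch reading concrete, the sketch below reduces small Lambda terms by dispatching on a dictionary with exactly three keys, one per branch of the grammar. The tuple encoding and all names (substitute, reduce_term, BRANCHES) are assumptions made for illustration. The substitution is deliberately naive: it assumes bound variable names never clash with free variables of the argument, so no renaming is performed, and reduction may not terminate for terms without a normal form.

```python
# Assumed encoding of <λexp> as nested tuples:
#   ("var", name) | ("lam", name, body) | ("app", function, argument)

def substitute(term, name, value):
    """Replace free occurrences of `name` in `term` by `value` (no renaming)."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == name else term
    if kind == "lam":
        if term[1] == name:  # `name` is rebound here, so the body is untouched
            return term
        return ("lam", term[1], substitute(term[2], name, value))
    return ("app", substitute(term[1], name, value), substitute(term[2], name, value))


def _reduce_var(term):
    return term  # a variable is already in normal form


def _reduce_lam(term):
    return ("lam", term[1], reduce_term(term[2]))


def _reduce_app(term):
    function = reduce_term(term[1])
    if function[0] == "lam":  # bind the argument to the bound variable
        return reduce_term(substitute(function[2], function[1], term[2]))
    return ("app", function, reduce_term(term[2]))


# The three-branch switch: keys are the term kinds, values are procedures.
BRANCHES = {"var": _reduce_var, "lam": _reduce_lam, "app": _reduce_app}


def reduce_term(term):
    return BRANCHES[term[0]](term)


identity = ("lam", "x", ("var", "x"))
const = ("lam", "a", ("lam", "b", ("var", "a")))
print(reduce_term(("app", identity, ("var", "y"))))
print(reduce_term(("app", ("app", const, ("var", "p")), ("var", "q"))))
```

The dispatch table is the only control structure in reduce_term, which is the sense in which the evaluator is built from key-value pairs.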

Example: Wiki code as annotated Lambda expression

A Lambda calculus expression <λexp> can be encoded in MediaWiki as {{#expr: computable expression}}. In this wiki's syntactical structure, #expr: is equivalent to the marker λ in Lambda calculus. In other words, the entire wiki page that you are reading and editing is effectively an annotated Lambda calculus expression. Whenever a segment of the text shows the pattern {{#expr: computable expression}}, the string rewriting system on the web server will start interpreting and following the switching structure denoted by the changeable content of <var> and <λexp>. The set of possible values for these three types of <λexp> can be as large as one's database can hold. This is where database technologies and PKC come into play.

Key-value pairs for composing web-based computational services

Knowing that all decision procedures are composed of switching statements, one may apply this simple principle to the composition of web services. The following table should help relate concepts developed in Lambda calculus to web services:

A table that relates Lambda calculus's abstractions to web services.
Admissible data types | Symbolic representation | Description
Variable (α-conversion) | x | A web page or a data artifact that can be observed and used directly by a web user.
Substitution (β-reduction) | (λx.M)(value to be bound to x) | A template or executable function that can be reused and plugged in with a defined range of values or data feeds.
Composition (η-reduction) | (M N) | The sequential/structural arrangements of known computational resources.

Managing Functions as Catalogs of Names in adequate Cycle Times

When mapping the structural information of an arbitrary system to a set, the morphism (a generalized kind of function) that conducts this mapping is called a representable functor. This mapping can also be represented using an S-expression, and syntactically, it is denoted as a pair of data elements called a key-value pair. In the web-based environment, every hyperlink is a key-value pair, where the key is a Uniform Resource Locator (URL) string and the value is the page or data element referenced by the URL. A collection of key-value pairs can be considered a dictionary or hash table, where the keys are unique hash values. Once everything is represented as a URL, it naturally has a shelf life, meaning that its validity will change over time. Managing every URL in terms of its time-based properties is the main challenge in maintaining the integrity of namespaces.
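The shelf-life concern can be sketched as a catalog in which every URL key carries an expiry time alongside its value. This is a minimal illustration under stated assumptions: the functions publish and resolve, the record layout, and the example.org URL are all hypothetical and are not part of PKC.

```python
from datetime import datetime, timedelta, timezone


def publish(catalog, url, value, now, shelf_life):
    """Store a key-value pair together with the time at which it expires."""
    catalog[url] = {"value": value, "expires": now + shelf_life}


def resolve(catalog, url, now):
    """Return the value for `url`, or None if it is unknown or has expired."""
    entry = catalog.get(url)
    if entry is None or now >= entry["expires"]:
        return None
    return entry["value"]


catalog = {}
start = datetime(2022, 12, 1, tzinfo=timezone.utc)
url = "https://example.org/wiki/Main_Page"  # placeholder URL
publish(catalog, url, "page content", start, timedelta(days=30))

print(resolve(catalog, url, start + timedelta(days=1)))   # still valid
print(resolve(catalog, url, start + timedelta(days=31)))  # past its shelf life
```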

Security through Accountability

Systems are composed of imperfect subsystems. Therefore, it is impossible to guarantee the absolute security or integrity of a system. The integrity or security of a system can only be guaranteed through traceable accountability. By associating every data change with an adequate account, every system will have accounts to which accountability can be assigned.

System Observability with automated testing

To help the largest number of users get involved with the system, the strategy is to reveal the data in the most egalitarian kind of user interface possible. In this case, we choose the web browser as the common interface for humans and machines. It is possible to use tools such as Selenium or Quant-UX to automatically act as users to test the overall system. This closes the loop: data drives the testing behavior, and the tests return results about the data service system.

Data Integrity Concerns and Accountability (TBD)

This section will talk about the implementation of PKC and its software engineering related concerns.

Representational Closure

A table that shows representable closure in abstraction types.
Technical term | Abstraction types | Symbolic representation | Description
α-conversion | Variable | Naming abstraction | A collection of symbols (names) that act as unique identifiers.
β-reduction | Substitution rule | Function evaluations | A template or executable function that can be reused in multiple contexts.
η-composition | Sequential composition | Function composition | The sequential/structural arrangements of known representable data.

Extensibility, Scalability, and Learnability

Based on results presented in Algebra of Systems, this computational framework specifies an algebraically formulated accounting system for transacting data assets on the web. Operationally, this article defines the data capture and data verification procedure in terms of the above-mentioned data asset classes so that it can leverage mathematical rigor to reason about data integrity. Moreover, this article prescribes an implementation roadmap to construct an open source and self-owned cloud computing (network-based data processing) service utilizing a decentralized security system, so that small and large organizations can use the same data processing infrastructure to conduct business activities. This will significantly reduce cost and accelerate business transaction cycles, thereby enabling more people to leverage the technical potential in the supply network of data, products, and services on the Internet. Most importantly, it will enable a much larger crowd to use data processing technologies, such as cloud computing services, without having to become full-stack software developers, simply by browsing through catalogs of PKC-packaged, publicly tested data assets.

Decision-making agents represented as Accounts

An account is a type of data structure that defines conditional rights based on ownership. This can be accomplished technically using cryptographically guaranteed algorithms. Inspired by Ethereum, for the right to assign ownership to resources, only two kinds of accounts are possible:

  1. Externally Owned Account: This class of accounts is controlled by agents or agencies that must authenticate their identity, and they can exercise their rights via an access control list.
  2. Programmable Account (a.k.a. Contract Account): This class of accounts is controlled by source code that is published and executed from a code base that is implicitly trusted by all participants, who control the Externally Owned Accounts.
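The two account kinds can be sketched as small data structures. This is an illustrative assumption rather than the PKC or Ethereum design: the class and field names are hypothetical, no authentication is performed, and deriving a contract address from a SHA-256 digest of its source code is a simplification chosen here to show that a Contract Account is bound to one version of published code.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternallyOwnedAccount:
    address: str                           # identity of the agent or agency
    permissions: frozenset = frozenset()   # a minimal access control list

    def may(self, action):
        """Check whether the access control list grants `action`."""
        return action in self.permissions


@dataclass(frozen=True)
class ContractAccount:
    source_code: str                       # the published, versioned code

    @property
    def address(self):
        # Assumption: the address is a digest of the source code, so any
        # change to the code yields a different account.
        return hashlib.sha256(self.source_code.encode("utf-8")).hexdigest()


alice = ExternallyOwnedAccount("alice", frozenset({"edit", "sign-off"}))
workflow_v1 = ContractAccount("release when two reviewers sign off")
workflow_v2 = ContractAccount("release when three reviewers sign off")

print(alice.may("edit"), alice.may("delete"))
print(workflow_v1.address != workflow_v2.address)
```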

Broadest Possible User Base

This framework should provide intuitive user interfaces for entry-level users through widely available web browsers. Building on features offered freely on the web-enabled Internet, it should be possible to create an open source turn-key solution that gives almost every person on the web a self-sovereign cloud computing service. This revolutionary software artifact presents many business opportunities and has inspired many new technologies; however, methods and tools to ensure system integrity have not yet caught up with these changes.

  1. Complex software applications and business processes that have been serving a large portion of the society are searching for systematic ways to migrate to modern technical infrastructures.
these algebraic formulation of accounting systems has

Deployment and Interoperability of Accountable Data

According to Rambaud and Pérez[17][18], an algebraically-defined accounting (data capture and verification) practice may systematically automate the decision procedures for the following activities:

  1. How to classify the collected data and send it to the relevant data processing workflows.
  2. Whether a given data set is admissible or not. This is judged in terms of its data formats and legal value ranges.
  3. Whether a transaction process is allowable or not. This includes whether a given transaction is feasible under the relevant operational/business logic.
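The three decisions listed above can be sketched as three small procedures driven by key-value tables. Everything in this fragment is a hypothetical example: the record fields, the workflow names, the value range, and the balance rule are assumptions chosen for illustration and do not come from Rambaud and Pérez or from PKC.

```python
# Hypothetical routing table: record kind -> workflow name.
WORKFLOWS = {"invoice": "accounts-payable", "sensor": "telemetry"}

# Hypothetical schema: field -> (expected type, lowest value, highest value).
SCHEMA = {"amount": (int, 0, 1_000_000)}


def classify(record):
    """Decision 1: route a record to a workflow, or to manual review."""
    return WORKFLOWS.get(record.get("kind"), "manual-review")


def is_admissible(record):
    """Decision 2: check data format and legal value ranges."""
    for key, (expected_type, low, high) in SCHEMA.items():
        value = record.get(key)
        if not isinstance(value, expected_type) or not low <= value <= high:
            return False
    return True


def is_allowable(balance, record):
    """Decision 3: an admissible transaction must also be feasible."""
    return is_admissible(record) and record["amount"] <= balance


record = {"kind": "invoice", "amount": 120}
print(classify(record), is_admissible(record), is_allowable(100, record))
```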

Deployment Process

A pragmatic way of deploying accountable data assets is to leverage the complementary features of the immutability of past history, the unpredictable nature of future events, and the duality or even three-layered aspects of computation (the correspondences between Proof, Program Execution, and Grammar) to verify and validate the correctness of data content based on a common blockchain, which assigns a singular ordering sequence property to all data elements. The plan is to use a generalized workflow to iteratively refine and optimize the content structure of data assets in a networked environment that assigns socio-physical meaning to data.

Controlling the Logic of Deployment using Automated Tests

In traditional testing practices, the controlling gate relies on the reputation of some senior developers, or the authority of a central administration, to approve test results before a tested piece of code is deployed to the public. In this proposed framework, by contrast, data assets, source code, and test cases are all treated as named data assets that are nominated or pledged by supporting parties under their own account identities. All accounts, whether Externally Owned Accounts or Programmable Accounts, can sign off on the release of data content to a particular stage of qualification based on some Ricardian Contract. This is a direct application of the well-known Smart Contract infrastructure to control software production workflows, or content release workflows, at a much larger scale. The total number of participants required to release any given piece of data can be controlled by the Smart Contract, which could be bound to a well-defined versioning system so that the Smart Contract becomes a Ricardian Contract.
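A release gate of this kind can be sketched as a quorum check bound to one exact version of an artifact. This is a simplified illustration, not a Smart Contract: the function names and contract fields are hypothetical, a content digest stands in for the versioning system, and account names are compared directly where a real system would verify cryptographic signatures.

```python
import hashlib


def make_release_contract(artifact, eligible_accounts, quorum):
    """Bind the release rule to one exact version of the artifact."""
    return {
        "artifact_hash": hashlib.sha256(artifact).hexdigest(),
        "eligible": frozenset(eligible_accounts),
        "quorum": quorum,
    }


def release_approved(contract, artifact, signers):
    """Approve only the bound version, and only with enough eligible signers."""
    if hashlib.sha256(artifact).hexdigest() != contract["artifact_hash"]:
        return False  # a different version was presented
    valid_signers = set(signers) & contract["eligible"]
    return len(valid_signers) >= contract["quorum"]


artifact = b"page content, version 1"
contract = make_release_contract(artifact, {"alice", "bob", "ci-bot"}, quorum=2)

print(release_approved(contract, artifact, ["alice", "ci-bot"]))
print(release_approved(contract, artifact, ["alice", "mallory"]))
print(release_approved(contract, b"page content, version 2", ["alice", "bob"]))
```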

Lambda Calculus and Curry-Howard-Lambek Correspondence

The second layer of this framework integrates test cases and test case execution using the well-known mathematical observation called the Curry-Howard-Lambek Correspondence. This allows one to see that certain pieces of data are submitted to predefined execution procedures, which are themselves pieces of data with timestamps associated with them, and that these procedures transform input data content into output data content that signifies their judgment on the input data. By recording the execution trail of input data together with the results produced by the test procedures, this trail can constitute a form of automated verification or validation of the system.
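One way to picture such an execution trail is an append-only list in which each entry records the input digest, the verdict, a timestamp, and the digest of the previous entry. The sketch below is an assumption made for illustration: the function record_run, the entry fields, and the caller-supplied timestamp strings are hypothetical, and a real deployment would obtain timestamps from the shared timing service the article describes.

```python
import hashlib
import json


def digest(data):
    return hashlib.sha256(data).hexdigest()


def record_run(trail, test_name, test_fn, data, timestamp):
    """Run one test procedure on `data` and append the judgment to the trail."""
    entry = {
        "test": test_name,
        "input_hash": digest(data),
        "verdict": bool(test_fn(data)),
        "timestamp": timestamp,
        "previous": trail[-1]["entry_hash"] if trail else "",
    }
    # Hash the entry itself so that later tampering becomes detectable.
    entry["entry_hash"] = digest(json.dumps(entry, sort_keys=True).encode("utf-8"))
    trail.append(entry)
    return entry


trail = []
record_run(trail, "non-empty", lambda d: len(d) > 0, b"hello", "2022-12-01T02:55:00Z")
record_run(trail, "is-ascii", lambda d: d.isascii(), b"hello", "2022-12-01T02:56:00Z")

for entry in trail:
    print(entry["test"], entry["verdict"], entry["previous"][:8])
```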

Hermeneutical Circle and Data Configurations

If one thinks of a software or data content product as a configuration management exercise, then the notion of hermeneutics can be applied here. (The term refers to the theory and methodology of text interpretation, originally the interpretation of biblical texts, wisdom literature, and philosophical texts, but it can equally be applied to data interpretation in this context.) The following diagram is extracted from The Life and Works of Luca Pacioli[19]:

[Figure: HermeneuticalCircle.png]

In the Hermeneutical diagram:

  1. Configuration: Refers to the data content to be exposed to the public.
  2. Refiguration: Refers to data content that is being challenged and modified.
  3. Prefiguration: Refers to data content that is being proposed and to be processed through existing test procedures.

These three distinctive phases can be managed using configuration management tools, such as Git or Fossil, as a version control database to keep the cycles of Hermeneutic evolutions moving forward in a controlled manner.

Proof, Program Execution, and Grammar

Following the assumptions of the Curry-Howard-Lambek Correspondence, the notions of Proof, Program Execution, and Grammar are three aspects of a unifying system. These aspects of data content knowledge are only "proven" if they are represented in their respective domains of description. The convenient fact is that Proof, Program Execution, and Grammar are three kinds of data structures that can be explicitly defined and given appropriate abstractions within their own domains. The work of managing software products can follow this rather compact universal data classification scheme and keep using the tools and methods provided by the three domains to prove, execute, and define the structural nature of every data element.

Physical, Social and Operationalized Meaning of Data

Once we have a generalized theoretical framework to manage data in the abstract, we need a concrete platform to assign physical meaning to data. In this case, we want to use the timestamps derived from a common blockchain to assign timestamps to data packages. This will provide a consistent mechanism to note the latest time of modification. This order-asserting property will help distinguish the structure of data dependency, based on the temporal ordering, as well as the source of changes in terms of account addresses (Externally Owned Account or Programmable Account) mentioned before. In other words, this framework assigns physical meaning to data through timestamps, and social meaning to data through account addresses. Therefore, data assets and data content will naturally be associated with adequate accountability.

Operationalized Meaning of Data

Data can be manipulated by a large number of heterogeneous computers operating in various locations and configurations. This can have significant implications for performance and for data accuracy/consistency. From a PKC point of view, individual consumers of data content should not be burdened with sophisticated technologies and massive energy consumption or data storage requirements. Therefore, deciding how data can be compressed and refreshed with adequate economic concerns must be a built-in function, hidden away from everyday usage. After the birth of Docker-like container virtualization technologies, it became possible to operationalize data manipulation across a large number of computing architectures and configurations with a kind of configuration consistency. This minimizes discrepancies in data manipulation, on the assumption that these virtualized computers will not distort data content or functional correctness. Moreover, a mechanism such as Fossil, which uses a relational data model (SQLite as a data repository) to manage files and data version history, can be a universal model for managing data in a recursively distributed manner.

Interoperability of PKC

Given that all data content can be associated with physical, social, and operationalized meanings in a rigorous way, grounded in lambda calculus, it is conceivable that the composability of data can be attained in a computational framework that associates humans, human agencies, and technically verified procedures and data content. This would be a useful framework for reducing uncertainty in data management and, therefore, enhancing the exchangeability of data and source code. More importantly, every PKC can be associated with these data management tools and infrastructures, so that each PKC instance qualifies or signifies a specific version of the software and/or knowledge content. A certain kind of information consistency can therefore be reached with minimal third-party intervention. Effectively, PKC can be used as a domain-neutral data management operating system that serves the purpose of keeping data content consistent without leaking information beyond the system-prescribed perimeters.

Open Format over Open Source

Contrary to common belief, Open Format is more important than Open Source, since source code can be a major burden to parse, understand, and test for validity. Formats of information, however, are explicit by definition and need to be open enough to be easily understood and to encourage interchange. A key-value pair-based format could therefore be a foundational design building block for the Open Format idea. This requires thorough and rigorous adoption of the OpenAPI Specification, as well as JSON/YAML key-value pair-based data formats and a declarative syntax for specifying data assets across the File, Page, and Service data domains.
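
To make the idea of a declarative, key-value pair-based format more concrete, here is a minimal sketch of how assets in the File, Page, and Service domains might be declared in JSON and checked against an explicit list of required keys. The key names (sha256, revision, openapi_url) and the REQUIRED_KEYS table are hypothetical; they are not a published PKC schema.

```python
import json

# Hypothetical required keys per data domain; not a published PKC schema.
REQUIRED_KEYS = {
    "File": {"name", "sha256"},
    "Page": {"title", "revision"},
    "Service": {"name", "openapi_url"},
}


def validate_asset(asset):
    """Return a sorted list of required keys missing from a declared asset."""
    domain = asset.get("domain")
    if domain not in REQUIRED_KEYS:
        raise ValueError(f"unknown data domain: {domain!r}")
    return sorted(REQUIRED_KEYS[domain] - set(asset))


# A declaration is plain data: a list of key-value records, one per asset.
declaration = json.loads("""
[
  {"domain": "File", "name": "logo.png", "sha256": "example-digest"},
  {"domain": "Page", "title": "PKC", "revision": 42},
  {"domain": "Service", "name": "wiki"}
]
""")

# Map each record's position to the keys it is missing.
problems = {i: validate_asset(a) for i, a in enumerate(declaration)}
print(problems)
```

Since the format itself is the contract, a reader can check a declaration like this without parsing or running any of the software that produced it.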

The three objectives at G20 2022

To enable inter-organizational workflows and data exchanges, and to spread the Science of Governance through self-administered data, Luhut Panjaitan, the Coordinating Minister for Maritime and Investment Affairs of Indonesia (the host nation of the G20 in 2022), prescribed the following steps:

  1. Create a data storage and manipulation instrument (PKC) to enable personalized and scalable containment of data assets.
  2. Create publicly available training programs and publishable curriculum to promote the understanding of governance through data.
  3. Invite and work with global industry standards bodies to set adequate and fair data standards for all national participants.

Only with non-proprietary data asset management instrumentation, open and public educational programs on the value of data, and industry-wide data exchange protocols that are open to public examination can one really anticipate a continuous flow of data assets with minimal friction. Otherwise, the world will continue to see chaos due to information asymmetry and friction.

Conclusion

This article proposes a system composition/decomposition strategy with an algebraic programming model. It also presents a sample implementation, namely PKC, as a self-sufficient building block of an inter-organizational data transaction system, and it borrows the notion of accounting practice and its formal mathematical framework to ensure accountability and data consistency. The proposed framework differs from existing blockchain or Web3 systems in the following ways:

  1. A hyperlinked data asset management framework that uses key-value pairs to link content data, source code, and executable binary images in one consistently abstracted workflow. The notion of key-value pairs is also a building block of lambda calculus, which provides a model of functional composition and can be applied to represent workflows. This workflow model allows anyone to reuse the content knowledge, source code, and operational experience of the PKC community. It allows organizations of any size to operate their own data asset management infrastructure using a chosen branch of this open-source framework.
  2. A data-driven (declarative) programming model that integrates content, executable functions, and networked data services as nothing but key-value pairs, so that it grows and refines its own logical integrity as more key-value pairs accumulate. In other words, PKC is a scale-free and domain-neutral learning system that naturally evolves its own structure and content as key-value pairs are added to its data asset repository. Both good and bad results can be transparently reused by all other parties.
  3. A web-browser-oriented data abstraction that presents all data assets in terms of a page abstraction, so that one universal namespace and data presentation mechanism covers all usage scenarios while remaining compatible with other universal data abstractions, such as the file and service abstractions. This combination of page, file, and service abstractions is defined and programmed in the key-value pair programming model, and therefore offers the maximum reach in terms of participants and data consumers.
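
The first point above, that hyperlinked key-value pairs and functional composition can together represent a workflow, can be sketched roughly as follows. The "ref:" link convention and the helper names (compose, put, resolve) are assumptions made for this illustration and are not part of PKC.

```python
from functools import reduce


def compose(*steps):
    """Compose steps left to right into a single workflow function."""
    return lambda store: reduce(lambda acc, step: step(acc), steps, store)


def put(key, value):
    """Return a step that adds one key-value pair without mutating its input."""
    return lambda store: {**store, key: value}


def resolve(store, key):
    """Follow string links of the form 'ref:<key>' until a plain value is reached."""
    seen = set()
    value = store[key]
    while isinstance(value, str) and value.startswith("ref:"):
        if value in seen:
            raise ValueError(f"cyclic link at {value!r}")
        seen.add(value)
        value = store[value[4:]]
    return value


# A two-step workflow: a page that links to a file, then the file itself.
workflow = compose(
    put("Page:Home", "ref:File:home.txt"),
    put("File:home.txt", "Welcome"),
)
store = workflow({})
print(resolve(store, "Page:Home"))
```

Each step is an ordinary function from one store to the next, so a workflow is itself just a value that can be stored, shared, and recomposed by other parties.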

TBD

Since the World Wide Web entered widespread public use around 1995, world affairs have been transformed by ever-faster electronic data transaction activities. This data-driven phenomenon created an unprecedented global supply network that can be considered a single interconnected web of data transaction activities. As of 2022[20], this data-driven supply network favors organizations and persons with deep pockets and access to more advanced information technologies. The competitive edge conferred by wealth and technological literacy has induced many unfair practices, and even unethical and/or illegal transaction activities, at a global scale. To address the fundamental issues caused by this information asymmetry, this article presents the Personal Knowledge Container (PKC) as a self-administered cloud computing service that reduces unfair competitive edges and lowers the cost of system participation and operation.

Recent developments in blockchain and decentralized identifier technologies, coupled with web-based applications and 4G/5G connected devices, have created a technical infrastructure that could significantly reduce the degree of unfairness and information asymmetry in the global marketplace. Anyone with access to an Internet-connected web-browsing device is able not just to participate in the global supply network, but also to learn and to operate their own business with minimal entry barriers. To continuously introduce late-breaking information technologies to the broadest possible range of users, the world needs a user experience, delivered through widely available web browsers, that presents a wide range of data formats, includes natural language annotations and timely workflow instructions, and, most importantly, has a "fair" data security model that protects the interests of all participants in a transparent[21] way.

Why and how this framework differs from existing approaches

Existing web application frameworks are often developed and operated by highly skilled software development and operations teams that serve a specific set of profit-seeking objectives. Each instance of a web service has a highly localized and protected set of operational data. This operational data and software configuration knowledge is a privately owned asset that is usually protected and not shared with the public. In contrast, PKC differs from existing data transaction systems, often known as Infrastructure as Code (IaC), in the following ways:

Most people simply cannot believe it can be this simple

Key-value pairs, or hyperlinked data content, are the simplest yet universal data type that connects our world and minds. When this universal instrument is made explicit and integrated with self-documented technical arguments that continuously explore and explain opportunities for improvement, this data management framework and its derived data management tools, such as PKC, can continuously improve their correctness while accelerating all activities that PKC supports.

Simplicity enables massive, decentralized/distributed adoption and generates trust-worthy data

Because PKC is very simple, it is possible for everyone to own and operate their own instance of PKC, creating a larger base of egalitarian data processing and data verification/authentication/authorization agencies. This gives data a much more distributed/decentralized trust-worthiness: the more independent agents witness a piece of data, the more trust-worthy it is.
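
One simple way to read "witnessed by more independent agents" is as a quorum over content digests: a piece of data is accepted only if enough distinct PKC instances report the same hash for it. The sketch below is an assumption-laden illustration of that reading; the function names and the quorum rule are hypothetical and are not a PKC protocol.

```python
import hashlib
from collections import Counter


def digest(content):
    """Return the SHA-256 hex digest of a bytes payload."""
    return hashlib.sha256(content).hexdigest()


def agreed_digest(reports, quorum):
    """Return the digest reported by at least `quorum` witnesses, or None.

    `reports` maps a witness identifier to the digest that witness observed,
    so each witness is counted at most once.
    """
    if not reports:
        return None
    counts = Counter(reports.values())
    best, votes = counts.most_common(1)[0]
    return best if votes >= quorum else None


# Three hypothetical witnesses; one of them holds a tampered copy.
good = digest(b"page text")
reports = {"alice": good, "bob": good, "carol": digest(b"tampered")}
print(agreed_digest(reports, quorum=2) == good)
```

Choosing a quorum larger than half the number of witnesses avoids ties between competing digests; the right threshold in practice depends on how independent the witnesses really are.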

Trust-worthiness allows PKC-managed data assets to be used for error correction

Given the trust-worthiness of the data, it can be used to correct mistakes in content, source code, and binary executable images, so that PKC becomes a platform for DevSecOps workflows.

Self-reflective error correction enables systematic learning

When PKC is deployed across a broad base of practices, it enables a kind of self-reflective error correction, in which many different kinds of applications and use cases mutually verify and validate the quality of the key-value pair-encoded knowledge base. This echoes the story of the Tower of Babel, where a unified language enabled participants to build a structure that could scale to unprecedented heights.

  1. PKC as the e-Catalog of cloud-enabled data assets

PKC is a general-purpose framework that uses an encyclopedic approach to categorize and publish all existing data resources in terms of data content, source code, executable binaries, and real-world software operational data. This published approach to data asset management allows all participants to operate their own instances of PKC by leveraging the operational experience of the entire PKC community.

  2. Automate the composition and decomposition of software components

Participants can choose to incorporate some or all of PKC's functionalities, using the algebraic approach to compose and decompose the functionalities of an otherwise proprietary software infrastructure.

  3. Reuse source code and operational data derived from the entire PKC community
  4. Inter-organizational workflows on a common code base

All code that needs to be written can become part of the data assets kept in PKC.

Stop Reinventing the Wheel

  1. PKC as a Meta Protocol for Data Assets
  2. Disseminate the most-recent-possible data that reflects verifiable truth
  3. Published data as a public natural resource[22]


References

  1. MathProofsable, ed. (Jan 9, 2017). Toposes - "Nice Places to Do Math". local page: MathProofsable. 
  2. MathProofsable, ed. (Jan 5, 2018). Motivation for a Definition of a Topos. local page: MathProofsable. 
  3. Buterin, Vitalik (2014). "Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform" (PDF). local page: ETHEREUM FOUNDATION. 
  4. Wood, Gavin (April 7, 2022). "ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER" (PDF) (Berlin Version 934279c ed.). local page: ETHEREUM FOUNDATION. 
  5. Meyerson, Michael (2002). Political numeracy : mathematical perspectives on our chaotic constitution. local page: Norton Publisher. ISBN 0393323722. 
  6. Brown, Kris (May 18, 2022). Kris Brown: Combinatorial Representation of Scientific Knowledge. local page: Topos Institute. 
  7. Everything Should Be Made as Simple as Possible, But Not Simpler.
  8. Koo, Hsueh-Yung Benjamin; Simmons, Willard; Crawley, Edward (Nov 16, 2021). "Algebra of Systems as a Meta Language for Model Synthesis and Analysis" (PDF). local page: IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS. 
  9. Fong, Brendan (2016). The Algebra of Open and Interconnected Systems (PDF) (Ph.D.). local page: University of Oxford. Retrieved October 15, 2021. 
  10. Lamport, Leslie (May 17, 2022). The Man Who Revolutionized Computer Science With Math. local page: Quanta Magazine. 
  11. Lamport, Leslie (2002). Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. local page: Addison Wesley. ISBN 0-321-14306-X. 
  12. Epp, Susanna (2020). Discrete Mathematics with Applications (5th ed.). local page: Cengage. ISBN 978-1-337-69419-3. 
  13. To understand the intricate mechanisms of Lambda calculus, and why and how this simple language can be universal, please read this page:Dana Scott on Lambda Calculus.
  14. Scott, Dana (January 1, 1970). "Outline of a Mathematical Theory of Computation". local page: Oxford University Computing Laboratory Programming Research Group. 
  15. Dolan, Stephen (July 19, 2013). "mov is Turing-complete" (PDF). local page: Computer Laboratory, University of Cambridge. 
  16. The article Why Lua Scripting explains how to turn wikitext into a functional programming language using the #switch and #if magic words.
  17. Rambaud, Salvador Cruz; Pérez, José García; Nehmer, Robert A.; Robinson, Derek J. S. (2010). Algebraic Models for Accounting Systems. local page: Cambridge at the University Press. ISBN 978-981-4287-11-1. 
  18. Rambaud, Salvador Cruz; Pérez, José García (2005). "The Accounting System as an Algebraic Automaton". INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS. local page: Wiley Periodicals, Inc. 20: 827–842. 
  19. Sangster, Alan (2021). "The Life and Works of Luca Pacioli (1446/7–1517), Humanist Educator". Abacus: A Journal of Accounting, Finance and Business Studies. local page: University of Sydney. 57.  , Figure 2
  20. This document was revised on December 1, 2022.
  21. Transparency of security rules can be encoded in published Smart contracts, so that participants can decide to participate or not based on reading the explicitly specified contracts.
  22. Slide/Fab City Full Stack

Related Pages

PKC