A computable framework for accountable data assets
Synopsis
This article argues that complex web services can always be composed of rather simple data types, such as key-value pairs. Methodically exploiting the universal properties of key-value pairs can significantly reduce the cost and development effort of ever more functionally rich web-based services. This systematic approach applies to the following three areas of web service artifacts:
- A key-value pair-based programming model (a functional and declarative model) that ensures algebraic closure properties (sound and complete) in three classes of data assets, namely, information content presented in interactive web browsers, source code as time-stamped data, and executable binary files with associated testing or operational data.
- A software architecture that maps industry-standard tools, such as web browsers, version control systems, and a standardized virtualization runtime environment, to the above-mentioned data asset classes.
- A web-based workflow grounded on one common, physically meaningful data tracking system, such as blockchain-based data, that ensures only authenticated participants and machine-executable contracts[1][2][3] may make changes to a cryptographically verified ledger for any ownership transactions.
The unique feature of this universal approach to data asset composition is that it allows all data types to be represented in, and decomposed into, one common data format (the key-value pair). This key-value pair data type can also be used as a measuring metric for the respective namespaces of keys and values. The representability, composability/decomposability, and measurability enable a truly sound, precise, and terminable framework for modeling complex web services.
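As a minimal sketch of the decomposability and measurability claims, the following Python snippet flattens a nested data structure into flat key-value pairs and counts them. The function name `decompose`, the `/` path separator, and the sample `page` record are assumptions made for illustration; they are not part of PKC.

```python
from typing import Any, Dict


def decompose(data: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into flat key-value pairs keyed by path."""
    if isinstance(data, dict):
        items = list(data.items())
    elif isinstance(data, list):
        items = list(enumerate(data))
    else:
        # A scalar is already a single value; its key is the path so far.
        return {prefix: data}
    pairs: Dict[str, Any] = {}
    for key, value in items:
        # Hypothetical convention: join nested keys with "/" to form one key.
        path = f"{prefix}/{key}" if prefix else str(key)
        pairs.update(decompose(value, path))
    return pairs


# Hypothetical record, used only to exercise the function.
page = {"title": "PKC", "tags": ["wiki", "data"], "meta": {"rev": 3}}
pairs = decompose(page)
print(pairs)       # flat path -> value pairs
print(len(pairs))  # the number of pairs serves as a simple size measure
```

The sketch assumes that keys never contain the separator character; a real system would need an escaping rule to keep the mapping reversible.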
Background and Introduction
This article shows that the complexity of web service development and maintenance operations can be significantly reduced by adopting a different data modeling mindset. The novelty of the proposed approach is to leverage certain foundational mathematical ideas in an algebraic and data-centric manner. The mindset of algebraic closure is strategic: it guides system designers toward a universal pattern in which operational procedures of every type must generate results in one unifying data format (such as key-value pairs), so that operational complexity and data integrity concerns can always be reduced to a composable set of software tools and accountable human-in-the-loop activities. A data-centric mindset is necessary so that the methodology defined in this article continues to work regardless of how the number of data elements or the number of human agencies and participants grows. The operating principle of this web service system should scale up and down as the demand for data services grows and diminishes. Finally, this article demonstrates the approach with a concrete data management tool, namely the PKC web service package, so that readers and potential editors who browse this article using the PKC web service directly experience how the algebraic properties and data-centric nature are realized in an operational tool that can derive data-driven improvements in a self-referential way.
Universality: an Axiomatic Assumption in Data Science
It is necessary to axiomatically assume that all information content can be approximately represented using a finite set of pre-defined symbols. The universe of symbols simply means the complete collection of all admissible symbols. The notion of logical universality[4], or rules that exhaustively apply to all admissible symbols in the symbol universe, is the intellectual foundation of logical proofs, and therefore provides the scientific foundation of data integrity. Without this universality assumption, data cannot have a rigorous meaning.
In this article, we treat key-value pairs as the universal component that serves as the unifying device for representing data and functions, so that we can reduce the learning curve and system maintenance complexity. Based on the axiomatic assumption, it is well known that a special arrangement of key-value pairs, the Lambda calculus (commonly written as S-expressions), can express any computational task. The following sections provide a primer.
Lambda Calculus: A recursive data structure that can represent all decision procedures
According to the universality assumption, all finite-length decision procedures can be represented as Lambda calculus[5][6] programs. We know this statement is true because Lambda calculus is known to be Turing complete[7], meaning that it can model all possible computing/decision procedures. More technically, Turing completeness reveals the following insight:
All decision procedures can be recursively mapped onto a nested structure of switching(If-Then-Else) statements.
To test this idea, one may observe that Lambda calculus is a three-branch switching statement that represents three types (α, β, and η) of computational abstractions. We consider each type of abstraction computational because the variable values and the results of interpreting expressions are determined dynamically.
Admissible data types | Symbolic representation | Description |
---|---|---|
Variable (α-conversion) | x | A character or string representing a parameter or mathematical/logical value. |
Substitution (β-reduction) | (λx.M)(value to be bound to x) | This expression specifies how a function is evaluated by substituting a value for the bound variable x in the lambda (λ) expression M. |
Composition (η-reduction) | (M N) | Specifies the sequential composition of multiple lambda expressions, such as M and N. |
As shown in the table above, all three admissible data types can be symbolically represented as textual expressions, occasionally annotated by dedicated symbols, such as λ. In any case, all three data types are considered admissible forms of functional expressions. In compiler literature, this representational form of functions is called an S-expression, short for symbolic expression. It is well established that S-expressions (often specified in Backus-Naur form) can be used to represent any computing procedure, and can also encode any digitized data content. To maximize representational efficiency, different kinds of data content should be encoded using different formats. Based on the universality assumption, all data content can be thought of as sequentially composed symbols.
Decision Procedure represented in a Switch Statement
To illustrate that all decision procedures can be represented in nothing but key-value pairs, we start with the notion of control structure in terms of switching statements. A switching statement is simply a lookup table. Given a certain value, it switches to the defined procedure labeled with the matching value.
Using the built-in magic word[8] of MediaWiki, the code and the MediaWiki displayed result can be shown in the following table:
Wiki Source Code | Rendered Result |
---|---|
{{#switch: {{#expr: 3+2*1}} | 1 = one | 2 = two | 3|4|5 = any of 3–5 | 6 = six | 7 = {{uc:sEveN}} <!--uppercase--> | #default = other }} | any of 3–5 |
Based on the example shown above, it should be evident that #switch, as a function, takes in an input expression {{#expr: 3+2*1}}, which evaluates to the numerical value 5. The #switch function then searches its key-value pairs for the matching key, here 5 in the group 3|4|5, and returns the assigned value, any of 3–5.
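The lookup-table reading of #switch can be sketched in Python with a plain dictionary. This is an illustrative analogy, not MediaWiki's implementation; the helper name `switch` is an assumption, and the grouped key 3|4|5 is modeled by repeating the same value under three keys.

```python
def switch(value, cases, default=None):
    """Return the value stored under `value`, or `default` if no key matches."""
    return cases.get(value, default)


# The same branches as the wiki example, written as key-value pairs.
cases = {
    1: "one",
    2: "two",
    3: "any of 3–5",
    4: "any of 3–5",
    5: "any of 3–5",
    6: "six",
    7: "sEveN".upper(),  # plays the role of {{uc:sEveN}}
}

result = switch(3 + 2 * 1, cases, default="other")
print(result)  # any of 3–5
```

A value with no matching key, such as 42, falls through to the default branch and yields "other".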
The If-Then Control Structures as the minimal switch statement
Given the example above, it should be obvious that this switching statement can produce only 6 distinct outputs, since the admissible cases form a total of 6 alternative branches (here 3, 4, and 5 count as one branch, and #default is the last). In other words, #switch is a generalized function that allows programmers to define an arbitrary number of branches. In contrast, the #ifexpr function is a hard-wired branching statement with exactly two possible branches, the minimal number required to be a switch statement. Since there are only two options, the relative sequential positions of the two branches become the implicit keys (position indices). For the function #ifexpr, the first branch is selected if the input expression evaluates to 1 (true), and the second branch if it evaluates to 0 (false).
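To make the "implicit keys" idea concrete, the sketch below models #ifexpr as a two-entry lookup table whose keys are 1 and 0. The function name `if_expr` is hypothetical, and unlike the wiki parser, Python evaluates both branch arguments before the selection takes place.

```python
def if_expr(condition, then_branch, else_branch):
    """Two-branch switch: key 1 selects the first branch, key 0 the second."""
    branches = {1: then_branch, 0: else_branch}
    return branches[int(bool(condition))]


first = if_expr(3 < 5, "first branch", "second branch")
second = if_expr(7 == 3, "first branch", "second branch")
print(first)   # first branch
print(second)  # second branch
```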
The first If example:
Wiki Source Code | Rendered Result |
---|---|
'''Wrapper wikicode or text''' {{#ifexpr: 3<5 | This expression is {{#expr: 0=0}} | This expression is evaluated to {{#expr: 9>9}} }} '''Wrapper wikicode'''. |
Wrapper wikicode or text This expression is 1 Wrapper wikicode. |
The second If example:
Wiki Source Code | Rendered Result |
---|---|
{{#ifexpr: 7=3 | {{#expr: 3+2=5}} RESULT | some text representing {{#expr: 1<1}} result }} |
some text representing 0 result |
The pair of #ifexpr examples shown above intentionally demonstrates the notion of selecting execution paths. The first example shows that the expression 3<5 evaluates to true, so the first branch is selected, and that the function also rewrites the string of the selected branch from its original form, This expression is {{#expr: 0=0}}, to This expression is 1. This demonstration reveals the basic behavior of the expression rewrite process, which is how Lambda calculus works. (Interested readers can find examples of how these code samples are transcluded, and of how the transcluded code to be rendered is selected, on the page Demo:CodeWrapper.)
Lambda Calculus as a three branch recursive switching statement
Given the general case of switching (#switch) and the special case of switching (#ifexpr), it can now be revealed that Lambda calculus is nothing but a three-branch switching structure:
<λexp> ::= <var> | λ <var> . <λexp> | ( <λexp> <λexp> )
Seeing the Backus-Naur form implementation of Lambda calculus, it should be obvious that this Turing-complete language is completely implemented in key-value pairs. One may reflect on the argument presented so far:
Key-value pair is the foundational building block for constructing decision procedures (computational processes).
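The three-branch reading can be sketched as a tiny Python interpreter whose dispatch table is itself a set of key-value pairs: the key names the syntactic form and the value is the procedure that handles it. This is an assumption-laden illustration. Terms are encoded as tuples tagged "var", "lam", or "app" (a hypothetical encoding), and evaluation uses environments and Python closures (call-by-value) rather than the symbolic rewriting of the calculus itself.

```python
def eval_var(term, env):
    # <var>: look the name up in the environment (a dict of name -> value).
    return env[term[1]]


def eval_lam(term, env):
    # λ <var> . <λexp>: build a function that binds the parameter when called.
    _, param, body = term
    return lambda arg: evaluate(body, {**env, param: arg})


def eval_app(term, env):
    # ( <λexp> <λexp> ): evaluate both parts, then apply the first to the second.
    _, fn, arg = term
    return evaluate(fn, env)(evaluate(arg, env))


# The three-branch switch, written as key-value pairs.
BRANCHES = {"var": eval_var, "lam": eval_lam, "app": eval_app}


def evaluate(term, env=None):
    return BRANCHES[term[0]](term, env or {})


# K = λx.λy.x returns its first argument and discards the second.
K = ("lam", "x", ("lam", "y", ("var", "x")))
term = ("app", ("app", K, ("var", "a")), ("var", "b"))
print(evaluate(term, {"a": "first", "b": "second"}))  # first
```

The free variables a and b are supplied through the initial environment, which is again nothing more than a dictionary of key-value pairs.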
Example: Wiki code as annotated Lambda expression
Lambda calculus expression: <λexp> can be encoded in MediaWiki as {{#expr: computable expression}}. In this wiki's syntactical structure, #expr: is equivalent to the marker λ in Lambda calculus. In other words, the entire wiki page that you are reading and editing is effectively an annotated Lambda calculus expression. Whenever a segment of the text shows the pattern {{#expr: computable expression}}, the string rewrite system on the web server starts interpreting it, following the switching structure denoted by the changeable content of <var> and <λexp>. The possible values of these three types of <λexp> can be as large as one's database can hold. This is where database technologies and PKC come into play.
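The string-rewrite behavior described above can be imitated in a few lines of Python. This is a rough sketch and not MediaWiki's parser: the `render` and `calc` names are hypothetical, only non-nested {{#expr: ...}} segments are handled, and only the operators +, -, and * on numeric literals are supported.

```python
import ast
import operator
import re

# Supported arithmetic operators (an intentionally small, assumed subset).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

# Matches one non-nested {{#expr: ...}} segment and captures the expression.
EXPR = re.compile(r"\{\{#expr:\s*([^{}]*?)\s*\}\}")


def calc(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return calc(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](calc(node.left), calc(node.right))
    raise ValueError("unsupported expression")


def render(text):
    """Rewrite every {{#expr: ...}} segment in `text` to its computed value."""
    return EXPR.sub(
        lambda m: str(calc(ast.parse(m.group(1), mode="eval"))), text
    )


print(render("The result is {{#expr: 3+2*1}}."))  # The result is 5.
```

Text outside the marked segments passes through unchanged, which mirrors the way ordinary page content surrounds the computable parts of a wiki page.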
Applying key-value pairs for composing web-based computational services
Knowing that all decision procedures are composed of switching statements, one may apply this simple principle to the composition of web services. The following table should help relate concepts developed in Lambda calculus to web services:
Admissible data types | Symbolic representation | Description |
---|---|---|
Variable (α-conversion) | x | A web page or a data artifact that can be observed and used directly by a web user. |
Substitution (β-reduction) | (λx.M)(value to be bound to x) | A template or executable function that can be reused and plugged-in by a defined range of values or data feeds. |
Composition (η-reduction) | (M N) | The sequential/structural arrangements of known computational resources. |
Managing Functions as Catalogs of Names
When mapping structural information of an arbitrary system to a set, the morphism (a generalized kind of function) that conducts this mapping is called a representable functor. This mapping can also be represented using an S-expression, and syntactically it is denoted as a pair of data elements called a key-value pair. In the web-based environment, every hyperlink is a key-value pair, where the key is a Uniform Resource Locator (URL) string and the value is the page or data element referenced by the URL. A collection of key-value pairs can be considered a dictionary or hash table, in which every key is unique.
Data Integrity Concerns and Accountability(TBD)
This section will talk about the implementation of PKC and its software engineering related concerns.
Representational Closure
Technical Term | Abstraction types | Symbolic representation | Description |
---|---|---|---|
α-conversion | Variable | Naming abstraction | A collection of symbols (names) that act as unique identifiers. |
β-reduction | Substitution rule | Function evaluations | A template or executable function that can be reused in multiple contexts. |
η-composition | Sequential composition | Function composition | The sequential/structural arrangements of known representable data. |
Extensibility, Scalability, and Learnability
Based on results presented in Algebra of Systems[9][10], this computational framework specifies an algebraically formulated accounting system for transacting data assets on the web. Operationally, this article defines the data capture and data verification procedure in terms of the above-mentioned data asset classes, so that mathematical rigor can be used to reason about data integrity. Moreover, this article prescribes an implementation roadmap for constructing an open source and self-owned cloud computing (network-based data processing) service that uses a decentralized security system, so that small and large organizations can use the same data processing infrastructure to conduct business activities. This will significantly reduce cost and accelerate business transaction cycles, enabling more people to leverage the technical potential of the supply network of data, products, and services on the Internet. Most importantly, it will enable a much larger crowd to use data processing technologies, such as cloud computing services, without having to become full-stack software developers, simply by browsing through catalogs of PKC-packaged, publicly tested data assets.
Decision-making agents represented as Accounts
An account is a type of data structure that defines conditional rights based on ownership. This can be accomplished technically using cryptographically guaranteed algorithms. Inspired by Ethereum, only two kinds of accounts may hold the right to assign ownership of resources:
- Externally Owned Account: This class of accounts is controlled by agents or agencies that must authenticate their identity, and that exercise their rights via an access control list.
- Programmable Account (a.k.a. Contract Account): This class of accounts is controlled by source code that is published and executed from a code base implicitly trusted by all participants who control Externally Owned Accounts.
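The two account kinds can be sketched as plain Python data structures. Everything below is hypothetical and for illustration only: the class names, the `may` method, the `transfer` function, and the dictionary used as a ledger are assumptions, and no real authentication or cryptography is performed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ExternallyOwnedAccount:
    owner: str
    acl: Set[str] = field(default_factory=set)  # actions this owner may perform

    def may(self, action: str) -> bool:
        return action in self.acl


@dataclass
class ContractAccount:
    # Published rule that decides whether a caller may perform an action.
    rule: Callable[[ExternallyOwnedAccount, str], bool]

    def may(self, caller: ExternallyOwnedAccount, action: str) -> bool:
        return self.rule(caller, action)


def transfer(ledger: Dict[str, str], asset: str,
             sender: ExternallyOwnedAccount,
             recipient: ExternallyOwnedAccount,
             contract: ContractAccount) -> bool:
    """Reassign ownership only if the sender owns the asset and the contract agrees."""
    if ledger.get(asset) != sender.owner:
        return False
    if not contract.may(sender, "transfer"):
        return False
    ledger[asset] = recipient.owner
    return True


alice = ExternallyOwnedAccount("alice", {"transfer"})
bob = ExternallyOwnedAccount("bob")
contract = ContractAccount(rule=lambda caller, action: caller.may(action))
ledger = {"page-1": "alice"}  # asset -> owner, again just key-value pairs

print(transfer(ledger, "page-1", alice, bob, contract))  # True
print(ledger)  # {'page-1': 'bob'}
```

Once ownership has moved to bob, a second attempt by alice is rejected because she is no longer the recorded owner, and bob cannot transfer the asset back because his access control list does not grant that action.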
Broadest Possible User Base
This framework should provide intuitive user interfaces for entry-level users through popularly-available web browser-based interfaces. Using features offered freely on the Web-enabled Internet, it is possible to create an open source turn-key solution that allows almost every person on the web to operate a self-sovereign cloud computing service. This revolutionary software artifact has presented many business opportunities and inspired many new technologies; however, methods and tools to ensure system integrity have not yet caught up with these changes.
- Complex software applications and business processes that have been serving a large portion of the society are searching for systematic ways to migrate to modern technical infrastructures.
these algebraic formulation of accounting systems has
Deployment and Interoperability in real world System
According to Rambaud and Pérez[11][12], an algebraically-defined accounting (data capture and verification) practice may systematically automate the decision procedures for the following activities:
- Deciding how to classify the collected data and which data processing workflows should receive it.
- Deciding whether a given data set is admissible, judged in terms of its data formats and legal value ranges.
- Deciding whether a transaction process is allowable, including whether the transaction is feasible under the relevant operational/business logic.
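The three decisions listed above can be sketched as small Python functions driven by lookup tables. This is not the formalism of Rambaud and Pérez; the workflow names, the schema, the value range, and the non-negative balance rule are all invented for illustration.

```python
# Hypothetical routing table: record kind -> workflow name.
WORKFLOWS = {"invoice": "accounts_payable", "receipt": "accounts_receivable"}

# Hypothetical schema: field name -> (required type, minimum, maximum).
SCHEMA = {"amount": (float, 0.0, 1_000_000.0)}


def classify(record):
    """Decide which workflow should receive the record."""
    return WORKFLOWS.get(record.get("kind"), "manual_review")


def is_admissible(record):
    """Check data formats and legal value ranges against the schema."""
    for name, (required_type, low, high) in SCHEMA.items():
        value = record.get(name)
        if not isinstance(value, required_type) or not low <= value <= high:
            return False
    return True


def is_allowable(balance, record):
    """Assumed business rule: a transaction may not overdraw the balance."""
    return is_admissible(record) and balance - record["amount"] >= 0


record = {"kind": "invoice", "amount": 250.0}
print(classify(record))             # accounts_payable
print(is_admissible(record))        # True
print(is_allowable(100.0, record))  # False
```

Each decision is a lookup or a range check against declared key-value pairs, so changing the rules means editing the tables rather than the procedures.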
Deployment Process
Physical Meaning of Data
Interoperability of PKC
Conclusion
This article proposes a system composition/decomposition strategy with an algebraic programming model. It also presents a sample implementation, namely PKC, as a self-sufficient building block of an inter-organizational data transaction system, and it borrows the notion of accounting practice and its formal mathematical framework to ensure accountability and data consistency. The proposed framework differs from existing blockchains or Web3 systems in the following ways:
- A hyperlinked data asset management framework that uses key-value pairs to link content data, source code, and executable binary data images in one consistently abstracted workflow. This workflow model allows anyone to reuse the content knowledge, source code, and operational experience of the PKC community. It allows organizations of any size to operate their own data asset management infrastructure using a chosen branch of this open source framework of data asset management.
- A data-driven (declarative) programming model that integrates content, executable functions, and networked data services as nothing but key-value pairs, so that it will grow and refine its own logical integrity as more key-value pairs are accumulated. In other words, PKC is a scale-free and domain-neutral learning system that will naturally evolve its own structure and content as key-value pairs are added to its data asset repository. Both good and bad results can be transparently reused by all other parties.
- A web browser-oriented data abstraction that presents all data assets in terms of a page abstraction, so that one universal namespace and data presentation mechanism covers all usage scenarios while remaining compatible with other universal data abstractions, such as file and service abstractions. This combination of page, file, and service abstractions is defined and programmed into the web through the key-value pair programming model, and therefore offers the maximum reach in terms of participants and data consumption parties.
TBD
Since the popularization of the World Wide Web in the mid-1990s, world affairs have been transformed by ever-faster electronic data transaction activities. This data-driven phenomenon created an unprecedented global supply network that can be considered a singular inter-connected web of data transaction activities. Up to the year 2022[13], this data-driven supply network has favored organizations or persons who have deep pockets and access to more advanced Information Technologies. The competitive edge conferred by wealth and technology literacy has induced many unfair practices, and even unethical and/or illegal transaction activities, at a global scale. To resolve this issue, this article presents Personal Knowledge Container (PKC) as a self-owned cloud computing service that reduces these unfair competitive edges. Reducing the cost of system participation and system operation is necessary to address many fundamental issues caused by information asymmetry.
Recent developments in blockchain and Decentralized Identifier technologies, coupled with web-based applications and 4G/5G connected devices, created a technical infrastructure that could significantly reduce the degree of unfairness/information asymmetry in the global marketplace. Anyone with access to an Internet-connected web-browsing device has been able not just to participate in the global supply network, but also to learn and operate their own business with a minimal entry barrier. To continuously introduce late-breaking Information Technologies to the broadest possible range of users, the world needs a user experience, delivered through popularly-available web browsers, that presents a wide range of data formats, includes natural language annotations and timely workflow instructions, and, most importantly, has a "fair" data security model that protects the interests of all participants in a transparent[14] way.
Why and How does this framework differ from existing approaches
Existing web application frameworks are often developed and operated by highly skilled software development and operational teams that serve a specific set of profit-attaining objectives. Each instance of a web service will have a highly localized and protected set of operational data. This operational data and software configuration knowledge is a privately owned asset that is usually protected and not shared with the public. In contrast, PKC differs from existing data transaction systems, often known as Infrastructure as Code (IaC), in the following ways:
Most people simply cannot believe it can be this simple
Key-value pairs, or hyperlinked data content, are the simplest yet universal data type that connects our world and minds. When this universal instrument is made explicit and integrated with self-documented technical arguments that continuously explore and explain the opportunities for improvement, this data management framework and its derived data management tools, such as PKC, can continuously improve their system correctness while accelerating all activities supported by PKC.
Simplicity enables massive and decentralized/distributed adoption, and generates trust-worthy data
Because PKC is simple, everyone can own and operate their own instance of PKC, creating a larger base of egalitarian data processing and data verification/authentication/authorization agencies. This gives data a much more distributed/decentralized trust-worthiness: data witnessed by more independent agents is more trust-worthy.
Trust-worthiness allows PKC-managed data asset to be used for error-correction
Given the trust-worthiness of data, the data can be used to correct mistakes in content, source code, and binary executable images, so that it becomes a platform of DevSecOps workflow.
Self-reflective error correction enables systematic learning
When PKC can be deployed to a broad base of practices, it will enable a kind of self-reflective error correction feature, where many different kinds of applications and use cases can mutually verify and validate the quality of key-value pair-encoded knowledge base. This goes back to the mythical story of Tower of Babel, where a unified language will enable participants to build a structure that can scale up to unprecedented height.
- PKC as the e-Catalog of cloud-enabled data assets
PKC is a general-purpose framework that uses an encyclopedic approach to categorize and publish all existing data resources in terms of data content, source code, executable binary, and real world software operational data. This publicized framework of data asset management approach allows all participants to operate their own instances of PKC by leveraging the operational experience of the entire PKC community.
- Automate the composition and decomposition of software components
Participants can choose to incorporate parts or all of PKC's functionalities based on the algebraic approach to compose and decompose the functionalities of an otherwise proprietary software infrastructure.
- Reuse source code and operational data derived from the entire PKC community
- Inter-organizational Workflows amongst a common code base
All code that needs to be written can become a portion of the data asset to be kept in PKC.
Stop Reinventing the Wheel
- PKC as a Meta Protocol for Data Assets
- Disseminate the most-recent-possible data that reflect verifiable truth
- Published data as a public Natural Resource[15]
References
- ↑ Buterin, Vitalik (2014). "Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform" (PDF). local page: ETHEREUM FOUNDATION.
- ↑ Wood, Gavin (April 7, 2022). "ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER" (PDF) (Berlin Version 934279c ed.). local page: ETHEREUM FOUNDATION.
- ↑ Meyerson, Michael (2002). Political numeracy : mathematical perspectives on our chaotic constitution. local page: Norton Publisher. ISBN 0393323722.
- ↑ Epp, Susanna (2020). Discrete Mathematics with Applications (5th ed.). local page: Cengage. ISBN 978-1-337-69419-3.
- ↑ To understand the intricate mechanisms of Lambda calculus, and why and how this simple language can be universal, please read this page:Dana Scott on Lambda Calculus.
- ↑ Scott, Dana (January 1, 1970). "Outline of a Mathematical Theory of Computation". local page: Oxford University Computing Laboratory Programming Research Group.
- ↑ Dolan, Stephen (July 19, 2013). "mov is Turing-complete" (PDF). local page: Computer Laboratory, University of Cambridge.
- ↑ This article: Deep Dive on Lua, explains how to turn wikitext into a functional programming language using #switch and #if magic words.
- ↑ Koo, Hsueh-Yung Benjamin; Simmons, Willard; Crawley, Edward (Nov 16, 2021). "Algebra of Systems as a Meta Language for Model Synthesis and Analysis" (PDF). local page: IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS.
- ↑ Fong, Brendan (2016). The Algebra of Open and Interconnected Systems (PDF) (Ph.D.). local page: University of Oxford. Retrieved October 15, 2021.
- ↑ Rambaud, Salvador Cruz; Pérez, José García; Nehmer, Robert A.; Robinson, Derek J. S. (2010). Algebraic Models for Accounting Systems. local page: Cambridge at the University Press. ISBN 978-981-4287-11-1.
- ↑ Rambaud, Salvador Cruz; Pérez, José García (2005). "The Accounting System as an Algebraic Automaton". INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS. local page: Wiley Periodicals, Inc. 20: 827–842.
- ↑ This document was revised on May 20, 2022.
- ↑ Transparency of security rules can be encoded in published Smart contracts, so that participants can decide to participate or not based on reading the explicitly specified contracts.
- ↑ Slide/Fab City Full Stack