Difference between revisions of "Reinforced Learning with Human Feedback"
Line 1: | Line 1: | ||
[[RLHF]] is also known as [[Reinforced Learning with Human Feedback]]. One way to think of [[RLHF]] at scale is to allow all [[human annotation]] and [[data editing history]] as a part of [[RLHF]]. This is particularly possible with a web-based interface that captures human inputs on portable networked devices. On top of that, if all these history are ordered and encoded with block numbers of some public [[blockchain]], the occurrence of all these [[RLHF]] activities could be universally integrated to reflect some statistical behavior of human intent as a collective. This is where [[accountability]] of individual and the collective can be operationally executed. | [[RLHF]] is also known as [[Reinforced Learning with Human Feedback]]. This is a testing or training strategy for data scientists to make sense or assign social meaning to data by human feedback. One way to think of [[RLHF]] at scale is to allow all [[human annotation]] and [[data editing history]] as a part of [[RLHF]]. This is particularly possible with a web-based interface that captures human inputs on portable networked devices. On top of that, if all these history are ordered and encoded with block numbers of some public [[blockchain]], the occurrence of all these [[RLHF]] activities could be universally integrated to reflect some statistical behavior of human intent as a collective. This is where [[accountability]] of individual and the collective can be operationally executed. | ||
<noinclude> | <noinclude> |
Revision as of 06:41, 27 May 2023
RLHF is also known as Reinforced Learning with Human Feedback (more commonly expanded as reinforcement learning from human feedback). It is a training or testing strategy in which data scientists use human feedback to make sense of data or to assign social meaning to it. One way to think of RLHF at scale is to treat all human annotation and data editing history as part of RLHF. This becomes practical with a web-based interface that captures human input on portable networked devices. Furthermore, if this history is ordered and encoded with the block numbers of a public blockchain, all of these RLHF activities could be integrated under one shared ordering to reflect the statistical behavior of collective human intent. This is where accountability of both individuals and the collective can be put into operation.
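The idea of ordering annotation and editing history by block number and then reading off collective behavior can be illustrated with a minimal Python sketch. Everything named here is a hypothetical illustration rather than part of any existing system: the `FeedbackEvent` record, its fields, the functions `order_events`, `collective_labels`, and `events_by_annotator`, and the sample data are all assumptions. The block numbers are plain integers supplied by hand; no blockchain is actually queried.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackEvent:
    """One human annotation or edit (hypothetical record layout)."""
    block_number: int  # assumed: height of the public-chain block the event is tied to
    annotator: str     # who gave the feedback
    item_id: str       # which piece of data was annotated or edited
    label: str         # the feedback itself


def order_events(events):
    """Sort events by block number; annotator and item break ties deterministically."""
    return sorted(events, key=lambda e: (e.block_number, e.annotator, e.item_id))


def collective_labels(events):
    """Keep each annotator's latest label per item, then count labels per item."""
    latest = {}
    for e in order_events(events):
        # Later blocks overwrite earlier ones, so only the final opinion counts.
        latest[(e.annotator, e.item_id)] = e.label
    tally = defaultdict(Counter)
    for (_annotator, item_id), label in latest.items():
        tally[item_id][label] += 1
    return {item_id: dict(counts) for item_id, counts in tally.items()}


def events_by_annotator(events):
    """Group the ordered history per annotator, a simple basis for individual accountability."""
    log = defaultdict(list)
    for e in order_events(events):
        log[e.annotator].append(e)
    return dict(log)


# Made-up sample history, deliberately out of order.
events = [
    FeedbackEvent(102, "alice", "doc-1", "helpful"),
    FeedbackEvent(100, "alice", "doc-1", "unhelpful"),
    FeedbackEvent(101, "bob", "doc-1", "helpful"),
    FeedbackEvent(101, "carol", "doc-1", "unhelpful"),
]

summary = collective_labels(events)
history = events_by_annotator(events)
```

With this sample data, `summary` holds the per-item label counts after each annotator's most recent feedback is applied, and `history["alice"]` lists her two events in block order. Counting only the latest label per annotator is one design choice among several; a real system might instead weight or retain every revision.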
References
Related Pages