Wednesday, February 15, 2023

Lightning Talks @ Hacker Dojo - John Sokol





My talk about a web3 architecture.

Funny how one guy is like, "Why not use Mastodon?" and a minute later another is telling how he was banned from Mastodon.

Sorry, Mastodon, looks like it's striking out.

Seems the biggest challenge these people have is trying to do P2P, or the illusion of it, while still being a nanny state.







Browser tab storage


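In a browser tab, LocalStorage can persistently store about 5MB across reboots, keyed to the server address.

SessionStorage lasts only as long as the tab is kept open. According to the GWT documentation linked below it is limited only by system memory, which could be many gigabytes, although some browsers apply a quota of a few megabytes to it as well.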


How big can localStorage be?

LocalStorage and SessionStorage

Storage Type: Max Size

LocalStorage: 5MB per app per browser. According to the HTML5 spec, this limit can be increased by the user when needed; however, only a few browsers support this.

SessionStorage: Limited only by system memory.


 

So we can choose to keep 5MB permanently in your browser that the website can access (your keys, credentials, boot code), and what we keep on the machine within an open tab could be hundreds of megabytes.
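
As a rough sketch of the persistent part, the helper below refuses to write past a 5MB budget into any object that offers the Web Storage methods. The names putWithinBudget, usedBytes, makeMemoryStorage and BUDGET_BYTES are made up for illustration, and counting two bytes per character is only an estimate; real quotas vary by browser. In a browser you would pass window.localStorage instead of the in-memory stand-in.

```javascript
// Hypothetical helper: keep keys, credentials and boot code inside a fixed
// budget in any object that offers the Web Storage methods
// (length / key / getItem / setItem), such as window.localStorage.
const BUDGET_BYTES = 5 * 1024 * 1024; // assumed 5MB quota

// Web Storage holds UTF-16 strings, so estimate 2 bytes per character.
function usedBytes(storage) {
  let total = 0;
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    total += 2 * (key.length + storage.getItem(key).length);
  }
  return total;
}

// Store the value only if the total stays inside the budget.
function putWithinBudget(storage, key, value, budget = BUDGET_BYTES) {
  const old = storage.getItem(key);
  const oldSize = old === null ? 0 : 2 * (key.length + old.length);
  const newSize = 2 * (key.length + value.length);
  if (usedBytes(storage) - oldSize + newSize > budget) return false;
  storage.setItem(key, value);
  return true;
}

// Tiny in-memory stand-in so the sketch also runs outside a browser.
function makeMemoryStorage() {
  const map = new Map();
  return {
    get length() { return map.size; },
    key: (i) => Array.from(map.keys())[i] ?? null,
    getItem: (k) => (map.has(k) ? map.get(k) : null),
    setItem: (k, v) => { map.set(k, String(v)); },
  };
}
```

A write that would overflow the budget returns false and leaves the store untouched, so the caller can decide what to evict.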


 

Using a RAID-like redundancy scheme, the data can be broken up into pieces, called stripes, with a little added redundancy that tells me what data is missing and lets me reconstruct it before we lose enough to become unrecoverable.


 

In RAID, you can remove a hard drive and replace it, and the array will begin to restore the missing data to the new drive based on the redundant data (extra copies) in the array.

 

https://www.gwtproject.org/doc/latest/DevGuideHtml5Storage.html




Use of WebRTC for P2P communications directly between browser tabs. 


https://webrtc.github.io/samples/



RTCPeerConnection

https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection



https://www.youtube.com/watch?v=btKd-VLzNR0



The strange irony is that all of this originated from my 1997 video streaming company.


https://www.ecip.com/fwdoc.htm


https://johnsokol.blogspot.com/2006/12/how-skype-punches-holes-in-firewalls.html



Ad-Hoc Hybrid star mesh topology


For this to work, some users will need to run Node.js servers in the cloud to act as relays in a hybrid star mesh topology.






Black being HTML5 browser tabs, and white being relays (Node.js cloud instances).



A relay is most likely a Node.js instance that is used to establish WebRTC links between tabs.


Between relays is a publish-and-subscribe message bus that forms layer 1 of the communications: coordination, initialization, and signaling.
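
A minimal sketch of such a bus, kept in memory so it runs anywhere. The class name MessageBus and its method names are assumptions for illustration, not an existing library; a real relay would carry these messages between Node.js instances over the network.

```javascript
// Hypothetical in-memory message bus: topics map to sets of handlers.
class MessageBus {
  constructor() {
    this.topics = new Map();
  }

  // Register a handler; returns a function that cancels the subscription.
  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
    this.topics.get(topic).add(handler);
    return () => this.topics.get(topic).delete(handler);
  }

  // Deliver a message to every subscriber; returns how many received it.
  publish(topic, message) {
    const handlers = this.topics.get(topic) ?? new Set();
    for (const handler of handlers) handler(message);
    return handlers.size;
  }
}
```

A tab or relay would subscribe to a topic of its choosing (for example a made-up "content-request" topic) and answer whatever is published there; a publish count of zero tells the sender nobody is listening.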


Also, a tab can enter the URL of another relay and have its host relay establish a link, bridging the relays and their client tabs together into one larger network and allowing cross-relay content searches.


I see relays as being more like slave devices, responding to requests and serving a database of local clients with assets.



Content addressing


All content addressed by relay name (URL or IP) will look like a normal website. 



Relays may use a standard username and password or cookies to control access, and charge access fees per month.


Relays may blacklist other relays or URLs?


Some relays may only be accessible through other relays, in order to gain credentials.


http://Relay/Hash



Such as https://relay.johnsokol.com:2222/dd5db37a4c79ea699c3a8084d5a74aef


For the most part an address here is indistinguishable from a normal URL intended for a browser, and indeed it will serve up some default HTML content. It's really intended to be connected to by WebSocket.
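
A small sketch of how a tab might take such an address apart. The function name parseContentAddress is made up, and two things are assumptions: that the hash is always 32 hexadecimal characters like the example above, and that the WebSocket endpoint sits at the root path of the relay.

```javascript
// Split "scheme://relay[:port]/hash" into its relay and content hash.
// Assumes the hash is 32 hexadecimal characters, like the example address.
function parseContentAddress(address) {
  const url = new URL(address);
  const hash = url.pathname.slice(1).toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hash)) return null; // not a content address
  const wsScheme = url.protocol === "https:" ? "wss:" : "ws:";
  return {
    relay: url.host, // hostname plus port, if any
    hash,
    websocketUrl: `${wsScheme}//${url.host}/`, // assumed connection point
  };
}
```

Anything that does not look like a hash returns null, so ordinary page URLs on the same relay fall through to normal HTTP handling.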


Externally, when a user connects directly, they can get only one of a few pieces of externally accessible content. This is a list of hashes, or URL -> hash mappings, defined in a config file loaded at startup: whatever the admin has decided for them, maybe a sign-up page with payment.


Internally, the request is published on the bus. A special Admin Tab keeps a database, is subscribed to the bus, and fulfills requests.


For basic non-WebSocket HTTP requests, the Node will broadcast the request to all subscribing tabs, and the content that comes back is sent as an HTML web page from the Node.js relay.
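
A sketch of that lookup under stated assumptions: publicRoutes stands in for the config file, the "/signup" entry and its hash are placeholders, and the fulfill callback stands in for the Admin Tab answering over the bus. It is synchronous only to keep the example short.

```javascript
// Hypothetical config loaded at relay startup: URL path -> content hash.
const publicRoutes = {
  "/": "dd5db37a4c79ea699c3a8084d5a74aef",
  "/signup": "0123456789abcdef0123456789abcdef", // placeholder hash
};

// Resolve a plain HTTP request to a hash, or null if it is not public.
function resolvePublicRequest(path, routes = publicRoutes) {
  if (Object.prototype.hasOwnProperty.call(routes, path)) return routes[path];
  const bare = path.slice(1); // allow "/<hash>" for hashes on the list
  return Object.values(routes).includes(bare) ? bare : null;
}

// Serve a request: look up the hash, then ask a subscriber (standing in
// for the Admin Tab on the bus) for the HTML to send back.
function handleHttpRequest(path, fulfill) {
  const hash = resolvePublicRequest(path);
  if (hash === null) return { status: 404, body: "" };
  return { status: 200, body: fulfill(hash) };
}
```

Everything not on the list gets a 404 from the relay itself, so the tabs never see requests for content the admin has not exposed.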


All other content requests from there are made using our internal messaging protocol, over a WebSocket from the tabs to the relay server and P2P over WebRTC directly between users' tabs, to avoid consuming the cloud relay's bandwidth.


For this to work there must be a significant JavaScript communications layer running in the user's tab.



There should be some form of access control on the relay to prevent DoS attacks. Or maybe a way to put it behind services such as Cloudflare?


Some relays are only accessible by other relays and not openly available, i.e. WebSocket-only interfaces, so HTTP content can be served from a CDN.



Basic Session flow


Special Admin Tab

     A tab-based JavaScript program that manages the relay and its backend services, keeps a database of users, subscribes to messages, and fulfills some requests.

 

Some login credentials are in the Node.js config file and are entered the first time an admin accesses a relay with a new browser; from this point on they are stored in cookies.



Normal Tab


Regular web users click links on social media. They are directed to the Node relay, which messages its WebSocket-connected clients.

These posting clients are subscribed to the relay for certain requests and respond with the content. Ideally this is a JS client that can then present the content as fetched over WebRTC.


The receiving client tab establishes a WebRTC data transfer request for the decoded content, starting with the JavaScript and HTML initially needed to invoke it.


From this point, further WebRTC requests can be made to already established tabs, or the tab can use the existing WebSocket to the relay to query where to make the next WebRTC connections to load the next pieces of content.
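
The relay's part in this flow is mostly forwarding signaling messages between tabs. A minimal sketch, with the class name SignalingRelay, the tab ids, and the message shape all assumed for illustration:

```javascript
// Hypothetical relay-side signaling: tabs register a delivery function, and
// the relay forwards offer/answer/candidate messages to the addressed tab.
class SignalingRelay {
  constructor() {
    this.tabs = new Map(); // tabId -> function that delivers a message
  }

  connect(tabId, deliver) {
    this.tabs.set(tabId, deliver);
  }

  disconnect(tabId) {
    this.tabs.delete(tabId);
  }

  // Forward a signaling message; false means the target tab is gone.
  forward(from, to, type, payload) {
    const deliver = this.tabs.get(to);
    if (!deliver) return false;
    deliver({ from, type, payload });
    return true;
  }
}
```

In a browser the payload would be the session description produced by RTCPeerConnection's createOffer() or createAnswer(), and the delivery function would wrap a send on that tab's WebSocket. Once both sides have exchanged descriptions, the data itself flows tab to tab and the relay is out of the path.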


This is all dynamic HTML, generated by JavaScript internal to the browser.


We can send over JS code and run it in the browser to produce apps on a distributed computing platform.

For this application, it's to act as a file server. 



Still to work out 


Rules for security and access control models: what should I allow to be hosted in my tab?

How to deal with pigs and bad actors? Seems the initial responsibility is that of the relay(s).


What handles and tracking metadata do we want to keep on data?

    Do we want authorities and other investigators to be able to do forensics?

    This is more of a design decision embedded in the javascript libraries. 


Maybe use the 5MB local store to keep a persistent database with user credentials, certs, keys, and boot JavaScript images, plus configuration options: who gets to use up my memory, and with what.


Creation of an internal VM in JavaScript to allow daemon processes to subscribe to client-disconnect or content-lost messages, to trigger data reconstruction and backup into tabs with available storage.
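
A sketch of the bookkeeping such a daemon might keep for one stored object. The class name PieceTracker is made up, and the numbers (seven pieces, at most two missing) are an assumption borrowed from the (7,4) Hamming scheme discussed elsewhere in this post.

```javascript
// Hypothetical bookkeeping for one stored object split into pieces.
class PieceTracker {
  constructor(totalPieces = 7, maxLost = 2) {
    this.totalPieces = totalPieces;
    this.maxLost = maxLost;   // most pieces we can lose and still rebuild
    this.holders = new Map(); // piece index -> id of the tab holding it
  }

  // Record that a tab now holds a piece.
  place(piece, tabId) {
    this.holders.set(piece, tabId);
  }

  // Call when a "client disconnected" message arrives for tabId.
  // Returns the pieces that went with it and whether we can still rebuild.
  onDisconnect(tabId) {
    const lost = [];
    for (const [piece, holder] of this.holders) {
      if (holder === tabId) lost.push(piece);
    }
    for (const piece of lost) this.holders.delete(piece);
    const missing = this.totalPieces - this.holders.size;
    return { lost, recoverable: missing <= this.maxLost };
  }
}
```

The daemon would rebuild the lost pieces into other tabs as soon as a disconnect is reported, while the object is still recoverable.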


I think storage should be block oriented. Databases, archive storage systems, and such can be built on top of this.


We may need additional external STUN, TURN, and ICE servers for WebRTC connection initiation, in addition to the Node relay. (TBD)





RAID hard drive array-like striping across many servers, to improve performance and throughput.



With my ECIP codes (ecip.com), I started here with the basic Hamming codes and expanded much further.


For all the purposes of this system, a basic (7,4) Hamming code will suffice.


We split our data file into 4 pieces. This can be done with the first byte going to file 1, the second to file 2, and so forth, as if dealing poker cards.
Or it can be done in blocks of 1KB, or of ¼ the file length.
The advantage of dealing poker style in 1KB blocks is speed, and being able to reassemble while streaming video or audio playback.
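
A sketch of the card-dealing split and its inverse. The function names deal and gather are made up; padding every stripe with zeros to the same length is an assumption, made so the stripes can later be XOR'ed together.

```javascript
// Deal fixed-size blocks round-robin into `ways` stripes, like dealing cards.
// Stripes are zero-padded to equal length so they can be XOR'ed later.
function deal(data, ways = 4, blockSize = 1024) {
  const blockCount = Math.ceil(data.length / blockSize);
  const blocksPerStripe = Math.ceil(blockCount / ways);
  const stripes = Array.from(
    { length: ways },
    () => new Uint8Array(blocksPerStripe * blockSize)
  );
  for (let offset = 0, n = 0; offset < data.length; offset += blockSize, n++) {
    const chunk = data.subarray(offset, offset + blockSize);
    stripes[n % ways].set(chunk, Math.floor(n / ways) * blockSize);
  }
  return stripes;
}

// Reassemble the original bytes; `length` is the original size, which has
// to be remembered separately because of the padding.
function gather(stripes, length, blockSize = 1024) {
  const out = new Uint8Array(length);
  const ways = stripes.length;
  for (let offset = 0, n = 0; offset < length; offset += blockSize, n++) {
    const start = Math.floor(n / ways) * blockSize;
    const size = Math.min(blockSize, length - offset);
    out.set(stripes[n % ways].subarray(start, start + size), offset);
  }
  return out;
}
```

Setting blockSize to 1 gives the byte-by-byte deal; because blocks come back in order, a player can start consuming the output as soon as the first round of blocks has arrived.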


These 4 files are then XOR'ed to produce 3 additional files, for a total of 7.


We can detect up to 2 corrupted files (and correct 1), or lose any 2 of the 7 and still recover all the data.
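
A sketch of the 7-file scheme for the lost-file case. The particular parity equations are the textbook (7,4) Hamming arrangement and are an assumption here, not the ECIP codes; the names hammingEncode and recoverErasures are made up. Each parity check covers four files whose XOR is zero, so any check with exactly one missing file can rebuild it.

```javascript
// Files 0..3 are data (d1..d4), files 4..6 are parity (p1..p3).
// Each check lists four files whose XOR is all zeros.
const CHECKS = [
  [0, 1, 3, 4], // p1 = d1 ^ d2 ^ d4
  [0, 2, 3, 5], // p2 = d1 ^ d3 ^ d4
  [1, 2, 3, 6], // p3 = d2 ^ d3 ^ d4
];

// XOR equal-length byte arrays together.
function xorBlocks(blocks) {
  const out = new Uint8Array(blocks[0].length);
  for (const block of blocks) {
    for (let i = 0; i < out.length; i++) out[i] ^= block[i];
  }
  return out;
}

// Take 4 equal-length data files and return all 7 files.
function hammingEncode(data) {
  const files = data.slice();
  for (const [a, b, c] of CHECKS) {
    files.push(xorBlocks([data[a], data[b], data[c]]));
  }
  return files;
}

// files: array of 7 entries, with null where a file is missing.
// Repeatedly repairs any check that has exactly one missing member.
// With more than 2 files missing, some entries may remain null.
function recoverErasures(files) {
  const out = files.slice();
  let progress = true;
  while (progress) {
    progress = false;
    for (const check of CHECKS) {
      const missing = check.filter((i) => out[i] === null);
      if (missing.length === 1) {
        const present = check.filter((i) => out[i] !== null);
        out[missing[0]] = xorBlocks(present.map((i) => out[i]));
        progress = true;
      }
    }
  }
  return out;
}
```

This handles files that are known to be gone, which is the case for a tab that closed. Finding a silently corrupted file needs the syndrome step of Hamming decoding, which is not shown.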


We are storing data in active browser tabs, and these can sleep or go away without notice.


So detecting this and reconstructing the missing file before additional loss is critical for data retention. 


This is where I started; it took me 4 years of research to get to the ECIP codes.


https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Raptor_Codes_IEEE_technical_analysis.pdf


Or RAID across the online storage, sensing when data has gone missing and recreating the needed missing copies so as to not lose data.

Still, the risk of massive network outages exists, such as a blackout or an internet cable cut.

Users could store data in, and restore it from, archive files.

JavaScript can generate saveable files such as a GIF or JPEG.

I have collected all of the needed examples, and it is just a matter of assembling it all.


https://en.wikipedia.org/wiki/Erasure_code


https://www.usenix.org/conference/fast15/technical-sessions/presentation/xia


System Vulnerabilities. 


Getting busted for content. 

   Stored files such as copyrighted movies could be posted by malicious actors such as Disney, who then track the IP of every browser that attempts to access them. This is done with torrents.
This might also be possible with man-in-the-middle proxy servers, such as the type Xfinity and U-verse Internet have.



Such that the mere act of holding a block from this content puts you in jeopardy.  



Public pad cryptography. 

This also goes along with an idea I have played with that I call "public pad cryptography",

designed to break code-breaking equipment. It was a neat idea back when writable CDs came out.


Both a properly compressed file and a properly encrypted file should be nearly indistinguishable from noise or a file of random numbers.


But they can be fingerprinted, checksummed, and tracked, and identified as contraband content.


On the other hand, if the system demands that you hold some random block from some other content, then even a non-infringer could have a legitimate reason to hold part of that file as a key to decode other content, thereby blending or mixing all of the data up.


This can be done as simply as the XOR process needed to do the Hamming codes.


It is super efficient, and will completely wreck standard code-breaking techniques, as it pushes against the memory bottleneck rather than the compute bottleneck.
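
The mixing step itself can be sketched in a few lines. The function name xorWithPad is made up, and how the pad block gets chosen and located is left open; this only shows that XOR with a block of other content is its own inverse.

```javascript
// XOR a content block with a pad block taken from some other content.
// Applying the same pad a second time restores the original block.
function xorWithPad(block, pad) {
  if (pad.length < block.length) throw new Error("pad shorter than block");
  const out = new Uint8Array(block.length);
  for (let i = 0; i < block.length; i++) out[i] = block[i] ^ pad[i];
  return out;
}
```

Whoever wants the original needs both the stored block and the pad block, so the pad block has a legitimate reason to sit in someone's tab regardless of what content it originally came from.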


Other similar existing projects with working example code. Components that could be built upon.

PeerJS


PeerServer: A server for PeerJS

PeerServer helps establish connections between PeerJS clients. Data is not proxied through the server.


https://peerjs.com


https://github.com/peers/peerjs-server#readme


PeerServer

PeerServer: A Server in the Browser with WebRTC


https://github.com/PeerServer/peer-server/blob/master/README.md


https://www.youtube.com/watch?v=yQH5Vkzw8ko

https://www.youtube.com/watch?v=w76V3H1Q6HI


Croquet

https://croquet.io/docs/index.html

OS-JS - Super cool. 

I didn't realize at first that the file system is remote and shared by everyone.
  As such, the demo site gets corrupted after a while.

https://demo.os-js.org/

https://www.os-js.org/



CYFS is short for CYberFileSystem


https://www.cyfs.com/
https://github.com/buckyos/CYFS


Holy shit, nearly identical to what I was thinking. After a solid 2 weeks of researching, I only found out about this because of an upcoming hackathon with a $10,000 prize.


Croquet

Instantaneous Shared Experiences is how we describe Croquet on our website

https://blog.codefrau.net/2021/08/what-is-croquet-anyways.html


https://croquet.io/docs/croquet/


Part of Bluesky initiative. 

Gun DB - Mark Nadal gave a talk at the Dojo in January.

https://gun.eco/

https://github.com/amark/gun



Nostr 

https://nostr.com/

https://github.com/nostr-protocol/nostr


What is Nostr?

A decentralized network based on cryptographic keypairs and that is not peer-to-peer, it is super simple and scalable and therefore has a chance of working. Read more about the protocol. You can also reach us at our Telegram group (while we don't have a decent group chat application fully working on Nostr).




They now have a JavaScript port of IPFS (a complete game changer for the Eth/IPFS model).

https://js.ipfs.tech/




DAG

https://docs.ipfs.tech/concepts/merkle-dag/


Articles:


Erasure Codes:

Nodes are going to dethrone tech giants — from Apple to Google

https://cointelegraph.com/news/nodes-are-going-to-dethrone-tech-giants-from-apple-to-google




Author's History:


Original Amorphous OS talk from 2000 at the ACCU. 

https://www.johnsokol.com/~sokol/amorp/amorphous1_files/v3_document.htm



Meshcast P2P Business plan 2001.

http://www.dnull.com/meshcast/meshcast/



Decash - crypto payment system 2004

https://web.archive.org/web/20040328083934/http://decash.com/



https://johnsokol.blogspot.com/2018/02/opensecret-unforgeable-proof-of.html


FileSharing P2P & SOA

Idea is to combine Peer to Peer and Server Oriented architecture. This will be especially useful for very large files (over 500 MB files).

https://docs.google.com/document/d/e/2PACX-1vQiWlARGLBLIjzC4HT-KGyRanXHsirJm-GRjFeDaGKQyh1ZZKm7VuzxCW2uUCUwyDq4VZYkcAfrsQcA/pub


Other things to see: 



Full Linux on Intel x86 running in a WebAssembly-based emulator in a browser tab.


https://churchofbsd.blogspot.com/2020/06/jslinux-pcx86-emulator-in-javascript.html


https://churchofbsd.blogspot.com/2020/06/a-browser-on-linux-with-x-windows-in.html





From my Amorphous notes, 2012

File System

Global File system address. 



Like the URL scheme, but rather than pointing to a host and domain name, we can point to a network file system object; there could be several. These too inherit their security permissions and preinstalled configs, appropriate version dependencies, and base objects.

The intent is every application has a constructor or factory that builds its world before it executes. Version appropriate shared libraries will be loaded as needed.

Network, distributed Cache system. Built around git style version control and code distribution. 




Could be based on the Git hash abstraction layer.

  Where files are mapped to a hash key reference in a directory listing (index).

  Duplicates are automatically consolidated.

  Updates and reverts are quick.

If a file system were based on this, then rather than checking out a version, you mount a version and branch.

You can share back your changes; people check these in and out.



Changes from Git: you can mount and build file systems for different apps, and carefully quarantine programs.

Infinite granularity. 



Directory entries are in JSON or equivalent name-value pairs, with recursive objects defined

that finally point to actual files. The index directory file can have complete granularity in control and permissions.

A file is actually pointed to by a hash key. Behind the hash key is either an object that matches that hash or a placeholder object that references the location to fetch the real one.
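
A toy version of this layer, under loud assumptions: hashKey uses 32-bit FNV-1a as a dependency-free stand-in (Git itself uses a cryptographic hash, and a real system should too), and ObjectStore and resolvePath are made-up names. The directory index is plain JSON in which names map either to a hash key or to a nested index.

```javascript
// Stand-in hash (32-bit FNV-1a over the low byte of each character) so the
// sketch has no dependencies. NOT collision resistant; use SHA-1/SHA-256
// or similar in anything real.
function hashKey(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i) & 0xff;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Content-addressed store: identical content always lands on the same key,
// so duplicates are consolidated automatically.
class ObjectStore {
  constructor() {
    this.objects = new Map(); // hash key -> content
  }
  put(content) {
    const key = hashKey(content);
    this.objects.set(key, content);
    return key;
  }
  get(key) {
    return this.objects.get(key) ?? null;
  }
}

// Walk a JSON directory index ("bin/app") down to a hash key, then fetch it.
function resolvePath(store, index, path) {
  let node = index;
  for (const part of path.split("/").filter(Boolean)) {
    const isDir = node !== null && typeof node === "object";
    if (!isDir || !Object.prototype.hasOwnProperty.call(node, part)) return null;
    node = node[part];
  }
  return typeof node === "string" ? store.get(node) : null;
}
```

Mounting a different version then amounts to handing resolvePath a different index object; the two indexes share every object whose content did not change.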



Software install/upgrade/revert/remove is a git checkout for a sub-directory within a larger repo. 



Applications are just a link to a directory entry. This in turn starts the retrieval engine, which will query its upstream caches for instructions on how to fetch this data.




Files can be encrypted, and can call other dependent objects to perform encryption, compression, or other translation or interpretation.



File attributes include:

  Ownership, license, URL, EMAIL, (TRACK, BACKTRACE, REPORT, UPDATE) Callback URL. 

  Permis


