Windows RTC    




What is Windows RTC?

The quick answer is that it should be like Microsoft Teams, Zoom, or YouTube streaming, but without needing to sign up. Or like Apple's FaceTime, without needing to buy an Apple device (note, though, that Apple devices allow full webcam access only to Apple's own Safari browser, and even Safari cannot use the MediaRecorder object unless experimental features are enabled).

The more correct answer is that it is meant to test and support the important work of the W3C in defining the HTML5 standard, and of the more recent RTC group, which may in principle mean that web browsers are allowed to communicate video with each other.

The full answer is a bit long, and involves the history of the internet as a sort of battle for survival of individuals' right to access it, supported by organizations like Netscape, Mozilla, Linux, and the RTC. You almost certainly know this already, but you can read it if you really need to.




Why is it called 'Windows RTC'?

The word 'Windows' here, on first reading, should mean RTC (real time chat) which is available to anyone who has a computer (or phone), especially someone who has not figured out how to get Linux.

The best alternative would be to tell someone, "Just get Linux, install a websocket server, and hire a DNS service pointing to it." That can depend on the permission of the internet provider, and is not going to work unless users actually own their Wi-Fi connection. We want something which allows users to share their real time media (e.g. stream video of themselves and their voice) without needing the permission of any centralized industry at all.

As a sort of testament to the work of W3C and HTML5, it ought to be possible now for people to just have a local HTML file which tells their browser how they want to present themselves to another person, who has nothing but a browser and an internet connection.

This is not possible, for two good reasons. The first is that DNS services are not readily available to ordinary people, and the second is the still-evolving 'cross-origin' policy of HTML5. The convention nowadays is that both people go to the same website, which hosts the shared files containing the video and audio data. As a practical matter, something like this has to happen anyway: for n people chatting there are on the order of n^2 channels of data, which does not scale if we think of using the whole internet.

Crucially, though, we do not want the server to be required to, or to have permission to, do anything to those files. We do not want any requirement that the files are transcoded.
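To make the channel count above concrete, here is a minimal sketch that compares the number of directed streams when everyone sends directly to everyone else with the number of connections when everyone talks only to one shared server. The function names are hypothetical and the counting model (one directed stream per ordered pair of participants) is an assumption for illustration, not something taken from the Windows RTC code.

```javascript
// Directed media streams when every participant sends directly to every other one.
// Assumption: one stream per ordered pair, so n * (n - 1), which grows like n^2.
function directStreams(n) {
  return n * (n - 1);
}

// Connections when every participant talks only to one shared server.
function serverConnections(n) {
  return n;
}

for (const n of [2, 4, 10]) {
  console.log(
    `${n} people: ${directStreams(n)} direct streams, ` +
    `${serverConnections(n)} server connections`
  );
}
```

For 10 people this gives 90 direct streams against 10 server connections, which is the practical reason a shared meeting point is hard to avoid.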


What are the specifications of Windows RTC?

The definition of Windows RTC (which really means genuinely open RTC) -- as opposed to something like 'Microsoft Teams' -- is this:


    1. (Active Browser) Any user can share their real time video and audio with any other user, using only a web browser complying with the W3C definition of HTML5.

    2. (Passive Server or no server) If files are shared on a server, the server must do nothing to the files besides store them while they are being shared.


These conditions rule out websockets, at the moment, because websockets can 'push' files to users. Allowing websockets would give a four-fold improvement in latency. If a server is only a repository, then under current polite protocols each user asks to look at a directory file, receives its contents, asks for a file, and then receives that file's contents. This is four steps. Websockets turn this into one step, in which the server actively sends a file uninvited. A websocket server also cannot be written efficiently in PHP, because the PHP program has to sit in a sleep loop until the file appears, and it uses one whole dedicated server thread.
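The four polling steps can be sketched as follows. This is only an illustration under stated assumptions: PassiveStore is a hypothetical in-memory stand-in for the server, the file name is made up, and the message counter simply adds one for each request and one for each reply. It is not the code shipped in chat.zip.

```javascript
// Hypothetical stand-in for a passive server: it stores what clients put
// there and returns what they ask for, and never sends anything uninvited.
class PassiveStore {
  constructor() {
    this.files = new Map();     // file name -> contents
    this.directory = new Map(); // user number -> name of most recent file
    this.messages = 0;          // one-way messages exchanged while reading
  }

  // A sender stores a blob and records it as that user's latest file.
  put(user, name, contents) {
    this.files.set(name, contents);
    this.directory.set(user, name);
  }

  // Steps 1 and 2: the client asks for the directory and receives it.
  getDirectory() {
    this.messages += 2;
    return new Map(this.directory);
  }

  // Steps 3 and 4: the client asks for a file and receives its contents.
  getFile(name) {
    this.messages += 2;
    return this.files.get(name);
  }
}

// One polling round for a given user's latest blob.
function pollLatest(store, user) {
  const directory = store.getDirectory();
  const name = directory.get(user);
  if (name === undefined) return null; // nothing published yet
  return store.getFile(name);
}

const store = new PassiveStore();
store.put(1, "u1_b0.webm", "blob-a");
console.log(pollLatest(store, 1), store.messages);
```

A push-based server would collapse the same exchange into a single message from server to client, which is where the four-to-one ratio in the text comes from.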




What is the aim of Windows RTC?

The aim is to eventually remove the role of the server altogether, to be absolutely sure that it does not become enshrined as a requirement for RTC. For now, Windows RTC allows a server -- but only because of the existing and fair (though evolving) constraints of cross-origin policy.


Is there an existing implementation of Windows RTC satisfying such restrictive constraints?

You can try this implementation using the link at the top of the page. If you want your own copy, just unzip this 94 KB file chat.zip, and put index.php in a folder on your own server, or any server. Give it permission 0755 and be sure that PHP files in that folder have permission to write in that folder. The first time it is run it will create index.html and the auxiliary PHP files which it needs. From then on anyone who browses to that folder can chat with anyone else who is there. Make sure that the server directs subsequent users to index.html instead of index.php; to be sure, you can delete index.php from the server after the first use.

How does it work?

As users chat, a cyclic sequence of small .webm blobs (2 to 3 KB) is created on the server for each user. These are indexed only by user number and buffer number, so they are constantly overwritten. The stream is interrupted periodically in case the audio and video latencies disagree. A directory is kept of the most recently written file for each user. To cover rare cases when files might arrive out of order, each instance of the PHP file waits until its most recent dependency is resolved before updating the directory, resulting in occasional waiting cascades which aren't noticeable. Each user's blobs are fed to 'media source array buffers' which feed the video elements. The variable 'bufferDuration' is set to 200 milliseconds; this can be reduced if all users have fast devices, but the full value is needed if there is any CPU lag. PHP file locking prevents interleaving of directory entry updates. The JavaScript coding, unsurprisingly, involves nested callbacks/promises.
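Two of the ideas above, the cyclic file naming and the rule that the directory only advances in order, can be sketched in a few lines. Everything named here is an assumption for illustration: BUFFER_COUNT, the file name pattern, and the InOrderDirectory class are hypothetical, and the sketch tracks pending sequence numbers in a set rather than having each PHP instance wait, so it shows the ordering rule and not the actual implementation.

```javascript
// Assumed number of slots in each user's cycle (not taken from the source).
const BUFFER_COUNT = 8;

// Blobs are named only by user number and buffer number, so the name for
// sequence number seq reuses, and overwrites, the slot used BUFFER_COUNT earlier.
function blobName(user, seq) {
  return `u${user}_b${seq % BUFFER_COUNT}.webm`;
}

// Tracks the latest blob of one user, advancing only when every earlier
// blob has arrived, so a late blob cannot be skipped over.
class InOrderDirectory {
  constructor() {
    this.latest = -1;         // highest sequence number recorded so far
    this.pending = new Set(); // arrived, but still waiting on a predecessor
  }

  // Record an arrival and return the sequence number the directory now shows.
  arrive(seq) {
    this.pending.add(seq);
    while (this.pending.has(this.latest + 1)) {
      this.pending.delete(this.latest + 1);
      this.latest += 1;
    }
    return this.latest;
  }
}

const directory = new InOrderDirectory();
for (const seq of [0, 2, 1]) {
  const shown = directory.arrive(seq);
  console.log(`arrived ${seq}, directory shows ${shown}`);
}
```

In this run blob 2 arrives before blob 1, so the directory stays at 0 until blob 1 is in and then jumps to 2, a small version of the waiting cascade described above.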