The quick answer is that it should be like Microsoft Teams or Zoom, or YouTube streaming, but without needing to
sign up. Or like Apple's FaceTime, without needing to buy an Apple device (note, though, that Apple devices
restrict full webcam access to their own Safari browser, and still restrict Safari's access to the MediaRecorder object unless experimental
features are enabled).
The more correct answer is that it is to test and support the important work of the W3C in defining the HTML5 standard,
and of the more recent RTC group, which may in principle mean that web browsers are allowed to communicate video with each other.
The full answer is a bit long, and involves the history of the internet as a sort of battle for survival of individuals' right
to access it, supported by organizations like Netscape, Mozilla, LINUX, and the RTC.
You almost certainly know this already, but the aside below summarizes it
if you really need to read it.
(As an aside: to see the danger of RTC being taken over through manipulation by Google, Zoom, Teams, Facebook, etc.,
you'd have to think about the history of the internet.
Before the internet, around when the IBM PC became accepted as a standard
micro-computer architecture, Microsoft enforced its rights over DOS, the operating system
it supplied for it, and later began marketing an idea which Apple had also been using, of having
an intuitive interface with movable 'icons' on the screen representing files, and 'windows' representing
the interfaces of applications like 'Word.' Eventually they trademarked the word 'windows.'
With the advent of the internet, Microsoft attempted to obscure its free and fair nature by breaching
the informal firewall between users' files and the internet. Microsoft did this by hiding 'file extensions,'
by creating a non-removable internet browser called `Internet Explorer' which breached the informal firewall, etc.
The firewall between users and the internet was important then because it was the only thing that stood between
Microsoft's ownership of the PC market and them actually owning the internet.
The informal firewall became understood, and was informally built into browsers like Netscape and Mozilla Firefox.
Young people nowadays know that Internet Explorer isn't a good browser, but they mistakenly think it is because
we all have a right to bully one browser into non-existence due to something like performance issues. No-one
remembers that it had been an impediment to users' rights to know which files on their device are private and which are being presented
to the internet.
Later, after Google rose to prominence by having a fair search engine, they did the same for phones,
attempting to obscure the open-source Android system.
The work of the W3C amounts to something more important than anything the founders of the US constitution had done.
They kept Berners-Lee's original vision of the internet from being transformed into a corporate product, by constantly
clarifying the notion of that un-named firewall, now called the `browser sandbox.'
Also, all this time, the LINUX operating system has been in existence, which is independent of corporate influence.
In conclusion, the wonderful thing to keep in mind about the history of the internet is the invention of the HTML5 standard
and the browser sandbox, which clarify that when a person is given a device, or creates content on a device,
they retain the contents of the files
on that device, and can decide with autonomy what to do with them. When that person decides to present a file
to the internet, they have a right to know that they have
done so; and once it is online, it is available to anyone without exception and everyone has an equal right to have it.
The buttons on websites inviting people to 'share' or 'not share' mean something different, by the way. There is constant
deception, though most people do not fall victim to it.)
Why is it called `Windows RTC'?
The word 'Windows' here, on first reading, should mean RTC (real time chat) which is available to anyone
who has a computer (or phone), especially someone who has not figured out how to get LINUX.
The best alternative would be to tell someone, "Just get LINUX, install a websocket server, and hire a DNS service pointing to it."
That can depend on the permission of the internet provider, and is not going to work unless users actually own their Wi-Fi connection. We want something which allows
users to share their real time media (e.g. stream videos of themselves and their voice) without needing permission
of any centralized industry at all.
As a sort-of testament to the work of W3C and HTML5, it ought to be possible now for people to just have an HTML file
locally which shows their browser how they want to present themselves to another person, who has nothing but a browser
and internet connection.
This is not possible, for two good reasons. The first is the lack of DNS services available to ordinary people,
and second is the still-evolving 'cross-origin' policy of HTML5.
The convention nowadays is that two people would both go to the same website, which hosts the shared files which
contain the video and audio data.
Also, as a practicality, something like this has to happen, since for n people chatting there are about n² channels
of data if we think of using the whole internet.
Crucially, though, we do not want the server to be required to, or have permission to, do anything to those files.
We do not want any requirement that the files are transcoded.
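To make the channel count for n people mentioned above concrete, here is a small arithmetic sketch. It assumes one directed channel per ordered pair of participants, which gives n(n-1) and grows like n squared; whether the original count is meant exactly this way is an assumption, and the function names are hypothetical.

```javascript
// Illustrative arithmetic only; the function names are hypothetical.

// Every participant sends a separate stream to every other participant.
function directChannels(n) {
  return n * (n - 1);
}

// Every participant writes once to a shared location; the others read from it.
function uploadsViaSharedHost(n) {
  return n;
}

for (const n of [2, 5, 10]) {
  console.log(`${n} people: ${directChannels(n)} direct channels, ` +
              `${uploadsViaSharedHost(n)} uploads via a shared host`);
}
```

The point of the comparison is only that the number of outgoing streams per user stays at one when everyone writes to the same place, while the host itself still does nothing but store files.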
What are the specifications of Windows RTC?
The definition of Windows RTC (which really means genuinely open RTC), as opposed to something like 'Microsoft Teams',
is this:
1. (Active Browser) Any user can share their real time video and audio with any other user, using only a web browser
complying with the W3C definition of HTML5.
2. (Passive Server or no server) If files are shared on a server, the server must do nothing to the files besides store them while they are being shared.
These conditions rule out websockets, at the moment, because websockets can `push' files to users. There is a four-fold
improvement in latency if we allow websockets. If a server is only a repository, under current polite protocols, each
user asks for a directory file, receives its contents, then asks for a file, and then receives those
contents. This is four steps. Websockets turn this into one step, where the server actively sends a file
uninvited. A websocket server can't be written efficiently in PHP because it requires a PHP program to sit in a sleep loop
until the file appears, which uses one whole dedicated server thread.
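The four-step exchange with a passive server described above could be sketched roughly as follows. This is not the code from chat.zip: the directory format (JSON mapping a user number to that user's newest file name), the file name `directory.json`, and the function names are all assumptions made for illustration.

```javascript
// Hypothetical sketch: the directory format and all names are assumptions.

// Decide which files are new since the last poll.
function filesToFetch(directory, lastSeen) {
  return Object.entries(directory)
    .filter(([user, newest]) => lastSeen[user] !== newest)
    .map(([user, newest]) => ({ user, name: newest }));
}

// One polling round against a passive server.
// fetchText and fetchBytes are injected so the sketch does not depend on a
// particular environment; in a browser they could wrap fetch().
async function pollOnce(fetchText, fetchBytes, lastSeen) {
  // Steps 1 and 2: ask for the directory, then receive it.
  const directory = JSON.parse(await fetchText('directory.json'));
  const received = [];
  for (const { user, name } of filesToFetch(directory, lastSeen)) {
    // Steps 3 and 4: ask for the file, then receive it.
    const bytes = await fetchBytes(name);
    lastSeen[user] = name;
    received.push({ user, name, bytes });
  }
  return received;
}
```

In a browser, `fetchText` could be `(url) => fetch(url).then((r) => r.text())` and `fetchBytes` could be `(url) => fetch(url).then((r) => r.arrayBuffer())`. The server only ever answers requests, which is what keeps it passive; the price is the extra round trip that a push would avoid.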
What is the aim of Windows RTC?
The aim is to eventually remove the role of the server altogether, to be absolutely sure that it does not become
enshrined as a requirement for RTC. Hence, currently, in Windows RTC, we allow a server -- just because of existing and fair (though evolving) constraints
of cross-origin policy.
Is there an existing implementation of Windows RTC satisfying such restrictive constraints?
You can try this implementation using the link at the top of the page. If you want your own copy, just unzip
this 94 kb file chat.zip, and put index.php in a folder on
your own server, or any server. Give it permission 0755 and be sure that PHP files in that folder have permission to write in that folder. The first time it is run it will create index.html
and the auxiliary PHP files which it needs. From then on, anyone who browses to that folder can chat with anyone else who is there. Make sure that the server directs successive users to index.html instead of index.php; to be sure, you
can delete index.php from the server after the first use.
How does it work?
As users chat, a cyclic sequence of small .webm blobs (2 to 3 kb) is created on the server for each user. These are indexed
only by user number and buffer number, so they are constantly overwritten. The stream is interrupted periodically in case
audio and video latency disagree. A directory is kept of the most recently written file for each user.
To cover rare cases when files might arrive out of order, each instance of the PHP file waits until its most recent dependency is resolved before
updating the directory, resulting in occasional waiting cascades which aren't noticeable. Each user's blobs are fed to 'media source array buffers' which feed the video elements.
The variable 'bufferDuration' is set in milliseconds at 200; this can be reduced if all users have high-CPU devices, but
the larger value is necessary if there is any CPU lag. PHP file locking prevents interleaving of directory entry updates. The JavaScript coding,
unsurprisingly, involves nested callbacks/promises.
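Two of the ideas above, the cyclic file naming and the feeding of blobs into a media source buffer, can be sketched as follows. This is a minimal illustration and not the code shipped in chat.zip: the ring size, the file name pattern, and the names `slotName` and `AppendQueue` are hypothetical. The queue relies only on the standard `SourceBuffer` members `updating`, `appendBuffer()` and the `updateend` event.

```javascript
// Hypothetical sketch; ring size and naming pattern are assumptions.
const RING_SIZE = 8; // the source does not state the cycle length

// Files are named by user number and buffer number only, so slot k is
// overwritten every RING_SIZE chunks.
function slotName(user, sequence, ringSize = RING_SIZE) {
  return `u${user}_b${sequence % ringSize}.webm`;
}

// Serialises appends to a SourceBuffer-like object, which does not accept
// appendBuffer() while a previous append is still in progress.
class AppendQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    // When one append finishes, try to send the next waiting chunk.
    sourceBuffer.addEventListener('updateend', () => this.flush());
  }

  push(chunk) {
    this.pending.push(chunk);
    this.flush();
  }

  flush() {
    if (this.sourceBuffer.updating || this.pending.length === 0) return;
    this.sourceBuffer.appendBuffer(this.pending.shift());
  }
}
```

In a browser, each remote user would have their own `AppendQueue` wrapped around the `SourceBuffer` that feeds that user's video element, and every downloaded blob would be handed to `push()` as an `ArrayBuffer`. Queuing avoids one layer of nested callbacks, at the cost of holding a few chunks in memory when the device lags.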