If you like playing games, Warcraft 3 must be on that list. You may be curious about the video files exported by the game-why the video is only a few hundred KB even after you have played the game for an hour? Soon you will realize what have a happened-the map inside the game has to be reloaded almost every time when you import the replay video. If you skip the step, the video won’t be played
Actually the data recorded in the video is not a video file, but a series of actions come up with time stamps. When importing the map, you actually initialize a state. In this state, once the previous actions are restored, you can restore the whole previous game process. This is what we called the basic principle of reply
For a video, it greatly reduces the size, assuming that we need to record an hour of 1080p24f video, in the case of uncompressed video
Frames = 3600s * 24 = 86400frame
Supposing that each logical pixel is represented by three primary colors of RGB,
each primary color is 8 bits (256 colors)
Frame size = (1920 * 1080) pixels * 8bits * 3 = 49766400bits
Converted to KB is 49766400bits / 8/1024 = 6075KB
Total video volume = 6075KB * 86400 = 524880000KB ≈ 500GB
So compared with the traditional video recorder, assuming that the recording is 500KB, then the volume is reduced by about 524880000KB / 500KB ≈ 1000000 times in theory
In fact, the Web recorder also draws on such an idea, which is generally called Operations Log. In essence, it's recording a series of browser event data, re-render using the browser engine, and restore the previous operations.
From a practical perspective, even if you compare the compressed video with an H.265 compression ratio of several hundred times, you can save at least 200 times in volume
So the question is coming: Why do we have to record web pages? What are the scenarios?
I have though about the following aspects
- The anomaly monitoring system, such as LogRocket, it can be understood that he is a tool that integrates Sentry and Web Recorder, which can playback the graphical interface and data logs of webpage errors to help Debug
- Recording the user's behavior for analysis, such as MouseFlow. LiveSession, "connect" to the user's to see what people do through live streaming
- Collaborative tools, RPA, Webpage Operation Tracking, etc. will also involve similar technologies
Technical details of the TimeCat
The node data of the page can be easily obtained through the DOM API, but for our needs, it is obvious that the data provided by the DOM Node is too redundant. This step is to simplify the information by referring to the design of VirtualDom
interface VNode {
type: number
id: number
tag: string
attrs: Attrs
children: VNode[]
extra: Extra
}
After deep traversal of the DOM, the DOM is mapped to a VNode type node. The Nodes to be recorded are mainly three types ELEMENT_NODE
, COMMENT_NODE
and TEXT_NODE
. After deserialized, it can be restored the state
there are some nodes and attributes that need special treatment, such as
-
Input
and other types ofvalue
checked
cannot be obtained from the DOM, and need to be obtained from the node -
The content of the
script
tag will not be executed later, so it can be directly skipped or marked asnoscript
-
SVG
can be obtained directly, but it and it's children needs to usecreateElementNS ("http://www.w3.org/2000/svg", tagName)
to create -
If the
src
orhref
attributes are relative paths, they need to be converted to absolute paths ......
DOM changes can use MutationObserver
, listen to attributes
, characterData
, childList
three types of changes
const observer = new MutationObserver((mutationRecords, observer) => {
// Record the data
})
observer.observe(target, options)
With the help of the ability combination of WindowEventHandlers
addEventListener
, etc., you can monitor a series of operation events on the page
- Add Node Action
- Delete Node Action
- Change Attribute Action
- Scroll Action
- Change Location Action ...
Record mouse actions through mouseMove
and click
events
For the MouseMove
event, it will be triggered frequently during the movement, resulting in much redundant data. Such data will waste a lot of space, so for the mouse tracking, we only collect a small number of key points, the simplest method is to use throttling to reduce the amount of data generated by the event, some disadvantages will come up:
- Critical mouse coordinate data may be lost in the intercepted
- huge data will be generated even if the movement distance is long. The better way is to calculate the movement trajectory through the
Spline Curves
.
We can watches the input via input
blur
focus
event of Node.addEventListener
, but this can only listen to the user's behavior. If we assign values via JavaScript, we can't listen to the data Changes, at this time we can hijack some special properties through Object.defineProperty
, without affecting the target, forward the new value to the custom handle, and handle the change in a unified method
const elementList: [HTMLElement, string][] = [
[HTMLInputElement.prototype, 'value'],
[HTMLInputElement.prototype, 'checked'],
[HTMLSelectElement.prototype, 'value'],
[HTMLTextAreaElement.prototype, 'value']
]
elementList.forEach(item => {
const [target, key] = item
const original = Object.getOwnPropertyDescriptor(target, key)
Object.defineProperty(target, key, {
set: function(value: string | boolean) {
setTimeout(() => {
handleEvent.call(this, key, value)
})
if (original && original.set) {
original.set.call(this, value)
}
}
})
})
Because the DOM Diff Patch is implemented with the MutationObserver, it is necessary to collect and process the changed records, which involves some key issues: For instance, the timing of DOM changes is sequential, and Mutation can only be summarized as adding and deleting, However when calling insertBefore or appendChild, it will cause movement. These nodes must be processed and marked as moved, otherwise, the loss of node references may cause rendering errors.
Can I Use MutationObserver shows that only in IE11 and later, Android 4.4 and later can be used, compatible with the old browser can be through MutationObserver-shim to support, but using shim may cause some fatal bugs of the collected data. there is also a situation that some websites may block the MutationObserver API, we can restore the Native Code
by creating an Iframe
- Canvas: Use monkey patching to extending or modifying the original AP to get the corresponding action
- Iframe: In the non-cross-domain state, you can also directly access the internal nodes to record, similar to Shadow DOM etc
- Video: By Detect the HTMLVideoElement to get and record video status
- Flash: Record by screen capture
After loading HTML, it will refer to many external resources, usually in many forms
For Example:
- Absolute path
<img src="/xx.png" />
- Relative path
<img src="./xx.png" />
- relative to the current path
<img src="xx.png" />
- The Protocol-relative URL
<img src="//cnd.xx.png" />
- Responsive images
src="www.xxx.png" srcset="www.xxx.png 1x, www.xxx.png 2x"
...
The above requires a converter to deal with the path problem. In the Deserialize
stage, they can be converted to the absolute path under the original domain in order to load normally in cross-domain
There is also a situation where the problem of loading resources for third-party resources requires the proxy server.
Usually, due to the limitation of the resources of the recorded website with the CORS Policy. The solution is that you can add a white list or ignore it if the resources are controllable. The other is to use a proxy server.
Reference article: 3 Ways to Fix the CORS Error
Before starting to play, we need to restore the previous data to the real DOM. This will take a certain time and you will see a white page, which depends on your browser performance and recording web page resources. Refer to FMP (First Meaningful Paint)
, during the loading process, the skeleton map can be dynamically generated from the previously mapped data, and wait for the FMP to send the Ready signal before playing
Reference article: Time to First Meaningful Paint
When the user moves the mouse on the page, many mouseMove
events will be generated. The coordinates and timestamp of the track are obtained through const { x, y } = event.target
If I use a mouse to trace a track on the page, I may get the coordinate points like the picture below
However, on most occasions we do not require 100% restoration of accurate mouse path for the scenario of recording, and we only care about two situations:
1. Where does the mouse click?
2. Where does the mouse stay?
After simplifying the mouse path through these two strategies, it takes only about 6 points to draw a 💖, to simulate the virtual path of the mouse through the spline curve
After filtering out the key points through the rules, the B-spline curve calculation function is used, When redrawing the mouse position during rendering, you can get a mouse with an approximate curve Track
When we constantly taping the content in an input box, our Watcher function will continuously respond to events, through Event.target.value
you can get the latest value of the current HTMLInputElement
, you can use the throttling function to filter Some redundant responses are dropped, but it is not enough. For example, the text in a TextArea will be very long and long. Assuming the length of the text is n, we add 10 characters after the text, then the response The length is:
10n + ∑(k=1, n=10)
Visible will produce a lot of data
After passing by Diff Patch, modifying the string abcd
to bcde
can be expressed as:
const patches = [
{ type: 'delete', index: 0, count: 1 },
{ type: 'add', index: 3, value: 'e' }
]
We can obtain and process some personal privacy data through the annotation of the DOM during development Node.COMMENT_NODE
like: <!-- ... -->
)annotation. Based on agreed statements, we only have to process the requirements of the DOM tag that needs to be desensitized. For example, if want hide tag<button>
, we need change it into<!--hidden--><button><button>
The recorded content may be provided by a third party, which means that there may be certain risks. for example: <div onload="alert('something'); script..."></div>
, or some events in our player may also affect the playback content, so we need a sandbox to isolate the playback environment, Iframe Sandbox
provided by HTML5 remains a good choice, which can help us easily isolate the environment such as:
- Script cannot be executed
- Cannot send ajax request
- Cannot use local storage, ie localStorage, cookies, etc.
- Cannot create alert and windows, such as window.open or target = "_ blank"
- Cannot send form
- Cannot load additional plugins such as flash etc.
- Cannot perform tricky for autoplay. For example: autofocused, autoplay
The player will have a accurate timer. The action data is stored in a stack. Each data is a frame. With RAF(RequestAnimationFrame)
to exec the next frame
pause timer through cancelAnimationFrame
double the speed of the acquisition rate
Audio recording can be provided by the HTML5 WebRTC. Since it mainly records human voice, it doesn’t not need high standard in recording quality. I thus chose the 8000 sample rate, 8-bit rate and mono PCM recording format, later can be converted into lossy compressed mp3
format to save space. Subtitles will be automatically generated after analyzing the recording files by some third-party services
Gzip generally compresses the transmitted data in the network application layer, but our data does not only exist in the database, there may be three storage type:
- The server stores TCP => DB
- Local storage LocalStorage, IndexedDB, Web SQL
- The data is persisted in the script and saved as a local file, such as directly exporting a working HTML file
Greatly reducing the data size before exporting or transmitting,
On the client side, the compression based on Gzip
, As a result I chose Pako to compress the data. As the core of Gzip is Deflate, and Deflate is based on LZ77 and Huffman tree. The text data is converted into Uint8Array
through Gzip, and then Uint8
is converted into the corresponding ASCII
. The advantage to that is each encoding only use 1byte which reduce the volume by about 5 times by compression
We can use indexedDB to store client data. IndexedDB has much larger storage room than LocalStorage with generally no less than 250MB or even no upper limit. ON top of that, it utilizes object store and is available to transaction. The important point is that it is asynchronous. That means it will not block the operation of the Web Recorder. The data can be uploaded to the server after that
The RollUp packer can generate multiple formats, such as UMD
and ESM
, etc. Loading SDK in the project or using the Chrome plug-in to inject the UMD module, we can easily load the code and control it to record the data.
Thanks for sharing from the technical community
Thanks to RRWEB for technical sharing