Fix hotlink-protection failures when md + assets/textbundle images are saved over HTTP #6900

Open
ttimasdf opened this issue Jan 14, 2025 · 4 comments

ttimasdf commented Jan 14, 2025

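Please describe your suggestion or any other content

Sync Helper ( #4041 ) currently has two ways to save images: axios downloads them on the Node side (the HTTP method), or the SimpRead plugin downloads them on the browser side (the base64 method). Fetching from the browser does work around the Referer problem, but because the images are loaded from within the page, it still downloads every image a second time and is also subject to the browser's CORS restrictions, so it ends up less flexible than the Node side.

Taking this article as an example ( https://www.52pojie.cn/thread-1879263-1-1.html ), saving with the base64 method fails with the following error.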

Access to fetch at 'https://avatar.52pojie.cn/data/avatar/001/79/30/63_avatar_middle.jpg' from origin 'https://www.52pojie.cn' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
VM280:91

Uncaught (in promise) TypeError: Failed to fetch
    at eval (eval at c (contentscripts.js:1:1), <anonymous>:91:33)
    at new Promise (<anonymous>)
    at toBase64 (eval at c (contentscripts.js:1:1), <anonymous>:90:12)
    at uploads (eval at c (contentscripts.js:1:1), <anonymous>:178:35)
    at base64 (eval at c (contentscripts.js:1:1), <anonymous>:205:15)
    at HTMLAnchorElement.exportss (eval at c (contentscripts.js:1:1), <anonymous>:251:55)
    at HTMLHtmlElement.dispatch (common.js:7:20091)
    at g.handle (common.js:7:16836)

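Most image hosts implement hotlink protection by checking the Referer in the HTTP request headers: if the Referer is empty or does not belong to the site's own domain, the request is rejected. The fix is simply to add a Referer header to the axios request.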
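
To make the Referer check concrete, here is a minimal sketch of what such a check might look like on the image host's side. The function name `isHotlinkAllowed` and the `allowedDomains` parameter are hypothetical and exist only for illustration; real image hosts implement this differently, and some accept an empty Referer.

```javascript
// Hypothetical server-side check, for illustration only; real image hosts differ.
function isHotlinkAllowed(referer, allowedDomains) {
    if (!referer) return false;                  // empty Referer is rejected
    let hostname;
    try {
        hostname = new URL(referer).hostname;
    } catch (_) {
        return false;                            // unparsable Referer is rejected
    }
    // Accept the domain itself and any of its subdomains.
    return allowedDomains.some(
        domain => hostname === domain || hostname.endsWith('.' + domain)
    );
}

console.log(isHotlinkAllowed('https://www.52pojie.cn/thread-1879263-1-1.html', ['52pojie.cn'])); // true
console.log(isHotlinkAllowed(undefined, ['52pojie.cn']));                                        // false
```

Under a check like this, a Node-side request that sends no Referer fails, while the same request with the article's own site as Referer passes.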

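Additional notes

Since Sync Helper is not open source, I will post my patch in the comments below.

The current implementation regex-matches the excerpt inside content to obtain the source URL for the Referer field. I think the best practice is for the plugin to add a field such as url when it calls Sync Helper's /notextbundle API, passing the current page's source URL to the backend. That would be the most stable approach and would not break when users change their configuration.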
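
The two ways of finding the source URL can be combined, with the explicit field taking priority. The sketch below is an assumption about how that might look, not Sync Helper's actual code: `resolveRefererOrigin` is a hypothetical helper, `body.url` is the proposed new field, and the regex is adapted from the patch in this thread (without the `g` flag, so `exec()` keeps no state between calls).

```javascript
// Regex adapted from the patch in this thread; matches the "url: " front-matter
// line or the "转码, 原文地址 [title](...)" excerpt and captures scheme + host.
const CONTENT_URL_REGEX = /(?:\nurl: |转码, 原文地址 \[[^\[]*\]\()(https?:\/\/[^\/]+)/i;

// Hypothetical helper: prefer the explicit url field, fall back to the content.
function resolveRefererOrigin(body) {
    if (body.url) {
        try {
            return new URL(body.url).origin;
        } catch (_) {
            // invalid url field: fall through to the content-based lookup
        }
    }
    const match = CONTENT_URL_REGEX.exec(body.content || '');
    return match ? match[1] : undefined;
}

console.log(resolveRefererOrigin({ url: 'https://sspai.com/post/95588', content: '' }));
console.log(resolveRefererOrigin({ content: 'title: x\nurl: https://www.52pojie.cn/thread-1879263-1-1.html\n' }));
```

The fallback only works while the user's export template still contains one of the two patterns, which is why the explicit field is the more robust source.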

Author

ttimasdf commented Jan 14, 2025

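Posting the patch here so the owner can easily hide or delete it.

The patch also includes the following changes:

  1. Feature: asset file extensions now use the source file's actual format (.jpg, .gif, etc.) instead of a hard-coded .png
  2. Fix: URLs of files that failed to download were wrongly replaced in content with paths to nonexistent asset files
  3. Feature: images are deduplicated; identical ![caption](image_url) entries in content map to a single asset file, which is no longer downloaded repeatedly.

Test URLs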
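
Changes 1 and 3 can be sketched in isolation as follows. This is an illustrative variant, not the patch itself: `uniqueImages` and `imageSuffix` are hypothetical names, the image regex is a slightly stricter adaptation of the one in textbundle.js, and the extension is read from the URL's path so that a query string such as `?size=large` does not hide it.

```javascript
// Adapted from the regex in textbundle.js; [^\]]* keeps adjacent images separate.
const IMAGE_REGEX = /!\[[^\]]*\]\(https?:\/\/[^)]+\)/ig;

// Collect each distinct ![caption](url) entry once, in order of first appearance.
function uniqueImages(content) {
    return [...new Set(content.match(IMAGE_REGEX) || [])];
}

// Derive the file extension from a Markdown image tag, defaulting to '.png'.
function imageSuffix(image) {
    const url = image.replace(/^!\[[^\]]*\]\(|\)$/g, '');
    let pathname = url;
    try {
        pathname = new URL(url).pathname;   // drops query string and fragment
    } catch (_) {
        // not a parsable URL: fall back to the raw string
    }
    const match = pathname.match(/\.\w{3,4}$/);
    return match ? match[0] : '.png';
}

const content = '![a](https://example.com/1.jpg) text ![a](https://example.com/1.jpg)';
console.log(uniqueImages(content).length);                             // 1
console.log(imageSuffix('![](https://example.com/pic.gif?size=large)')); // .gif
```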

  1. https://www.52pojie.cn/thread-1879263-1-1.html
  2. https://sspai.com/post/95588
diff --git a/src/main/files.js b/src/main/files.js
index b087cbe..6191af0 100644
--- a/src/main/files.js
+++ b/src/main/files.js
@@ -200,6 +200,17 @@ function copy( title, output, data ) {
     }
 }
 
+/**
+ * Get Image Suffix
+ * 
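+ * @param {string} path image URL or Markdown image tag
+ * @returns {string} file extension including the dot; '.png' when none is found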
+ */
+function suffix( path ) {
+    let suffix = path.match( /(\.\w{3,4})\)?$/ );
+    return suffix ? suffix[1] : '.png';
+}
+
 exports.write     = write;
 exports.read      = read;
 exports.folder    = folder;
@@ -208,4 +219,5 @@ exports.find      = find;
 exports.remove    = remove;
 exports.removes   = removes;
 exports.multi     = multi;
-exports.copy      = copy;
\ No newline at end of file
+exports.copy      = copy;
+exports.suffix    = suffix;
\ No newline at end of file
diff --git a/src/main/server.js b/src/main/server.js
index 686c38d..103863e 100644
--- a/src/main/server.js
+++ b/src/main/server.js
@@ -1023,6 +1023,16 @@ app.post( '/textbundle', function( req, res ) {
  * assets without .textbundle
  */
 app.post( '/notextbundle', function( req, res ) {
+    const download_headers = {
+        "User-Agent": req.headers["user-agent"],
+    };
+    const server_url_regex = /(?:\nurl: |转码, 原文地址 \[[^\[]*\]\()(http(s)?:\/\/[^\/]+)/ig;
+    const match = server_url_regex.exec(req.body.content);
+    if ( match && match[1] ) {
+        download_headers["Referer"] = match[1];
+    }
+
+    console.log(`download headers: ${JSON.stringify(download_headers)}`);
 
     const title  = req.body.title,
           output = files.multi( req.body.title + '.assets', req.body.path ? req.body.path : config.storage.db.export.output == '' ? config.storage.db.root + '/output' : config.storage.db.export.output, config.storage.db.export.multi ),
@@ -1074,7 +1084,7 @@ app.post( '/notextbundle', function( req, res ) {
         files.folder( folder, () => {
             // download image
             if ( images.length == 0 ) {
-                textbundle.download( assets, req.body.content, media, write );
+                textbundle.download( assets, req.body.content, media, write, true, download_headers);
             } else {
                 textbundle.base64( assets, images, req.body.content, media, write );
             }
diff --git a/src/main/textbundle.js b/src/main/textbundle.js
index f9250be..38702bb 100644
--- a/src/main/textbundle.js
+++ b/src/main/textbundle.js
@@ -19,28 +19,29 @@ const info = {
  * @param {string} media ![](<media>.png)
  * @param {func} callback
  * @param {boolean} is_textbundle
+ * @param {object} headers HTTP Headers for image requests
  */
-function download( assets, content, media, callback, is_textbundle ) {
-    const images  = content.match( /\!\[(\S+)?\]\(http(s)?:\/\/[^)]+\)/ig );
+function download( assets, content, media, callback, is_textbundle, headers ) {
+    const images  = Array.from(new Set(content.match( /\!\[(\S+)?\]\(http(s)?:\/\/[^)]+\)/ig )));
     images && images.length > 0 ? files.folder( assets, () => {
         let index = 0;
-        let ids = {};
+        let outfiles = {};
         const MAX = images.length,
-              cb  = error => {
+        cb  = error => {
             index++;
-            if ( !error ) {
-                const img = is_textbundle ? `![](${ media }/${ ids[ index - 1 ] }.png)` : `![](<${ media }/${ ids[ index - 1 ] }.png>)`;
-                content   = content.replace( images[index-1], img );
+            if ( !error.code ) {
+                const img = is_textbundle ? `![](${ media }/${ outfiles[ index - 1 ] })` : `![](<${ media }/${ outfiles[ index - 1 ] }>)`;
+                content   = content.replaceAll( images[index-1], img );
             }
             if ( index >= MAX ) {
                 callback( content );
             } else {
-                ids[index] = +new Date() + index;
-                save( images[index], assets + `/${ids[index]}.png`, cb );
+                outfiles[index] = +new Date() + index + files.suffix(images[index]);
+                save( images[index], assets + `/${outfiles[index]}`, cb, headers );
             }
         }
-        ids[index] = +new Date() + index;
-        save( images[index], assets + `/${ids[index]}.png`, cb );
+        outfiles[index] = +new Date() + index + files.suffix(images[index]);
+        save( images[index], assets + `/${outfiles[index]}`, cb, headers );
     }) : callback( content );
 }
 
@@ -50,20 +51,22 @@ function download( assets, content, media, callback, is_textbundle ) {
  * @param {string} image url
  * @param {string} path assets/x.png
  * @param {func} callback
+ * @param {object} headers HTTP Headers
  */
-function save( url, path, callback ) {
+function save( url, path, callback, headers ) {
     url = url.replace( /^!\[\S*\]\(|\)$/ig, '' );
     axios({
         url,
         method      : 'GET',
+        headers,
         responseType: 'stream'
     }).then( response => {
         response.data.pipe( fs.createWriteStream( path ));
-        callback( undefined );
+        callback( {code: 0, message: 'success' } );
     }).catch( error => {
         logger().child({ error: error }).error( 'write: ' + 'textbundle image faild ' + url );
-        console.log( "current image download failed, ", path, url );
-        callback( error.code );
+        console.log( `current image download failed, error: ${JSON.stringify(error)}, path: ${path} url: ${url}`);
+        callback( {code: -1, error } );
     })
 }
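
The success-only replacement performed in the download() hunk above can be isolated as a pure function, which makes change 2 easier to reason about. This is a sketch under assumptions: `applyDownloads` and the shape of the `results` entries (`image`, `file`, `ok`) are hypothetical and do not appear in the patch.

```javascript
// Hypothetical helper: rewrite only the images whose download succeeded.
// results: [{ image: '![a](https://...)', file: '1736812345678.jpg', ok: true }, ...]
function applyDownloads(content, results, media) {
    for (const { image, file, ok } of results) {
        if (!ok) continue;                       // failed downloads keep their original URL
        const local = `![](${media}/${file})`;
        // A replacer function avoids special handling of "$" in the replacement text.
        content = content.replaceAll(image, () => local);
    }
    return content;
}

const before = '![a](https://example.com/1.jpg) and ![b](https://example.com/2.gif)';
const after  = applyDownloads(before, [
    { image: '![a](https://example.com/1.jpg)', file: '1.jpg', ok: true  },
    { image: '![b](https://example.com/2.gif)', file: '2.gif', ok: false },
], 'assets');
console.log(after); // ![](assets/1.jpg) and ![b](https://example.com/2.gif)
```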
 

Owner

Kenshin commented Jan 14, 2025

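Haha, thanks, this is great 👍🌹

I'll merge it when I get back.

By the way, Sync Helper is closed source 🌚 although that is basically just a gentleman's agreement.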

Author

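Haha, yes, trust is a precious thing.

I have also modified the corresponding plugin ( BvjrJA6eh5 ):

  1. Add a url parameter when calling the helper's /notextbundle endpoint
  2. A fetch failure caused by CORS or anything else no longer crashes the whole export.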
diff --git a/script.js b/script.js
index 9cea22c..8e65ea1 100644
--- a/script.js
+++ b/script.js
@@ -85,18 +85,25 @@ function lock() {
 
 async function toBase64( image ) {
     return new Promise( async ( resolve, reject ) => {
-        const response  = await fetch( image ),
-              blob      = await response.blob(),
-              file      = new File([blob], 'image.jpg', { type: blob.type }),
-              reader    = new FileReader();
-        reader.onload   = event => {
-            resolve({ done: event.target.result });
-        };
-        reader.onerror  = error => {
-            console.log( 'error: ', error );
-            resolve({ fail: error });
-        };
-        reader.readAsDataURL( file );
+        fetch(image)
+            .then(
+                response => response.blob(),
+                error => { console.log('fetch error: ', error); resolve({fail: error}); }
+            )
+            .then(blob => {
+                const file = new File([blob], 'image.jpg', { type: blob.type });
+                const reader = new FileReader();
+                reader.onload = event => {
+                    resolve({ done: event.target.result });
+                };
+                reader.onerror = error => {
+                    console.log('reader error: ', error);
+                    resolve({ fail: error });
+                };
+                reader.readAsDataURL(file);
+            },
+                error => { resolve({ fail: error }); }
+            );
     });
 }
 
@@ -209,7 +216,7 @@ function assets( title, content ) {
         'url'    : 'http://localhost:7026/notextbundle',
         'method' : 'POST',
         data: {
-            title: safe( title ), content, path: plugin_storage.path, base64Images, type: plugin_storage.assets, custom: plugin_storage.custom, absolute: plugin_storage.absolute
+            url: location.href, title: safe( title ), content, path: plugin_storage.path, base64Images, type: plugin_storage.assets, custom: plugin_storage.custom, absolute: plugin_storage.absolute
         }
     };
     browser.runtime.sendMessage({ type: 'corb2', value: { settings }}, response => {
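
One detail of the toBase64 hunk above: when fetch fails, the first error handler returns nothing, so the second `.then` still appears to run with `blob` undefined and throws an unhandled TypeError after the promise has already resolved with `fail`. A flatter variant with a single try/catch avoids that. The sketch below is an assumption, not the plugin's code: `fetchImpl` is a hypothetical injection point for testing, and the blob is read with `arrayBuffer()` and `btoa` instead of `FileReader`, a substitution made here so the sketch also runs outside a browser.

```javascript
// Sketch only: fetchImpl is a hypothetical parameter, not part of the plugin.
async function toBase64(image, fetchImpl = fetch) {
    try {
        const response = await fetchImpl(image);
        const blob     = await response.blob();
        const bytes    = new Uint8Array(await blob.arrayBuffer());
        let binary = '';
        for (const byte of bytes) binary += String.fromCharCode(byte);
        // Same result shape as the plugin: a data URL under "done".
        return { done: `data:${blob.type};base64,${btoa(binary)}` };
    } catch (error) {
        // CORS or network failure: report it instead of aborting the export.
        console.log('toBase64 error: ', error);
        return { fail: error };
    }
}
```

Because every failure path returns `{ fail }`, callers can keep the original image URL for that entry and continue with the rest.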

The corresponding backend change:

diff --git a/src/main/server.js b/src/main/server.js
index 103863e..c2c158f 100644
--- a/src/main/server.js
+++ b/src/main/server.js
@@ -1026,8 +1026,8 @@ app.post( '/notextbundle', function( req, res ) {
     const download_headers = {
         "User-Agent": req.headers["user-agent"],
     };
-    const server_url_regex = /(?:\nurl: |转码, 原文地址 \[[^\[]*\]\()(http(s)?:\/\/[^\/]+)/ig;
-    const match = server_url_regex.exec(req.body.content);
+    const server_url_regex = /(http(s)?:\/\/[^\/]+)/ig;
+    const match = server_url_regex.exec(req.body.url);
     if ( match && match[1] ) {
         download_headers["Referer"] = match[1];
     }

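In theory the Referer field can carry either just the domain or the full URL, depending on the server's Referer policy. Sending only the domain is the more common approach.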
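
Both forms are easy to derive from the page URL. The following sketch is illustrative only: `buildDownloadHeaders` and its `fullUrl` option are hypothetical names, and the origin is sent without a trailing slash to match the patch above (browsers that send only the origin add one).

```javascript
// Hypothetical helper: build the headers for image downloads.
function buildDownloadHeaders(pageUrl, userAgent, { fullUrl = false } = {}) {
    const headers = {};
    if (userAgent) headers['User-Agent'] = userAgent;
    try {
        const parsed = new URL(pageUrl);
        // origin is scheme + host (+ port); href is the complete URL
        headers['Referer'] = fullUrl ? parsed.href : parsed.origin;
    } catch (_) {
        // no usable page URL: send the request without a Referer
    }
    return headers;
}

console.log(buildDownloadHeaders('https://sspai.com/post/95588', 'UA'));
console.log(buildDownloadHeaders('https://sspai.com/post/95588', 'UA', { fullUrl: true }));
```

The returned object can be passed as the `headers` option of the axios request config, as the patched save() does.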

Owner

Kenshin commented Jan 14, 2025

Okay, got it 👍👍
