Prepending inferred metadata to posts, pages #172

Open
simon-brooke opened this issue Sep 5, 2024 · 3 comments

@simon-brooke
Contributor

The current inferred metadata system does not add the inferred metadata to posts and pages, but rather leaves them unaltered, and reinfers metadata at each build.

This has two downsides:

  1. It has a time cost; but, more significantly
  2. It assigns authorship of each page without embedded metadata to whoever is running the build at the time.

I currently have a version which, if the flag :write-back-inferred-meta? is present (and true) in config.edn:

  1. makes a backup of the post or page file;
  2. infers the metadata, and then prepends it to the content of the post or page file.
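
For reference, opting in would look something like the fragment below. Only :write-back-inferred-meta? comes from this change; the other two keys and their values are placeholders standing in for whatever a site's config.edn already contains.

```clojure
;; config.edn (fragment). The first two entries are illustrative placeholders.
{:site-title "My Site"
 :author     "A. N. Author"
 ;; Opt in: back up each post or page that lacks embedded metadata, then
 ;; prepend its inferred metadata to the file.
 :write-back-inferred-meta? true}
```

Leaving the key out (or setting it to false) keeps the existing behaviour of re-inferring at every build.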

Diff as follows:

diff --git a/src/cryogen_core/infer_meta.clj b/src/cryogen_core/infer_meta.clj
index 84f3d3d..5061ae3 100644
--- a/src/cryogen_core/infer_meta.clj
+++ b/src/cryogen_core/infer_meta.clj
@@ -11,8 +11,10 @@
                                        trimmed-html-snippet]]
             [mikera.image.core :refer [height load-image width]]
             [pantomime.mime :refer [mime-type-of]]
-            [cc.journeyman.real-name.core :refer [get-real-name]])
+            [cc.journeyman.real-name.core :refer [get-real-name]]
+            [clojure.pprint :refer [pprint]])
   (:import [java.util Date Locale]
+           [java.io File]
            [java.nio.file Files FileSystems LinkOption]
            [java.nio.file.attribute FileOwnerAttributeView]))
 
@@ -136,10 +138,10 @@
    been used to extract meta-data removed."
   ([dom] (clean dom dom))
   ([elt dom]
-  (cond
-    (map? elt) (when-not (redundant? elt dom) (assoc elt :content (clean (:content elt) dom)))
-    (coll? elt) (remove nil? (map #(clean % dom) elt))
-    :else elt)))
+   (cond
+     (map? elt) (when-not (redundant? elt dom) (assoc elt :content (clean (:content elt) dom)))
+     (coll? elt) (remove nil? (map #(clean % dom) elt))
+     :else elt)))
 
 (def infer-title
   "Infer the title of this page, ideally by extracting the first `H1` element from this
@@ -205,24 +207,47 @@
                 tag-line?
                 (walk-dom dom))
         tags (when tags-p (join ", " (reduce concat (map #(rest (:content %)) tags-p))))]
-    (when tags (doall (set (map trim (split tags #",")))))))
+    (when tags (doall (apply vector (set (map trim (split tags #","))))))))
 
 (defn infer-meta
   "Infer metadata related to this `page`, assumed to be the name of a file in 
    this `markup`, given this `config`."
   [^java.io.File page config dom]
-    (let [metadata (assoc {}
-                          :author (infer-author page config)
-                          :date (infer-date page config)
-                          :description (infer-description page config dom)
-                          :image (infer-image-data dom config)
-                          :inferred-meta true
-                          :tags (infer-tags dom)
-                          :title (infer-title page config dom))]
-      (info (format "Inferred metadata for document %s dated %s."
-                    (:title metadata)
-                    (:date metadata)))
-      metadata))
+  (let [metadata (assoc {}
+                        :author (infer-author page config)
+                        :date (infer-date page config)
+                        :description (infer-description page config dom)
+                        :image (infer-image-data dom config)
+                        :inferred-meta true
+                        :tags (infer-tags dom)
+                        :title (infer-title page config dom))]
+    (info (format "Inferred metadata for document %s dated %s."
+                  (:title metadata)
+                  (:date metadata)))
+    metadata))
+
+(defn file-extension 
+  "Return the extension, if any, of this `file-name`."
+  [file-name]
+  (second (re-find #"(\.[a-zA-Z0-9]+)$" file-name)))
+
+(defn- create-backup
+  [^File file]
+  (let [path (.getPath file)
+        backup-path (File. (str (replace path (file-extension path) ".bak")))]
+    (info (format "Backing up %s to %s" (.getPath file) (.getPath backup-path)))
+    (spit backup-path (slurp file))))
+
+(defn- write-back-inferred-meta
+  "Backup the file indicated by `page` to a new file with the same name but the
+   extension `.bak`, and replace it with a file having the same content but 
+   with this `meta-data` prefixed."
+  [^File page meta-data config]
+  (let [content (slurp page)
+        pretty-meta (with-out-str (pprint meta-data))]
+    (warn (format "%s: Prepending meta-data:\n %s" (.getName page) pretty-meta)) 
+    (create-backup page)
+    (spit page (str pretty-meta "\n\n" content))))
 
 (defn using-inferred-metadata
   "An implementation of the guts of `cryogen-core.compiler.page-content` for
@@ -235,6 +260,7 @@
     (let [content-dom (trimmed-html-snippet ((render-fn markup) rdr config))
           page-meta (infer-meta page config content-dom)
           file-name (infer-file-name page page-meta config)]
+      (when (:write-back-inferred-meta? config) (write-back-inferred-meta page page-meta config))
       {:file-name   file-name
        :page-meta   page-meta
        :content-dom (clean content-dom)})))
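
The create-backup function in the diff derives the backup name by swapping the file's extension for .bak. As a minimal sketch of that path computation in isolation, the hypothetical helper backup-path below (not part of the diff) does the same swap by trimming the trailing extension, and falls back to appending .bak when file-extension finds no extension and returns nil. The example paths are made up.

```clojure
(defn file-extension
  "Return the extension (including the leading dot), if any, of `file-name`.
   Same regex as in the diff."
  [file-name]
  (second (re-find #"(\.[a-zA-Z0-9]+)$" file-name)))

(defn backup-path
  "Hypothetical helper: the path to which the file at `path` would be backed
   up. Swaps a trailing extension for `.bak`, or appends `.bak` if `path`
   has no extension."
  [path]
  (if-let [ext (file-extension path)]
    ;; Trim only the trailing extension, leaving any earlier dots alone.
    (str (subs path 0 (- (count path) (count ext))) ".bak")
    (str path ".bak")))

(backup-path "content/md/posts/2024-09-05-hello.md")
;; => "content/md/posts/2024-09-05-hello.bak"

(backup-path "content/md/pages/README")
;; => "content/md/pages/README.bak"
```

Trimming by length rather than by string replacement means the nil case never reaches a replace call, and a path that happens to contain the extension text earlier on is left intact. Whether extensionless files can occur among posts and pages in practice is an open question here, not something the diff states.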

Would you accept this as a pull request?

@yogthos
Member

yogthos commented Sep 5, 2024

That looks good to me, and since it's an opt-in flag, it doesn't change existing behavior by default, so I don't see any downside.

As a side note, do you want me to add you as a maintainer for the project? :)

@simon-brooke
Contributor Author

Uhhmmm.. my mental health really is very bad. It's probably better to have a second pair of eyes between me and anything which other people might use.

@yogthos
Member

yogthos commented Sep 8, 2024

No worries, definitely happy to help review stuff here.

yogthos added a commit that referenced this issue Sep 8, 2024
Re #172: write back inferred metadata