Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a remote command for batch duplicate finding. #1524

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

porridge
Copy link

Here is my first stab at it. I know I at least need to adjust for the accepted coding style, but please let me know if there are any other major things that need changing.

Based on https://github.com/porridge/image-duplicate-finder

Closes: #1520


#include "similar.h"

class pic {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please at least add some minimal documentation, in doxygen format.

That should include why this class is called "pic", what the class usage pattern should be, and what the method arguments mean.

See examples:

{
FileData *fd = static_cast<FileData *>(work->data);
std::string name(fd->path);
pics[name] = std::unique_ptr<pic>(new pic(fd->path));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should generally be using std::make_unique to create unique_ptrs. See the example section in
https://en.cppreference.com/w/cpp/memory/unique_ptr/make_unique

For context when referring to language docs, Geeqie is using C++14 (as can be seen in the meson.build project section)

DEBUG_1("processing %d files in set", pics.size());

// Compute similarity score for every pair, build equivalence sets.
for (auto a = pics.begin(); a != pics.end(); ++a) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use meaningful variable names for temporaries. Maybe left_iter and right_iter? left_pic and right_pic? Doesn't have to be those, but the meanings should be self-evident from the name.

if (similarity < options->duplicates_similarity_threshold)
continue;
a->second->equivalent.insert(b->second->equivalent.begin(), b->second->equivalent.end());
for (auto const &f: a->second->equivalent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaningful variable name

}

std::set<std::string> printed;
for (auto const &f: pics) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaningful variable name

continue;
std::vector<const char *> cmd;
cmd.push_back(options->duplicates_program);
for (auto const &e: f.second->equivalent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaningful variable name

pid_t pid = fork();
if (pid == -1) {
perror("fork");
exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit() is not called directly from anywhere else in this file, and geeqie does not generally use perror. The best course of action here is likely to log an error and return. See an example here:

static void gr_lw_id(const gchar *text, GIOChannel *, gpointer)

With that said, it's probably a lot safer to use an API like https://docs.gtk.org/gio/ctor.Subprocess.new.html instead of forking directly. That said, @qarkai is our resident expert on glib APIs, so I would defer to him.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Geeqie uses g_spawn_async_with_pipes() for external editor. See editor_command_one() in src/editors.cc:1053.


class pic {
public:
pic(char const *cname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class violates the rule of 3/5/0. See:
https://en.cppreference.com/w/cpp/language/rule_of_three

Comment on lines +131 to +132
'pic.h',
'pic.cc',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'pic.h',
'pic.cc',
'pic.cc',
'pic.h',


#include "pic.h"

pic::pic(char const *cname): name(cname), equivalent{name}, sim(NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use nullptr instead of NULL where applicable.


#include <set>
#include <string>
#include <gdk-pixbuf/gdk-pixbuf.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <gdk-pixbuf/gdk-pixbuf.h>
#include <gdk-pixbuf/gdk-pixbuf.h>

@@ -21,9 +21,11 @@

#include "remote.h"

#include <stdio.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <stdio.h>

Already included as <cstdio>.

g_error_free(err);
return;
}
sim = image_sim_new_from_pixbuf(buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving sim initialization to static method or anonymous function.

n = thresh;
if (n < 0 || n > 100)
{
printf_term(TRUE, "Image similarity threshold out of range (%d to %d)\n", 0, 100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range limits are duplicated on line 701. Consider moving them to constants.
Probably would be useful to print thresh value here.


static void gr_duplicates_program(const gchar *text, GIOChannel *, gpointer)
{
g_strdup(options->duplicates_program);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
g_strdup(options->duplicates_program);
g_free(options->duplicates_program);

static void gr_process_duplicates(const gchar *, GIOChannel *, gpointer data)
{
auto remote_data = static_cast<RemoteData *>(data);
std::map<std::string, std::unique_ptr<pic>> pics;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use tab for indentation.

auto remote_data = static_cast<RemoteData *>(data);
std::map<std::string, std::unique_ptr<pic>> pics;
GList *work = remote_data->file_list;
while (work)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using for loop to reduce scope of work.

std::map<std::string, std::unique_ptr<pic>> pics;
GList *work = remote_data->file_list;
while (work)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix braces indentation all over PR.

@caclark
Copy link
Collaborator

caclark commented Oct 14, 2024

In the latest commit, the command line handling has changed. The attached .diff are the changes I think are necessary to conform with the new code:

1524-1.diff.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Command-line utility for duplicate image finding and processing
4 participants