Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature : provides .epub files preview pictures #21

Closed

Conversation

smarinier
Copy link
Contributor

No description provided.

@devnoname120
Copy link
Owner

Thanks! Is there no good EPUB library that could do the heavy lifting to get the cover from the EPUB? I would rather not have to maintain custom EPUB extraction code.

For example: https://github.com/mikespub-org/php-epub-meta
I haven't tried it but it looks like a promising library.

What do you think?

@gschenck
Copy link

I think that a large part of the code is needed anyway, even if one wants an external library to do the real work.
So it would probably not be a problem to merge the commit now and switch to an external library later if necessary.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Psalm found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

private function getContentPath() {
$xml_container = $this->extractXML('META-INF/container.xml');
if ($xml_container) {
$full_path = $xml_container->rootfiles->rootfile['full-path'][0];

Check notice

Code scanning / Psalm

PossiblyNullArrayAccess Note

Cannot access array value on possibly null variable $xml_container->rootfiles->rootfile['full-path'] of type SimpleXMLElement|null
@devnoname120
Copy link
Owner

@smarinier Could you fix the stuff flagged by the bot? Thanks

* extractThumbnail from complicated epub format
*/
private function extractThumbnail(File $file, string $path): ?IImage {
$tmpManager = \OC::$server->get(ITempManager::class);

Check failure

Code scanning / Psalm

UndefinedClass Error

Class, interface or enum named OC does not exist
Copy link
Owner

@devnoname120 devnoname120 Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarinier Instance of ITempManager should be obtained through dependency injection from Nextcloud. See here for an example: https://github.com/nextcloud/assistant/pull/71/files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. There is also an example in a preview, where i took it i think : lib/private/Preview/Office.php
So this is 50% chance.;) (I think it's the same at least). Not sure the problem doesn't come from Psalm.

Could you confirm here you want this ? As this is less efficient. Indeed, the dependency injection is done by server get class only when the preview is built (then the preview image is stored the cache). If the injection is in constructor, it is called on all files twice (to check the mime type)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. There is also an example in a preview, where i took it i think : lib/private/Preview/Office.php

Old code.

As this is less efficient. Indeed, the dependency injection is done by server get class only when the preview is built (then the preview image is stored the cache). If the injection is in constructor, it is called on all files twice (to check the mime type)

Unless there is a bug the performance impact should be negligible. The file access takes many orders of magnitude more time than the near-instant dependency injection. The \OC::$server->get() notation is being phased out so it's not future proof.

$xml_container = $this->extractXML('META-INF/container.xml');
if (is_object($xml_container)) {
$full_path = $xml_container->rootfiles->rootfile['full-path'][0];
if ($full_path) {

Check notice

Code scanning / Psalm

RiskyTruthyFalsyComparison Note

Operand of type SimpleXMLElement|null contains type SimpleXMLElement, which can be falsy and truthy. This can cause possibly unexpected behavior. Use strict comparison instead.
* @return \DOMDocument|null
*/
protected function extractHTML(string $path): \DOMDocument|null {
$html = $this->extractFileData($path);

Check notice

Code scanning / Psalm

ArgumentTypeCoercion Note

Argument 1 of OCA\Epubviewer\Preview\EPubPreview::extractFileData expects 'META-INF/container.xml', but parent type string provided
if (is_string($html)) {
$dom = new \DOMDocument('1.0', 'utf-8');
$dom->strictErrorChecking = false;
if (@$dom->loadHTML($html)) {

Check notice

Code scanning / Psalm

ArgumentTypeCoercion Note

Argument 1 of DOMDocument::loadHTML expects non-empty-string, but parent type string provided
* @return false|null|string
*/
private function extractFileData(string $path): string|false|null {
$fp = $this->zip->getStream($path, 'r');

Check failure

Code scanning / Psalm

UndefinedClass Error

Class, interface or enum named OC\Archive\ZIP does not exist
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarinier Does OC\Archive\ZIP exist at all if the app files_zip isn't installed?

Copy link
Owner

@devnoname120 devnoname120 Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I guess it's a private API: https://github.com/nextcloud/server/blob/952271929d888c8333f5b64aa676f802a8b682af/lib/private/Archive/ZIP.php#L13

How much effort would it require to change the code to use PHP's ZipArchive directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a pity, as the ZIP class embeds ZipArchive properly, and logs errors in NextCloud. Its probably a definition that is necessary to do with Psalm.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that it's a private file that should not be used in apps. This means that if we use this implementation it will unexpectedly break in an update without any warnings. I don't want to maintain code that relies on non-standard internal APIs.

use OCP\ITempManager;

class EPubPreview implements IProviderV2 {
private ?ZIP $zip = null;

Check failure

Code scanning / Psalm

UndefinedClass Error

Class, interface or enum named OC\Archive\ZIP does not exist
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@smarinier
Copy link
Contributor Author

I did this implementation

@smarinier Could you fix the stuff flagged by the bot? Thanks

Hi Paul, i've done some stuff, but it seems to point errors that are not linked with my commits, and other i'm not sure i can fix them (eg "don't know the class OCA\Files\Event\LoadAdditionalScriptsEvent ??)

@devnoname120
Copy link
Owner

devnoname120 commented Jun 25, 2024

eg "don't know the class OCA\Files\Event\LoadAdditionalScriptsEvent ??

You can ignore this kind of stuff that is unrelated to your code. I'm not 100% sure why it doesn't get the definitions.

Edit: lol so apparently the right way is to ignore them?
Edit 2: the Memories app solves it by cloning the whole nextcloud backend and including it in the psalm analysis scope
Edit 3: the app integration_mattermost uses a stubs files instead. The app spreed uses even more stubs, the app photos has a giant stubs file as well.

Final edit: yeah so ignore those for now when it's not in a file you touched or when it doesn't make sense. I will implement a trick like the Memories app later after this PR.

Copy link
Owner

@devnoname120 devnoname120 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are almost there. Can you fix the remaining issues flagged by the bot? Only those related to your changes ofc.

One last question: what happens if the function that generates a thumbnail has an uncaught exception? Will Nextcloud just ignore generating a preview for that particular file? Will it crash the whole backend of the EPUB viewer app? Anything else? I'm really new to this whole preview provider thing so I'm cautious.

The next steps for me will be to do a quick security audit and some defensive exception handling to make sure that in the worst case it doesn't break anything outside of the previews.

@smarinier
Copy link
Contributor Author

I think we are almost there. Can you fix the remaining issues flagged by the bot? Only those related to your changes ofc.

One last question: what happens if the function that generates a thumbnail has an uncaught exception? Will Nextcloud just ignore generating a preview for that particular file? Will it crash the whole backend of the EPUB viewer app? Anything else? I'm really new to this whole preview provider thing so I'm cautious.

The next steps for me will be to do a quick security audit and some defensive exception handling to make sure that in the worst case it doesn't break anything outside of the previews.

You know, i've been using this for almost 8 years (and thousands of epub). I think rewriting with ZipArchive is more risky than finding how to make Psalm know the presence of the ZIP class ;).
Concerning the preview, this is an individual call. NextCloud knows "This file has a preview, so let's generate a rest /preview call". If something is wrong, you won't have any preview for the file. That's all, i guess.

@devnoname120
Copy link
Owner

I think rewriting with ZipArchive is more risky than finding how to make Psalm know the presence of the ZIP class ;).

The problem is not to figure out how to make Psalm ignore it (it's trivial). The problem is that this ZIP class is private and should not be used by external apps. I don't want to use private APIs unless there is a very strong reason to do that, because they will unexpectedly break in an update.

@devnoname120
Copy link
Owner

@smarinier OK I fixed the psalm now the remaining issues are specific to your PR.

@devnoname120
Copy link
Owner

@smarinier Is this PR abandoned? If yes then I'll close it.

@smarinier
Copy link
Contributor Author

Hi, I don't have the time at the moment to satisfy all your requirements. If that's a problem for you, you can close it.

Yours,

@devnoname120
Copy link
Owner

Thanks for the heads-up. In that case I will close the PR for the time being. Feel free to open a new pull request when you get the chance to look into it again

@devnoname120
Copy link
Owner

For my future reference, these look interesting:

https://github.com/Yetangitu/owncloud-apps/blob/master/files_opds/lib/epub.php
https://github.com/Yetangitu/owncloud-apps/blob/master/files_opds/lib/epubpreview.php

Looks like they are (at least partly) based on https://github.com/mikespub-org/php-epub-meta so we should use that one directly instead (for the parts that were copy-pasted).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants