Skip to content

lib/rest: process HTML entities within XML#5848

Merged
ncw merged 1 commit into
rclone:masterfrom
CodingKoopa:master
Dec 1, 2021
Merged

lib/rest: process HTML entities within XML#5848
ncw merged 1 commit into
rclone:masterfrom
CodingKoopa:master

Conversation

@CodingKoopa

@CodingKoopa CodingKoopa commented Dec 1, 2021

Copy link
Copy Markdown
Contributor

What is the purpose of this change?

The MEGA SDK - and, by extension, MEGAcmd - escapes many HTML entities in the XML it serves as the WebDAV server. This is not conformant XML, and I have already submitted a PR to fix this. That having been said, I am submitting a fix for rclone too for a couple of reasons:

  • MEGAcmd development seems to be rather slow, and it could be some time until it's updated with the latest SDK - that is, if a fix is merged into the SDK. Moreover, the mega backend currently has an issue (Mega.nz Session Cache #4695) that makes it difficult to use, making this WebDAV functionality useful.
  • The MEGA SDK might not be the only WebDAV application exhibiting this behavior. In these cases, it should be preferable for rclone to process the entities rather than throwing an error.

For reference, this is what currently happens when using a problematic filename with MEGAcmd's WebDAV server, and rclone:

2020/11/16 12:50:00 ERROR : IO error: couldn't list files: XML syntax error on line 253: invalid character entity &add;

Was the change discussed in an issue or in the forum before?

Most directly, this issue is reported here and in #4885. The "rather strict" XML parser is also discussed in this comment.

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

Thanks!

MEGAcmd currently includes escaped HTML4 entites in its XML messages.
This behavior deviates from the XML standard, but currently it prevents
rclone from being able to use the remote.
@ncw

ncw commented Dec 1, 2021

Copy link
Copy Markdown
Member

What do you think the backwards compatibility problems with this patch might be given that the jottacloud and sugarsync backends use the XML parser in lib/rest as well as the webdav backend. Does this need to be an option?

@CodingKoopa

CodingKoopa commented Dec 1, 2021

Copy link
Copy Markdown
Contributor Author

Great question! I did notice that those backends use this function too, and indeed was concerned about breaking them. I didn't address it initially because I deemed the act of processing HTML entities to be a "non-destructive" action, particularly as the alternative is the parser throwing an error. If you think it would be wiser to restrict this to webdav, I could make it an option. After all, there's no reason why those two services should start acting out of specification, compared to the larger amount of WebDAV software.

@ncw

ncw commented Dec 1, 2021

Copy link
Copy Markdown
Member

Here are the docs: https://pkg.go.dev/encoding/xml#Decoder

	// Strict defaults to true, enforcing the requirements
	// of the XML specification.
	// If set to false, the parser allows input containing common
	// mistakes:
	//	* If an element is missing an end tag, the parser invents
	//	  end tags as necessary to keep the return values from Token
	//	  properly balanced.
	//	* In attribute values and character data, unknown or malformed
	//	  character entities (sequences beginning with &) are left alone.

The first fix seems OK and unlikely to cause a problem. The second fix - means that the parse would have thrown an errror rather than leave the undecodable entities according to the source code, so that is OK too.

The second part of your patch is decoder.Entity = xml.HTMLEntity and after a look at the source code of the XML library, I don't think this will cause a problem either since those entities would have thrown an error before.

So I think, on further reflection, this patch is safe and won't break anything.

We'll see soon enough in the integration tests and if it does cause a problem we'll need to make it conditional.

So I'll merge this now - thank you :-)

@ncw ncw merged commit 0681a5c into rclone:master Dec 1, 2021
@ncw

ncw commented Dec 1, 2021

Copy link
Copy Markdown
Member

PS since you are interested in using mega with webdav, would you like to write some docs in the webdav backend for it? Or maybe we could make a webdav provider for it? Possibly rclone could even start/stop the mega process.

@CodingKoopa

Copy link
Copy Markdown
Contributor Author

Sure, I can think about what I could do ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants