WordPress.org

Make WordPress Core

Opened 9 days ago

Last modified 7 days ago

#54088 new defect (bug)

Uploading media with a filename containing the Norwegian letter å does not automatically convert it to aa.

Reported by: paaljoachim Owned by:
Milestone: 5.9 Priority: normal
Severity: normal Version:
Component: Media Keywords: needs-patch needs-unit-tests
Focuses: Cc:

Description

I ran a test yesterday and noticed that, when uploading an image whose filename contains the Norwegian letters æ, ø and å, the å was not converted to aa.

It looked like this:
æ -> ae (converted)
ø -> o (converted)
å -> å (did not convert)

Attachments (1)

Uploading-image-containing-Norwegian-Letters.gif (969.9 KB) - added by paaljoachim 9 days ago.
Norwegian letter å does not convert to aa

Change History (15)

@paaljoachim
9 days ago

Norwegian letter å does not convert to aa

This ticket was mentioned in Slack in #core by paaljoachim.


9 days ago

#2 @SergeyBiryukov
9 days ago

  • Component changed from General to Media

#3 @paaljoachim
9 days ago

I focused on this topic because I am redoing tutorials on my WordPress tutorial site. This is an old tutorial that I believe is likely no longer needed: https://www.easywebdesigntutorials.com/cleaning-up-filenames-that-have-non-utf8-characters-in-them/ (I am adding it here just in case there are aspects of the tutorial that are still needed.) Thanks.

This ticket was mentioned in Slack in #core-media by antpb.


8 days ago

#5 @antpb
8 days ago

  • Milestone changed from Awaiting Review to 5.9

For anyone digging into this, the solution will likely be within the remove_accents() function used by sanitize_file_name().

https://developer.wordpress.org/reference/functions/remove_accents/

#6 follow-up: @antpb
8 days ago

I need to do some more digging, but at an initial glance the logic behind converting å only turns it into a, and judging from the video provided it is seemingly not even doing that.

https://github.com/WordPress/wordpress-develop/blob/e83a341cc082864edf69257fded43d70d8a27685/src/wp-includes/formatting.php#L1254
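
A minimal Python model of the table lookup discussed above; it is not WordPress code. `ACCENT_MAP`, `strip_accents` and `strip_accents_normalized` are hypothetical stand-ins, and the three mappings simply mirror the behaviour reported in this ticket (æ to ae, ø to o, å to a). The sketch also illustrates one possible explanation that nobody in this ticket has confirmed: if the uploaded name arrives in decomposed Unicode form (a plain a followed by the combining ring U+030A), a table keyed on the precomposed å never matches, while æ and ø, which have no canonical decomposition, still do.

```python
import unicodedata

# Hypothetical stand-in for a character lookup table: ae-ligature, o-slash, a-ring.
ACCENT_MAP = {"\u00e6": "ae", "\u00f8": "o", "\u00e5": "a"}


def strip_accents(text: str) -> str:
    """Replace each character found in ACCENT_MAP; leave everything else alone."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in text)


def strip_accents_normalized(text: str) -> str:
    """Compose to NFC first so 'a' + U+030A becomes one U+00E5 before the lookup."""
    return strip_accents(unicodedata.normalize("NFC", text))


# The same visible filename in precomposed (NFC) and decomposed (NFD) form.
composed = "\u00e6\u00f8\u00e5.jpg"
decomposed = unicodedata.normalize("NFD", composed)

print(strip_accents(composed))                # all three letters are replaced
print(strip_accents(decomposed))              # the ring survives as a combining mark
print(strip_accents_normalized(decomposed))   # normalizing first fixes the lookup
```

Whether the upload in the attached GIF actually used a decomposed name is an assumption; checking the raw bytes of the filename would settle it.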

#7 @antpb
8 days ago

  • Keywords needs-patch needs-unit-tests added

#8 in reply to: ↑ 6 @knutsp
8 days ago

Replying to antpb:

I need to do some more digging but an initial glance at the logic behind converting å is only turning it into a

Some background:
"å" does not stem from a ligature. The letter stems from Old Norse "á", a longer and darker form of the sound written as "a". Swedish has had it since the 16th century, Norwegian since 1917 and Danish since 1948. Danish still uses "aa" in many geographical names (alternative, official spelling), but this is not the case in Norway and Sweden (only old family names).

A few years ago there was a suggestion here on Trac to transliterate "å" to "aa" in slugs, instead of just "a" (as initially in WP). There was some opposition to this in Scandinavia, at least in Norway (advocated by me). Generally, but especially in the Norwegian variant Nynorsk, there has been stronger opposition to using "aa" for "å". This is because in some words the next letter is also an "a", giving "aaa" in words like "Tåa" and "Åa". But also because it doesn't add readability and just makes the slug longer. So I say, at least as my personal opinion, keep it like that. We are used to it and don't complain. Keep the special case for Danish.

The main thing here and now is of course to make it work properly for filenames.

#9 follow-up: @paaljoachim
8 days ago

Hei Knut. Thank you for adding the additional information!

My name is Paal (American/English spelling would likely be Paul). Same spelling as my father. In modern Norway Paal is spelled Pål. So the alternative to using å is usually aa. If I write Norwegian with an English keyboard I would use the aa instead of å.

I agree the main thing here is making it work properly for filenames.
I would prefer a conversion of å to aa, but if there is "a lot" of resistance to aa then a single a would also be totally fine.

Last edited 8 days ago by paaljoachim

#10 @johnbillion
8 days ago

There's already a test for this, but it only covers the remove_accents() function; nothing checks that those transformations are actually applied to the name of an uploaded file. https://github.com/WordPress/wordpress-develop/blob/16b04903feec8216bdd2e6230f4ad511a9238db1/tests/phpunit/tests/formatting/removeAccents.php#L15
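
The gap between testing the helper and testing the upload path can be sketched as follows. This is a Python model under stated assumptions, not the WordPress test suite: `strip_accents` stands in for the accent helper, and `sanitize_upload_name` with its `transliterate` flag is a hypothetical stand-in for whatever the upload code does with the filename.

```python
import re

# Hypothetical lookup table: ae-ligature, o-slash, a-ring.
ACCENT_MAP = {"\u00e6": "ae", "\u00f8": "o", "\u00e5": "a"}


def strip_accents(text):
    """Stand-in for the accent helper: replace mapped characters only."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in text)


def sanitize_upload_name(filename, transliterate=True):
    """Hypothetical upload-path sanitizer.

    Optionally strips accents, then collapses whitespace runs into dashes.
    """
    if transliterate:
        filename = strip_accents(filename)
    return re.sub(r"\s+", "-", filename.strip())


def test_helper():
    # Covers only the helper; passes even if the upload path never calls it.
    assert strip_accents("bl\u00e5b\u00e6r") == "blabaer"


def test_upload_path():
    # Covers the name the uploaded file actually ends up with.
    assert sanitize_upload_name("bl\u00e5b\u00e6r sm\u00f8r.jpg") == "blabaer-smor.jpg"


test_helper()
test_upload_path()
```

With `transliterate=False` the helper test still passes while the upload-path assertion would fail, which is the kind of regression the first test alone cannot catch.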

#11 in reply to: ↑ 9 @knutsp
8 days ago

Replying to paaljoachim:

If I write Norwegian with an English keyboard I would use the aa instead of å.

Good point, and this was mentioned back then, as were international standards in the field. However, writing is slightly different from slugs, as distinguishing between "a" and "å" might feel needed.

So it was argued, from a conservative point of view: don't change what works just fine. That a change was made just for Danish surprised me a bit, but that effectively silenced the discussion.

Small thing. If there is a need on WP for standardization across our relatively small Scandinavian languages, "aa" will be just fine by me.

I have linked to this in the Norwegian Slack.

#12 @bjornjohansen
8 days ago

Oh, no! Please don’t transliterate å to aa, nor ø to oe. Æ is (originally) a ligature, so it’s fine to use ae. Visually, ae is close to an æ, so it’s easy to read. Texts where å is transliterated to aa (or ø to oe) are really hard to read, as it breaks the “look at the full word to recognize and read it” feature in the brain.

It also looks like it was written by Henrik Ibsen 150 years ago. As Paal mentions, in modern Norway Paal is spelled Pål.

Surnames, which are rarely changed/updated, became common in the period when e.g. aa was still used. They became mandatory in 1923, when å had recently been introduced to Norwegian and had not yet been introduced into Danish (which Norwegian was very heavily based on). Over the last 100 years, a lot of family names have been updated to use å, but this is not something that people change lightly, so it’s still common to see aa there. First names using aa are rare.

In the WP context this is only done for normalizing slugs and filenames. Using the longer versions makes them … well … longer. As Knut also mentioned, having “aaa” is not exactly ideal.

I see no reason to make the slugs longer and less readable, to conform to an old and conservative method that is irrelevant and outdated to most people. I tried to find what The Language Council of Norway (Språkrådet) has to say about it, but could not find anything.

If anything gets changed in WP regarding this, we would need a filter on the transliteration table, so people can choose what they like.
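
The idea of a filter on the transliteration table can be sketched like this. It is a Python model only; `DEFAULT_MAP`, `add_table_filter`, `transliterate` and `danish_aa` are hypothetical names, not an existing WordPress hook, and the locale strings are used purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical default table: ae-ligature, o-slash, a-ring.
DEFAULT_MAP: Dict[str, str] = {"\u00e6": "ae", "\u00f8": "o", "\u00e5": "a"}

TableFilter = Callable[[Dict[str, str], str], Dict[str, str]]
_table_filters: List[TableFilter] = []


def add_table_filter(callback: TableFilter) -> None:
    """Register a callback that may return a modified copy of the table."""
    _table_filters.append(callback)


def transliterate(text: str, locale: str) -> str:
    """Run the default table through every registered filter, then apply it."""
    table = dict(DEFAULT_MAP)
    for callback in _table_filters:
        table = callback(table, locale)
    return "".join(table.get(ch, ch) for ch in text)


def danish_aa(table: Dict[str, str], locale: str) -> Dict[str, str]:
    """Example filter: use 'aa' for a-ring, but only for the Danish locale."""
    if locale == "da_DK":
        return {**table, "\u00e5": "aa"}
    return table


add_table_filter(danish_aa)

print(transliterate("P\u00e5l", "nb_NO"))   # default table: single a
print(transliterate("P\u00e5l", "da_DK"))   # filtered table: aa
print(transliterate("T\u00e5a", "da_DK"))   # shows the triple-a case raised above
```

Keeping the default at a single a and letting a site opt in to aa would leave current behaviour unchanged for anyone who does not register a filter.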

#13 follow-up: @paaljoachim
7 days ago

Hei @bjornjohansen

I do think the most common approach when not able to use Norwegian letters is to use æ = ae, ø = o and å = aa. But I do feel your passion here. Whether å becomes a or aa in a filename does not really matter to me. The important part is that the conversion actually happens, and that the å gets converted in a filename.

It sounds like you really really really want to instead see å converted to a...:)
That is fine by me..:)

Last edited 7 days ago by paaljoachim

#14 in reply to: ↑ 13 @bjornjohansen
7 days ago

Replying to paaljoachim:

It sounds like you really really really want to instead see å converted to a...:)

Haha, yes. I’m a bit passionate about this. It’s personal :-)

BTW, it looks like filenames are keeping æ, ø, and å. So it’s just in the slugs where æ and ø are transliterated, while å isn’t.
