UTF8 Encoding and MP3 tags

gene kelley

I have an app that has a class for reading mp3 tags.
According to id3.org, ID3V2.3 tags are always UTF8 encoded (unless flagged
otherwise).

In a routine, I have this line which returns an array of characters:
System.Text.Encoding.UTF8.GetChars(tBytes)

The returned characters are correct 99% of the time. The 1% problem cases are those
tag strings that contain characters > Chr(128).
For example:
André Rieu is an expected result. The actual result is Andr Rieu.

Can anyone tell me if I'm missing something here, or, is this just another case of
non-compliant ID3V2.3 tags (not unusual)?

Gene
 
Every text frame has an encoding byte as the first byte after the frame
header. A $00 indicates ISO-8859-1 (Latin-1, which is not the same as UTF-8)
and a $01 indicates 16-bit Unicode. If it's Unicode, you have to check for
the BOM as well. I have written my own tag reader/writer and I've not had any
problems reading any mp3 files. I do, however, check for invalid characters
and eliminate them, as some tags are written by poorly written tag writers.
Even the best tag writers I've found don't fully implement the ID3v2.3 specs
with regard to encoding.
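
To make the encoding-byte rule concrete, here is a minimal sketch in Python (not the .NET code from the question). The function name `decode_text_frame` is made up for illustration, and the little-endian fallback for a missing BOM is an assumption about non-compliant writers, not something the spec defines.

```python
import codecs


def decode_text_frame(payload: bytes) -> str:
    """Decode the body of an ID3v2.3 text frame (the bytes after the frame header)."""
    if not payload:
        return ""
    encoding_byte, data = payload[0], payload[1:]
    if encoding_byte == 0x00:
        # $00: ISO-8859-1, one byte per character (0xE9 is "é").
        text = data.decode("latin-1")
    elif encoding_byte == 0x01:
        # $01: 16-bit Unicode; the BOM says which byte order follows.
        if data.startswith(codecs.BOM_UTF16_LE):
            text = data[2:].decode("utf-16-le", errors="replace")
        elif data.startswith(codecs.BOM_UTF16_BE):
            text = data[2:].decode("utf-16-be", errors="replace")
        else:
            # No BOM (non-compliant tag): guessing little-endian is an assumption.
            text = data.decode("utf-16-le", errors="replace")
    else:
        raise ValueError(f"unexpected encoding byte {encoding_byte:#04x}")
    # Drop trailing NUL terminators or padding that some writers add.
    return text.rstrip("\x00")
```

With this, `decode_text_frame(b"\x00Andr\xe9 Rieu")` gives "André Rieu", whereas decoding the same bytes as UTF-8 cannot produce the "é", because a lone 0xE9 byte is not valid UTF-8.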
 
OK, I reworked the routine and seems to work as expected now. I have randomly tried
about 200 mp3 albums from what I have on hand and had no issues show up with any of
those tested.

Do you happen to know how the name of the mp3 file's CODEC is retrieved?

Gene
 
Not sure what you mean by CODEC name.

I guess it's also referred to as "Encoder" in some apps.

Typical expected result would be something like "Lame 3.98" or "FhG".

Gene
 
Are you using a self-written class? I've been using UltraId3 which claims
to be 100% standards compliant.
 
I've looked at UltraId3. My app is only concerned with reading tags. While it's
valid to write 100% compliant tags, I disagree with UltraID3's design to only read
compliant tags. That's just not the "real world".

Gene
 
I have found it very difficult to pinpoint the exact location of
information within an MP3 file with ID3v2.x tags, because even though the
standard (according to http://www.id3.org/easy.html) is for tags to come
before the audio data, I have encountered tags anywhere inside the file.
I suppose you could use a hex viewer on several files, see where the
encoder shows up, and then write a routine to search the incoming MP3s
for a common value between the files to locate your encoder.
 
ID3v2.4.0 allows tags to be prepended or appended. In the appended case, you
have to search from the back of the file forward for the start header. Also,
I believe it is legitimate to embed part of the tag somewhere in the music
file, as specified in the SEEK frame. It should be a rare occasion that
someone embeds part of a tag within the music data... I've never run across it.
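
A rough sketch of the prepended/appended lookup, in Python. It assumes the whole file is already in memory as `data`, that an appended ID3v2.4 tag ends with the "3DI" footer, and that a 128-byte ID3v1 "TAG" block may follow it. The function names are made up for illustration, and tags located through a SEEK frame are not handled.

```python
def synchsafe_to_int(raw: bytes) -> int:
    """Decode a synchsafe integer: only the low 7 bits of each byte are used."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def find_id3v2_tag(data: bytes):
    """Return (offset, total_length) of an ID3v2 tag, or None if none is found."""
    # Prepended tag: 10-byte header "ID3", version (2 bytes), flags, size (4 bytes).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = synchsafe_to_int(data[6:10])
        has_footer = bool(data[5] & 0x10)  # footer flag (ID3v2.4 only)
        return 0, 10 + size + (10 if has_footer else 0)

    # Appended tag: look for the "3DI" footer at the end of the file,
    # skipping a trailing ID3v1 tag if one is present.
    end = len(data)
    if end >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    if end >= 20 and data[end - 10:end - 7] == b"3DI":
        size = synchsafe_to_int(data[end - 4:end])
        start = end - (size + 20)  # size excludes the header and the footer
        if start >= 0 and data[start:start + 3] == b"ID3":
            return start, size + 20
    return None
```

The size field does not count the 10-byte header or the 10-byte footer, which is why 20 is added when walking back from the footer.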
 
The ID3 tags have no frames that I know of for embedding any CODEC
information. The MP3 audio frame header should provide sufficient information
to determine what decoder is needed, i.e., MPEG-1 or MPEG-2, as encoded in the
4th bit of the second byte of the header (the ID bit).
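
As a sketch of reading that bit (Python, hypothetical function name): the example below assumes `header` starts exactly at a frame sync, and it also reads the neighbouring bit so that the unofficial MPEG-2.5 extension and the layer can be reported. Finding the first frame in a real file takes more care than shown here.

```python
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2", 0b00: "MPEG-2.5"}
LAYERS = {0b11: "Layer I", 0b10: "Layer II", 0b01: "Layer III"}


def mpeg_version_and_layer(header: bytes):
    """Read the version and layer fields from the first two bytes of a frame header."""
    # Frame sync: all 8 bits of byte 0 and the top 3 bits of byte 1 are set.
    if len(header) < 2 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("not an MPEG audio frame header")
    # Mask 0x08 of the second byte is the ID bit (1 = MPEG-1, 0 = MPEG-2);
    # together with mask 0x10 it forms the 2-bit version field.
    version_bits = (header[1] >> 3) & 0x03
    layer_bits = (header[1] >> 1) & 0x03
    if version_bits not in VERSIONS or layer_bits not in LAYERS:
        raise ValueError("reserved version or layer value")
    return VERSIONS[version_bits], LAYERS[layer_bits]
```

For example, a header beginning with the bytes FF FB reports MPEG-1 Layer III, and one beginning with FF F3 reports MPEG-2 Layer III.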
 
Frame TSSE was apparently designed for that purpose, but it is very rarely used.

The encoder info, though, depends more on the encoding software and where in
the mp3 file it writes its signature. "Lame3.xx" is plainly visible in several
locations of an mp3 file when viewed in a hex viewer, but the others are not.
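
For the LAME case only, the hex-viewer search can be automated with a simple byte scan. This Python sketch assumes the signature looks like "LAME3.98" (the pattern is a guess based on what shows up in the hex viewer), and the function name is made up. It is a heuristic that says nothing about encoders that leave no readable string.

```python
import re

# Assumed signature shape: "LAME" followed by a version such as "3.98".
LAME_SIGNATURE = re.compile(rb"LAME(\d\.\d{2})")


def find_lame_version(data: bytes):
    """Return e.g. 'LAME 3.98' if a LAME signature is found in the bytes, else None."""
    match = LAME_SIGNATURE.search(data)
    if match is None:
        return None
    return "LAME " + match.group(1).decode("ascii")
```

A result of None only means no such string was found, not that the file was made with a different encoder.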

If you are familiar with a small utility called EncSpot, its primary use is
to display the name of the encoder used for a given mp3 file. So, I assume that
if one knows where to look and what to look for in the file, the encoder info
can be found, but I have yet to find any leads.

Thanks,

Gene
 