A macro to export a sheet as a UTF-8 encoded Unicode file

  • Thread starter: Dario de Judicibus

Dario de Judicibus

I need help creating a macro. I have very little experience with VBA for Excel.

I have a two-column sheet. I need to export it to a file (chosen by the user
through a Save As dialog) in the following format:

0xFFFE
UTF-8(cell(1,1).Value) 0x2028 UTF-8(cell(1,2).Value) 0x2029
UTF-8(cell(2,1).Value) 0x2028 UTF-8(cell(2,2).Value) 0x2029
....
UTF-8(cell(last,1).Value) 0x2028 UTF-8(cell(last,2).Value) 0x2029
0xFFFF

Any help appreciated.

Dario de Judicibus
 
Wei-Dong Xu

Well, it looks like a way to do what I need, although I do not understand why
I have to set security to low, since I start the macro manually. The only
problem is the dependence on a shareware component. The project I am working
on is freeware, and all the components and tools I use are pure freeware (no
adware, no shareware). It's the new version of an existing freeware
application widely used on the Internet. We are a team of several people all
around the world, and I cannot introduce a dependency on commercial software
into our code. Is there any other way to convert to UTF-8? I am not a VBA
expert. I can write a Unicode-to-UTF-8 translator in C++, but I do not know
how to use it from Excel. By the way, that ActiveX control does much more
than I need. Any alternative? Thank you in advance.

Dario de Judicibus
 
MyFile.writeLine "0xFFFE"
MyFile.write " 0x2028 "
MyFile.write " 0x2029" & Chr(13) & Chr(10)
MyFile.write "0xFFFE"

PS: by 0xFFFE I mean two bytes whose hex value is FFFE, not the text
"0xFFFE". No Chr(13) and Chr(10) (CR/LF) are needed. 0x2028 and 0x2029 are
the Unicode line separator and paragraph separator.
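A minimal sketch of writing those markers as raw bytes rather than as text, assuming the file is opened with VBA's built-in Open ... For Binary, so that Put writes exactly the bytes it is given (no CR/LF, no character conversion). MarkerBytes and WriteMarkersDemo are hypothetical names. Two assumptions worth checking against the program that reads the file: the sketch writes 0x2028 and 0x2029 in their UTF-8 form (E2 80 A8 and E2 80 A9), since the rest of the content is UTF-8; and FF FE is the byte-order mark of UTF-16 little-endian, not the UTF-8 signature (EF BB BF), so it is written here only because the format above asks for it.

```vba
Option Explicit

' Hypothetical helper: builds a Byte array from byte values,
' e.g. MarkerBytes(&HFF, &HFE). Call it with at least one value.
Public Function MarkerBytes(ParamArray values() As Variant) As Byte()
    Dim result() As Byte
    Dim i As Long
    ReDim result(0 To UBound(values))   ' ParamArray is always 0-based
    For i = 0 To UBound(values)
        result(i) = CByte(values(i))
    Next i
    MarkerBytes = result
End Function

' Hypothetical demo: writes FF FE, then U+2028 and U+2029 encoded as UTF-8,
' then FF FF, with no CR/LF anywhere.
Public Sub WriteMarkersDemo(ByVal fileName As String)
    Dim f As Integer
    Dim b() As Byte
    ' Binary mode does not truncate an existing file, so remove it first
    If Len(Dir(fileName)) > 0 Then Kill fileName
    f = FreeFile
    Open fileName For Binary Access Write As #f
    b = MarkerBytes(&HFF, &HFE): Put #f, , b          ' two bytes, not the text "0xFFFE"
    b = MarkerBytes(&HE2, &H80, &HA8): Put #f, , b    ' U+2028 LINE SEPARATOR (UTF-8)
    b = MarkerBytes(&HE2, &H80, &HA9): Put #f, , b    ' U+2029 PARAGRAPH SEPARATOR (UTF-8)
    b = MarkerBytes(&HFF, &HFF): Put #f, , b          ' closing marker
    Close #f
End Sub
```

In Binary mode, Put writes only the array's data (no length descriptor), which is what makes this approach suitable for byte-exact output.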

DdJ
 
I also think that the Unicode-to-UTF-8 conversion could be done directly in
VBA. The algorithm should not be complicated. The rules are:

ISO 10646 range covered       UTF-8 representation
----------------------------  --------------------------
Bits  Hex Min    Hex Max      Byte sequence in binary
 7    00000000   0000007F     0vvvvvvv
11    00000080   000007FF     110vvvvv 10vvvvvv
16    00000800   0000FFFF     1110vvvv 10vvvvvv 10vvvvvv

That is, any character from 0000 [0000 0000 0000 0000] to 007F [0000 0000
0111 1111] becomes 0bbbbbbb, where each b is 0 or 1; any character from 0080
[0000 0000 1000 0000] to 07FF [0000 0111 1111 1111] becomes 110bbbbb
10bbbbbb, and so forth.
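As a worked check of the table (my own arithmetic, not from the thread), here are two code points encoded by hand: U+00E9 (é) falls in the 11-bit row and U+2028 (the line separator used above) in the 16-bit row. Splitting the value into 6-bit groups is the same as dividing by 64 and taking remainders.

```latex
\begin{aligned}
\text{U+00E9:}\quad u &= 233 = 3 \cdot 64 + 41 = 00011\,101001_2 \\
  &\mapsto (\mathrm{C0}_{16} + 3,\ \mathrm{80}_{16} + 41) = (\mathrm{C3}_{16},\ \mathrm{A9}_{16}) \\[4pt]
\text{U+2028:}\quad u &= 8232 = 2 \cdot 4096 + 0 \cdot 64 + 40 = 0010\,000000\,101000_2 \\
  &\mapsto (\mathrm{E0}_{16} + 2,\ \mathrm{80}_{16} + 0,\ \mathrm{80}_{16} + 40) = (\mathrm{E2}_{16},\ \mathrm{80}_{16},\ \mathrm{A8}_{16})
\end{aligned}
```

So é becomes the two bytes C3 A9 and the line separator becomes E2 80 A8; only integer division, remainder, and addition (or OR-ing in the prefix bits) are needed.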

DdJ
 
Hi Dario,

Thank you for replying!

From my understanding of your scenario, I'd suggest you build an ActiveX control in C/C++ to handle the UTF-8 encoding. Since Excel and
VBA work natively with Unicode (UTF-16) strings, you'd be better off using C/C++ code to produce the UTF-8 text file.

Furthermore, if you use .NET to develop the Office solution, the System.Text namespace of the .NET Framework will help you a lot here:
it provides strong support for encodings, including UTF-8 and Unicode (UTF-16).

Please feel free to let me know if you have any further questions.

Does this answer your question? Thank you for using Microsoft NewsGroup!

Wei-Dong Xu
Microsoft Product Support Services
Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
 

Thank you. Your suggestions are formally correct. However, I have the
impression that they are big guns to kill a small mosquito. Encoding from
16-bit Unicode to UTF-8 is just a matter of simple byte encoding, easy to do
in C++ with bitwise and shift operators. I expect it to be simple in the
Excel macro language too: just a few lines of code. I do not think using
ActiveX, .NET, and other big environments is worth it when I simply need to
develop a pipe. What I need to do is:

write hex fffe
for each row
    read cell of first column
    encode it as UTF-8
    write encoded string
    write hex 2028
    read cell of second column
    encode it as UTF-8
    write encoded string
    write hex 2029
end
write hex ffff
exit

utf8 encode
for each 16 bit chunk
    if chunk16 >= 0 and chunk16 <= 7f
        newchunk8 = 0vvvvvvv
    else if chunk16 >= 80 and chunk16 <= 7ff
        newchunk16 = 110vvvvv 10vvvvvv
    else (chunk16 >= 800 and chunk16 <= ffff)
        newchunk24 = 1110vvvv 10vvvvvv 10vvvvvv
end
return concat of all new chunks
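Here is a minimal VBA sketch of the two pseudocode routines above, under a few assumptions: the data is in columns A and B of the active sheet starting at row 1, the last row is the last non-empty cell in column A, the cells hold text or numbers (no error values), and the separators are written as UTF-8 (E2 80 A8 / E2 80 A9). It uses only built-in calls (Application.GetSaveAsFilename for the Save As dialog, Open ... For Binary and Put for the file), so there is no ActiveX or shareware dependency. Utf8Encode and ExportSheetAsUtf8 are hypothetical names. Like the pseudocode, it encodes each 16-bit unit separately, so characters above U+FFFF (which VBA stores as surrogate pairs) would not come out as valid UTF-8.

```vba
Option Explicit

' Encodes a VBA string (UTF-16) to UTF-8 bytes using the three-row table for
' U+0000..U+FFFF. VBA has no shift operators, so "\ &H40" plays the role of
' ">> 6" and "And &H3F" keeps the low 6 bits. Surrogate pairs are not combined.
Public Function Utf8Encode(ByVal s As String) As Byte()
    Dim out() As Byte
    Dim i As Long, n As Long, u As Long

    If Len(s) = 0 Then
        out = ""                              ' zero-length Byte array (UBound = -1)
        Utf8Encode = out
        Exit Function
    End If

    ReDim out(0 To 3 * Len(s) - 1)            ' worst case: 3 bytes per unit
    For i = 1 To Len(s)
        u = AscW(Mid$(s, i, 1)) And &HFFFF&   ' AscW is negative above &H7FFF
        If u <= &H7F Then
            out(n) = u
            n = n + 1
        ElseIf u <= &H7FF Then
            out(n) = &HC0 Or (u \ &H40)
            out(n + 1) = &H80 Or (u And &H3F)
            n = n + 2
        Else
            out(n) = &HE0 Or (u \ &H1000)
            out(n + 1) = &H80 Or ((u \ &H40) And &H3F)
            out(n + 2) = &H80 Or (u And &H3F)
            n = n + 3
        End If
    Next i

    ReDim Preserve out(0 To n - 1)            ' trim to the bytes actually used
    Utf8Encode = out
End Function

' Asks for a file name, then writes FF FE, then for each row
' UTF-8(A) + U+2028 + UTF-8(B) + U+2029, then FF FF. No CR/LF is written.
Public Sub ExportSheetAsUtf8()
    Dim ws As Worksheet
    Dim fileName As Variant
    Dim f As Integer, r As Long, lastRow As Long
    Dim marker(0 To 1) As Byte
    Dim sepLine() As Byte, sepPara() As Byte, cellBytes() As Byte

    fileName = Application.GetSaveAsFilename( _
        FileFilter:="Text Files (*.txt), *.txt")
    If fileName = False Then Exit Sub             ' user pressed Cancel

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    sepLine = Utf8Encode(ChrW(&H2028))            ' E2 80 A8
    sepPara = Utf8Encode(ChrW(&H2029))            ' E2 80 A9

    If Len(Dir(fileName)) > 0 Then Kill fileName  ' Binary mode does not truncate
    f = FreeFile
    Open fileName For Binary Access Write As #f

    marker(0) = &HFF: marker(1) = &HFE
    Put #f, , marker                              ' opening FF FE

    For r = 1 To lastRow
        cellBytes = Utf8Encode(CStr(ws.Cells(r, 1).Value))
        If UBound(cellBytes) >= 0 Then Put #f, , cellBytes
        Put #f, , sepLine
        cellBytes = Utf8Encode(CStr(ws.Cells(r, 2).Value))
        If UBound(cellBytes) >= 0 Then Put #f, , cellBytes
        Put #f, , sepPara
    Next r

    marker(1) = &HFF
    Put #f, , marker                              ' closing FF FF
    Close #f
End Sub
```

Both routines go in a standard module; running ExportSheetAsUtf8 from Tools > Macro (or Alt+F8) opens the Save As dialog. Reusing Utf8Encode for the separators keeps a single encoding path instead of hard-coding their bytes.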

I suspect it is not hard to do in VBA. Thank you anyway.

DdJ
 
Hi Dario,

Thank you for replying!

From my understanding of your issue, the toughest part is the bitwise operations. Here are the details.

For hexadecimal, the Hex function returns a string representing the hexadecimal value of a number; for example, Hex(255) returns "FF".
In code, hexadecimal literals are written with the &H prefix, for example &HFF.

The VBA equivalents of the C/C++ bitwise operators & and | are And and Or. For example:
MsgBox (4 And 2)
The result is 0.
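VBA also has Xor and Not, but no shift operators (<< and >> in C/C++). A common substitute, shown in this sketch, is multiplying or integer-dividing (\) by powers of two. ShiftLeft, ShiftRight and BitDemo are hypothetical names, not built-in functions; ShiftRight matches >> only for non-negative values.

```vba
Option Explicit

' Hypothetical helper: like C's x >> n for x >= 0 and 0 <= n <= 30
Public Function ShiftRight(ByVal x As Long, ByVal n As Long) As Long
    ShiftRight = x \ (2 ^ n)          ' "\" is integer division
End Function

' Hypothetical helper: like C's x << n, as long as the result fits in a Long
' (otherwise VBA raises an Overflow error)
Public Function ShiftLeft(ByVal x As Long, ByVal n As Long) As Long
    ShiftLeft = x * (2 ^ n)
End Function

' Prints the pieces needed to encode U+00E9 (é) as UTF-8
Public Sub BitDemo()
    Debug.Print 4 And 2                            ' 0   (C: 4 & 2)
    Debug.Print Hex(&HE9 And &H3F)                 ' 29  (C: 0xE9 & 0x3F)
    Debug.Print Hex(ShiftRight(&HE9, 6))           ' 3   (C: 0xE9 >> 6)
    Debug.Print Hex(&HC0 Or ShiftRight(&HE9, 6))   ' C3  first UTF-8 byte
    Debug.Print Hex(&H80 Or (&HE9 And &H3F))       ' A9  second UTF-8 byte
End Sub
```

Running BitDemo prints its results in the Immediate window (Ctrl+G in the VBA editor).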

For your issue, you need to apply these bitwise operations to the bytes of the cell value. You can use the code below to read the Unicode
value of an Excel cell into a byte array.

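' Code begin

' rowNumber and colNumber identify the cell to read
Dim oByteArray() As Byte
' CStr also handles numeric cells; the bytes are UTF-16LE (low byte first)
oByteArray = CStr(Application.ActiveSheet.Cells(rowNumber, colNumber).Value)

' Code end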

Then you can inspect the resulting byte array in the VBA Watch window. With this byte array, you can compare the values and convert them
into UTF-8. After that, FileSystemObject can help you save the result to a text file.
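The byte array holds the string in UTF-16 little-endian order, low byte first, so each pair of bytes is one 16-bit code unit: the value the UTF-8 table is applied to. A sketch of recombining the pairs follows; CodeUnitsFromBytes and CodeUnitsDemo are hypothetical names, and the function assumes the array holds at least one character. AscW(Mid$(s, i, 1)) And &HFFFF& gives the same values directly from the string. One caveat on the last step: a FileSystemObject TextStream writes text in the system ANSI code page or in UTF-16, so writing the UTF-8 bytes with VBA's own Open ... For Binary and Put is likely the more predictable route.

```vba
Option Explicit

' Recombines a UTF-16LE Byte array (as produced by assigning a String
' to a Byte array) into 16-bit code units: low byte first, then high byte.
' Assumes the array holds at least one character.
Public Function CodeUnitsFromBytes(ByRef b() As Byte) As Long()
    Dim units() As Long
    Dim i As Long
    ReDim units(0 To (UBound(b) + 1) \ 2 - 1)
    For i = 0 To UBound(units)
        units(i) = b(2 * i) + 256& * b(2 * i + 1)
    Next i
    CodeUnitsFromBytes = units
End Function

' Example: ChrW(&HE9) & ChrW(&H20AC) is stored as the bytes E9 00 AC 20
Public Sub CodeUnitsDemo()
    Dim b() As Byte
    Dim u() As Long
    b = ChrW(&HE9) & ChrW(&H20AC)
    u = CodeUnitsFromBytes(b)
    Debug.Print Hex(u(0)), Hex(u(1))   ' E9 and 20AC
End Sub
```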


Wei-Dong Xu
 