RaidTec FlexArray RAID configuration

Marco De Vitis

Hi,
I need to understand what's wrong in a SCSI storage enclosure like this:
http://www.dmi-inc.com/pdf/raidtec/flexarray_hi12.pdf

It was mounted on a Linux system, and suddenly appeared to have bad
filesystem errors.
I couldn't work around the problem in any way through software.
I'd like to check its RAID configuration and see if just some HDs are
damaged, but I can't find a way to do it!

Its LCD panel, its serial port and a related Windows application (the
latter two according to the manual, as I didn't have a way to check
myself) are only used for a few configuration options like alarms etc.
By the way, the only error shown in the LCD panel is about a fan
receiving low current.

How on earth is that thing actually configured?? I mean RAID levels, etc...

Any clues? Thanks.
 
Marco De Vitis said:
Hi,
I need to understand what's wrong in a SCSI storage enclosure like this:
http://www.dmi-inc.com/pdf/raidtec/flexarray_hi12.pdf
It was mounted on a Linux system, and suddenly appeared to have bad
filesystem errors.
I couldn't work around the problem in any way through software.
I'd like to check its RAID configuration and see if just some HDs are
damaged, but I can't find a way to do it!
Its LCD panel, its serial port and a related Windows application (the
latter two according to the manual, as I didn't have a way to check
myself) are only used for a few configuration options like alarms etc.
By the way, the only error shown in the LCD panel is about a fan
receiving low current.
How on earth is that thing actually configured?? I mean RAID levels, etc...
Any clues? Thanks.

First, if the device itself and its monitoring software do not
display any problem, then this may actually not be a problem with
the storage. Can you post the exact first error messages from the
Linux system log here?

As to the hardware, I think this may be a SCSI RAID controller on the
drive side with a SCSI host interface showing the RAID array as a
single drive. If so, your only chance may be the Windows application,
and that immediately shows the stupidity of such a setup.

If you want to test the individual disks, you have a different
option: Stop this thing and test the disks one by one in a
different computer. This is SCSI, so a slightly different procedure
from the usual SMART stats query is needed. What you can do is
read the complete disks (dd_rescue <disk> /dev/null, e.g.)
and then query the SCSI device error pages (scsictl, if I
remember correctly).
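
As a rough illustration of the "read the complete disk" step, here is a minimal Python sketch of what a full sequential read pass does. It is not a replacement for dd_rescue, which handles retries and bad regions far better. The function name scan_device and the 1 MiB chunk size are my own choices, and /dev/sdX is only a placeholder for whichever device node the disk gets on the test machine.

```python
import os

CHUNK = 1024 * 1024  # assumption: read 1 MiB at a time


def scan_device(path, chunk=CHUNK):
    """Read `path` from start to end; return (bytes_read, bad_offsets)."""
    bad_offsets = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of a regular file or,
        # on Linux, of a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # The kernel reported an I/O error for this region:
                # remember where it happened and skip past it.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # the device ended earlier than its reported size
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad_offsets
```

Run as root, scan_device("/dev/sdX") returns the number of bytes read and the offsets at which a read failed. An empty list only means every block was readable during this pass. The drive's own error counters still need to be queried separately afterwards; smartctl -a from smartmontools is one tool that can report them for SCSI disks.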

Incidentally, this shows (again) the superiority of software
RAID, where you do not have to jump through hoops like inadequate
proprietary management software.


Arno
 
On 20-09-2009 15:14, Arno wrote:
the storage. Can you post the exact first error messages from the
Linux system log here?

Hard to remember, also because it's not mine.
But it was first noticed that some files were unreadable, and the error
was something like "short read".
Then, when trying to reinitialize the thing (mke2fs etc.), things got
even worse, and it seemed to "lock" during access attempts, starting to
emit a sort of "beep - beep" sound alert which, of course, the manual does
not mention at all :-P.
If you want to test the individual disks, you have a different

This can be a bit hard in this specific case, I'd prefer to find out how
to manage the thing and let it make its checks...
 
Marco De Vitis said:
On 20-09-2009 15:14, Arno wrote:
Hard to remember, also because it's not mine.
But it was first noticed that some files were unreadable, and the error
was something like "short read".

This basically only means the device stopped supplying data. It
can be anything, and does not need to be a disk problem.
Then, when trying to reinitialize the thing (mke2fs etc.), things got
even worse, and it seemed to "lock" during access attempts, starting to
emit a sort of "beep - beep" sound alert which, of course, the manual does
not mention at all :-P.

Ah, but this is not a data-recovery operation then? Good.
This can be a bit hard in this specific case, I'd prefer to find out how
to manage the thing and let it make its checks...

Understandable. Regard the manual drive check as a fallback option
then or as the "we could do this, but it is not really cost
effective" option.

Arno
 
On 21-09-2009 0:05, Arno wrote:
Ah, but this is not a data-recovery operation then? Good.

Exactly.
We (they) have a backup of the data. We just need to know if the array
is still usable.
 
Marco De Vitis said:
Its LCD panel, its serial port and a related Windows application (the
latter two according to the manual, as I didn't have a way to check
myself) are only used for a few configuration options like alarms etc.

The Windows application is RAIDman. It lets you set up new arrays, view
array status, and diagnose and recover failed arrays.
By the way, the only error shown in the LCD panel is about a fan
receiving low current.

How on earth is that thing actually configured??

RAIDman would be best. Borrow a Windows laptop or something. A Google
search should find it; it's also known as UltraRAIDman. Be cautious
when using it; it's easy to blow away all the data without warning with
a single click if you don't fully understand what you are doing.

Or you can use the serial port on the back of the array and talk to it
via minicom on Linux or HyperTerminal on Windows.
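
If neither terminal program is to hand, the same conversation can be scripted. The following is only a sketch using Python's standard termios module on Linux. The device path /dev/ttyS0, the 9600 baud 8N1 settings and the helper names open_serial and exchange are all assumptions of mine, not something taken from the Raidtec documentation, so check the manual for the real line settings.

```python
import os
import select
import termios


def open_serial(path, baud=termios.B9600):
    """Open a serial device in raw 8N1 mode and return its file descriptor.

    The default speed is a guess; use whatever the unit's manual specifies.
    """
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    attrs = termios.tcgetattr(fd)
    # attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[0] = 0  # no input translation
    attrs[1] = 0  # no output translation
    attrs[2] = termios.CS8 | termios.CREAD | termios.CLOCAL  # 8 data bits, no parity
    attrs[3] = 0  # raw mode: no echo, no line editing
    attrs[4] = baud
    attrs[5] = baud
    attrs[6][termios.VMIN] = 0    # read() returns whatever is available...
    attrs[6][termios.VTIME] = 10  # ...or gives up after one second
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def exchange(fd, command, timeout=1.0):
    """Send `command` plus a carriage return and collect the reply.

    Reading stops once nothing has arrived for `timeout` seconds.
    """
    os.write(fd, command + b"\r")
    reply = b""
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            return reply
        chunk = os.read(fd, 256)
        if not chunk:
            return reply
        reply += chunk
```

A first attempt might be fd = open_serial("/dev/ttyS0") followed by print(exchange(fd, b"")), which sends a bare carriage return and prints whatever the unit answers. I have not listed any commands because the thread does not say what the unit expects on that port.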

Those Raidtec units are now rather old and have not been supported for
years. We have one in service at a remote site; the LCD panel failed
ages ago so any interaction has to be via the serial interface or
RAIDman.

Make sure all the fans on the back are working. The strip of green LEDs
along the top front edge indicates status - the first eight LEDs indicate
fan status (out if the fan has failed), and the last one indicates power
supply status - if flashing, one of the hot-swap power supplies has
failed.

The rear fans are Papst model 612 NGHH. They are straightforward to
swap out.
 
Corrections (dug through some old documents):

* there are six rear fans, not eight, with corresponding front panel
LEDs. The seventh LED indicates PSU status.

* the RS232 serial port for talking to the RAID from a PC is SER2. You
should find that SER1 is blanked off. SER3 and 4, if fitted, are RS485
and used for attaching additional expansion enclosures.
 
Marco De Vitis said:
Then, when trying to reinitialize the thing (mke2fs etc.), things got
even worse, and it seemed to "lock" during access attempts, starting to
emit a sort of "beep - beep" sound alert which, of course, the manual does
not mention at all :-P.

It means there is a fault: either the enclosure is too hot, or a fan,
drive or PSU has failed. To find out which, look at the LEDs, the LCD
panel or talk to the unit with RAIDman or via the serial port.

If it's a drive failure, a red LED should illuminate on the front of the
appropriate drive carrier (RAIDtec refer to those as "shuttles".) Each
drive has three LEDs; the green LED is power, yellow is disk activity,
and red is failed.

The alarm can be silenced using one of the buttons on the LCD panel.
From memory, there is a "book" icon which changes to a ! on an alarm.
Pressing the button below that should silence the alarm and display the
error log.
 
On 23-09-2009 12:35, Fred Bloggs wrote:
It means there is a fault: either the enclosure is too hot, or a fan,
drive or PSU has failed. To find out which, look at the LEDs, the LCD
panel or talk to the unit with RAIDman or via the serial port.

Thanks Fred, but this is not the case.
As I wrote, I have the manual (thanks Google, actually), and everything
you describe is clearly explained there... it just does not apply to my
situation.

All drives are OK, according to their LEDs.
The LCD panel only repeatedly shows a warning about a fan, saying that
its current is low. And the corresponding fan LED flashes. And it emits
a continuous long "beeeeeeeeeeeeeeeeeeeep" when this error appears
(unless I manually disable the beeper alarm).
Actually, I visually checked all fans and they all work. Maybe this is a
false alarm, or maybe a fan is really slower than normal, but anyway
this is not a problem for heat: 4 disk bays are empty, and the unit is
in a very cold (and noisy) server room. This same error might have been
already there for months without anyone ever noticing.

The sound I described above when the unit locks, instead, is different:
it's an intermittent "beep - beep... beep - beep... beep - beep...", and
it CANNOT be silenced from the LCD panel menu. And the LCD or LEDs do
not show any error.
 
On 23-09-2009 11:55, Fred Bloggs wrote:
RAIDman lets you setup new arrays, view array status, and diagnose and
recover failed arrays.

Are you *really* sure that RAIDman can do this?
The manual doesn't say anything about it, it only talks about the same
simple operations which can also be done using the LCD display.
RAIDman would be best. Borrow a Windows laptop or something. A Google
search should find it; it's also known as UltraRAIDman. Be cautious

I searched around a bit and haven't found it yet... I'd be grateful if you
have it handy and can send it to me somehow (www.yousendit.com or similar).
Or you can use the serial port on the back of the array and talk to it
via minicom on Linux or Hyperterminal on Windows.

Again, are you sure RAID management operations can be done this way? It
would be enough for me, but the manual only talks about the same basic
operations so I didn't even try.

Thanks.
 
Marco De Vitis said:
Actually, I visually checked all fans and they all work. Maybe this is a
false alarm, or maybe a fan is really slower than normal, but anyway
this is not a problem for heat: 4 disk bays are empty, and the unit is
in a very cold (and noisy) server room. This same error might have been
already there for months without anyone ever noticing.

Well, that's fine. If you can ignore the alarm and are sure the array
is cool enough, then don't worry about it.
The sound I described above when the unit locks, instead, is different:
it's an intermittent "beep - beep... beep - beep... beep - beep...", and
it CANNOT be silenced from the LCD panel menu.

Without hearing it myself, I'd guess at a dead drive. This sort of noise
coming from a drive is it losing its servo and trying to reacquire it;
what you're hearing is the head skipping over the platter surface at
high speed.

Be warned: if all the drives are the same make and model and were fitted
at the same time, the others may not be far behind.

Use the mark 1 earhole to locate the dying drive and replace it.
 
Marco De Vitis said:
On 23-09-2009 11:55, Fred Bloggs wrote:


Are you *really* sure that RAIDman can do this?

Yes. Like I said, we actually have one of those arrays and I am the one
responsible for installing and maintaining it. You may have to put
RAIDman into configuration or maintenance mode; I can't remember. If
you don't do that, it'll only allow you to view the config and not
change it.
The manual doesn't tell anything about it, it only talks about the same
simple operations which can also be done using the LCD display.

You've got the wrong manual then.
Again, are you sure RAID management operations can be done this way?

Absolutely 100% certain. Why don't you try it for yourself?
 
On 24-09-2009 13:30, Fred Bloggs wrote:
you've got the wrong manual then.

It's the only one I could find on the web... this one:
http://snipr.com/s3ok1 [ftp_veracomp_pl]
Absolutely 100% certain. Why don't you try it for yourself?

I'll do it now that I have your confirmations, thanks ;).
I'm not lazy, it's just that arranging for this test in this particular
case requires some time (for reasons I'll not detail here) and I'm
currently on a very tight schedule for other jobs.
 
On 24-09-2009 13:28, Fred Bloggs wrote:
Without hearing it myself, I'd guess at a dead drive (this sort of noise
coming from a drive is it losing its servo and trying to reacquire it.
what you're hearing is the head skipping over the platter surface at
high speed.)

Uhm... seems strange to me, it really sounded like an electronically
generated sound... but who knows.
 