Rendering for Titanic was done on 105 Linux and 55 NT machines, plus 350 SGIs
> By the way, about clusters. Have you read what the effects for the film
> "Titanic" were made on? On 120 DEC Alphas running Linux RedHat 4.2.
> But another article interviews the chief engineer of the company that
> produced the effects for "Titanic", and he says that half of the machines
> ran Alpha NT, that these servers (unlike the Linux ones) never crashed and
> never had a single failure, and that without NT they would not have
> launched Titanic on its voyage.
>
http://www.ssc.com/lj/issue46/2494.html
In detail, with photos.
> What the professionals say:
In brief: "The chief visual effects engineer is not a computer person
but an artist, and is not aware of the details.
Initially 80 machines were set up under Linux and 80 under NT, but
since the NT machines were used far less in the project, 40 NT
machines were converted to Linux. Some Linux boxes crashed, partly
from overheating but mainly because crashed NFS servers on NT
hung the clients that had mounted them. The NT boxes crashed too, but
less often, because under lighter use they did not overheat; also,
NT keeps no statistics, so NT crashes and reboots were not
recorded in the logs.
And all the artists sat at Silicon Graphics workstations. 350 SGI
machines were used in the project."
cut here
---------------------------------------------------------------
From: nospanmichael@nopsamgreenes.com (mgSimplify Author)
Subject: Re: WINDOWS NT HELPS SINK THE TITANIC
Date: Sat, 07 Feb 1998 19:24:40 GMT
Message-ID: <6bicc6$85$1@usenet85.supernews.com>
Here we go again: I have attached Daryll Strauss' response to Boucher's original
comments below. My interest is more than cursory - I'm building a commercial
product based on Linux instead of NT and Boucher's comments gave me pause. It
wasn't until I read Strauss' response that I had a better feel for what was going
on.
Bill Parker wrote:
>This is from Windows NT Magazine and is interesting:
>* WINDOWS NT HELPS SINK THE TITANIC
>(contributed by Sarah Hogan, sarah@winntmag.com)
>You might have heard that Digital Domain used Linux to create the
>high-tech visual effects for the hit movie Titanic. What you might not
>have heard is that Windows NT also had a big part in the film.
>In his article "Linux Helps Bring Titanic to Life," Daryll Strauss
>writes that Digital Domain chose RedHat 4.2 on 120 Linux Alpha computers
>because "the flexibility of the existing devices and available source
>code gave Linux a definitive advantage" over NT and Digital UNIX. (You
>can find this article at http://www.linuxjournal.com/issue46/2494.html.)
>When NT fans queried Grant Boucher, head engineer at Digital Domain,
>about this decision, Boucher explained that, "Half of the 160 Alpha
>render farm was Windows NT 4.0...I am sorry to disappoint all the Linux
>fans out there, but in a production environment, Linux was found to be
>seriously wanting when compared [with] NT. NT was the only operating
>system during Titanic that did not crash the servers at
>all...ever...Titanic would not have delivered without [NT]."
Daryll Strauss wrote in response to a letter I wrote him asking about the
disparity between his article and Grant Boucher's comments:
[I left the header from Grant's message included below so that you can
trace the original conversation. He posted this to the alpha-nt mailing
list January 6th. It seems that people are forwarding his message again
while neglecting to include my followup to this discussion. I've included
that response below. This was my final posting on this thread. There was
another posting from Grant after mine, but I felt it had degenerated to
a level that further responses would not be useful. You are welcome to
look it up yourself if you are so inclined. - |Daryll]
-----Forwarded message from Daryll Strauss -----
Message-ID: <19980107185209.60060@jolt>
Date: Wed, 7 Jan 1998 18:52:09 -0800
From: Daryll Strauss
To: alphant@listserv.mke.ra.rockwell.com
Subject: Digital Domains use of Linux on Titanic
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.85
Organization: Digital Domain
I felt like I needed to address some of the comments Grant has made
about our Linux Alpha cluster. I'm trying to avoid this becoming a flame
war and instead just concentrate on the facts of the case.
- |Daryll
From: Grant Boucher
Sent: Tuesday, January 06, 1998 5:57 AM
Subject: Re: ALPHANT Digest V1 #431
GB> uh, as the person who recommended, supervised, and implemented DEC Alpha
GB> at Digital Domain, I would like to clear up a few matters....
Grant was a digital artist at Digital Domain. The official decisions
about the purchase of the systems were made by our director of
technology. I did the installation of the cluster and implemented the
Linux portion of the cluster.
GB> first, half of the 160 Alpha render farm was Windows NT 4.0. Only half
GB> was linux.
Half the machines were Linux originally, until they (the Titanic crew)
found that the NT boxes really weren't as useful. The 105 machines I
quoted in my article was the configuration roughly one third of the way
into the project. 40 machines were converted from NT to Linux.
GB> Unlike the Linux machines, the NT machines and the Digital Unix servers
GB> NEVER crashed, routed IP packets automatically (just hit the check box
GB> under Network config for NT) and basically ran rings around the Linux
GB> machines for ease of use, installation, and reliability. It took
GB> days of kernel recompiles just to get the linux boxes to even
GB> barely work and they NEVER properly routed packets (an NT machine was
GB> configured in 15 minutes when they finally gave up on Linux).
First, the NT boxes did crash. The systems administrator for the NT boxes,
I'm sure, would attest to that. Unfortunately, they don't report their uptime,
and were silently rebooted. So, there really isn't a measure of how
reliable the NT boxes were. I do think they remained up more than the
Linux boxes, for reasons I explain later.
Second, we run a slightly unusual network. I did have trouble with the
FDDI card under Linux. We opted not to use it because of the problems,
but also because we could spare the NT boxes (they weren't being heavily
used) and it was a solution that minimized downtime. We were very busy
and it was the expedient solution. The other problem is that the NT box
did route packets, but not very quickly. The overall performance was not
very good for the speed of the link.
Third, I did describe in my article the troubles we had with that
version of the Linux kernel. They weren't minor, but we did manage to
resolve them relatively quickly. As I mentioned, I believe most of them
would not be true for current users.
GB> The Linux farm was unreliable and problematic for weeks when compared
GB> with the NT farm, and this was the SAME hardware, network etc. I am
GB> sorry to disappoint all the Linux fans out there, but in a production
GB> environment, Linux was found to be seriously wanting when compared to
GB> NT. NT was the ONLY operating system during Titanic that did not crash
GB> the servers at all...EVER. Irix on the SGIs and Linux on the Alphas
GB> both crashed DAILY...sometimes more than a few times a day.
I'm not sure where Grant got his numbers about downtime. Perhaps he is
extrapolating from the initial setup. Once the machines were up and
configured they worked very reliably. The machines are still in heavy
use and have an average uptime of around 60 days.
The most common cause for crashes was environmental
conditions. Unfortunately, we under-equipped the air conditioning in the
room, and the outside air temperature approached 110 degrees in some
places. A few of the processors that were being used in that area died
(quite understandably). In one of those places a couple of the Linux
boxes died, while the NT boxes in those areas stayed alive. That was because
the Linux boxes were being heavily used while the NT boxes sat idle.
The other crash that was more serious for Linux was caused by bugs in
the NFS implementation. When a Linux box was being actively used and the
SGI server went down, the NFS implementation on Linux would
hang. This was a serious problem for us that sometimes required
resetting the machines. This was also a fairly infrequent occurrence. I'd
estimate once every couple of weeks. Again, I believe current versions
would not have these problems.
GB> since these were simple Command line renderers, with simple parameters
GB> passed to them, your comment makes no sense whatsoever...again, ONLY the
GB> linux and irix boxes crashed during the production of Titanic...the NT
GB> boxes were the most reliable on the production...period.
The problem with the NT boxes is that they never got a reasonable NFS
implementation. The NFS on the NT Alphas was extremely slow. The lack
of support for symbolic links made using our disk space effectively very
difficult. The limitation of 26 mounted drives was insufficient. We
avoided this problem in the most expedient way possible. We dedicated NT
file servers and moved all the NT data to those file servers, that way
they didn't have to interconnect with the rest of the NFS
environment. They could remain their own isolated NT solution.
> The openness of especially Linux makes everybody can see
> what could be made better, everybody can help with the
> debugging of applications.
GB> huh? you are really reaching here...Linux is a shareware OS and the
GB> decision to risk the biggest film of all time on it was a terrible mistake
GB> in my opinion.
Linux is, of course, a freely available operating system. Having source
allowed us to fix problems we encountered, which we could not have done
with a standard commercial OS. Of course, we would hope we don't have
problems to fix, but frankly that never happens. There are bugs in every
OS, and our environment stresses the operating systems.
GB> big mistake...Windows NT is a totally different animal than Windows 95
GB> and Titanic would not have delivered without it. I suggest you take a
GB> closer look at it. Linux is a shareware version of an antiquated OS
GB> from the 1970s...nothing more, nothing less. :}
Well, this is obvious bait, so I won't address much. I agree Windows 95 and
Windows NT are entirely different animals. Linux is a very modern
operating system and many of the technologies are very current in
operating systems.
GB> LightWave was the ONLY software running on the NT farm and NT
GB> workstations. The choice of linux for the other farm was merely a
GB> convenience for two programmers (the ones who wrote the article), who
GB> could have easily ported command-line code to NT as well as Linux.
GB> This, and other similar decisions, cost the facility (and actually Fox)
GB> a fortune in time and lost productivity as every time the linux machines
GB> bombed out, dozens of compositors were left in the lurch (every one of
GB> them being paid very high rates per hour mind you). The only problem
GB> exhibited by the LightWave/NT machines came from the render control
GB> software, which we just replaced when it became clear that the control
GB> software was "found wanting". This problem was not the least bit OS
GB> related.
LightWave was used on the NT systems.
The choice of Linux was made for a number of reasons. The primary one
was integration into the rest of our facility. The ease of porting our
applications did come into play. Our distributed rendering system and
compositing system were much easier to get running under Linux than
NT. Since then we have ported those applications, as it makes the NT
systems more productive.
Not having an effective means of distributed rendering on the NT boxes
was a serious problem. That was not the case for the Linux boxes.
GB> In fact, one of my favorite Linux moments was one of the authors of the
GB> article asked the NT sysadmin "how many OS related crashes do you get a
GB> day?" The answer was, of course, "None" because neither of us would
GB> have recommended NT machines on a production like Titanic if they
GB> weren't 100% reliable. Perhaps he was trying to see if the hardware was
GB> to blame. The author, puzzled, decided not to tell us how many times
GB> per day the Linux OS was crashing. :}
As I said before, we definitely had environmental problems in the
room. The question I asked was related to that fact. My choice of
operating system was not stopping systems to the point that they
wouldn't boot the ARC console. Other shutdowns were also diagnosed by
our vendor as heat problems. The fact that NT never failed this way
indicates it wasn't being used as heavily.
GB> I am sure that someone with enough experience and technical knowledge
GB> could have configured the Linux farm to work as flawlessly as the DEC
GB> Unix and Windows NT machines, but the simple fact is that the NT
GB> machines practically configured themselves, ran flawlessly from the
GB> minute they were powered on, and still are. And ANYBODY could have set
GB> them up...all of the production NT machines were administered by one
GB> person...and he had plenty of free time on his hands to get really good
GB> at Bust-A-Move on the N64. Now THAT'S reliability!
I agree that Alpha Linux is still too hard to use in general. The Intel
version is much simpler and new OS releases make the installation
process even easier. In our case, my engineering time was cost effective
compared to buying ANY OS on those machines.
We have no way to measure the reliability of the NT stations as they
don't record their uptime. They also don't record their usage, which
made our billing process very difficult. The Linux users could
easily identify, report, and avoid problems, which allowed their
complaints to be handled quickly and efficiently.
By the way, I was the only person to support and manage the Linux
machines. I had enough time to continue my normal job of writing
software for the rendering of Titanic while doing it.
GB> In fact, the only problems we ever had with the NT machines was the fact
GB> that we had to cripple their networking because the D2 SGIs were over a
GB> year and a half out of date with Irix OS revisions, meaning we had to
GB> run NFS2 instead of NFS3 for everything...YEESH!
I already listed the numerous performance and stability problems we
encountered using NFS on the NT boxes. At the time our IRIX was not the
latest, but it was interoperability problems between NT NFS and IRIX NFS
that caused the problem. There was no proof that the IRIX OS was the
cause of that interoperability problem.
GB> Sidebar -> Now, it seems as though installing a Samba client on the SGI
GB> is the smartest, fastest, cheapest (free!) way to get Irix to NT
GB> connectivity. I know this post comes off as rather harsh, but you have
GB> no idea how much of a cluster-f**k the whole Linux thing turned out to
GB> be. This is one of THE principal reasons our new FX facility is
GB> entirely NT...period. Hope this clears things up...my intention is not
GB> to start a flame war, but merely to make sure the TRUTH gets out.
Digital Domain and its technical staff were quite pleased with the
performance of Linux. The work on this show would have cost
substantially more if we had not been able to use it effectively.
Samba works fairly well. Again it allows NT to remain isolated and not
interoperate with the rest of the facility. I'm not convinced this is
the best solution, but it does avoid the problem of not having an
adequate NFS implementation.
If you ask current employees and the management at Digital Domain I
believe they will all tell you that Linux was a success. Unlike Grant's
response, my article was read and reviewed by the management at Digital
Domain.
GB> Peace.
GB> Grant Boucher
GB> Formerly, Digital FX Supervisor, Head of the Windows NT Division, and
GB> Digital Titanic
GB> Technical Supervisor, Digital Domain
GB> Presently, CEO of station X studios, LLC.
Digital Domain has never had a "Windows NT Division", nor would we want
one. Basing a division or a company on the choice of operating system their
computers run would be foolish. We use whatever tools make the most
sense for the task at hand. We continue to use NT and to port
applications to NT. My personal opinion is that NT will be more
important at our facility over time.
Opinions will always differ between two individuals, even those who
witness the same events. One has to look at the credibility and biases
of the person making the claim as well as their relationship to the
facts presented. I tried to provide fair and even coverage of our
experiences. I believe I'm in a reasonable position to address these
topics.
Daryll Strauss
Manager, Software Development
Digital Domain
PS. I'm not a usual reader of this list. I'm going to remain subscribed
for a while to partake in the current discussions. Also feel free to
mail me directly if you are so inclined.
Last-modified: Fri, 13 Feb 1998 09:30:38 GMT