Tuesday, June 12, 2007

ATSC HD – The end of the story

Boosting D 805 performance by 25+% yielded about a 15% increase in speed when transcoding SD tape captures to 640x480@29.97 AVC at around 750Kbps. The following table summarizes different encodes on different computers. Each cell is first-pass/second-pass fps (single figures are one-pass mencoder runs), and italics mark expected times.
Columns (machines): D 805 (2x3.4GHz), T2300 (2x1.6GHz), D 805 (2x2.6GHz), M 725 (1.6GHz), M 360 (1.6GHz), Dual Xeon (4x3.4GHz). Row values are listed in column order, with blank cells omitted.
  • VTR (640x480@25) 600kbps: 35+/21, 39?/19.5, 33/18, 18/9
  • VTR (640x480@29.97) 750kbps: 35+/21, 39?/19.5, 33/18, 17/9, 45/30.5
  • mencoder 25fps 750kbps: 21.7 (26)
  • mencoder 29.97fps 750kbps: 28.48
  • DVD (interleave) 600kbps: 40/20, 35/18, 18/9, 17/9
  • 480p SD 500kbps: 23/15
  • 720p SD 800kbps: 17/10, 19/11, 13/8, 13.5/13
  • 720p SD 1Mbps: 10/7, 19/11, 8/5.5
  • mencoder 720p SD 1Mbps: 9, 7
  • 720p HD 1Mbps: 11.5/7, 11.0/7, 8.5/5.8, 20/14
  • mencoder 720p HD 1.2Mbps: 7.4 of 25 (9 of 30), 4.5
  • 1080p HD 1.8Mbps: 7.3/4, 7.2/4, 6/???
  • mencoder 1080p HD 2Mbps: 4 of 25 (5 of 29.97), 4.5
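As a rough aid for reading the table, here is a minimal Python sketch that turns a cell into real-time multiples (xRT). It assumes each cell is first-pass/second-pass fps for a 25 fps output, which matches how figures like 39/19.5 are used elsewhere on this page; `parse_cell` and `pass_xrt` are hypothetical helpers, not part of any tool.

```python
def parse_cell(cell):
    """Split a cell such as '39?/19.5' or '35+/21' into (first_fps, second_fps).

    The '?' and '+' annotations from the table are dropped.
    """
    first, second = cell.split("/")
    return float(first.strip("?+")), float(second.strip("?+"))


def pass_xrt(first_fps, second_fps, target_fps=25.0):
    """Return (first-pass xRT, second-pass xRT, total 2-pass xRT).

    xRT = target_fps / encode_fps: 1.0 is real time, 2.0 takes twice as long
    as the clip itself.
    """
    first = target_fps / first_fps
    second = target_fps / second_fps
    return first, second, first + second


# Cells from the VTR (640x480@25) 600kbps row
cells = {"T2300": "39?/19.5", "D 805 stock": "33/18", "D 805 @3.4GHz": "35+/21"}
for machine, cell in cells.items():
    first, second, total = pass_xrt(*parse_cell(cell))
    print(f"{machine}: {first:.2f} + {second:.2f} = {total:.2f}xRT")
```

For the T2300 cell this gives about 0.64xRT + 1.28xRT, or roughly 1.92xRT for a full 2-pass encode.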
It pretty much boils down to the following:
On performance
  1. 480p SD is real-time on the current generation of processors (Core Duo, Core 2 Duo) regardless of where it comes from. By real-time I mean either a one-pass encode or the 2nd pass. Two-pass encodes would go faster than 2xRT on faster processors, but the second pass would still be around RT. A one-pass encode, although slower because it needs more bandwidth (750 kbps vs. 600 kbps on good sources, or 800+ vs. 750 on noisy sources), would still be faster than 2-pass with filtering and would produce the same result. 480p SD is real-time not because the processors got that much faster, but because there are now two of the same processors as before, i.e. for the most part the improvement in processor efficiency got eaten up by scheduling between 2 threads.
  2. 720p SD is 2+xRT at 800kbps, but it needs 1000kbps on one pass, so it is at least 2.5xRT. Since the first pass takes the same 2xRT, a two-pass encode would take 4.5xRT, so it makes no sense to do a 2-pass 720p SD encode on the current generation of processors. It is questionable whether it makes sense to encode 720p SD, period: 480p works fine for SD and can be done twice as fast.
  3. 720p HD is 3.5-5xRT. The 2nd pass could be done in 3.5xRT at around 1Mbps. One pass would need 1.25Mbps and thus could go as slow as 5xRT. A Quad Core 2 is supposedly more than twice as fast (2 times for 4 cores, “more” for Core 2), so 720p HD would take 2-3xRT, i.e. one pass should clock 10+ fps. That is still more than twice real time: even 4 cores cranking as hard as they can would only get close to what a Pentium M 360 (as in the Toshiba) has been doing for DVD since 2005.
  4. 1080p HD is 5+xRT and is not happening until a $350 computer has 10 Dell D600s inside. The D 805 and Yonah sometimes have difficulty playing 1080p, since WMP 11 cannot thread across cores.
So, as far as performance is concerned for ATSC, until every show on the air is in HD, the answer is 640x480@25fps single pass at around 700kbps: no need for a widescreen TV, the Xbox should play it, and the D 805 should do it in 1.33xRT, i.e. 22.5 fps for 29.97 stations, 45 fps for 59.94 stations, 18 fps/25 fps for x264. Movies are still best taken from DVD and, if need be, can be upconverted to 720p.
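The 1.33xRT target translates into a required processing rate by dividing the source frame rate by the real-time multiple. A tiny sketch of that arithmetic (`required_fps` is a made-up name for illustration):

```python
def required_fps(source_fps, xrt):
    """Source frames per second the encoder must process so that the
    encode takes `xrt` times the running time of the clip."""
    return source_fps / xrt


for source_fps in (29.97, 59.94, 25.0):
    print(f"{source_fps} fps source at 1.33xRT: {required_fps(source_fps, 1.33):.1f} fps")
```

This reproduces the 22.5 fps and 45 fps figures above for 29.97 and 59.94 stations.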
On encoding
The MainConcept MPEG-2 decoder (ad2mcdsmpeg.ax), which comes with pretty much every piece of software these days, is the best decoder because it can deinterlace and output YV12.
  • 480p from clean sources (DVD, ATSC) can be encoded at around 600kbps at 25 fps in two passes at 2xRT, or at 700kbps in a single pass at around RT (1.33xRT for ATSC). Below 600kbps is pushing it, but below 480 lines of height it is possible (as I did with China at 500kbps at 360 and, better yet, 368 height).
  • 480p from bad sources (tapes) is better left at the original 29.97 frame rate and encoded at 750kbps in 2 passes at less than 2xRT. Deinterlacing should be done by the MainConcept decoder, so the only filters are DeGrainMedian(limitY=5,limitUV=5,mode=3) and Undot. Nothing can be done about bad sources: junk in, junk out.
  • 720p SD is not worth it, but since I played with it… Two-pass encodes need 700-800kbps at 25 fps. One pass needs 800+ (really around 1Mbps). It goes at around 2.75xRT. It doesn't work on the Xbox.
  • 720p HD – needs 1.25Mbps for 2-pass at 25 fps. For single pass, 1.25Mbps at 25 fps could be OK, but 1Mbps is definitely not enough.
  • 1080p HD – would need 1.75-2Mbps but I couldn’t really tell since I cannot even play it.
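One way to compare these bitrates with each other is bits per pixel per frame, a common rule of thumb that is my addition here, not something measured above. The sketch assumes 25 fps for every setting, and `bits_per_pixel` is a hypothetical helper:

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average number of bits available for each pixel of each frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# (label, kbps, width, height, fps); 25 fps is assumed throughout
settings = [
    ("480p, 600kbps", 600, 640, 480, 25),
    ("720p HD, 1.25Mbps", 1250, 1280, 720, 25),
    ("1080p HD, 2Mbps", 2000, 1920, 1080, 25),
]
for label, kbps, width, height, fps in settings:
    print(f"{label}: {bits_per_pixel(kbps, width, height, fps):.3f} bits/pixel")
```

480p at 600kbps gets about 0.078 bits/pixel, 720p at 1.25Mbps about 0.054 and 1080p at 2Mbps about 0.039, so the HD settings are already leaner per pixel than the SD one.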
This is as far as the 32-bit architecture will go: no need for a 24” widescreen monitor, no need for HD-DVD. What is needed is 64-bit software that might speed things up 2 times. For now, the following is left to be tested on 32-bit:
  • SD show as one-pass at 480p on emachine at 700kbps for 25 fps
  • Tape as one-pass at 480p on emachine at 29.97 at 750kbps.
  • Upconvert widescreen DVD to 720p and 2-pass at 1Mbps. Time and log PSNR and SSIM
  • 1080p on Lenovo (and upclocked emachine) – would it play? Cannot play on old emachine.
  • DVRMSToolbox to transcode automatically
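For the PSNR part of the upconvert test, this is the standard definition, PSNR = 10·log10(MAX²/MSE), as a minimal Python sketch over two flattened 8-bit planes. It illustrates the formula only; SSIM is more involved and is left out, and `psnr` is a hypothetical helper.

```python
import math


def psnr(reference, encoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized 8-bit
    planes given as flat sequences. Identical planes give infinity."""
    if not reference or len(reference) != len(encoded):
        raise ValueError("planes must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


ref = [16, 128, 235, 64]
enc = [17, 127, 234, 65]  # every sample off by one, so MSE = 1
print(f"{psnr(ref, enc):.2f} dB")
```

An error of one code value on every sample already caps PSNR at about 48 dB, which gives a feel for the scale of the numbers the encoder logs.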
Oh, and BTW, 32-bit Vista has started to rate computers (the Windows Experience Index). This is what I got:
Computer | CPU | RAM | Aero | GPU | Disk
Lenovo | 4.6 | 4.5 | 3.3 | 3.0 | 4.1
eMachine | 4.6 | 3.9 | 2.2 | 3.0 | 5.0
D805+GMA950@3.3GHz | 4.9 | 4.5 | 3.0 | 3.0 | 5.2
D600 | 3.4 | 4.0 | 1.9 | 1.0 | 3.7
P.S. Did a bit more testing... more research, actually... and I cannot compile mencoder on OSX, and they stopped pre-building it in 2006, so on OSX there is no support for either dvr-ms or recent x264 options... Pretty much a dead end for testing the 2.33GHz Core 2 Duo I've got in the MacBook. Parallels loads the CPU to only about 70%, and at that it transcodes 720p SD 1Mbps at 9 of 30, or 7.5 fps, the same as the overclocked D 805. So theoretically the 2.33GHz Core 2 Duo (4MB L2 on 667MHz FSB) could be 30% faster, but practically there is no software for OSX, and even 30% faster would still be worse than 2xRT single pass for 720p SD and 720p HD, and 1080p HD would still be around 5-6xRT. So, as expected, the Core 2 Duo will not make any difference as far as HD goes. Would 64-bit make any difference?

Since Parallels on the MacBook will not fully load the CPU, the only way to find out is to install either Vista x64 or Ubuntu on the D 805. There is no overclocking software for Linux, so in the end it would be Vista x64, for which I would have to build mencoder myself, and that would take too much time. So until somebody builds a 64-bit version of mencoder I am done playing with HD, and yeah, there is no way I am buying overpriced Apple hardware, 64-bit Leopard or not. Only a Quad would make me happy, and that means an LGA 775 socket.

P.P.S. Automating DVRMSToolbox would mean writing a custom action in C# to call different mencoder profiles (different crop options), which is not a big deal but would take time... Since most of the programming is crap, it ain't worth my time either. I'll get back to this only once there is a 64-bit mencoder for Windows, and do it by writing a shell script to run overnight through Scheduled Tasks. There are 64-bit versions of x264, BTW, to test those DVD upconverts.

Overclocking Pentium D 805, or how I became an "evil hacker"

HD takes a copious amount of time to transcode. So my first urge was to get a new, more powerful computer, but mine are less than a year old. So I started looking at what is coming and realized that Yonah (Core Duo) had the shortest product lifecycle in Intel history: less than a year. Another thing I realized is that the crappy eMachines box I got for Christmas for $350 is actually a pretty cool box, for $350 that is. There are two things about it:
  • Pentium D 805 processor (2x2.6GHz, 2x1MB L2 on 533MHz FSB). I went for it because of the two cores, but I never realized that it is really two Pentium 4 Prescott processors glued together, and that they are highly overclockable. With its 20x multiplier, simply sticking the D 805 into an 800MHz motherboard would raise the clock from 2.6GHz to 4GHz on both Pentium 4s inside (yeah, it would need cooling to run that fast). Did I mention EM64T? On top of it all, it is a 64-bit processor.

    I wanted to get the D 830 instead (2x3.0GHz, 2x1MB L2 on 800MHz FSB), which has only a 15x multiplier but would have come with an 800MHz FSB motherboard. They wanted $400 for it and were sold out, so it didn't work out, and it looks like that was for the better.

  • Intel D1102GGC2 motherboard with an ICS 951417 PLL that ClockGen and the like almost understand (you should pick an ICS 9541xx such as ICS 954119 to set the ICS registers). With built-in ATI graphics (and thus an ATI chipset, and ATI was bought by AMD), I was expecting the motherboard to come from some cheap sweatshop somewhere in China, especially considering it was inside a $350 computer, but it did come from Intel. By no means is it the best motherboard around, but Intel has support (new BIOSes and such, like a toolkit to “cook” your own BIOS), and this motherboard is well capable of running at 800MHz.
What this all means is that a $350 junk computer is well capable of beating anything that was available last Christmas for over $1000, including the Pentium D 960 ($549, 3.6GHz, 2x2MB L2 on 800MHz FSB) and the T2700 Yonah ($637, 2.3GHz, 2MB L2 on 667MHz FSB), and would be on par with the best Core 2 Duos on 667/800MHz FSB, like the T7700 ($530, 2.4GHz, 4MB L2 on 800MHz FSB, Socket P) released in May 2007 or the T7600 ($637, 2.3GHz, 4MB L2 on 667MHz FSB, Socket M). LGA 775 Core 2 Duos, all running on a 1066MHz FSB, would probably be a bit faster. On an 800MHz FSB, the only Core 2 Duo available, released in May 2007, is the $133 E4400 (2.0GHz, 2MB L2 on 800MHz FSB), which the D 805 would beat without any problem when running faster than 2.8GHz, despite the 40% Core 2 Duo improvement over the Pentium D.

What this all also means is that it should be relatively easy to overclock that $350 junk computer. And it was indeed. Popping the hood open (yeah, it did involve a head-mounted flashlight), I learned that the PLL is an ICS 951417. Talking to clueless eMachines support and googling, I found out that the box has an Intel D1102GGC2 motherboard, and while hanging up the phone I was already downloading the genuine Intel BIOS that let me set the RAM speed to 667MHz out of the box. Changing the FSB speed would require “cooking” a new BIOS using the Intel toolkit, but I just downloaded ClockGen and, picking ICS 954119 more or less at random (instead of the ICS 951417 that I have and that they didn't list), set the FSB speed to 667MHz, and the computer didn't blink. The D 805 felt very relieved to run at 3350MHz with all clocks matching up at 667MHz. A 25% increase in performance came in no time.

We are now running at the top of the Core Duo range and at the bottom of Core 2 Duo. Let's go faster. An 800MHz FSB (4GHz processor clock) crashed the computer right away. After about an hour of playing with memory speeds etc., I learned that anything above 170MHz (680MHz FSB) makes the system unstable, so I left it at 2x3.4GHz, 680MHz FSB and 667MHz memory, about what I would get from most Core 2 Duo systems apart from the $533 E6700 (2x2.67GHz) and the Core 2 Quad Q6600 (4x2.4GHz), which is going "on sale" at the end of July 2007 at $266 vs. $530 currently (reduced in April 2007 from $851).
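The clock numbers in this post follow from two facts: Intel quotes FSB speeds quad-pumped (533/667/800MHz correspond to roughly 133/167/200MHz base clocks), and the core clock is multiplier × base clock. A minimal sketch, with the 20x (D 805) and 15x (D 830) multipliers taken from the post and `core_clock_mhz` as a hypothetical helper:

```python
def core_clock_mhz(multiplier, fsb_mhz):
    """Core clock for a quad-pumped FSB rating: multiplier x (FSB / 4)."""
    return multiplier * fsb_mhz / 4


D805_MULTIPLIER = 20  # from the post
D830_MULTIPLIER = 15  # from the post

for fsb in (533.33, 666.67, 680, 800):
    print(f"D 805 @ {fsb:.0f}MHz FSB: {core_clock_mhz(D805_MULTIPLIER, fsb):.0f}MHz")
print(f"D 830 @ 800MHz FSB: {core_clock_mhz(D830_MULTIPLIER, 800):.0f}MHz")
```

The 667MHz setting gives about 3333MHz, 1.25 times the stock 2667MHz, which matches the 25% gain; 680MHz gives the 3400MHz the box ended up at, and 800MHz gives the 4000MHz that crashed it.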

That's how I became an “evil hacker”. HD or not, I doubt I will be buying another desktop in the next 5 years, because there really is no point, unless it is a Q6600 on a 1066MHz FSB. $350 sure can buy a lot of fun.

Wednesday, June 06, 2007

HD or no HD

Going from interlaced SD sources (DVD or MPEG2 tapes) to interlaced HD sources (ATSC) is going from 39/19.5 (Yonah) or 33/18 (D 805) to 12/9 fps.

Yonah encodes SD at 1.25xRT (2nd pass or single pass), and an overall 2-pass transcode of an SD source on Yonah is 0.6xRT + 1.25xRT ≈ 2xRT. The D 805 is marginally slower (more so on the 1st pass, due to the slow FSB?). Cropping and scaling HD sources to 640x480 makes the 1st pass 2xRT and the 2nd pass 2.77xRT, thus 4.77xRT overall, or 2.2 times slower on the 2nd (or single) pass and 2.4 times slower overall on 2-pass encodes. There is just too much data in 1080i@30 and 720p@60, regardless of the fact that in the end I encode to 640x480.

Overclocking the D 805 should increase performance by a factor of about 1.54 (4GHz vs. 2.6GHz), so on my 25fps converts of SD sources the 2nd pass should reach 27.72 fps (18 x 1.54), faster than RT, and about RT without changing the frame rate. Similar performance should be expected from a Core 2 Duo. This wouldn't help ATSC transcodes scaled to 640x480 on either: the 2nd pass would be about 13.5 fps, or ~2xRT, and an overall 2-pass encode would take ~3.18xRT. A Core 2 Duo would make a 1-pass encode of ATSC-wrapped SD scaled to 640x480 about as fast as a 2-pass encode of SD sources. Only a Quad Core 2 would get ATSC-wrapped SD at 640x480 into the same ballpark as Yonah with SD sources.
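These projections assume fps scales linearly with clock speed. A small sketch of that arithmetic (`project` is an illustrative name); the June 12 measurements suggest a 25+% overclock bought closer to 15%, so treat the output as optimistic:

```python
def project(first_fps, second_fps, speedup, target_fps=25.0):
    """Scale both pass rates by `speedup` (linear-scaling assumption) and
    return (first_fps, second_fps, total 2-pass xRT)."""
    first = first_fps * speedup
    second = second_fps * speedup
    total_xrt = target_fps / first + target_fps / second
    return first, second, total_xrt


speedup = 4.0 / 2.6  # planned 4GHz overclock of a 2.6GHz D 805, about 1.54
for label, (first, second) in {"SD sources": (33, 18), "ATSC SD": (12, 9)}.items():
    pf, ps, total = project(first, second, speedup)
    print(f"{label}: {pf:.1f}/{ps:.1f} fps, 2-pass {total:.2f}xRT")
```

For ATSC SD this lands at about 18.5/13.8 fps and 3.16xRT, close to the ~3.18xRT estimate above.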

The above is based on 12/9 fps for ATSC SD which I need to verify once again.

ATSC-wrapped HD would be even worse. Scaling to 640x368 would get us into the same 12/9 fps (4.77xRT) ballpark. Going to true 720p HD (1280x720) would double the time, down to 4.5 fps on the 2nd pass, so even 1-pass encodes would be over 5xRT. A Quad Core 2 would probably bring a 1-pass 720p encode down to 2xRT, and a well-built system should do 2-pass in 3xRT.
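My assumption for why a 368-line height is used rather than 360: H.264 codes video in 16x16 macroblocks, so a height that is a multiple of 16 avoids padding. A sketch that scales a square-pixel source to a target width, keeps the aspect ratio and rounds the height up to the next multiple of 16 (`mod_height` is a hypothetical helper):

```python
import math


def mod_height(src_w, src_h, dst_w, mod=16):
    """Height for a dst_w-wide frame with the source's (square-pixel) aspect
    ratio, rounded up to the next multiple of `mod`."""
    exact = dst_w * src_h / src_w
    return math.ceil(exact / mod) * mod


for src_w, src_h in [(1280, 720), (1920, 1080), (640, 480)]:
    exact = 640 * src_h / src_w
    print(f"{src_w}x{src_h} -> 640x{exact:g} -> 640x{mod_height(src_w, src_h, 640)}")
```

Both HD formats come out at an exact 360 lines and round up to 368, while 480 is already 30 macroblocks tall. Rounding up instead of down (352) is a choice that trades a slight stretch for not losing picture.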

1080p would take even longer. How much longer would be interesting to find out. Thus the tests are:

  1. 2-pass 640x480 ATSC on a) the D 805 to prove 12/9 fps; b) the Lenovo to see if it is better. Those tests should take 4.77xRT, thus about 1.5 hrs per show. Go for 500kbps with those.
  2. 2-pass 1000kbps 720p HD test to see how slow the 1st pass is. Looking at 9/5 fps, or ~8xRT, or 6 hours for a 45-minute show.
  3. Overclock the D 805 to see how far it would be from 13.5 fps on the 2nd pass/1-pass transcodes of ATSC-wrapped SD. This test could be done either with mencoder (1-pass), adjusting for frame rates, which would take 2xRT or 40 minutes; or with x264 to test whether 1-pass gets better, which would take 3xRT or 1 hour.
  4. Vista x64 test.
  5. If I don't have one already, do a 1080p HD 2-pass transcode.
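To budget these test runs, wall-clock time is show length × total xRT, where each pass contributes output fps / encode fps. A sketch with a hypothetical `wall_clock_hours` helper, assuming 25 fps output:

```python
def wall_clock_hours(show_minutes, pass_fps, target_fps=25.0):
    """Hours needed to encode a show, given the encode fps of each pass."""
    xrt = sum(target_fps / fps for fps in pass_fps)
    return show_minutes * xrt / 60


print(f"Test 2, 45-minute show at 9/5 fps: {wall_clock_hours(45, [9, 5]):.1f} hours")
```

For test 2 this works out to about 7.8xRT, or roughly 5.8 hours, consistent with the ~6 hours estimated above.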
So I need 3 SD shows and 2 HD shows, and the fastest way is to cut the commercials out of the dvr-ms files and use the cut dvr-ms. The Lenovo would need about 3-5GB to do the test. For the sake of argument, put the T2300 (478) into the D600, see if it POSTs, and start cooking that BIOS.

Overall, 720p HD at 5-8xRT would need yet another generation of processors and would definitely need a Quad to transcode 45-minute shows in 1-2 hours vs. the 6 hours it takes today.
