Friday, November 09, 2007

Distributed encoder

The trimming formula for balanced distributed encoding comes from solving an equation like x+x+1.5x=1, or in general 2x+rx=1, so x=1/(2+r), where x is the share of frames given to each of the two slower boxes and r is how many times faster the third box is. If the MacBook is twice as fast, the multiple is x=0.25, giving the MacBook 0.5. Avisynth cannot do decimals, so the multiple needs to be expressed as a fraction. Using the multiple it is easy to estimate distributed encoder speed knowing the speed of the slowest box, i.e. lenovo/emachine in my current setup.
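A minimal Python sketch of the balanced split, assuming each box gets a share of frames proportional to its speed relative to the slowest box. The function name `balanced_shares` is made up for illustration; exact fractions are used because the shares have to end up as fractions anyway.

```python
from fractions import Fraction

def balanced_shares(relative_speeds):
    """Return each box's share of the frames, proportional to its speed.

    relative_speeds: speed of each box relative to the slowest one
    (slowest = 1). Pass ints or strings like "3/2" to keep fractions exact.
    """
    speeds = [Fraction(s) for s in relative_speeds]
    total = sum(speeds)
    return [s / total for s in speeds]

# Two equal boxes plus a MacBook assumed to be twice as fast: x + x + 2x = 1
shares = balanced_shares([1, 1, 2])
multiple = shares[0]          # share of the slowest box, x = 1/(2+r)
print(multiple, shares[2])    # 1/4 1/2
```

The slowest box's share is the "multiple" used in the estimates: distributed time is roughly the slowest box's time times that share.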

For 720p->480p SD on 23.976 material the overclocked D805 is doing ~12.5fps on single pass encodes from freenas (or local; lenovo over WiFi cannot load the CPU more than 80% and is doing ~10 fps), thus encodes are taking 23.976/12.5 ~ 2xT at the box, yielding for the distributed encoder 2xT*multiple = 2xT*0.25 = 0.5xT. (I run with a 0.28 multiple, so in my test a 21 minute show encoded in 12 minutes, or 0.57xT or 1.75xRT as expected.) For 1080i->720p HD on 29.97 material both Core Duos are doing ~5.9 fps from freenas even over WiFi (lenovo from emachine is 5.26fps and emachine from local is 5.67 fps), thus it is 29.97/5.9 ~ 5.08xT at the box and 5.08xT*0.25=1.27xT (1.42xT for the 0.28 multiple), so a 40 minute show should encode in 51 (57) min.
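The xT and xRT figures used throughout are just ratios of source frame rate to encode frame rate. A small sketch of that convention (the helper names `x_t` and `x_rt` are mine, not from any tool):

```python
def x_t(source_fps, encode_fps):
    """Encode time as a multiple of the running time (xT)."""
    return source_fps / encode_fps

def x_rt(source_fps, encode_fps):
    """Encode speed as a multiple of real time (xRT), the inverse of xT."""
    return encode_fps / source_fps

print(round(x_t(23.976, 12.5), 2))  # 1.92, i.e. the "~2xT" for SD
print(round(x_t(29.97, 5.9), 2))    # 5.08, i.e. the "~5xT" for HD
```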

Q6600 should be 3+ times faster than Yonah, thus 720p->480p SD on a Q6600 box should take 0.67xT (vs. 0.5-0.57 for the current Distributed Encoder) and be done in 13.5 minutes (vs. 10-12 min + 2 min repack = 14 min in total for the Distributed Encoder, or about the same). 1080i->720p HD should take about 1.7xT, run at ~17fps for a 29.97 source, and a 40 min show should be done in ~66 min (vs. ~51-57 min + 7.5 min for repack ~ 60 min for the Distributed Encoder).

Doing Distributed Encoder with Q6600, Duo and Duo2 would give the multiple x = 1/(3+1+1.5) = 1/5.5 = 0.18, thus with this Distributed Encoder 720p->480p SD should take 0.18*2xT=0.36xT, run at 66+ fps on 23.976 and be done in 7.5 minutes (vs. 10 min + repack for the old Distributed Encoder). 1080i->720p HD should take about 0.18*5xT=0.9xT (1.1xRT!!!), run at ~33fps for a 29.97 source, and a 40 min show should be done in about 36 min... If the repackage could be done in 4 minutes we would be transcoding faster than RT, and in about the running time with commercials including source prep and commercials removal. See if HandBrake running on Q6600 could beat that by doing a 23.976 transcode.

Overall, for 1080i->720p HD we went from an unreasonable 20-25 min to cut commercials plus 5x40 min = 3 hr 20 min, or about 4 hrs for every 40 min show, to about 1 hr 20 minutes with the current Distributed Encoder, and all the way to within an hour...

But it doesn't stop there. If we throw another Core 2 into the mix (a Mini or whatever) the multiple would become x=1/(1+2+2+3)=0.125, thus 720p->480pSD would take 0.25xT or 20minutes transcoded in 5 minutes and for 1080i->720p HD it would be 0.125*5xT=0.625xT or 40 minutes be done in 25 minutes.... This would be the shit!!! (as much time to cut commercials out as to transcode).
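The what-if scenarios above all follow the same arithmetic, so here is a rough sketch that reruns them. It assumes perfect load balancing and ignores repack time; `distributed_xt` is a made-up helper name, and the speed lists are the relative speeds quoted in the text.

```python
def distributed_xt(slowest_box_xt, relative_speeds):
    """Estimated xT of the whole cluster: slowest box's xT times the multiple."""
    multiple = 1 / sum(relative_speeds)
    return slowest_box_xt * multiple

scenarios = {
    "Q6600 + Duo + Duo2": [3, 1, 1.5],
    "with another Core 2 added": [1, 2, 2, 3],
}
for name, speeds in scenarios.items():
    sd = distributed_xt(2.0, speeds)   # 720p->480p SD is ~2xT on the slowest box
    hd = distributed_xt(5.0, speeds)   # 1080i->720p HD is ~5xT on the slowest box
    print(f"{name}: SD {sd:.2f}xT, HD {hd:.2f}xT, "
          f"40 min HD show in about {40 * hd:.0f} min")
```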

P.S. Tapes are just slow because of the crappy quality of the video... Try 23.976@700; if it won't help quality, it ain't worth encoding them at over 750kbps, so probably try 480x352 since 320x240 is too small (verify on big screen TV). Changing PAR won't help since for one it is poorly supported and for another I need to enhance vertical by a factor of 2 and PAR is to expand horizontal - no easy way out here...

P.P.S. New encode times

| encoder | 1080i->720p | 720p->480pSD |
|---|---|---|
| Yonah | 5.2-5.9fps | 10-12.5fps |
| MacBook | ~9fps 3.3xT +50-73% | 15.14fps +20-50% |
| HB MacBook | 3.3xT-5.5xT | |
| Distributed | ~18fps 1.6xT | ~42fps (0.7xT) |
| Proc Time | ~60min | <15min |
| Q6600 | 15-18fps | 30-37.5fps |

HB on MacBook ran 1 hr 50 min doing 1080i->720p @29.97 (30 min 1st pass and 1 hr 20 min 2nd pass). @23.976 it took 65 min (25+40), so HandBrake is not much faster than x264 when deinterlacing....

Wednesday, October 10, 2007

HD-DVD and stuff

HD-DVD or 3x DVD are still not happening. There are just two software players - Cyberlink and DVD player in upcoming Leopard. Well, DVD player on Mac would only play DVD Studio created HD-DVD... I don't even know what to say about that... it wouldn't play most HD-DVD (apart from MPEG-2 encoded I guess). Cyberlink doesn't play HD-DVDs if it cannot find HDCP video card, so there is really no point in creating 3x DVD since nothing could play them even on the computer. $300 standalones would be out this Christmas shopping season, but next year they would be cheaper and computers would start to come out with HD-DVD drives and then I guess playback and authoring software would become available... Until then there is no point despite that I figured out how to make X264 encoded HD-DVD in DVD Studio.

There are three tricks:

  • Like iPod, DVD Studio cannot handle B-frames, but unlike iPod it could do CABAC. 4:3 SD material would be stretched to 16:9, so SD cannot be played back by a standalone and thus SD material is for AppleTV or whatever TV-appliance playback only. HD without B-frames... Hmm, 1280x720p@23.976 would require over 2Mbps, thus only about 3.5 hrs could be fitted on DVD (read: even 2 movies might not fit)
  • There's gotta be a bug in mp4box because the only way to import is to remux raw h264 with ffmpeg. It's gotta be a raw track (remuxing from mp4 will not work) and if audio is muxed too then video does not show up in DVD Studio (delete audio from mp4 and DVD Studio would see video)
  • DVD Studio is very picky and doesn’t even like VBR .ac3 that comes inside ATSC streams – re-encoding with BeSweet fixes that. Surprisingly enough DVD Studio accepts AAC audio as well
Top it off with that it took DVD Studio over 2.5 hours to package 43 minutes worth of material and it is clear that DVD Studio is piece of junk. Additionally, I greatly doubt DVD Studio authored 3x DVD would be playable by standalones.

And DVD Studio is the only authoring software apart from Scenarist that could author H.264 HD-DVD. The rest are MPEG-2 only.

On HD encoding... Compressor is unbelievably slow – single pass ATSC transcode took 9 hrs for 43 minute show or 12.75xT or 0.07xRT on MacBook while on overclocked emachine it took 3.6xT or 2.5 hrs for the same show. Two pass encode with Compressor takes about twice as long as single pass, i.e. absolutely unreasonable 16-18xT or 12-15 hours and in fact is the only option since single pass Compressor encode doesn’t work – it encoded 2.0Mbps target to 250kbps, nice rate control algorithm and overall excellent job Apple!

On Bandwidth vs. Quality... 1.5Mbps single pass seems not to be enough but is almost OK for ATSC. At 2.0Mbps also not enough but better. 2.0Mbps would make a good compromise for ATSC. For DVD upconverts 2.0Mbps is also not enough even with 2-pass. Such high bitrates are because DVD Studio cannot import AVC with B-frames. With B-frame support 1.0-1.2Mbps is not enough for 2-pass, but it could be squeezed into 1.5Mbps I guess.

On Lenovo on SD upconverts mencoder did 5.4 fps/23.976 or 4.35xT, so a 2 hr movie would take ~ 9 hrs to transcode and that's on top of 25 minutes to run DGIndex. Compare that to 43+ fps/23fps or 1.8xRT +0.95xRT or 0.55xT+1.05xT=1.6xT for 640x362 encodes with Handbrake... and that's with b-frames and all. 2-pass encode with x264 took 22.7/7.46 or 1.3xT+3.2xT = 4.5xT on 1280x720@23.976 encode or the same as 4.35xT single pass with mencoder. For comparison 1080i to 720p ATSC transcode took 9 fps/3.75fps or 3.33xT+7.99xT=11.32xT

HD bottomline is... 1280x720p@23.976 is the smallest and the most practical resolution for HD DVD. Since it takes ~5xT in the best case to transcode (from progressive material, deinterlacing would take twice as long), to be practical we still need to at least double processing power (Qxxx processor) to make it 2.5xT or about 5 hours to transcode your average movie (or twice as much for interlaced). Clearly that still would be too long and we need to at least quadruple current processor speeds to get to a reasonable 1.5xT or 2.5-3 hours as with Handbrake today. So a $266 Q6600 on 1066 FSB wouldn’t quite cut it and from everything available today we need something along the lines of a Mac Pro with two Xeon E5300 series or 8 cores in total for a ridiculously huge amount of money (like the $2500 that Apple wants for a Mac Pro) or… I could spread encoding over several idling computers. This is not that hard actually. FreeNAS could push 50+Mbps over 100Mbps wire or ~7Mbps over 54Mbps WiFi – more than the 5Mbps that any of my computers could take when encoding to H.264 or more than the 10Mbps that a Quad should take.

The scenario is

  • capture to local drive and make VideoRedo output mpg to FreeNAS (could theoretically capture to FreeNAS, but MCE captures only to “Recorded TV” or local drives). This would take 44.78 fps/29.97=0.67xT. Run DGIndex 39/110=0.35xT or 1xT so far to prep the source not counting whatever it takes VideoRedo to find commercials.
  • on FreeNAS create emachine.avs and lenovo.avs with trim(0,framecount()/2) and trim((framecount()/2) +1, 0) at the end respectively. (or even better use frame numbers from .vproj and transcode straight from .dvr-ms) or slice by 6000 frames, which takes 10 min for the 1st and 26 min for the 2nd pass on emachine transcoding from 1080i. Both emachine and Lenovo transcode 1080i at 3.3xT+8xT ≈ 11xT; 720p at 2.72xT+4.28xT=7xT and SD upconvert at 1.3xT+3.2xT = 4.5xT
  • Wait for mp4 to appear and then join them with copy /b a.h264 + b.h264 result.h264 and do the audio and muxing.
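The per-machine trim lines in the scenario above could be generated rather than typed by hand. This is only a sketch in Python, not something from the original setup: `trim_ranges` and `avs_trim_line` are hypothetical helpers, and the weights are whatever relative speeds you assume for the boxes. It relies on Avisynth's Trim taking inclusive frame numbers, with a last frame of 0 meaning "to the end of the clip".

```python
def trim_ranges(frame_count, weights):
    """Split frames 0..frame_count-1 into consecutive inclusive ranges
    sized in proportion to weights. Returns a list of (first, last)."""
    total = sum(weights)
    ranges, start, acc = [], 0, 0
    for i, w in enumerate(weights):
        acc += w
        if i == len(weights) - 1:
            end = frame_count - 1          # last slice always reaches the end
        else:
            end = int(frame_count * acc / total) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def avs_trim_line(first, last, is_last_slice):
    """Format one Avisynth Trim call; 0 as last frame means end of clip."""
    return f"Trim({first}, {0 if is_last_slice else last})"

# Two equal boxes splitting a 6000-frame slice
ranges = trim_ranges(6000, [1, 1])
for i, (first, last) in enumerate(ranges):
    print(avs_trim_line(first, last, i == len(ranges) - 1))
```

Cutting exactly at frame boundaries like this keeps the slices non-overlapping, which matters when the raw streams are later joined with copy /b. A very small weight can produce an empty slice, which this sketch does not guard against.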
Source prep would take 1xT, so I won't be able to match Handbrake, but say we would try to match pure transcode without source prep. Both emachine and Lenovo transcode 1080i at 3.3xT+8xT ≈ 11xT; 720p at 2.72xT+4.28xT=7xT and SD upconvert at 1.3xT+3.2xT = 4.5xT. So to make SD upconvert take 1.5xT we need an extra Core Duo for a total of 3 processors or 6 cores (4.5xT/3). To make it reasonable for a 720p source we need 4.5x the power, or the 2 processors I have plus a Quad, but 1080i would still take 11xT/4.5~2.5xT and only running the 2 existing computers plus a Quad and a Mac Mini would make it 11xT/6.5~1.7xT in addition to 1xT for source prep.

So 4 computers I want to have would do HD, but computers still need to double in power before it would be reasonable.

Friday, August 31, 2007

x264 is useless for HD DVD

X264 would be totally useless for HD DVD for two reasons
  • the h264 stream generated by x264 is not supported by DVD Studio Pro. There are two problems here – the encoder does not set the 3:2 pulldown flag and does not set the resolution properly, then mp4box tops it off by muxing into an incompatible .mp4. DVD Studio cannot work with anything but .mp4 or .mov, so the story pretty much ends here, however there is more
  • x264 only encodes progressive frames. Since the HD DVD spec (and/or DVD Studio) only supports progressive at 23.976 with 3:2 pulldown to 29.97, that means that HDCAM captured 29.97i video would need to have its FPS converted to 23.98 before encoding and this creates motion artifacts.
So despite being 2.5 times slower and a lot worse quality wise, Apple’s Compressor is still the only working HD DVD solution. It does stretch 480p SD to 16:9 (I wonder if 960x720 would be supported without stretching), so HD DVD is useless to store AVC encoded SD and thus, for SD the storage is still interlaced MPEG-2 encoded DVD and that still leaves open the question – what to do with PAL->NTSC and 25p 640x480 camera movies (and what to do with Tape captures).

BTW, my DVD player doesn’t play DVD-RW and XBOX does not properly work with 29.97i and 23.98p, so I still need to burn a test movie on DVD+/-R at 23.98p and check it.

Finally, EyeTV on Apple captures SD with audio problems (both out of sync and pop/click artifacts). So first do check Pinnacle captured SD to make sure there are no clicks (sync seems to be fine) and burn a DVD at 23.98p at about 4Mbps (2+hrs per single layer) which seems to be more than enough for Tape captures. (Both Pinnacle and EyeTV capture at 8Mbps max). And yeah, surprisingly enough TMPGEnc takes 0.5xRT (2xT) to do SD encodes or no faster than x264 could do H264.

Tuesday, August 28, 2007

HD DVD

It is only a matter of time now for them to come out with a program that would backup HD-DVD onto double layer DVD media to be playable on standalone HD-DVD player. Definitely HD-DVD would enter mainstream this shopping season.

For now creating 3x DVD involves

  • transcoding using x264 at lower bitrates
  • modifying stream a bit with x264info (adding 3:2 pulldown flag for 29.97i)
  • building image with Sonic Scenarist
  • burning ISO.
For progressive material I guess the x264info step could be skipped. AC3 audio could just be copied, so I guess it all boils down to building the image with Scenarist and transcoding from 3.5xT (0.285xRT) for 720p to 7.1xT (0.14xRT) for 1080p on the 2nd pass (on MacBook it would be 3xT (0.3xRT) and 5xT (0.2xRT) respectively), so 1.5 hour material would take up to 2*1.5h*3.5xT~10 hrs for 720p and 2*1.5h*5xT~15 hrs for 1080p to transcode. Again, computers gotta become twice as fast before that type of processing would become practical. It would be fun to try however, because the size would be limited to a bit less than 4GB (say 4.2GB minus the size of the AC3, to be more exact), which means bitrates of up to 4000*8/(1.5h*60*60)~5.9Mbps, so one could fit HD content on even a single layer DVD.
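The bitrate budget above is easy to redo for other running times. A small sketch, assuming decimal units (1 MB = 8 megabits) and that the video budget in MB is already net of audio; `video_mbps` is an illustrative name:

```python
def video_mbps(video_budget_mb, hours):
    """Average video bitrate in Mbps that fills video_budget_mb megabytes."""
    seconds = hours * 60 * 60
    return video_budget_mb * 8 / seconds

print(round(video_mbps(4000, 1.5), 2))  # 5.93, the "~5.9Mbps" for 1.5 hours
print(round(video_mbps(4000, 2.0), 2))  # 4.44 for a 2 hour movie
```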

Monday, August 27, 2007

Receivers

All of the “all in one” home theater systems do not yet do HDMI pass-thru and switching. They just started to have HDMI out for DVD output to HDTV and that is pretty much it. Philips HTS3555/37 is a good example – it has just HDMI out and COAX IN, no TOSLINK and no HDMI-In. Samsung HT-X40 on the other hand doesn’t have COAX but has TOSLINK in (Samsung speakers are ugly too). It is a bummer that “all in one” doesn’t have meaningful digital audio-in because without it they are no more than a fancy DVD player with HDMI out and a bunch of speakers.

Real receivers for the most part are still lacking HDMI switching, the $180 SONY STR-DG510 being an exception, having HDMI (2 in 1 out) and component (2 in 1 out) pass-thru, 2 TOSLINK and 1 COAX. Prologic decoding has to be manually switched – just like on my old SONY – and since they don’t switch between Component and HDMI, that means that things like Wii (component only) couldn’t be hooked up to an LCD monitor thru the receiver. Thus “real receivers” are a bit better but useless as well because they cannot handle anything but a computer hooked up to one HDMI and say an HD-DVD player hooked to the other. Plus, HDMI pass-thru does not do any True-HD and such decoding, so today's “real receivers” would be useless in the future – read: should be replaced soon…

The bottom line is – there is no reason to buy new receiver now. Mac Mini would plug into living-room replacing DVD player outputting to old TV thru Video Adapter. In the game-room emachine will have neither TOSLINK, nor COAX, so it could plug into old stereo. I should start building up game-room only once I get the Mini but then it would work in the living-room. So I should start building up game-room only if I get other computer to drive 24” LCD, but until I get that computer I shouldn’t be buying the panel and may be by that time they release “all in one” receiver with both TOSLINK and COAX-in.

New Mini vs. HP s3100y

Apple released updated Mac Mini (MB138LL/A) and once again I found many reasons not to buy one. This time around updated Mini has a lot going for it. It is finally 64-bit (1.83GHz Core Duo 2), has a Gig of RAM (vs. 512MB before) and 80GB drive (vs. 60GB before) – not that bad actually for $599 apart from the usual – DVD writer cost $200 more.

In the meantime HP came up with their small PC - s3100y that I almost bought for $200 + $70 S/H. The deal was too good to be true and as always Office Depot didn’t have a computer to sell me (they were sold out for one, but apart from that delivery was 40 days out), so realistically s3100y could be picked up without $250 rebate, but say with $150 for $299+$70 S/H~$350 - $400 with 15” monitor and a printer. So what $400 vs. $600 would buy apart from a monitor and a printer? 32-bit 1.6GHz Core Duo in 775 socket on 800MHz FSB with no TV-out, but SPDIF, same Gig of RAM and 160GB SATA drive. Asus motherboard could be run at 1066MHz which made that HP box interesting, but as is the box has about “half” the CPU and twice the disk space (DVD-RW goes without saying). Upgrading to Core Duo 2 2.2GHz E4500 processor would cost $140 from HP or MicroCenter, thus Mac Mini rival from HP would cost $500-$550 or the same as Mac Mini with $50 rebate. HP differential to upgrade from 15” LCD to 24” is $470 and from 15” to 22” - $200 or the same as buying either 24” or 22” Acer monitor, so as always, the only thing that would come with HP for free is a printer.

The bottom line is that I should wait for Leopard to be bundled with the Mini after it is released in October, but I guess I would be buying just when Amazon would slap $50 rebate on it. First however I need to ebay Apple’s DVI to Video Adaptor ($19+$4 from Apple Store) to hookup Mini to TV using either composite or S-video..

Sunday, August 26, 2007

FreeNAS or 1TB for $150

My total capacity is 186.31GB+76.33GB+74.53GB=186.31+150.86=337.17GB. Two more 320GB SATA drives for $60/each (or another $20 off on each) + $30 IDE/SATA RAID card with RAID 5 would make about 1TB for $150 in 3 IDE + 2 SATA drives (i.e. I could still keep 25GB raid and extra IDE channel for CF-IDE if I buy power cable splitters). I could make 0.5TB for $100 with 2 more 160GB drives at $30 (need just one really so it is $60 for 0.5TB, but then I need a drive for emachine) in 3 IDE = (150 + 180GB) + 1 SATA for overall 0.45TB. Using 25 GB in 13GB RAID (I still have 3 IDE channels left, but again just 5 overall drive slots and power supplies) I could probably boost it to 3*180=0.54TB, so 0.5TB for $100 is more doable than 1TB for $150 that would still need a drive for emachine (13GB just wouldn’t cut it you know)

The biggest problem is to back up the data when building the raid, thus the need for a big drive to be left with emachine. So if I buy another box with 160GB or they would have 160GB drives for $30 before they would have 320GB for $60 it would be 0.5TB for $100, otherwise with 320GB drives for $60 it would be 1TB for $150… that would start as a 0.32TB RAID (one 320GB drive goes into FreeNAS and one into emachine) that would get upgraded to 1TB RAID 5 once $30 drives would be in 0.5TB range…

Thursday, August 09, 2007

The Future of HD

720pSD is 2.25x 480pSD, 720pHD is 1.33x 720pSD and finally 1080pHD is 2.25x 720pHD. Since every pixel needs to be processed, those ratios pretty much determine how much slower it would be to encode 720pSD than 480pSD and so on. Since AVC does not encode every pixel, the relationship between the bitrate needed and picture size would not be linear as with encoding times. Here is the table to demonstrate that on single pass mencoder encodes on the overclocked eMachine, which roughly corresponds to a 1.6GHz Core Duo.
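
| Size | H264 kbps | H264 | XVID kbps | XVID | fps (as listed) |
|---|---|---|---|---|---|
| 480pSD | 750 | 1.43xT-1.66xT (0.6xRT-0.7xRT) | 850 | 1.85xT??? (0.54xRT???) | 17fps, 43fps, 16fps |
| 720pSD | 800-900 | 2.3xT-3.0xT (0.33xRT-0.43xRT) | 1000 | 1.76xT (0.56xRT) | 10fps, 25fps, 18fps |
| 720pHD | 1250 | 3.5xT (0.285xRT) | 1500 | 3.0xT (0.33xRT) | 8.55fps, 10fps |
| 1080pHD | 2000+ | 7.1xT (0.14xRT) | 2500 | 4.3xT (0.23xRT) | 4fps, 7fps |
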
So in reality it is not quite 2.25, 1.33, 2.25 but more like 1.5, 1.5, 2.0. Still it doesn’t change the picture that much… Today only 480pSD is about real time, and 720pSD would need more than Core Duo2 to become practical; that still would make 720pHD a bit better than half the real time and 1080p still way too slow to be practical. I guess there is no real reason for me to get Core Duo2 unless it is Quad and even then we could hope for 720pHD to be not too much slower than real time, but still slower. No need for more than XBOX or flat screen TV either – we are still in 480pSD world until I get that Quad.
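For reference, the nominal 2.25, 1.33, 2.25 ratios are plain pixel-count ratios. A quick sketch, assuming the frame sizes are 640x480, 960x720, 1280x720 and 1920x1080 (my reading of the labels; these sizes reproduce the quoted ratios):

```python
# Assumed frame sizes for each label (width, height)
sizes = {
    "480pSD": (640, 480),
    "720pSD": (960, 720),
    "720pHD": (1280, 720),
    "1080pHD": (1920, 1080),
}

def pixel_ratio(bigger, smaller):
    """How many times more pixels per frame 'bigger' has than 'smaller'."""
    bw, bh = sizes[bigger]
    sw, sh = sizes[smaller]
    return (bw * bh) / (sw * sh)

steps = [("720pSD", "480pSD"), ("720pHD", "720pSD"), ("1080pHD", "720pHD")]
for bigger, smaller in steps:
    print(f"{bigger} vs {smaller}: {pixel_ratio(bigger, smaller):.2f}x pixels")
```

The measured slowdowns come out flatter than these ratios, which is the point of the 1.5, 1.5, 2.0 remark.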

P.S. I did run some tests on the 2.66GHz Core Duo2 MacBook, and 720x304 DVD encodes with Handbrake are about twice as fast as on the D805 at about 100fps/23.97~4xRT (0.25xT) vs. 29fps/23.97~1.23xRT(0.8xT). Upscaling to 720pHD on D805 it was 10fps/23.97~0.4xRT(2.4xT) vs. 12fps/23.97~0.5xRT(2xT) on MacBook under Windows or 14fps/23.97~0.6xRT(1.7xT) under OSX - OSX is more optimized, but still a 2hr movie would take more than 2*3.5=7 hours to upscale to 720pHD and 2-pass encode. Definitely, HD is not here even with the fastest Apple hardware.

Transcoding ATSC was also just 15-40% faster: 5.87fps/29.97~0.2xRT(5xT) vs. 4/29.97~0.14xRT(7xT) for 1080pHD and 9.63/29.97~0.3xRT(3xT) vs. 8.55/29.97~0.28xRT(3.5xT) for 720pHD, and as predicted, 480pSD was faster than RT at 69.25/59.94=1.15xRT or up to 75/59.94=1.25xRT on OSX. Two pass encodes of SD tapes with x264 under Vista ran at 53.70/30.25 or about the same as on the Quad Xeon 3.4GHz Dell - 49.70/30.58. Finally comskip did run faster under wine than on Vista (250 fps vs. 192 fps).

P.P.S. There is a problem with ffmpeg and x264 muxer making .mp4 unplayable in QT. To fix it need to extract to raw with mp4box (06/2007 build) and remux (with -fps switch for 29.97). Also Sonic DS filter for MPEG-2 is way too slow - use Mainconcept.

Friday, July 20, 2007

HD Final Notes

After playing with AVC decoding on different hardware here is what I have to say about it
  • XBOX ($200 worth of Pentium III with 64MB DDR on 133MHz FSB living in my living-room since 2003) is powerful enough to do 640x480 H264 encoded at 600-750Kbps with recent builds of x264 (surprisingly enough it drops too many frames on L’épopé en Amérique encoded in January). Yeah, it drops some frames, but it is not a biggie. XBOX cannot do anything bigger than 480p SD, but supposedly should decode 5.1 AAC into DTS and output to TOSLINK (or does it? do test).
    XBMC (or mplayer and thus Apple TV, 64-bit Vista and so on) cannot play dvr-ms until these changes make it into an mplayer build, but could (or couldn’t?) play MPEG wrapped HDTV, though this is of little practical use since the files are huge and would need to be AVC-transcoded anyway.
    XBOX DVD-ROM doesn’t like my DVD media (recognize but fails to mount I guess), but plays pressed DVDs just fine.
    With some coding of XBMC extensions I could make it do everything I need from online media and more. Namely, as it is my extension plays my RSS feeds and all I have to do is
    • modify my extension a bit to load a categorized list of my RSS from a server… May be TVTonic server?
    • implement “add to favorites” to add a link to my RSS
    • Figure out how to make launching extensions (and especially my extension) more straightforward which would imply looking at how skins are implemented… Say modify XBOX360 skin to have an extra tab for my extension…
    All in all, XBOX is the best and cheapest solution as long as it is attached to an SD TV-set, but even with an HD panel and the HD kit it would just upconvert 480p to 720p or 1080i. Today and tomorrow this is where we are with HD anyway.
  • Apple TV ($250-$300 worth of 1GHz Dothan, 256MB DDR2, 64MB GPU on 400MHz FSB) with hacks would play non-Apple media and samba mount it. However to this day there is no RSS plugin that would stream the media off the Internet, so video feeds first need to be downloaded to a server and then streamed to Apple TV (as Apple intended). Writing a plugin means some OSX programming (plugins are bundles for the Apple TV Finder.app). It is questionable if this plugin would be written since most are happy storing their stuff on a Mac and making it available thru iTunes, and the same goes for pictures, music, etc. So if I want Apple TV to work the way I am used to enjoying my media I would have to write everything myself.
    Is it worth it? It might have been if Apple TV
    • was powerful enough to play ATSC HDTV
    • was powerful enough to play 1080p H264
    But it ain’t. So yeah, Apple TV is cool, but it is more expensive and for an extra $100 vs. XBOX you would get half of the functionality (as of today), twice the amount of work to make it usable and at the end the best it could do (vs. XBOX that is) is to play 720p. Frankly there is more – Apple TV would pass-thru Dolby Pro Logic for the receiver to decode. Cool! Means that Apple TV would play 720p HD mp4 with 5.1 AAC on a Dolby Pro Logic receiver the same as a DVD player would play AC3 on a Dolby Digital receiver. But ain’t XBOX capable of software decoding 5.1 AAC into DTS and outputting to a Dolby Digital receiver? If XBMC could do that we are back to 480p vs. 720p and I am skeptical that Apple TV is truly capable of 720p AVC.
    Bottom line is the same as 9 months ago – until version 2.0 Apple TV is useless and knowing that Apple has no resources to do anything but iPhone, I am skeptical Apple TV 2.0 is coming any time soon.
  • 1.6GHz Dothan, 32MB GPU on 400MHz FSB is fine for 720p AVC, but is not capable of 1080p. It is hosed decoding ATSC HDTV and Microsoft Media Center (as in MCE2005 or Vista) is not even tuning to the channel.
  • 1.6GHz Yonah, 32MB? shared GPU on 667MHz FSB is OK for 1080p despite Direct Show being unable to multi-thread decoding. Cannot really tell if frames are dropped at 1080p, but to be safe let’s say more than a 1.6GHz Yonah is needed for 1080p. There is also no problem watching 1080i ATSC, however this needs a bit more testing.
Overall, HD is not here yet on the decoding side either, so XBOX still is the best option. Apple TV is not worth the money because it is only marginally better (from a functionality perspective) than good old XBOX, but might be interesting from a writing-missing-plug-ins perspective. From all hardware available today it is still MacMini that makes sense as a future Media Center, but with no updates in a year (still being Yonah when Intel is almost done with the next product life-cycle) and at the same price MacMini is even more expensive at $600 and might not do everything that is expected from it. Maybe with the release of Leopard they would update MacMini, but rumors are that they would rather drop it. Then at least maybe they would drop the price on the old one and bundle it with FrontRow 2.0 so that it wouldn’t have to be copied from Apple TV… but then again it would be not only $500+ for a MacMini, but more $$$ for a flat panel, and it is not clear if and how MacMini outputs Dolby Pro Logic or 5.1 in general… HD is not here yet, but close…

Tuesday, June 12, 2007

ATSC HD – The end of the story

Boosting D 805 performance 25+% yielded about a 15% increase in speed transcoding SD tape captures to 640x480@29.97 at around 750Kbps AVC. The following table summarizes different encodes on different computers (italic is for expected times)

Machine columns, left to right: D 805 (2x3.4GHz), T2300 (2x1.6GHz), D 805 (2x2.6GHz), M 725 (1.6GHz), M 360 (1.6GHz), Dual Xeon (4x3.4GHz). Each row lists only its filled cells, in column order.

| Encode | Results |
|---|---|
| VTR (640x480@25) 600kbps | 35+/21, 39?/19.5, 33/18, 18/9 |
| VTR (640x480@29.97) 750kbps | 35+/21, 39?/19.5, 33/18, 17/9, 45/30.5 |
| mencoder 25fps 750kbps | 21.7 (26) |
| mencoder 29.97fps 750kbps | 28.48 |
| DVD (interleave) 600kbps | 40/20, 35/18, 18/9, 17/9 |
| 480pSD 500kbps | 23/15 |
| 720pSD 800kbps | 17/10, 19/11, 13/8, 13.5/13 |
| 720pSD 1Mbps | 10/7, 19/11, 8/5.5 |
| mencoder 720pSD 1Mbps | 9, 7 |
| 720pHD 1Mbps | 11.5/7, 11.0/7, 8.5/5.8, 20/14 |
| mencoder 720pHD 1.2Mbps | 7.4 of 25 (9 of 30), 4.5 |
| 1080pHD 1.8Mbps | 7.3/4, 7.2/4, 6/??? |
| mencoder 1080pHD 2Mbps | 4 of 25 (5 of 29.97), 4.5 |

It pretty much boils down to the following
On performance
  1. 480p SD is real-time on the current generation of processors (Core Duo, Core 2 Duo) regardless of where it comes from. By real-time I mean either one-pass or the 2nd pass. Two pass encodes would go faster than 2xRT on faster processors, but the second pass would still be around RT. A one pass encode, although being slower because it needs more bandwidth (750 kbps vs. 600 kbps on good sources or 800+ vs. 750 on noisy sources), would still be faster than 2-pass with filtering and would create the same result. 480p SD is real-time not because the processors got that much faster, but rather because there are now two of the same processors as before, i.e. for the most part the improvement in processor efficiencies got eaten up by scheduling between 2 threads.
  2. 720p SD is 2+xRT at 800kbps, but it needs 1000kbps on one-pass, thus it is at least 2.5xRT. Since the first pass takes the same 2xRT it would take 4.5xRT to do two-pass encode and thus it makes no sense to do 2-pass encode 720p SD on current generation of processors. It is questionable if it makes sense to encode 720p SD period – 480p would work fine for SD and could be done twice as fast.
  3. 720p HD is 3.5-5xRT. The 2nd pass could be done in 3.5xRT at around 1Mbps. One-pass would need 1.25Mbps and thus could go as slow as 5xRT. Quad Core 2 is supposedly more than 2 times faster (2 times for 4 cores, “more” for Core 2), thus 720p HD would take 2-3xRT, i.e. one-pass should clock 10+ fps – still more than two times real-time – so even 4 cores cranking the best they could would only get close to what a Pentium M 360 (as in the Toshiba) could do to DVD since 2005.
  4. 1080p HD is 5+xRT - it is not happening until a $350 computer has 10 Dell D600s inside. D 805 and Yonah sometimes have difficulties playing 1080p since WMP 11 cannot thread across cores.
So as far as performance is concerned when it comes to ATSC, until every show on the air is in HD, it is 640x480@25fps single pass at around 700kbps – no need for a wide screen TV, xbox should work, D 805 should do it in 1.33xRT i.e. 22.5 fps for 29.97 stations, 45 fps for 59.94 stations, 18 fps/25 fps for x264. Movies are still the best from DVD and if need be could be upconverted to 720p.
On encoding
MainConcept MP2 decoder (ad2mcdsmpeg.ax) that comes pretty much with every software these days is the best decoder because it could deinterlace and output in YV12
  • 480p from clean sources (DVD, ATSC) could be encoded at around 600kbps at 25 fps in two passes at 2xRT and at 700kbps in a single pass at around RT (1.33RT for ATSC). Below 600kbps it is pushing it, but for less than 480 height it is possible (like I did with China @500kbps for 360 and better yet 368 height).
  • 480p from bad sources (tapes) better be left at the original 29.97 frame rate and encoded at 750kbps in 2 passes at less than 2xRT. Deinterlacing should be done by the Mainconcept decoder, so the only filters are DeGrainMedian(limitY=5,limitUV=5,mode=3) and Undot. Nothing could be done to bad sources – junk in – junk out.
  • 720p SD is not worth it, but since I played with it… Two-pass encodes need 700-800kbps at 25 fps. One pass needs 800+ (really around 1Mbps). Goes around 2.75xRT. Doesn’t work on XBOX.
  • 720p HD – needs 1.25Mbps for 2-pass at 25 fps. For single-pass 1.25Mbps on 25 fps could be OK, but 1Mbps is definitely not enough.
  • 1080p HD – would need 1.75-2Mbps but I couldn’t really tell since I cannot even play it.
This is as far as 32-bit architecture would go – no need for a 24” widescreen monitor, no need for HD-DVD. What is needed is 64-bit software that maybe would speed things up 2 times. For now the following is left to be tested on 32-bit
  • SD show as one-pass at 480p on emachine at 700kbps for 25 fps
  • Tape as one-pass at 480p on emachine at 29.97 at 750kbps.
  • Upconvert widescreen DVD to 720p and 2-pass at 1Mbps. Time and log PSNR and SSIM
  • 1080p on Lenovo (and upclocked emachine) – would it play? Cannot play on old emachine.
  • DRVMSToolbox to automatically transcode
Oh and BTW, 32-bit Vista started to rank computers. This is what I’ve got
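
| | CPU | RAM | Aero | GPU | Disk |
|---|---|---|---|---|---|
| Lenovo | 4.6 | 4.5 | 3.3 | 3.0 | 4.1 |
| eMachine | 4.6 | 3.9 | 2.2 | 3.0 | 5.0 |
| D805+GMA950@3.3GHz | 4.9 | 4.5 | 3.0 | 3.0 | 5.2 |
| D600 | 3.4 | 4.0 | 1.9 | 1.0 | 3.7 |
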
P.S. Did a bit more testing... more research actually... and I cannot compile mencoder on OSX and they stopped pre-building in 2006, so on OSX there is no support for either dvr-ms or recent x264 options... Pretty much a dead end trying to test the 2.33GHz Core Duo2 I've got in the MacBook. Parallels loads the CPU just about 70% and at that transcodes 720pSD 1Mbps at 9of30 or 7.5fps or the same as the overclocked D 805. So theoretically the 2.33GHz Core Duo2 (4MB L2 on 667MHz) could be 30% faster, but practically there is no software for OSX and even 30% faster would still be worse than 2xRT single pass for 720pSD and 720pHD and for 1080pHD it would be still around 5-6xRT. So as expected Core Duo2 will not make any difference as far as HD goes. Would 64-bit make any difference?

Since Parallels on the MacBook will not fully load the CPU, the only way to find out is to install either Vista x64 or Ubuntu on the D 805. There is no overclocking software for Linux, so in the end it would be Vista x64, for which I would have to build mencoder myself, and that would take too much time. So until somebody builds a 64-bit version of mencoder I am done playing with HD, and yeah, there is no way I am buying overpriced Apple hardware, 64-bit Leopard or not. Only a Quad would make me happy, and that means an LGA 775 socket.

P.P.S. Automating DRVMSToolbox would imply writing a custom action in C# to call different mencoder profiles (different crop options) which is not a big deal, but would take time... Since most of the programming is crap it ain't worth my time either. Get back to this only once there is a 64-bit mencoder for Windows and do it by writing a shell script to run overnight thru Scheduled Tasks. There are 64-bit versions of x264 BTW to test that DVD upconverts.

Overclocking Pentium D 805, or how I became an "evil hacker"

HD takes a copious amount of time to transcode. Thus my first urge was to get a new, more powerful computer, but mine are less than a year old. So I started looking at what would be coming and realized that Yonah (Core Duo) had the shortest product lifecycle in Intel history: less than a year. Another thing I realized is that the crappy emachines box that I got for Christmas for $350 is actually a pretty cool box, for $350 that is. There are two things about it:
  • Pentium D 805 processor (2x2.6GHz, 2x1MB L2 on 533MHz FSB). I did go for it because of the two cores, but I never realized that it is really two Pentium 4 Prescott processors glued together that are highly capable of overclocking. With its 20x multiplier, simply sticking a D 805 into an 800MHz motherboard would increase the clock from 2.6GHz to 4GHz on both Pentium 4s inside (yeah, it would need to be cooled to run that fast). Did I mention EM64T? On top of it all it is a 64-bit processor.

    I wanted to get a D 830 instead (2x3.0GHz, 2x1MB L2 on 800MHz FSB), but that would have only a 15x multiplier, though it would come with an 800MHz FSB motherboard. They wanted $400 for it and were sold out, so it didn’t work out, and it looks like that was for the better.

  • Intel D1102GGC2 motherboard with an ICS 951417 PLL that ClockGen and such almost understand (one should pick an ICS 9541xx like ICS 954119 to set the ICS registers). With built-in ATI graphics (and thus an ATI chipset, and ATI was bought by AMD) I was thinking that the motherboard would come from some cheap sweatshop somewhere in China, especially considering that it was inside a $350 computer, but it did come from Intel. It is by no means the best motherboard around, but Intel has support (new BIOSes and such, like a toolkit to “cook” your own BIOS) and this motherboard is well capable of running at 800MHz.
What this all means is that a $350 junk computer is well capable of beating anything that was available last Christmas for over $1000, including Pentium D 960 ($549, 3.6GHz, 2x2MB L2 on 800MHz FSB) and T2700 Yonah ($637, 2.3GHz, 2MB L2 on 667MHz FSB), and would be on par with the best Core 2 Duo on 667/800MHz FSB like T7700 ($530, 2.4GHz, 4MB L2 on 800MHz FSB, Socket P) released in May 2007 or T7600 ($637, 2.3GHz, 4MB L2 on 667MHz FSB, Socket M). LGA 775 Core 2 Duos, all running on 1066MHz FSB, would probably be a bit faster. On 800MHz FSB the only Core 2 Duo available, released in May 2007, is the $133 E4400 (2.0GHz, 2MB L2 on 800MHz FSB), which D 805 would beat without any problem when running faster than 2.8GHz, despite the 40% improvement of Core 2 Duo over Pentium D.

What this all also means is that it should be relatively easy to overclock that junk $350 computer. And it was indeed. Popping the hood open (yeah, it did involve the use of a head-mounted flashlight) I learned that the PLL is ICS 951417. Talking to clueless emachines support and googling, I got to know that the box has an Intel D1102GGC2 motherboard, and while hanging up the phone I was already downloading a genuine Intel BIOS that let me set RAM speed to 667MHz out of the box. Changing FSB speed would require “cooking” a new BIOS using the Intel Toolkit, but I just downloaded ClockGen and, randomly selecting ICS 954119 (instead of the ICS 951417 that I have and they didn’t list), I set FSB speed to 667MHz and the computer didn’t blink. D 805 felt very relieved to run at 3350MHz with all clocks matching up at 667MHz. A 25% increase in performance came in no time.

We are now running at the top of the Core Duo range and at the bottom of Core 2 Duo. Let’s go faster. 800MHz FSB (4GHz processor clock) crashed the computer right away. After about an hour of playing with memory speeds etc. I learned that anything above 170MHz (680MHz FSB) would make the system unstable, so I left it at 2*3.4GHz, 680MHz FSB and 667MHz memory, or about what I would get from most Core 2 Duo systems apart from the $533 E6700 (2*2.67GHz) and the Core 2 Quad Q6600 (4*2.4GHz) that would go "on sale" at the end of July 2007 at $266 vs. $530 currently (reduced in April 2007 from $851).
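The clock numbers above all come from the same arithmetic: core clock = multiplier x FSB base clock. A minimal Python sketch of it, assuming the advertised FSB speed is quad-pumped (four transfers per base clock), which is how 680MHz FSB maps to the 170MHz mentioned above. The function name is mine, made up for illustration.

```python
# Sketch only: core clock = CPU multiplier x FSB base clock.
# Assumption: the advertised FSB rating is quad-pumped, so base = FSB / 4.

def core_clock_mhz(multiplier: int, fsb_mhz: float) -> float:
    """Return the core clock in MHz for a quad-pumped FSB rated at fsb_mhz."""
    base_clock = fsb_mhz / 4  # e.g. 680MHz FSB -> 170MHz base clock
    return multiplier * base_clock


if __name__ == "__main__":
    cases = [
        ("D 805 stock, 533MHz FSB", 20, 533),
        ("D 805 at 667MHz FSB", 20, 667),
        ("D 805 at 680MHz FSB", 20, 680),
        ("D 805 at 800MHz FSB", 20, 800),
        ("D 830 stock, 800MHz FSB", 15, 800),
    ]
    for label, multiplier, fsb in cases:
        print(f"{label}: {core_clock_mhz(multiplier, fsb):.0f} MHz")
```

The 667MHz case comes out a little under the 3350MHz quoted above, which I take to be rounding of the FSB rating.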

That is how I became an “evil hacker”. HD or not, I doubt I will be buying another desktop in the next 5 years, because really there is no point, except if it is a Q6600 on 1066MHz FSB. $350 can sure buy a lot of fun.

Wednesday, June 06, 2007

HD or no HD

Going from interlaced SD sources (DVD or MPEG2 tapes) to interlaced HD sources (ATSC) is going from 39/19.5 (Yonah) or 33/18 (D 805) to 12/9 fps.

Yonah encodes SD at 1.25xRT (2nd pass or single pass), and the overall 2-pass SD source transcode on Yonah is 0.6xRT+1.25xRT ~ 2xRT. D805 is marginally slower (more so on the 1st pass, due to the slow FSB?). Cropping and scaling HD sources to 640x480 would make the 1st pass 2xRT and the 2nd pass 2.77xRT, thus 4.77xRT overall, or 2.2 times slower on the 2nd or single pass and 2.4 times slower overall on 2-pass encodes. There is just too much data in 1080i@30 and 720p@60, regardless of the fact that at the end I end up encoding 640x480.
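The xRT figures above follow from dividing the playback frame rate by the encoding frame rate of each pass. A rough Python sketch, assuming the output is 25 fps (my 25fps converts); that assumption is what reproduces the 0.6/1.25 and 2/2.77 numbers from the 39/19.5 and 12/9 fps pairs. Function names are made up for illustration.

```python
# Sketch only: how many times real time (xRT) an encode takes.
# Assumption: output plays back at 25 fps.

def x_realtime(playback_fps: float, encode_fps: float) -> float:
    """xRT for one pass: 1.0 means the pass runs exactly in real time."""
    return playback_fps / encode_fps


def two_pass_x_realtime(playback_fps: float, first_fps: float, second_fps: float) -> float:
    """Total xRT of a 2-pass encode is the sum of both passes."""
    return x_realtime(playback_fps, first_fps) + x_realtime(playback_fps, second_fps)


if __name__ == "__main__":
    playback = 25.0  # assumed output frame rate
    for label, first, second in [
        ("SD source on Yonah", 39.0, 19.5),
        ("SD source on D 805", 33.0, 18.0),
        ("ATSC source", 12.0, 9.0),
    ]:
        total = two_pass_x_realtime(playback, first, second)
        print(f"{label}: {total:.2f}xRT over two passes")
```

The totals land slightly off the rounded figures in the text (4.77xRT is 2 + 2.77 with the first pass rounded down).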

Overclocking D 805 should bring performance to 154% of stock, thus on my 25fps converts of SD sources the 2nd pass should run at 27.72 fps, faster than RT, and at about RT without changes to FPS. Similar performance should be expected from Core 2 Duo. This wouldn’t help much with ATSC transcodes scaled to 640x480 on either: the 2nd pass would be about 13.5 fps or ~2xRT, and the overall 2-pass encode would take ~3.18xRT. Core 2 Duo would make a 1-pass encode of ATSC wrapped SD scaled to 640x480 about as fast as 2-pass SD sources. Only a Quad Core 2 would get us into the same ballpark with ATSC wrapped SD at 640x480 as Yonah with SD sources.
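The projection above is plain linear scaling. A small Python sketch, assuming encoding speed scales one-to-one with the clock (memory and FSB limits are ignored) and assuming the 154% is the ratio of a 4GHz overclock to the 2.6GHz stock clock. The 13.5 fps and 3.18xRT figures use a rounded 1.5 factor. Names are hypothetical.

```python
# Sketch only: project fps and xRT after an overclock.
# Assumption: throughput scales linearly with core clock.

def scaled_fps(fps: float, speedup: float) -> float:
    """Frames per second after a given speedup factor."""
    return fps * speedup


def scaled_x_realtime(x_rt: float, speedup: float) -> float:
    """xRT after a given speedup factor (time shrinks as speed grows)."""
    return x_rt / speedup


if __name__ == "__main__":
    speedup = 4.0 / 2.6  # assumed source of the 154% figure
    print(f"speedup factor: {speedup:.2f}")
    print(f"SD 2nd pass: {scaled_fps(18.0, 1.54):.2f} fps")
    print(f"ATSC 2nd pass: {scaled_fps(9.0, 1.5):.1f} fps")
    print(f"ATSC 2-pass total: {scaled_x_realtime(4.77, 1.5):.2f}xRT")
```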

The above is based on 12/9 fps for ATSC SD which I need to verify once again.

ATSC wrapped HD would be even worse. Scaling to 640x368 would get us into the same 12/9 fps (4.77xRT) ballpark. Going to true 720p HD (1280x720) would halve the 2nd-pass speed to 4.5 fps, so even on 1-pass encodes we are over 5xRT. A Quad Core 2 would probably make a 1-pass 720p encode 2xRT, and a well-built system should do 2-pass in 3xRT.

1080p would take even longer. How much longer would be interesting to find out. Thus the tests are:

  1. 2-pass 640x480 ATSC on a) D 805 to prove 12/9 fps; b) Lenovo to see if it is better. Those tests should take 4.77xRT, thus 1.5 hrs on shows. Go for 500kbps with those.
  2. 2-pass 1000kbps 720p HD test to see how slow is the 1st pass. Looking at 9/5 fps or ~ 8xRT or 6 hours for 45 minute show.
  3. Overclocking D 805 to see how far it would be from 13.5 fps on the 2nd pass/1-pass transcodes of ATSC wrapped SD. This test could be done on either mencoder (1-pass) with adjustment for frame-rates and would take 2xRT or 40 minutes; or with x264 to test if 1-pass would get better, but would take 3xRT or 1 hour.
  4. Vista x64 test.
  5. If I don’t have it already, do a 1080p HD 2-pass transcode
Thus I need 3 SD shows and 2 HD shows, and the fastest way is to cut commercials in dvr-ms and use the cut dvr-ms. Lenovo would need about 3-5GB to do the test. For the sake of argument, put T2300 (478) into D600 and see if it would POST and start cooking that BIOS.
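The time budget for the tests above is just xRT times running time. A quick Python sketch for planning them, assuming a show is about 20 minutes with commercials cut for the SD tests (that is what makes 4.77xRT come out near 1.5 hrs and 2xRT come out at 40 minutes) and 45 minutes for the HD test. Helper names are made up.

```python
# Sketch only: wall-clock time of a test = xRT factor x running time.
# Assumption: ~20 min per SD show with commercials cut, 45 min for HD.

def encode_minutes(x_rt: float, show_minutes: float) -> float:
    """Minutes an encode takes at a given xRT factor."""
    return x_rt * show_minutes


def as_hours_minutes(minutes: float) -> str:
    """Format minutes as 'Hh MMm', rounded to the nearest minute."""
    whole = round(minutes)
    return f"{whole // 60}h {whole % 60:02d}m"


if __name__ == "__main__":
    tests = [
        ("2-pass 640x480 ATSC", 4.77, 20),
        ("2-pass 720p HD", 8.0, 45),
        ("1-pass mencoder, overclocked", 2.0, 20),
        ("1-pass x264, overclocked", 3.0, 20),
    ]
    for label, x_rt, length in tests:
        print(f"{label}: {as_hours_minutes(encode_minutes(x_rt, length))}")
```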

Overall, 720p HD at 5-8xRT would need yet another generation of processors, and would definitely need a Quad to transcode 45-minute shows in 1-2 hours vs. the 6 hours it takes today.
