Created by Olli Niemitalo on 2003-01-21, last modified 2012-08-04

In 1998, I had some extra time while others were reading for the final exams of high school, and I got into digital signal processing. I wrote as I learned, and here's the result. It is not entirely accurate in places, but it can be a nice tutorial into the world of audio DSP. Previously this document was called "Yehar's digital sound processing tutorial for the braindead", but I sort of grew out of my scene identity over the years. Enjoy the ASCII art!

This is written for audio digital signal processing enthusiasts (as the title suggests) and others who need practical information on the subject. If you don't read this as a linear reading experience and run into difficulties, check whether something in the previous chapters helps you out. In the filter frequency response plots, linear frequency and magnitude scales are used. Page changes are designed for 60 lines/page printers. The chapter "Shuffling the IIR equation" was written by my big brother Kalle. And thanks to Timo Tossavainen for sharing his DSP knowledge! Copy and use this text freely.

by Olli Niemitalo, o@iki.fi

Note that "sample" can mean (1) a sampled sound or (2) a samplepoint!

Sampled sound data is a pile of samples, amplitude values taken from the actual sound wave. The sampling rate is the frequency of the "shots". For example, if the sampling rate is 44100 Hz, 44100 samples are taken every second. Here's an example of sampling:

[figure: a sound wave drawn as a curve, with the sampled points marked as 0s]

The original sound is the curve, and the 0s are the sampled points. The horizontal straight line is the zero level.

A sampled sound can only represent frequencies up to half the samplerate. This is called the Nyquist frequency.
An easy proof: you need to have stored at least two samplepoints per wave cycle, the top and the bottom of the wave, to be able to reconstruct it later on:

[figure]

If you try to include frequencies above the Nyquist frequency in your sampled sound, all you get is extra distortion, because they appear as lower frequencies.

A sound consists of frequency components. They all look exactly like sine waves, but they have different frequencies, phases and amplitudes. Let's look at a single frequency:

[figure]

Now we take the same frequency from another sound and notice that it has the same amplitude but the opposite phase (rotated 180 degrees). Merging two signals is done simply by adding them together. If we do that with these two sinewaves, the result will be:

[figure]

Silence. In the other cases, where the phase difference is less than 180 degrees, we get sinewaves that all have different amplitudes and phases, but the same frequency.

Here's a way to calculate the phase and the amplitude of the resulting sinewave. Convert the amplitude and the phase into one complex number, where the angle is the phase and the absolute value is the amplitude. If you do this to both of the sinewaves, you can add them together as complex numbers.

[figure: two complex numbers added together, giving (1,1)]

As you see, the phase of the new sinewave is 45 degrees and the amplitude is sqrt(1^2+1^2) = sqrt(2) = about 1.4.

It is very important that you understand this, because in many cases it is more practical to present the amplitude and the phase of a frequency as a complex number.

When adding two sampled sounds together, you may actually wipe out some frequencies, namely those that have opposite phases and equal amplitudes. The average amplitude of the resulting sound is (for independent originals) sqrt(a^2+b^2), where a and b are the amplitudes of the original signals.

The main use of a filter is to scale the amplitudes of the frequency components in a sound.
For example, a "lowpass filter" mutes all frequency components above the "cutoff frequency", in other words, multiplies their amplitudes by 0. It lets through all the frequencies below the cutoff frequency unattenuated. If you investigate the behaviour of a lowpass filter by driving sinewaves of different frequencies through it and measuring the amplifications, you get the "magnitude frequency response". Here's a plot of the magnitude frequency response curve of a lowpass filter:

[figure]

Frequency is on the horizontal axis and amplification on the vertical axis. As you see, the amplification (= scaling) of the frequencies below the cutoff frequency is 1, so their amplitudes are not affected in any way. The amplitudes of the frequencies above the cutoff frequency get multiplied by zero, so they vanish.

Filters never add any new frequency components to the sound. They can only scale the amplitudes of already existing frequencies. For example, if you have a completely silent sample, you can't get any sound out of it by filtering. Also, if you have a sine wave sample and filter it, the result will still be the same sine wave, only maybe with a different amplitude and phase - no other frequencies can appear.

Professionals never get tired of reminding us how important it is not to forget the phase. The frequency components in a sound have their amplitudes and... phases. If we take a sine wave and a cosine wave, we see that they look alike, but they have a phase difference of pi/2, one fourth of a full cycle. Also, when you play them, they sound alike. But try wearing a headset and playing the sinewave on the left channel and the cosine wave on the right channel. Now you hear the difference!

Phase itself doesn't contain information that is important to us, so it is not heard. The phase difference of a frequency between the two ears, however, can be used in estimating the position of the origin of the sound, so it is heard.
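The complex-number bookkeeping for amplitude and phase, described in the part on adding two sinewaves together, is easy to check numerically. The following is only a minimal sketch in Python using the standard `cmath` module; the function name `add_sinewaves` is made up for this illustration and does not come from the original text.

```python
import cmath
import math


def add_sinewaves(amp1, phase1, amp2, phase2):
    """Add two sinewaves of the same frequency.

    Each sinewave is given as (amplitude, phase in radians).
    Returns the (amplitude, phase) of the summed sinewave.
    """
    # Angle of the complex number = phase, absolute value = amplitude.
    total = cmath.rect(amp1, phase1) + cmath.rect(amp2, phase2)
    return abs(total), cmath.phase(total)


# Equal amplitudes, phases 0 and 90 degrees.
amp, phase = add_sinewaves(1.0, 0.0, 1.0, math.pi / 2)
print(round(amp, 3), round(math.degrees(phase), 1))

# Equal amplitudes, opposite phases: the sum is (numerically almost) silence.
silent_amp, _ = add_sinewaves(1.0, 0.0, 1.0, math.pi)
print(silent_amp < 1e-12)
```

The first case reproduces the amplitude of about 1.4 and the 45 degree phase mentioned earlier, and the second one the cancellation of two opposite-phase sinewaves.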
Filters have a magnitude frequency response, but they also have a phase frequency response. Here's an example curve that could be from a lowpass filter:

[figure]

If you filter a sound, the values from the phase frequency response are added to the phases of the frequencies of the original sound.

Linear (straight line) phase is the same thing as a plain delay, although it may look wild in the plot if it goes around several times. If, for example, your lowpass filter doesn't have a linear phase frequency response, you can't turn it into a highpass filter by simply subtracting its output from the original delayed by the same amount.

Complex math with filters

The response of a filter for a single frequency can be expressed as a complex number, where the angle is the phase response of the filter and the absolute value is the magnitude response. When you apply the filter to a sound, you actually do a complex multiplication of all the frequency components in the sound by the corresponding filter response values. (Read the chapter "Adding two sinewaves together" if you find this hard to understand.)

Example: the response of a filter is (0,1) at 1000Hz. You filter a sine wave of that same frequency, with its phase and amplitude information presented as the complex number (0,1):

(0,1) * (0,1) = (-1,0)

The phase of the sine wave got rotated 90 degrees. There was no change in amplitude.

Combining filters

The combined response of two filters put in series is the response of A multiplied by the response of B (complex numbers as always!). If you only need to know the magnitude response, you can just as well multiply the absolute values.

[figure: two filters fed from the same source, with their outputs added together]

In the figure, both filters get their input from the same source. Their outputs are then added back together, forming the final output. Now you need to use addition in solving the combined response.

FIR filters are more straightforward and easier to understand. "Finite impulse response" means that when the filter input has remained zero for a certain time, the filter output also becomes zero.
An "infinite impulse response" filter never fully settles down after the input is turned off, though it does get quieter and quieter.

A basic FIR filter could be:

output(t) = a0*input(t) + a1*input(t-1) + a2*input(t-2)

where "input" means the sample values fed to the filter. In this case, people would speak of a "3 tap" filter. It is up to the coefficients (a0, a1, a2) what this filter will do to the sound. Choosing the coefficient values is the hardest part, and we'll get to that later. To design your own filters, you need to understand some of the math behind them and know the right methods.

In the filter example above, only present and past input values are used. In realtime filters this is a requirement, because you don't know the future inputs. In sample editors and the like you don't have this limitation, because you have the whole input data ready when you begin. If your filter is:

output(t) = a0*input(t+1) + a1*input(t) + a2*input(t-1)

and you need a realtime version, just convert it to:

output(t) = a0*input(t) + a1*input(t-1) + a2*input(t-2)

The only difference is the one-sample delay in the realtime filter.

Unlike FIR filters, IIR filters also use their previous output values in creating their present output. Here's a simple example:

output(t) = a0*input(t) + a1*input(t-1) + a2*input(t-2) + b1*output(t-1) + b2*output(t-2) + b3*output(t-3)

This could be called a "3 input, 3 output tap" filter. IIR filters can never use future output values, because they don't exist yet!

There are several ways to implement the same IIR filter. Some may be faster than the usual input-output-and-coefficients way. Anyhow, every IIR filter can be written in this form, and this form must be used in filter design and examination calculations.

The impulse response (what a filter does to a one-samplepoint impulse) of an IIR filter often looks more or less like this in the sampledata:

[figure]

Some badly designed IIR filters are unstable. This results in the output getting louder and louder instead of quieter and quieter. A simple example of this is: output(t) = input(t) + 2*output(t-1).
As soon as it gets input data, it goes crazy.

The filter types described above process the data sample by sample. Not so if you implement your filter using FFT, the Fast Fourier Transformation. FFT usually operates on chunks of length 2^n.

First, you must have the planned impulse response of your filter ready. Then convert it, using FFT, into spectral information - complex numbers representing the phases and amplitudes of the frequency components. These components are called bins, because their frequencies are fixed and evenly distributed. If the original data contains frequencies that lie in between, most of the energy of such a frequency is distributed among the nearby bins.

Now you also FFT the sampledata you want to filter, and multiply the resulting frequency bins by those of the filter. IFFT (inverse FFT) is then used to convert the information back into filtered sampledata. So multiplying two sets of frequency domain data results in a convolution of the two sets of time domain data.

There's a catch, however: FFT operates on periodic signals. If your filter impulse response is as long as the FFT chunk, non-zero sampledata in the middle of the FFT chunk makes the "tail" of the filter wrap around the FFT boundary. To avoid this problem, you can use an FFT twice as long as the filter impulse response and, when doing the FFT on the sampledata, only fill the FFT input buffer up to half way and set the rest of the inputs to zero. For longer inputs, you process the data in such chunks and then add the resulting filtered chunks together. This is called the overlap-add method. Another choice is overlap-save (look that up if you want).

FFT can also be used for analyzing the frequency content of sampledata, for whatever reason.

If you just take a piece of sampledata, it has sharp edges, which is bad for FFT.
Windowing functions are used to smooth these edges. Raised cosine, cos(x pi/2)^2, is one possible windowing function. Here you see what happens when you apply that windowing function to a chunk of sampledata:

[figure]

Sometimes (resampling, exactly defined delays) you need to get sample values from between the known samplepoints. That's when you need interpolation. If you don't interpolate, and just throw away the fractional part of your sample offset, you get a lot of high frequency distortion:

[figure]

In the example, the original samplepoints try to represent a sine wave. The closer the interpolated curve is to a sine wave, the better the interpolation algorithm is. The simplest interpolation method is linear interpolation. Straight lines are drawn between adjacent samplepoints:

[figure]

It still looks quite "edgy" for a sine wave. However, the improvement over the uninterpolated version is significant. There's also a drawback - the frequencies just below the Nyquist frequency get attenuated, even more than with no interpolation. Here's the formula for linear interpolation:

new = old(int) + (old(int+1) - old(int)) * fract

where int means the integer part of the sample offset and fract the fractional part.

The next step up is the Hermite curve, which gives much better quality than linear interpolation:

[figure]

With linear interpolation, you need to know 2 samplepoints at a time to be able to draw the line. With the Hermite curve, the number is 4. The interpolation curve goes through the two middle points, and points 1 and 4 are used in shaping the curve. The formula is a cubic:

new = a*fract^3 + b*fract^2 + c*fract + d

And this is where a, b, c and d are solved from:

[equation]

A perfect interpolation also exists. By replacing all the samplepoints with correctly scaled sinc curves, sin(pi x)/(pi x), and by adding them together, you get the exact, perfect interpolation.
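The three interpolation methods described above can be compared with a short Python sketch. Some caution is needed: the linear formula follows the text, but the Hermite coefficients used here are the commonly used Catmull-Rom form, which is an assumption and may differ from the coefficient formulas the author had in mind. The sinc version simply sums one sinc curve per available samplepoint. All function names are made up for this illustration.

```python
import math


def sinc(x):
    """sin(pi x) / (pi x), with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def interpolate_linear(old, offset):
    """new = old(int) + (old(int+1) - old(int)) * fract, for offset >= 0."""
    i = int(offset)          # integer part of the sample offset
    fract = offset - i       # fractional part
    return old[i] + (old[i + 1] - old[i]) * fract


def interpolate_hermite(old, offset):
    """4-point cubic through old[i] and old[i+1] (Catmull-Rom form, assumed)."""
    i = int(offset)
    fract = offset - i
    y0, y1, y2, y3 = old[i - 1], old[i], old[i + 1], old[i + 2]
    a = (3.0 * (y1 - y2) - y0 + y3) / 2.0
    b = 2.0 * y2 + y0 - (5.0 * y1 + y3) / 2.0
    c = (y2 - y0) / 2.0
    return ((a * fract + b) * fract + c) * fract + y1


def interpolate_sinc(old, offset):
    """Sum of sinc curves, one per samplepoint, scaled by the sample values."""
    return sum(value * sinc(offset - n) for n, value in enumerate(old))


# A sine wave at 0.1 cycles per sample, read half way between two samplepoints.
samples = [math.sin(2 * math.pi * 0.1 * n) for n in range(200)]
offset = 100.5
exact = math.sin(2 * math.pi * 0.1 * offset)
for name, func in (("linear", interpolate_linear),
                   ("hermite", interpolate_hermite),
                   ("sinc", interpolate_sinc)):
    print(name, abs(func(samples, offset) - exact))
```

Because only 200 samplepoints are available, the sinc sum here is a truncated one, so its result is close to the true value but not exact.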
Here's one of the samplepoints replaced with a scaled sinc curve:

[figure]

The sinc curve is endlessly long, so you would have to use all the samplepoints in calculating one interpolated value. A practical solution is to limit the number of samplepoints to, say, 1000. That is still too slow for realtime applications, but it gives great accuracy. If you insist on using sinc in a realtime interpolation algorithm, try using a windowing function and a low number (at least 6) of sinc curves.

Downsampling

If you want to downsample (decrease the samplerate), you must first filter away the frequencies above the new Nyquist frequency, or they will appear as distortion in the downsampled sample.

In the process of filter design, you often need to make compromises.

To have sharp edges or steep slopes in the magnitude response, you need a big, and therefore slow, filter. In other words, filters with a low number of taps practically always have gently sloping magnitude responses.

In the case of IIR filters, sharp edges in the magnitude response often mean an ugly (very nonlinear) phase frequency response, and a close-to-linear phase response means a gently sloping magnitude response.

With FIR filters, an attempt to create very sharp edges may cause waving in the magnitudes of nearby frequencies.

IIR filters are great for realtime routines, because they are fast, their properties (for example the cutoff frequency) can be changed quickly in the middle of the action, and they sound like real analog filters. :) The nonlinear phase response of IIR filters usually doesn't matter.

FIR filters can be used where quality and linear phase are important, for example in a sample editor. People who filter signals other than sound often want a linear phase frequency response.

With stereo signals, it is important to have identical phase changes on the left and right channels.
Some filters and their stylized magnitude frequency responses:

[figure]

If you have a symbolic calculation program, I definitely recommend using it for the mechanical calculations, just to make your life easier. Derive is an old DOS program, but still a very useful one.

White noise

White noise means the kind of noise that has a flat spectrum. You can easily create it by using random numbers as sample values. If you want to know the magnitude frequency response of a filter, apply it to a long sample of white noise and then run a spectrum analysis on the output. What you see is the magnitude frequency response of the filter.

Another way is to send a one-sample impulse, which also has a flat spectrum. An impulse looks like this in the sampledata: 0, 0, 0, 0, 1, 0, 0, 0, 0 - where the impulse is the 1 in the middle. Of the two, the impulse is faster, but using white noise can give cleaner-looking results, because errors are less visible. For the same reason, when you are watching a video, a still picture looks more snowy than the running picture.

Taking a spectrum analysis of a long sample is usually done by dividing it into smaller pieces, analyzing them separately and then taking the average of all the analyses. My personal choice here would be the program Cool Edit 96, which is for Windows.

The pole-zero method is the easiest way of designing fast and simple IIR filters. When you have learned it, you'll be able to design filters by yourself.

Here's the complex "Z-plane", the one used in the pole-zero method:

[figure]

Imagine the frequencies wrapped around the unit circle. At angle 0 we have 0Hz, at pi/2 we have samplerate/4, and at pi we have samplerate/2, the Nyquist frequency.
You shouldn't care about higher frequencies, since they will never appear in the signal, but anyway, at 2pi (a full cycle) we have the sampling frequency.

So if you used a sampling frequency of 44100 Hz, 0 Hz would be at (1,0), 11025 Hz at (0,1) and 22050 Hz at (-1,0).

What are poles and zeros then? They are cute little things you can place on the Z-plane, like this:

[figure]

There are some rules you have to remember. Poles must always be inside the unit circle, never outside it or on it. Zeros can be put anywhere. You can use any number of poles and zeros, but they must all have "conjugate pairs" if they are not positioned on the horizontal axis. Conjugate pairs means that if you put, for example, a zero at (0.6, 0.3), you must put another zero at the conjugate coordinate, (0.6, -0.3). The same goes for poles.

But hey! What do poles and zeros DO? Poles amplify frequencies, zeros attenuate them. The closer a pole is to a frequency (on the unit circle, remember?), the more that frequency gets amplified. The closer a zero is to a frequency, the more it gets attenuated. A zero on the unit circle completely mutes the frequency it is sitting on.

Now could be the right time to try this out yourself. There are free filter design programs around that allow you to play with poles and zeros. One candidate is "QEDesign 1000 demo" for Windows. It's somewhere on the Internet, you'll find it.

Designing a bandpass filter

The simplest filter designed using the pole-zero method is the following bandpass filter:

[figure]

Poles amplify frequencies, so you could draw the conclusion that the most amplified frequency is the one at the same angle as the pole. And you would be almost right! The only problem comes from the conjugate pole, which also gives its own amplification. The effect is strongest at angles close to 0 and pi, where the distance between the two poles is the smallest.
But don't let this confuse you, we'll get back to it later.

So the angle of the pole determines the passband frequency. What's the effect of the absolute value (= radius) then? As stated, poles amplify frequencies, and the amplification is stronger when the pole is closer to the frequency. In our bandpass filter, increasing the radius of the pole makes the magnitude response steeper and the passband narrower, as you see here:

[figure: positions of the poles]

[figure: the corresponding magnitude frequency response plots (normalized)]

Let's call the radius "r" from now on. (Some of you might remember the letter "q" from analog, resonant filters. This is much the same.) In this case we have the limitation 0 <= r < 1, since poles must be inside the unit circle. So changing r changes the steepness, the resonance. This "resonance" is no magic, just one frequency being amplified more than the others.

From poles and zeros to filter coefficients

There is a transfer function:

H(z) = a0 * ((z-z1)(z-z2)(z-z3)...) / ((z-p1)(z-p2)(z-p3)...)

where z is the frequency, in the wrapped-around-the-unit-circle coordinate form. H(z) gives the (complex!) response of the filter at the frequency z. p1, p2, p3 and so on are the positions of the poles, and z1, z2, z3 and so on the positions of the zeros. a0 is the first input coefficient of the filter.

Here's the IIR filter formula again, in case you have forgotten:

output(t) = a0*input(t) + a1*input(t-1) + a2*input(t-2) + ... + b1*output(t-1) + b2*output(t-2) + ...

Our bandpass filter only has one pole and its conjugate pair, so we can simplify the transfer function:

H(z) = a0 / ((z-p1)(z-p2))

and replace p1 and p2 with the coordinates of the conjugate poles, (px, py) and (px, -py):

H(z) = a0 / ((z-(px+i*py))(z-(px-i*py)))

Let's give the divider a closer look:

(z-(px+i*py))(z-(px-i*py)) = z^2 - 2*px*z + (px^2+py^2)

Say we divide it by z^2:

z^0 - 2*px*z^(-1) + (px^2+py^2)*z^(-2)

The powers of z here are actually indexes to the output of the filter:

output(t) - 2*px*output(t-1) + (px^2+py^2)*output(t-2)

So we know how to calculate the output-side coefficients from the position of the pole:

b1 = 2*px
b2 = -(px^2+py^2)

OK! Say the passband frequency is on the Z-plane at the angle ph:

(cos(ph), sin(ph))

The pole is at the same angle as the frequency on the unit circle, but has radius r.
Therefore:

(px, py) = (r*cos(ph), r*sin(ph))

Now that we know how the position of the pole depends on the frequency, we can rewrite the output-side coefficients:

b1 = 2*r*cos(ph)
b2 = -r^2

But we mustn't forget the dividend (of the transfer function), where the powers of z are indexes to the input of the filter:

a0*z^0, which becomes a0*input(t)

This must be added to what we have already solved from the output side:

output(t) = a0*input(t) + 2*r*cos(ph)*output(t-1) - r^2*output(t-2)

Next we have to decide what to put in a0. This is called normalization. The purpose of a0 is just to scale the output of the filter. In our bandpass filter, we want the amplification at the passband frequency to be 1. So we can write the equation:

|H(z)| = 1, where z = (cos(ph), sin(ph))

which gives:

a0 = (1-r)*sqrt(r*(r-4*cos(ph)^2+2)+1)

There, now we have the filter ready:

output(t) = (1-r)*sqrt(r*(r-4*cos(ph)^2+2)+1)*input(t) + 2*r*cos(ph)*output(t-1) - r^2*output(t-2)

Improving the simple bandpass filter

We can compensate for the effect of the conjugate pole by adding a zero onto the horizontal axis, between the poles. For example, if we had poles at coordinates (0.6, 0.5) and (0.6, -0.5), we would put a zero at (0.6, 0):

[figure]

The transfer function for this is:

H(z) = a0 * (z-z1) / ((z-p1)(z-p2))

The output-side coefficients are exactly the same as before. The input-side coefficients can be solved like this:

[equation]

If you want to use this filter, you should be able to do the normalization yourself. I'm not going to do it here.

Words of wisdom

An easy way to make a filter more efficient: double all the poles and zeros. The frequency response of the new filter is the square of the old one. There are better ways, but this is the easiest.

If you put a zero on a pole, you neutralize both.

A pole outside the unit circle makes the filter unstable. A pole on the unit circle may turn the filter into an oscillator.

A large number of poles and zeros means a large number of taps.

Zeros affect the input coefficients, poles the output coefficients.

Poles and zeros must have conjugate pairs, because otherwise you would get complex filter coefficients and, consequently, a complex output signal.

With low r values, the most amplified frequency is not always at the same angle as the pole, because of the effect of the conjugate pole.
Try differentiating the magnitude response if you want exact precision.

An IIR filter with no poles is a FIR filter.

r < 1 always applies.

Bandpass with r: read the chapter "IIR filter design using pole-zero method".

Notch with r:

[figure]

The higher the r, the narrower the stopband.

Lowpass with r: this can be done in several ways:

[figure]

The higher the r, the stronger the resonance. The resonant lowpass filter is the most used filter type in synthesizers.

Allpass with r:

[figure]

Highpass with r:

[figure]

Impulse, sinc

If you read about sinc interpolation in the chapter "Interpolating sampled sound", you know that you can replace a single sample peak (impulse) in the sampledata with a properly stretched sinc function. Properly stretched means amplitude*sinc(t). When you run a spectrum analysis on an impulse, you get a flat spectrum with the upper limit at samplerate/2, the Nyquist frequency. Because impulse = sinc, this is also the spectrum of sinc:

[figure]

You could draw the conclusion that you get the sinc function if you sum together all frequencies from 0 to SR/2 and divide the sum by the number of frequencies, to fulfill the equation sinc(0) = 1. And you would be right.

From the spectrum analysis, you know that all the frequencies summed together have equal amplitudes. But what are their phases at the center of the impulse? The sinc function is symmetrical around x=0, and so is cosine - so sinc must be made of cosines. If you test this with about 100 cosines, you get a pretty close approximation of sinc.

The sum of all frequencies from 0 to 1 (where 1 corresponds to SR/2), divided by their number, can be written, for an infinite number of cosines, as an integral:

(integral of cos(k) dk from 0*x to 1*x) / x = sin(x)/x

As was done above, x must be replaced with pi t, because the cycle length of sin is 2 pi, which must be stretched to 2 (the wavelength of the Nyquist frequency in the sampledata).
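The claim that about 100 cosines already give a close approximation of sinc can be tried out with the following Python sketch. It is only an illustration: the function names are made up, and the choice to place the 100 frequencies at the midpoints of equally wide slices between 0 and the Nyquist frequency is an assumption, not something the text specifies.

```python
import math


def sinc(t):
    """sin(pi t) / (pi t), with the limit value 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def cosine_average(t, count=100):
    """Average of `count` equal-amplitude cosines, all with zero phase at t = 0.

    Frequencies run from 0 to 1, where 1 stands for the Nyquist frequency,
    and x is replaced with pi*t as described in the text.
    """
    x = math.pi * t
    total = sum(math.cos((k + 0.5) / count * x) for k in range(count))
    return total / count


# Compare the cosine sum against sinc at a few points.
for t in (0.0, 0.5, 1.0, 2.5, 4.0):
    print(t, round(sinc(t), 4), round(cosine_average(t), 4))
```

The two columns agree to about three decimals in this range, and the agreement improves as `count` grows.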
Phase shift

What if we replace the cosines with sines? Let's try it! There's a universal formula (which, btw, I invented myself) that we can use:

[equation]

Now, if we replace all impulses in the sound with this new function, we are actually performing a -90 degree phase shift! This can be done by creating a FIR filter where the coefficients are taken from this new function, (1-cos(pi t))/(pi t), but in reverse order, by replacing t with -t, so it becomes (cos(pi t)-1)/(pi t).

Here's an example that explains why it is necessary to use -t instead of t. Say you want to replace all the impulses in the signal with the sequence 1,2,3. If the input signal is 0,1,0, common sense says it should become 1,2,3. If you just use 1,2,3 as filter coefficients in that order, the filtered signal becomes 3,2,1. Not what you asked for! But if you use the coefficients 3,2,1, you get the right result.

OK, back to the -90 degree phase shift filter. When you pick the filter coefficients from (cos(pi t)-1)/(pi t), at t=0 you unfortunately get a division by zero. Avoid this by calculating the limit t->0, on paper or with a math proggy. If you use your brains a little, you'll notice it is 0, because the filter formula is a sum of sines, and sin(0) = 0, so at t=0 it is a sum of zeros.

Like sinc, our new function has no ends, so a compromise must be made in the number of taps. This causes waves in the magnitude response, and attenuation of the very lowest and highest frequencies. By applying a windowing function to the coefficients you can get rid of the waves, but I don't know of anything that would help with the attenuation, except more taps.

Windowing functions used with FFT work here too. The center of the windowing function must be at t=0, and it must be stretched so that its ends lie on the first and the last tap.
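The coefficient-order example with 1,2,3 and the phase shift coefficient formula can both be checked with a small Python sketch. The helper names are hypothetical, and the tap convention (the middle coefficient multiplies input(t), earlier coefficients multiply earlier inputs) is an assumption chosen to match the example in the text.

```python
import math


def apply_taps(signal, taps):
    """output(t) = sum over j of taps[j] * input(t + j - center).

    Inputs outside the signal are treated as zero.
    """
    center = len(taps) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, a in enumerate(taps):
            i = t + j - center
            if 0 <= i < len(signal):
                acc += a * signal[i]
        out.append(acc)
    return out


def phase_shift_coefficient(t):
    """(cos(pi t) - 1) / (pi t), using the limit value 0 at t = 0."""
    return 0.0 if t == 0 else (math.cos(math.pi * t) - 1.0) / (math.pi * t)


# Coefficients in the "natural" order give the sequence backwards...
print(apply_taps([0, 1, 0], [1, 2, 3]))
# ...and reversed coefficients give what was asked for.
print(apply_taps([0, 1, 0], [3, 2, 1]))

# An 11-tap version of the phase shift filter, t = -5 .. 5.
taps = [phase_shift_coefficient(t) for t in range(-5, 6)]
print([round(a, 4) for a in taps])
```

The printed taps are antisymmetrical around t=0 and practically zero at every even t, which is what the formula predicts since cos(pi t) = 1 there.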
You can also get a phase shift of any angle a:

(sin(a) - sin(a - pi t)) / (pi t)

Notice that the reversal of t has already been done here, so we can take the coefficients directly from this formula. The limit t->0 is naturally cos(a), because all the cosines added together have the phase a at x=0.

If you haven't noticed yet, the main idea in FIR filter design is to create a function that contains the frequencies you want to pass through the filter. The amplitudes of the frequencies in the function directly define the magnitude frequency response of the filter. The phases of the frequencies define the phase response. Reversing the coefficients is only needed with phase shifting filters, because filters that don't introduce any phase shift are symmetrical around t=0.

Defining the range of included frequencies

If you use the sinc formula as your filter coefficients, you actually do no filtering, because all the frequencies from 0 to Nyquist are equally present in sinc. Here you'll see how you can select which frequencies are present in your filter coefficient formula. Remember where we originally got sinc from:

(integral of cos(k) dk from 0*x to 1*x) / x = sin(x)/x

In the integral, the upper limit (1x) actually represents the highest included frequency (1), and the lower limit (0x) the lowest (0). So if you want a formula for a bandpass filter, you can write:

(integral of cos(k) dk from lower*x to upper*x) / x = (sin(upper*x) - sin(lower*x)) / x

where upper and lower are the cutoff frequencies, expressed so that 1 means the Nyquist frequency and 0 means 0Hz. Now just put in the frequencies you want, calculate, and replace x with (pi t). For example, if you want to make a halfband lowpass filter (which naturally has its cutoff frequency at samplerate/4, the same as Nyquist frequency/2):

sin(0.5 pi t) / (pi t)

To make a multi-band filter, you can combine several bandpass filter formulas by adding them together.
Equalizer example

If you want to make an equalizer (a filter that allows you to define the magnitudes for certain frequencies), you would probably sum together many bandpass filter formulas, each scaled by the magnitude you want for its frequency segment. This gives you a magnitude response that looks very much like it is made of bricks:

[figure]

Maybe you would want it to look more like this:

[figure]

There are three ways. The first way is to use smaller bricks, meaning that you divide the frequencies into narrower segments than before and use interpolation to get the magnitude values for the new, narrow bandpass filters that you then combine. The second way is to define a polynomial (like ax^3+bx^2+cx+d) that has the wanted characteristics (and where x=1 represents freq=SR/2), and to make the magnitude response of your filter follow it. This is possible. The third way is to add together several bandpass magnitude-ramp filter formulas. In the magnitude response, this solution looks like straight lines drawn between neighbouring defined frequencies. This is also possible and, in my opinion, the best solution.

Polynomial shaped magnitude frequency response

In sinc, all the cosine waves added together have equal amplitudes, as you see here - all frequencies are treated equally:

[equation]

You can change this by putting in a function g() that defines the amplitudes of the cosine waves of different frequencies:

[equation]

If the function g(x) is of the form ax^b, the calculations go like this:

[equation]

As a simple example, if we want the magnitude frequency response to be a straight line, starting from 0 at 0Hz and ending at 1 at SR/2, we define g(x) = x. The calculation of the filter coefficient formula for this:

[equation]

In other cases, to get the formula for a full polynomial, do the calculations for each of its terms (ax^b) separately and sum the results.
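For the g(x) = x example, integrating k*cos(k*x) over k from 0 to 1 by parts gives sin(x)/x + (cos(x)-1)/x^2, with x replaced by pi t. That closed form was worked out for this sketch and is not quoted from the text, so treat it as an assumption. The Python code below compares it with a brute-force numerical version of the same integral; the function names are made up.

```python
import math


def ramp_coefficient(t):
    """Closed form of the integral of k*cos(k*pi*t) dk from 0 to 1 (g(x) = x).

    At t = 0 the limit is 0.5, the area under the straight-line response.
    """
    if t == 0:
        return 0.5
    x = math.pi * t
    return math.sin(x) / x + (math.cos(x) - 1.0) / (x * x)


def ramp_coefficient_numeric(t, steps=2000):
    """The same integral as a plain midpoint sum of weighted cosines."""
    x = math.pi * t
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        freq = (k + 0.5) * h              # frequency, 0..1 (1 = Nyquist)
        total += freq * math.cos(freq * x)  # amplitude g(freq) = freq
    return total * h


for t in range(0, 4):
    print(t, round(ramp_coefficient(t), 6), round(ramp_coefficient_numeric(t), 6))
```

The two columns agree closely, which suggests the closed form is a usable coefficient formula for the straight-line response.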
Bandpass magnitude-ramp

Here's an example of the magnitude frequency response of a ramp filter:

[figure]

To make a bandpass ramp, you must first define the polynomial g(x) that describes how the magnitude behaves inside the bandpass limits. The magnitude is linear inside the limits, so the polynomial g(x) must be of the form cx+d. c and d can be solved from the equations:

y1 = c*x1 + d
y2 = c*x2 + d

where x1 is the lower frequency limit and x2 the higher. y1 and y2 are the magnitudes at the limit frequencies. Remember that here x=1 is the same as frequency=SR/2. OK, here are c and d solved:

c = (y2 - y1) / (x2 - x1)
d = (x2*y1 - x1*y2) / (x2 - x1)

g(x) = cx+d is a polynomial, and you already know how to make the magnitude frequency response take the shape of a polynomial (section "Polynomial shaped magnitude frequency response"). You also already know how to include only a limited range of frequencies in your coefficient formula (section "Defining the range of included frequencies"). Combine this knowledge, and you can write the coefficient formula for the ramp bandpass filter:

[equation]

A note about implementing the equalizer: if the equalizer has to be adjustable in realtime, recalculating the whole equalizer filter formula with all its trigonometric functions can turn out to be too heavy. It may be better to precalculate the coefficients for several overlapping filters, for example for a three-channel equalizer:

[figure]

When calculating the coefficients for the whole equalizer, just pick the corresponding coefficients from these, scale them according to the equalizer sliders, and sum.

If you take the coefficients of a FIR filter directly from your filter formula, you get a wavy magnitude response. The reason is simple: the number of coefficients is limited, but the filter formula is not. It continues to have nonzero values outside the range you use for the coefficients.

Windowing functions help. Not using a windowing function is the same thing as using a rectangular (flat inside its limits) windowing function.
Using a windowing function means that you multiply the values taken from your infinitely long filter formula by the corresponding values taken from your finitely long windowing function, and use the results as filter coefficients. Here are some windowing functions, and the resulting magnitude responses of a FIR lowpass filter with a low number of taps, illustrated:

As you see, the steeper the cutoff, the more waves you get. Also, if we looked at the magnitude responses on a dB scale, we'd notice that of the three, cos^4 gives the best stopband (the frequency range that should have zero magnitude) attenuation.

Mathematically, multiplication in the time domain is convolution in the frequency domain, and windowing is exactly that. (Also, multiplication in the frequency domain is convolution in the time domain.) I hope I didn't slam too many new words in your face. The time domain means the familiar time-amplitude world, where we do all the FIR and IIR filtering. The frequency domain means the frequency-amplitude-and-phase world that you get into through the Fourier transformation. And convolution? In the time domain, FIR filtering is convolution of the input signal with the filter coefficients. Say you convolve 0,1,0,0,2,0,1,0 with 1,2,3 (where 2 is at the center): you'll get 1,2,3,2,4,7,2,3. If you understand this example, you surely understand convolution too.

Ideally (impossible), there would be no windowing, just the constant value 1 infinitely in time. A steady constant value in the time domain is the same as 0 Hz in the frequency domain, and if you (in the frequency domain) convolve with 0 Hz, it is the same as no convolution.

Convolution in the frequency domain equals multiplication in the time domain, and convolution in the time domain equals multiplication in the frequency domain. Sounds simple, eh? But note that in this frequency domain there are positive AND NEGATIVE frequencies. You'll learn about those in the chapter "Positive and negative frequencies".
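The convolution example and the "multiply by a window" step can be tried out with a few lines of Python. This is only a sketch: the function names are invented for the example, the lowpass formula assumes the cutoff is given as a fraction of the Nyquist frequency, and the exact shape of the cos^4 window (reaching zero one step past the last tap) is an assumption rather than something the text specifies.

```python
import math

def convolve_centered(signal, kernel):
    """Convolve signal with an odd-length kernel whose middle tap is the center.
    Samples outside the signal are treated as zero."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0
        for k, coef in enumerate(kernel):
            i = t - (k - half)          # kernel tap k sits at offset k - half
            if 0 <= i < len(signal):
                acc += coef * signal[i]
        out.append(acc)
    return out

def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def windowed_lowpass(half_width, cutoff):
    """Lowpass taps for t = -half_width..half_width, windowed with cos^4.
    cutoff is a fraction (0..1) of the Nyquist frequency."""
    taps = []
    for t in range(-half_width, half_width + 1):
        ideal = cutoff * sinc(cutoff * t)                            # value from the endless formula
        window = math.cos(0.5 * math.pi * t / (half_width + 1)) ** 4  # finite window
        taps.append(ideal * window)
    return taps

result = convolve_centered([0, 1, 0, 0, 2, 0, 1, 0], [1, 2, 3])
taps = windowed_lowpass(4, 0.5)
```

Here `result` reproduces the sequence from the text, and `taps` holds nine windowed coefficients that are symmetric around the center tap.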
Words of wisdom

You get a flat (but not necessarily continuous) phase response if your filter (the filter coefficients) is symmetrical or antisymmetrical (sides are symmetrical but have opposite signs, and the center crosses zero) around t=0, even if you limit the number of coefficients and/or window them.

Sometimes you can optimize your filter code a lot. Some coefficients may turn out to be zero, so you can skip their multiplications. If your filter is symmetrical around t=0, then instead of input(t)*a + input(-t)*a you can write (input(t) + input(-t))*a. If your filter is antisymmetrical around t=0, replace input(t)*a - input(-t)*a with (input(t) - input(-t))*a.

Sinc(t) is 1 at t=0, and 0 at all other integer values of t.

Calculating the limit t->0 is very simple. If your filter formula was originally a sum of cosines (meaning it's not a phase shift filter), the limit t->0 is simply the area under the magnitude frequency response, scaled so that the area of "no filtering" is 1.

The actual filter implementation (after possible coefficient calculations) depends a lot on how the input data is fed to the filter. I can see three cases:

1) You have the whole input data in front of you right when you start. A sample editor is a good example of this. This is the easiest case. With FIR filters, just take values from the input data, multiply them by the coefficients and sum, like this:

    output(t) = a0*input(t-2) + a1*input(t-1) + a2*input(t) + a3*input(t+1) + a4*input(t+2)

The only problem is what to do at the start and at the end of the input table, because reading data from outside it would only cause problems and unpredictability. A lazy but well-working solution is to pad the input data with zeroes, like this:

This is how it's mostly done with FFT filtering. With FIR filters, it isn't that hard to write a version of the routine that only uses a limited range of its taps, like this:

and to use that version at the start and at the end. For this, it is easiest if you have a table of coefficients instead of hard-coding them into the routine.
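The whole-data case with zero padding, and the symmetric-filter shortcut mentioned under "Words of wisdom", might look roughly like this in Python. It is a sketch under the assumption of an odd number of taps centered on input(t); the function names are illustrative only.

```python
def fir_filter(data, coefs):
    """output(t) = sum over k of coefs[k] * input(t + k - half),
    with zeroes assumed outside the data (zero padding)."""
    half = len(coefs) // 2
    padded = [0.0] * half + list(data) + [0.0] * half
    return [sum(c * padded[t + k] for k, c in enumerate(coefs))
            for t in range(len(data))]

def fir_filter_symmetric(data, coefs):
    """Same result for coefficients symmetric around the center tap,
    using (input(t+k) + input(t-k)) * a to halve the multiplications."""
    half = len(coefs) // 2
    padded = [0.0] * half + list(data) + [0.0] * half
    out = []
    for t in range(len(data)):
        c = t + half                      # position of input(t) in the padded table
        acc = coefs[half] * padded[c]
        for k in range(1, half + 1):
            acc += coefs[half + k] * (padded[c + k] + padded[c - k])
        out.append(acc)
    return out

smoothed = fir_filter([1, 2, 3, 4], [0.25, 0.5, 0.25])
```

Keeping the coefficients in a table, as the text recommends, is what lets one routine serve any filter length.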
2) Data is fed to the filter in small chunks, but it is continuous over the chunk borders. This is the most common situation in programs handling realtime audio.

3) One sample at a time. Case 2 can be treated as this, because the chunks can always be chopped into single samples. It is a fact that you cannot use future inputs in this case, so a FIR filter would have to be of a form such as:

    output(t) = a0*input(t-4) + a1*input(t-3) + a2*input(t-2) + a3*input(t-1) + a4*input(t)

Clearly this kind of filter creates a delay, but that's just a thing you have to learn to live with. Also, you only get one sample in at a time, which is not enough for filtering, so you have to store the old input values somehow. This is done using a circular buffer. The buffer is circular because otherwise you'd soon run out of memory. Here's a set of pictures to explain the scheme:

The buffer must be at least as long as the filter, but it is practical to set the length to an integer power of 2 (in the above example: 2^5 = 32), because then you can use the binary AND operation to handle pointer wrapping after every increment or decrement (in the above example, AND with 31). Even better, use byte or word instructions, and wrapping will be handled automatically by the overflows and underflows caused by the natural limits of a byte or word. Note that the buffer should be filled with zeroes before starting.

A similar circular buffer scheme is also often the best solution for implementing the output part of an IIR filter, no matter how the input part was realized.

There are both positive and negative frequencies. Until now we haven't had to know this, because we have been able to do all the calculations by using sines as frequencies. Don't be fooled into thinking that positive frequencies would be sines and negative ones something else, because that is not the case. In all real (meaning, not complex) signals, positive and negative frequencies are equal, whereas in a complex signal the positive and negative frequencies don't depend on each other.
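Going back to the one-sample-at-a-time case: the circular buffer scheme with a power-of-two length and AND-masked wrapping can be sketched in Python as below. The class and method names are invented for this example, and the coefficient order follows the formula in the text (the last coefficient multiplies the newest input).

```python
class CircularFir:
    """FIR filter fed one sample at a time, storing old inputs in a circular buffer."""

    def __init__(self, coefs):
        self.coefs = list(coefs)
        size = 1
        while size < len(self.coefs):     # round the length up to a power of two
            size *= 2
        self.mask = size - 1              # e.g. size 32 gives mask 31
        self.buf = [0.0] * size           # buffer starts filled with zeroes
        self.pos = 0

    def process(self, sample):
        self.buf[self.pos] = sample
        n = len(self.coefs)
        acc = 0.0
        for k, c in enumerate(self.coefs):
            # coefs[k] multiplies input(t - (n-1) + k); AND handles the wrapping
            acc += c * self.buf[(self.pos - (n - 1) + k) & self.mask]
        self.pos = (self.pos + 1) & self.mask
        return acc

fir = CircularFir([1, 2, 3])
impulse_response = [fir.process(x) for x in [1, 0, 0, 0]]
```

Python integers do not overflow, so the byte/word trick from the text has no direct equivalent here; the explicit mask plays that role.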
A single (real) sinewave consists of a positive and a negative frequency. So any sine frequency can be expressed as the sum of its positive and negative components. A single, positive or negative, frequency is:

    e^(i*f*x)

and could also be written as:

    cos(f*x) + i*sin(f*x)

As stated, a sinewave consists of a positive and a negative frequency component. Here's the proof:

    e^(i*(f*x + a)) + e^(-i*(f*x + a)) = 2*cos(f*x + a)

(The phase of the negative frequency must also be inverted, because it rotates in the other direction.) As you see, the imaginary parts cancel each other, and all that remains is the real part, the sine wave. The amplitude of the sine wave is the sum of the amplitudes of the positive and the negative frequency components (which are the same). This also proves that in any real signal, positive and negative frequencies are equal, because a real signal can be constructed from sine waves.

The complex Z-plane is a good place to look at positive and negative frequencies:

Positive frequencies are on the upper half of the circle and negative frequencies on the lower half. They meet at angle 0 (0 Hz) and at the Nyquist frequency.

Aliasing usually means that when you try to create a sine wave of a frequency greater than the Nyquist frequency, you get another frequency, below the Nyquist frequency, as the result. The new frequency looks as if the original frequency had been reflected around the Nyquist frequency. Here's an example:

The cause of aliasing can be easily explained with positive and negative frequencies. The positive component of the sine wave actually goes over the Nyquist frequency, but as it follows the unit circle, it ends up on the side of the negative frequencies. And, for the same reason, the negative component arrives on the side of the positive frequencies:

The result is a sine wave of frequency SR-f.

Analytic signal

It is sometimes necessary to first create a version of the original signal that contains only the positive frequencies. A signal like that is called an analytic signal, and it is complex.
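The aliasing rule stated above (a frequency f above Nyquist shows up as SR-f) is easy to check numerically. A small Python sketch follows; the sample rate and frequency are arbitrary example values, and a cosine is used so that the two sampled waves match exactly rather than differing in sign.

```python
import math

def sampled_cosine(freq, sample_rate, count):
    """Sample points of a cosine wave of the given frequency."""
    return [math.cos(2 * math.pi * freq * n / sample_rate) for n in range(count)]

SR = 44100
f = 30000                                  # above the Nyquist frequency (22050 Hz)
above = sampled_cosine(f, SR, 64)
alias = sampled_cosine(SR - f, SR, 64)     # 14100 Hz, below Nyquist
max_diff = max(abs(a - b) for a, b in zip(above, alias))
```

`max_diff` stays at rounding-error level: once sampled, the two frequencies are indistinguishable.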
How does one get rid of the negative frequencies? Through filtering! It is possible to do the job with an IIR filter that doesn't follow the conjugate-pair-poles-and-zeros rule, but a FIR filter is significantly easier to create. We'll use the old formula that we first used to create sinc:

but this time, instead of cosines, only including the positive frequencies:

As you see, the filter coefficients are complex. We should also halve the amplitude of the positive frequency (it should be half of the amplitude of the cosine, because the negative component is gone), but that's not necessary, because it would only scale the magnitude.

To convert the complex analytic signal back to real, just throw away the imaginary parts, and all the frequencies get a conjugate (on the z-plane) pair frequency. Here the amplitudes drop to half, but as we skipped the halving in the filtering phase, it is only welcome.

The real-to-analytic signal conversion could also be a good spot for filtering the signal in other ways, because you can combine other filters with the negative frequency removal filter.

Amplitude modulation

Amplitude modulation means multiplying two signals. All samplepoints in the modulated signal are multiplied by the corresponding samplepoints in the modulator signal. Here's an example:

What happens if we modulate a signal with a sinewave? The original signal is (as we have learned) a sum of frequency components, sinewaves of various frequencies, amplitudes and phases. Note that the signal we are talking about here is real, not complex. Say s1, s2, s3 and so on are the frequency components. So, we can write the original signal as:

    s1 + s2 + s3 + ...

Now, if we multiply this signal by the modulator signal m, we get:

    (s1 + s2 + s3 + ...)*m = s1*m + s2*m + s3*m + ...

This is good, because as you see, it's the same as if the frequency components were processed separately, so we can also look at what happens to each frequency component separately. A frequency component can be written as:

    amp*cos(f*x + a)

where amp is the amplitude, f the frequency and a the phase.
The modulator sine can be written the same way (only the letter m is added):

    mamp*cos(mf*x + ma)

Multiply those and you get:

    0.5*amp*mamp*(cos((f+mf)*x + a+ma) + cos((f-mf)*x + a-ma))

If we discard the phase and amplitude information, we get:

    cos((f+mf)*x) + cos((f-mf)*x)

which is two frequencies instead of the original one. Here's a graph that shows how the frequencies get shifted and copied. The original frequency is on the "-" (horizontal) axis and the resulting frequency or frequencies on the "|" (vertical) axis:

In the graph "Modulated", the frequencies that would seem to go below zero get aliased and therefore reflect back above zero. In a sampled signal, the Nyquist frequency also mirrors the frequencies.

Frequency shifting

With some tweaking and limitations, you could make a frequency shifter by using sinewave modulation, but there's a better way. Let's try modulating the signal with e^(i*mf*x) instead of cos(mf*x). Phases and amplitudes are irrelevant, so I've just ignored them. (I hope you don't mind.) Let's see what happens to a single positive/negative frequency when it is modulated:

    e^(i*f*x) * e^(i*mf*x) = e^(i*(f+mf)*x)

The answer is very simple. The original frequency got shifted by the modulator frequency. Notice how the rule "Multiplication in the time domain is convolution in the frequency domain" applies here also.

Here's an example on the z-plane unit circle. p0, p1, p2 are the positive frequencies and n0, n1, n2 their negative conjugate frequencies. Say the modulator frequency rotates the frequencies 1/4 of a full cycle counterclockwise:

In the modulated signal, the original pair frequencies (like p0 and n0) are no longer conjugate pairs. That's bad. Another bad thing is that negative frequencies end up on the side of the positive frequencies and vice versa.
But if we first filter out all the negative frequencies, and those positive frequencies that would arrive on the wrong side of the circle, and then modulate the filtered signal:

(The filter formula is in the chapter "A collection of FIR filters", in the section "Combined negative frequency removal and bandpass".)

Now it looks better! To make this filtered and modulated complex signal real again, just discard the imaginary part, and all the frequencies get a conjugate pair:

For most sounds, frequency shifting doesn't do a very good job, because they consist of a fundamental frequency and its harmonics. Harmonic frequencies are integer multiples of the fundamental frequency. After you have shifted all these frequencies by the same constant frequency, they are no longer harmonics of the fundamental frequency. There are ways to do scaling instead of shifting, but just scaling the frequencies would be the same as resampling, and resampling also stretches the sound in time, so it has to be something smarter. The main idea is to divide the sound into narrow frequency bands and to shift or scale them separately.

OK, so frequencies usually come with harmonics. Why? Just think where sounds in nature originate from: vocal cords in our throat, guitar strings, air inside a flute. All of them are vibrating objects, and you have probably learned at school that objects have several frequencies at which they like to vibrate, and those frequencies are harmonics of some frequency.

What happens in those objects is that they get energy from somewhere (moving air, the player's fingers, air turbulence), which starts all kinds of vibrations and frequencies travelling in them. When the frequencies get reflected, or say, go around a church bell, they meet other copies of themselves. If the copies are in the same phase when they meet, they amplify each other. In opposite phases they attenuate each other. Soon, only a few frequencies remain, and these frequencies are all harmonics of the same frequency.
As so often in physics, this is just a simplified model.

(A note about notation: the fundamental frequency itself is called the 1st harmonic, fundamental*2 the 2nd, fundamental*3 the 3rd, and so on.)

Chromatic scale

In music, harmonics play a very important role. The chromatic scale, used in most western music, is divided into octaves, and each octave is divided into 12 notes. The step between two adjacent notes is called a halftone. A halftone is divided into a hundred cents. An octave up (12 halftones) means doubling the frequency; an octave down (-12 halftones) means halving it.

If we look at all the notes defined in the chromatic scale on a logarithmic frequency scale, we note that they are evenly spaced. This means that the ratio between the frequencies of any two adjacent notes is a constant. From the definition of the octave, constant^12 = 2, so constant = 2^(1/12) = 1.059463. If you know the frequency of a note and want the frequency of the note n halftones up from it (use negative n to go downwards), the new frequency is 2^(n/12) times the old frequency. If you want to go n octaves up, multiply by 2^n.

But why 12 notes per octave? As said, harmonics are important, so it would be a good thing to have a scale where you can form harmonics. Let's see how well the chromatic scale can represent harmonics. The first harmonic is at the note itself: 0 halftones = ratio 1. The second harmonic is at 1 octave = ratio 2. The third harmonic is very close to 1 octave + 7 halftones = 19 halftones = ratio 2^(19/12) = 2.996614. And so on. Here's a table that shows how, and how well, harmonics can be constructed:

Not bad at all! The lowest harmonics are the most important, and as you see, the errors with them are tiny. I also tried this with numbers other than 12, but 12 was clearly the best of those below 30.
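The harmonic-versus-halftone comparison can be recomputed with a short Python sketch. This is not the original table, just one way to produce similar numbers: for each harmonic it picks the nearest whole number of halftones and reports the resulting frequency ratio and the error in cents. The function names are made up for the example.

```python
import math

def nearest_halftones(harmonic):
    """Whole number of halftones above the fundamental closest to the harmonic."""
    return round(12 * math.log2(harmonic))

def harmonic_table(max_harmonic=8):
    rows = []
    for h in range(1, max_harmonic + 1):
        n = nearest_halftones(h)
        ratio = 2 ** (n / 12)                        # frequency ratio of that note
        cents_error = 1200 * math.log2(ratio / h)    # 100 cents = 1 halftone
        rows.append((h, n, ratio, cents_error))
    return rows

for h, n, ratio, err in harmonic_table():
    print(f"harmonic {h}: {n:3d} halftones, ratio {ratio:.6f}, error {err:+.2f} cents")
```

Replacing the two 12s with another number of notes per octave (and 1200 with 100 times that number) is a quick way to repeat the "other numbers than 12" experiment.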
So, the ancient Chinese made a very good choice!

The above table could also be used as a reference when tuning an instrument, for example a piano (bad example, no digital tuning in pianos), to play some keys and chords more beautifully by forcing some notes to be exact harmonics of some other notes.

A common agreement is that one of the notes, middle A, is defined to be at 440 Hz. This is just to ensure that different instruments are in tune.

Flanger is simply:

    output(t) = input(t) + input(t-d)

where d is the length of the variable delay. d values have a lower limit, and the variation comes from a sine:

Because d is not an integer, we must interpolate. Most probably, annoying high-frequency hissing still appears. It can be reduced by lowpass filtering the delayed signal.

Wavetable synthesis means that the instruments being played are constructed from sampled sound data. MOD music is a well-known example. Most of the basic home synthesizers also use wavetable synthesis.

Say you have a sampled instrument, and want to play it at frequency f = 440 Hz, which is middle A in the chromatic scale. To be able to do this, you need to know A) the samplerate of the sample and the frequency of the sampled instrument, or B) the wavelength of the instrument expressed as a number of samples (it doesn't have to be an integer). So you decide to precalculate the wavelength to speed up the realtime routines a little:

    ol = (samplerate of the sample) / (frequency of the sampled instrument)

The samplerate of your mixing system, SR, is 44100 Hz. Now that you know this, you can calculate the new wavelength, the one you want (as a number of samples):

    nl = SR / f = 44100 / 440 = 100.22727

In the mixer inner loop, a sample offset variable is used to point into the sampledata. Every time a value is read from the sampledata and output for further mixing, the sample offset is advanced by adding the variable A to it.
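The inner loop just described could be sketched in Python roughly as follows. The add value is simply passed in as a parameter here. Reading between samplepoints with linear interpolation is an assumption of this sketch (the text only says a value is read at the sample offset), and the function name is illustrative.

```python
def play_sample(sample_data, add_value, count):
    """Produce up to `count` output values by stepping through sample_data,
    advancing the sample offset by add_value after every output."""
    out = []
    offset = 0.0
    for _ in range(count):
        i = int(offset)
        if i >= len(sample_data):
            break                               # ran past the end of the sample
        frac = offset - i
        nxt = sample_data[i + 1] if i + 1 < len(sample_data) else 0.0
        # Linear interpolation between neighboring samplepoints (an assumption).
        out.append(sample_data[i] + frac * (nxt - sample_data[i]))
        offset += add_value
    return out

faster = play_sample([0, 10, 20, 30, 40], 2.0, 3)   # add value above 1: higher pitch
slower = play_sample([0, 10, 20, 30, 40], 0.5, 4)   # add value below 1: lower pitch
```

An add value of 1 plays the sample at its original pitch; larger values skip through the data faster and raise the pitch.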
Now we must define A so that ol (256) is stretched (here shortened) to nl (100.22727), in other words, so that for ol samplepoints in the sampledata, you produce nl output values:

    A = ol / nl

Everything on one line:

    A = ol * f / SR

That's it! By using A as the add value, you get the right tone.

Click removal

There are some situations in which unwanted clicks appear in the output sound of a simple wavetable synthesizer:

- Abrupt volume (or panning/balance) changes.
- A sample starts to play and it doesn't start from zero amplitude.
- A sample is played to the end and it doesn't end at zero amplitude. (Biased sampledata or a badly cut sample.)
- A sample is killed abruptly. This mostly happens when new notes kill the old ones.
- Poor loops in a sample.

And what helps? Here's some advice:

Volume changes must be smoothed, maybe ramped, so that it always takes a short time for the new volume to replace the old.

Clicky sample starts can be muffled, meaning that the volume is first set to zero and then slid up. This could of course be done beforehand too, and some think muffling sample starts is wrong, because the click may be deliberate. Some drum sounds lose a lot of their power when the starts are muffled. Another case is when a sample is not played from its beginning. That will most probably cause a click, but muffling is not the only aid: starting to play from the nearest zero crossing also helps.

Abrupt sample ends should also be faded down. This may require some sort of prediction, if you want to fade down the sound before it's run over by another sound. This prediction can be made by using a short information delay buffer. It may be easier to just use more channels, to allow the new sound to start while the other one is being faded out in the background, on another channel.

When the sampledata ends at a value other than zero, the cause may be that the sampledata is not centered around the zero level, or that the creator of the sample has just cut the end of the sample away.
The easiest way to fix this is to fade out the end of the sample beforehand. However, this is not always possible.

Symmetric form
Turning an IIR filter backwards
Getting rid of output(t+n)
Getting rid of input(t+n)
FIR frequency response
IIR frequency response

Olli wrote that he tried to make his text as down-to-earth as possible. Well, here's a more mathematical approach. But I've still tried to make this intuitive and FUN rather than boring myself with lengthy proofs. This also means that there may be errors, most probably in signs.

Symmetric form

Say you have this IIR filter:

You can put its equation into this symmetric form:

Now define a new function, middle(t):

You can rewrite this as:

Notice how the transition from input(t) to middle(t) is a FIR filter and the transition from output(t) to middle(t) is another. So the IIR filter in fact consists of two FIR filters facing each other. This gives a simple approach to frequency response calculations (see the section "IIR frequency response").

Turning an IIR filter backwards

You can solve input(t) from the IIR equation:

Now swap input and output and you have a filter that undoes what the original did. But if the frequency response of the original filter was ZERO for some frequency, the inverted one will amplify that frequency INFINITELY. This is just logical. The inverted filter will also have the opposite phase shift, so that if R(f) is the frequency response of the original filter as a complex number and r(f) is the frequency response of the inverted filter, R(f)*r(f) = 1 for every f.

Getting rid of output(t+n)

Say you have somehow found that you need an IIR filter like this:

You need to know both output(t+2) and output(t-2) to be able to compute output(t). Doesn't seem very practical. But you can shuffle the equation a little:

Now define a new variable u = t+2 and use it instead of t:

Then solve output(u):

Now you can use the filter.

Getting rid of input(t+n)

Notice how in the previous example, input(t) became input(u-2).
Had there been input(t+1), it would have become input(u-1), which can be used in real-time filters. Generally, you can get rid of input(t+n) this way if the equation also uses output(t+m) where m>=n, because you can define u = t+m, which turns input(t+n) into input(u-(m-n)), which you get in time. If m<n, this is not possible:

Here m = 0 and n = 1, so you can't get rid of input(t+1) and keep the filter mathematically equivalent to the original. However, you can delay the output by one time unit:

Usually, this small delay doesn't matter. But it changes the phase frequency response of the filter, and this DOES matter if you then mix the filtered signal with the original one or with others derived from it. In that case, you'd better make sure that all of the signals have the same delay. (Except if you happen to like the extra effect.) (For example, a filter output(t) = input(t-1) doesn't do much as such. But if you mix the filtered signal with the original one, the mixing becomes a filter in itself, and you can compute its frequency response and all.)

If you try to force the original filter through the u = t+m trick by introducing a dummy 0*output(t+1) term:

you'll just get a division by zero.

FIR frequency response

Treat a sine wave as a rotating phasor e^(i*t*2*pi*f/fs), where:

The real component of this phasor is the regular sine wave. The neat thing about this is that you can multiply it by various complex numbers to scale the magnitude and shift the phase at the same time. By defining z = e^(i*2*pi*f/fs), the phasor can be written as z^t. This is the same z that is used in pole-zero calculations (see the chapter "IIR filter design using pole-zero method").

Here's the general FIR equation:

Now, let's look at what the filter does to an infinitely long sine wave with frequency f.

But this sine wave can be replaced with the rotating phasor if we then throw away the imaginary component of the output. m(k) is real, so the real and imaginary components can't affect each other.
Here the z^t factor doesn't depend on k, so it can be moved outside the sum:

z depends on f (z = e^(i*2*pi*f/fs), remember), but the value of the sum doesn't depend on t. I'll call it R(f):

output(t) is a rotating phasor at the same frequency as input(t); it just has a different amplitude and phase, as defined by R(f). This means that for an infinitely long sine wave of frequency f, R(f) shows how the filter affects its amplitude and phase. In other words, R(f) is the frequency response of the filter. It's a complex function. If you don't remember what this means, see the section "Complex math with filters" in the chapter "What's a filter?" in this file.

IIR frequency response

When two filters are concatenated so that one filter's output is fed to the other filter's input, the responses are multiplied at each frequency:

A filter that just connects its input to its output and doesn't change the signal at all has a frequency response of 1 at all frequencies:

Now assume that we have a filter with frequency response R(f), and we make another filter with frequency response Rinv(f) that UNDOES everything the first filter did to the signal when they are concatenated.

So the inverse filter also has an inverse frequency response.

Remember, an IIR filter consists of two FIR filters facing each other (see the section "Symmetric form"). This setup can be treated as a normal FIR filter followed by an inverted FIR filter:

This means that if you can calculate the frequency responses of the two FIR filters, you can calculate the IIR frequency response by dividing one by the other.

An example. You have this IIR filter.

Change the names of the functions a little:

Compute the frequency response of the filter input1->output1 (originally input->middle).
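The frequency response recipe described above (sum the coefficients weighted by powers of z for a FIR filter, then divide two such sums for an IIR filter) can be sketched in Python. The indexing convention is an assumption of this sketch: a FIR filter is taken to be output(t) = sum of m(k)*input(t-k) for k = 0, 1, 2, ..., and the IIR filter is taken in a symmetric form where a FIR of the output equals a FIR of the input. The function names are made up.

```python
import cmath
import math

def fir_response(coefs, f, fs):
    """R(f) = sum over k of m(k) * z^(-k), with z = e^(i*2*pi*f/fs),
    for a filter output(t) = sum over k of m(k) * input(t-k)."""
    z = cmath.exp(2j * math.pi * f / fs)
    return sum(m * z ** (-k) for k, m in enumerate(coefs))

def iir_response(input_coefs, output_coefs, f, fs):
    """Response of an IIR filter written in the symmetric form
    sum of output_coefs[k]*output(t-k) = sum of input_coefs[k]*input(t-k):
    the input-side FIR response divided by the output-side FIR response."""
    return fir_response(input_coefs, f, fs) / fir_response(output_coefs, f, fs)

fs = 44100
# Two-point average: passes 0 Hz untouched, removes the Nyquist frequency.
avg_dc = abs(fir_response([0.5, 0.5], 0, fs))
avg_nyquist = abs(fir_response([0.5, 0.5], fs / 2, fs))
# One-pole lowpass: output(t) - 0.5*output(t-1) = 0.5*input(t)
lp_dc = abs(iir_response([0.5], [1.0, -0.5], 0, fs))
lp_nyquist = abs(iir_response([0.5], [1.0, -0.5], fs / 2, fs))
```

Taking `abs()` of the complex result gives the magnitude response; `cmath.phase()` would give the phase shift.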
The general formulas:

In this particular case:

The input2->output2 (originally output->middle) filter:

Now the whole IIR:

To actually calculate the frequency response at some frequency, you'd apply Euler's formula and the usual complex number rules:

R in the filters means resonance, steepness and narrowness.

Fastest and simplest lowpass ever

Fast lowpass with resonance v1

19 Comments

Thanks for posting this. It's a nice collection of audio DSP nuggets. May I suggest that the URL at the top of the original text document (iki.fi/o/dsp/dspstuff.txt) be pointed directly to this page.
Comment by ColdCold, 2009-11-16 16:06

Thanks Mate, Greatly appreciate this tutorial. DSP in simple terms is not easy to come by on the Web.
Comment by Don, 2010-05-10 04:29

Thanks a lot. Very useful concepts explained in a lucid manner.
Comment by Ravi, 2010-08-30 14:59

Hi, About notch filter.. Why can't I get the frequency cut effect?

    Sample rate: 1600
    freq = 1950
    q = 0.1
    z1x = cos(2*pi*freq/samplerate)
    a0a2 = ((1-q)*(1-q))/(2*(fabs(z1x)+1)) + q
    a1 = -2*z1x*a0a2
    b1 = 2*z1x*q
    b2 = -(q*q)

    frequency: 1950.000000
    q: 0.100000
    z1x: 0.195090
    a0a2: 0.438887
    a1: -0.171245
    b1: 0.039018
    b2: -0.010000

Each sample calculation:

    reg0 = a0a2 * ((double)samplecurrent + sampleminus2) + a1*sampleminus1 + b1*reg1 + b2*reg2
    reg2 = reg1
    reg1 = reg0

Is it correct? Output is clean voice, but the 1950 Hz carrier is still there. BR
Comment by Alexander Vangelov, 2011-03-16 22:46

Freq should be between 0 and samplerate/2.
(Just a quick comment before I go to bed)
Comment by Olli Niemitalo, 2011-03-17 00:53

Thank you, it works :) I missed a zero digit in the parameters (just before I go to bed)

    Sample rate: 16000
    Freq: 1950.000000
    q: 0.400000
    z1x: 0.720854
    a0a2: 0.504599
    a1: -0.727484
    b1: -0.576683
    b2: 0.160000

Comment by Alexander Vangelov, 2011-03-17 10:43

Very good tutorial, thanks
Comment by Vadim, 2011-10-11 19:42

man, this is the best introduction (covering all topics) into DSP I stumbled upon. perhaps I do have a chance to pass the exam. :D

sorry, for a double post. but... can you attest everything is correct? for example, "You can use any number of poles and zeros, but they must all have "conjugate pairs", if they are not positioned on the "-" axis." is this true? I'm playing with applets that allow for poles without conjugate pairs, and seemingly band-pass filters (with regard to the magnitude response) can be built this way. can you please explain? ( laps. fri. uni-lj. sidpsarhivappletiisipsystemv4.0srcapplet. html )

Doug, it is true, IF you want the filter to have a real output, not complex. If you make a bandpass with just one pole, and have the pole so close to the unit circle that the filter output is pretty much a single frequency, then the output of the filter will be a complex phasor rotating in one direction on the complex plane. If you switch the sign of the imaginary part of the position of the pole, then you get as output a phasor that rotates in the opposite direction. If you have poles in both of those positions, then the output must contain both of those complex phasors in equal parts, thus the imaginary parts of the phasors cancel each other. So you get as output a real sinusoid. Good luck with the exam!
Comment by Olli Niemitalo, 2011-12-27 13:41

This is the first cogent explanation of poles and zeros that I have ever received. I feel better and worse at the same time, if you know what I mean. In any case.
THANK YOU
Comment by Mark McConnell, 2012-05-09 01:12

... Yehar's Digital Sound Processing Tutorial for the Braindead ...

Nice Job Men... I found it very helpful. Thank you. Can you put implementation of audio effects in computer?
Comment by Trnform3r, 2012-09-16 10:07

Sure, for example as a VST effect.
Comment by Olli Niemitalo, 2012-09-16 22:14

This is fantastic, nice work and a very good explanation of DSP. Thank you :D
Comment by tor, 2013-02-16 01:42

Thank you so much for this informative writing on the subject, which makes life much easier, since nowhere could I find any book on the subject which makes it as clear as you did here. Keep it going and thank you again.
Comment by FJ Botha, 2015-02-21 10:14

Frickin delicious! Seriously, I thank people like you for simply existing and count my blessings that I found this brilliant introduction you created. The note taking/outline is digestible in one bite and it will stick with me during my upcoming solo winter sound holiday to the pampa and Magellan's strait, the large uninhabited Falkland rock, and if I'm still alive, Christmas Island. Dec to Feb. I hope to capture enough sound to keep me glazed and deadeyed until black metal villains capture Oslo.
Comment by Mick Dkaye, 2016-10-18 19:13

And love that Black Deck. Masonna weeps.
Comment by Mick Dkaye, 2016-10-18 19:16

5/15/2004 News: Bicubic resampling

Long, lengthy rant^H^H^H^Hdiscourse on 3D to follow.

One of the features I've been working on for 1.6.0 is the ability to do bicubic resampling in the video displays using hardware 3D support. We've been using simply bilinear for too long, and it's time we had better quality zooms accelerated on the video card. Problem is, 3D pipelines aren't really set up for generic FIR filters, so the task is to convolute and mutate the traditional 4x4 kernel into something that a GPU understands.
To review, the 1D cubic interpolation filter used in VirtualDub is a 4-tap filter defined as follows:

    tap 1 = Ax - 2Ax^2 + Ax^3
    tap 2 = 1 - (A+3)x^2 + (A+2)x^3
    tap 3 = -Ax + (2A+3)x^2 - (A+2)x^3
    tap 4 = Ax^2 - Ax^3

where taps 2 and 3 straddle the desired point and x is the fractional distance from tap 2 to that point. Applying this both horizontally and vertically gives the bicubic filter. The fact that you calculate the 2D filter as two 1D passes means that the 2D filter is separable; this reduces the number of effective taps for the 2D filter from 16 to 8. We can do this on a GPU by doing the horizontal pass into a render target texture, then using that as the source for a vertical pass. As we will see, this is rather important on the lower-end 3D cards.

Now, how many different problems did I encounter implementing this? Let's start with the most powerful cards and work down:

DX9, some DX8 class cards (Pixel Shader 1.4+: NVIDIA GeForce FX, ATI RADEON 8500+)

Six texture stages, high-precision fixed point arithmetic or possibly even floating-point. There really isn't any challenge to this one whatsoever, as you simply bind the source texture to the first four texture stages, bind a filter LUT to the fifth texture stage, and multiply-add them all together in a simple PS1.4 shader. On top of that, you have fill rate that is obscene for this task, so performance is essentially a non-issue. Total passes: two.

NVIDIA has some interesting shaders in their FXComposer tool for doing bicubic interpolation using Pixel Shader 2.0 in a single pass, without any need for temporaries. However, it chews up a ton of shader resources and burns a ton of clocks per pixel (I think the compiler said somewhere around 50 clocks). I'm not sure that's faster than a separable method, and it chews up a lot of shader resources. Did I mention it requires PS2.0? It does compute a more precise filter, however.
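Going back to the 4-tap cubic filter defined at the start of this section, the tap formulas are easy to sanity-check in a few lines of Python. This is only an illustrative sketch: the function name is made up, and the default A = -0.75 is an assumed example value for the free parameter, not something stated above.

```python
def cubic_taps(x, A=-0.75):
    """Four cubic interpolation taps for fractional position x in [0, 1].
    A is the filter's free parameter (the default is an assumed example value)."""
    x2, x3 = x * x, x * x * x
    return (
        A * x - 2 * A * x2 + A * x3,                # tap 1
        1 - (A + 3) * x2 + (A + 2) * x3,            # tap 2
        -A * x + (2 * A + 3) * x2 - (A + 2) * x3,   # tap 3
        A * x2 - A * x3,                            # tap 4
    )

def interpolate(p1, p2, p3, p4, x, A=-0.75):
    """Weighted sum of four neighboring pixels; p2 and p3 straddle the point."""
    t = cubic_taps(x, A)
    return t[0] * p1 + t[1] * p2 + t[2] * p3 + t[3] * p4

middle = cubic_taps(0.5)
```

The taps always sum to 1, and for a negative A the two outer taps are never positive, which is why the lower-end paths have to treat "the two negative taps" separately.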
I might add a single-pass PS2.0 path because it offers the possibility of more advanced effects, such as doing warpsharp in the pixel shader.

I have a GeForce FX 5600 now, but when I first wrote this path, I had no PS1.4 capable card, so I had to prototype on the D3D reference rasterizer. Refrast's awe-inspiring 0.2 fps performance gives new meaning to slow. Unfortunately, I think refrast is still a procedural rasterizer, like old OpenGL implementations; just about all other current software rasterizers now use dynamic code generation and run orders of magnitude faster.

DX8 class card (Pixel Shader 1.1: NVIDIA GeForce 3/4)

Four texture stages: not quite enough for single-pass 4-tap, so we must do two passes per axis. Now we run into a problem: the framebuffer is limited to 8-bit unsigned values, and more importantly, can't hold negative values. The way we get around this is to compute the absolute value of the two negative taps first into the framebuffer, then combine that with the sum of the two positive taps using REVSUBTRACT as the framebuffer blending mode. Sadly, clamping to [0,1] occurs before blending and there is no way to do a 2X on the blend, so we must throw away 1 LSB of the image and burn a pass doubling the image, bringing the total to five passes. And no, I won't consider whacking the gamma ramp of the whole screen to avoid the last pass.

DX7 class card (Fixed function, two texture stages: NVIDIA GeForce 2)

This is where things get uglier. Only two texture stages means we can only compute one tap at a time, since we need one of the stages for the filter LUT. This means that 9 passes are required: four for the horizontal filter, four for the vertical, and one to double the result. As you may have guessed, a GF2 or GF4Go doesn't have a whole lot of fill rate after dividing by nine, and I have trouble getting this mode working at 30 fps above about 800x600. That sucks, because my development platform is a GF4Go440.
I came up with an alternate way to heavily abuse the diffuse channel in order to do one tap per texture stage: draw one-pixel wide strips of constant filter (vertical for the horizontal pass, horizontal for the vertical pass) and put the filter coefficients in the diffuse color. This cuts the number of passes down to five, as with the GF3/4 path. Unfortunately, this turns out to be slower than the nine pass method. I doubt it's T&L load, because 5600 triangles in 4 batches isn't exactly a complex scene; more likely the render target textures are tiled and I'm blowing the tiling pattern by drawing strips. Sigh.

I've been racking my brain trying to bring this one below nine passes, but I haven't come up with anything other than the method above that didn't work.

DX7 class card (Fixed function, three texture stages: ATI RADEON)

Three texture stages means we can easily do two taps at a time for a total of five passes, which should put the original ATI RADEON on par with the GeForce 3 for this operation. Yay for ATI and the third texture stage!

Oh wait, this card doesn't support alternate framebuffer blending operations and thus can't subtract on blend. On top of that, D3D lets us complement on input to a blending stage but not output, and we can't do the multiply-add until the final stage. Never mind, the original RADEON sucks.

So now what? We first compute the two negative taps using the ugly but useful D3DTOP_MODULATEALPHA_ADDCOLOR. How do we handle the negation? By clearing the render target to 50% gray and then doing a blend into it with a source factor of zero and a destination factor of INVSRCCOLOR, basically computing 0.5*(1-src). We then add the two positive taps with their filter scaled down by 50%, using a straight 1:1 additive blend. The result is the filtered pixel, shifted into the [0.5, 1] range.
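The arithmetic of those two blends can be checked with a small floating-point model. This is only a sketch of the math as described in the paragraph above: it ignores 8-bit quantization, treats each pass as a single scalar per pixel, and the function name `radeon_horizontal_model` and its parameters are hypothetical.

```python
def clamp01(v):
    """Clamp to the [0, 1] range that a fixed-point framebuffer can hold."""
    return min(max(v, 0.0), 1.0)


def radeon_horizontal_model(neg_sum, pos_sum):
    """Model the two framebuffer blends for one pixel.

    neg_sum: absolute value of the two negative taps' combined contribution
    pos_sum: combined contribution of the two positive taps at full scale
    Returns the render target value, expected to equal
    0.5 + 0.5 * (pos_sum - neg_sum).
    """
    target = 0.5  # render target cleared to 50% gray

    # Blend 1: source factor ZERO, destination factor INVSRCCOLOR,
    # i.e. result = src * 0 + dest * (1 - src) = 0.5 * (1 - neg_sum).
    target = clamp01(neg_sum * 0.0 + target * (1.0 - neg_sum))

    # Blend 2: positive taps drawn with the filter scaled by 50%,
    # combined with a 1:1 additive blend.
    target = clamp01(clamp01(0.5 * pos_sum) + target)
    return target
```

For example, with `neg_sum = 0.1` and `pos_sum = 0.7` the filtered value is 0.6, and the model yields 0.5 * 0.9 + 0.35 = 0.8, which is 0.5 + 0.5 * 0.6: the filtered pixel at half scale, offset into the upper half of the range.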
The vertical pass is computed similarly, but with input complement on both passes to flip the result inverted to [0, 0.5], after which we can take advantage of the already narrowed input range to compute the vertical filter with slightly higher precision. (The filtering operation is linear and can be commuted with the complement.) The final pass then doubles the result with input complementation again to produce the correct output.

Rather fugly, but it does work. The precision isn't great, though: slightly worse than the GeForce 2 mode. Interestingly, the RADEON doesn't really run any better than the GeForce 2 despite having half the passes.

DX0 class card (Intel Pentium 4-M 1.6GHz)

Here's the sad part: a highly optimized SSE2 bicubic routine can stretch a 320x240 image to 1280x960 at 30fps and still leave enough time left over to upload the result to the video card. That means systems with moderate GPUs and fast CPUs are better off just doing the bicubic stretch on the CPU. Argh.

You might be wondering why I'm using Direct3D instead of OpenGL. That is a valid question, given that I don't really like Direct3D (which I affectionately call "caps bit hell"). The reason is that I wrote a basic OpenGL display driver for 1.5.5 and found that it was unusable due to a bug in the NVIDIA drivers that caused a stall of up to ten seconds when switching between display contexts. The code has shipped and is in 1.5.10, but is hard-coded off in VideoDisplayDrivers.cpp. I might resurrect it again, as NVIDIA reportedly exposes a number of features in their hardware in OpenGL that are not available in Direct3D, such as the full register combiners, and particularly the final combiner. However, I doubt that there's anything I can use, because the two critical features I need for improving the GF2 path are either doubling the result of the framebuffer blend or another texture stage, both of which are doubtful.

4/26/2004 News: YV12 is b0rked

My daily commute takes me across the San Mateo Bridge.
Coming back from the Peninsula there is a sign that says: "Emergency parking: 1/4 mile."

Several people suggested __declspec(naked) for the intrinsics code generation problem. Sorry, not good enough. Not only does naked disable the frame pointer omission (FPO) optimization and prevent inlining, but it also doesn't stop the compiler from using spill space if it needs to, which means you basically have to set up a stack frame anyway.

I've been trying for some time to get YV12 support working perfectly, but at this point it looks like a wash. The problem is that different drivers and applications are inconsistent about how they treat or format odd-width and odd-height YV12 images. Some support it by truncating the chroma planes (dumb). Some do that and have unused space between the Cr and Cb planes (weird). Many simply crash (very dumb). And a few simply don't support it (lame but pragmatic). YVU9 tends to be even more broken. Arrrgh.

Now, if people had sense, they would have handled this the way that MPEG and JPEG do, and simply required that the bitmap always be padded to the nearest even boundaries and that the extra pixels be ignored on decoding. Unfortunately, no one seems to have bothered to ever define the YV12 format properly in this regard, and thus we have massive confusion.

3/17/2004 News: Taking the 64-bit plunge

First, I finally fixed the FAQ link from the program, and also updated the knowledge base for known bugs in 1.5.10. I dropped the older KB entries, but they're basically redundant with the change log in VirtualDub.

I put together a new Athlon 64 based system a few days ago, installed the preview of Windows XP for 64-bit Extended Systems and the prerelease AMD64 compiler, and hacked up the VirtualDub source code a bit. The above is the result.
It plays MPEG-1 files, but nearly all of the assembly optimizations are disabled and none of the video filters work, so it's still far behind the 32-bit version, but it's still neat to be able to experiment with 64-bit code.

Although I have had it installed for some time now, I've been avoiding using Visual Studio 2003. The incremental improvements in the compiler simply aren't worth putting up with the braindead, butt-slow IDE. Thus, I've been continuing to use Visual C++ 6.0 SP5+PP. Well, after installing XP64, I was vindicated: none of the VS IDEs will install on it, because they rely on the 32-bit .NET Framework, which currently doesn't work under WOW64. Which means I'm using... VC6, with the pre-release VC8 compiler from the Windows Server 2003 DDK. This is a bit clumsy since the VC6 debugger doesn't understand VC7 debug info, and certainly can't debug a 64-bit app, so I have to use the beta AMD64 WinDbg instead, but at least I have the AMD64 build in the same project file as the 32-bit build. Having a configuration called "Win32 Release AMD64" is a bit weird, however.

There are two major bottlenecks to getting VirtualDub running smoothly on AMD64: the compiler doesn't support inline assembly, and the OS doesn't support MMX for 64-bit tasks. I know a few of you are going to yell out "use compiler intrinsics," but please look at this first:

[code listings not preserved; the last one is labeled Visual Studio 2003]

The code is at least correct this time, but it is still full of unnecessary data movement, which consumes decode and execution bandwidth. Now for the real kicker: those extraneous moves hurt on a Pentium 4, because on a P4, a register-to-register MMX/SSE/SSE2 move has a latency of 6 clocks. If you have extra shifter or ALU bandwidth you can attack this by replacing movdqa with pshufd or pxor/por, but you can't do this when the compiler is generating code from intrinsics.
And before you say that performance doesn't matter so much, remember that the purpose of those intrinsics is so that you can optimize hotspots using CPU-specific optimizations. 10-20% in a critical inner loop matters.

This all only pertains to the Microsoft Visual C++ compiler, and as it turns out, the Intel C/C++ Compiler generates much better MMX and SSE2 code. I suspect recent versions of gcc would beat MSVC too. As it stands right now, though, I still have to use Visual C++, and that means I'm still going to have to hand-roll a lot of assembly code for performance. And with AMD64, that means I'm going to have to duplicate and reflow a lot of it.

2/9/2004 News: Playing hooky

I finally caught that nasty head cold that seems to be travelling everywhere this month. It doesn't make you very ill, but just makes you stuffy, irritable, grumpy, and hoarse, which was sufficient for me to call in sick for the first time in quite a while. I probably could have gone in and gotten some work done, but hacking and coughing is not a great sound for your coworkers to hear. Besides, we already recorded all the sound effects we need.

While sitting at home enduring a feeling that can only be described as oogy, I did the only thing I can do at such a time: code.

The Super Nintendo version of Konami's Tokimeki Memorial plays a tune during its main game start screen that I've been trying to find for a while; it's a fast version of Shiori's theme. I have several versions of it called "yuunagi tayori," but they range from slow to annoyingly slow to even painfully slow, none of which I like nearly as much. Collecting Tokimemo CDs has given me a much nicer collection of music to listen to, but none of them have the song I want. And I can play the .spc version of it through WinAmp, but it sounds... like an SNES. So I decided to write my own SPC player, to learn how SNES music works and (maybe) extract the music into nicer formats.
SNES music involves two major components: an 8-channel DSP that generates sound, and an SPC700 microcontroller to control it. The SPC700 is an interesting beast. Its instruction set shares a lot in common with that of the venerable 6502, but unfortunately the instruction decoding template isn't quite as regular, as some oddball instructions decode two effective addresses. I've been doing x86 so long that I kept making stupid mistakes in the CPU emulation, such as goofing the carry bit on SBC (it's flipped on 6502/SPC700 vs. x86), and forgetting to commit the result of ADC back to the register file. For several hours I ended up staring at instruction execution traces, trying to figure out which of the 150 instructions that just executed didn't work correctly, in foreign code that I haven't dissected properly, on a foreign CPU that I've never coded for before.

Somehow, it sorta works right now. Compressed sample decoding isn't quite right and the ADSR envelopes are bogus, but the melody is there and the instruments are recognizable. What's weird is that the CPU emulation is still screwed up in a way that causes the various tracks to eventually desync. Apparently, Konami's sound player has independent data streams for each of the eight sound channels. The result is that the different bass and melody tracks eventually rotate away from each other in the looping portion of the song, and it sounds pretty cool, like a remix. I'm going to have to save off this version and figure out how to reproduce this bug later, after I've gotten the emulation working correctly.

Either that, or it goes on Avery's Random Pile of Half-Baked and Vaguely Working Projects. But before that can happen, it has to be named after a totally unrelated anime character, like Atsuko (my fanfic reader), or Takuya (my x86 dynamic recompiler).

VirtualDub 1.6.0 is still in progress.
I'm getting closer to declaring alpha and starting to clean up and stabilize the build, but I'm not there yet; I have a couple of half-baked features I have to decide yay/nay on, and I need to check that I haven't forgotten any major promised ones.

2/3/2004 News: Fix for AVI files being locked after seeing them in Windows XP Explorer

Thanks to an observant user on the forums, I found one reason for Windows XP Explorer locking AVI files after opening their parent folder:

If you have VirtualDub's frameserver installed in proxy mode (proxyon.reg) on a Windows XP system, turn it off with the proxyoff.reg file and then restart Windows or log out of your current session so Explorer restarts.

When proxying is enabled, VirtualDub installs its frameserver under the regular Windows AVI driver, tunneling AVI files through to AVIFile and AVS files through to Avisynth. The tunnel code leaks reference counts on the tunneled objects, and this causes the corresponding files to stay locked in read-only mode until the process dies... which in this case is Explorer. My bad; I will fix this for 1.6.0. If proxying is not enabled, this only affects .vdr files.

I'm surprised it took this long to find out the cause. The good news is that this also explains why some applications don't work properly even with proxy mode.

1/27/2004 News: Inbox is offline

Hadn't planned to post a news entry today, but a note: thanks to the new f*&(ing virus going around, my Inbox is for all intents and purposes offline. By offline, I mean I am receiving what looks to be about 8-10 emails per minute right now. If you have something to email me about in the next couple of days, you may want to wait until the storm has quieted down, especially since my email account is likely to become full at this rate while I am sleeping or at work.
Those of you who still want to email me, please use a distinctive subject and don't title your email with something stupid like "problem"; such emails are hard to pick out amongst the viruses and spam, and with the volume I'm seeing right now I'm liable to delete them.

12/14/2003 News: 3D solution

No one solved the 3D unprojection problem before I did, so here's the solution.

The original problem outlined last time, for those who didn't see it or don't remember, is to find a transform to unproject an image to a rectangle, given only its projection and four source points on the 2D projection image. So for those of you who thought Z values from the depth buffer were available, bzzzzzzzt. Depth buffers don't come attached with video frames.

The coordinates of normalized device coordinate (NDC) space are [-1, 1] for both the X and Y axes, after perspective division. Given four points (x_n, y_n) in original image space, we need to derive a 3x3 matrix transform that produces four homogeneous points (x'_n, y'_n, w'_n) that map to the NDC space corners. The matrix will be applied as follows:

[equation not preserved]

Requiring the four points to map to the corners of NDC space results in the following constraints:

[equation not preserved]

Simple algebraic transformation turns the divides into additions and subtractions:

[equation not preserved]

The matrix transform can then be used to substitute the original points for the post-transform points:

[equation not preserved]

This is an underconstrained linear system with eight equations and nine unknowns: no good. However, a homogeneous 3x3 transform can be arbitrarily scaled, so setting I = 1 drops the ninth unknown:

[equation not preserved]

The result is now solvable via simple Gaussian elimination, and the 3x3 matrix can be converted to standard 4x4 form by setting Z = 0 on input and output.

I am not entirely sure that I = 1 is safe, but it appears so. The only case in which I = 1 is impossible is if w must be zero at (x, y) = (0, 0), which means that the unprojected form requires that point to be at infinity.
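To make the steps of this solution concrete, here is a small Python sketch of the solve. The equation images from this entry are not available here, so the row layout, the corner order in `NDC_CORNERS`, and the helper names (`solve_linear`, `unprojection_matrix`, `project`) are all assumptions for illustration rather than the code used in VirtualDub. Writing the matrix rows as (A, B, C), (D, E, F), (G, H, I) with I = 1, each source point (x, y) that should land on corner (u, v) contributes two linear equations: A x + B y + C - u (G x + H y) = u and D x + E y + F - v (G x + H y) = v.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: degenerate quad or I = 1 unusable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = aug[r][n] - sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = acc / aug[r][r]
    return solution


# Assumed corner order: the k-th source point maps to the k-th NDC corner.
NDC_CORNERS = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]


def unprojection_matrix(points):
    """Build the 3x3 matrix (with I fixed to 1) mapping four points to NDC corners."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(points, NDC_CORNERS):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])  # x'/w' = u
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])  # y'/w' = v
        rhs.append(v)
    a, b, c, d, e, f, g, h = solve_linear(rows, rhs)
    return [[a, b, c], [d, e, f], [g, h, 1.0]]


def project(m, x, y):
    """Apply the matrix to (x, y, 1) and perform the perspective divide."""
    xp = m[0][0] * x + m[0][1] * y + m[0][2]
    yp = m[1][0] * x + m[1][1] * y + m[1][2]
    wp = m[2][0] * x + m[2][1] * y + m[2][2]
    return xp / wp, yp / wp
```

Calling `unprojection_matrix` with the four picked points and then `project` on each of them should return the corresponding NDC corner, up to rounding. If the true solution has w' = 0 at the origin, `solve_linear` raises instead of returning a matrix, which corresponds to the case where I = 1 is impossible.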
Needing a point at infinity at the origin seems extremely rare, even when (0,0) is not within the projected image. In any event, the transform works and is more stable than I'd thought it would be, so I'm sticking with it.

The fact that a single 3x3 transform can encompass all possible required rotations, shears, translations, flips, and projections is not obvious, but it is true. The algorithm can also be used to transform any convex quad into any other convex quad, simply by applying it twice: once on the source points, and a second time with the target points and the resultant matrix inverted.

12/6/2003 News: 3D fun

I suppose by this point I should really label this page the "blog" page rather than the "news" page, but oh well.

I'm currently developing a 3D software rasterizer in the 1.6.x branch. Why? Because I can. Actually, one has been in VirtualDub for quite some time now: it's in the module for the About box. Transform and lighting, triangle setup, rasterization, and texturing are all done manually and run full speed even on a lowly Pentium. Of course, it's generally only drawing a few thousand pixels per frame. But I digress.

Why do I need a 3D rasterizer? I got the idea to write a deprojection filter, to correct for an off-center camera position. (If you always choreograph your shots perfectly at the correct angle at the right time, good for you.) The speed sucks right now at about 8 Mpixels/sec with mipmap generation, trilinear filtering, and per-pixel perspective correction enabled, but I can speed that up later. I had to make sure there were no dropouts, that subpixel addressing was working, that the lambda determination worked properly, etc. first, and at least the image quality is good.

The problem I'm having right now is not in the triangle rasterizer, but in the filter that uses it. I have the forward transform determination working nicely, where a flat image is pasted onto an oblique plane, but I can't figure out the reverse transform, to convert the projected image back to the view plane.
I have no need for correct depth coordinates since I'm not doing Z- or W-buffering, and thus I think I can abuse homogeneous transforms to get the required mapping. I have a plain old 4x4 OpenGL-style matrix, so I can do just about any such transform. The problem is that I don't know how to derive the transform from a projected rectangle on a plane to the viewport, since the transformation ultimately isn't linear. Ideally, the user would only specify the corners of the projected rectangle, and appropriate depth values would be inferred to create the transform.

I have a nagging suspicion that the quad-to-quad transform isn't hard, but that I might need projective texturing to unwarp the source texture correctly. With projective texturing, not only is the destination coordinate interpolated homogeneously as (x/w, y/w, z/w, 1/w), but so is the texture coordinate, as (s/q, t/q, 1/q). The usual way to handle this is apparently to turn the 1/w divide into q/w, making q almost free. Unfortunately, my texture mapper is not straight divide or affine subdivision based; it uses Newton-Raphson iteration to compute 1/w per pixel, and so I'd have to throw in an extra multiply. Thus it wouldn't be so cheap to add it, and I'd like to do the deprojection solely using 1/w if possible.

The bonus of getting this transform right is that if I were to enable the existing OpenGL path in the display code, or add a Direct3D version to it, I could trivially plunk the transform into the projection matrix and do the de-projection in real time for free on previews. But I have to get the algorithm working first.

Any 3D experts reading this that are bored and willing to explain the solution to me? My copies of Real-Time Rendering and Jim Blinn's Corner: A Trip Down the Graphics Pipeline aren't helping. :)

12/2/2003 News: VirtualDub 1.5.10 released

VirtualDub 1.5.10 is out. It fixes a couple of critical crashes, one of them being the VideoCD crash, and the other being a stability issue on Windows 95/98 systems.
Full props go to fccHandler for finding the bug in the source code that caused the latter problem. This version also fixes a few random problems I happened to identify on the way. Work is still progressing on the experimental version, which I hope to get to releasable (major-embarrassment-free) status in the near future.

1.5.10 contains a workaround for a rather sticky problem with certain filters, such as Deflicker. Basically, some filters that rely on separate analysis and render passes make a slightly invalid assumption: that once the analysis pass finishes, the next startProc call received will be the user starting the render pass. Well, this isn't actually guaranteed in the spec, and recent 1.5.x versions break such filters when they refresh the output pane after the analysis. The result is that the filter dumps its analysis data and builds a one-frame trace before the render starts. Oops. This problem actually exists in all versions of VirtualDub, but in earlier versions you have to explicitly step the current frame position in order for the filter chain to restart, whereas 1.5.9 will do it any time the output pane needs to be refreshed and the filter chain is idle. A possible workaround is to disable the output pane before doing the analysis pass, although I haven't actually tried this.

Video codecs that support multi-pass modes, such as DivX, are not affected by the problem, as they are not fed frames except during an actual render to disk. They do receive extra start/end notifications in the video compression dialog, but anyone who has tried testing a multi-pass codec against VirtualDub has probably discovered this long ago. (It's a workaround for some early codecs that accept formats in ICCompressQuery(), but then reject them in ICCompressBegin().)

The change in 1.5.10 is that the FilterStateInfo structure contains an extra field indicating whether a preview is active.
Unfortunately, filter authors will have to add a check for this structure field and recompile their filter against the filter.h header from the 1.5.10 source to take advantage of this. I have bumped the API version so that this can be done without breaking compatibility with earlier hosts. Backwards compatibility, while desirable, is a huge pain.

I had an interesting encounter this weekend while playing Final Fantasy XI. While in Selbina I partied up with a few Japanese people who mostly didn't speak English, and, of course, I don't speak Japanese. I can read some Hiragana and Katakana glyphs, so I could mostly figure out who they were addressing and simple questions like "are you ok?" Beyond that, though, the most I ended up with was making an idiot out of myself by typing token phrases in romaji that I learned from anime. (Perhaps the most embarrassing was that one of them asked in romaji if I understood romaji, and I answered "iie" without thinking.)

Trying to communicate across the language barrier is kind of fun, especially since in this case the worst that happens is that you end up lost in Vana'diel or perhaps virtually die a couple of times, and because words aren't necessary to convey "I'm getting whaled on." I do feel a bit ashamed, though, that my Japanese party members did know some English, while the only other languages I'm fluent in are C/C++ and assembly. I've always wanted to learn Japanese, but it's very difficult to do so without (a) serious time and effort placed into learning it, (b) an immersive environment, and (c) a real dictionary.

P.S. The automatic translation ability in the game is rather useless, as you have to choose from an incomplete list of phrases, and the menus that display the phrases are far too small, so they all ellipsize ("Are you...").

P.P.S. A game that leaves your character stranded in the world because you hit the Windows key or Alt-Tab, and then displays an error dialog saying "Final Fantasy XI quit because the app lost full screen mode," is lame. This has little to do with the language barrier, but it's so stupid I had to mention it. I tried using my own custom WinKey blocker, which uses the Windows NT low-level keyboard hook, but for some reason it failed when FFXI was running (DirectInput).

11/18/2003 News: VideoCDs and 64-bit computing

There is a bug in the VideoCD MPEG-1 parser in VirtualDub 1.5.8 and 1.5.9 that causes heap corruption, and thus application instability. Please avoid processing VideoCD MPEG-1 (.dat) files with those versions. I wish it had been reported on 1.5.8 so I could have fixed it before the next release. The parser is rather hacky to begin with, though; you'll probably have better luck with third-party demuxing tools anyway, as VideoCDs are rather prone to bit errors, since they are written without the second level of correction that most CDs have. Regular MPEG-1 streams shouldn't trigger the bug.

Lots of people are apparently having trouble figuring out the changes I made to the Save Processing Settings command. Folks, it's really simple: there is a checkbox on the dialog where you can specify whether the edit list is saved. Check it if you are preserving settings for a particular file; uncheck it if you want to use the settings on other files.

Every once in a while I hear people saying that we don't need 64-bit on the desktop. Well, us developers will need it soon. If you have a large program, it may take a couple hundred megs of memory to compile and link without swapping. Another few hundred megs is required to keep object, library, and debugging database files (.pdbs in Visual C++) resident in the disk cache. And if you're working on something that requires a large data set to start up, like say, a game, you can need another few hundred megs to keep that resident in the disk cache.
So to do a full compile/link/test cycle you need a full gigabyte of RAM on the machine. Try it with half a gig and the disk cache ends up thrashing, so both the compile and the program load hit disk, which is now three orders of magnitude slower than memory, doubling or tripling the cycle time. Doh. Look two or three years down the road, and it's not hard to envision average developers hitting 2GB soon, which is where the trouble starts.

Current x86 processors have 36-bit addressing and can address up to 64GB of RAM, so you might think this is a non-issue. Unfortunately, the extensions required to access that much memory are not always pleasant, and even if the whole OS can address more than 2GB, you can't get that much to applications. Win32 applications only get 2GB by default, with 3GB being possible on 2000/XP if a Physical Address Extension (PAE) kernel is in use. You can only get that 3GB, though, if your drivers and your software are PAE-capable. Under Windows NT, the disk cache is itself a process, so it suffers from the same addressing limitations as a regular process.

As for speedups from 64-bit computing, don't expect too much. 64-bit brings some downsides over regular 32-bit, particularly in terms of higher memory usage and thus lower cache locality due to the larger pointers. Existing x86 CPUs already have at least one 64-bit ALU, for floating-point and MMX, so the increased width of general-purpose calculations itself isn't going to help for applications that already make heavy use of CPU extensions. In the specific case of AMD64 (x86-64), it appears that most of the gains come from the increased number of registers, compared to the IA-32 architecture.

11/9/2003 News: VirtualDub 1.5.9 (stable) released

It's Sunday again. I don't plan to keep this pace forever, but you might as well enjoy it while you can.
:)

The major fix in this version-of-the-week is for a dumb oops in a fast display copy routine, but one fix that isn't mentioned in the release notes is that I fixed a lot of compile errors that only occur under Visual Studio 2003. VirtualDub's main compiler is still Visual C++ 6.0 SP5+PP, and not 2003, because I refuse to put up with the slow, half-broken IDE of the latter. (It's not that I'm unfamiliar with the 2003 IDE, because I use it at work. It's just that I hate it.) However, apparently some people are trying to compile VirtualDub under 2003 and are discovering that there are a lot of compile errors due to new overloads of C runtime functions that Microsoft added to the RTL for improved standards compliance. I've added the requisite casts to fix that problem; VS 2003 users should also disable buffer security checks (/GS) and warnings 4018 and 4244 on the imported project.

There are two other major incompatibilities between VC6 and VC7.1 that you may encounter. One is that some of my VC6 resource scripts have #include "afxres.h" replaced with #include "winres.h" to allow the .rc to compile without MFC installed. VC7.1 has a newer Platform SDK header set, so this must be substituted instead:

[snippet not preserved]

The other is a problem with the definition of the wide character functions in the RTL, such as iswspace. There are both library and inline versions of the functions, and unfortunately, depending on which of <wchar.h> and <ctype.h> you include, either can get used. If one module uses library wide-char functions and the other uses inline functions, you get a link error even if the same function isn't used. This wouldn't be much of a problem, except apparently the two headers are referenced in interesting ways through other header files, and I've found it's very easy to produce a project that links on VC6 and doesn't link on VC7.1, and vice versa. This is very annoying. I want to create a system where I can work on VirtualDub in VC6 and batch-create the corresponding .sln and .vcproj for VC7.1, but this won't be possible if I constantly have to frob code around to get iswspace to link correctly.

11/2/2003 News: VirtualDub 1.5.8 (stable) released

Random Helpful Tip: The heat dissipation of a 3GHz Pentium 4 CPU is not wasted if your room is freezing cold, as mine is right now. I'm tempted to overclock it in order to warm the room up some more.

1.5.8 is out on SourceForge and is once again a minor stable release. Those of you who actually read my change notes (all three of you) will notice that the "access denied" SMP bug fix has mysteriously jumped from 1.5.7 to 1.5.8. This fix was supposed to go into 1.5.7, but got omitted at the last minute due to a source code control error. (It was stuck in the client spec of the machine that had bad RAM.) The fix has been pushed out with 1.5.8, and those of you with SMP or HyperThreaded systems should no longer have to whack CPU affinities to bypass the random errors. Other goofs that have been fixed are an inability to run under Windows 95 (oops), and displays not coming up with 8-bit paletted video.

Work is still continuing on the experimental branch, which has now been pushed to 1.5.9, obviously. I have a cleaner and more versatile image library in progress that will support planar formats, and in particular, the 1.5.9exp display code can now handle YV12. Prototyping of a new video filter system is also in progress, although it's still quite rudimentary and hasn't been hooked into the main system yet. The toughest part is figuring out what I do and don't want to support; if I had tried to support everything I could have thought of when I made the original filter API, I never would have finished it. The good news is that the old filter API can easily be retrofitted, so whatever I come up with should be able to support existing filters.

10/22/2003 News: Stupid shell tricks, revisited

Thanks to all who emailed me about the echo.off solution to the puzzle from the last news entry. You can stop now.
:)

For those of you running VirtualDub through Windows Terminal Server (or Windows XP Remote Desktop), there is a bug in Terminal Server's DirectDraw support: blits partially outside of the primary surface appear at the top-left instead of their intended origin. The problem is reproducible with the DirectX 7 sample applications. Unfortunately, current versions of VirtualDub do not have an option for disabling DirectDraw support; I will need to add this in, as well as code to automatically fall back to GDI when a Terminal Services Client is connected. Arranging windows so that the video panes aren't partially off the screen should work around the problem.

I've recently been having stability issues with my desktop system and thought originally that it was due to bad drivers. (It's bad when you md5sum -b d2exp.mpq twice and get different checksums.) I later thought it might be caused by a bad patch from Windows Update, after WinDbg pointed out a kernel SendMessage call in some of the minidumps. Turns out it was neither: after running Memtest86 on a hunch, it turned out that one of the RAM sticks in the system had a subtle pattern-sensitive flaw in it at PC1066 speeds. What sucks is that this system is RDRAM-based, so the memory is more expensive than DDR, which is dirt cheap. Bleah. Don't bother telling me to switch to DDR, because 256MB of RDRAM is still cheaper than a new motherboard.

10/20/2003 News: VirtualDub 1.5.7 (stable) released

Thanks to the miracle of source code control branching, VirtualDub 1.5.7 is out, and it contains none of the new features I'm currently working on. As a pure bug-fix release, though, it should bring nothing but more stability over the 1.5.6 release. Try it, report errors, wait for new release, lather-rinse-repeat. Well, you probably know the drill.

1.5.6 contains a bug in its MP3 rate correction code that causes MP3 audio streams to be written out to AVI files with dwLength=0 in the header, resulting in some players not playing audio properly.
This has been fixed in 1.5.7. For those of you who have already pushed audio through 1.5.6, do not fret -- the problem is easily fixed by running the file through VirtualDub again in direct/direct mode, which will cause the dwLength field to be recomputed properly.

Now for stupid shell tricks. I got asked an interesting question today about Windows NT command scripts: someone needed to write an INI entry out to a file without extra spaces, but found that the obvious approach doesn't work, since the shell interprets 1> to mean "redirect stdout." In a Un*x-like OS, this would be simple: backslashes are magic and solve everything. Well, Windows NT's shell isn't quite so simple, but using the shell's quote character to keep it from interpreting 1> as a single token still works.

But then I got asked another teaser: how to write the string "off" to a file. Again, the obvious doesn't work, due to a rather annoying special case in the echo command. Quoting doesn't work here because it's the internal echo command that is the problem, not the shell's command parsing. In fact, echo is quite lame, because it doesn't support escapes, or any way to suppress the newline that it prints. The best that I was able to come up with off the top of my head was to abuse the filesystem. I'm sure there are better ways, but I haven't thought of one yet.

10/13/2003 News: Codec issues

A friend of mine takes examples of bad code snippets that he finds and emails them out to a select few under the subject "Lame Code of the Week." Typically, these are code snippets that have dumb errors.

Why do I mention this? Well, I finally got a chance to debug against that blasted Grand Tech Camera Codec that's been crashing all over the place, trying to decompress formats that belong to other codecs. It turns out that the validation done by its ICDecompressQuery() function is to check... the width and height. Believe it or not, it doesn't check the FOURCC.
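As a rough illustration of the missing check, here is a minimal, portable sketch of the kind of validation a decompress query could do before accepting a format. It is not the Video for Windows API: FormatHeader is a hypothetical stand-in for the few BITMAPINFOHEADER fields involved, and kExampleFourCC ('XMPL') is a made-up codec identifier.

```cpp
#include <cstdint>

// Hypothetical stand-in for the BITMAPINFOHEADER fields a decompress
// query would inspect (width, height, and the compression FOURCC).
struct FormatHeader {
    std::int32_t  width;
    std::int32_t  height;
    std::uint32_t compression;
};

// Pack four characters into a little-endian FOURCC, first character lowest.
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<std::uint8_t>(a))
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(d)) << 24);
}

// Made-up FOURCC for an imaginary codec; an assumption for this sketch only.
constexpr std::uint32_t kExampleFourCC = MakeFourCC('X', 'M', 'P', 'L');

// Accept a format only if it is actually ours AND has sane dimensions.
// Checking the dimensions alone would accept every other codec's format too.
bool CanDecompress(const FormatHeader& in) {
    if (in.compression != kExampleFourCC)
        return false;
    return in.width > 0 && in.height > 0;
}
```

With the FOURCC test in place, a DivX or Indeo header is rejected no matter what its width and height are, so the codec search can move on to a codec that really handles the format.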
Even worse, if you call ICDecompressQuery() or ICLocate() with any format as the target format, the codec essentially checks... nothing. Does it accept DivX? Yes. Does it accept Indeo? Yes. Does it accept 43-bit RGB? Yes. That means it claims to be able to decompress ALL formats! As such, it is only fitting that I award GTCODEC.DLL the Lame Codec of the Week award for absolutely breaking the Windows video codec system.

This doesn't normally affect VirtualDub too badly, as it first attempts to search for a codec with the same FOURCC as the compression format. Where the codec screws over applications is for formats that either (a) aren't accepted by any currently installed codec, or (b) are secondary formats that a codec handles besides its primary format, such as YUY2. In these cases, if the codec search ever gets to the Grand Tech codec, the codec grabs the format and then immediately crashes trying to decompress it. I'm not sure if this is something I can work around. I can rewrite VideoSource.cpp to do a manual codec walk and avoid it, but the DrawDibDraw() call implicit in the Windows video capture system is a bit more difficult. And I can't work around the problem for Avisynth or an embedded DirectShow graph. Wonderful. Needless to say, if you have this codec installed, I recommend you uninstall it.

The other codec worthy of mention is the VFAPI Reader Codec. This codec will trip the FPU warning in 1.5.5/1.5.6, apparently because it was built with Borland C/C++, which for some strange reason likes to flip the FPU to 80-bit and exceptions enabled in initialization code of DLLs it initializes. This then causes problems in other floating-point code that expects to be able to use invalid or indeterminate number values without crashing, which is ordinarily the case with the Win32 standard 64-bit/all-masked mode. MP3 codecs tend to have this problem, and Direct3D, if it ever gets initialized, may also trip in its transform pipeline.
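To make the two FPU modes concrete, here is a small sketch that decodes an x87 control word into the properties discussed above. The bit layout is the standard x87 one; 0x027F is the usual Win32 default, while kExampleUnmaskedCw is only an illustrative value with extended precision and some exceptions unmasked, not a value taken from the codec in question.

```cpp
#include <cstdint>

// Decoded view of the 16-bit x87 FPU control word.
struct X87ControlWord {
    bool allExceptionsMasked;  // bits 0-5: invalid, denormal, zero-divide,
                               //           overflow, underflow, precision
    int  significandBits;      // bits 8-9: 24, 53, or 64 (0 = reserved code)
    int  roundingMode;         // bits 10-11: 0 = round to nearest
};

X87ControlWord DecodeControlWord(std::uint16_t cw) {
    static const int kPrecisionBits[4] = {24, 0, 53, 64};
    X87ControlWord r;
    r.allExceptionsMasked = (cw & 0x3F) == 0x3F;
    r.significandBits     = kPrecisionBits[(cw >> 8) & 3];
    r.roundingMode        = (cw >> 10) & 3;
    return r;
}

// Usual Win32 default: 53-bit significand (64-bit doubles), all masked.
const std::uint16_t kWin32DefaultCw = 0x027F;

// Illustrative only (assumption): 64-bit significand (80-bit extended)
// with the invalid-operation exception unmasked, among others.
const std::uint16_t kExampleUnmaskedCw = 0x1332;
```

Decoding kWin32DefaultCw gives a 53-bit significand with every exception masked, which is why code that produces NaNs or other invalid results normally carries on quietly. Decoding kExampleUnmaskedCw shows a 64-bit significand with the masks partly cleared, so the same operation can raise FP Invalid Operation instead.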
This is a rather obscure problem and I'm not surprised that the author didn't catch it; I've notified the author but haven't gotten a response back yet. 1.5.6 will correct the FPU control word back to standard 027F, so odds are you won't see ill effects in VirtualDub besides a warning. For older versions or other applications, the exception that occurs is FP Invalid Operation (C0000090); the failure condition is thankfully rare, so even if you do have this problem you're not guaranteed to crash.

There is a rather stupid bug in the AVI append command of 1.5.6: it increments the filename extension rather than the core name itself, so it tries to open foo1.avi1 and foo1.avi2 instead of foo2.avi and foo3.avi. This was actually in 1.5.5 as well, but it doesn't happen if the segments are implicitly attached during the first open, so nobody caught it during the experimental phase. Sigh.

If I come up with some good workarounds for the above problems in 1.5.6, I might get a 1.5.7/stable out soon with a fix for the attach bug. We'll see.

10/10/2003 News: VirtualDub 1.5.6 (stable) released

Thanks to all that reported issues with the experimental 1.5.5 release -- nearly all the bugs have been fixed in 1.5.6. I'm beginning to like branched development: while waiting for bug reports to come in on the 1.5.5 release, I was able to work on the dev branch for 1.5.7 without screwing up the stable release. So unless there is a goof-up that needs to be addressed, expect 1.5.7 to be the next unstable release.

1.5.5 had a bug in that it forgot to disable the displays when fast recompress mode was enabled, and so it displayed garbage during rendering, because it blitted YCbCr data to the screen as RGB formatted data. So I fixed it in 1.5.6 by setting the format correctly so that the display code either uses a YCbCr hardware overlay or software conversion fallback, thus turning it into a feature.
So now you can see the input video during a fast recompress operation, if the format is UYVY or YUY2.

I forgot to mention with 1.5.5 that many of the filename strings in script files have changed from ANSI (8-bit) encoding to UTF-8 encoding, and thus configuration files that have been saved with high-bit characters in them aren't portable between 1.5.4- and 1.5.5+. This was required because 1.5.5 can read and write files with filenames that are not ANSI-safe. For those of you maintaining front ends, information on UTF-8 encoding is available from the Unicode website. Windows 98 and 2000 can convert directly to and from UTF-8, and for 95/NT4, the conversion between UTF-16 and UTF-8 is straightforward. All of the UTF-8 characters are escaped using \x notation, and therefore the script file itself remains ANSI-safe.

I've been playing Final Fantasy Tactics Advance for a while now and have come to a conclusion: it's nicer, but easier, than the original FFT. The AI isn't as good in FFTA and it's a lot easier to do huge amounts of damage. Also, unlike FFT, in this game you get awarded XP even for worthless actions, such as curing someone who's already at max HP. I haven't decided whether this is good or not. On one hand, I'm walking into battles with screwed up parties and basically stomping all the enemies effortlessly; on the other hand, I'm not spending hours at a time throwing rocks between party members for JP. I've only lost one battle -- the main character accidentally dinged the last enemy with a sword when swords weren't allowed, got sent to jail, and thus ended the game. Baka.

9/30/2003 News: VirtualDub 1.5.5 (experimental) released

VirtualDub 1.5.5 is out on SourceForge and is the first version that I've explicitly tagged as experimental. The primary reason is the new display code -- 1.5.5 is the first version to use DirectDraw by default, the result of which is a significant increase in rendering speed as well as a usable stretch. (Right-click the panes for the new options.)
1.5.4 is pretty stable at this point, and as such it's a good idea to split versions into stable and development releases. So please try 1.5.5 and report the problems, and if you have problems, use 1.5.4. 1.5.5 does add a few more workarounds for various problems as well as some optimizations for direct stream copy mode, so if all goes well it should work better than 1.5.4.

1.5.5 allows audio filters to be plugins, but I haven't completed the audio filter SDK yet and I'm not sure I like the current API. For those of you that want to experiment with it, there is a new samplefilter project in the VirtualDub source code. (Contact me if you want the preliminary SDK.) Keep in mind that the API is still fluid and I'll probably nuke this API version in the future.

There is no P4 version of 1.5.5, and there may not be P4 versions of subsequent releases. The reason is that Intel C/C++ 6.0 started miscompiling parts of the code base starting with 1.5.5 in ways that can cause crashes and/or heap corruption -- specifically, in some exception handling contexts the generated code double-destroys objects. This then causes string objects to trash memory. As I'm increasingly moving toward dynamic strings instead of fixed-size buffers for text handling, I've decided that I cannot afford possible instability in my P4 releases in exchange for the minor performance gains provided by the Intel compiler. The code base still compiles under Intel C/C++, however, for those of you that want to try.

A couple of weeks ago, I posted a longish essay on the various display methods I was considering. The only options that are enabled in 1.5.5 are GDI and DirectDraw, because I ran into some unexpected problems with NVIDIA OpenGL drivers and switching between rendering contexts -- sometimes nvoglnt.dll would spin for up to 10 seconds at 100% CPU when creating the second context. This is somewhat disappointing, as I can control filtering with OpenGL and not with DirectDraw.
As a matter of curiosity, I tried Microsoft's new GDI+ API to check out its image interpolation. The GDI+ people did a great job with the scaler: it's both subpixel accurate and smoothly filters at all sizes, when decimation filtering is requested. Pity it's somewhere between one-fifth and one-half the speed of VirtualDub's scaler, which makes it useless. Apparently, GDI+ doesn't have a hardware DDI of its own and makes use of the regular GDI DDI, so the vast majority of its options are emulated in software. You know, all I want is a simple API for a hardware accelerated stretch blit without BS like lost surfaces and having to do basic pixel conversions myself. I'm still waiting.

I've also released version 2.4 of my subtitler filter on the filters page, which fixes a minor error in shadow address calculation that could cause a crash. It's been more than a year since I worked on it and I don't know if anyone still uses it, but I pushed 2.4 out in case anyone was. I've heard of an expanded SSA format that someone coined "Advanced SubStation" or a similar name. Personally, I would have picked a name with a slightly different filename extension.

Rewatching the anime series Martian Successor Nadesico was a new experience for me. In particular, I'm convinced that Vandread should be renamed Nadesico: The Next Generation. I also thought the ending sucked, which is why I sought out fanfics to close the gap, most of which didn't help. Fortunately, I found Magical Girl Pretty Ruri, which is over 500K of sarcastic Ruri baka-goodness... but now I have to wait for episode 23 of it.

9/25/2003 News: DivX 5.1, 3ivx, and plugins

My announcement for the week is that I will not be supporting the use of DivX 5.1 in any way, shape or form in VirtualDub.
That doesn't mean you can't use it or that VirtualDub will prevent you from using it, but merely that I won't bother answering questions about issues with the use of the codec, and any email about the use of DivX 5.1 with VirtualDub will be dumped into the trash.

The reason is the following: I have never been a big fan of so-called "protection" wrappers, primarily due to technical reasons. However, putting one into a userspace driver is one of the dumbest and rudest ideas I can think of. Here are the problems:

I can't debug VirtualDub using the DivX 5.1 codec. Remember that B-frame glitch that was causing VirtualDub 1.5.3 to loop infinitely at the end of processing operations? Can't use the debugger if the DivX 5.1 Pro codec prevents it.

I can't debug VirtualDub AT ALL while the codec is installed, because the DivX codec's protection triggers on load, even in the Free driver. Both the video codec search and video compression dialogs trigger it even if I'm not trying to use the DivX codec.

I can't debug my other programs either, because if I hover over an AVI file in an Open dialog, Explorer loads video codecs to try to display a tooltip about it, and the DivX codec terminates my program.

The protection dialog has no indication that it comes from the DivX codec, has the client application's icon on the taskbar, and when you click OK, the codec calls ExitProcess(0) and terminates the app.

I can live with games and applications that won't let you launch them under a debugger, but when it comes to a driver that keeps me from using Visual Studio in general, the driver gets uninstalled. Immediately. It is a waste of time for me to verify any compatibility issues with DivX 5.1 if I have to deal with this crap, and I refuse to do so. It looks like DivXNetworks is considering adjusting or removing the protection in their next release; please encourage them to do so and resume working on the codec itself, which is what people actually care about.

Now that I've said that...
I finally looked into the strange divide-by-zero crashes with the 3ivx D4 4.0.4 codec. The problem is that the codec isn't clearing MMX state properly before returning from ICCompressBegin(), causing VirtualDub's floating-point calculations to screw up. It occurs more often in newer versions because I recently rewrote the interleaver to use FP rather than integer math; however, it can still cause filters to malfunction in older versions, particularly the subtitler. As such, I do not recommend that you try using 3ivx with VirtualDub at this time. I have contacted 3ivx about the matter and they say it will be fixed in the next version of their codec; in the meantime, I have a change in my development tree that will work around the problem in 1.5.5 if the updated codec isn't out by that time. The same problem and workaround also apply to the Windows Media Video 9 prerelease beta codec, but anyone using that should upgrade to the final release, in which Microsoft has already fixed the problem.

A few days ago I looked at the filter SDK I currently have posted and concluded that even though I am a native English speaker, no one could tell that by reading the SDK. As such I have decided to rewrite it, as well as rework the API headers to a somewhat more sane form that doesn't require massive pointer hacking to push pixels. Writing documentation takes an awful long time, and it's no surprise that many programmers don't bother with it at all, especially when you consider that incorrect documentation is in some ways worse than no documentation. This will get worse when I export the audio plugin API, which introduces multithreading into the mix. I design my APIs for longevity -- filters dating back to VirtualDub 1.2 are still valid -- so I'm hoping that my new APIs remain relatively simple and easy to program for. We'll see, I guess.

9/14/2003 News: Back to normal (long)

Everything's calmed down a bit here. I'm on vacation now, SoBig.F expired so I have my Inbox back again, and it's not as blisteringly hot as it was a month ago. That gives me some time to attack The Legend of Dragoon and Final Fantasy Tactics Advance. Oh, and I could work on VirtualDub too. (grin)

Warning: Long technical brain-dump ahead.

VirtualDub's display code is a bit dated and I've been working on rewriting it to support resizing and bilinear filtering, as well as more speed. Playback mode already had some acceleration features, but the main edit mode had only the lamest support for stretching, and the filter preview window couldn't be stretched at all. There are currently five ways I could implement for blitting stretched images, and none of them are particularly complete. Frankly, it amazes me that stretching an image onto the display still requires this much effort.

Timing statistics below are done on a P4 1.6GHz laptop with an NVIDIA GeForce4 Go 440, with preliminary 1.5.5 display code.

Win32 Graphics Device Interface (GDI)

Ubiquitous and the most reliable; this is VirtualDub's fallback in all cases. Vendor drivers for GDI are mostly very stable and functional at this point, which is good; unfortunately, there are two problems. One is that GDI provides extensive software fallbacks which are not very fast. In fact, they're quite slow, and how much they're used varies widely between drivers. Many drivers implement hardware color conversion, but only a few implement hardware stretchblts. (A WinHEC presentation from a few years back indicated that a number of vendors tried accelerating StretchBlt() and ended up failing WHQL tests because they got texel alignment wrong.) ATI's GDI drivers are particularly good in this department, and a BitBlt() on a RADEON will often beat a DirectDraw rendering path; Matrox may be good here as well, but I've never had one.

The Windows NT implementation of GDI is significantly more powerful than the 9x implementation -- it can do a filtered stretch if you call SetStretchBltMode(hdc, HALFTONE).
The subpixel precision is poor, as it appears to perform a point-sampled stretch and then do a low-pass on top, and the speed is even worse than a normal DIB StretchBlt(). However, it's not bad considering that you only have to add one line to enable it, and in the time of NT 3.1 I imagine it was very high quality compared to anything hardware could do.

The upcoming Longhorn release of Windows is supposed to have a next-generation GDI that is based on top of Direct3D and will provide much better performance and capability than the existing GDI. I just hope the API isn't a mess like DirectX and that it isn't going in the same direction as the rest of Longhorn. "Managed Explorer" was not what I wanted to hear, after my experiences with Visual Studio .NET.

On my dev machine, a 1:1 GDI blit of a 320x240 image takes about 1.2ms, and a 1.01:1 stretchblt takes 11ms. That 10:1 ratio on stretches is a killer -- taking 1/3rd of the frame time on a P4 1.6GHz is a bit excessive.

DirectDraw offscreen surface blit

DirectDraw is very much a self-serve API in that you have to handle a lot of device abstraction yourself, but one of the few features the hardware emulation layer (HEL) will emulate is stretchblt, and it does so at much faster rates than GDI. DirectDraw does not support color conversion in any way, however, which means that in some cases it can be beaten by a well-optimized GDI driver for 1:1 blits. The DirectDraw API is also much less simple to use than GDI because you have to create about a dozen objects, check pixel formats, check for lost surfaces, check for failed lock calls, etc. DirectDraw doesn't give you control over whether a blit is filtered (generally only 3D chipsets will do so), and it looks like DirectDraw uses integer coordinates internally, so the blit warps when it is clipped and sliced into subrects.
Finally, DirectDraw doesn't cooperate very well with multiple threads; my current implementation simply thunks all DirectDraw blits down to the UI thread because doing anything else is likely to break.

On low-end systems whose blit probably consists of rep movsd anyway, it's probably even faster to write directly to the primary surface than to blit through an offscreen one. But this involves considerable complexity, given that to obtain best write performance you need to write 64 bits at a time to VRAM, which is a pain with 16-bit or 24-bit pixels, and you cannot write unaligned to VRAM (doing so can lock the system if VFLATD is active). Clipping also has to be done manually. Handling the fixups at the beginning and end of scanlines while doing color conversion and a stretchblt is not much fun.

DirectDraw blitting is slower than GDI for 1:1 at around 2.4ms -- but it stays 2.4ms even when stretched to 1600x1200. Hardware acceleration is good.

DirectDraw overlay surface

DirectDraw overlay surfaces are essentially giant hardware sprites and have two big advantages over offscreen surfaces. Overlays are generally done via special scanout hardware rather than a generalized blit engine, so you can get bilinear filtering and fast, large stretching even on low-end hardware. (Bandwidth requirements are reversed for an overlay stretch compared to a blit stretch, because the higher the stretch ratio, the less often source pixels have to be fetched relative to scanout rate, and the result is never written back to the framebuffer.) The second advantage is that the overlay generally accepts YCbCr data instead of RGB, and most modern codecs work in YCbCr, allowing software color conversion to be skipped.

Overlays do cut some corners compared to blit engines, though. The filtering is sometimes not as good as the 2D or 3D engine; NVIDIA TNT/TNT2s cannot filter vertically with some driver revisions, and many video chips don't fully upsample chroma.
Sometimes the luma will have excessive contrast as well, which is annoying. Overlay hardware almost always supports the two primary YCbCr formats, YUY2 and UYVY, but rarely supports RGB. And finally, on most hardware you only get a single overlay, which means they can't be relied on for general image display. One notable exception was the Tseng Labs ET6000, which was probably one of the last PC video cards to use a display list and could support as many as three overlays at once.

Overlays beat GDI BitBlt() slightly at 0.9ms instead of 1.2ms. That's cheating, however, because UYVY is 16 bits per pixel instead of 32 bpp, and thus only half as much data is being uploaded to the video card.

I like OpenGL -- it's a well-designed API with a well-written specification. Seeing as image stretching is a subset of texture mapping where U and V are constrained to X and Y, respectively, it seems perfect for this task. A bit of groundwork has to be done here in that the image has to be broken down into overlapping textures, but any respectable 3D card (read: ATI or NVIDIA) is going to support OpenGL 1.2 packed pixels and hardware mipmap generation. With hardware mipmap generation, we get trilinear filtering, which means no more aliasing when shrinking. And unlike DirectDraw, filtering can be forced off when it's not wanted. Using the 3D pipeline gives you other niceties, such as free brightness/contrast adjustment (modulate2x + add specular) and free dithering. Did I mention I like OpenGL? I like OpenGL. Go learn OpenGL! :)

One downside to OpenGL is that it doesn't support YCbCr textures, so color space conversion can't be performed in hardware. Another is that there are some really bad consumer-level OpenGL implementations out there from the early days of 3D on PCs (so-called "QuakeGL" implementations).
A couple of years ago, most of the OpenGL drivers still hadn't gotten texture conversion correct and were frequently swapping red/blue or thresholding alpha incorrectly; NVIDIA was noticeably ahead of the game in this department, although it got a few wrong too. One vendor's implementation was so bad that it set depth mask off on init, fouling up depth clears unless you called glDepthMask(GL_TRUE), and would blue-screen the machine if you did an empty glBegin()/glEnd() pair with no verts. These problems have been mostly cleared up now, but occasionally you still hear about someone's glTexGen() goofing up. And glDrawPixels() still sucks on consumer drivers.

OpenGL is the same speed as DirectDraw at 2.4ms. However, it doesn't clip funny when other windows are on top, and the coding is a lot more enjoyable.

Direct3D, or rather, DirectX 9 Graphics, doesn't have much of an advantage over OpenGL for an operation as simple as image stretching. I thought briefly about whether it'd be possible to abuse pixel shaders to do bicubic resampling, but it'd require something like 32 texture fetches and I don't have any ps2.0 capable hardware anyway. The one major advantage DX9 should have over OpenGL for image blitting is StretchRect(), since it could do hardware color conversion during the stretch. (NVIDIA's OpenGL drivers do color conversion in software on texture upload.) Unfortunately, I don't seem to have any hardware that supports this, and even if I did, I'm not sure I would want to put up with the stupidity of CheckDeviceFormatConversion(). The same goes for YCbCr textures -- I can only use them if I use the reference rasterizer. Yay.

This rendering path is the one path I haven't coded yet and probably won't code at all. The API is not generalized: there are large parts of the API that can be readily identified as "the NVIDIA part" and "the ATI part," and other parts that have no formal spec other than "probably like OpenGL."
You're supposed to check caps bits for available functionality, but some of the caps bits are so basic that the 3D device is useless if they aren't set, others have never been set by any device other than the reference rasterizer, and of the rest you can't easily tell which ones are usable because no vendors publish their caps bits. The best part of all is that when your display is switched out, either due to Ctrl-Alt-Del or a full-screen exclusive app starting, DirectX goes nuts. All of your textures are instantly dumped into oblivion, your calls fail and you can't do anything, and you start hitting driver and kernel bugs en masse, such as a vertex buffer lock succeeding but giving you an invalid pointer. And finally, requiring that the global application FPU precision be kicked down to single precision (24 bits) for performance is ridiculous. This is enough hassle for a game, and more than I want to deal with for video displays in VirtualDub.

I don't have any numbers for Direct3D because I don't have a D3D path written. I don't expect they would be significantly different from OpenGL, however.

On a passing note, I've been watching the latest developments in the NVIDIA<->ATI war with some amusement. All I'll say is that the current GPU situation bears an awful lot of similarities to the Pentium 4 vs. Athlon race; hopefully it'll continue in the same fashion, where both companies periodically kick each other in the rear and consumers get ridiculously fast and cheap hardware out of it.

8/20/2003 News: Temporarily absent

Life outside VirtualDub has been very busy for me lately, so I haven't really done much hobby coding. That isn't to say that I've neglected VDub completely... well, you've probably heard this excuse before. What's new this time is that my email has practically been made useless by the new strains of worms that have been making the rounds.
Thanks to the miracle of Microsoft Outlook's address book, I've been getting nailed nonstop by 100K viruses and bounces from idiotic mail servers that don't know about From: spoofing viruses, to the tune of about 10-20x the volume I was previously getting. I've temporarily lowered my maximum email size to 50K and raised my Inbox limit to 50MB in an attempt to keep the crap from blocking legitimate email, but this is becoming difficult with 100 million styles of mail delivery failures, and you may have severe problems getting through to me.

For VirtualDub questions, the forums are currently a better way to get help. Please try to search the forums first and DO NOT send me a PM with your question as well. I hate people who don't do due diligence before spamming their question across eight forums and my email address.

A recap of some current issues, since I'm too tired to update the KB right now:

The Creative MP3 codec is responsible for the "frame 9995" hang problem -- due to the way that audio codecs are enumerated in Windows, it can activate even though you think you're choosing a different MP3 codec. Disable it or lower its priority in Control Panel, Multimedia.

"Grand Tech Camera Codec" may conflict with the DivX codec and cause crashes attempting to decode DivX material. Disable it if you have problems and it shows up as the offender in VirtualDub's crash context dump.

The "Lame ACM 0.9.1 (stable)" codec distributed in the Nimo codec pack has a habit of crashing during Windows audio codec searches. Guess what the solution is?

Non-interleaved audio is currently broken. Oops. Will fix later.

6/19/2003 News: Optimization

I've been told that VirtualDub 1.5.4 doesn't process files in direct mode as fast as 1.4.13 does, because it drops to non-streaming AVI read mode. This isn't surprising, as the 1.5.4 video and audio pipelines are decoupled and thus tend to drift farther apart during operation, leading to the AVI layer disabling streaming due to a high cache miss rate.
After adding pipeline balancing code as an attempted fix, I profiled the app under Intel's VTune Analyzer and discovered a different problem: when processing a highly compressed file in direct mode, the highest CPU hogging function in the app is _alldiv. _alldiv is the Visual C++ 64-bit divide function. Apparently, VC6 can't convert signed 64-bit divides by constant powers of two into adjusted shifts, like it can for 32-bit ints. (For that matter, neither can VC7 or VC7.1.) This just goes to show that when optimizing an app, the correct route is always profile, profile, profile! Expect Direct mode throughput for highly compressed video files to be significantly better in 1.5.5.

Someone asked me recently if full-scene antialiasing (FSAA) on a 3D card could be used to improve deinterlacing quality. Sorry, no. The reason why FSAA improves visual quality is that polygon space has infinite resolution due to precise triangle primitives and texture interpolation, and you can always improve the quality of triangle edges by sampling more. That is not the case with deinterlacing, where your input and output sample resolutions are the same (and finite). I should know, since I've written software 3D rasterizers and know intimately how triangle rasterization and supersampling work. If you don't believe me, take a look at VirtualDub's About dialog. :) The problem with 3D programming is that you spend half your time getting anything to draw on screen at all and the other half figuring out who left alpha test enabled.

Many of you have had trouble compiling VirtualDub 1.5.x due to errors in <vd2/system/zip.h> and the OpenRaw function. The problem is a gem in some versions of Microsoft's winbase.h that collides with the OpenRaw name. To work around the compilation errors, rename VDZipArchive::OpenRaw() to VDZipArchive::OpenRawStream(). Note that unless you are using Visual Studio .NET, you will still need to update your Platform SDK headers, as the ones that come with VC6 are quite old.
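For the _alldiv observation earlier in this entry, the "adjusted shift" is easy to write by hand when the compiler won't generate it. The sketch below is not the code VirtualDub uses; DivPow2 is a hypothetical helper, and it assumes that right-shifting a negative signed value is an arithmetic shift, which holds on the usual x86 compilers but was implementation-defined before C++20.

```cpp
#include <cassert>
#include <cstdint>

// Divide a signed 64-bit value by 2^k, truncating toward zero exactly like
// the '/' operator, but using only an add, a mask, and shifts.
//
// A bare (x >> k) rounds toward negative infinity, so negative inputs need
// a bias of (2^k - 1) added first. (x >> 63) is all ones for negative x and
// zero otherwise, which selects the bias without a branch.
inline std::int64_t DivPow2(std::int64_t x, unsigned k) {
    assert(k >= 1 && k <= 62);
    const std::int64_t mask = (static_cast<std::int64_t>(1) << k) - 1;
    const std::int64_t bias = (x >> 63) & mask;
    return (x + bias) >> k;
}
```

For example, DivPow2(-1, 3) yields 0, matching -1 / 8, whereas the unadjusted -1 >> 3 would give -1. The bias is only ever added to negative values, so the addition cannot overflow.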
5/28/2003 News: VirtualDub 1.5.4 released

First, thanks to all of you who sent various Windows ports of ls in response to my last post, but it really wasn't necessary.

1.5.4 is out, and is another quick bugfix release -- so grab it and pound on it. Actually, as I type this, I haven't uploaded the SourceForge download page yet, so if you're too fast, go play a game for an hour or something until I get it up. The main changes are a fix for a thread race condition and a workaround for the hang at the end of a 2-pass DivX 5 operation. Also, I threw in bitrate calculation for AVIs under File > File Information. If it still doesn't work... well, I guess we'll just try again.

"Regression testing"? What's that? If it compiles, it is good; if it boots up, it is perfect. -- Linus Torvalds, right before release of Linux 2.1.94

I just got my Visual Studio .NET 2003 upgrade today, and although I was expecting to get a CD in an MSDN-Library-style package, I got a heavy box that was essentially the same as the full 2003 package, except the box was uglier and the CDs were upgrade only. Fortunately, the 2003 upgrade allows you to install 2003 without 2002 installed first, as long as you provide the 2002 CD for a moment. (I was amused by the 2002 installer, which asked me to register after it had finished uninstalling the product.) The IDE looks largely the same, with the same ugly flat style and a tendency to overrefresh the solution tree, but the build dependency check appears to run much faster. I don't know if they've fixed the butt-slow output window yet. I haven't dug much into the compiler yet either, but at the very least it generates much better code for MMX/SSE/SSE2 intrinsics, finally making them useful.

One more note: if anyone has purchased a copy of video software called "Luxuriousity Video," please drop me an email at phaeron (aht) virtualdub (daht) net. I have a question to ask.

5/20/2003 News: VirtualDub 1.5.3 hopefully not buggy

Releasing VirtualDub is an interesting process.
SourceForge and my web account on pair are Unix-based, while my development environments are Windows-based. Given that I have to juggle four machines during the release process, it's guaranteed that this happens at least once per release:

Thanks to some detective work by some forum members, it's fairly apparent now that the culprit of the famous frame 9995 hang bug is the Creative MP3 codec. If you are experiencing this problem, drop the priority of the Creative MP3 codec in Control Panel so that the Fraunhofer codec activates instead. (In Windows 95/98, this is done under Multimedia; in XP, go to Sounds and Audio Devices, Hardware, Audio Codecs, Properties. I don't remember where it is under 2000.) I don't have a Creative card that does MP3 assist, so I can't verify the problem myself, but I've seen enough reports now that I'm fairly certain of the diagnosis.

The underlying problem, though, is that although you can have several MP3 codecs installed, you may end up using a codec other than the one you chose, because the audio compression dialog only remembers what format you picked and not which codec that format came from. So even though you picked MPEG Layer-3, you can actually end up with a format from plain old MP3. That's true of both the standard Windows codec dialog (acmFormatChoose) and VirtualDub's custom dialog. I can fix the latter so that it records the ID or name of the codec as well, but I figured I'd warn you since the mixup can hit other programs too. You're better off toggling drivers so that only one MP3 codec is active at a time.

And for the last time, MP3 stands for MPEG audio layer III, not MPEG-3. There are both MPEG-1 and MPEG-2 variants of audio layer III.

1.5.3 is out and is mostly only bug fixes. The known regressions in 1.5.2 have been fixed, and I even found some bugs from the 1.4 branch that no one noticed.
I had to rip out and redo some algorithms that didn't work out in 1.5.2; in particular, that build tried to dynamically guess which frames were going to be pushed in the future, and although 1.4.13's algorithm wasn't totally correct, it was better than 1.5.2's. So 1.5.3 simply computes a static reverse frame map at the start of the operation. This consumes 16 bytes/frame, but if you're processing a 100K frame file I assume you can spare 1.6MB for the frame map. (It swaps well anyway.)

The reason for the complexity is that VirtualDub tries hard to allow frame skipping in Direct video mode even though technically it can't be done exactly, and this has to happen in the middle of a multithreaded pipeline with audio interleaving active. What happens is that frames get pulled sequentially from a key frame until the next key frame is available. So resampling from 30fps to 25fps on an MPEG-4 stream that has key frames every 1000 frames is likely to produce badly desynced output, but on an Indeo 5 stream with key frames every 15 frames it'll almost be perfect, and on a Huffyuv stream it'll pull exact frames. You can also now upsample a stream to a higher frame rate in Direct mode, which is always exact. Since the upsampling works by inserting drop frames, it's virtually free (24 bytes/frame) space-wise and even allows the player to drop the duplicates, since it knows the frames are dupes.

As it turns out, one of the "features" in this release is actually a bug. People have asked me to hook the spacebar to playback and stop. Well, hooking it to playback is easy, but hooking it to stop is a problem because the processing mode message loop is actually a separate modal loop that doesn't have access to keyboard accelerators. (I could just shove it in via a global, but that'd be so 1.2.)
So I only implemented the playback command, thinking I'd solve the stop for later, and I discovered that space already worked for stop! The reason is that inline playback spawns a visible status dialog first and immediately hides it, so it takes the focus, and its default button is Abort, which responds to both space and Enter. This works fine unless the window focus changes. Needless to say, this is lame and I need to implement stop properly for a later release, but it's amusing that the bug worked out this way.

5/10/2003 News: VirtualDub 1.5.2 buggy

I'm sure this is not news to many of you, but I figured I should note it now since I'm in the middle of crunch time at work and don't really have the time to address this immediately. Essentially, the 1.5.2 processing pipeline has two major bugs in it: it doesn't always flush all the frames out of the pipeline before it finishes, thus cutting some frames off at the end, and the mapping from output frame to source frame is incorrect when Direct video mode is used and frame segments have been deleted. This basically means that 1.5.2 is not stable for production use, and you should only use it for testing or experimentation until I release 1.5.3. If you do not have 1.5.2, it is available in the "previous versions" section at the bottom of the download page on SourceForge. However, please do test 1.5.3 for bugs that I haven't heard of, as I want to squish as many bugs as possible.

This is the current changelist for 1.5.3, which has not yet been released:

I know the Knowledge Base is very much out of date, but I figure given a hard choice between my documenting bugs and actually fixing them, you'd prefer the latter.

Also, apparently some of you haven't heard of the term "regression"; it means reverting to an earlier, lesser state. In quality assurance (QA) testing, it refers to the recurrence of a bug or undesired behavior that was previously fixed/improved in an earlier version.
So regressions are new to recent versions, whereas other bugs could go as far back as 1.0. Regressions are most likely to happen while fixing other bugs, and thus it's very important in large scale projects to track bug history and make sure the code actually goes, well, forward. As it turns out, I don't have much of a regression test plan, and the test file I frequently used for 1.5.2 (Vandread 1st season OP) has a lot of repeated frames at the end. Oh well.

Interestingly, the Windows Media group at Microsoft has released a beta Video Compression Manager (VCM) codec for Windows Media Video 9 that interoperates with Video for Windows based applications. It works with VirtualDub, with the exception of batch mode due to a bug on my part (see above). The interface is a bit (ahem) familiar, but it contains all the options you would expect from a modern codec, including CBR/VBR and 1-pass vs. 2-pass. (And they spelled my program's name correctly on the web page.) Looks like another toy to tinker around with, at the very least.

4/30/2003 News: VirtualDub 1.5.2 released

Adding more features adds more code and thus adds new bugs. -- Andrew S. Tanenbaum, Modern Operating Systems

Feature-wise, 1.5.2 is a minor release, with fixes for a couple of regressions in the 1.5.x series, as well as some fixes for AVI format incompatibilities with other programs. One new feature is logging, which means VirtualDub now notifies you of issues that it used to silently correct. Another is error control -- you can now tell VirtualDub to attempt to work around decode errors rather than bombing the whole operation. Of course, if a codec crashes, the operation will definitely stop anyway. Finally, you can now convert to any other frame rate, the so-called "fractional decimation" people have been asking for. This feature can actually target any exact AVI rational frame rate, but the UI only allows you to enter the frame rate in ten-thousandths of fps.
Edit a script directly if you want to hit a specific value. The temporal resampling is point-sampling, so expect some jerkiness if you attempt, say, an NTSC-to-PAL conversion.

Internally, the code has been changed significantly, and for this reason 1.5.2 is more of an experimental release than usual. Specifically, 1.5.2 is the first version to use separate audio/video pipes and a pull architecture rather than a push architecture. This removed a lot of the cruft in the code related to interleaving and spilling, and in particular should make split segment output more reliable (once the bugs have been shaken out). When the old spill code failed, the result was that nice bug where everything ground to a halt and the program sat forever at 0 fps. (This is not the same as the 9995 frame bug, for which the current theory is a specific faulty MP3 codec -- but reports are still all over the place on this one.) The new code is simpler and also doesn't produce funny interleaving when a delayed-frame codec is active (DivX or XviD in B-frame mode). It's also absolutely guaranteed to piss somebody off by breaking something that used to work, but that's the price of progress.

Additional note: the 1.5.2 source archive contains the HTML compiler that I used to build this website, Lina. Its documentation is very terse and its usage somewhat cryptic, but there has been some interest expressed in the past, and I note its release here for anyone wanting to mess around with it.

As it turns out, VC7.1 beat both VC6 and VC7 by a long shot -- it managed to ICE with C1001 before I even received my update disc. I discovered that the .NET Framework SDK 1.1 comes with the Standard edition of the VC7.1 compiler; this version is useless for actual development, as it is missing the C libraries as well as the optimizer (castrated code generator), but it has the full parser. The very first code fragment I tried was the one I posted last time and... ICE in ehexcept.c. Sigh.
Well, at least I know Microsoft still does their builds on their f: drive.

I once frightened several coworkers by deliberately crashing the compiler on the build machine and reading the resultant C1001 message to determine if its copy of VC6 had been patched to Service Pack 5. I've always wanted to write a "Service Pack detector" for Visual C++ by deliberately using pieces of code that crash the various builds of the compiler and using #line directives to print out the detected service pack level, but I've never gotten around to it.

4/27/2003 News: "LAME MP3 Codec v0.9.0 - 3.93 (stable)"

I've been receiving a lot of crash reports of the following form:

(LAME is the name of the MPEG audio encoding library, not a quality statement.)

It appears that the version of the LAME ACM codec that is distributed in the Nimo codec pack is broken in some way and crashes during codec enumeration, at least under Windows XP. I can reproduce this with Microsoft AVIEdit (a Platform SDK sample application) as well as Windows Sound Recorder (sndrec32.exe). In other words, this codec appears to destabilize the Windows XP audio codec system, and should not be installed. I do not know whether the problem lies in the codec itself or in the particular build that was compiled -- LAME is distributed in source code form only, and as such the ACM codec may not be exactly the same when compiled by two different sources. I wasn't successful in compiling it myself since it needs some headers from the Windows DDK, which unfortunately isn't distributed online anymore.

I ordered my Visual Studio .NET 2003 upgrade CD on Wednesday, and the next day I received a notice from Microsoft that it was backordered until mid-May. Arrgh. I'm looking forward to seeing how improved MMX intrinsic code generation is, as well as how long the new compiler can last before I can get it to emit C1001 INTERNAL COMPILER ERROR (grin).
Visual Studio .NET 2002 only lasted about two minutes:

Another reason I'm looking forward to VC7.1 is that reportedly it fixes a nasty bug in the VC7 global optimizer, namely that it aggressively prunes computation of unused formal parameters in inline functions. Unfortunately, this occurs in functions that have inline assembly that accesses formal parameters by stack offset instead of by name, which is virtually required if you write assembly routines without frame pointers like I do.

4/22/2003 News: When an AVI file is not an AVI file

While trying to test some new code in VirtualDub 1.5.2, I happened to make an interesting discovery about Windows Media Player that could explain some of the bug reports I've been receiving. In particular, I now know why some AVI files refuse to play under Windows Media Player, instead showing a visualizer, even though you have the video and audio codecs you need to play the file. The answer will elicit great amounts of shock and awe. Okay, maybe it won't.

The version of Windows Media Player that ships with Windows XP, and possibly newer versions, appears to have a bug in its media type detection code: specifically, any file that contains two or more consecutive MP3 frames in the first 8K of the file is considered an MP3 file. Unfortunately, that means an AVI file written with an MP3 audio track and without an OpenDML hierarchical index has a high likelihood of being mistaken for an audio file. When Windows Media Player does this, it displays a nice flashy visualizer, announces the audio track's bitrate with some ridiculous duration, and then refuses to play the file properly. DirectShow itself doesn't have this problem, as neither Windows Media Player 6.4 (mplayer2.exe) nor the old Media Player (mplay32.exe) with the MCI DirectShow driver goofs up in this fashion. I do not know whether this bug affects WMP9, as I don't have that installed and don't plan to anytime soon -- 6.4 works just fine. (I still use WinAmp 2.77 too.
Why change what works?)

Files written out by DirectShow won't trigger this bug, as they have AVI headers more than 16K long. VirtualDub normally won't trigger this bug either, for similar reasons -- it has to reserve space for the OpenDML indices whether or not they're actually needed. The problem occurs when VirtualDub writes AVI files in compatibility mode (old format AVIs), or segmented files, which automatically turn off the OpenDML index support. In these cases, VirtualDub writes a smaller 2K header instead, and this is what triggers the Windows Media Player bug. Annoying as it is, I'll probably modify VirtualDub to write 8K headers instead, as there is practically no need for tiny AVI files, and I don't feel like doing more experiments to figure out exactly what Windows Media Player 8 considers valid MP3 frames.

Current files that have this problem can be "fixed" by running them through VirtualDub in Direct mode with the normal Save AVI option. For best results, disable the "trim" option in Video > Frame Rate, so VirtualDub copies all data from both streams even if they're not the same duration.

4/12/2003 News: VirtualDub.NET

I got bored yesterday and compiled VirtualDub in Visual Studio .NET to the common language runtime (/clr). After fixing a couple of violations of the One Definition Rule (different global definitions of MyFilterData in different files) and sidestepping a link bug with VDswprintf(), I had about 90% of VirtualDub 1.5.2 running in bytecode. Now, the program was still i386 bound (due to inline assembly), still glued to Windows (due to many API calls), and about 5-10% slower -- but it was cool to see my MPEG decoder running in portable bytecode after flipping a switch. It's obvious now where the real work went into Visual C++ .NET, since it only took a few minutes to convert my program to the Common Execution Environment, including seamless integration with existing assembly functions. Definitely a lot easier than moving to the JVM (Java Virtual Machine).
The downside is the file size. VirtualDub 1.5.2 alpha is about 1MB built Release, before getting packed by UPX in the build script. The .NET version is 1.9MB, of which 640K is metadata and the other 250K or so appears to be IL. It's strange to me that IL would be bigger than the equivalent native code; perhaps VC's IL optimization is not yet at the level of its native optimization. The 640K of metadata, however, is unacceptable. Having symbol information is not a problem, since VirtualDub ships with source anyway, but having an executable that's twice as big is -- and VirtualDub doesn't use C++ objects that extensively, save for a little STL usage. The information would be useful for a crash handler, but it's way too big for that, the VirtualDub.vdi file being about one-tenth the size.

This is a bit of a disappointment, really. One of the advantages of .NET is that garbage collection and the execution engine are not tied together -- you can still use unmanaged memory as in native C++ when targeting IL bytecode. Another is that there is significant effort being put into .NET environments for Linux (Mono and DotGNU). Just-in-time compilation (JIT) has advanced to the point that most traditional optimizations are included and the speed hit is acceptable for UI and framework code, and with seamless integration into native code for inner loops, OS portability without recompilation is feasible.

Garbage collection is still unwanted by me, however. The technology is at the point that the speed and memory usage are much better than before, but it seems that every time a language is converted to garbage collection, the first thing the language designers do is kill destructors. Sorry, my scoped lock class can't release a critical section in a finalizer called with random delay between zero and infinity. Managed C++ (/clr) allows you to switch between managed and unmanaged memory on a per-declarator basis with __gc and __nogc, but those look too much like near and far.
Yuck.

I think I'm about to declare "feature freeze" for 1.5.2, and begin cleaning up the bad parts of code to prepare for release. There isn't going to be any major new feature in 1.5.2, just minor improvements and more internal tectonic upheavals to prepare for major new features later (thus saving me the effort of writing said features now). 1.5.2 will still not support external audio filters, but the internal API has been improved in preparation; in particular, filters can now request that the host convert upstream data into a specific PCM format. I'm a big proponent of pushing as much work into the host as possible. In my opinion, this makes for easier plugin development, meaning more plugins, and more importantly, more reliable plugins.

2/22/2003 News: VirtualDub 1.5.1 released

When you said you wanted free software, you should have specified you wanted bug-free software.

I wouldn't have released 1.5.1 nearly this quickly if it weren't for the glaring bugs in the 1.5.0 release. The biggest one, the random crash in the menu bar, is system-specific. I actually put 1.5.0 through a short beta test period since I knew the heavy refactoring I had done broke a lot of code; surprisingly, even though the testers did a great job and found a number of bad glitches that I fixed before release, none of them noticed the menu bug. I can't even reproduce it on the general version, but it did appear on the P4 version when I actually used the mouse to control the program. Stupid stack-sensitive bugs. The glitches in the capture module were due to some in-progress conversion of filename handling to Unicode -- a couple of subsystems in 1.5.0 are capable of accepting Unicode filenames under Windows NT/2000/XP, and this will grow with time.

The simple truth is that QA is labor intensive, nondeterministic, and boring. Which is basically why most projects, commercial or not, don't do enough of it before shipping 1.0. Asymptotic version numbering schemes are a popular remedy.
On the good side, I received 100 crash traces for 1.5.0. Which means that, in spite of the crash context additions I made to the source code, I escaped the cardinal embarrassment of shipping a crash handler that crashes.

I recently discovered a few interesting additions to the Visual C++ .NET code generator that didn't exist in Visual C++ 6.0 SP5+PP. One of them is constant evaluation of static initializers, which means you can, for instance, declare a global 3D vector constant and have VC7 optimize it down to pure initialized data. The second is that VC7 can generate the bswap instruction intrinsically (_byteswap_ulong()), which is great for bitmap processing and MPEG bitstream parsing. Third, the undocumented compiler switch /QWMIemu causes VS to emit SSE2 instructions with a lock prefix instead of a size override prefix for software emulation purposes. This borders on absolutely useless, but it's interesting.

A great plugin for the Visual Studio IDE: Fast Solution Build fixes my #1 hair-pulling pet peeve of Visual Studio .NET: the pathetically slow, moronic dependency checker that adds 30 seconds to the build cycle because it has to print "up to date" over and over and link executables whose dependencies didn't compile. Highly recommended. We recently switched from VC6 to VS at work, and I swear I would have broken my keyboard in half if it weren't for this plugin, because the VS IDE is so braindead. I still might, due to the 300 baud output window and the dumb solution tree that opens itself up and sorts filenames with case sensitivity on a case-insensitive filesystem. Hopefully VS 2003 will solve a lot of the problems, but I'm not holding my breath.

Direct3D is another Microsoft creation that makes me want to throw equipment, but at least I don't have to pay for it. (Directly.) Ask me about it if you're really bored and haven't heard of a bad API yet.

2/16/2003 News: VirtualDub 1.5.0 released

VirtualDub 1.5.0 is out.
From the user's perspective, 1.5.0 consists mostly of bug fixes, not the revolutionary change you would expect from a major version bump. The reason for the major bump is that the program has been internally restructured, breaking out some code into libraries and cleaning up the build process significantly. Sources are no longer split across four archives and sylia.dll is no longer statically linked, which should simplify project management a bit.

That is not to say that there are no new features in this release, however. 1.5.0 is the first release to have audio filtering, which has been a requested feature for some time. Now, the audio filtering system is quite rough at this point -- it's not optimized, and the filter selection is a bit sparse. You also cannot write external audio filters yet, although that is definitely going to change. The current selection of audio filters consists mainly of miscellaneous algorithms that I have been playing with for a while within a sandbox WinAmp 2 plugin:

Center cut. The classic "vocal cut" filter, except that the output is stereo instead of mono. This is accomplished through FFT phase analysis; the output will have some warbling in it, but stereo separation is preserved. Also known as the "make your own karaoke to embarrass yourself with" filter.

Ratty pitch shift. A time-domain, sawtooth-swept delay line, with rake-like correlation to smooth out the jumps. Good for about +/-20% variation in pitch.

Now, you might ask what a pitch shifter is good for with video. Well, 1.5.0 also contains a stretch filter that allows you to slow down or speed up audio, like a tweaked tape recorder. Combine a stretch and a matched pitch shift, and you get a time stretcher. If you set the pitch shifter and stretcher to the same ratio and tweak the video frame rate to match, you can speed up or slow down video.
(Yes, you can now make "Sakura Saku" even faster!)

I'm still looking for a better pitch shift algorithm -- the current version has problems with clicking when multiple dominant tones are present. I tried a frequency-domain version once, but it didn't work out too well: frequency-based algorithms don't like sharp attacks and tend to "smear" them, making percussives sound mushy.

Now, for the obligatory off-topic paragraph: (warning, plot spoilers)

I saw a bit of the anime Onegai Teacher recently. (I just got the Nuku Nuku DVD -- guess what my next DVD purchase will likely be :) ) I have to say, the series is totally original. A young woman played by Kikuko Inoue, Belldandy Mizuho, ends up living with a student called Keiichi Kei. They have to keep their relationship secret since she is not from this world. They are then visited by two people. Her older sister, Urd Her mother, Hazuho, is a bit lewd and tries to get them closer together, whereas her younger sister, Skuld Maho, doesn't like Kei-chan Kei-kun and wants to break them apart. The couple is broken apart later in the series, causing them anguish, but Belldandy Mizuho returns and they live happily ever after.

Not that I'm complaining, really -- Belldandy without the tranquilizers, yay -- but geez, this is a bit too familiar. Oh well, voice actor reuse is always fun, I guess. Take Inuyasha, for example. This is another series for which I thought, "haven't I seen this before?"

1/16/2003 News: Quick fix2

Correction to the Antigua fix below: the DWORD value should be 0x00000000, not 0x00000001, because you want to disable preview. Sorry about that.

1/14/2003 News: Philips SAA713x (Antigua) capture fix, and random other stuff

One of my friends commented that my web page wasn't interesting because I didn't update it enough.
I have been in contact with Philips regarding a problem with VirtualDub's capture module and the drivers for the Philips SAA713x (Antigua) chip -- essentially, you can get video capture to work one or two times, then the thing dies. Overlay works, but preview doesn't, and you get zero frames trying to capture. For you coders out there, from the Video for Windows side this very simply means you get zero samples on both the video and preview callbacks.

As it turns out, Philips staff discovered that the problem is the VFWWDM driver connecting to both the Preview and the Capture pins on the Antigua capture filter. You can disable the Preview pin with the following Registry entry:

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\nnnn\Parameters\CapPreviewEnabled DWORD:00000001

nnnn is a four-digit number that varies by system -- the best way to figure it out is to search for a key that is fairly unique to the driver, perhaps ADC Phase Clock Delay or VideoTunerEnabled. There should already be a bunch of other Enabled-type entries in the registry key. (Note that CurrentControlSet is an alias and usually points to ControlSet001.) When you are done, reboot the system (I mean it) and then restart VirtualDub in capture mode. If the change took, the Overlay option should be grayed out. After making this change on my Windows 2000 system, capture works reliably. Full credit goes to the Antigua team for figuring this out -- my role for the most part was trying a capture and responding back, "uh, yup, it doesn't work."

Please note, however, that the registry change disables functionality in the capture driver and may interfere with normal operation of DirectShow-based capture applications. It may also void your OEM technical support. It is suggested that you keep note of the change for future reference, and perhaps bookmark the key if you are running the Windows 2000/XP Registry Editor, in order to delete it later if you experience problems in other applications.
As usual, if your system fails to boot, you must have screwed up somewhere.

I recently discovered that a lot of the crashes I have been receiving are the result of people aborting VirtualDub after the "deadlock detected" dialog appears. Folks, when you hit OK on that dialog, VirtualDub does an abnormal process abort, and all bets are off. The most common result is the DivX 5 codec crashing on a line that looks like this:

Please don't send me these reports, as they are not useful. This crash can also occur as a secondary crash after an initial crash is intercepted and reported. Do not save the subsequent crashes. The first one is the only one that is useful. I know sending these crash reports is a pain in the butt, so I am working on an improved crash handler for the next release that will report more user-friendly analysis.

On yet another note: I am hearing more and more of VirtualDub being distributed in rip packs along with front-ends, Avisynth, and lots of plugins. GPL issues aside, there is an increasingly common problem with such composite systems under Windows 95/98 -- codecs, filters, and/or plugins are failing to load at random. The problem is that a lot of codecs and plugins are statically linked to the C run-time library (CRT). Each instance of the CRT consumes one Thread Local Storage (TLS) slot, of which there are only 64 under Windows 95/NT4, and 80 under Windows 98. Once all TLS slots are consumed, DLLs fail to load. Codecs have a tendency to stick around in memory, and VirtualDub filters are always loaded, so you can hit this limit really fast. VirtualDub will be switching to dynamic filter loading in order to help this problem somewhat, but codec authors can help out by using the shared CRT -- switch C/C++ Code Generation to Multithreaded DLL. Don't do this if you are using Visual Studio .NET, because it will require MSVCR70.DLL instead of the MSVCRT.DLL that ships with the OS.
End users can work around the problem by temporarily removing filters, codecs, and plugins that aren't immediately needed. Better yet, upgrade to Windows 2000/XP, which raises the TLS limit.

11/30/2002 News: VirtualDub 1.4.13 released

1.4.13 is out way ahead of schedule, because I botched some changes to the resize filter and RGB color conversion routines very badly in 1.4.12. It also fixes a nasty audio desynchronization bug that has been in the codebase for some time. Please upgrade to 1.4.13 ASAP. Also, the P4 version now has its own .vdi file for debugging purposes.

Now if you'll excuse me, I have to wear the brown paper bag on my head again.

11/23/2002 News: VirtualDub 1.4.12 released

1.4.12 is out, and it breaks a rather old VirtualDub tradition: it has a separate version optimized for the Pentium 4, instead of all optimizations in one codebase. Intel Corporation has graciously given me a 3.06GHz Pentium 4 with HyperThreading Technology, along with copies of Intel C/C++ and VTune, and I spent some time optimizing the MPEG-1 decoder, resize filter, and color conversion routines for the P4. The reason for the separate executables is that the P4 version is compiled with Intel C/C++ with the /QxW flag, and won't run at all on CPUs without SSE2. However, don't fret, because 1.4.12 still has a standard executable with auto-CPU-specific dispatch, and it even has some of the optimizations of the P4 version. To run the P4 version, just drop it in the folder where you unzipped the regular version, and launch VeedubP4.exe instead; it otherwise should function exactly as the usual version. And yes, I do still have a Pentium III, so I do know the normal version does not require a P4.

This wouldn't be a new release, of course, without a little something for everyone else too. As it turns out, the HyperThreaded CPU exposed non-atomic synchronization code in the playback routine, and so this version fixes random lockups during playback on any SMP or HT-capable system.
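The entry above doesn't show what the faulty playback code looked like, so the following is only a generic sketch of what "non-atomic synchronization" means, not VirtualDub's actual code. The function name count_concurrently is made up for illustration, and it uses C++11 std::atomic and std::thread, which postdate this release. Several threads bump a shared counter: with a plain long, the read-modify-write steps of two threads could interleave and lose updates once the threads truly run in parallel, which is exactly the situation an SMP or HyperThreaded CPU creates.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration (not from VirtualDub): each worker thread
// increments a shared counter. std::atomic makes every increment an
// indivisible read-modify-write, so no update can be lost even when the
// threads execute simultaneously on separate (logical) CPUs.
inline long count_concurrently(int thread_count, long increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(thread_count));
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    // Wait for every worker before reading the final value.
    for (auto& w : workers) w.join();
    return counter.load();
}
```

On a single non-HT CPU, a non-atomic version of this can appear to work for a long time because the threads rarely preempt each other mid-increment; running on two logical processors makes the race far more likely to show up.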
(A rather neat feature of HyperThreading is that you find all the mistakes in your threading code without having a second CPU do nothing all the time other than run WinAmp.) The VTune 6.0 profiler also spotted an unaligned row buffer in the resize routine, which should execute a little faster now. I fixed a bug that made the copy construction support in the filter API unusable, and fixed the directory bug that everyone's been telling me about in the Save Image Sequence command. I'm sorry I wasn't able to squish some of the other bugs or missing features that still exist, but I wanted to get the P4 version and the above critical fixes out first.

My philosophy is that one executable should contain optimizations for all CPUs and users should not have to switch executables to do so, but I have to rethink my strategy for doing so. Intel C++ has a much better code generator than Visual C++, even VC7 -- the output of the Intel compiler makes me say "hey, that's pretty good," whereas my reaction to Visual C++'s output is usually "hey, that doesn't suck." The main problems are that CPU-specific dispatch is tougher when you have a large amount of C++ code involved, and that the Intel compiler generally produces executables about 30% bigger than the Microsoft compiler. A third downside is that the IC inline assembler... well, miscompiles some of my assembly code. For that reason, one module in the P4 version, mpegidct.cpp, is compiled with Visual Studio rather than Intel C++. My plan for current releases, however, is for the codebase to be buildable on Visual C++ 6.0 SP5+PP, Visual Studio .NET, and Intel C++ 6.0. Also, at least for the short term, VirtualDub will continue to run on all 80486-compatible CPUs; I haven't decided to require MMX yet.

On a random note, I've been distracted by two new games. I just started playing Final Fantasy X (beaten FF4-FF9/FFMQ/FFT/SD2/SD3/RS3, can't stop now), and although I like the new battle system, I hate Blitzball.
It's actually been out for a long time, but I just got it. For some reason, I have a strong urge to rename my main character "Selphie." The other game is Need for Speed: Hot Pursuit 2, which has much improved car physics and gameplay -- the cops no longer have the giant electromagnet at their disposal and resetting gives you a running start -- but what I find annoying is the game insulting my driving. I play NFS:HP2 with the keyboard, and the cops keep radioing "he's all over the road" to each other.

The changelist for 1.4.12:

11/1/2002 News: Knowledge base, filter SDK, and scripting document updated

The Knowledge Base has been updated rather lamely -- I just copied over the entries from the changelog that were likely to have affected users. The filter SDK has also been updated to V1.05 in order to cover the copy construction of filter structures, a feature new to 1.4.11 that allows you to write filters that use regular C++ class objects for the filter data structure. I have a filter framework in progress that makes this easier, but it is not ready yet. Also, the scripting document now covers the configuration commands for the logo and HSV filters, and the new SaveImageSequence() command.

Finally, here's the changelist for 1.4.11:

10/31/2002 News: VirtualDub 1.4.11 released

VirtualDub 1.4.11 is out there -- I finally released it because there are two critical bug fixes in it that I've been sitting on waaay too long. One is the OpenDML bug, which went largely unnoticed in my testing because it only occurs with variable size video blocks. The other is a MIME64 encoding bug that was probably crashing some codecs in batch mode by trashing the last byte of the codec configuration structure. My apologies -- I value third-party software compatibility very much, and these were two very big booboos.

Accompanying 1.4.11 is a new release of buildtools, 1.2, an upgrade over the 1.1 release that almost no one had because I forgot to update the link.
Anyway, if you are fooling around with the source and want to rebuild the debug resource:

1.4.11 has a new icon as well, thanks to Spire. Actually, I touched it up a bit and removed the 256-color versions (although good looking, I couldn't bring myself to ship with a 10K icon), so don't blame him if you think it looks bad. To tell you the truth, I rarely pay attention to the aesthetics of my program, because although many people appreciate open-source software, very few appreciate open-source art, because it generally looks ugly. Honing the "bad art skill" is very important for a professional career in software development because it lessens the chance of your company shipping a product with your programmer art, which is generally embarrassing.

1.4.11 also includes a new scripting command for saving image sequences, and supports a few new features in the filter API, most notably a copyProc procedure -- this allows you to implement a copy constructor for your filter in order to escape the dumb "must be POD" restriction in 1.4.10 and earlier. Doing this properly requires some slightly obscure C++ (placement new), so I will probably need to polish a framework for this. I'm too lazy to update the filter SDK and scripting document tonight, as well as the knowledge base, so I'll probably do that stuff tomorrow. Uh, I guess that's all I have to say.

[1] Blinn, Jim. Jim Blinn's Corner: A Trip Down the Graphics Pipeline. Read it.

8/18/2002 News: More HTML fun

As you can see, I've spent the weekend doing the exciting task of redoing my website. The biggest part of this was rewriting my HTML compiler from scratch -- it now sanely uses STL strings and generates a parse tree of pseudo-XML that it then reprocesses to output the final pages. This means I can do all sorts of transformations that I couldn't do before.
About two-thirds through I realized that I was basically rewriting XSLT, but writing my own utility freed me to add file generation and output-compaction features that XSLT can't do. The other half of the rework is the introduction of CSS into the mix so I can discard all of the stupid <FONT> tags I used to have everywhere. I'll probably end up crashing a lot of Netscape 4 browsers out there, but I figure I'm doing everyone a favor by doing so.

Christian HJ Wiesner has graciously started a VirtualDub users forum at virtualdub. everwickedindex. php -- this looks to be a great place to collect all of the common questions and drop my mail load. I encourage you to try it out. He's actually been bugging me for weeks to put up this link, and I kept putting it off because I was in the middle of reworking the site -- for that, I apologize.

Donald Graft's website was down for a while, but it's now up under a slightly new URL: shelob. mordordgraft. Sorry, but our favorite VirtualDub and Avisynth filter guy doesn't disappear that easily.

Finally, on a completely random note, if you are a Ranma fan you should really read these two fanfics from Jeffrey "OneShot" Wong's site: They're very good, although so frighteningly long that they could probably be bound into small books. Oh, be wary of the stories titled "s usual morning." You may regret opening them. :)

SHOUTcast DSP Plug-in v2.2.3 for Winamp (07/19/2011)

The updated version of the SHOUTcast DSP plug-in has now been released and is available from:

Note: This updated version of the plug-in will only work on Winamp 5.5 and higher and requires Windows 2000 or higher (though the plug-in has only been actively tested on Windows 2000, XP, and Windows 7). Additionally, this is a 32-bit dll like Winamp, though it should run fine on a 64-bit version of Windows.
This is a recommended update for anyone using the DSP plug-in, as it resolves a number of stability issues with the prior 2.x and 1.9x versions and also adds support for SHOUTcast 2, along with a number of other improvements to the experience of using the plug-in and a number of bug fixes.

Changes from v2.2.2
- Fixed title updates to remove characters the v2 DNAS will abort a connection on
- Fixed DSP not starting connections if Winamp is starting minimised
- Fixed the AAC encoder not being re-loaded if closing the dialog and re-opening without re-loading the DSP
- Fixed some rare issues preventing the dialog from loading correctly

Changes from v2.2.1
- Added support for Winamp 5.62's new AAC encoder dll (Winamp now uses Fraunhofer's AAC library instead of Coding Technologies)
- Changed the genre to be chosen from a menu (in supported situations) to only allow supported values
- Changed MP3 default settings to be 96 kbps Stereo (meant to have been this for a while but wasn't working)
- Changed default genre to be Misc on clean installs or on loading and not matching the supported genre list
- Changed the version string so it's more like the v1 tools (and pending DNAS / Transcoder updates)
- Changed Description to Name on the Yellow Pages tab
- Fixed the vu input meters to not show a level if there is currently no audio input instead of keeping the last value
- Fixed issue with loading of the config dialog not showing the tabs correctly in some situations
- Fixed sending a manual title update in v2 mode also incorrectly sending inappropriate cached title data
- Miscellaneous code tidyups, optimisations, removal of unwanted code

Changes from v2.2.0
- Fixed crash on some machines when the playlist editor is empty
- Fixed some minor localisation issues with some of the error messages
- Fixed the installer not setting the DSP as the default DSP for some non-standard installs
- Changed message when loading in an invalid configuration to mention DSP stackers

Changes from v2.1.3
- Added new Artwork tab which
  allows for configuration of how and what artwork will be sent for SHOUTcast 2 streams (to a compatible SHOUTcast 2 DNAS)
- Added support of the IPC_GETNEXTLISTPOS api in Winamp 5.61 to better determine the next song to be played even if shuffle is enabled
- Added explicit blocking of trying to load the plug-in outside Winamp, to resolve loading issues and crashes due to lacking the required api support
- Added to the logs tab the option to log the next tracks to be played from the DSP in plain txt or in xml format
- Added sending of icypub data as per SHOUTcast 2 protocol specifications (only needed for the SHOUTcast 2 DNAS)
- Added lookahead ini-only option for determining how many next tracks from the playback queue (if available) to report (default is 3)
- Changed all of the SHOUTcast 2 packet generation to fix a number of issues like large invalid packets, being unable to connect, unstable connections
- Changed all of the title gathering to no longer poll Winamp but instead query it directly (reduces cpu usage and improves reliability of metadata gathering)
- Changed all of the plug-in UI to use unicode where possible to improve localisation support
- Changed some of the UI elements to make certain information or errors more obvious (like the Cipher Response message when using the wrong SHOUTcast mode)
- Changed the Logging tab to Logs due to the wider range of options it now provides
- Changed next track logging to be a per-configuration feature instead of being applied globally (as in the previous DSP release)
- Changed to send the full title in the metadata <extension> block for the first (current) title so it follows the SHOUTcast 2 specs
- Changed the Send Update button to not be enabled unless there is a title to send as well as disabling the next title option as applicable
- Changed YellowPages tab to disable options not applicable to SHOUTcast 2 mode and when running as a public server (where the details relating to streamauthhash for the DNAS are used instead)
- Changed to send a default
  stream id if one is not specified in SHOUTcast 2 mode to improve DJ connection issues (which can fail if not specified)
- Fixed some metadata conversions leading to crashes
- Fixed internal utf8 conversions to prevent malformed SHOUTcast 2 metadata being generated which would cause the SHOUTcast 2 DNAS to block the connection
- Fixed some of the entered stream configuration options to not accept invalid input and revert to safe defaults as applicable if this happens
- Fixed some issues with logging initialisation leading to random lockups in some rare cases
- Fixed memory corruption using SHOUTcast 2 mode preventing Connection 1 being used in rare cases (mainly affected Windows 2000 / XP systems)
- Fixed metadata not being sent if the connection to the DNAS is lost and a connection then comes back or is manually started
- Fixed clean up of resources if unloading whilst Winamp is still running to prevent a potential crash on close or UI corruption when the plug-in is loaded again
- Fixed blank stream data being output at a higher rate when Winamp is not playing or is paused than when playing, leading to higher bandwidth usage than should be happening
- Fixed the Summary page listview flickering on update
- Fixed rare crash when Winamp is not playing and certain playlist configurations are in use when trying to find the next track title
- Fixed to not reset the music levels if not using the soundcard input on closing
- Fixed to not reset the Winamp level if not using the soundcard input on startup but will instead apply it on changing to soundcard input
- Fixed playback queue lookup issues on older 5.5x clients when apiqueue is not present or not correctly loaded when queried
- Fixed the Send Update option to not send cached information from Winamp's title and to not crash in rare situations
- Fixed rare lockup issue when using the soundcard input due to the input device taking longer to reset than expected
- Fixed refresh capture device not setting to a valid selection if the number of devices
  changed
- Updated help link for the plug-in to go to the new page at wiki. winampwikiSourceDSPPlug-in
- Updated installer to allow the plug-in to be set as the default DSP as well as run Winamp after completion (with the checked states remembered for next time)
- Miscellaneous code tidyups, optimisations, removal of unwanted code and other build related changes to make this more portable at a later date

Changes from v2.1.1
- Added passing of metadata from the playing track (if known) to the server so it acts like sctrans from a client connecting to the stream
- Added an option to not log Status X bytes messages (enabled by default) and improved log file handling
- Added a refresh capture device button to help update the plug-in if connected capture devices have changed
- Changed status info duration to be the time connected rather than a relative date time, and allowed more than 24hrs to be displayed, e.g. 26:48:57 instead of looping back to 03:48:57
- Changed logging to filter Status X bytes messages to only 1 per second (if the option to include them is enabled)
- Changed log files to use CRLF linebreaks instead of just LF
- Changed logging to remove newlines so each message is a single line to match the status info
- Fixed crash on Vista (and potentially Windows 7) where no capture devices being present resulted in no default capture device being known
- Fixed crash in SC2 mode when a different cipher is set in the plug-in to the server as well as indicating this error in the status info
- Fixed button images in the Soundcard Mixer Control section not appearing on all OSes

Changes from v2.1.0
- Added new Logging tab on the Output tab to log the connection status messages
- Added a mini dropdown next to the Lock button for Push to Talk to allow the mode to be automatically enabled on startup
- Fixed plug-in to not crash when the network connection is lost
- Fixed random plug-in crashes whilst the plug-in is streaming (mainly in SC2 mode)
- Fixed internal plug-in uninstall not always working
- Fixed SC2
  title updates to properly work as UTF-8 and to not strip out characters incorrectly
- Fixed next track detection to only be reported if shuffle mode is off and not to act in an undefined manner when on the last playlist item (wraps around to the start of the playlist as needed)
- Fixed title updates to cope with the same title being played but the next song title being different
- Changed SC2 metadata to not output <soon> and <title seq="2"> tags in the xml metadata if they are not known (when shuffle mode is enabled)
- Changed the <TENC> tag in the xml metadata to include the plug-in version

Changes from v2.0.2
- Added a separate capture device fader timeout option
- Added copies of the plug-in documentation as an installer option
- Added help and documentation links to the About tab
- Changed on Vista / Windows 7 to only show actually connected capture devices (requires a restart of the plug-in if connecting a new device whilst the plug-in is active (*))
- Changed the Open Mixer button to open to the recording devices dialog on Vista / Windows 7
- Changed wording of the legacy mode checkbox to be clearer (hopefully) and added an info panel below to deal with the Cipher response message
- Changed capture device level to not alter the device's level unless Push to Talk is active
- Changed the resolution on the faders from 500ms to 100ms (will re-map old settings)
- Changed opening of help links in the plug-in to follow Winamp's style of handling
- Fixed major issue in the plug-in leading to breaking of Winamp (and 3rd party plug-ins) COM usage
- Fixed running of the plug-in not starting auto-connect connections when Input or About were the opened tab
- Fixed capture device level not being correctly handled leading to spiking in on transitions (affected at least Windows 2000 / XP where it is all known to work)
- Fixed capture device's source selection not being remembered
- Fixed capture device and source levels not being set back to the non-Push to Talk level if Push to Talk is active when the
  plug-in is closed
- Fixed a few localisation issues with missing items on Windows 2000 / XP
- Fixed capture device
- Removed tooltip from the microphone slider on the line-in page
- Fixed some issues with the installer and uninstaller
- Miscellaneous code changes to make some things easier to manage

(*) There are other changes being made to the plug-in's handling of the Input devices over the next few versions so this behaviour will change again.

Changes from v2.0.0
- Fixed SHOUTcast 1 connection errors to a remote connection
- Fixed authorisation error checking for Ultravox 2 & 2.1
- More changes to the output manager to avoid out of sync states
- Fixed timing issue which caused out of sequence Ultravox audio data frames in some scenarios
- Fixed some localisation and tabbing order issues on the config pages
- Removed unwanted encoder option on the Output -> Connection tab
- Added a SHOUTcast 1 mode only information prompt on how to enter the password for DJ connections

Changes from v1.9.1
- Added SHOUTcast 2 (Ultravox 2.1) support for the generated stream data
- Cleanup and general fixes to the streaming support in the plug-in
- Fixed settings not being saved on Vista / Windows 7
- Fixed a number of lock-ups in the plug-in (should be more stable now)
- Fixed plug-in to not stall if Winamp is not playing
- Fixed a number of UI issues (tabs not showing in all cases, controls not in the correct tabbing order, theming issues, notification icon handling)
- Config window now remembers its last position between use
- Improved Lame encoder quality
- Attempted to resolve standard AAC (LC-AAC) not working (additionally this is reported as audio/aacp so it will work with the YP)
- Uses the current enc_aacplus.dll (AAC / AAC+ encoder) from the Winamp install used, instead of bundling an old version (from Winamp 5.1)
- Fixed SHOUTcast 1 issue with titles containing "" & ""
- Changes made to improve selection of the microphone device allowing for more control over the capture device used
- Added localisation support to the plug-in (including supporting localised encoder plug-ins when showing their configurations)
- Some other minor changes including those from the 1.9.2 beta

If you do come across an issue with the plug-in, then please do post in this thread with as much information as possible about what you're doing at the time, the system you are using and anything else which will make it easier to understand what is or isn't going on with your install.

1) There are still a few issues regarding the soundcard control features on Vista / Windows 7 due to changes made in these OSes in the way they handle sound and how it can be obtained. This is being investigated though there is no eta on when a resolution will be found.

2) By default on new installs of the plug-in, it will enable support for using the newer SHOUTcast 2 features. However, if you're using an older version of the DNAS (or an alternative which is not compatible with the SHOUTcast 2 protocol) then you will need to check the Use SHOUTcast v1 mode (for legacy servers) option on the Output Page -> Connection Tab.

3) This version includes artwork support but requires a newer version of the SHOUTcast DNAS v2 in order to be able to use it. This will be released shortly once a number of other aspects have been completed with the DNAS's development.

Cipher response received message - If this happens then you most likely are connecting in SHOUTcast 2 mode to a SHOUTcast 1 setup and so need to check the Use SHOUTcast v1 mode (for legacy servers) option on the Output Page -> Connection Tab. The plug-in will give you a number of hints if this is what you need to do.
Soundcard Mixer Control - On Vista / Windows 7 there are issues with using these features when the selected device acts as the microphone, where it basically will not work in an expected manner. This is being investigated though it is not clear if there will be a solution for this.

Big thanks goes to the kind people who've helped out, from trying new test builds to providing access to systems experiencing weird issues, as well as those who reported issues with the DSP whilst trying to resolve the issues with connection stability and other things from the v2.1.3 release.

WACUP Project <> "Winamp Ramblings" - Indie Winamp Dev Blog

Originally Posted by thinktink:
The options in the plugin are for playback fine-tuning. If you find your stream running too fast or too slow you can tweak the rates. Otherwise just leave 'em at 0.0 and 0.0 (or whatever the install defaults were) respectively. As far as the 3rd party plugin issue: the closest would be the (if memory serves) official NULL Output Plugin that came with one of the old SHOUTcast DSP installs. I think ZeroPointer is far superior though.

that would explain why i don't recall having this issue before. perhaps they should ask you to turn it into "official." the first option default is 0.0 for playback, but 0.01 for "end song fudge factor." i just don't get what they do. my next task is to get this working with the new DNAS, but i don't know if i should try to get that going until i have a permanent static IP, and until the auth hash issues are settled. DrO gave me a lot of good info for it, but it seems like a daunting challenge.

Yup, I just double checked, 0.0 and 0.01 are the normal defaults. "Song End Fudge Factor" is mainly there because input plugins (as of the time ZeroPointer was first constructed) do not always send the same amount of data to DSP plugins as to Output plugins, thereby causing some data rate flow irregularities in the SC DSP send. 0.01 is based on an average experience with different MP3 files.
For a more in-depth understanding of the data rate (about to get technical-ish): For machines with a soundcard that Winamp can play to, the soundcard acts as a timer of sorts to control how fast Winamp plays music (and how fast audio data gets pushed through the DSP (or DSP stack if running)). Without that regulation Winamp would just dump the audio data way too fast for the DNAS or anybody trying to listen to your radio station to handle. With no soundcard that inherent regulation is gone. Winamp has to play back to something. ZeroPointer and other NULL output plugins therefore simulate that playback timing. Because not all system clocks are perfect I added this feature to ZeroPointer, again, out of experience dealing with different machines. Hope this answers your questions.

Last edited by thinktink; 20th May 2011 at 06:50. Reason: grammar

well there's already the updinfo command in the DNAS so it can already be sort of done with that (which is basically all a v1 title update is anyway). just from an ease of use point of view, doing it in the DSP is the way to go really, plus the source is then responsible for the titles rather than the DNAS, which is basically just there to hold / relay such info.

as for the delay, i was more thinking, if it was to be done, to either have the check work on a fixed delay or make it check for the file being altered and have that trigger the read and sending of the metadata (though the DSP's metadata sending is delayed until it has free time to process it so it's never going to be exact - related to the cpu usage tweaks on song changes in v2.2.x). i guess an additional delay could be added as an option for a custom input. might try to find some time later today to mock up a gui page of the options to see if my thinking is on the right path.
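The "check for the file being altered and have that trigger the read" idea from the post above could look roughly like this. It is a minimal sketch, not the DSP's implementation (which is a native Winamp plug-in): `TitleFileWatcher`, `send_title`, and `delay_seconds` are names invented here, the title file is assumed to be plain UTF-8 text, and the optional delay simply blocks, which a real plug-in would not want to do on its audio thread.

```python
import os
import time


class TitleFileWatcher:
    """Polls a text file and passes its contents on whenever the file changes."""

    def __init__(self, path, send_title, delay_seconds=0.0):
        self.path = path
        self.send_title = send_title        # callback that performs the title update
        self.delay_seconds = delay_seconds  # optional extra delay before sending
        self._last_signature = None

    def _signature(self):
        # Modification time plus size is a cheap "has it been altered?" check.
        try:
            info = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (info.st_mtime_ns, info.st_size)

    def poll(self):
        """Check once; send the title if the file changed. Returns True if sent."""
        signature = self._signature()
        if signature is None or signature == self._last_signature:
            return False
        self._last_signature = signature
        if self.delay_seconds:
            time.sleep(self.delay_seconds)  # simplistic: blocks the caller
        with open(self.path, encoding="utf-8") as handle:
            self.send_title(handle.read().strip())
        return True
```

Calling `poll()` on a timer gives the fixed-interval variant mentioned in the post; the change check only decides whether anything is sent on a given tick.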
no rush on this DrO, i have very little exp with the DJ auto XML output, and it will take me some time to find/pay someone to write the middleware necessary for it, to conform to whatever winamp standard you specify. (OMT Mediatouch makes the software, but i doubt i'll be able to convince them to do it, so we'll probably look for a 3rd party.) Also, amazing as this may sound, we're still getting our network in house in order, which we need to do to get the FTP set up.

however i wanted to go into a bit more detail on the physical airchain. basically, the "board" controls what inputs and signal is sent to the xmtr. even if it's computer automation, the board must route that signal. (that's how physical relay switches could then control what XML or whatever info is used to send to the DSP.) our station is totally digital, meaning that all the equipment and connections are digital: the entire airchain, from the computers to the board/cables to the cd players all the way to the xmtr, is digital. the xmtr is about a half mile away and the studio sends its signal by fiber optic cable. at the xmtr, the signal only goes analog again as an output of the orban processor which feeds the actual FM xmtr.

back at the studio, we have a digital delay dump which is always on, always part of the airchain, and that's basically just a loop that purposely delays the signal by 7-10 seconds so that a DJ can "dump" any cursing. users can select the time they want the delay to work for. so what you hear on the FM is actually 7 seconds old. the webcasts work off of the FM, we tune into it. some people think the webcasts should be fed by a pre-xmtr digital feed, but we like it like this so we can 1. get the benefit of the orban processing, and 2. verify via webcast the FM is actually up and running. however, this means the delay dump puts the webcasts behind too.
so the XML created, if sent to the DSP right then, would be 7-10 seconds too early, if you follow me. i think you understood this already, but i just wanted to illustrate all this to make it plain and bc i thought you'd find it interesting. as far as syncing goes via shoutcast on output, i figured it wouldn't be exact, but probably close enough for our needs.

something else we haven't talked about is RDS systems, where basically the xmtr outputs text that car stereos and the like can use. we don't have RDS (yet), but many stations do, and if the tuner they used to feed their webcasts supported it, i can imagine a scenario where the tuner writes a file via serial port or something like that, for the webcasts of the RDS info. again, just something to be aware of. thx for the interest

i'm really not sure having full peak monitoring (i.e. another tab) in the DSP is really suitable for what the DSP is intended for, as adding in more complex logging over a variable timescale is more of a pro-tool feature i'd have thought. Winamp does have such a feature and that just keeps a track over a few seconds at most. it also has the advantage of drawing the vu itself instead of re-using a Windows progress bar control for the vu (as the DSP does) which doesn't really allow for such a ui state (not without going the custom drawing route). i'm not sure, from a development time point of view, if implementing a custom control for that would be worth it. the only obvious option (if this was to be done) would be to show it with the current level either to the side or below like in the attached screenshot (ignoring my lame copy/paste work), which just then leaves what timescale to cover it for (just peak since starting is the simplest, though it's not too hard to have it track the levels over a single time period).
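The two timescales mentioned at the end of that post (peak since starting, and peak over a single recent time period) are both cheap to track. The sketch below is one possible way to do it and is not taken from the plug-in: `PeakTracker`, `window_seconds`, and the use of caller-supplied timestamps are assumptions made for illustration.

```python
from collections import deque


class PeakTracker:
    """Tracks the peak level since start and over a recent time window."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self.peak_since_start = 0.0
        self._recent = deque()  # (timestamp, level) pairs still inside the window

    def add(self, timestamp, level):
        """Record one meter reading taken at `timestamp` (in seconds)."""
        level = abs(level)
        self.peak_since_start = max(self.peak_since_start, level)
        self._recent.append((timestamp, level))
        # Drop readings that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._recent and self._recent[0][0] < cutoff:
            self._recent.popleft()

    def window_peak(self):
        """Highest level among the readings still inside the window."""
        return max((level for _, level in self._recent), default=0.0)
```

A meter would call `add()` for every reading it already draws, then show `window_peak()` or `peak_since_start` next to the current level.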