|
I did a little work on using scons with MSVC, and this is a brief progress report.

1. Editing custom.py helps but does not solve all problems.

2. scons has to be edited in a few places relating to:
   a. libs that MSVC requires (mainly to do with fltk)
   b. stdc++ and supc++ need to be deleted from the lib list
   c. csound has to be renamed csound5. The linker doesn't like csound.lib and csound.exe sharing the same name (it might be possible to fix it differently with a linker option).

3. With this, scons will build csound, but some of the sources had to be edited:
   a. Opcodes/gab/vectorial.c: cl doesn't like accessing memory offsets through void * pointers; they have to be cast to whatever they're supposed to be, (MYFLT **) in this case.
   b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere in the headers.
   c. util1/csd_util/cs.c: unistd.h doesn't exist for MSVC, so it had to be #ifndef'ed round.
   There are also some other changes I had to make before when building from Visual Studio, but I can't remember them.

4. At the end of all this, two hours down the line, I got csound5.exe. I tried to run it with trapped.csd and this is what I got:

D:\csound5>.\csound5 -otest.wav examples/trapped.csd
Localisation of messages is disabled, using default language.
time resolution is 1000000000.000 ns
WARNING: 'freeverb.dll' is not a Csound plugin library
WARNING: 'ftconv.dll' is not a Csound plugin library
WARNING: 'libsndfile.dll' is not a Csound plugin library
WARNING: 'oscbnk.dll' is not a Csound plugin library
WARNING: 'pmidi.dll' is not a Csound plugin library
WARNING: 'repluck.dll' is not a Csound plugin library
WARNING: 'reverbsc.dll' is not a Csound plugin library
WARNING: 'rtpa.dll' is not a Csound plugin library
WARNING: 'vst4cs.dll' is not a Csound plugin library
0dBFS level = 32768.0
Csound version 5.00 beta (float samples) May 26 2005
libsndfile-1.0.10
UnifiedCSD: examples/trapped.csd
STARTING FILE
Creating options
Creating orchestra
Creating C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3336.orc (7803A710)
Creating score
orchname: C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3336.orc
scorename: C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3337.sco
Csound tidy up: Segmentation violation
inactive allocs returned to freespace
Removing temporary file C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3337.sco ...
Removing temporary file C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3336.orc ...

...in other words, it crashes! Well, I'll try to debug this some other time...

Victor Lazzarini
Music Technology Laboratory
Music Department
National University of Ireland, Maynooth

--
Send bugs reports to [hidden email]
(or to http://www.cs.bath.ac.uk/cgi-bin/csound )
To unsubscribe, send email to [hidden email]
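The void * complaint in item 3a can be illustrated with a small sketch. Arithmetic on a void * is a gcc extension, which is presumably why the code built with MinGW but not with cl; casting to a concrete pointer type first makes the offset well defined. The helper names below are made up for illustration and are not taken from vectorial.c.

```c
#include <stddef.h>

/* Hypothetical helper (not from Opcodes/gab/vectorial.c):
   returns a pointer `nbytes` bytes past `base`. */
static void *byte_offset(void *base, size_t nbytes)
{
    /* `base + nbytes` on a void * compiles with gcc as an extension
       but is rejected by MSVC; cast to char * before adding. */
    return (char *) base + nbytes;
}

/* Hypothetical helper: element access through a typed pointer.
   The index now counts floats, not bytes. */
static float nth_float(void *base, size_t n)
{
    return ((float *) base)[n];
}
```

Which target type is right (char * for byte offsets, or a typed pointer for element offsets) depends on what the surrounding code expects, so each cast needs checking against its call site.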
|
round() is not an ANSI C function, and really, gcc/unix people had no business putting it in <math.h>. We have had just this problem with CDP code when porting to OS X, as we have had a "round" function defined in a library for 20 years.

On Windows, we want to use assembler code to convert from float to int, as the VC compiler otherwise generates very slow code. In any case, "round" in gcc etc. returns a double; for an integer (or rather "long") return value use "lround".

Richard Dobson

Victor Lazzarini wrote:
>
> I did a little work on using scons with MSVC and this is a brief
..
>     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
..
|
Richard Dobson wrote:
> round() is not an ANSI C function, and really, gcc/unix people had no
> business putting it in <math.h>.

But it can very well exist in math.h, as round() is defined by ISO 9899:1999. It is a standard function, even if the standard is a new one.

> On Windows, we want to use assembler code to convert from float to int,
> as the VC compiler otherwise generates very slow code.

Or use GCC. With SSE instructions enabled, float to int casts will compile to the 'cvttss2si' instruction, which is very fast and copies from a float register to an integer register (such as eax) directly. If you do not want to use SIMD instructions, it is still possible to generate fast code by using lrint() (see below). Note that in Csound5, sysdep.h defines the macros MYFLT2LONG and MYFLT2LRND, which can optionally use lrint/lrintf if enabled in SConstruct.

> for an integer (or rather "long") return value use "lround".

No, you want to use lrint() and lrintf(), which will compile to a single assembly instruction like 'fistp' or similar. lround is required to round fractional parts of 0.5 away from zero, and this strict definition results in emitting very complex (and thus slow) assembly code to comply with the standard.
|
In reply to this post by Richard Dobson
I fudged the source to get through the compiler as

#ifdef MSVC
#define round(x) floor(x + 0.5)
#endif

At 12:10 26/05/2005, you wrote:
>round() is not an ANSI C function, and really, gcc/unix people had no
>business putting it in <math.h>.
>...

Victor Lazzarini
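Two things about the macro are worth noting: the argument is not parenthesised, and floor(x + 0.5) rounds halves upward, so -2.5 becomes -2 where C99 round() would give -3. A hedged sketch of a fallback that mimics the C99 behaviour with a cast only (the function name is made up, and the result is only valid while it fits in a long):

```c
/* Hypothetical stand-in for a missing round(): rounds halves away
   from zero, as C99 round() does, without calling floor().
   Only meaningful for values whose rounded result fits in a long. */
static double round_fallback(double x)
{
    /* The cast truncates toward zero, so bias by +/-0.5 first. */
    return (double) (long) (x < 0.0 ? x - 0.5 : x + 0.5);
}
```

Being a function rather than a macro, it evaluates its argument once and cannot be broken by operator precedence at the call site.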
|
In reply to this post by Victor Lazzarini
> 3. with this, scons will build csound, but some of the sources had to be
> edited
>     a. Opcodes/gab/vectorial.c: cl doesn't like trying to access memory
>        offset with void * pointers,
>        they have to be cast to whatever they're supposed to be, (MYFLT **)
>        in this case.

More likely char*, because the void* pointers (most probably of AUXCH) used to be char* in old versions of Csound, on which CsoundAV (from where the Maldonado opcodes have been ported) is based. The code is probably calculating some byte offset into an allocated block of memory, possibly making assumptions about alignment that may not be correct on 64-bit platforms.

>     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
>        in the headers

Looking at the code, it may be preferable to add casts to int instead, as the values are used this way later.

>     c. util1/csd_util/cs.c: unistd.h doesn't exist for MSVC, had to be
>        #ifndef'ed round

You can safely skip building this utility; it is not essential.

> Localisation of messages is disabled, using default language.
> time resolution is 1000000000.000 ns

Did you add WIN32 to the defines? On Windows, QueryPerformanceCounter() should be (and with MinGW actually is) used for timers, and not time(). This is odd.

> WARNING: 'freeverb.dll' is not a Csound plugin library
> WARNING: 'ftconv.dll' is not a Csound plugin library
> WARNING: 'libsndfile.dll' is not a Csound plugin library
> WARNING: 'oscbnk.dll' is not a Csound plugin library
> WARNING: 'pmidi.dll' is not a Csound plugin library
> WARNING: 'repluck.dll' is not a Csound plugin library
> WARNING: 'reverbsc.dll' is not a Csound plugin library
> WARNING: 'rtpa.dll' is not a Csound plugin library
> WARNING: 'vst4cs.dll' is not a Csound plugin library

This suggests problems with the plugin libraries; I assume 'PUBLIC' needs to be added to the exported interface functions, for example:

    PUBLIC int csoundModuleCreate(void *csound)

rather than just

    int csoundModuleCreate(void *csound)
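The reason the qualifier matters with MSVC is that functions in a DLL are not exported unless marked as such (or listed in a .def file), so the host cannot find the entry point and reports that the DLL is not a plugin. A rough sketch of the idea follows; the PUBLIC definition here is an assumption for illustration, and the real macro in the Csound headers may differ:

```c
/* Assumed definition, for illustration only; the actual PUBLIC
   macro is provided by the Csound headers. */
#if defined(_WIN32)
#define PUBLIC __declspec(dllexport)
#else
#define PUBLIC
#endif

/* Exported entry point, with the signature quoted above.
   The body is a stub; returning 0 for success is an assumption. */
PUBLIC int csoundModuleCreate(void *csound)
{
    (void) csound;  /* unused in this stub */
    return 0;
}
```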
|
In reply to this post by Victor Lazzarini
floor() is horribly slow on VC. May be horribly slow everywhere for that matter. Something to avoid for audio! This is the assembly function I use:

#if defined _WIN32 && defined _MSC_VER
/* fast convergent rounding */
__inline long conv_round(double fval)
{
    int result;
    _asm{
        fld     fval
        fistp   result
        mov     eax,result
    }
    return result;
}
#endif

which, as the comment indicates, does convergent rounding, which is good for audio. From what Istvan said, this is also what "lrint" does. C requires (um) a different rounding rule, which from all I have read (and mostly understood) is not so good for audio data. There was a huge discussion about all this on music-dsp years ago (from which I gleaned the code above); it should be in the archives somewhere.

Richard Dobson

Victor Lazzarini wrote:
> I fudged the source to get through the compiler as
>
> #ifdef MSVC
> #define round(x) floor(x + 0.5)
> #endif
|
Richard Dobson wrote:
> floor() is horribly slow on VC. May be horribly slow everywhere for that
> matter. Something to avoid for audio! This is the assembly function I use:

I already replaced round() with simple casts to int in vst4cs.cpp; in fact, looking at the rest of the code, it seems that truncation is actually preferred in this case and not rounding.

> #if defined _WIN32 && defined _MSC_VER
> /* fast convergent rounding */
> __inline long conv_round(double fval)
> {
>     int result;
>     _asm{
>         fld     fval
>         fistp   result
>         mov     eax,result
>     }
>     return result;
> }
> #endif

Should this be added to sysdep.h? The following is used currently (USE_LRINT and USE_DOUBLE are build options set in SConstruct):

/* macros for converting floats to integers */
/* MYFLT2LONG: converts with unspecified rounding */
/* MYFLT2LRND: rounds to nearest integer */
#ifdef USE_LRINT
#ifndef USE_DOUBLE
#define MYFLT2LONG(x) ((long) lrintf((float) (x)))
#define MYFLT2LRND(x) ((long) lrintf((float) (x)))
#else
#define MYFLT2LONG(x) ((long) lrint((double) (x)))
#define MYFLT2LRND(x) ((long) lrint((double) (x)))
#endif
#else
#ifndef USE_DOUBLE
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) ((long) ((float)(x) + ((float)(x) < 0.0f ? -0.5f : 0.5f)))
#else
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) ((long) ((double)(x) + ((double)(x) < 0.0 ? -0.5 : 0.5)))
#endif
#endif
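The fallback branch (neither USE_LRINT nor USE_DOUBLE defined) can be tried in isolation. The two macro definitions below are copied from the listing above; the helper sample_to_int and its 32767 scale factor are made up for illustration.

```c
/* Fallback definitions as quoted above (USE_LRINT and USE_DOUBLE unset). */
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) ((long) ((float)(x) + ((float)(x) < 0.0f ? -0.5f : 0.5f)))

/* Hypothetical helper: scale a sample in [-1, 1] to a 16-bit range
   and round to the nearest integer. */
static long sample_to_int(float s)
{
    return MYFLT2LRND(s * 32767.0f);
}
```

In this branch MYFLT2LONG truncates toward zero (2.7 gives 2), whereas in the USE_LRINT branch the same macro rounds to nearest (2.7 gives 3). That is consistent with the "unspecified rounding" comment, but it means code must not rely on either behaviour.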
|
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:
>     a. Opcodes/gab/vectorial.c: cl doesn't like trying to access memory
>        offset with void * pointers,
>        they have to be cast to whatever they're supposed to be, (MYFLT **)
>        in this case.
>     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
>        in the headers
>     c. util1/csd_util/cs.c: unistd.h doesn't exist for MSVC, had to be
>        #ifndef'ed round

These should be fixed now.
|
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:
> WARNING: 'freeverb.dll' is not a Csound plugin library
> WARNING: 'ftconv.dll' is not a Csound plugin library
> WARNING: 'libsndfile.dll' is not a Csound plugin library
> WARNING: 'oscbnk.dll' is not a Csound plugin library
> WARNING: 'pmidi.dll' is not a Csound plugin library
> WARNING: 'repluck.dll' is not a Csound plugin library
> WARNING: 'reverbsc.dll' is not a Csound plugin library
> WARNING: 'rtpa.dll' is not a Csound plugin library
> WARNING: 'vst4cs.dll' is not a Csound plugin library

These may be fixed too; I added PUBLIC to the interface functions.
|
Thanks for all the fixes.

Btw: should lrint() be used on Windows (with MSVC)?

Victor

At 13:44 26/05/2005, you wrote:
>These may be fixed too, I added PUBLIC to the interface functions.
|
In reply to this post by Istvan Varga
The preference for audio is convergent rounding, so that 2.5 rounds to 2 while -2.5 rounds to -2 (or to 3 and -3), and there is no net DC component added - the rounding must be exactly complementary for both positive and negative samples. This is also an example of "round to even" (a rule which decides what to do with a 0.5 fraction - up or down? - so that the accumulated error is minimised, so 3.5 --> 4 and -3.5 --> -4). Convergent rounding embodies (we hope and pray) "round to nearest", so that 2.75 rounds to 3 and -2.75 rounds to -3 also. Truncation is less satisfactory here, as it will lead to a net (if small) positive DC offset.

It is a long time since I had enough patience to analyse all these things in detail, and I may be mis-remembering things. But the assembler round() is easily faster than the compiler cast (at least, with VC++), and that matters these days. The Intel compiler is much better in this respect (it doesn't incorporate a function call the way VC++ does), so that the cast is more efficient.

It should certainly do no harm to add the assembler routine to sysdep.h, with MYFLT2LRND(x) as the portable alternative.

Apparently (from discussions on the CoreAudio list) the PowerPC has the opposite problem, in that conversions to float are slow. It was all so much easier in the old days, when you just used a cast and didn't worry!

Richard Dobson

Istvan Varga wrote:
...
> Should this be added to sysdep.h ? The following is used currently
> (USE_LRINT and USE_DOUBLE are build options set in SConstruct):
>
> /* macros for converting floats to integers */
> /* MYFLT2LONG: converts with unspecified rounding */
> /* MYFLT2LRND: rounds to nearest integer */
>
...
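The "no net DC" property of round-to-even on exact halves can be checked with a few lines of portable C. This is only a sketch: the function names are made up, it uses no assembler, and it assumes the values fit in a long. It compares round-half-to-even with a floor(x + 0.5) style round-half-up over a symmetric set of ties.

```c
/* Round to nearest, ties to even (convergent rounding). */
static long round_half_even(double x)
{
    long   t    = (long) x;        /* truncates toward zero */
    double frac = x - (double) t;  /* in (-1, 1), same sign as x */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;
    return t;
}

/* Round to nearest, ties upward: equivalent to floor(x + 0.5). */
static long round_half_up(double x)
{
    double y = x + 0.5;
    long   t = (long) y;
    return ((double) t > y) ? t - 1 : t;  /* emulate floor() for y < 0 */
}

/* Accumulated error (rounded - exact) over the ties
   -n + 0.5, ..., n - 0.5. */
static double tie_bias(long (*rnd)(double), int n)
{
    double sum = 0.0;
    int    i;
    for (i = -n; i < n; i++) {
        double x = (double) i + 0.5;
        sum += (double) rnd(x) - x;
    }
    return sum;
}
```

With n = 4 the round-to-even errors cancel exactly, while round-half-up accumulates +0.5 per tie, which is the kind of offset the post is describing.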
|
In reply to this post by Richard Dobson
What's the assembly code then for truncation

  i = (int) f;

for double & single precision?

Victor

At 12:55 26/05/2005, you wrote:
>floor() is horribly slow on VC. May be horribly slow everywhere for that
>matter. Something to avoid for audio! This is the assembly function I use:
>...
|
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:
> Thanks for all the fixes.
>
> Btw: should lrint() be used on Windows (with MSVC)?

Well, if you want to use it, just add useLrint=1 to the scons flags (it defaults to 0 otherwise).
|
In reply to this post by Victor Lazzarini
You can see it for yourself if you code that line and watch the disassembly in the debugger. There are two tasks, one of which is required by C, and the other is, for whatever reason, "required" by VC++:

the code:
    int i;
    float x = 2.5;
    i = (int) x;

leads to this assembler:

14:       i = (int) x;
00401140   fld         dword ptr [ebp-2030h]
00401146   call        __ftol (00401a30)
0040114B   mov         dword ptr [ebp-202Ch],eax

That is the VC++ task - a function call!

The code of __ftol is:

__ftol:
00401A30   push        ebp
00401A31   mov         ebp,esp
00401A33   add         esp,0FFFFFFF4h
00401A36   wait
00401A37   fnstcw      word ptr [ebp-2]
00401A3A   wait
00401A3B   mov         ax,word ptr [ebp-2]
00401A3F   or          ah,0Ch
00401A42   mov         word ptr [ebp-4],ax
00401A46   fldcw       word ptr [ebp-4]
00401A49   fistp       qword ptr [ebp-0Ch]
00401A4C   fldcw       word ptr [ebp-2]
00401A4F   mov         eax,dword ptr [ebp-0Ch]
00401A52   mov         edx,dword ptr [ebp-8]
00401A55   leave

Most of this deals with the need to change the rounding mode of the FPU to that which is required by C. The central instruction is of course "fistp".

Now as it happens, for audio we want the original standard FPU rounding mode, which is convergent, so we don't need all the FPU control flags code, just the fistp! So we trade 17 instructions + function call for three assembler ones. Eliminate the function call and things are already much improved, but the code to change the FPU mode remains, and will have to be there for compliance with the C rules for the int cast. I never got around to doing it, but the same saving can be made for Linux/x86, as gcc will equally have to obey the C rules, even if it is sane enough not to use a function call.

Richard Dobson

Victor Lazzarini wrote:
> What's the assembly code then for truncation
>
> i = (int) f;
>
> for double & single precision?
>
> Victor
...
|
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:
> What's the assembly code then for truncation
>
> i = (int) f;
>
> for double & single precision?

Well, compiling this C code with gcc 4.0:

/* -------- start test.c -------- */

#define _ISOC99_SOURCE
#include <math.h>

int f2i_cast(float x)
{
    return (int) x;
}

int d2i_cast(double x)
{
    return (int) x;
}

int f2i_lrint(float x)
{
    return (int) lrintf(x);
}

int d2i_lrint(double x)
{
    return (int) lrint(x);
}

/* -------- end test.c -------- */

For a generic x86 CPU (irrelevant lines filtered out):

Compiler flags: -Wall -O2 -fomit-frame-pointer -S -masm=intel

f2i_cast:
        sub     %esp, 8
        fnstcw  WORD PTR [%esp+6]
        fld     DWORD PTR [%esp+12]
        movzx   %eax, WORD PTR [%esp+6]
        or      %ax, 3072
        mov     WORD PTR [%esp+4], %ax
        fldcw   WORD PTR [%esp+4]
        fistp   DWORD PTR [%esp]
        fldcw   WORD PTR [%esp+6]
        mov     %eax, DWORD PTR [%esp]
        add     %esp, 8
        ret

d2i_cast:
        sub     %esp, 8
        fnstcw  WORD PTR [%esp+6]
        fld     QWORD PTR [%esp+12]
        movzx   %eax, WORD PTR [%esp+6]
        or      %ax, 3072
        mov     WORD PTR [%esp+4], %ax
        fldcw   WORD PTR [%esp+4]
        fistp   DWORD PTR [%esp]
        fldcw   WORD PTR [%esp+6]
        mov     %eax, DWORD PTR [%esp]
        add     %esp, 8
        ret

f2i_lrint:
        sub     %esp, 16
        fld     DWORD PTR [%esp+20]
#APP
        fistpl  DWORD PTR [%esp+12]
#NO_APP
        mov     %eax, DWORD PTR [%esp+12]
        add     %esp, 16
        ret

d2i_lrint:
        sub     %esp, 16
        fld     QWORD PTR [%esp+20]
#APP
        fistpl  DWORD PTR [%esp+12]
#NO_APP
        mov     %eax, DWORD PTR [%esp+12]
        add     %esp, 16
        ret

Now for Pentium III, using SSE 1 instructions:

Compiler flags: -Wall -O3 -march=pentium3 -fomit-frame-pointer -ffast-math -S -masm=intel

f2i_cast:
        cvttss2si       %eax, DWORD PTR [%esp+4]
        ret

(the other functions do not change significantly)

Now try Pentium 4, with SSE 2:

Compiler flags: -Wall -O3 -march=pentium4 -fomit-frame-pointer -ffast-math -S -masm=intel

f2i_cast:
        cvttss2si       %eax, DWORD PTR [%esp+4]
        ret

d2i_cast:
        cvttsd2si       %eax, QWORD PTR [%esp+4]
        ret

(again, the lrint based functions remain effectively the same)

As you can see, enabling the use of SSE (with -march) can significantly improve float to integer casts.
|
In reply to this post by Richard Dobson
Thanks for the explanation, but what can we use instead of the C code for truncation (this is a practical question; I want to change all my casts on Windows to faster code)? Would it be this (I'm sorry, I don't really do assembly)?

  _asm{
    fld fval
    mov eax,fval
  }

Victor

At 14:43 26/05/2005, you wrote:
>You can see it for yourself if you code that line and watch the
>disassembly in the debugger. There are two tasks, one of which is
>required by C, and the other is, for whatever reason, "required" by VC++:
>...
|
Victor Lazzarini wrote:
> Would it be this (I'm sorry I don't really do assembly)?
>
>   _asm{
>     fld fval
>     mov eax,fval
>   }

No, this would not work. However, with MinGW, you should use -march=pentium3 or -march=pentium4 if you have a CPU that supports SSE or SSE2, respectively. Otherwise, the saving, changing, and then restoring of the FPU control word cannot be avoided, and that is what is responsible for most of the slowdown.
|
In reply to this post by Istvan Varga
What if I don't have lrintf() (e.g. using the MSVC compiler)? Are there any options?

Victor

At 14:45 26/05/2005, you wrote:
>Victor Lazzarini wrote:
>>What's the assembly code then for truncation
>>i = (int) f;
>>for double & single precision?
>
>Well, compiling this C code with gcc 4.0:
>...
|
In reply to this post by Richard Dobson
Richard Dobson wrote:
> things. But the assembler round() is easily faster than the compiler
> cast (at least, with VC++), and that matters these days.

If you mean the standard round() function, beware that it is also required to round 0.5 away from zero, and the result of that is what you can see at the bottom of this message. rint() is better, but the fastest library function is lrint()/lrintf() (see previous message with assembly listings), which - with a good compiler - will expand to a single line of assembly code.

> The Intel compiler is much better in this respect (doesn't incorporate
> a function call the way VC++ does), so that the cast is more efficient.

I assume that it can also use cvttss2si.

> It should certainly do no harm to add the assembler routine to sysdep.h,
> with MYFLT2LRND(x) as the portable alternative.

Better yet, it can be the actual implementation of MYFLT2LONG and MYFLT2LRND for MSVC.

-----------------------------------------------------------------------------

0000ad20 <round>:
    ad20:  55                    push ebp
    ad21:  89 e5                 mov ebp,esp
    ad23:  83 ec 30              sub esp,0x30
    ad26:  dd 45 08              fld QWORD PTR [ebp+8]
    ad29:  89 75 f8              mov DWORD PTR [ebp-8],esi
    ad2c:  89 7d fc              mov DWORD PTR [ebp-4],edi
    ad2f:  89 5d f4              mov DWORD PTR [ebp-12],ebx
    ad32:  dd 55 d8              fst QWORD PTR [ebp-40]
    ad35:  8b 75 d8              mov esi,DWORD PTR [ebp-40]
    ad38:  e8 52 ac ff ff        call 598f <__i686.get_pc_thunk.bx>
    ad3d:  81 c3 b7 72 01 00     add ebx,0x172b7
    ad43:  c7 45 e0 00 00 00 00  mov DWORD PTR [ebp-32],0x0
    ad4a:  8b 7d dc              mov edi,DWORD PTR [ebp-36]
    ad4d:  c7 45 e4 00 00 00 00  mov DWORD PTR [ebp-28],0x0
    ad54:  89 f0                 mov eax,esi
    ad56:  89 c2                 mov edx,eax
    ad58:  89 f8                 mov eax,edi
    ad5a:  89 fe                 mov esi,edi
    ad5c:  c1 f8 14              sar eax,0x14
    ad5f:  25 ff 07 00 00        and eax,0x7ff
    ad64:  8d b8 01 fc ff ff     lea edi,[eax-1023]
    ad6a:  83 ff 13              cmp edi,0x13
    ad6d:  89 7d d4              mov DWORD PTR [ebp-44],edi
    ad70:  7f 5e                 jg add0 <round+0xb0>
    ad72:  85 ff                 test edi,edi
    ad74:  0f 88 d9 00 00 00     js ae53 <round+0x133>
    ad7a:  0f b6 4d d4           movzx ecx,BYTE PTR [ebp-44]
    ad7e:  bf ff ff 0f 00        mov edi,0xfffff
    ad83:  89 f0                 mov eax,esi
    ad85:  d9 c0                 fld st(0)
    ad87:  d3 ff                 sar edi,cl
    ad89:  21 f8                 and eax,edi
    ad8b:  09 d0                 or eax,edx
    ad8d:  74 52                 je ade1 <round+0xc1>
    ad8f:  dd d8                 fstp st(0)
    ad91:  dc 83 14 ad ff ff     fadd QWORD PTR [ebx-21228]
    ad97:  d9 83 00 ad ff ff     fld DWORD PTR [ebx-21248]
    ad9d:  d9 c9                 fxch st(1)
    ad9f:  df e9                 fucomip %st,st(1)
    ada1:  dd d8                 fstp st(0)
    ada3:  76 11                 jbe adb6 <round+0x96>
    ada5:  b8 00 00 08 00        mov eax,0x80000
    adaa:  d3 f8                 sar eax,cl
    adac:  01 c6                 add esi,eax
    adae:  89 f8                 mov eax,edi
    adb0:  f7 d0                 not eax
    adb2:  21 c6                 and esi,eax
    adb4:  31 d2                 xor edx,edx
    adb6:  89 75 e4              mov DWORD PTR [ebp-28],esi
    adb9:  89 55 e0              mov DWORD PTR [ebp-32],edx
    adbc:  dd 45 e0              fld QWORD PTR [ebp-32]
    adbf:  8b 5d f4              mov ebx,DWORD PTR [ebp-12]
    adc2:  8b 75 f8              mov esi,DWORD PTR [ebp-8]
    adc5:  8b 7d fc              mov edi,DWORD PTR [ebp-4]
    adc8:  89 ec                 mov esp,ebp
    adca:  5d                    pop ebp
    adcb:  c3                    ret
    adcc:  8d 74 26 00           lea esi,[esi]
    add0:  83 7d d4 33           cmp DWORD PTR [ebp-44],0x33
    add4:  7e 1a                 jle adf0 <round+0xd0>
    add6:  81 7d d4 00 04 00 00  cmp DWORD PTR [ebp-44],0x400
    addd:  d9 c0                 fld st(0)
    addf:  74 6b                 je ae4c <round+0x12c>
    ade1:  dd d9                 fstp st(1)
    ade3:  8b 5d f4              mov ebx,DWORD PTR [ebp-12]
    ade6:  8b 75 f8              mov esi,DWORD PTR [ebp-8]
    ade9:  8b 7d fc              mov edi,DWORD PTR [ebp-4]
    adec:  89 ec                 mov esp,ebp
    adee:  5d                    pop ebp
    adef:  c3                    ret
    adf0:  2d 13 04 00 00        sub eax,0x413
    adf5:  bf ff ff ff ff        mov edi,0xffffffff
    adfa:  88 c1                 mov cl,al
    adfc:  d3 ef                 shr edi,cl
    adfe:  85 d7                 test edi,edx
    ae00:  d9 c0                 fld st(0)
    ae02:  74 dd                 je ade1 <round+0xc1>
    ae04:  dd d8                 fstp st(0)
    ae06:  dc 83 14 ad ff ff     fadd QWORD PTR [ebx-21228]
    ae0c:  d9 83 00 ad ff ff     fld DWORD PTR [ebx-21248]
    ae12:  d9 c9                 fxch st(1)
    ae14:  df e9                 fucomip %st,st(1)
    ae16:  dd d8                 fstp st(0)
    ae18:  76 27                 jbe ae41 <round+0x121>
    ae1a:  c7 45 ec 33 00 00 00  mov DWORD PTR [ebp-20],0x33
    ae21:  8b 45 d4              mov eax,DWORD PTR [ebp-44]
    ae24:  29 45 ec              sub DWORD PTR [ebp-20],eax
    ae27:  b8 01 00 00 00        mov eax,0x1
    ae2c:  0f b6 4d ec           movzx ecx,BYTE PTR [ebp-20]
    ae30:  d3 e0                 shl eax,cl
    ae32:  8d 04 10              lea eax,[eax+edx]
    ae35:  39 d0                 cmp eax,edx
    ae37:  0f 92 c2              setb dl
    ae3a:  0f b6 d2              movzx edx,dl
    ae3d:  01 d6                 add esi,edx
    ae3f:  89 c2                 mov edx,eax
    ae41:  89 f8                 mov eax,edi
    ae43:  f7 d0                 not eax
    ae45:  21 c2                 and edx,eax
    ae47:  e9 6a ff ff ff        jmp adb6 <round+0x96>
    ae4c:  de c1                 faddp st(1),%st
    ae4e:  e9 6c ff ff ff        jmp adbf <round+0x9f>
    ae53:  dc 83 14 ad ff ff     fadd QWORD PTR [ebx-21228]
    ae59:  d9 83 00 ad ff ff     fld DWORD PTR [ebx-21248]
    ae5f:  d9 c9                 fxch st(1)
    ae61:  df e9                 fucomip %st,st(1)
    ae63:  dd d8                 fstp st(0)
    ae65:  0f 86 4b ff ff ff     jbe adb6 <round+0x96>
    ae6b:  81 e6 00 00 00 80     and esi,0x80000000
    ae71:  89 f0                 mov eax,esi
    ae73:  0d 00 00 f0 3f        or eax,0x3ff00000
    ae78:  47                    inc edi
    ae79:  0f 44 f0              cmove esi,eax
    ae7c:  e9 33 ff ff ff        jmp adb4 <round+0x94>
|
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:
> what if I don't have lrintf() (eg. using the MSVC compiler), are
> there any options?

I think I will add the MSVC specific assembly code to sysdep.h so that it will be the implementation of MYFLT2LONG and MYFLT2LRND if useLrint is not enabled and _MSC_VER and WIN32 are defined.
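One possible shape for such a branch is sketched below. This is not the code that went into sysdep.h: the guard, the helper name msvc_lrint and the layout are assumptions for illustration. It relies on MSVC's inline assembler, which is only available when targeting 32-bit x86 (hence the _M_IX86 check), and falls back to the portable rounding expression from the existing macros elsewhere.

```c
/* Sketch only; names and guards are assumptions, not the actual
   sysdep.h contents. */
#if defined(_MSC_VER) && defined(_M_IX86)
/* fistp converts using the current FPU rounding mode, which is
   round to nearest (ties to even) unless it has been changed. */
static __inline long msvc_lrint(double fval)
{
    long result;
    _asm {
        fld   fval
        fistp result
    }
    return result;
}
#define MYFLT2LONG(x) (msvc_lrint((double) (x)))
#define MYFLT2LRND(x) (msvc_lrint((double) (x)))
#else
/* Portable fallback, as in the existing macros. */
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) \
    ((long) ((double) (x) + ((double) (x) < 0.0 ? -0.5 : 0.5)))
#endif
```

Note that the two branches differ on exact halves (2.5 gives 2 via fistp but 3 via the fallback), and MYFLT2LONG rounds in one branch and truncates in the other, so callers should not depend on tie handling or on truncation.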
