scons MSVC report

scons MSVC report

Victor Lazzarini

I did a little work on using scons with MSVC and this is a brief
progress report.

1. Editing custom.py helps but does not solve all problems.
2. scons has to be edited in a few places relating to:
     a. libs that MSVC requires (mainly to do with fltk)
     b. stdc++ and supc++ need to be deleted from the lib list
     c. csound has to be renamed csound5. The linker doesn't like
        csound.lib and csound.exe sharing the same name (it might be
        possible to fix it differently with a linker option).

3. With this, scons will build csound, but some of the sources had to be edited:
     a. Opcodes/gab/vectorial.c: cl doesn't like accessing memory offsets
        through void * pointers; they have to be cast to whatever they're
        supposed to be, (MYFLT **) in this case.
     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
        in the headers.
     c. util1/csd_util/cs.c: unistd.h doesn't exist for MSVC; it had to be
        #ifndef'ed around.

    There are also some other changes I had to make before when building from
    Visual Studio, but I can't remember them.

4. At the end of all this, two hours down the line, I got csound5.exe.

    Tried to run with trapped.csd and this is what I got:

D:\csound5>.\csound5 -otest.wav examples/trapped.csd
Localisation of messages is disabled, using default language.
time resolution is 1000000000.000 ns
WARNING: 'freeverb.dll' is not a Csound plugin library
WARNING: 'ftconv.dll' is not a Csound plugin library
WARNING: 'libsndfile.dll' is not a Csound plugin library
WARNING: 'oscbnk.dll' is not a Csound plugin library
WARNING: 'pmidi.dll' is not a Csound plugin library
WARNING: 'repluck.dll' is not a Csound plugin library
WARNING: 'reverbsc.dll' is not a Csound plugin library
WARNING: 'rtpa.dll' is not a Csound plugin library
WARNING: 'vst4cs.dll' is not a Csound plugin library
0dBFS level = 32768.0
Csound version 5.00 beta (float samples) May 26 2005
libsndfile-1.0.10
UnifiedCSD:  examples/trapped.csd
STARTING FILE
Creating options
Creating orchestra
Creating C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3336.orc (7803A710)
Creating score
orchname:  C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3336.orc
scorename: C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3337.sco
Csound tidy up: Segmentation violation
inactive allocs returned to freespace
Removing temporary file C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3337.sco ...
Removing temporary file C:\DOCUME~1\VLAZZA~1\LOCALS~1\Temp\cs3336.orc ...


...in other words, it crashes!

Well I'll try to debug this some other time...


Victor Lazzarini
Music Technology Laboratory
Music Department
National University of Ireland, Maynooth

--
Send bugs reports to [hidden email]
              (or to http://www.cs.bath.ac.uk/cgi-bin/csound )
To unsubscribe, send email to [hidden email]

Re: scons MSVC report

Richard Dobson
round() is not an ANSI C function, and really, gcc/unix people had no business
putting it in <math.h>. We have had just this problem with CDP code when porting
to OS X, as we have had a "round" function defined in a library for 20 years. On
Windows, we want to use assembler code to convert from float to int, as the VC
compiler otherwise generates very slow code. In any case, "round" in gcc etc.
returns a double; for an integer (or rather "long") return value use "lround".

Richard Dobson

Victor Lazzarini wrote:

>
> I did a little work on using scons with MSVC and this is a brief
..
>     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
..


Re: scons MSVC report

Istvan Varga
Richard Dobson wrote:

> round() is not an ANSI C function, and really, gcc/unix people had no
> business putting it in <math.h>.

But it can very well exist in math.h, as round() is defined by ISO 9899:1999.
It is a standard function, even though the standard is a new one.

> On Windows, we want to use assembler code to convert from float to int,
> as the VC compiler otherwise generates very slow code.

Or use GCC. With SSE instructions enabled, float to int casts will compile
to the 'cvttss2si' instruction which is very fast and copies from a float
register to an integer register (such as eax) directly. If you do not want
to use SIMD instructions, it is still possible to generate fast code by
using lrint() (see below). Note that in Csound5 sysdep.h defines the macros
MYFLT2LONG and MYFLT2LRND that can optionally use lrint/lrintf if enabled
in SConstruct.

> for an integer (or rather "long") return value use "lround".

No, you want to use lrint() and lrintf() which will compile to a single
assembly instruction like 'fistp' or similar. lround is required to round
fractional parts of 0.5 away from zero, and this strict definition results
in emitting very complex (and thus slow) assembly code to comply with the
standard.

Re: scons MSVC report

Victor Lazzarini
In reply to this post by Richard Dobson
I fudged the source to get through the compiler as

#ifdef MSVC
#define round(x) floor(x + 0.5)
#endif




Re: scons MSVC report

Istvan Varga
In reply to this post by Victor Lazzarini
> 3. with this, scons will build csound, but some of the sources had to be
> edited
>     a. Opcodes/gab/vectorial.c: cl doesn't like trying to access memory
>     offset with void * pointers, they have to be cast to whatever they're
>     supposed to be, (MYFLT **) in this case.

More likely char*, because the void* pointers (most probably of AUXCH)
used to be char* in old versions of Csound on which CsoundAV (from where
the Maldonado opcodes have been ported) is based. The code is probably
calculating some byte offset to an allocated block of memory, possibly
with making assumptions about alignment that may not be correct on 64 bit
platforms.
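
As a rough illustration of the char* approach (not the actual vectorial.c code, which is not shown in this thread), a byte offset into a block held as void * can be taken like this. The function name is hypothetical.

```c
#include <stddef.h>

/* Return a pointer nbytes past the start of a block held as void *.
   Arithmetic directly on void * is a GNU extension; standard C needs a
   cast to a pointer to a complete type, and char * counts in bytes. */
void *byte_offset(void *block, size_t nbytes)
{
    return (char *) block + nbytes;
}
```

Casting to (MYFLT **) instead would make the same "+ n" step in units of sizeof(MYFLT *), so the two casts only agree if the original offset was meant in those units.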

>     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
> in the headers

Looking at the code, it may be preferable to add casts to int instead,
as the values are used this way later.

>     c. util1/csd_util/cs.c: unistd.h doesn't exist for MSVC, had to be
> #ifndef'ed around

You can safely skip building this utility, it is not essential.

> Localisation of messages is disabled, using default language.
> time resolution is 1000000000.000 ns

Did you add WIN32 to the defines? On Windows, QueryPerformanceCounter()
should be used for timers (and actually is with MinGW), not time().
This is odd.

> WARNING: 'freeverb.dll' is not a Csound plugin library
> WARNING: 'ftconv.dll' is not a Csound plugin library
> WARNING: 'libsndfile.dll' is not a Csound plugin library
> WARNING: 'oscbnk.dll' is not a Csound plugin library
> WARNING: 'pmidi.dll' is not a Csound plugin library
> WARNING: 'repluck.dll' is not a Csound plugin library
> WARNING: 'reverbsc.dll' is not a Csound plugin library
> WARNING: 'rtpa.dll' is not a Csound plugin library
> WARNING: 'vst4cs.dll' is not a Csound plugin library

Suggests problems with plugin libraries; I assume 'PUBLIC' needs to be
added to the exported interface functions, for example:

PUBLIC int csoundModuleCreate(void *csound)

rather than just

int csoundModuleCreate(void *csound)
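
For readers unfamiliar with export markers, here is a sketch of what such a macro typically expands to. The definition below is an assumption made for illustration; the real PUBLIC macro lives in the Csound headers and may differ.

```c
/* Assumed definition for illustration only. */
#if defined(_WIN32)
#  define PUBLIC __declspec(dllexport)
#elif defined(__GNUC__)
#  define PUBLIC __attribute__((visibility("default")))
#else
#  define PUBLIC
#endif

/* With MSVC a function is only exported from a DLL if it is marked like
   this (or listed in a .def file), so an unmarked entry point cannot be
   looked up by the host when it loads the library. */
PUBLIC int csoundModuleCreate(void *csound)
{
    (void) csound;   /* unused in this sketch */
    return 0;
}
```

MinGW's linker exports all global symbols when none are marked explicitly, which would be consistent with the plugins loading under MinGW but not under MSVC.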

Re: scons MSVC report

Richard Dobson
In reply to this post by Victor Lazzarini
floor() is horribly slow on VC. May be horribly slow everywhere for that matter.
Something to avoid for audio! This is the assembly function I use:

#if defined _WIN32 && defined _MSC_VER
/* fast convergent rounding */
__inline long conv_round(double fval)
{
        int result;
        _asm{
                fld fval
                fistp result
                mov eax,result
        }
        return result;
}
#endif

which, as the comment indicates, does convergent rounding, which is good for
audio. From what Istvan said, this is also what "lrint" does. C requires (um) a
different rounding rule, which from all I have read (and mostly understood) is
not so good for audio data. There was a huge discussion about all this on
music-dsp years ago (from which I gleaned the code above); should be in the
archives somewhere.


Richard Dobson


Victor Lazzarini wrote:

> I fudged the source to get through the compiler as
>
> #ifdef MSVC
> #define round(x) floor(x + 0.5)
> #endif
>
>



Re: scons MSVC report

Istvan Varga
Richard Dobson wrote:

> floor() is horribly slow on VC. May be horribly slow everywhere for that
> matter. Something to avoid for audio! This is the assembly function I use:

I already replaced round() with simple casts to int in vst4cs.cpp; in fact,
looking at the rest of the code, it seems that truncation is actually preferred
in this case and not rounding.

> #if defined _WIN32 && defined _MSC_VER
> /* fast convergent rounding */
> __inline long conv_round(double fval)
> {
>     int result;
>     _asm{
>         fld    fval
>         fistp    result
>         mov    eax,result
>     }
>     return result;
> }
> #endif

Should this be added to sysdep.h ? The following is used currently
(USE_LRINT and USE_DOUBLE are build options set in SConstruct):

/* macros for converting floats to integers */
/* MYFLT2LONG: converts with unspecified rounding */
/* MYFLT2LRND: rounds to nearest integer */

#ifdef USE_LRINT
#ifndef USE_DOUBLE
#define MYFLT2LONG(x) ((long) lrintf((float) (x)))
#define MYFLT2LRND(x) ((long) lrintf((float) (x)))
#else
#define MYFLT2LONG(x) ((long) lrint((double) (x)))
#define MYFLT2LRND(x) ((long) lrint((double) (x)))
#endif
#else
#ifndef USE_DOUBLE
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) ((long) ((float)(x) + ((float)(x) < 0.0f ? -0.5f : 0.5f)))
#else
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) ((long) ((double)(x) + ((double)(x) < 0.0 ? -0.5 : 0.5)))
#endif
#endif
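
A small sketch of what the fallback branch of the listing above does, using the two macros as quoted for the configuration without USE_LRINT and with USE_DOUBLE. The wrapper function names are hypothetical.

```c
/* Fallback definitions copied from the listing above
   (USE_LRINT not defined, USE_DOUBLE defined). */
#define MYFLT2LONG(x) ((long) (x))
#define MYFLT2LRND(x) ((long) ((double)(x) + ((double)(x) < 0.0 ? -0.5 : 0.5)))

/* Plain cast: truncates toward zero. */
long to_long_trunc(double x)
{
    return MYFLT2LONG(x);
}

/* Add or subtract 0.5, then truncate: ties go away from zero. */
long to_long_round(double x)
{
    return MYFLT2LRND(x);
}
```

In this branch 2.75 converts to 2 with MYFLT2LONG and 2.5 converts to 3 with MYFLT2LRND. In the USE_LRINT branch both macros call lrint()/lrintf(), so with the default rounding mode 2.75 would convert to 3 and 2.5 to 2, which is why MYFLT2LONG is documented as having unspecified rounding.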

Re: scons MSVC report

Istvan Varga
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:

>     a. Opcodes/gab/vectorial.c: cl doesn't like trying to access memory
>     offset with void * pointers, they have to be cast to whatever they're
>     supposed to be, (MYFLT **) in this case.
>     b. vst4cs/vst4cs.cpp and vsthost.cpp: round() doesn't exist anywhere
>     in the headers
>     c. util1/csd_util/cs.c: unistd.h doesn't exist for MSVC, had to be
>     #ifndef'ed around

These should be fixed now.

Re: scons MSVC report

Istvan Varga
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:

> WARNING: 'freeverb.dll' is not a Csound plugin library
> WARNING: 'ftconv.dll' is not a Csound plugin library
> WARNING: 'libsndfile.dll' is not a Csound plugin library
> WARNING: 'oscbnk.dll' is not a Csound plugin library
> WARNING: 'pmidi.dll' is not a Csound plugin library
> WARNING: 'repluck.dll' is not a Csound plugin library
> WARNING: 'reverbsc.dll' is not a Csound plugin library
> WARNING: 'rtpa.dll' is not a Csound plugin library
> WARNING: 'vst4cs.dll' is not a Csound plugin library

These may be fixed too, I added PUBLIC to the interface functions.

Re: scons MSVC report

Victor Lazzarini
Thanks for all the fixes.

Btw: should lrint() be used on Windows (with MSVC)?

Victor



Re: scons MSVC report

Richard Dobson
In reply to this post by Istvan Varga
The preference for audio is convergent rounding, so that 2.5 rounds to 2 while
-2.5 rounds to -2 (or to 3 and -3), and there is no net DC component added: the
rounding must be exactly complementary for positive and negative samples.
This is also an example of "round to even" (a rule which decides what to
do with a 0.5 fraction - up or down? - so that the accumulated error is
minimised, so 3.5 --> 4 and -3.5 --> -4). Convergent rounding embodies (we hope
and pray) "round to nearest", so that 2.75 rounds to 3 and -2.75 rounds to -3
also. Truncation is less satisfactory here, as it will lead to a net (if small)
positive DC offset. It is a long time since I had enough patience to analyse all
these things in detail, and I may be mis-remembering things. But the assembler
round() is easily faster than the compiler cast (at least, with VC++), and that
matters these days. The Intel compiler is much better in this respect (it doesn't
incorporate a function call the way VC++ does), so that the cast is more efficient.

It should certainly do no harm to add the assembler routine to sysdep.h, with
MYFLT2LRND(x) as the portable alternative.

Apparently (from discussions on the CoreAudio list) the PowerPC has the opposite
problem, in that conversions to float are slow. It was all so much easier in the
old days, where you just used a cast and didn't worry!

Richard Dobson


Istvan Varga wrote:

...
> Should this be added to sysdep.h ? The following is used currently
> (USE_LRINT and USE_DOUBLE are build options set in SConstruct):
>
> /* macros for converting floats to integers */
> /* MYFLT2LONG: converts with unspecified rounding */
> /* MYFLT2LRND: rounds to nearest integer */
>
...


Re: scons MSVC report

Victor Lazzarini
In reply to this post by Richard Dobson
What's the assembly code then for truncation

i = (int) f;

for double & single precision?

Victor



Re: scons MSVC report

Istvan Varga
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:

> Thanks for all the fixes.
>
> Btw: should lrint() be used on Windows (with MSVC)?

Well, if you want to use it, just add useLrint=1 to the scons flags
(it defaults to 0 otherwise).

Re: scons MSVC report

Richard Dobson
In reply to this post by Victor Lazzarini
You can see it for yourself if you code that line and watch the disassembly in
the debugger. There are two tasks, one of which is required by C, and the other
is, for whatever reason, "required" by VC++:

the code:
        int i;
        float x = 2.5;
        i = (int) x;

leads to this assembler:

14:       i = (int) x;
00401140   fld         dword ptr [ebp-2030h]
00401146   call        __ftol (00401a30)
0040114B   mov         dword ptr [ebp-202Ch],eax

That is the VC++ task - a function call!

The code of __ftol is:

__ftol:
00401A30   push        ebp
00401A31   mov         ebp,esp
00401A33   add         esp,0FFFFFFF4h
00401A36   wait
00401A37   fnstcw      word ptr [ebp-2]
00401A3A   wait
00401A3B   mov         ax,word ptr [ebp-2]
00401A3F   or          ah,0Ch
00401A42   mov         word ptr [ebp-4],ax
00401A46   fldcw       word ptr [ebp-4]
00401A49   fistp       qword ptr [ebp-0Ch]
00401A4C   fldcw       word ptr [ebp-2]
00401A4F   mov         eax,dword ptr [ebp-0Ch]
00401A52   mov         edx,dword ptr [ebp-8]
00401A55   leave


Most of this deals with the need to change the rounding mode of the FPU to that
which is required by C. The central instruction is of course "fistp".

Now as it happens, for audio we want the original standard FPU rounding mode,
which is convergent, so we don't need all the FPU control flags code, just the
fistp! So we trade 17 instructions + function call for three assembler ones.
Eliminate the function call and things are already much improved, but the code to
change the FPU remains, and will have to be there for compliance with the C
rules for the int cast. I never got around to doing it, but the same saving can
be made for Linux/X86, as gcc will equally have to obey the C rules, even if it
is sane enough not to use a function call.


Richard Dobson


Victor Lazzarini wrote:

> What's the assembly code then for truncation
>
> i = (int) f;
>
> for double & single precision?
>
> Victor
...


Re: scons MSVC report

Istvan Varga
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:
> What's the assembly code then for truncation
>
> i = (int) f;
>
> for double & single precision?
>

Well, compiling this C code with gcc 4.0:

/* -------- start test.c -------- */

#define _ISOC99_SOURCE
#include <math.h>

int f2i_cast(float x)
{
     return (int) x;
}

int d2i_cast(double x)
{
     return (int) x;
}

int f2i_lrint(float x)
{
     return (int) lrintf(x);
}

int d2i_lrint(double x)
{
     return (int) lrint(x);
}

/* -------- end test.c -------- */

For a generic x86 CPU (irrelevant lines filtered out):

Compiler flags: -Wall -O2 -fomit-frame-pointer -S -masm=intel

f2i_cast:
         sub     %esp, 8
         fnstcw  WORD PTR [%esp+6]
         fld     DWORD PTR [%esp+12]
         movzx   %eax, WORD PTR [%esp+6]
         or      %ax, 3072
         mov     WORD PTR [%esp+4], %ax
         fldcw   WORD PTR [%esp+4]
         fistp   DWORD PTR [%esp]
         fldcw   WORD PTR [%esp+6]
         mov     %eax, DWORD PTR [%esp]
         add     %esp, 8
         ret

d2i_cast:
         sub     %esp, 8
         fnstcw  WORD PTR [%esp+6]
         fld     QWORD PTR [%esp+12]
         movzx   %eax, WORD PTR [%esp+6]
         or      %ax, 3072
         mov     WORD PTR [%esp+4], %ax
         fldcw   WORD PTR [%esp+4]
         fistp   DWORD PTR [%esp]
         fldcw   WORD PTR [%esp+6]
         mov     %eax, DWORD PTR [%esp]
         add     %esp, 8
         ret

f2i_lrint:
         sub     %esp, 16
         fld     DWORD PTR [%esp+20]
#APP
         fistpl DWORD PTR [%esp+12]
#NO_APP
         mov     %eax, DWORD PTR [%esp+12]
         add     %esp, 16
         ret

d2i_lrint:
         sub     %esp, 16
         fld     QWORD PTR [%esp+20]
#APP
         fistpl DWORD PTR [%esp+12]
#NO_APP
         mov     %eax, DWORD PTR [%esp+12]
         add     %esp, 16
         ret

Now for Pentium III, using SSE 1 instructions:
Compiler flags: -Wall -O3 -march=pentium3 -fomit-frame-pointer -ffast-math -S -masm=intel

f2i_cast:
         cvttss2si       %eax, DWORD PTR [%esp+4]
         ret

(the other functions do not change significantly)

Now try Pentium 4, with SSE 2:
Compiler flags: -Wall -O3 -march=pentium4 -fomit-frame-pointer -ffast-math -S -masm=intel

f2i_cast:
         cvttss2si       %eax, DWORD PTR [%esp+4]
         ret

d2i_cast:
         cvttsd2si       %eax, QWORD PTR [%esp+4]
         ret

(again, the lrint based functions remain effectively the same)

As you can see, enabling the use of SSE (with -march) can significantly
improve float to integer casts.
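
For compilers that lack lrintf() but ship the SSE intrinsics header, the same instructions can be requested directly. This is a sketch under that assumption, with invented function names and a plain C fallback when SSE is not detected; it has not been tried against the Csound sources.

```c
#include <math.h>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1   /* hypothetical macro for this example */
#endif

/* Truncating float -> int conversion (same result as the C cast). */
int f2i_trunc(float x)
{
#ifdef HAVE_SSE_SKETCH
    return _mm_cvttss_si32(_mm_set_ss(x));   /* cvttss2si */
#else
    return (int) x;
#endif
}

/* Rounding float -> int conversion using the current rounding mode
   (round to nearest even by default). */
int f2i_round(float x)
{
#ifdef HAVE_SSE_SKETCH
    return _mm_cvtss_si32(_mm_set_ss(x));    /* cvtss2si */
#else
    return (int) lrintf(x);
#endif
}
```

f2i_trunc(2.75f) gives 2 and f2i_round(2.5f) gives 2 (ties to even), matching the cast and lrintf() results respectively.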

Re: scons MSVC report

Victor Lazzarini
In reply to this post by Richard Dobson
Thanks for the explanation, but what can we use instead of the C code for
truncation? (This is a practical question: I want to change all my casts on
Windows to faster code.)

Would it be this (I'm sorry I don't really do assembly)?

_asm{
        fld     fval
        mov     eax,fval
}



Victor



Re: scons MSVC report

Istvan Varga
Victor Lazzarini wrote:

> Would it be this (I'm sorry I don't really do assembly)?
>
> _asm{
>                 fld     fval
>          mov    eax,fval
> }

No, this would not work. However, with MinGW, you should use -march=pentium3
or -march=pentium4 if you have a CPU that supports SSE or SSE2, respectively.
Otherwise, the saving, changing, and then restoring of the FPU control word
cannot be avoided, and that is what is responsible for most of the slowdown.

Re: scons MSVC report

Victor Lazzarini
In reply to this post by Istvan Varga
What if I don't have lrintf() (e.g. using the MSVC compiler)? Are
there any options?

Victor



Re: scons MSVC report

Istvan Varga
In reply to this post by Richard Dobson
Richard Dobson wrote:

> things. But the assembler round() is easily faster than the compiler
> cast (at least, with VC++), and that matters these days.

If you mean the standard round() function, beware that it is also required
to round halfway cases (such as 0.5) away from zero, and the cost of that is
visible in the disassembly at the bottom of this message. rint() is better, but
the fastest library function is lrint()/lrintf() (see the previous message with
assembly listings), which, with a good compiler, expands to a single line of
assembly code.

> The Intel compiler is much better in this respect (doesn't incorporate
> a function call the way VC++ does), so that the cast is more efficient.

I assume that it can also use cvttss2si.

> It should certainly do no harm to add the assembler routine to sysdep.h,
> with MYFLT2LRND(x) as the portable alternative.

Better yet, it can be the actual implementation of MYFLT2LONG and MYFLT2LRND
for MSVC.

-----------------------------------------------------------------------------

0000ad20 <round>:
     ad20: 55                   push   ebp
     ad21: 89 e5                 mov    ebp,esp
     ad23: 83 ec 30             sub    esp,0x30
     ad26: dd 45 08             fld    QWORD PTR [ebp+8]
     ad29: 89 75 f8             mov    DWORD PTR [ebp-8],esi
     ad2c: 89 7d fc             mov    DWORD PTR [ebp-4],edi
     ad2f: 89 5d f4             mov    DWORD PTR [ebp-12],ebx
     ad32: dd 55 d8             fst    QWORD PTR [ebp-40]
     ad35: 8b 75 d8             mov    esi,DWORD PTR [ebp-40]
     ad38: e8 52 ac ff ff       call   598f <__i686.get_pc_thunk.bx>
     ad3d: 81 c3 b7 72 01 00     add    ebx,0x172b7
     ad43: c7 45 e0 00 00 00 00 mov    DWORD PTR [ebp-32],0x0
     ad4a: 8b 7d dc             mov    edi,DWORD PTR [ebp-36]
     ad4d: c7 45 e4 00 00 00 00 mov    DWORD PTR [ebp-28],0x0
     ad54: 89 f0                 mov    eax,esi
     ad56: 89 c2                 mov    edx,eax
     ad58: 89 f8                 mov    eax,edi
     ad5a: 89 fe                 mov    esi,edi
     ad5c: c1 f8 14             sar    eax,0x14
     ad5f: 25 ff 07 00 00       and    eax,0x7ff
     ad64: 8d b8 01 fc ff ff     lea    edi,[eax-1023]
     ad6a: 83 ff 13             cmp    edi,0x13
     ad6d: 89 7d d4             mov    DWORD PTR [ebp-44],edi
     ad70: 7f 5e                 jg     add0 <round+0xb0>
     ad72: 85 ff                 test   edi,edi
     ad74: 0f 88 d9 00 00 00     js     ae53 <round+0x133>
     ad7a: 0f b6 4d d4           movzx  ecx,BYTE PTR [ebp-44]
     ad7e: bf ff ff 0f 00       mov    edi,0xfffff
     ad83: 89 f0                 mov    eax,esi
     ad85: d9 c0                 fld    st(0)
     ad87: d3 ff                 sar    edi,cl
     ad89: 21 f8                 and    eax,edi
     ad8b: 09 d0                 or     eax,edx
     ad8d: 74 52                 je     ade1 <round+0xc1>
     ad8f: dd d8                 fstp   st(0)
     ad91: dc 83 14 ad ff ff     fadd   QWORD PTR [ebx-21228]
     ad97: d9 83 00 ad ff ff     fld    DWORD PTR [ebx-21248]
     ad9d: d9 c9                 fxch   st(1)
     ad9f: df e9                 fucomip %st,st(1)
     ada1: dd d8                 fstp   st(0)
     ada3: 76 11                 jbe    adb6 <round+0x96>
     ada5: b8 00 00 08 00       mov    eax,0x80000
     adaa: d3 f8                 sar    eax,cl
     adac: 01 c6                 add    esi,eax
     adae: 89 f8                 mov    eax,edi
     adb0: f7 d0                 not    eax
     adb2: 21 c6                 and    esi,eax
     adb4: 31 d2                 xor    edx,edx
     adb6: 89 75 e4             mov    DWORD PTR [ebp-28],esi
     adb9: 89 55 e0             mov    DWORD PTR [ebp-32],edx
     adbc: dd 45 e0             fld    QWORD PTR [ebp-32]
     adbf: 8b 5d f4             mov    ebx,DWORD PTR [ebp-12]
     adc2: 8b 75 f8             mov    esi,DWORD PTR [ebp-8]
     adc5: 8b 7d fc             mov    edi,DWORD PTR [ebp-4]
     adc8: 89 ec                 mov    esp,ebp
     adca: 5d                   pop    ebp
     adcb: c3                   ret
     adcc: 8d 74 26 00           lea    esi,[esi]
     add0: 83 7d d4 33           cmp    DWORD PTR [ebp-44],0x33
     add4: 7e 1a                 jle    adf0 <round+0xd0>
     add6: 81 7d d4 00 04 00 00 cmp    DWORD PTR [ebp-44],0x400
     addd: d9 c0                 fld    st(0)
     addf: 74 6b                 je     ae4c <round+0x12c>
     ade1: dd d9                 fstp   st(1)
     ade3: 8b 5d f4             mov    ebx,DWORD PTR [ebp-12]
     ade6: 8b 75 f8             mov    esi,DWORD PTR [ebp-8]
     ade9: 8b 7d fc             mov    edi,DWORD PTR [ebp-4]
     adec: 89 ec                 mov    esp,ebp
     adee: 5d                   pop    ebp
     adef: c3                   ret
     adf0: 2d 13 04 00 00       sub    eax,0x413
     adf5: bf ff ff ff ff       mov    edi,0xffffffff
     adfa: 88 c1                 mov    cl,al
     adfc: d3 ef                 shr    edi,cl
     adfe: 85 d7                 test   edi,edx
     ae00: d9 c0                 fld    st(0)
     ae02: 74 dd                 je     ade1 <round+0xc1>
     ae04: dd d8                 fstp   st(0)
     ae06: dc 83 14 ad ff ff     fadd   QWORD PTR [ebx-21228]
     ae0c: d9 83 00 ad ff ff     fld    DWORD PTR [ebx-21248]
     ae12: d9 c9                 fxch   st(1)
     ae14: df e9                 fucomip %st,st(1)
     ae16: dd d8                 fstp   st(0)
     ae18: 76 27                 jbe    ae41 <round+0x121>
     ae1a: c7 45 ec 33 00 00 00 mov    DWORD PTR [ebp-20],0x33
     ae21: 8b 45 d4             mov    eax,DWORD PTR [ebp-44]
     ae24: 29 45 ec             sub    DWORD PTR [ebp-20],eax
     ae27: b8 01 00 00 00       mov    eax,0x1
     ae2c: 0f b6 4d ec           movzx  ecx,BYTE PTR [ebp-20]
     ae30: d3 e0                 shl    eax,cl
     ae32: 8d 04 10             lea    eax,[eax+edx]
     ae35: 39 d0                 cmp    eax,edx
     ae37: 0f 92 c2             setb   dl
     ae3a: 0f b6 d2             movzx  edx,dl
     ae3d: 01 d6                 add    esi,edx
     ae3f: 89 c2                 mov    edx,eax
     ae41: 89 f8                 mov    eax,edi
     ae43: f7 d0                 not    eax
     ae45: 21 c2                 and    edx,eax
     ae47: e9 6a ff ff ff       jmp    adb6 <round+0x96>
     ae4c: de c1                 faddp  st(1),%st
     ae4e: e9 6c ff ff ff       jmp    adbf <round+0x9f>
     ae53: dc 83 14 ad ff ff     fadd   QWORD PTR [ebx-21228]
     ae59: d9 83 00 ad ff ff     fld    DWORD PTR [ebx-21248]
     ae5f: d9 c9                 fxch   st(1)
     ae61: df e9                 fucomip %st,st(1)
     ae63: dd d8                 fstp   st(0)
     ae65: 0f 86 4b ff ff ff     jbe    adb6 <round+0x96>
     ae6b: 81 e6 00 00 00 80     and    esi,0x80000000
     ae71: 89 f0                 mov    eax,esi
     ae73: 0d 00 00 f0 3f       or     eax,0x3ff00000
     ae78: 47                   inc    edi
     ae79: 0f 44 f0             cmove  esi,eax
     ae7c: e9 33 ff ff ff       jmp    adb4 <round+0x94>

Re: scons MSVC report

Istvan Varga
In reply to this post by Victor Lazzarini
Victor Lazzarini wrote:

> what if I don't have lrintf() (eg. using the MSVC compiler), are
> there any options?

I think I will add the MSVC-specific assembly code to sysdep.h, so that
it becomes the implementation of MYFLT2LONG and MYFLT2LRND when useLrint
is not enabled and both _MSC_VER and WIN32 are defined.
