Questions tagged [assembly]

Assembly language questions. Please tag the processor and/or the instruction set you are using, as well as the assembler. A valid set of tags looks like this: ([assembly] [x86] [gnu-assembler] or [att]). Use the [.net-assembly] tag instead for .NET assemblies, [cil] for .NET assembly language, [wasm] for WebAssembly, and [java-bytecode-asm] for Java bytecode.

2310 votes
12 answers
235k views

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and ...
xis's user avatar
  • 24.6k
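
One commonly cited reason a compiler is cautious about this rewrite is that floating-point arithmetic is not associative, so regrouping the multiplications can change the rounded result. The sketch below only illustrates that effect in Python (whose `float` is an IEEE 754 double on typical platforms); it is not GCC's logic, and the names `pow6_chain`, `pow6_grouped`, and `count_mismatches` are made up for this example.

```python
def pow6_chain(a: float) -> float:
    """Multiply left to right, the way a*a*a*a*a*a is evaluated."""
    return a * a * a * a * a * a


def pow6_grouped(a: float) -> float:
    """Regrouped form (a*a*a)*(a*a*a): fewer multiplies, different rounding steps."""
    cube = a * a * a
    return cube * cube


def count_mismatches(values) -> int:
    """Count inputs for which the two groupings round to different doubles."""
    return sum(1 for v in values if pow6_chain(v) != pow6_grouped(v))


if __name__ == "__main__":
    # Addition shows the same effect with a well-known pair of groupings.
    print("addition groupings agree:", (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

    # Hypothetical sample range; how many inputs differ depends on the values chosen.
    samples = [1.0 + i / 1000.0 for i in range(1, 1000)]
    print(count_mismatches(samples), "of", len(samples), "samples differ")
```

Inputs such as powers of two multiply exactly, so both groupings agree there; any mismatches come from inputs whose intermediate products need rounding.
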
1772 votes
15 answers
151k views

Is < faster than <=?

Is if (a < 901) faster than if (a <= 900)? Not exactly as in this simple example, but there are slight performance changes in complex loop code. I suppose this has something to do with generated ...
Vinícius's user avatar
  • 15.6k
1638 votes
11 answers
197k views

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. ...
gexicide's user avatar
  • 39.2k
941 votes
11 answers
181k views

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement identical brute force approach for testing the Collatz conjecture. The assembly solution was assembled with: ...
rosghub's user avatar
  • 9,084
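
For readers unfamiliar with the problem, a brute-force Collatz search of the kind the question describes can be sketched as follows. This is a Python illustration written for this listing, not the asker's assembly or C++ code; the function names and the `limit` parameter are assumptions.

```python
def collatz_steps(n: int) -> int:
    """Number of steps for n to reach 1 under the Collatz rule."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; map odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def longest_start_below(limit: int) -> tuple[int, int]:
    """Brute force: try every start in [1, limit) and keep the longest chain."""
    best_start, best_steps = 1, 0
    for start in range(1, limit):
        steps = collatz_steps(start)
        if steps > best_steps:
            best_start, best_steps = start, steps
    return best_start, best_steps


if __name__ == "__main__":
    print(longest_start_below(10))
```

The sketch recomputes every chain from scratch, which is what makes it brute force; caching chain lengths would be a separate optimization.
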
880 votes
17 answers
912k views

What's the purpose of the LEA instruction?

For me, it just seems like a funky MOV. What's its purpose and when should I use it?
user200557's user avatar
  • 8,989
700 votes
4 answers
91k views

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE ...
user1059432's user avatar
  • 7,568
532 votes
17 answers
527k views

How do you get assembler output from C/C++ source in GCC?

How does one do this? If I want to analyze how something is getting compiled, how would I get the emitted assembly code?
Doug T.'s user avatar
  • 65k
502 votes
40 answers
150k views

When is assembly faster than C? [closed]

One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. ...
320 votes
7 answers
87k views

Why does this code execute more slowly after strength-reducing multiplications to loop-carried additions?

I was reading Agner Fog's optimization manuals, and I came across this example: double data[LEN]; void compute() { const double A = 1.1, B = 2.2, C = 3.3; int i; for(i=0; i<LEN; i++) {...
ttsiodras's user avatar
  • 11k
318 votes
16 answers
938k views

Is it possible to "decompile" a Windows .exe? Or at least view the Assembly?

A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in ...
swilliams's user avatar
  • 48.5k
315 votes
11 answers
220k views

Using GCC to produce readable assembly?

I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code so I could see what my code was being compiled into. You can do this with Java but I haven't been able ...
James's user avatar
  • 3,692
312 votes
11 answers
72k views

What does multicore assembly language look like?

Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX register", etc. With modern CPUs that have 4 ...
Paul Hollingsworth's user avatar
311 votes
4 answers
130k views

How to run a program without an operating system?

How do you run a program all by itself without an operating system running? Can you create assembly programs that the computer can load and run at startup, e.g. boot the computer from a flash drive ...
user2320609's user avatar
  • 2,117
287 votes
5 answers
16k views

Why does Java switch on contiguous ints appear to run faster with added cases?

I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying ...
Andrew Bissell's user avatar
285 votes
12 answers
74k views

Is 'switch' faster than 'if'?

Is a switch statement actually faster than an if statement? I ran the code below on Visual Studio 2010's x64 C++ compiler with the /Ox flag: #include <stdlib.h> #include <stdio.h> #include ...
user541686's user avatar
  • 208k
284 votes
6 answers
263k views

What exactly is the base pointer and stack pointer? To what do they point?

Using this example coming from Wikipedia, in which DrawSquare() calls DrawLine(): (Note that this diagram has high addresses at the bottom and low addresses at the top.) Could anyone explain to me ...
devoured elysium's user avatar
283 votes
10 answers
194k views

Assembly code vs Machine code vs Object code?

What is the difference between object code, machine code and assembly code? Can you give a visual example of their difference?
mmcdole's user avatar
  • 92.2k
282 votes
3 answers
98k views

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to ...
BeeOnRope's user avatar
  • 62.5k
271 votes
5 answers
38k views

Why does GCC use multiplication by a strange number in implementing integer division?

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C: File division.c #include <stdlib.h> #include <stdio.h> int ...
qiubit's user avatar
  • 4,766
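
The "strange number" in such output is typically a fixed-point reciprocal: dividing by a constant is replaced with a multiply and a right shift. The sketch below models this for unsigned 32-bit division by 10. The divisor is an assumption (the question's program is truncated above), the function name is hypothetical, and the constant is the standard `ceil(2**35 / 10)`; it illustrates the technique rather than reproducing GCC's exact output.

```python
MAGIC_DIV10 = 0xCCCCCCCD  # ceil(2**35 / 10), a fixed-point approximation of 1/10
SHIFT_DIV10 = 35


def div10_by_multiply(x: int) -> int:
    """Compute x // 10 for an unsigned 32-bit x using multiply and shift only."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    # The 64-bit product holds x * (2**35 / 10) rounded slightly up;
    # shifting right by 35 discards the fractional part.
    return (x * MAGIC_DIV10) >> SHIFT_DIV10


if __name__ == "__main__":
    for x in (0, 9, 10, 99, 123456789, 2**32 - 1):
        print(x, div10_by_multiply(x), x // 10)
```

The rounding error in the constant is small enough that the result matches true division for every 32-bit unsigned input, which is why a compiler can substitute it for a much slower divide instruction.
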
239 votes
8 answers
351k views

Show current assembly instruction in GDB

I'm doing some assembly-level debugging in GDB. Is there a way to get GDB to show me the current assembly instruction in the same way that it shows the current source line? The default output after ...
JSBձոգչ's user avatar
  • 41.1k
236 votes
4 answers
38k views

Why would introducing useless MOV store instructions speed up a tight loop in x86_64 assembly?

Background: While optimizing some Pascal code with embedded assembly language, I noticed an unnecessary MOV instruction, and removed it. To my surprise, removing the un-necessary instruction caused ...
tangentstorm's user avatar
  • 7,262
229 votes
25 answers
83k views

Protecting executable from reverse engineering?

I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been ...
graphitemaster's user avatar
217 votes
5 answers
277k views

The point of test %eax %eax [duplicate]

Possible Duplicate: x86 Assembly - ‘testl’ eax against eax? I'm very very new to assembly language programming, and I'm currently trying to read the assembly language generated from a binary. I'...
pauliwago's user avatar
  • 6,513
216 votes
32 answers
287k views

Why aren't programs written in Assembly more often? [closed]

It seems to be a mainstream opinion that assembly programming takes longer and is more difficult to program in than a higher level language such as C. Therefore it seems to be recommended or assumed ...
205 votes
21 answers
73k views

Is inline assembly language slower than native C++ code?

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000 for 100000 times. Here's the code: #define TIMES 100000 void calcuC(...
user957121's user avatar
  • 3,006
201 votes
4 answers
184k views

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

The following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux: http://www.int80h.org/bsdasm/#system-calls http://www.freebsd.org/doc/en/books/developers-handbook/x86-...
claws's user avatar
  • 53.3k
190 votes
3 answers
36k views

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(...
orlp's user avatar
  • 114k
189 votes
13 answers
25k views

Is incrementing an int effectively atomic in specific cases?

In general, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it (try here): void f() { int ...
Leo Heinsaar's user avatar
  • 3,997
189 votes
12 answers
249k views

What is the difference between MOV and LEA?

I would like to know what the difference between these instructions is: MOV AX, [TABLE-ADDR] and LEA AX, [TABLE-ADDR]
codeandcloud's user avatar
  • 54.4k
189 votes
1 answer
87k views

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax
balajimc55's user avatar
  • 2,263
183 votes
3 answers
120k views

How do you use gcc to generate assembly code in Intel syntax?

The gcc -S option will generate assembly code in AT&T syntax, is there a way to generate files in Intel syntax? Or is there a way to convert between the two?
hyperlogic's user avatar
  • 7,635
183 votes
1 answer
79k views

Why do ARM chips have an instruction with Javascript in the name (FJCVTZS)?

FJCVTZS is "Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero". It is supported in Arm v8.3-A chips and later. Which is odd, because you don't expect to see ...
Tim Smith's user avatar
  • 1,764
179 votes
4 answers
49k views

Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

In the x86-64 Tour of Intel Manuals, I read Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register. The Intel documentation (...
Nubok's user avatar
  • 3,612
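
The behavior the question asks about can be modeled with plain integer masks. The sketch below is a Python model of the architectural rule that 32-bit writes zero-extend into the 64-bit register while 16-bit and 8-bit writes leave the remaining bits unchanged; the function names are hypothetical, and it says nothing about why the rule was chosen.

```python
MASK64 = (1 << 64) - 1


def write_32bit(reg64: int, value: int) -> int:
    """Model a write to EAX: the upper 32 bits of RAX become zero."""
    return value & 0xFFFF_FFFF


def write_16bit(reg64: int, value: int) -> int:
    """Model a write to AX: the upper 48 bits of RAX are preserved."""
    return (reg64 & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)


def write_8bit_low(reg64: int, value: int) -> int:
    """Model a write to AL: the upper 56 bits of RAX are preserved."""
    return (reg64 & (MASK64 ^ 0xFF)) | (value & 0xFF)


if __name__ == "__main__":
    rax = 0xFFFF_FFFF_FFFF_FFFF
    print(hex(write_32bit(rax, 0x1234_5678)))
    print(hex(write_16bit(rax, 0x1234)))
    print(hex(write_8bit_low(rax, 0x12)))
```

Only the 32-bit case ignores the old register contents entirely, so the result does not depend on the previous value of the register.
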
161 votes
4 answers
55k views

What is the meaning of "non temporal" memory accesses in x86

This is a somewhat low-level question. In x86 assembly there are two SSE instructions: MOVDQA xmmi, m128 and MOVNTDQA xmmi, m128 The IA-32 Software Developer's Manual says that the NT in ...
Nathan Fellman's user avatar
160 votes
3 answers
200k views

What does `dword ptr` mean?

Could someone explain what this means? (Intel Syntax, x86, Windows) and dword ptr [ebp-4], 0
小太郎's user avatar
  • 5,570
158 votes
14 answers
124k views

How can I see the assembly code for a C++ program?

How can I see the assembly code for a C++ program? What are the popular tools to do this?
Geek's user avatar
  • 23.2k
156 votes
6 answers
110k views

What is the purpose of XORing a register with itself? [duplicate]

xor eax, eax will always set eax to zero, right? So, why does MSVC++ sometimes put it in my executable's code? Is it more efficient than mov eax, 0? 012B1002 in al,dx 012B1003 push ...
devoured elysium's user avatar
155 votes
5 answers
275k views

Purpose of ESI & EDI registers?

What is the actual purpose and use of the EDI & ESI registers in assembler? I know they are used for string operations for one thing. Can someone also give an example?
Tony The Lion's user avatar
154 votes
13 answers
15k views

How are everyday machines programmed? [closed]

How are everyday machines (not so much computers and mobile devices as appliances, digital watches, etc) programmed? What kind of code goes into the programming of a Coca-Cola vending machine? How ...
147 votes
5 answers
461k views

What is the function of the push / pop instructions used on registers in x86 assembly?

When reading about assembler I often come across people writing that they push a certain register of the processor and pop it again later to restore its previous state. How can you push a register? ...
Ars emble's user avatar
  • 1,489
147 votes
7 answers
44k views

How does this milw0rm heap spraying exploit work?

I usually do not have difficulty reading JavaScript code but for this one I can’t figure out the logic. The code is from an exploit that was published 4 days ago. You can find it at milw0rm. ...
Patrick Desjardins's user avatar
147 votes
2 answers
167k views

What is the purpose of the RBP register in x86_64 assembler?

I'm trying to learn a little bit of assembly, because I need it for Computer Architecture class. I wrote a few programs, like printing the Fibonacci sequence. I recognized that whenever I write a ...
143 votes
3 answers
311k views

How can one see content of stack with GDB?

I am new to GDB, so I have some questions: How can I look at content of the stack? Example: to see content of register, I type info registers. For the stack, what should it be? How can I see the ...
141 votes
11 answers
161k views

How to view the assembly behind the code using Visual C++?

I was reading another question pertaining to the efficiency of two lines of code, and the OP said that he looked at the assembly behind the code and both lines were identical in assembly. Digression ...
138 votes
11 answers
285k views

How to disassemble a binary executable in Linux to get the assembly code?

I was told to use a disassembler. Does gcc have anything built in? What is the easiest way to do this?
Syntax_Error's user avatar
  • 6,102
138 votes
4 answers
71k views

What are CFI directives in Gnu Assembler (GAS) used for?

There seems to be a .cfi directive after every line, and there is a wide variety of these, e.g. .cfi_startproc, .cfi_endproc, etc. (more here). .file "temp.c" .text .globl main ...
claws's user avatar
  • 53.3k
138 votes
6 answers
127k views

What is the "FS"/"GS" register intended for?

So I know what the following registers and their uses are supposed to be: CS = Code Segment (used for IP) DS = Data Segment (used for MOV) ES = Destination Segment (used for MOVS, etc.) SS = Stack ...
user541686's user avatar
  • 208k
138 votes
4 answers
33k views

Why does Windows64 use a different calling convention from all other OSes on x86-64?

AMD has an ABI specification that describes the calling convention to use on x86-64. All OSes follow it, except for Windows, which has its own x86-64 calling convention. Why? Does anyone know the ...
JanKanis's user avatar
  • 6,494
136 votes
9 answers
189k views

What does "int 0x80" mean in assembly code?

Can someone explain what the following assembly code does? int 0x80
Josh Curren's user avatar
  • 10.3k
133 votes
3 answers
7k views

Possible GCC bug when returning struct from a function

I believe I found a bug in GCC while implementing O'Neill's PCG PRNG. (Initial code on Godbolt's Compiler Explorer) After multiplying oldstate by MULTIPLIER, (result stored in rdi), GCC doesn't add ...
vitorhnn's user avatar
  • 1,043
