An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory usage, storage size, and power consumption.[1] Optimization is generally implemented as a sequence of optimizing transformations, a.k.a. compiler optimizations – algorithms that transform code to produce semantically equivalent code optimized for some aspect.

Optimization is limited by a number of factors. Theoretical analysis indicates that some optimization problems are NP-complete, or even undecidable.[2] Also, producing perfectly optimal code is not possible since optimizing for one aspect often degrades performance for another. Optimization is a collection of heuristic methods for improving resource usage in typical programs.[3]: 585

Categorization

Local vs. global scope

Scope describes how much of the input code is considered when applying an optimization.

Local scope optimizations use information local to a basic block.[4] Since basic blocks contain no control flow statements, these optimizations require minimal analysis, reducing time and storage requirements. However, no information is retained across jumps.

Global scope optimizations, also known as intra-procedural optimizations, operate on individual functions.[4] This gives them more information to work with but often makes expensive computations necessary. Worst-case assumptions need to be made when function calls occur or global variables are accessed because little information about them is available.

Peephole optimization

Peephole optimizations are usually performed late in the compilation process after machine code has been generated. This optimization examines a few adjacent instructions (similar to "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions.[3]: 554  For instance, a multiplication of a value by two might be more efficiently executed by left-shifting the value or by adding the value to itself (this example is also an instance of strength reduction).
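
The transformation just described can be sketched as a tiny peephole pass. This is a minimal illustration, not any production compiler's code: the Insn structure and opcode names are hypothetical assumptions, and real peephole optimizers match richer patterns over the target's actual instruction set.

    /* Minimal peephole sketch: rewrite "multiply by 2" into a left shift
       over a toy three-address instruction array.  The Insn type and the
       opcode names are illustrative, not a real compiler API. */
    #include <stddef.h>

    typedef enum { OP_MUL_IMM, OP_SHL_IMM, OP_ADD } Opcode;

    typedef struct {
        Opcode op;
        int dst, src;   /* register numbers */
        int imm;        /* immediate operand, if any */
    } Insn;

    static void peephole(Insn *code, size_t n) {
        for (size_t i = 0; i < n; i++) {
            /* x = y * 2  ==>  x = y << 1  (also a strength reduction) */
            if (code[i].op == OP_MUL_IMM && code[i].imm == 2) {
                code[i].op = OP_SHL_IMM;
                code[i].imm = 1;
            }
        }
    }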

Inter-procedural optimization

Interprocedural optimizations analyze all of a program's source code. The more information available, the more effective the optimizations can be. The information can be used for various optimizations, including function inlining, where a call to a function is replaced by a copy of the function body.

Link-time optimization

Link-time optimization (LTO), or whole-program optimization, is a more general class of interprocedural optimization. During LTO, the compiler has visibility across translation units which allows it to perform more aggressive optimizations like cross-module inlining and devirtualization.

Machine and object code optimization

Machine code optimization involves using an object code optimizer to analyze the program after all machine code has been linked. Techniques such as macro compression, which conserves space by condensing common instruction sequences, become more effective when the entire executable task image is available for analysis.[5]

Language-independent vs. language-dependent

Most high-level programming languages share common programming constructs and abstractions, such as branching constructs (if, switch), looping constructs (for, while), and encapsulation constructs (structures, objects). Thus, similar optimization techniques can be used across languages. However, certain language features make some optimizations difficult. For instance, pointers in C and C++ make array optimization difficult; see alias analysis. Nevertheless, languages such as PL/I that also support pointers implement optimizations for arrays. Conversely, some language features make certain optimizations easier. For example, in some languages, functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can infer that the function's result only needs to be computed once. In languages where functions are allowed to have side effects, the compiler can restrict such optimization to functions that it can determine have no side effects.
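
A small C sketch of the side-effect case, assuming the GCC/Clang extension __attribute__((const)), which declares that a function's result depends only on its arguments:

    /* __attribute__((const)) (a GCC/Clang extension) asserts that f has no
       side effects and its result depends only on its arguments, so a
       compiler may evaluate f(x) once and reuse the value. */
    __attribute__((const)) int f(int x);

    int g(int x) {
        /* With the attribute, the two calls below can be merged into one. */
        return f(x) + f(x);
    }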

Machine-independent vs. machine-dependent

Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform. Examples are instructions that do several things at once, such as decrement register and branch if not zero.

The following is an instance of a local machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself or subtract it from itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there will be no need to decode an immediate operand, nor use the internal "immediate operand register"; the same applies on IBM System/360 and successors for the subtract variant.[6] A potential problem with this is that XOR or subtract may introduce a data dependency on the previous value of the register, causing a pipeline stall, which occurs when the processor must delay execution of an instruction because it depends on the result of a previous instruction. However, processors often treat the XOR of a register with itself or the subtract of a register from itself as a special case that does not cause stalls.
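
As a small illustration, this is how the choice shows up for a trivial function; the commented output is typical for GCC and Clang, but the exact code emitted depends on the compiler, target, and options.

    /* Returning the constant 0.  On x86-64, GCC and Clang typically emit
           xor eax, eax
       rather than
           mov eax, 0
       because the XOR form is shorter and is recognized by the processor
       as a zeroing idiom that breaks the dependency on the old value. */
    int zero(void) {
        return 0;
    }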

Factors affecting optimization

Target machine
Whether particular optimizations can and should be applied may depend on the characteristics of the target machine. Some compilers such as GCC and Clang parameterize machine-dependent factors so that they can be used to optimize for different machines.[7]
Target CPU architecture
  • Number of registers: Registers can be used to optimize for performance. Local variables can be stored in registers instead of the stack. Temporary/intermediate results can be accessed in registers instead of slower memory.
  • RISC vs. CISC: CISC instruction sets often have variable instruction lengths,[8] often have a larger number of possible instructions that can be used, and each instruction could take differing amounts of time. RISC instruction sets attempt to limit the variability in each of these: instructions are usually of constant length with few exceptions, there are usually fewer combinations of registers and memory operations, and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs among the various instructions and choose the best instruction sequence (see instruction selection).
  • Pipelines: A pipeline is a CPU broken up into an assembly line. It allows the use of parts of the CPU for different instructions by breaking up the execution of instructions into various stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction could be in the register store stage, while another could be in the register fetch stage. Pipeline conflicts occur when an instruction in one stage of the pipeline depends on the result of another instruction ahead of it in the pipeline but not yet completed. Pipeline conflicts can lead to pipeline stalls: where the CPU wastes cycles waiting for a conflict to resolve. Compilers can schedule, or reorder, instructions so that pipeline stalls occur less frequently.
  • Number of functional units: Some CPUs have several ALUs and FPUs that allow them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions), and which functional unit can execute which instruction. They also have issues similar to pipeline conflicts. Instructions can be scheduled so that the functional units are fully loaded.
Machine architecture
  • CPU cache size and type (direct mapped, 2-/4-/8-/16-way associative, fully associative): Techniques such as inline expansion and loop unrolling may increase the size of the generated code and reduce code locality. The program may slow down drastically if a highly used section of code (like inner loops in various algorithms) no longer fits in the cache as a result of optimizations that increase code size. Also, caches that are not fully associative have higher chances of cache collisions even in an unfilled cache.
  • Cache/memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications.
Intended use
  • Debugging: During development, optimizations are often disabled to speed compilation or to make the executable code easier to debug. Optimizing transformations, particularly those that reorder code, can make it difficult to relate the executable code to the source code.
  • General-purpose use: Prepackaged software is often expected to run on a variety of machines that may share the same instruction set but have different performance characteristics. The code may not be optimized to any particular machine or may be tuned to work best on the most popular machine while working less optimally on others.
  • Special-purpose use: If the software is compiled for machines with uniform characteristics, then the compiler can heavily optimize the generated code for those machines.
Notable cases include code designed for parallel and vector processors, for which special parallelizing compilers are used.
Firmware for an embedded system can be optimized for the target CPU and memory. System cost or reliability may be more important than the code speed. For example, compilers for embedded software usually offer options that reduce code size at the expense of speed. The code's timing may need to be predictable, rather than as fast as possible, so code caching might be disabled, along with compiler optimizations that require it.

Common themes

Optimization includes the following, sometimes conflicting themes.

Optimize the common case
The common case may have unique properties that allow a fast path at the expense of a slow path. If the fast path is taken more often, the result is better overall performance.
Avoid redundancy
Reuse results that are already computed and store them for later use, instead of recomputing them.
Less code
Remove unnecessary computations and intermediate values. Less work for the CPU, cache, and memory usually results in faster execution. Alternatively, in embedded systems, less code brings a lower product cost.
Fewer jumps by using straight line code, also called branch-free code
Less complicated code. Jumps (conditional or unconditional branches) interfere with the prefetching of instructions, thus slowing down code. Using inlining or loop unrolling can reduce branching, at the cost of increasing binary file size by the length of the repeated code. This tends to merge several basic blocks into one.
Locality
Code and data that are accessed closely together in time should be placed close together in memory to increase spatial locality of reference.
Exploit the memory hierarchy
Accesses to memory are increasingly more expensive for each level of the memory hierarchy, so place the most commonly used items in registers first, then caches, then main memory, before going to disk.
Parallelize
Reorder operations to allow multiple computations to happen in parallel, either at the instruction, memory, or thread level.
More precise information is better
The more precise the information the compiler has, the better it can employ any or all of these optimization techniques.
Runtime metrics can help
Information gathered during a test run can be used in profile-guided optimization. Information gathered at runtime, ideally with minimal overhead, can be used by a JIT compiler to dynamically improve optimization.
Strength reduction
Replace complex, difficult, or expensive operations with simpler ones. For example, replacing division by a constant with multiplication by its reciprocal, or using induction variable analysis to replace multiplication by a loop index with addition.
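
The strength reductions just described can be written out by hand in C; a compiler applies the equivalent rewrites automatically when it can prove they preserve semantics (for the floating-point reciprocal this usually requires a flag such as -ffast-math, since 1.0/3.0 is not exactly representable):

    /* Division by a constant replaced by multiplication by its reciprocal. */
    double scale(double x) {
        return x * (1.0 / 3.0);   /* original: x / 3.0 */
    }

    /* Induction-variable strength reduction: the per-iteration multiply
       i * 4 becomes an addition carried across iterations. */
    void fill(int *a, int n) {
        int offset = 0;
        for (int i = 0; i < n; i++) {
            a[i] = offset;        /* original: a[i] = i * 4; */
            offset += 4;
        }
    }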

Specific techniques

Loop optimizations

Loop optimization acts on the statements that make up a loop, such as a for loop; loop-invariant code motion is one example. Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.[3]: 596

Some optimization techniques primarily designed to operate on loops include:

Induction variable analysis
Roughly, if a variable in a loop is a simple linear function of the index variable, such as j := 4*i + 1, it can be updated appropriately each time the loop variable is changed. This is a strength reduction and also may allow the index variable's definitions to become dead code.[3]: 596–598  This information is also useful for bounds-checking elimination and dependence analysis, among other things.
Loop fission or loop distribution
Loop fission attempts to break a loop into multiple loops over the same index range with each new loop taking only a part of the original loop's body. This can improve locality of reference to both the data being accessed within the loop and the code in the loop's body.
Loop fusion or loop combining or loop ramming or loop jamming
Another technique that attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times regardless of whether that number is known at compile time, their bodies can be combined as long as they do not refer to each other's data.
Loop inversion
This technique changes a standard while loop into a do/while (also known as repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two, for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the if guard can be skipped.
Loop interchange
These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve the locality of reference, depending on the array's layout.
Loop-invariant code motion
If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins.[3]: 596  This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop. (A hand-written sketch follows this list.)
Loop nest optimization
Some pervasive algorithms such as matrix multiplication have very poor cache behavior and excessive memory accesses. Loop nest optimization increases the number of cache hits by operating over small blocks and by using a loop interchange.
Loop reversal
Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization that can help eliminate dependencies and thus enable other optimizations. Furthermore, on some architectures, loop reversal contributes to smaller code: when the loop index is being decremented, the condition that must be met for the running program to exit the loop is a comparison with zero. This is often a special, parameter-less instruction, unlike a comparison with a number, which needs the number to compare to. Therefore, the bytes needed to store the comparison operand are saved by loop reversal. Additionally, if the comparison number exceeds the word size of the platform, then in standard loop order multiple instructions would need to be executed to evaluate the comparison, which is not the case with loop reversal.
Loop unrolling
Unrolling duplicates the body of the loop multiple times, to decrease the number of times the loop condition is tested and the number of jumps; tests and jumps can hurt performance by impairing the instruction pipeline. A "fewer jumps" optimization. Completely unrolling a loop eliminates all overhead, but requires that the number of iterations be known at compile time.
Loop splitting
Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops that have the same bodies but iterate over different contiguous portions of the index range. A useful special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
Loop unswitching
Unswitching moves a conditional from inside a loop to outside the loop by duplicating the loop's body inside each of the if and else clauses of the conditional.
Software pipelining
The loop is restructured in such a way that work done in an iteration is split into several parts and done over several iterations. In a tight loop, this technique hides the latency between loading and using values.
Automatic parallelization
A loop is converted into multi-threaded or vectorized (or even both) code to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine, including multi-core machines.
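
Two of the transformations above, loop-invariant code motion and loop unrolling, are sketched by hand below; a compiler performs equivalent rewrites on its intermediate representation rather than on source code.

    /* Before: the product y * z is recomputed on every iteration. */
    void before(int *a, int n, int y, int z) {
        for (int i = 0; i < n; i++)
            a[i] = y * z + i;
    }

    /* After loop-invariant code motion: the invariant product is hoisted. */
    void after_licm(int *a, int n, int y, int z) {
        int t = y * z;
        for (int i = 0; i < n; i++)
            a[i] = t + i;
    }

    /* After unrolling by four (assuming n is a multiple of 4): fewer
       condition tests and branches per element processed. */
    void after_unroll(int *a, int n, int y, int z) {
        int t = y * z;
        for (int i = 0; i < n; i += 4) {
            a[i]     = t + i;
            a[i + 1] = t + i + 1;
            a[i + 2] = t + i + 2;
            a[i + 3] = t + i + 3;
        }
    }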

Prescient store optimizations

Prescient store optimizations allow store operations to occur earlier than would otherwise be permitted in the context of threads and locks. The process needs some way of knowing ahead of time what value will be stored by the assignment that it should have followed. The purpose of this relaxation is to allow compiler optimization to perform certain kinds of code rearrangements that preserve the semantics of properly synchronized programs.[9]

Data-flow optimizations

Data-flow optimizations, based on data-flow analysis, primarily depend on how certain properties of data are propagated by control edges in the control-flow graph. Some of these include:

Common subexpression elimination
In the expression (a + b) - (a + b)/4, "common subexpression" refers to the duplicated (a + b). Compilers implementing this technique realize that (a + b) will not change, and so only calculate its value once.[3]: 592–594  (A hand-written sketch follows this list.)
Constant folding and propagation
Replacing expressions consisting of constants (e.g., 3 + 5) with their final value (8) at compile time, rather than doing the calculation in run-time.[10] Used in most modern languages.
Induction variable recognition and elimination
See discussion above about induction variable analysis.
Alias classification and pointer analysis
In the presence of pointers, it is difficult to make any optimizations at all, since potentially any variable can have been changed when a memory location is assigned to. By specifying which pointers can alias which variables, unrelated pointers can be ignored.
Dead-store elimination
Removal of assignments to variables that are not subsequently read, either because the lifetime of the variable ends or because of a subsequent assignment that will overwrite the first value.
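
Hand-written before/after sketches of three of the data-flow optimizations above (common subexpression elimination, constant folding, and dead-store elimination):

    /* Common subexpression elimination: (a + b) is computed only once. */
    int cse(int a, int b) {
        int t = a + b;
        return t - t / 4;         /* original: (a + b) - (a + b) / 4 */
    }

    /* Constant folding: 3 + 5 is replaced by 8 at compile time. */
    int folded(void) {
        return 8;                 /* original: return 3 + 5; */
    }

    /* Dead-store elimination: the first assignment to x is never read. */
    int dead_store(void) {
        int x = 42;               /* dead store, removable */
        x = 7;
        return x;
    }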

SSA-based optimizations

These optimizations are intended to be done after transforming the program into a special form called static single assignment (SSA), in which every variable is assigned in only one place. Although some of these optimizations work without SSA, they are most effective with it. Many optimizations listed in other sections also benefit from SSA with no special changes, such as register allocation.

Global value numbering
GVN eliminates redundancy by constructing a value graph of the program, and then determining which values are computed by equivalent expressions. GVN can identify some redundancy that common subexpression elimination cannot, and vice versa.
Sparse conditional constant propagation
Combines constant propagation, constant folding, and dead-code elimination, and improves upon what is possible by running them separately.[11][12] This optimization symbolically executes the program, simultaneously propagating constant values and eliminating portions of the control-flow graph that this makes unreachable.
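
A source-level sketch of what sparse conditional constant propagation achieves; the compiler performs this on the control-flow graph rather than by rewriting source, and compute_expensive is a hypothetical function used only for illustration.

    int compute_expensive(void);   /* hypothetical, never actually called */

    /* Before: flag is provably 0 on every path, so the branch folds away. */
    int before_sccp(void) {
        int flag = 0;
        int x;
        if (flag)
            x = compute_expensive();   /* unreachable once flag is known */
        else
            x = 1;
        return x + 2;
    }

    /* After: constants propagated, dead branch removed, result folded. */
    int after_sccp(void) {
        return 3;
    }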

Code generator optimizations

Register allocation
The most frequently used variables should be kept in processor registers for the fastest access. To find which variables to put in registers, an interference graph is created. Each variable is a vertex, and when two variables are used at the same time (have intersecting live ranges) they have an edge between them. This graph is colored using, for example, Chaitin's algorithm with the same number of colors as there are registers. If the coloring fails, one variable is "spilled" to memory and the coloring is retried. (A simplified sketch follows this list.)
Instruction selection
Most architectures, particularly CISC architectures and those with many addressing modes, offer several different ways of performing a particular operation, using entirely different sequences of instructions. The job of the instruction selector is to choose, across the whole program, which instructions implement each operator in the low-level intermediate representation. For example, on many processors in the 68000 family and the x86 architecture, complex addressing modes can be used in statements like lea 25(a1,d5*4), a0, allowing a single instruction to perform a significant amount of arithmetic with less storage.
Instruction scheduling
Instruction scheduling is an important optimization for modern pipelined processors, which avoids stalls or bubbles in the pipeline by clustering instructions with no dependencies together, while being careful to preserve the original semantics.
Rematerialization
Rematerialization recalculates a value instead of loading it from memory, eliminating an access to memory. This is performed in tandem with register allocation to avoid spills.
Code factoring
If several sequences of code are identical, or can be parameterized or reordered to be identical, they can be replaced with calls to a shared subroutine. This can often share code for subroutine set-up and sometimes tail-recursion.[13]
Trampolines
Many[citation needed] CPUs have smaller subroutine call instructions to access low memory. A compiler can save space by using these small calls in the main body of code. Jump instructions in low memory can access the routines at any address. This multiplies space savings from code factoring.[13]
Reordering computations
Based on integer linear programming, restructuring compilers enhance data locality and expose more parallelism by reordering computations. Space-optimizing compilers may reorder code to lengthen sequences that can be factored into subroutines.
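
The register-allocation step described above can be sketched very roughly as graph coloring over live ranges. The code below is a deliberately simplified greedy coloring, not Chaitin's full algorithm (which adds a simplification order, spill-cost heuristics, and coalescing); all names and the fixed limits are illustrative assumptions.

    #include <stdbool.h>

    #define MAX_VARS 16
    #define K 4                       /* number of machine registers */

    typedef struct { int start, end; } LiveRange;

    /* Two variables interfere if their live ranges overlap in time. */
    static bool interferes(LiveRange a, LiveRange b) {
        return a.start < b.end && b.start < a.end;
    }

    /* Assigns a register (color) per variable, or -1 meaning "spill".
       Assumes n <= MAX_VARS. */
    static void allocate(const LiveRange lr[], int n, int reg[]) {
        bool edge[MAX_VARS][MAX_VARS] = {{false}};

        /* Build the interference graph. */
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (interferes(lr[i], lr[j]))
                    edge[i][j] = edge[j][i] = true;

        /* Greedily color each vertex with the lowest free register. */
        for (int i = 0; i < n; i++) {
            bool used[K] = {false};
            for (int j = 0; j < i; j++)
                if (edge[i][j] && reg[j] >= 0)
                    used[reg[j]] = true;

            reg[i] = -1;              /* spill if no register is free */
            for (int c = 0; c < K; c++)
                if (!used[c]) { reg[i] = c; break; }
        }
    }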

Functional language optimizations

Although many of these also apply to non-functional languages, they either originate in or are particularly critical in functional languages such as Lisp and ML.

Tail-call optimization
A function call consumes stack space and involves some overhead related to parameter passing and flushing the instruction cache. Tail-recursive algorithms can be converted to iteration through a process called tail-recursion elimination or tail-call optimization. (A sketch follows this list.)
Deforestation (data structure fusion)
In languages where it is common for a sequence of transformations to be applied to a list, deforestation attempts to remove the construction of intermediate data structures.
Partial evaluation
Computations that produce the same output regardless of the dynamic input at runtime can be evaluated at compile time.
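
A C sketch of tail-call optimization: the recursive call is the function's last action, so the stack frame can be reused, which is equivalent to the loop shown second. Whether a given C compiler performs this depends on the compiler and optimization level.

    /* Tail-recursive sum of 1..n with an accumulator. */
    long sum_rec(long n, long acc) {
        if (n == 0)
            return acc;
        return sum_rec(n - 1, acc + n);   /* tail call */
    }

    /* Equivalent iteration after tail-call elimination. */
    long sum_iter(long n, long acc) {
        while (n != 0) {
            acc += n;
            n -= 1;
        }
        return acc;
    }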

Other optimizations

Bounds-checking elimination
Many languages, such as Java, enforce bounds checking of all array accesses. This is a severe performance bottleneck on certain applications such as scientific code. Bounds-checking elimination allows the compiler to safely remove bounds checking in many situations where it can determine that the index must fall within valid bounds; for example, if it is a simple loop variable.
Branch-offset optimization (machine dependent)
Choose the shortest branch displacement that reaches the target.
Code-block reordering
Code-block reordering alters the order of the basic blocks in a program to reduce conditional branches and improve the locality of reference.
Dead-code elimination
Removes instructions that will not affect the behaviour of the program, for example, definitions that have no uses, called dead code. This reduces code size and eliminates unnecessary computation.
Factoring out of invariants (loop invariants)
If an expression is carried out both when a condition is met and is not met, it can be written just once outside of the conditional statement. Similarly, if certain types of expressions (e.g., the assignment of a constant into a variable) appear inside a loop, they can be moved out of it because their effect will be the same no matter if they're executed many times or just once. This is also known as total redundancy elimination. A similar but more powerful optimization is partial-redundancy elimination (PRE).
Inline expansion or macro expansion
When some code invokes a procedure, it is possible to directly insert the body of the procedure inside the calling code rather than transferring control to it. This saves the overhead related to procedure calls, as well as providing an opportunity for many different parameter-specific optimizations, but comes at the cost of space; the procedure body is duplicated each time the procedure is called inline. Generally, inlining is useful in performance-critical code that makes a large number of calls to small procedures. This is a "fewer jumps" optimization. (A sketch follows this list.) The statements of imperative programming languages are also an example of such an optimization. Although statements could be implemented with function calls, they are almost always implemented with code inlining.
Jump threading
In this optimization, consecutive conditional jumps predicated entirely or partially on the same condition are merged.
E.g., if (c) { foo; } if (c) { bar; } to if (c) { foo; bar; },
and if (c) { foo; } if (!c) { bar; } to if (c) { foo; } else { bar; }.
Macro compression
A space optimization that recognizes common sequences of code, creates subprograms ("code macros") that contain the common code, and replaces the occurrences of the common code sequences with calls to the corresponding subprograms.[5] This is most effectively done as a machine code optimization, when all the code is present. The technique was first used to conserve space in an interpretive byte stream used in an implementation of Macro Spitbol on microcomputers.[14] The problem of determining an optimal set of macros that minimizes the space required by a given code segment is known to be NP-complete,[5] but efficient heuristics attain near-optimal results.[15]
Reduction of cache collisions
(e.g., by disrupting alignment within a page)
Stack-height reduction
Rearrange an expression tree to minimize resources needed for expression evaluation.[clarification needed]
Test reordering
If we have two tests that are the condition for something, we can first deal with the simpler tests (e.g., comparing a variable to something) and only then with the complex tests (e.g., those that require a function call). This technique complements lazy evaluation, but can be used only when the tests are not dependent on one another. Short-circuiting semantics can make this difficult.
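
Inline expansion, written out by hand as a source-level sketch; compilers inline on their intermediate representation and typically decide per call site based on size and estimated benefit.

    static int square(int x) {
        return x * x;
    }

    /* Before inlining: each use pays call overhead. */
    int sum_of_squares(int a, int b) {
        return square(a) + square(b);
    }

    /* After inlining: the body is substituted at each call site, which
       also exposes the surrounding expression to further optimization. */
    int sum_of_squares_inlined(int a, int b) {
        return a * a + b * b;
    }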

Interprocedural optimizations

Interprocedural optimization works on the entire program, across procedure and file boundaries. It works in tandem with its intraprocedural counterparts, carried out through the cooperation of a local part and a global part. Typical interprocedural optimizations are procedure inlining, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis, array access analysis, and the construction of a call graph.

Interprocedural optimization is common in modern commercial compilers from SGI, Intel, Microsoft, and Sun Microsystems. For a long time, the open source GCC was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving.[16] Another open-source compiler with full analysis and optimization infrastructure is Open64.

Due to the extra time and space required by interprocedural analysis, most compilers do not perform it by default. Users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.

Practical considerations

There can be a wide range of optimizations that a compiler can perform, ranging from simple and straightforward optimizations that take little compilation time to elaborate and complex optimizations that involve considerable amounts of compilation time.[3]: 15  Accordingly, compilers often provide options to their control command or procedure to allow the compiler user to choose how much optimization to request; for instance, the IBM FORTRAN H compiler allowed the user to specify no optimization, optimization at the registers level only, or full optimization.[3]: 737  By the 2000s, it was common for compilers, such as Clang, to have several compiler command options that could affect a variety of optimization choices, starting with the familiar -O2 switch.[17]

An approach to isolating optimization is the use of so-called post-pass optimizers (some commercial versions of which date back to mainframe software of the late 1970s).[18] These tools take the executable output by an optimizing compiler and optimize it even further. Post-pass optimizers usually work on the assembly language or machine code level (in contrast with compilers that optimize intermediate representations of programs). One such example is the Portable C Compiler (PCC) of the 1980s, which had an optional pass that would perform post-optimizations on the generated assembly code.[3]: 736

Another consideration is that optimization algorithms are complicated and, especially when being used to compile large, complex programming languages, can contain bugs that introduce errors in the generated code or cause internal errors during compilation. Compiler errors of any kind can be disconcerting to the user, but especially so in this case, since it may not be clear that the optimization logic is at fault.[19] In the case of internal errors, the problem can be partially ameliorated by a "fail-safe" programming technique in which the optimization logic in the compiler is coded such that a failure is trapped, a warning message issued, and the rest of the compilation proceeds to successful completion.[20]

History

Early compilers of the 1960s were often primarily concerned with simply compiling code correctly or efficiently, such that compile times were a major concern. One notable early optimizing compiler was the IBM FORTRAN H compiler of the late 1960s.[3]: 737  Another of the earliest and most important optimizing compilers, which pioneered several advanced techniques, was the one for BLISS (1970), described in The Design of an Optimizing Compiler (1975).[3]: 740, 779  By the late 1980s, optimizing compilers were sufficiently effective that programming in assembly language declined. This co-evolved with the development of RISC chips and advanced processor features such as superscalar processors, out-of-order execution, and speculative execution, which were designed to be targeted by optimizing compilers rather than by human-written assembly code.[citation needed]

List of static code analyses

See also

References

  1. ^ Godbolt, Matt (November 12, 2019). "Optimizations in C++ Compilers". ACM Queue. Vol. 17, no. 5.
  2. ^ "Lecture 15: NP-completeness, Optimization and Separation" (PDF). IE 511: Integer Programming, Spring 2021.
  3. ^ a b c d e f g h i j k Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. (1986). Compilers: Principles, Techniques, and Tools. Reading, Massachusetts: Addison-Wesley. ISBN 0-201-10088-6.
  4. ^ a b Cooper, Keith D.; Torczon, Linda (2003) [2025-08-07]. Engineering a Compiler. Morgan Kaufmann. pp. 404, 407. ISBN 978-1-55860-698-2.
  5. ^ a b c Goss, Clinton F. (August 2013) [First published June 1986]. Machine Code Optimization – Improving Executable Object Code (PDF) (Ph.D. dissertation). Vol. Computer Science Department Technical Report #246. Courant Institute, New York University. arXiv:1308.4815. Bibcode:2013arXiv1308.4815G. Archived (PDF) from the original on 2025-08-07. Retrieved 22 Aug 2013.
  6. ^ A Programmer's Introduction to IBM System/360 Assembly Language (PDF). IBM. p. 42. GC20-1645-5.
  7. ^ "GCC – Machine-Dependent Options". GNU Project.
  8. ^ "RISC vs. CISC". cs.stanford.edu. Retrieved 2025-08-07.
  9. ^ James Gosling; Bill Joy; Guy Steele. "17 Threads and Locks". The Java Language Specification (1.0 ed.). 17.8 Prescient Store Actions.
  10. ^ Muchnick, Steven; Muchnick and Associates (15 August 1997). Advanced Compiler Design Implementation. Morgan Kaufmann. pp. 329–. ISBN 978-1-55860-320-2. constant folding.
  11. ^ Wegman, Mark N.; Zadeck, F. Kenneth (April 1991). "Constant Propagation with Conditional Branches" (PDF). ACM Transactions on Programming Languages and Systems. 13 (2): 181–210. doi:10.1145/103135.103136.
  12. ^ Click, Clifford; Cooper, Keith. (March 1995). "Combining Analyses, Combining Optimizations" (PDF). ACM Transactions on Programming Languages and Systems. 17 (2): 181–196. doi:10.1145/201059.201061.
  13. ^ a b Cx51 Compiler Manual, version 09.2001, p. 155, Keil Software Incorporated.
  14. ^ Dewar, Robert B. K.; Golumbic, Martin Charles; Goss, Clinton F. (August 2013) [October 1979]. MICRO SPITBOL. Computer Science Department Technical Report. Vol. 11. Courant Institute of Mathematical Sciences. arXiv:1308.6096. Bibcode:2013arXiv1308.6096D.
  15. ^ Golumbic, Martin Charles; Dewar, Robert B. K.; Goss, Clinton F. (1980). "Macro Substitutions in MICRO SPITBOL – a Combinatorial Analysis". Proceedings of the 11th Southeastern Conference on Combinatorics, Graph Theory and Computing, Congressus Numerantium, Utilitas Math., Winnipeg, Canada. 11th Southeastern Conference on Combinatorics, Graph Theory and Computing. Vol. 29. pp. 485–495.
  16. ^ Glazunov, N. M. (November 25, 2012). "Foundations of Scientific Research". arXiv:1212.1651 [cs.OH].
  17. ^ Guelton, Serge (August 5, 2019). "Customize the compilation process with Clang: Optimization options". Red Hat.
  18. ^ Evans, Michael (December 1982). "Software engineering for the Cobol environment". Communications of the ACM. 25 (12): 874–882. doi:10.1145/358728.358732. Retrieved 2025-08-07.
  19. ^ Sun, Chengnian; et al. (July 18–20, 2016). "Toward understanding compiler bugs in GCC and LLVM". Proceedings of the 25th International Symposium on Software Testing and Analysis. Issta 2016. pp. 294–305. doi:10.1145/2931037.2931074. ISBN 9781450343909. S2CID 8339241.
  20. ^ Schilling, Jonathan L. (August 1993). "Fail-safe programming in compiler optimization". ACM SIGPLAN Notices. 28 (8): 39–42. doi:10.1145/163114.163118. S2CID 2224606.