쓸데없는 생각말고 공부나 해라!

floating point bug (0)	2011.11.26
Exception, RuntimeException (0)	2011.11.26
MD5 구현(C++, Java, Js) (0)	2011.11.26
stack allocation (0)	2011.11.26
ArrayList, Vector (0)	2011.11.26

stack allocation

Java 2011. 11. 26. 20:23

결론을 이야기 하자면 컴파일러인지 JVM인지 몰라도 아무튼 알아서(implicit하게) 하므로 우리는 신경쓸 필요가 없다는 것.

http://www.ibm.com/developerworks/kr/library/j-jtp09275.html

Java theory and practice: 퍼포먼스

상상 이상의 메모리 할당 속도

문서 옵션

이 페이지를 이메일로 보내기

토론

난이도 : 중급

Brian Goetz, Principal Consultant, Quiotix

2005 년 9 월 27 일

자바언어는 퍼포먼스와 관련하여 상당한 논쟁의 대상이 되어왔다. 이 가운데 일부는 인정되어 질 수 있다하더라도, 게시판이나 뉴스그룹에 게시된 상당수의 기사는 자바가상머신(JVM, Java Virtual Machine)이 어떻게 작동하는가를 제대로 이해하지 못함으로 인해 제기된 것들이다. 이번 달의Java theory and practice에서는 JVM 상에서의 메모리할당이 늦다는 잘못된 인식에 대해 살펴보고자 한다.

깜짝 퀴즈: 자바와 C/C++ 중, 할당 퍼포먼스가 더 빠른 것은? 놀라지 말라. 현재 JVM의 할당 퍼포먼스는 최고의 퍼포먼스를 보이는 malloc 구현 보다 훨씬 더 좋다. HotSpot 1.4.2 및 이후 버전에서 new Object()의 일반 코드 경로는 약 10개의 머신 명령어(출처: Sun, 참고자료)가 필요하다면, malloc 구현은 한 호출당 평균 60개에서 100개의 명령어가 필요하다. (Detlefs, 참고자료). 할당 퍼포먼스는 전체 퍼포먼스에 있어 중요한 요소이다. Perl과 Ghostscript 같은 실제 C와 C++ 프로그램들에서 malloc과 free의 실행 시간은 총 실행 시간의 20%-30%를 차지한다. 일반 자바 애플리케이션의 할당 및 가비지 컬렉션 오버헤드 보다 훨씬 많은 수치이다. (Zorn, 참고자료).

가비지 컬렉션

"가비지 컬렉션은 직접적인 메모리 관리만큼 효율적이지 못하다." 같은 문장을 찾기 위해 블로그나 Slashdot 게시판을 볼 필요가 없다. 그 문장도 맞는다. 능동적인 메모리 관리는 그렇게 빠르지는 않지만, 가끔은 빠를 때도 있다. malloc/free 접근방식은 메모리 블록을 한꺼번에 처리하는 반면, 가비지 컬렉션은 메모리 관리를 일괄로 처리한다.

하루종일 먼지를 하나씩 집어내는 것 보다 한 개의 큰 쓰레기 덩어리를 치우는 것이 쉽다는 이 “그럴듯한 논쟁”은 한 데이터에 기반 한다. 한 연구(Zorn, 참고자료)에서는 수 많은 일반적인 C++ 애플리케이션에서 malloc을 보수적인 Boehm-Demers-Weiser (BDW) 가비지 컬렉터로 대체했다는 것을 밝혀냈다. 결국 이러한 많은 프로그램들의 속도가 빨라졌다. (BDW는 보수적이고 움직임이 없는 가비지 컬렉터로서 할당을 최적화하는 기능을 상당히 많이 제한하고 메모리의 지역성을 향상시킨다. JVM에서 사용되는 것과 같은 재배치 컬렉터들은 더 빠르게 수행할 수 있다.)

JVM에서의 할당이 언제나 빠른 것은 아니다. 초기 JVM은 정말이지 할당이나 가비지 컬렉션 퍼포먼스나 형편없었다. 바로 여기에서 온갖 미신들이 생겨난 것 같다. 초기에는 "할당이 느리다"는 많은 충고를 받았다. 사실이 그랬으니까. 초기 JVM에서는 다른 것도 마찬가지였다. 퍼포먼스 고수들은 객체 풀링(object pooling)처럼 할당을 피하는 다양한 트릭을 사용했다. (객체 풀링은 모든 경우에 있어서 심각한 퍼포먼스의 손실을 초래한다.) 하지만 JDK 1.0 이후 많은 일들이 일어났다. JDK 1.2에서 ‘generational collector’를 도입함으로서 할당 방식을 단순화 했다. 그리고 퍼포먼스도 높였다.

세대형(Generational) 가비지 컬렉션

세대형(generational) 가비지 컬렉터는 힙을 여러 세대(generation)들로 나눈다. 대부분의 JVM은 두 개의 제너레이션("young" 제너레이션과 "old" 제너레이션)을 사용한다. 객체들은 "young" 제너레이션에 할당된다. 이들이 가비지 컬렉션을 거치고도 살아있다면 "장수(long lived)"한 것으로 간주되고 "old" 제너레이션으로 올라간다.

HotSpot이 세 가지의 "young" 제너레이션 컬렉터(직렬 카피, 병렬 카피, parallel scavenge)를 제공한다. 이들은 모두 "카피(copying)"컬렉터의 형태이고 일반적으로 중요한 특성을 갖고 있다. 카피 컬렉터는 메모리 공간을 반으로 나누고 한번에 반만 사용한다. 처음에는, 사용되는 반쪽이 하나의 큰 메모리 블록을 형성한다. 할당자는 나머지 사용중이지 않는 N 바이트의 부분을 리턴함으로서 할당 요청을 수행하면서 "유휴(free)" 파트에서 "사용된(used)" 파트로 분리하는 포인터를 옮긴다. (Listing 1). 사용중인 공간이 채워지면 가비지 컬렉터는 모든 살아있는 객체들(쓰레기가 아닌 것)을 다른 반쪽의 바닥에 복사하고 또 다른 반쪽에서 할당을 시작한다.

Listing 1. 할당자 작동

void *malloc(int n) {

if (heapTop - heapStart < n)

doGarbageCollection();

void *wasStart = heapStart;

heapStart += n;

return wasStart;

}

위 코드를 보면 복사 컬렉터가 그와 같이 빠른 할당을 실행하는지 이유를 알 수 있다. 새로운 객체의 할당은 단지 힙에 충분한 공간이 남아있는지를 확인하는 것이며 만약 남아있다면 포인터를 교체하는 것에 지나지 않는다. 그저 힙에서 N 만큼만 포착하여 수행한다.

비할당(deallocation)

할당이 메모리 관리의 반을 차지한다면 비할당은 나머지 반에 해당한다. 대부분의 객체의 경우 직접적인 가비지 컬렉션 비용은 전혀 없다. 이는 컬렉터를 카피하는 것이 죽은 객체를 방문하거나 복사할 필요가 없기 때문에 오직 살아있는 것만 복사하면 된다. 따라서 할당 후에 쓰레기가 되는 객체들은 컬렉션 사이클에 어떤 워크로드도 주지 않는다

전형적인 객체 지향 프로그램에서 대다수의 객체들은 "일찍 죽는다." 이들은 할당 후에 바로 쓰레기가 된다는 의미이다. (이 속성을 "generational hypothesis" 이라고 하며, 많은 객체 지향 언어에서 사실로 판명되었다.) 따라서 할당이 빨라질 뿐더러 대부분의 객체들의 경우 비할당은 무료이다.

쓰레드-로컬 할당

할당자가 구현된다면(Listing 1) 공유된 heapStart 필드는 빠르게 중요한 동시성 병목현상이 된다. 모든 할당이 이 필드를 보호하는 락(lock)을 획득하기 때문이다. 이러한 문제를 피하기 위해 대부분의 JVM은 쓰레드-로컬 할당 블록(thread-local allocation blocks)을 사용한다. 여기에서 각 쓰레드는 힙에서 큰 메모리 청크를 할당하고 쓰레드-로컬 블록 밖에서는 보다 작은 할당 요청들을 처리한다. 결과적으로 쓰레드가 공유된 힙 락(heap lock)을 획득해야 하는 회수가 줄어들고 병행성이 향상된다. (전통적인 malloc 구현에서는 이러한 문제를 다루는 것이 더 어렵고 비용도 많이 든다. 쓰레드 지원과 가비지 컬렉션을 이 플랫폼에 구현하면 시너지 효과를 누릴 수 있다.)

위로

스택 할당

C++은 힙 또는 스택에 객체를 할당할 수 있는 선택권을 준다. 스택 기반 할당은 보다 효율적이다. 할당은 더 수월해지고 비할당에 드는 노력은 거의 없다. 이 언어는 객체 라이프 사이클을 분리하여 객체를 자유롭게 하는 것을 잊을 위험성을 줄여준다. 한편 C++에서 여러분은 스택 기반 객체를 퍼블리시 하거나 레퍼런스를 공유할 때 신중해야 한다. 스택 기반 객체들은 스택 프레임이 풀릴 때 자동으로 자유가 되기 때문에 포인터를 따라다니게 된다.

스택 기반 할당의 또 다른 장점은 훨씬 캐시 친화적이란 점이다. 현대적 프로세서에서 캐시 소실 비용은 막대하다. 따라서 언어와 런타임이 프로그램이 더 나은 데이터를 로컬에서 얻을 수 있도록 도와준다면 퍼포먼스는 향상될 것이다. 이 스택의 상단에는 언제는 가장 "중요한" 캐시가 메모리가 놓이기 마련이다. 반면 힙의 상단에는 언제나 " 중요하지 않은" (오랫동안 사용되지 않은 메모리) 것이 놓인다. 결과적으로 힙에 객체를 할당하는 것은 스택에 객체를 할당하는 것 보다 더 많은 캐시 손실을 수반하게 될 것이다.

더욱이 객체를 힙에 할당할 때 캐시 손실은 특별히 중첩된 메모리 인터랙션을 갖고 있다. 힙에서 메모리를 할당할 때 그 메모리의 컨텐츠는 쓰레기이다. 이미 캐시에는 없는 힙에 메모리 블록을 할당하면 그 메모리의 컨텐츠가 캐시로 들어가는 동안 실행이 되지 않는다. 그렇게 될 경우 여러분은 아마도 그 값들을 겹쳐 쓰게 될 것이고 메모리 액티비티가 많이 낭비되는 결과를 초래한다. (Azul의 Vega 같은 프로세서에는 힙 할당을 촉진하는 하드웨어 지원이 포함된다.)

Escape 분석

자바는 객체를 스택에 명확히 할당하는 방법을 제공하지는 않지만 이렇다고 해서 JVM에서 스택 할당을 사용할 수 없는 것은 아니다. JVM은 Escape 분석 기술을 사용한다. 특정 객체들이 전체 수명기간 동안 하나의 쓰레드에 국한되어있고 그 수명은 주어진 스택 프레임의 수명을 따라간다. 그와 같은 객체들은 힙 대신 스택에 안전하게 할당될 수 있다. 더 나은 것은 작은 객체들의 경우 JVM은 할당을 완전히 최적화 할 수 있으며 객체의 필드를 레지스터로 간단히 끌어올릴 수 있다.

Listing 2는 Escape 분석을 사용하여 힙 할당을 최적화 하는 방법을 보여준다. Component.getLocation() 메소드는 컴포넌트 위치의 방어 카피를 만들어서 콜러가 갑작스럽게 컴포넌트의 실제 위치를 변경할 수 없도록 한다. getDistanceFrom()을 먼저 호출하면 다른 컴포넌트의 위치를 알 수 있다. 여기에는 객체 할당이 포함되고 getLocation()에 의해 리턴된 Point의 x와 y 필드를 사용하여 두 컴포넌트 간 거리를 계산한다.

Listing 2. 방어 카피 방식

public class Point {

private int x, y;

public Point(int x, int y) {

this.x = x; this.y = y;

}

public Point(Point p) { this(p.x, p.y); }

public int getX() { return x; }

public int getY() { return y; }

}

public class Component {

private Point location;

public Point getLocation() { return new Point(location); }

public double getDistanceFrom(Component other) {

Point otherLocation = other.getLocation();

int deltaX = otherLocation.getX() - location.getX();

int deltaY = otherLocation.getY() - location.getY();

return Math.sqrt(deltaX*deltaX + deltaY*deltaY);

}

getLocation() 메소드는 콜러가 리턴한 Point로 어떤 일을 수행하는지를 모른다. 이것을 컬렉션에 두어 getLocation()이 방어적으로 코딩 되도록 한다. 같은 레퍼런스를 포함하고 있을 수 있다. 하지만 이 예제에서 getDistanceFrom()은 이런 일을 하지 않는다. 이것은 그저 Point를 단기간 사용하다가 버린다. 너무 완벽한 객체의 소비처럼 보인다.

똑똑한 JVM은 어떤 일이 진행되는지를 볼 수 있고 방어 카피의 할당을 최적화 할 수 있다. getX()와 getY()로의 호출 처럼 getLocation()에 대한 호출이 인라이닝 된다. 결과적으로 getDistanceFrom()은 Listing 3 처럼 작동한다.

Listing 3. getDistanceFrom()에 인라이닝 최적화 적용 결과

public double getDistanceFrom(Component other) {

Point otherLocation = new Point(other.x, other.y);

int deltaX = otherLocation.x - location.x;

int deltaY = otherLocation.y - location.y;

return Math.sqrt(deltaX*deltaX + deltaY*deltaY);

}

이 부분에서 Escape 분석은 첫 번째 라인에 할당된 객체가 기본 블록에서 결코 벗어나지 않고 getDistanceFrom()이 other 컴포넌트의 상태를 절대로 변경하지 않는다. (escape에 의해 레퍼런스는 힙에 저장되지 않고 또는 카피를 보유하고 있는 알려지지 않는 코드로 전달되지 않는다.) Point가 완전히 쓰레드-로컬이고 라이프타임이 이것이 할당된 기본 블록에 종속되면 이것은 스택 할당 또는 완전히 최적화된 것이다.(Listing 4)

Listing 4. getDistanceFrom()의 할당 최적화

public double getDistanceFrom(Component other) {

int tempX = other.x, tempY = other.y;

int deltaX = tempX - location.x;

int deltaY = tempY - location.y;

return Math.sqrt(deltaX*deltaX + deltaY*deltaY);

}

캡슐화와 방어 카피에서 오는 안전성을 확보하는 동시에 모든 필드들이 공개되는 것 같은 퍼포먼스를 얻을 수 있다.

Mustang에서의 Escape 분석

Escape 분석은 꽤 오랫동안 논의되어온 최적화이고 현재 구현인 Mustang (Java SE 6)이 Escape 분석을 수행하고, 힙 할당을 스택 할당으로 변환한다. 몇몇 할당을 제거하기 위해 Escape 분석을 사용하면 할당 시간이 빨라지고 메모리 풋프린트도 줄어들며 캐시 손실도 적어진다. 더욱이 할당을 최적화하면 가비지 컬렉터로의 압력을 줄여서 컬렉션이 좀 덜 실행된다.

Escape 분석은 소스 코드에서 이를 수행하는 것이 현실적이지 않은 곳에서 조차 스택 할당의 기회를 찾을 수 있다. 언어가 이 옵션을 제공하더라도 특정 할당이 최적화 되는지의 여부는 객체 리턴 메소드의 결과가 실제로 특별한 코드 경로에 사용되는 방법에 의존한다 getLocation()에서 리턴된 Point는 어떤 경우든 스택 할당에는 적합하지 않다. 하지만 일단 JVM이 getLocation()을 인라인하면 각 호출을 개별적으로 최적화 할 수 있다.

위로

결론

JVM이 스택 할당과 힙 할당을 선택할 수 있도록 함으로서 프로그래머가 스택에 할당할 지 힙에 할당할지를 고민하지 않아도 되고 스택 할당의 퍼포먼스 효과를 누릴 수 있다.

위로

참고자료

교육

Java theory and practice: Urban performance legends

Escape analysis for Java

Garbage Collection in the HotSpot JVM

A brief history of garbage collection

Garbage collection in the Java Virtual Machine

Memory allocation costs in large C and C++ programs (Detlefs, Dosser, and Zorn) (pdf file)

The measured cost of conservative garbage collection (Benjamin Zorn) (pdf file) including the BDW collector.

Java theory and practice

The Java technology zone

제품 및 기술 얻기

The Boehm-Demers-Weiser conservative garbage collector for C

토론

Participate in the discussion forum.

developerWorks blogs

위로

필자소개

Brian Goetz, 컨설턴트, Quiotix

원문 :

http://www.ibm.com/developerworks/java/library/j-jtp09275.html

Java theory and practice: Urban performance legends, revisited

Allocation is faster than you think, and getting faster

Brian Goetz (brian@quiotix.com), Principal Consultant, Quiotix

Summary: The Java™ language is the target of a lot of abuse for performance. And while some of it may well be deserved, a tour of message board and newsgroup postings on the subject shows that there is a great deal of misunderstanding about how a Java Virtual Machine (JVM) actually works. In this month's Java theory and practice, Brian Goetz pokes some holes in the oft-repeated performance myth of slow allocation in JVMs.

View more content in this series

Tag this!

Update My dW interests (Log in | What's this?)Skip to help for Update My dW interests

Date: 27 Sep 2005

Level: Intermediate

Also available in: Russian Japanese

Activity: 47256 views

Comments: 0 (View | Add comment - Sign in)

Average rating (564 votes)

Rate this article

Pop quiz: Which language boasts faster raw allocation performance, the Java language, or C/C++? The answer may surprise you -- allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions (data provided by Sun; see Resources), whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call (Detlefs, et. al.; seeResources). And allocation performance is not a trivial component of overall performance -- benchmarks show that many real-world C and C++ programs, such as Perl and Ghostscript, spend 20 to 30 percent of their total execution time in malloc and free-- far more than the allocation and garbage collection overhead of a healthy Java application (Zorn; see Resources).

Go ahead, make a mess

You don't have to search through too many blogs or Slashdot postings to find confidently worded statements like "Garbage collection will never be as efficient as direct memory management." And, in a way, those statements are right -- dynamic memory management is not as fast -- it's often considerably faster. The malloc/free approach deals with blocks of memory one at a time, whereas the garbage collection approach tends to deal with memory management in large batches, yielding more opportunities for optimization (at the cost of some loss in predictability).

This "plausibility argument" -- that it is easier to clean up a mess in one big batch than to pick up individual pieces of dust throughout the day -- is borne out by the data. One study (Zorn; see Resources) also measured the effects of replacing mallocwith the conservative Boehm-Demers-Weiser (BDW) garbage collector for a number of common C++ applications, and the result was that many of these programs exhibited speedups when running with a garbage collector instead of a traditional allocator. (And BDW is a conservative, nonmoving garbage collector, which greatly constrains its ability to optimize allocation and reclamation and improve memory locality; exact relocating collectors such as those used in JVMs can do far better.)

Allocation in JVMs was not always so fast -- early JVMs indeed had poor allocation and garbage collection performance, which is almost certainly where this myth got started. In the very early days, we saw a lot of "allocation is slow" advice -- because it was, along with everything else in early JVMs -- and performance gurus advocated various tricks to avoid allocation, such as object pooling. (Public service announcement: Object pooling is now a serious performance loss for all but the most heavyweight of objects, and even then it is tricky to get right without introducing concurrency bottlenecks.) However, a lot has happened since the JDK 1.0 days; the introduction of generational collectors in JDK 1.2 has enabled a much simpler approach to allocation, greatly improving performance.

Generational garbage collection

A generational garbage collector divides the heap into multiple generations; most JVMs use two generations, a "young" and an "old" generation. Objects are allocated in the young generation; if they survive past a certain number of garbage collections, they are considered "long lived" and get promoted into the old generation.

While HotSpot offers a choice of three young-generation collectors (serial copying, parallel copying, and parallel scavenge), they are all forms of "copying" collectors and have several important characteristics in common. A copying collector divides the memory space in half and only uses one half at a time. Initially, the half that is in use forms one big block of available memory; the allocator satisfies allocation requests by returning the first N bytes of the portion that is not in use, moving a pointer that separates the "used" part from the "free" part, as shown by the pseudocode in Listing 1. When the half in use fills up, the garbage collector copies all the live objects (those that are not garbage) to the bottom of the other half (compacting the heap), and starts allocating from the other half instead.

Listing 1. Behavior of allocator in the presence of a copying collector

void *malloc(int n) {

if (heapTop - heapStart < n)

doGarbageCollection();

void *wasStart = heapStart;

heapStart += n;

return wasStart;

}

From this pseudocode, you can see why copying collectors enables such fast allocation -- allocating a new object is nothing more than checking to see if there's enough space left in the heap and bumping a pointer if there is. No searching free lists, best-fit, first-fit, lookaside lists -- just grab the first N bytes out of the heap and you're done.

What about deallocation?

But allocation is only half of memory management -- deallocation is the other half. It turns out that for most objects, the direct garbage collection cost is -- zero. This is because a copying collector does not need to visit or copy dead objects, only live ones. So objects that become garbage shortly after allocation contribute no workload to the collection cycle.

It turns out that the vast majority of objects in typical object-oriented programs (between 92 and 98 percent according to various studies) "die young," which means they become garbage shortly after they are allocated, often before the next garbage collection. (This property is called the generational hypothesis and has been empirically tested and found to be true for many object-oriented languages.) Therefore, not only is allocation fast, but for most objects, deallocation is free.

Thread-local allocation

If the allocator were truly implemented as shown in Listing 1, the shared heapStart field would quickly become a significant concurrency bottleneck, as every allocation would involve acquiring the lock that guards this field. To avoid this problem, most JVMs use thread-local allocation blocks, where each thread allocates a larger chunk of memory from the heap and services small allocation requests sequentially out of that thread-local block. As a result, the number of times a thread has to acquire the shared heap lock is greatly reduced, improving concurrency. (It is more difficult and costly to address this problem in the context of a traditional malloc implementation; building both thread support and garbage collection into the platform facilitates synergies such as this one.)

Stack allocation

C++ offers programmers a choice of allocating objects on the heap or on the stack. Stack-based allocation is more efficient: allocation is cheaper, deallocation costs are truly zero, and the language provides assistance in demarcating object lifecycles, reducing the risk of forgetting to free the object. On the other hand, in C++, you need to be very careful when publishing or sharing references to stack-based objects because stack-based objects are automatically freed when the stack frame is unwound, leading to dangling pointers.

Another advantage of stack-based allocation is that it is far more cache-friendly. On modern processors, the cost of a cache miss is significant, so if the language and runtime can help your program achieve better data locality, performance will be improved. The top of the stack is almost always "hot" in the cache, whereas the top of the heap is almost always "cold" (because it has likely been a long time since that memory was used). As a result, allocating an object on the heap will likely entail more cache misses than allocating that object on the stack.

Worse, a cache miss when allocating an object on the heap has a particularly nasty memory interaction. When allocating memory from the heap, the contents of that memory are garbage -- whatever bits happen to be left over from the last time that memory was used. If you allocate a block of memory on the heap that is not already in the cache, execution will stall while the contents of that memory are brought into the cache. Then, you will immediately overwrite those values that you paid to bring into the cache with zeros or other initial values, resulting in a lot of wasted memory activity. (Some processors, such as Azul's Vega, include hardware support for accelerating heap allocation.)

Escape analysis

The Java language does not offer any way to explicitly allocate an object on the stack, but this fact doesn't prevent JVMs from still using stack allocation where appropriate. JVMs can use a technique called escape analysis, by which they can tell that certain objects remain confined to a single thread for their entire lifetime, and that lifetime is bounded by the lifetime of a given stack frame. Such objects can be safely allocated on the stack instead of the heap. Even better, for small objects, the JVM can optimize away the allocation entirely and simply hoist the object's fields into registers.

Listing 2 shows an example where escape analysis can be used to optimize away heap allocation. TheComponent.getLocation() method makes a defensive copy of the component location, so that a caller cannot accidentally change the actual location of the component. Calling getDistanceFrom() first gets the location of the other component, which involves an object allocation, and then uses the x and y fields of the Point returned by getLocation() to compute the distance between two components.

Listing 2. Typical defensive copying approach to returning a compound value

public class Point {

private int x, y;

public Point(int x, int y) {

this.x = x; this.y = y;

}

public Point(Point p) { this(p.x, p.y); }

public int getX() { return x; }

public int getY() { return y; }

}

public class Component {

private Point location;

public Point getLocation() { return new Point(location); }

public double getDistanceFrom(Component other) {

Point otherLocation = other.getLocation();

int deltaX = otherLocation.getX() - location.getX();

int deltaY = otherLocation.getY() - location.getY();

return Math.sqrt(deltaX*deltaX + deltaY*deltaY);

}

The getLocation() method does not know what its caller is going to do with the Point it returns; it might retain a reference to it, such as putting it in a collection, so getLocation() is coded defensively. However in this example, getDistanceFrom() is not going to do this; it is just going to use the Point for a short time and then discard it, which seems like a waste of a perfectly good object.

A smart JVM can see what is going on and optimize away the allocation of the defensive copy. First, the call to getLocation() will be inlined, as will the calls to getX() and getY(), resulting in getDistanceFrom() effectively behaving like Listing 3.

Listing 3. Pseudocode describing the result of applying inlining optimizations to getDistanceFrom()

public double getDistanceFrom(Component other) {

Point otherLocation = new Point(other.x, other.y);

int deltaX = otherLocation.x - location.x;

int deltaY = otherLocation.y - location.y;

return Math.sqrt(deltaX*deltaX + deltaY*deltaY);

}

At this point, escape analysis can show that the object allocated in the first line never escapes from its basic block and thatgetDistanceFrom() never modifies the state of the other component. (By escape, we mean that a reference to it is not stored into the heap or passed to unknown code that might retain a copy.) Given that the Point is truly thread-local and its lifetime is known to be bounded by the basic block in which it is allocated, it can be either stack-allocated or optimized away entirely, as shown in Listing 4.

Listing 4. Pseudocode describing the result of optimizing away allocation in getDistanceFrom()

public double getDistanceFrom(Component other) {

int tempX = other.x, tempY = other.y;

int deltaX = tempX - location.x;

int deltaY = tempY - location.y;

return Math.sqrt(deltaX*deltaX + deltaY*deltaY);

}

The result is that we get exactly the same performance as we would if all the fields were public while retaining the safety that encapsulation and defensive copying (among other safe coding techniques) give us.

Escape analysis in Mustang

Escape analysis is an optimization that has been talked about for a long time, and it is finally here -- the current builds of Mustang (Java SE 6) can do escape analysis and convert heap allocation to stack allocation (or no allocation) where appropriate. The use of escape analysis to eliminate some allocations results in even faster average allocation times, reduced memory footprint, and fewer cache misses. Further, optimizing away some allocations reduces pressure on the garbage collector and allows collection to run less often.

Escape analysis can find opportunities for stack allocation even where it might not be practical to do so in the source code, even if the language provided the option, because whether a particular allocation gets optimized away is determined based on how the result of an object-returning method is actually used in a particular code path. The Point returned from getLocation() may not be suitable for stack allocation in all cases, but once the JVM inlines getLocation(), it is free to optimize each invocation separately, offering us the best of both worlds: optimal performance with less time spent making low-level, performance-tuning decisions.

Conclusion

JVMs are surprisingly good at figuring out things that we used to assume only the developer could know. By letting the JVM choose between stack allocation and heap allocation on a case-by-case basis, we can get the performance benefits of stack allocation without making the programmer agonize over whether to allocate on the stack or on the heap.

Resources

Learn

Java theory and practice: Urban performance legends: Explores some performance myths and how they came about.

Escape analysis for Java: A paper from IBM research that was presented at OOPSLA '99 on implementing escape analysis.

A brief history of garbage collection: Comparison of various garbage collection approaches, including the copying approach used by HotSpot for the young generation.

Garbage collection in the Java Virtual Machine: A presentation from JavaOne 2003 which provides the data on the cost of allocation in HotSpot.

Memory allocation costs in large C and C++ programs (Detlefs, Dosser, and Zorn): Examines the cost of allocation in numerous C and C++ applications.

The measured cost of conservative garbage collection (Benjamin Zorn): Compares the performance of several allocators for C and C++, including the BDW collector.

The Java technology zone: Hundreds of articles about every aspect of Java programming.

Get products and technologies

The Boehm-Demers-Weiser conservative garbage collector for C: It is possible to use garbage collection in C; the BDW collector is a replacement for malloc() and free() that uses a garbage-collected heap.

Discuss

developerWorks blogs: Get involved in the developerWorks community.

About the author

Brian Goetz has been a professional software developer for over 18 years. He is a Principal Consultant at Quiotix, a software development and consulting firm located in Los Altos, California, and he serves on several JCP Expert Groups. See Brian'spublished and upcoming articles in popular industry publications.

저작자표시 비영리 변경금지 (새창열림)

'Java' 카테고리의 다른 글

Exception, RuntimeException (0)	2011.11.26
bitmap class 구현 (0)	2011.11.26
MD5 구현(C++, Java, Js) (0)	2011.11.26
ArrayList, Vector (0)	2011.11.26
패키지 사용법 (0)	2011.11.26

Posted by 휘사마

이전 1 ··· 9 10 11 12 13 14 15 ··· 32 다음

쓸데없는 생각말고 공부나 해라!

bitmap class 구현

'Java' 카테고리의 다른 글

MD5 구현(C++, Java, Js)

'Java' 카테고리의 다른 글

stack allocation

'Java' 카테고리의 다른 글

카테고리

최근에 올라온 글

최근에 달린 댓글

티스토리툴바