Written by: kesenhoo - Original: http://developer.android.com/training/articles/smp.html
Starting with Android 3.0, the platform is optimized to support multiprocessor architectures. This document introduces how to write C, C++, and Java code for such symmetric multiprocessor systems. It is intended as a primer for Android app developers, not an in-depth treatment of the topic, and the discussion is limited to ARM CPU architectures.
If you don't have time to read the entire article, you can skip the theory sections and jump straight to the practical part, but doing so is not recommended.
SMP stands for "Symmetric Multi-Processor". It describes a dual-core or multi-core CPU design. Until a few years ago, all Android devices were single-core.
Most Android devices already included multiple CPUs, but in the past one of them ran application code while the others handled device hardware (for example, audio). These CPUs could have different architectures, and the programs running on them could not communicate with each other through main memory.
Most Android devices sold today are SMP, which makes things more complicated for software developers. In a multithreaded program, race conditions become much more likely when the threads run on different cores. Worse, SMP on ARM is harder to deal with than SMP on x86: code that has been thoroughly tested on x86 may break on ARM.
The rest of this document explains why that is, and what you need to do to make your code behave correctly.
This is a quick, compressed overview of a complex subject. Some areas are incomplete, but none of it should be misleading or wrong.
See Further Reading at the end of the document to learn more about this topic.
Memory consistency models, often just called "memory models", describe the guarantees the hardware architecture makes about the order of memory accesses. For example, if you write a value to address A and then write a value to address B, the model defines what every CPU core is guaranteed to observe about those writes and the order in which it observes them.
The model most programmers expect is known as sequential consistency; see the Adve & Gharachorloo paper listed in Further Reading at the end of this document.
If you look at a piece of code that reads and writes memory on a sequentially-consistent CPU architecture, the accesses happen in the expected order. It's possible that the CPU is actually reordering instructions and delaying reads and writes, but there is no way for code running on the device to tell that the CPU is doing anything other than execute instructions in a straightforward manner. (We're ignoring memory-mapped device driver I/O for the moment.)
To illustrate these points it’s useful to consider small snippets of code, commonly referred to as litmus tests. These are assumed to execute in program order, that is, the order in which the instructions appear here is the order in which the CPU will execute them. We don’t want to consider instruction reordering performed by compilers just yet.
Here’s a simple example, with code running on two threads:
Thread 1 | Thread 2 |
---|---|
A = 3 | reg0 = B |
B = 5 | reg1 = A |
In this and all future litmus examples, memory locations are represented by capital letters (A, B, C) and CPU registers start with “reg”. All memory is initially zero. Instructions are executed from top to bottom. Here, thread 1 stores the value 3 at location A, and then the value 5 at location B. Thread 2 loads the value from location B into reg0, and then loads the value from location A into reg1. (Note that we’re writing in one order and reading in another.)
Thread 1 and thread 2 are assumed to execute on different CPU cores. You should always make this assumption when thinking about multi-threaded code.
Sequential consistency guarantees that, after both threads have finished executing, the registers will be in one of the following states:
Registers | Possible? |
---|---|
reg0=5, reg1=3 | possible (thread 1 ran first) |
reg0=0, reg1=0 | possible (thread 2 ran first) |
reg0=0, reg1=3 | possible (concurrent execution) |
reg0=5, reg1=0 | never |
To get into a situation where we see B=5 before we see the store to A, either the reads or the writes would have to happen out of order. On a sequentially-consistent machine, that can’t happen.
Most uni-processors, including x86 and ARM, are sequentially consistent. Most SMP systems, including x86 and ARM, are not.
Debugging memory consistency problems is very difficult. If a missing memory barrier causes some code to read stale data, you will not be able to figure out why by examining memory dumps with a debugger. By the time you can issue a debugger query, the CPU cores will have all observed the full set of accesses, and the contents of memory and the CPU registers will appear to be in an "impossible" state.
We haven't yet discussed the relevant features of the Java language, so let's take a quick look at those first.
The "synchronized" keyword provides Java's built-in locking mechanism. Every object has an associated "monitor" that can be used to provide mutually exclusive access.
A "synchronized" block is implemented with the same basic structure as a spin lock: it begins with an acquiring CAS and ends with a releasing store. This means that compilers and code optimizers are free to migrate code into a "synchronized" block. One practical consequence is that you must not assume a synchronized block executes strictly before the code that follows it or strictly after the code that precedes it. Going further, if a method has two synchronized blocks that lock the same object, and no operation in the intervening code is observable by another thread, the compiler may perform "lock coarsening" and merge them into a single block.
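To make the code-migration and lock-coarsening points concrete, here is a minimal sketch; the class, field, and lock names are mine and not from the original article.
class CoarseningExample {
    private final Object lock = new Object();
    private int a, b;

    void update() {
        synchronized (lock) {
            a = 1;                 // first critical section
        }
        // The compiler or JIT may migrate this statement into one of the
        // synchronized blocks; other threads cannot rely on it running
        // strictly between the two lock acquisitions.
        int local = a + 1;
        synchronized (lock) {
            b = local;             // second critical section
        }
        // If nothing between the two blocks is observable by another thread,
        // the JIT may coarsen them into a single synchronized region.
    }
}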
Another relevant keyword is "volatile". As defined in the specification for Java 1.4 and earlier, a volatile declaration was about as weak as its C counterpart. Starting with Java 1.5, much stronger guarantees are provided, almost to the level of synchronization.
The effects of volatile accesses can be illustrated with an example. If thread 1 writes to a volatile field, and thread 2 subsequently reads that same field, then thread 2 is guaranteed to see that write and all of the writes previously made by thread 1. More generally, the writes made by any thread up to the point where it writes the field will be visible to thread 2 when it reads the field. In effect, writing a volatile field is like releasing a monitor, and reading a volatile field is like acquiring one.
Non-volatile accesses may be reordered relative to volatile accesses in the usual ways. For example, the compiler may move a non-volatile load above a volatile store, but not below it. Volatile accesses may not be reordered with respect to each other. The VM takes care of any necessary memory barriers.
Loads and stores of most primitive types are atomic, but this does not hold for long and double fields unless they are declared volatile. And even on a uniprocessor, concurrent updates to a non-volatile field by multiple threads are still undefined.
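A minimal sketch of the point about long; the class and field names are illustrative and not from the original article. On some 32-bit VMs a non-volatile 64-bit store may be split into two 32-bit stores, so a concurrent reader can observe a half-written value, while a volatile long is loaded and stored atomically.
class WordTearing {
    long plainTotal;               // 64-bit store may be split in two on some 32-bit VMs
    volatile long safeTotal;       // volatile long loads and stores are atomic

    void writer() {                // runs in thread 1
        plainTotal = 0x100000001L; // a reader might see only half of this write
        safeTotal  = 0x100000001L; // seen entirely or not at all
    }

    void reader() {                // runs in thread 2
        long a = plainTotal;       // may be a "torn" value
        long b = safeTotal;        // always a complete value
    }
}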
Here is an example of a monotonic counter implemented incorrectly (see Java theory and practice: Managing volatility):
class Counter {
private int mValue;
public int get() {
return mValue;
}
public void incr() {
mValue++;
}
}
Assume get() and incr() are called from multiple threads, and we want to be sure that every thread sees the current count when get() is called. The most glaring problem is that mValue++ is actually three operations: load the current value of mValue into a register, increment the register, and store the register value back to mValue.
If two threads execute incr() at the same time, one of the updates can be lost. To make the ++ operation execute correctly, we need to declare incr() "synchronized". With that change, the code works correctly in a multithreaded environment on a single core.
On an SMP system it can still fail, however. Different threads may see different values from get(), because we read the value with an ordinary load. We can fix that by declaring get() synchronized as well. With both changes, the code is correct.
Unfortunately, we have now introduced the possibility of lock contention, which can hurt performance. Instead of declaring get() synchronized, we could declare mValue "volatile". (Note that incr() must still be synchronized.) Now we know that a volatile write to mValue is visible to subsequent volatile reads. incr() will be slightly slower, but get() will be faster, so this is a win when reads outnumber writes. (See also AtomicInteger.)
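Here is a sketch of the fixed class just described, together with the AtomicInteger alternative mentioned above; the AtomicCounter class name is mine.
import java.util.concurrent.atomic.AtomicInteger;

class Counter {
    private volatile int mValue;       // volatile: get() always sees the latest published value

    public int get() {
        return mValue;                  // volatile read, no lock needed
    }
    public synchronized void incr() {
        mValue++;                       // read-modify-write still needs the lock
    }
}

class AtomicCounter {
    private final AtomicInteger mValue = new AtomicInteger();

    public int get() {
        return mValue.get();
    }
    public void incr() {
        mValue.incrementAndGet();       // one atomic read-modify-write, no explicit lock
    }
}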
Here is another example, similar to the earlier C examples:
class MyGoodies {
public int x, y;
}
class MyClass {
static MyGoodies sGoodies;
void initGoodies() { // runs in thread 1
MyGoodies goods = new MyGoodies();
goods.x = 5;
goods.y = 10;
sGoodies = goods;
}
void useGoodies() { // runs in thread 2
if (sGoodies != null) {
int i = sGoodies.x; // could be 5 or 0
....
}
}
}
This code has the same problem: the assignment sGoodies = goods may be observed before the assignments to the fields of goods. If you declare sGoodies volatile, you can think of the loads as atomic_acquire_load() and the stores as atomic_release_store().
(Note that only the sGoodies reference itself is volatile; accesses to its inner fields are not. The statement z = sGoodies.x performs a volatile load of MyClass.sGoodies followed by a non-volatile load of sGoodies.x. If you make a local reference MyGoodies localGoods = sGoodies, then z = localGoods.x will not perform any volatile loads.)
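A sketch of the volatile fix described above, using the same classes; the localGoods variable follows the note about local references.
class MyGoodies {
    public int x, y;
}
class MyClass {
    static volatile MyGoodies sGoodies;   // volatile: the store publishes x and y

    void initGoodies() {    // runs in thread 1
        MyGoodies goods = new MyGoodies();
        goods.x = 5;
        goods.y = 10;
        sGoodies = goods;   // release: the earlier writes to goods become visible
    }

    void useGoodies() {     // runs in thread 2
        MyGoodies localGoods = sGoodies;  // acquire: a single volatile load
        if (localGoods != null) {
            int i = localGoods.x;         // guaranteed to be 5
        }
    }
}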
Another idiom you will often see in Java code is the notorious "double-checked locking":
class MyClass {
private Helper helper = null;
public Helper getHelper() {
if (helper == null) {
synchronized (this) {
if (helper == null) {
helper = new Helper();
}
}
}
return helper;
}
}
The idea is to have a single instance of Helper, created only once, obtained through the getHelper() method. To avoid two threads creating the instance at the same time, the creation is synchronized. However, we don't want to pay the cost of the "synchronized" block on every call, so we only take the lock when helper is null.
This doesn't work reliably even on a uniprocessor; the JIT compiler can break it. See the "'Double Checked Locking is Broken' Declaration" in the appendix for more information, or Item 71 ("Use lazy initialization judiciously") in Josh Bloch's Effective Java.
Running this code on an SMP system introduces an additional way to fail. Here is the same code rewritten in C:
if (helper == null) {
// acquire monitor using spinlock
while (atomic_acquire_cas(&this.lock, 0, 1) != success)
;
if (helper == null) {
newHelper = malloc(sizeof(Helper));
newHelper->x = 5;
newHelper->y = 10;
helper = newHelper;
}
atomic_release_store(&this.lock, 0);  // release monitor
}
Now the problem is more obvious: the store to helper happens before the memory barrier, which means another thread can observe the non-null value of helper before the stores to x and y.
You could try to delay the store to helper until after the atomic_release_store() by reordering the code, but that doesn't help: the compiler is allowed to move code upward, so it can move the assignment right back above the atomic_release_store(), where it was originally.
There are two ways to fix this: either delete the outer check, so that helper is never examined outside a synchronized block, or declare helper volatile, which makes the code work correctly on Java 1.5 and later.
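Here is a sketch of the volatile fix. This is the widely documented Java 1.5+ form of the idiom rather than literal code from the original article, and the Helper fields are placeholders.
class Helper {
    int x, y;   // stand-in for the real Helper class
}

class MyClass {
    private volatile Helper helper = null;    // volatile is what makes this safe on Java 1.5+

    public Helper getHelper() {
        Helper result = helper;                // one volatile read in the already-initialized case
        if (result == null) {
            synchronized (this) {
                result = helper;               // re-check under the lock
                if (result == null) {
                    helper = result = new Helper();   // the volatile store publishes Helper's fields
                }
            }
        }
        return result;
    }
}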
The next example illustrates two important issues when using volatile:
class MyClass {
int data1, data2;
volatile int vol1, vol2;
void setValues() { // runs in thread 1
data1 = 1;
vol1 = 2;
data2 = 3;
}
void useValues1() { // runs in thread 2
if (vol1 == 2) {
int l1 = data1; // okay
int l2 = data2; // wrong
}
}
void useValues2() { // runs in thread 2
int dummy = vol2;
int l1 = data1; // wrong
int l2 = data2; // wrong
}
}
Looking at useValues1(), if thread 2 has not yet observed the update to vol1, it cannot know whether data1 or data2 has been set. Once it observes the update to vol1, it knows that data1 was set as well. However, it can make no assumptions about data2, because that store was performed after the volatile store.
useValues2() uses a second volatile field, vol2, in an attempt to force the VM to generate a memory barrier. This does not generally work. To establish a proper "happens-before" relationship, both threads need to interact with the same volatile field. You would have to know that vol2 was set after data1/data2 in thread 1. (The fact that this doesn't work is probably obvious from looking at the code; the caution here is against trying to cleverly "cause" a memory barrier instead of creating an ordered series of accesses.)
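A sketch of how the intended ordering can actually be established: the volatile store in thread 1 comes after every plain store it is meant to publish, and thread 2 reads the same volatile field. The class name is mine; the fields follow the example above.
class MyClassFixed {
    int data1, data2;
    volatile int vol1;

    void setValues() {        // runs in thread 1
        data1 = 1;
        data2 = 3;
        vol1 = 2;             // volatile store last: publishes data1 and data2
    }

    void useValues() {        // runs in thread 2
        if (vol1 == 2) {      // volatile load of the same field
            int l1 = data1;   // guaranteed to see 1
            int l2 = data2;   // guaranteed to see 3
        }
    }
}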
In C/C++, use the pthread operations, such as mutexes and semaphores. These include the proper memory barriers, providing correct and efficient behavior on all Android platforms. Be sure to use them correctly though, for example be wary of assigning to a shared value without holding the corresponding mutex.
Avoid using atomic functions directly. If there is no contention, locking and unlocking a pthread mutex each require only a single atomic operation. If you need a lock-free design, you must understand this entire document before you start writing code (or find a library that has already been written and reviewed for SMP ARM).
Be extremely circumspect with "volatile" in C/C++. It often indicates a concurrency problem waiting to happen.
In Java, the best answer is usually to use an appropriate utility class from the java.util.concurrent package. The code is well written and well tested on SMP.
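For instance, a one-shot hand-off of data between threads can be written with a class from that package instead of a hand-rolled volatile flag. This is a small sketch with names of my own choosing, not code from the original article.
import java.util.concurrent.CountDownLatch;

class Publisher {
    private int[] results;
    private final CountDownLatch ready = new CountDownLatch(1);

    void produce() {                  // runs in thread 1
        results = new int[] {1, 2, 3};
        ready.countDown();            // publishes results to any thread that awaits
    }

    int[] consume() throws InterruptedException {   // runs in thread 2
        ready.await();                // blocks until countDown(); establishes happens-before
        return results;               // safe to read here
    }
}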
Perhaps the safest thing you can do is make your class immutable. Objects of classes like String and Integer hold data that cannot be changed once the object is created, avoiding all synchronization issues. The book Effective Java, 2nd Ed. has specific instructions in "Item 15: Minimize Mutability". Note in particular the importance of declaring fields "final" (Bloch).
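A minimal sketch of an immutable value class in that style; the class name is mine.
final class Point {
    private final int x;       // final fields, assigned exactly once in the constructor
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    int getX() { return x; }
    int getY() { return y; }
    // No setters: once constructed, a Point can be shared between threads
    // without any synchronization.
}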
If neither of these options is viable, the Java "synchronized" statement should be used to guard any field that can be accessed by more than one thread. If mutexes won't work for your situation, you should declare shared fields "volatile", but you must take great care to understand the interactions between threads. The volatile declaration won't save you from common concurrent programming mistakes, but it will help you avoid the mysterious failures associated with optimizing compilers and SMP mishaps.
The Java Memory Model guarantees that assignments to final fields are visible to all threads once the constructor has finished; this is what ensures proper synchronization of fields in immutable classes. This guarantee does not hold if a partially-constructed object is allowed to become visible to other threads. It is necessary to follow safe construction practices (Safe Construction Techniques in Java).
The pthread library and VM make a couple of useful guarantees: all accesses previously performed by a thread that creates a new thread are observable by that new thread as soon as it starts, and all accesses performed by a thread that is exiting are observable when a join() on that thread returns. This means you don’t need any additional synchronization when preparing data for a new thread or examining the results of a joined thread.
Whether or not these guarantees apply to interactions with pooled threads depends on the thread pool implementation.
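A sketch of what those guarantees mean in Java; the names are illustrative.
class StartJoinVisibility {
    int[] input;
    int result;

    void run() throws InterruptedException {
        input = new int[] {1, 2, 3};          // written before the worker is started

        Thread worker = new Thread(() -> {
            int sum = 0;
            for (int v : input) {             // visible: start() happens-before the worker's first action
                sum += v;
            }
            result = sum;                     // written before the worker exits
        });

        worker.start();
        worker.join();                        // everything the worker did is visible after join() returns
        System.out.println(result);           // prints 6, with no extra synchronization
    }
}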
In C/C++, the pthread library guarantees that any accesses made by a thread before it unlocks a mutex will be observable by another thread after it locks that same mutex. It also guarantees that any accesses made before calling signal() or broadcast() on a condition variable will be observable by the woken thread.
Java language threads and monitors make similar guarantees for the comparable operations.
The C and C++ language standards are evolving to include a sophisticated collection of atomic operations. A full matrix of calls for common data types is defined, with selectable memory barrier semantics (choose from relaxed, consume, acquire, release, acq_rel, seq_cst).
See the Further Reading section for pointers to the specifications.
While this document does more than merely scratch the surface, it doesn’t manage more than a shallow gouge. This is a very broad and deep topic. Some areas for further exploration:
The @ThreadSafe and @GuardedBy annotations (from net.jcip.annotations).
The Further Reading section in the appendix has links to documents and web sites that will better illuminate these topics.