diff --git a/hw06/README.md b/hw06/README.md
index 43563ee..a5f0076 100644
--- a/hw06/README.md
+++ b/hw06/README.md
@@ -69,41 +69,241 @@ B. Random case: In this case, where blocks are mapped randomly to sectors, readi
| 4. | 32 | 1,024 | 32 | 2 | 16 | 23 | 4 | 5 |
#### 6.27
+A. In this case, set 1 contains two valid lines with tags `0x45` and `0x38`. Since there are two valid lines in the set and each line holds a four-byte block, eight addresses will hit. From the fields below, these addresses have the binary form `0 1000 1010 01xx` or `0 0111 0000 01xx`.
+- CO: `??`
+- CI: `0b001`
+- CT: `0b01000101` or `0b00111000`
+
+Thus, the eight hex addresses that hit in set 1 are `0x0704`, `0x0705`, `0x0706`, `0x0707`, `0x08A4`, `0x08A5`, `0x08A6` and `0x08A7`.
+
+B. There is only one valid line in set 6 (tag `0x91`), so four addresses will hit. The same approach applies:
+- CO: `??`
+- CI: `0b110`
+- CT: `0b10010001`
+
+Thus, the four hex addresses that hit in set 6 are `0x1238`, `0x1239`, `0x123A` and `0x123B`.
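+
+The enumeration used in 6.27 and 6.28 can be cross-checked mechanically. The sketch below assumes the 13-bit address layout used in these answers (8 tag bits, 3 set-index bits, 2 block-offset bits); the helper names `make_addr` and `hit_addrs` are made up for illustration.
+
```c
#include <assert.h>

/* Assumed address layout: | CT (8 bits) | CI (3 bits) | CO (2 bits) | */
enum { CO_BITS = 2, CI_BITS = 3, BLOCK_SIZE = 1 << CO_BITS };

/* Combine a tag, a set index and a block offset into one address. */
unsigned make_addr(unsigned ct, unsigned ci, unsigned co) {
    assert(ct < 256u && ci < (1u << CI_BITS) && co < (1u << CO_BITS));
    return (ct << (CI_BITS + CO_BITS)) | (ci << CO_BITS) | co;
}

/* Fill out[] with the BLOCK_SIZE addresses that hit in a valid line
 * holding tag ct in set ci: one address per block offset. */
void hit_addrs(unsigned ct, unsigned ci, unsigned out[BLOCK_SIZE]) {
    for (unsigned co = 0; co < (unsigned)BLOCK_SIZE; co++)
        out[co] = make_addr(ct, ci, co);
}
```
+
+For example, `hit_addrs(0x45, 1, out)` fills `out` with `0x08A4` through `0x08A7`, matching the list for set 1 above.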
#### 6.28
+A. In this case, set 2 contains no valid lines, so no addresses will hit.
+
+B. In this case, set 4 contains two valid lines with tags `0xC7` and `0x05`. Since there are two valid lines in the set, eight addresses will hit. From the fields below, these addresses have the binary form `1 1000 1111 00xx` or `0 0000 1011 00xx`.
+- CO: `??`
+- CI: `0b100`
+- CT: `0b11000111` or `0b00000101`
+
+Thus the eight hex addresses that hit in set 4 are `0x18F0`, `0x18F1`, `0x18F2`, `0x18F3`, `0x00B0`, `0x00B1`, `0x00B2` and `0x00B3`.
+
+C. In this case, set 5 contains one valid line with tag `0x71`. Since there is only one valid line in the set, four addresses will hit. From the fields below, these addresses have the binary form `0 1110 0011 01xx`.
+- CO: `??`
+- CI: `0b101`
+- CT: `0b01110001`
+
+Thus, the four hex addresses that hit in set 5 are `0x0E34`, `0x0E35`, `0x0E36` and `0x0E37`.
+
+D. In this case, set 7 contains one valid line with tag `0xDE`. Since there is only one valid line in the set, four addresses will hit. From the fields below, these addresses have the binary form `1 1011 1101 11xx`.
+- CO: `??`
+- CI: `0b111`
+- CT: `0b11011110`
+
+Thus, the four hex addresses that hit in set 7 are `0x1BDC`, `0x1BDD`, `0x1BDE` and `0x1BDF`.
#### 6.29
+A. Given B = 4 and S = 4, we get b = log2(B) = 2 and s = log2(S) = 2. Then t = m - (s + b) = 12 - 4 = 8.
+So the address format is as follows (bit 12 is unused because addresses are only 12 bits wide):
+
+| N/A | CT | CT | CT | CT | CT | CT | CT | CT | CI | CI | CO | CO |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+
+B.
+| Operation | Address | Hit? | Read value (or unknown) |
+| :--: | :--: | :--: | :--: |
+| Read | 0x834 | N | Unknown |
+| Write | 0x836 | Y | Unknown |
+| Read | 0xFFD | Y | 0xC0 |
#### 6.30
+A. Given E = 4, B = 4, and S = 8, the cache size is C = S × E × B = 8 × 4 × 4 = 128 bytes.
+
+B.
+| CT | CT | CT | CT | CT | CT | CT | CT | CI | CI | CI | CO | CO |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
#### 6.31
+A.
+| 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+
+B.
+| Parameter | Value |
+| :-----------------: | :---: |
+| Block offset (CO) | 0x02 |
+| Index (CI) | 0x06 |
+| Cache tag (CT) | 0x38 |
+| Cache hit? (Y/N) | N |
+| Cache byte returned | — |
#### 6.32
+A.
+| 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+
+B.
+| Parameter | Value |
+| :-----------------: | :---: |
+| Block offset (CO) | 0x00 |
+| Index (CI) | 0x02 |
+| Cache tag (CT) | 0xB7 |
+| Cache hit? (Y/N) | N |
+| Cache byte returned | — |
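+
+The field extraction in 6.31 and 6.32 can be reproduced with a few shifts and masks. This is a minimal sketch assuming the 13-bit layout from 6.30 (bits 12-5 tag, bits 4-2 set index, bits 1-0 block offset); `struct fields` and `decode` are illustrative names, and `0x071A` and `0x16E8` are the bit patterns from the two tables above written in hex.
+
```c
#include <assert.h>

/* The three fields of a 13-bit address in the 6.30 cache. */
struct fields {
    unsigned ct; /* cache tag,    bits 12..5 */
    unsigned ci; /* set index,    bits 4..2  */
    unsigned co; /* block offset, bits 1..0  */
};

/* Split an address into tag, set index and block offset. */
struct fields decode(unsigned addr) {
    struct fields f;
    assert(addr < (1u << 13)); /* only 13-bit addresses are meaningful here */
    f.co = addr & 0x3;
    f.ci = (addr >> 2) & 0x7;
    f.ct = (addr >> 5) & 0xFF;
    return f;
}
```
+
+`decode(0x071A)` yields CT = `0x38`, CI = `0x6`, CO = `0x2`, and `decode(0x16E8)` yields CT = `0xB7`, CI = `0x2`, CO = `0x0`, in agreement with the parameter tables. Whether the access hits still has to be read off the cache contents given in the problem.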
#### 6.33
+Grouping the bits as CT, CI, CO, the addresses that hit have the binary form `10111100 010 xx` or `10110110 010 xx`, that is, tag `0xBC` or `0xB6` in set 2. Concretely, these are `0x16C8`, `0x16C9`, `0x16CA`, `0x16CB`, `0x1788`, `0x1789`, `0x178A` and `0x178B`.
#### 6.34
-
+As the figure shows, each cache line holds exactly one row of the array. Because the cache is too small to hold both arrays, references to one array keep evicting useful lines from the other array.
+
+
+
+| `src` | Col. 0 | Col. 1 | Col. 2 | Col. 3 |
+| :---: | :----: | :----: | :----: | :----: |
+| Row 0 | m | m | h | m |
+| Row 1 | m | h | m | h |
+| Row 2 | m | m | h | m |
+| Row 3 | m | h | m | h |
+
+| `dst` | Col. 0 | Col. 1 | Col. 2 | Col. 3 |
+| :---: | :----: | :----: | :----: | :----: |
+| Row 0 | m | m | m | m |
+| Row 1 | m | m | m | m |
+| Row 2 | m | m | m | m |
+| Row 3 | m | m | m | m |
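+
+The hit/miss pattern can be checked with a small simulation. The sketch below rests on assumptions about the problem setup that are not restated in this answer: 4-byte `int`s, `src` at address 0, `dst` at address 64, a direct-mapped write-allocate cache with 16-byte blocks, and the loop body `dst[j][i] = src[i][j]`. The names `access_cache` and `simulate` are illustrative.
+
```c
#include <assert.h>

enum { N = 4, BLOCK = 16, SRC_BASE = 0, DST_BASE = 64, MAX_SETS = 64 };

/* Access one byte address in a direct-mapped cache with `sets` lines.
 * tags[s] is the block number cached in set s, or -1 if the set is empty.
 * Returns 'h' on a hit and 'm' on a miss (the block is then loaded,
 * which also models write-allocate for stores). */
static char access_cache(long *tags, int sets, long addr) {
    long block = addr / BLOCK;
    int set = (int)(block % sets);
    if (tags[set] == block)
        return 'h';
    tags[set] = block;
    return 'm';
}

/* Replay the transpose loop and record the outcome of every access. */
void simulate(int cache_bytes, char src[N][N], char dst[N][N]) {
    long tags[MAX_SETS];
    int sets = cache_bytes / BLOCK;
    assert(sets > 0 && sets <= MAX_SETS);
    for (int s = 0; s < sets; s++)
        tags[s] = -1; /* the cache starts empty */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            /* the load from src happens before the store to dst */
            src[i][j] = access_cache(tags, sets, SRC_BASE + 4L * (i * N + j));
            dst[j][i] = access_cache(tags, sets, DST_BASE + 4L * (j * N + i));
        }
    }
}
```
+
+Calling `simulate(32, src, dst)` models the 32-byte cache of this problem, and `simulate(128, src, dst)` models the larger cache of 6.35.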
#### 6.35
+When the cache is 128 bytes, it is large enough to hold both arrays, so the only misses are the initial cold misses: the first access to each row of `src` and of `dst`.
+
+| `src` | Col. 0 | Col. 1 | Col. 2 | Col. 3 |
+| :---: | :----: | :----: | :----: | :----: |
+| Row 0 | m | h | h | h |
+| Row 1 | m | h | h | h |
+| Row 2 | m | h | h | h |
+| Row 3 | m | h | h | h |
+
+| `dst` | Col. 0 | Col. 1 | Col. 2 | Col. 3 |
+| :---: | :----: | :----: | :----: | :----: |
+| Row 0 | m | h | h | h |
+| Row 1 | m | h | h | h |
+| Row 2 | m | h | h | h |
+| Row 3 | m | h | h | h |
#### 6.36
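+A. 100%. `x[0][i]` and `x[1][i]` map to the same cache set, so each access evicts the block that the other one needs.
+
+B. 25%. Since the whole array can be cached, the only misses are the initial cold misses: one per block, i.e. one for every four reads.
+
+C. Still 25%. With two lines per set, the blocks of `x[0]` and `x[1]` that map to the same set can reside in the cache together.
+
+D. No. The initial cold misses will still happen regardless of cache size, and the other three of every four reads already hit.
+
+E. Yes. A larger block means fewer cold misses. For example, with a block twice as large there is only 1 miss per 8 reads, so the miss rate would be 12.5%.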
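+
+These miss rates can be reproduced with a short simulation. The sketch assumes the setup of the problem as `int x[2][128]` starting at address 0, 4-byte `int`s, LRU replacement, and the access order `x[0][i]` then `x[1][i]`; the function name `miss_rate` and its parameters are illustrative.
+
```c
#include <assert.h>

enum { MAX_LINES = 256 };

/* Miss rate, in percent, of
 *     for (i = 0; i < 128; i++) sum += x[0][i] * x[1][i];
 * on a cache of `cache_bytes` bytes with `ways` lines per set,
 * `block` bytes per line and LRU replacement. */
double miss_rate(int cache_bytes, int ways, int block) {
    long tag[MAX_LINES];  /* block number held by each line, -1 = empty */
    long last[MAX_LINES]; /* time of last use, 0 = never used */
    int lines = cache_bytes / block;
    int sets = lines / ways;
    long now = 0, misses = 0, accesses = 0;

    assert(lines > 0 && lines <= MAX_LINES && sets > 0);
    for (int k = 0; k < lines; k++) {
        tag[k] = -1;
        last[k] = 0;
    }
    for (int i = 0; i < 128; i++) {
        for (int row = 0; row < 2; row++) {
            long blk = 4L * (row * 128 + i) / block;
            int base = (int)(blk % sets) * ways; /* first line of the set */
            int victim = base, hit = 0;
            for (int w = base; w < base + ways; w++) {
                if (tag[w] == blk) { victim = w; hit = 1; break; }
                if (last[w] < last[victim]) victim = w; /* least recently used */
            }
            if (!hit) {
                tag[victim] = blk;
                misses++;
            }
            last[victim] = ++now;
            accesses++;
        }
    }
    return 100.0 * misses / accesses;
}
```
+
+Under these assumptions, `miss_rate(512, 1, 16)` corresponds to case A, `miss_rate(1024, 1, 16)` to case B, `miss_rate(512, 2, 16)` to case C, and `miss_rate(512, 2, 32)` to the larger block size discussed in E.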
#### 6.37
+| Function | N = 64 | N = 60 |
+| :------: | :----: | :----: |
+| sumA | 25% | 25% |
+| sumB | 100% | 25% |
+| sumC | 50% | 25% |
#### 6.38
+A. From the loop, we can see there are 16 * 16 * 4 = 1024 writes.
+
+B. About 768 writes will hit in the cache. Since `sizeof(point_color) = 16` and B = 16, each iteration incurs one cold miss followed by three hits.
+
+C. 3 / 4 = 75%.
#### 6.39
+A. Still 1024 writes.
+
+B. About 768 writes will hit in the cache.
+
+C. 3 / 4 = 75%.
#### 6.40
+A. 1024 writes.
+
+B. 768 writes.
+
+C. 3 / 4 = 75%.
#### 6.41
+About 7 / 8 = 87.5%.
+
#### 6.42
+Same as before. The hit rate is 87.5%.
+
#### 6.43
+Because this line changes to `int *iptr = (int *)buffer;`, the hit rate becomes 50%.
+
#### 6.44
+We can look for ridges, plateaus, and sharp drops in the memory bandwidth values as the working-set size grows:
+
+1. **L1 Cache**: The L1 cache is the smallest and fastest level, so it shows up as the highest ridge. The working-set size at which bandwidth first drops sharply marks the boundary between the L1 and L2 caches.
+
+2. **L2 Cache**: After the L1 boundary, there is a plateau where bandwidth stays roughly constant at a lower level. This plateau represents the L2 cache, and the size at which it ends estimates the L2 cache size.
+
+3. **L3 Cache**: The next plateau represents the L3 cache, and we can use the size at which it ends to estimate the L3 cache size.
+
+4. **Main Memory**: Beyond the cache sizes, there is another significant drop in memory bandwidth. This is where we transition from cache to main memory.
+
+Based on the provided data and rough estimates:
+
+- L1 Cache: Estimated between 32KB and 64KB.
+- L2 Cache: Estimated between 256KB and 512KB.
+- L3 Cache: Estimated between 4MB and 16MB.
+
#### 6.45
+We could use blocking as follows.
+```c
+void transpose(int *dst, int *src, int dim) {
+    const int block_size = 32; // Tile edge in elements; adjust to fit the cache
+    for (int i = 0; i < dim; i += block_size) {
+        for (int j = 0; j < dim; j += block_size) {
+            // Transpose one block_size x block_size tile, clipped at the matrix edge
+            for (int x = i; x < i + block_size && x < dim; x++) {
+                for (int y = j; y < j + block_size && y < dim; y++) {
+                    dst[y * dim + x] = src[x * dim + y];
+                }
+            }
+        }
+    }
+}
+```
+
#### 6.46
+
+```c
+void col_convert(int *G, int dim) {
+    const int block_size = 32; // Tile edge in elements; adjust to fit the cache
+
+    for (int i = 0; i < dim; i += block_size) {
+        for (int j = 0; j < dim; j += block_size) {
+            // Process one tile, clipped at the matrix edge
+            for (int x = i; x < i + block_size && x < dim; x++) {
+                for (int y = j; y < j + block_size && y < dim; y++) {
+                    // Bitwise OR; same result as || when entries are 0 or 1
+                    G[y * dim + x] |= G[x * dim + y];
+                }
+            }
+        }
+    }
+}
+```
+
+1. **Blocking**: Similar to the matrix transpose optimization, the code uses blocking or tiling with a fixed block size (e.g., 32x32). This enhances spatial locality by working within smaller blocks.
+
+2. **Efficient OR Operation**: Instead of using a conditional `||` operation, the code uses a bitwise OR (`|`) operation to combine values. Since the matrix elements are binary (0 or 1), the two operations give the same result, and the OR creates an undirected edge when either a directed edge from vi to vj or from vj to vi exists.
+
+3. **Symmetry**: Since the adjacency matrix of an undirected graph is symmetric, a further optimization, which the code above does not implement, is to process only the upper or lower triangular part of the matrix and then copy the result to the other half.
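+
+A possible version of the symmetry idea is sketched below. It is a hypothetical variant rather than the code analyzed above: the name `col_convert_sym` and the tile size of 32 are assumptions, and it relies on the matrix entries being 0 or 1. Each unordered pair of vertices is visited once, and the combined value is written to both halves of the matrix.
+
```c
#include <assert.h>

/* Hypothetical variant: visit each pair (x, y) with x < y exactly once,
 * tile by tile, and store the OR of both directions in both entries. */
void col_convert_sym(int *G, int dim) {
    const int bs = 32; /* tile edge in elements, an assumed value to tune */
    assert(G != 0 && dim >= 0);
    for (int i = 0; i < dim; i += bs) {
        int x_end = i + bs < dim ? i + bs : dim;
        for (int j = i; j < dim; j += bs) { /* tiles on or above the diagonal */
            int y_end = j + bs < dim ? j + bs : dim;
            for (int x = i; x < x_end; x++) {
                /* in a diagonal tile, start just past the diagonal entry */
                for (int y = (j > x ? j : x + 1); y < y_end; y++) {
                    int e = G[x * dim + y] | G[y * dim + x];
                    G[x * dim + y] = e;
                    G[y * dim + x] = e;
                }
            }
        }
    }
}
```
+
+Compared with the full double loop, this roughly halves the number of element pairs touched, while diagonal entries are left as they are.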
\ No newline at end of file
diff --git a/hw06/mountain/mountain-for-my-machine.txt b/hw06/mountain/mountain-for-my-machine.txt
new file mode 100644
index 0000000..30d04d4
--- /dev/null
+++ b/hw06/mountain/mountain-for-my-machine.txt
@@ -0,0 +1,17 @@
+Clock frequency is approx. 2445.4 MHz
+Memory mountain (MB/sec)
+ s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15
+128m 19731 11596 8502 6721 5256 4038 3350 2916 2683 2426 2284 2241 2300 2404 2395
+64m 20200 11464 8743 7402 6217 5259 4539 3869 3711 3381 3430 3510 3494 3592 3684
+32m 26043 18673 14312 11976 8909 7588 6821 6878 6339 6005 5664 5855 5795 5860 6136
+16m 42623 30706 20508 15445 12343 10344 8831 7746 6375 6439 6136 6159 6032 6169 6231
+8m 42617 30663 20519 15402 12321 10289 8817 7709 6947 6481 6178 6154 6041 6180 6176
+4m 42533 30554 20420 15389 12436 10331 8831 7761 6971 6526 6241 6157 6101 6181 6235
+2m 40363 30496 20450 15301 12317 10249 8383 7655 7093 6662 6367 6268 6125 5924 5781
+1024k 42305 28272 18991 15528 11900 10371 8873 7820 7268 6823 6753 6878 7081 7620 8104
+512k 40163 33205 26796 20866 17102 14251 12439 11031 10828 11016 11299 11416 11770 12058 12284
+256k 42894 36444 31831 24683 20127 16967 14431 12629 12694 12702 12584 12602 12656 12715 12640
+128k 44047 37594 34069 25758 20599 17171 14718 12776 12751 12704 12652 12676 12579 12804 12644
+64k 43609 37603 33534 25552 20773 17034 14837 12983 13218 13082 13513 13627 13977 17318 31149
+32k 43621 39867 38934 38975 36338 36388 35997 31498 33091 32699 33102 34061 31441 38927 31233
+16k 41954 37167 38934 37236 32699 38814 38927 29288 30277 32833 29604 27138 31441 29195 27249
\ No newline at end of file