Friday, December 6, 2013

Last Week of Coding

Map Reduce

I did not quite understand this part of the lecture,but although it was just a brief walk through, I still see the power of its algorithm.

I thought this was a good picture that taught me the basics of it, and I hope it would help others who also have doubts on it.


An example given was that the Google search engine uses a lot of MapReduce to lower their runtime costs. I can imagine the work of processing so much data. The basic steps taken are:

1. Read the raw data from the storage and pass it onto the next step.
2. Filter the data through a map function that returns a collection of pairs of names and values. If the data is to big, then it should be separated into sub-sets of data and be ran on parallel map functions onto the next step.
3. Shuffle these pairs of names and values together and group those data that have similar charisteristics.
4. For each group, run them through a reduction function that returns collections of values
5. store the final sets of values for use.

Tuesday, November 26, 2013

Week 11 of Coding

Assignment 2

This in fact was not a really hard assignment, it only has very straightforward for loops and recursions. So I went on the web and looked over more background information about regular expressions.

Firstly, this complex syntax is used to check whether a given string matches or contains a given pattern. But it is not this simple, it has a flexible system that allows check upon a pattern not just fixed characters. Like in the assignment, we could match a symbol to multiple number of spaces.

It consists of various symbols and patterns,
a, X, 9
ordinary characters just match themselves exactly. 

. ^ $ * + ? { [ ]  | ( ) 
meta-characters with special meanings (see below)
 
. (a period)
matches any single character except newline 'n'

w    
matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. 
It only matches a single character not a whole word.

W 
matches any non-word character.

w+
matches one or more words / characters

b 
boundary between word and non-word

s 
matches a single whitespace character, space, newline, return, tab, form

S 
matches any non-whitespace character.

t, n, r 
tab, newline, return

D
matches anything but a digit

d 
matches a decimal digit [0-9]

d{1,5}
matches a digit between 1 and 5 in lengths. 

{n} d{5}
matches for 5 digits in a row

^
match the start of the string

$
match the of the string end

*
matches 0 or more repetitions

?
matches 0 or 1 characters of whatever precedes it

 
use . to match a period or  to match a slash. 
With these 'rules' I believe they can compare any complex syntax in an efficient way.

The regular expressions also have some methods such as search, sub, compile.

Search is just for searching a regex pattern.

sub replaces occurences of certain sub-strings.

compile in my understanding is putting patterns together to build an object that is necessary for searching or matching.

Friday, November 22, 2013

Week 10 of Coding

MIDTERM 2

it was a really frustrating experience writing this test. I expected everything from the second half of our syllabus like binary search tree, pre-order, in-order traversal, and the big-Oh especially.

```````````````
Note: tree traversal is a systematic way of going through each node of the binary tree. Usually used for searching a node.

Pre-order traversal

1.root
2.left child
3.right child

Post-order traversal
1.left
2.right
3.root

In-order traversal
1. left
2. root
3. right
```````````````

Big-Oh is basically just a count of how much work to go through for each algorithm.
If an algorithm counts how many numbers are in a list, this is O(n), because it just counts n numbers.
An algorithm that looks for an x in list A that is also in list B takes O(n squared). because for each item in A going through, One has to look for at most n times in list B that matches.
```````````````

Anyhow, the test was mostly about recursions which I am worst at. I took the first 15 min thinking about what I can grab from mid-air. I don't think I've received a good mark on it, but I do know where to work on for my final now. The good thing was I was not completely blank about the questions, but it just took me longer than usual to work them out.

Wednesday, November 13, 2013

Week 9 of Coding

There are many ways to sort a piece of data according to its complexity and desired organization. I'm not talking about the sort method that is build in python. There are in fact complex algorithms coded just for sorting.

```````````````
Bubble Sort

numbers with index: 0 - 1- 2- 3- 4 - 5

1.compare every pair of numbers in turn ( 0 and 1, 1 and 2)
2. switch around if bigger one in the front
3. the biggest number would end up at the back after a number of switches.
4. redo the comparison from the beginning.

this takes huge amount of time, because it requires going over n comparisons for n numbers in the list and then n rounds in total.

```````````````
Selection Sort

this is better than Bubble sort in that it only makes one 'comparison' each round.

1. find the biggest number of the list.
2. place at the back
3. repeat

```````````````
Insertion Sort

Troublesome as well, but works in a new way from the two above.

numbers with index: 0 - 1- 2- 3- 4 - 5

1. assume number at index 0 is sorted
2. compare index 1 to 0, switch if smaller one at the back
3. assume index 0 and 1 are sorted sub-list
4. compare index 2 to each number in sub-list and make switches.
5. now index 0, 1, and 2 are the sorted sub-list
6. repeat

This is like bubble sort, it make n comparison at most for the n numbers in list, and then repeat for n rounds.

```````````````
Merge Sort

I always thought that this is a hard concept. Because it is much different from any other sorting method.

1. split the list in half
2. split each sub-list in half (until you cannot split anymore)
3. sort the numbers in each sub-list
4. merge with the other sub-list when splitting on the upper level before. Sort it
5. continue to merge all sub-lists

Wednesday, November 6, 2013

Week 8 of Coding

Now comes my favorite part of this class, the binary tree. This is a data structure that builds upon linked lists. Simply, linked lists are one way with one pointer, like a chain. On the other hand, binary tree is more complicated with at most two branches under each data. One can even say that linked lists are binary trees but with one branch all the way. Of course there might be other tree structures with more than two branches, but what we are learning now is only the basic type.

```````````````
We still start with a 'linked list'-like structure, only with more parameters. Data is still the value we are looking at, left and right are simply named for the two branches the data connects to. Like a family tree, the data on top is a parent, and it has left children and right children.

class Node:
      def __init__(self, data, left, right):
           self.data = data
           self.left = left
           self.right = right

Of course there has to be a tree top in the whole binary tree to start with, we call it the root.

class BinaryTree(Node):
     def__init__(self):
          self.root = None

```````````````
Binary Tree has a much better use on data with multiple branching factors. A linked list may be a string of numbers. The Binary Tree would be able to give a tree of numbers differentiating odd and even, big and small. Normally the tree data is in a certain order so when we look for a particular piece of information, we know to look into the left branch or the right. (suppose left is for data smaller than value X, right is for data bigger than value X).

Inserting a node would not be hard in this sorted structure.
```````````````
inserting an object into the tree

def insert(self, root, object):
     if root == None:
          root = Node(object)
     else:
          if(object <= root.data):
               root.left = self.insert(root.left, object)
          else:
               root.right = self.insert(root.right, object)
 
the first if clause is to place the object where should be placed.


Wednesday, October 30, 2013

Week 7 of Coding

This week was all about linked lists. This is a basic data structure that connects information together. It is very simple and only requires two components to start with: the current data and the data it wants to link to.

``````````````
class Node:
   def __init__(self, data, next):
   self.data = data
   self.next = next
``````````````

One simply may use the algorithm to organize an entire list of data with the sorting method one would like. This is convenient for huge amount of information that are random and troublesome to look into. Also, this huge amount of info may only have little diversity, otherwise after converting into linked lists, the information given would still be random anyways. For example if there is a list of numbers, we want to link each number with the next in non-decreasing value. So the structure would look something like: 1-2-3-4-5-6....

``````````````
1. look into this number_lsit and use the method 'sort' to put in desired order.
2. set current = number_list[0]
3. set next = number_list[1]
4. then number_linked_list = Node(current, next)

this would be the basic structure, all is left is to add a for loop, that for each pair of numbers on index 0&1, 1&2 etc., this linked list adds the values after.
``````````````

Saturday, October 19, 2013

Week 6 of Coding

For the entire time I've been preparing for the term test that happened on Wednesday. It was relatively easier than I thought, considering how much trouble I went through working on the past tests. Things like image pixels and page long test cases which I thought were going to appear did not. 



Anyhow, hard work did not go wasted. Two of the questions on the test were almost exactly the same to some of those I went through! One of them was the tedious recursions. Although I might have figured it out anyways, but without the direct solution I would've spent the entire hour working on this problem. I hope no one calls me cheating.. The other was just some simple test cases, I was surprised how it only needed descriptions. Google-ing for explanations of set-up and tear-down functions and memorizing the base format was not pleasant, I actually wanted them on the test.

sidenote: the technique to not lose yourself in a recursion problem is:
  1. figure out when you want to come out of this loop.(when a value is ###, when you come to the last letter of a string, or finished browsing all occurences of ***)This is the base case.
  2. write different possibilities that occur in this loop.(like those horoscope tests that direct you to a type of personality after you choose the choices that best fit you)
  3. ALWAYS TO REMEMBER:in most cases, you have a value or a string or something recorded. You must have it added repetitively when you go through each recursion, or the total result gets lost. It might only record the value from the last loop. The trick is: value += or -= recursion loop.
One interesting question I enjoyed on the test was the first one. I am sure it messed up a lot of us. I feel lucky that I went through a question on stack-overflow.(people ask about computer coding questions on here)

'..... (min, max) for i in range(2:)....'
previously I thought you could only return one single term 'for i in range..', now I know that it would do for set of any number of terms. As long as possible. This was the key to the question.

Then we just had to calculate all the numbers, substitute into the pair of terms, and compare each pair of values....hold on! When we compare a pair of values. e.g. (2,3) and (4,1). How do we tell which one is bigger! Well 2 is smaller than 4 but 3 is bigger than 1, and put it together... my own strategy was just to go with my instincts. Python is not a human-being. It would go through the first variable of each, and it is 2 and 4. So ta-da (4,1) is bigger. No matter it was right or wrong, I would do everything this way to make sure I don't waste too much time one those one-mark questions.(Now I see why our intelligent prof put this as the first question.)

I do think that I've received a decent grade, however I still don't know whereabout to check it.