Class 9 Tues., Feb. 21
We were looking at an inventory example that uses a Map from a String name to an ItemInfo.
"pencil" -> ("pencil", 120, 42) 120 pencils in bin 42
"tape" -> ("tape", 44, 11) 44 rolls of tape in bin 11
…
public class ItemInfo {
private String name;
private int quantity;
private int binNumber;
//Constructor, getters, setters…
}
Map<String,ItemInfo> inventory = new TreeMap<String, ItemInfo>();
ItemInfo item = new ItemInfo("pencil", 120, 42);
inventory.put("pencil", item);
…
later:
ItemInfo item = inventory.get("pencil");
System.out.println("we have " + item.getQuantity() + " " + item.getName() + "s");
Now we want to find all the information in the Map.
Look at Map interface, pg 237, for way to scan through Map.
see Set<KeyType> keySet();
Set<String> keys = inventory.keySet();
for (String k : keys)
    System.out.println("key = " + k + ", val = " + inventory.get(k));
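The fragments so far can be put together into one small runnable sketch. The notes elide the ItemInfo constructor and getters, so the ones below are assumptions, as are the toString method and the class name InventoryDemo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Assumed shape of ItemInfo: the notes elide the constructor and getters.
class ItemInfo {
    private String name;
    private int quantity;
    private int binNumber;

    ItemInfo(String name, int quantity, int binNumber) {
        this.name = name;
        this.quantity = quantity;
        this.binNumber = binNumber;
    }

    String getName() { return name; }
    int getQuantity() { return quantity; }
    int getBinNumber() { return binNumber; }

    // Assumed: a toString so that printing a value shows its fields.
    @Override
    public String toString() {
        return "(" + name + ", " + quantity + ", " + binNumber + ")";
    }
}

class InventoryDemo {
    // Builds the two-item inventory from the example.
    static Map<String, ItemInfo> buildInventory() {
        Map<String, ItemInfo> inventory = new TreeMap<String, ItemInfo>();
        inventory.put("pencil", new ItemInfo("pencil", 120, 42));
        inventory.put("tape", new ItemInfo("tape", 44, 11));
        return inventory;
    }

    // Scans the whole Map by way of keySet(), one line per entry.
    static List<String> report(Map<String, ItemInfo> inventory) {
        List<String> lines = new ArrayList<String>();
        for (String k : inventory.keySet()) {
            lines.add("key = " + k + ", val = " + inventory.get(k));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, ItemInfo> inventory = buildInventory();
        ItemInfo item = inventory.get("pencil");
        System.out.println("we have " + item.getQuantity() + " " + item.getName() + "s");
        for (String line : report(inventory)) {
            System.out.println(line);
        }
    }
}
```

Without a toString in ItemInfo, the val part would print as the default Object form (class name plus hash code), which is why the sketch adds one.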
Look at Map interface in general, pg 237
Drop "extends Serializable" here, and anywhere else you see it in Chap. 6.
containsKey, get, and remove actually take Object in the JDK, like Collection, when equals is involved.
This means the compiler can't catch our mistake if we try to get an Integer key from a Map of String to whatever. It doesn't stop us from doing anything that's useful.
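A small sketch of that mistake, to see what happens at run time. The class name WrongKeyTypeDemo and the quantities map are made up for illustration. Both lookups compile; the HashMap quietly finds nothing, while the TreeMap fails when it tries to compare the Integer with its String keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class WrongKeyTypeDemo {
    // Looks up an Integer in a HashMap keyed by String: compiles, finds nothing.
    static Integer lookUpInHashMap() {
        Map<String, Integer> quantities = new HashMap<String, Integer>();
        quantities.put("pencil", 120);
        return quantities.get(42);   // get(Object) accepts any type; result is null
    }

    // The same mistake against a TreeMap fails at run time, not compile time,
    // because the TreeMap tries to compare the Integer with its String keys.
    static boolean treeMapThrows() {
        Map<String, Integer> quantities = new TreeMap<String, Integer>();
        quantities.put("pencil", 120);
        try {
            quantities.get(42);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap lookup: " + lookUpInHashMap());
        System.out.println("TreeMap threw ClassCastException: " + treeMapThrows());
    }
}
```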
We have HashMap and TreeMap, both easy to use with String, Integer, etc. keys. Advantage of TreeMap is order by key. HashMap is a little faster.
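To see the ordering difference, here is a minimal sketch that puts the same keys into both kinds of Map and lists them in iteration order. The class name MapOrderDemo and the "eraser" entry are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {
    // Adds three keys, deliberately out of order, and returns them
    // in whatever order the Map's keySet() hands them back.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        map.put("tape", 44);
        map.put("pencil", 120);
        map.put("eraser", 7);     // hypothetical extra item
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // TreeMap: always sorted by key.
        System.out.println(keysInIterationOrder(new TreeMap<String, Integer>()));
        // HashMap: same three keys, but the order is not something to rely on.
        System.out.println(keysInIterationOrder(new HashMap<String, Integer>()));
    }
}
```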
Intro to PA2
Analyzing English text, including alice.txt, Alice in Wonderland, by Lewis Carroll, and tom.txt, Tom Sawyer, by Mark Twain. First example is
test1.txt:
See Spot run.
See Sally run.
Run, Spot, run.
Run fast!
We want to tokenize this into tokens like this:
See Spot run . See Sally run . Run Spot run . Run fast .
Then make them lowercase, like this, so run and Run count the same.
see spot run . see sally run . run spot run . run fast .
Then for each word, keep track of its followers in the same sentence:
see->spot
spot->run
see->sally
sally->run
run->spot
spot->run
run->fast
Then for each word, we can find the most common follower:
see: spot
spot: run
sally: run
run: spot
fast:
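One way to sketch the follower bookkeeping is a Map from each word to a Map of follower counts. This is only an illustration, not the PA2 design: the class and method names are made up, the input is assumed to be the lowercase token stream with "." marking each sentence end, and ties are broken in favor of the follower seen first (which is what reproduces see: spot and run: spot above).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class FollowerDemo {
    // Counts, for each word, how often each other word follows it within a sentence.
    // Tokens are assumed to be lowercase already, with "." marking a sentence end.
    static Map<String, Map<String, Integer>> countFollowers(String tokens) {
        Map<String, Map<String, Integer>> followers =
                new TreeMap<String, Map<String, Integer>>();
        String previous = null;                 // null at the start of each sentence
        for (String token : tokens.trim().split("\\s+")) {
            if (token.equals(".")) {
                previous = null;                // followers do not cross sentences
                continue;
            }
            if (!followers.containsKey(token)) {
                // LinkedHashMap remembers the order in which followers were first seen.
                followers.put(token, new LinkedHashMap<String, Integer>());
            }
            if (previous != null) {
                Map<String, Integer> counts = followers.get(previous);
                Integer old = counts.get(token);
                counts.put(token, old == null ? 1 : old + 1);
            }
            previous = token;
        }
        return followers;
    }

    // Returns the most common follower, or "" if the word has none.
    // Ties go to the follower seen first (an assumption of this sketch).
    static String mostCommonFollower(Map<String, Integer> counts) {
        String best = "";
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String tokens = "see spot run . see sally run . run spot run . run fast .";
        Map<String, Map<String, Integer>> followers = countFollowers(tokens);
        for (String word : followers.keySet()) {   // TreeMap: words print in key order
            System.out.println(word + ": " + mostCommonFollower(followers.get(word)));
        }
    }
}
```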
Random sentence: take random first word, use followers, at most 5 words
spot run spot run spot
see spot run spot run.
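The sentence generation can be sketched as a short loop over the most-common-follower table. The table is typed in by hand here to keep the example self-contained, and the names RandomSentenceDemo, sentenceFrom, and randomSentence are assumptions, not part of PA2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class RandomSentenceDemo {
    static final int MAX_WORDS = 5;

    // The most-common-follower table from the example; "" means no follower.
    static Map<String, String> exampleTable() {
        Map<String, String> next = new TreeMap<String, String>();
        next.put("see", "spot");
        next.put("spot", "run");
        next.put("sally", "run");
        next.put("run", "spot");
        next.put("fast", "");
        return next;
    }

    // Starts at firstWord and follows the table until MAX_WORDS or a dead end.
    static List<String> sentenceFrom(String firstWord, Map<String, String> next) {
        List<String> words = new ArrayList<String>();
        String word = firstWord;
        while (words.size() < MAX_WORDS && word != null && !word.isEmpty()) {
            words.add(word);
            word = next.get(word);
        }
        return words;
    }

    // Picks the first word at random from the table's keys.
    static List<String> randomSentence(Map<String, String> next, Random rng) {
        List<String> keys = new ArrayList<String>(next.keySet());
        String first = keys.get(rng.nextInt(keys.size()));
        return sentenceFrom(first, next);
    }

    public static void main(String[] args) {
        Map<String, String> table = exampleTable();
        System.out.println(String.join(" ", sentenceFrom("spot", table)));
        System.out.println(String.join(" ", sentenceFrom("see", table)));
        System.out.println(String.join(" ", randomSentence(table, new Random())));
    }
}
```

Starting from "fast" gives a one-word sentence, since fast has no followers.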
Spot
run. Run fast!
Gives us an idea for sentences.
Sentences end in periods, question marks, or exclamation points, followed by whitespace. That ending is our sentence delimiter.
First basics with ordinary letters. These are regular expressions, also studied in CS240 for use with grep, UNIX search
abc means match "abc"; cba means match "cba". Order counts.
a|b|c means match a or b or c, same as c|b|a
[abc] also means a|b|c, or [cba]
[a-c] another way to say it
[a-zA-Z] matches any letter
[^abc] is any char other than a or b or c
[^A-Za-z] is any char other than a letter
a* is any number of a's, including none at all
a+ is one or more a's
a? is an optional single a
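These rules can be tried out directly in Java with String.matches. One difference from grep to keep in mind: matches succeeds only if the whole string fits the pattern, while grep looks for a match anywhere in the line. The class name RegexBasicsDemo and the helper whole are just for this sketch.

```java
class RegexBasicsDemo {
    // True if the whole input matches the pattern
    // (String.matches anchors the pattern at both ends).
    static boolean whole(String pattern, String input) {
        return input.matches(pattern);
    }

    public static void main(String[] args) {
        System.out.println(whole("abc", "abc"));       // true
        System.out.println(whole("abc", "cba"));       // false: order counts
        System.out.println(whole("a|b|c", "b"));       // true
        System.out.println(whole("[abc]", "b"));       // true
        System.out.println(whole("[a-c]", "d"));       // false
        System.out.println(whole("[a-zA-Z]", "Q"));    // true
        System.out.println(whole("[^abc]", "d"));      // true
        System.out.println(whole("[^A-Za-z]", "7"));   // true
        System.out.println(whole("a*", ""));           // true: none at all is OK
        System.out.println(whole("a+", ""));           // false: needs at least one
        System.out.println(whole("a?", "aa"));         // false: at most one
    }
}
```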
Whitespace char: \s. We need to write \\s in a String constant, because \ is a special mark for String constants.
So the delimiter for sentences is set by useDelimiter("[.?!]\\s*"); Use the star to gobble up extra whitespace we don't want to see.
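Here is a minimal sketch of sentence splitting with that delimiter, run on the test1.txt text. It reads from a String rather than a file to stay self-contained; the class name SentenceSplitDemo and the method sentences are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceSplitDemo {
    // Splits text into sentences: each one ends at . ? or !
    // plus any whitespace (including newlines) that follows.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        scanner.useDelimiter("[.?!]\\s*");
        while (scanner.hasNext()) {
            result.add(scanner.next());
        }
        scanner.close();
        return result;
    }

    public static void main(String[] args) {
        String text = "See Spot run.\nSee Sally run.\nRun, Spot, run.\nRun fast!\n";
        for (String s : sentences(text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The commas inside "Run, Spot, run" are still there at this stage; they go away when the sentence is tokenized into words.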
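Then we can tokenize a sentence into words. A word is terminated by a run of one or more non-letters. Use + here rather than *, so the delimiter can never match the empty string (an empty match makes Scanner split between every pair of letters):
useDelimiter("[^A-Za-z]+");
But this makes can't into two words. Also see-saw. Can we fix this? Add ' and - to the letter-chars. But - has a special meaning inside [], so we need to escape it with \, written \\ in a String constant:
useDelimiter("[^A-Za-z'\\-]+");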
OK, now can't is an OK word. This does allow 'xxx' as a word. If you want to fix this, it's possible via a fancier pattern.
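A short sketch of the word tokenizer, with ' and - counted as word characters. The delimiter is written with + so that it never matches the empty string; with an empty match, Scanner would hand back the letters one at a time. The class name WordSplitDemo and the method words are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordSplitDemo {
    // Splits one sentence into words. Anything that is not a letter,
    // an apostrophe, or a hyphen counts as part of a delimiter.
    static List<String> words(String sentence) {
        List<String> result = new ArrayList<String>();
        Scanner scanner = new Scanner(sentence);
        scanner.useDelimiter("[^A-Za-z'\\-]+");
        while (scanner.hasNext()) {
            result.add(scanner.next());
        }
        scanner.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Run, Spot, run"));
        System.out.println(words("I can't see the see-saw"));
        System.out.println(words("'xxx' is allowed too"));
    }
}
```

Lowercasing each word afterwards, so that run and Run count the same, is a separate step.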