Work on Bloom Filter
I wrote a lot of code over the last few days and not much prose.
Right now I'm running this command.
haruhi run job -clusterId tinyAwsCluster -jarId telepath inBloomFilter -input s3n://wikimedia-summary/monthlyAll/2008-01/part-r-00000.gz -bloomFilter s3n://wikimedia-summary/test/firstBloom -k 7 -output s3n://wikimedia-summary/test/firstBloomOut
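The post doesn't spell out the job's flags. Assuming `-bloomFilter` points at a serialized filter and `-k 7` is the usual Bloom filter parameter k (the number of hash functions), the membership test a mapper like `InBloomFilterMapper` would apply can be sketched as follows. This is a minimal in-memory filter, not telepath's code. The class name, the bit-array size, the example key `Main_Page`, and the double-hashing scheme (position i = h1 + i·h2 mod m) are all illustrative choices. Hadoop itself ships a Writable implementation in `org.apache.hadoop.util.bloom`.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch: m bits, k hash positions derived by double
// hashing (h1 + i*h2). Illustrative only; not telepath's implementation.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions (assumed to be what -k sets)

    public SimpleBloomFilter(int m, int k) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Computes the k bit positions for a key from two 32-bit hashes.
    private int[] positions(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = 0x811C9DC5;           // FNV-1a offset basis
        for (byte b : data) {
            h1 ^= (b & 0xFF);
            h1 *= 0x01000193;          // FNV-1a prime
        }
        int h2 = key.hashCode() * 0x9E3779B1 | 1; // forced odd so steps vary
        int[] pos = new int[k];
        for (int i = 0; i < k; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, m); // overflow wraps; floorMod keeps it in [0, m)
        }
        return pos;
    }

    public void add(String key) {
        for (int p : positions(key)) {
            bits.set(p);
        }
    }

    // false means definitely absent; true means probably present.
    public boolean mightContain(String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 7);
        filter.add("Main_Page");
        System.out.println(filter.mightContain("Main_Page")); // true
    }
}
```

A `false` from `mightContain` is definitive; a `true` can be a false positive, which is the trade a Bloom filter makes for a fixed-size bit array.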
I've got a painful 10-minute development cycle because I'm not practicing good TDD, but the questions I'm answering are about what I can get away with in I/O, and until I know those answers, I can't write the unit tests.
I am thinking of patching Haruhi so I can keep an AWS cluster running between flows. That would take us from "8 cents per test" to "8 cents per working hour", which saves money, and it could cut the development cycle to 1-2 minutes.
The current exception is:
java.lang.IllegalArgumentException: Can not create a Path from a null string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
at org.apache.hadoop.fs.Path.<init>(Path.java:90)
at com.ontology2.telepath.bloom.InBloomFilterMapper.setup(InBloomFilterMapper.java:30)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
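`Path.checkPathArg` is rejecting a `null`, so whatever `InBloomFilterMapper.setup` passes to `new Path(...)` at line 30 was null. One likely reading, assuming the mapper reads the filter location from the job configuration: Hadoop's `Configuration.get(String)` returns `null` for a key nobody set, so if the driver never copied `-bloomFilter` into the job, the mapper gets `null` and the failure surfaces far from its cause. A small fail-fast lookup makes the error name the missing key instead. In this sketch a `java.util.Map` stands in for `Configuration`, and the key `telepath.bloomFilter` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fail-fast lookup. A Map stands in for Hadoop's Configuration,
// whose get(String) likewise returns null for a key that was never set.
public class RequiredSetting {

    public static String require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null || value.isEmpty()) {
            // Name the key, rather than letting new Path(null) fail generically.
            throw new IllegalStateException("Job setting '" + key
                + "' is missing; was it copied into the job configuration by the driver?");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("telepath.bloomFilter", "s3n://wikimedia-summary/test/firstBloom");
        System.out.println(require(conf, "telepath.bloomFilter"));
        try {
            require(conf, "telepath.k"); // never set, so this throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```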
The right thing to do is write tests, and we have a precedent for that: SingleJobTool provides a test point.
Since the only interesting thing most Tools do is create a Job object, we can write a unit test that checks that we created the correct job.
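A sketch of that testing pattern, under stated assumptions: SingleJobTool's real API isn't shown here, so `JobSpec`, `createJob`, `parseArgs`, and the `telepath.*` keys are hypothetical names, and a plain `JobSpec` stands in for Hadoop's `Job`. The point is the split: the tool builds the job description without running it, so a test can check in milliseconds, with no cluster, that for example the Bloom filter path made it into the configuration the mappers will read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "tool builds a job description, test inspects it" sketch.
// JobSpec stands in for Hadoop's Job; none of these names are SingleJobTool's API.
public class JobFactorySketch {

    // What a test inspects: the settings the mappers will eventually read.
    public static final class JobSpec {
        public final Map<String, String> conf = new HashMap<>();
        public String input;
        public String output;
    }

    // Turns "-name value" pairs into a map, rejecting malformed input.
    public static Map<String, String> parseArgs(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("-") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected '-name value' at position " + i);
            }
            opts.put(args[i].substring(1), args[i + 1]);
        }
        return opts;
    }

    // Builds, but does not run, the job.
    public static JobSpec createJob(String[] args) {
        Map<String, String> opts = parseArgs(args);
        JobSpec job = new JobSpec();
        job.input = opts.get("input");
        job.output = opts.get("output");
        // If a line like this were missing, a mapper reading the key would get null.
        job.conf.put("telepath.bloomFilter", opts.get("bloomFilter"));
        job.conf.put("telepath.k", opts.get("k"));
        return job;
    }

    public static void main(String[] args) {
        JobSpec job = createJob(new String[] {
            "-input", "s3n://wikimedia-summary/monthlyAll/2008-01/part-r-00000.gz",
            "-bloomFilter", "s3n://wikimedia-summary/test/firstBloom",
            "-k", "7",
            "-output", "s3n://wikimedia-summary/test/firstBloomOut"});
        // The kind of assertion a unit test would make against the built job.
        if (job.conf.get("telepath.bloomFilter") == null) {
            throw new AssertionError("bloom filter path was not copied into the job");
        }
        System.out.println(job.conf);
    }
}
```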
Now I have the error:
2014-01-15 17:51:59,000 WARN org.apache.hadoop.mapred.Child (main): Error running child
java.lang.IllegalArgumentException: This file system object (hdfs://10.194.249.65:9000) does not support access to the request path 's3n://wikimedia-summary/test/firstBloom' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:384)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:798)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1538)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1533)
at com.ontology2.telepath.bloom.InBloomFilterMapper.setup(InBloomFilterMapper.java:35)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Amazingly, it tells me exactly what I did wrong!
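The fix the message suggests, sketched under the assumption that line 35 builds the reader from `FileSystem.get(conf)`: ask the path for its own filesystem, e.g. `FileSystem fs = bloomPath.getFileSystem(conf);` (equivalently `FileSystem.get(bloomPath.toUri(), conf)`), and pass that `fs` to `new SequenceFile.Reader(fs, bloomPath, conf)`, where `bloomPath` is a hypothetical name for the filter's `Path`. `FileSystem.get(conf)` returns the cluster's default filesystem, here `hdfs://10.194.249.65:9000`, which refuses an `s3n://` path. The toy model below shows that dispatch rule in plain Java; it is a simplified illustration, not Hadoop's implementation.

```java
import java.net.URI;

// Simplified, hypothetical model of choosing a filesystem for a path.
// Not Hadoop's code: it only illustrates why a client bound to the default
// filesystem (hdfs://...) rejects an s3n:// path.
public class FileSystemChoice {

    // "scheme://authority" of the filesystem that should serve `path`:
    // the path's own scheme and authority if it has a scheme, else the default.
    public static String filesystemFor(String defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            return defaultFs; // scheme-less path: fall back to the default filesystem
        }
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        return uri.getScheme() + "://" + authority;
    }

    // In the spirit of the failed check: a filesystem bound to one
    // scheme/authority only accepts paths that resolve to it.
    public static boolean canServe(String fsUri, String path) {
        return filesystemFor(fsUri, path).equals(fsUri);
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://10.194.249.65:9000";
        String bloom = "s3n://wikimedia-summary/test/firstBloom";
        // Analogous to FileSystem.get(conf): the default filesystem rejects the path.
        System.out.println(canServe(defaultFs, bloom)); // false
        // Analogous to FileSystem.get(uri, conf): pick the filesystem from the path.
        System.out.println(canServe(filesystemFor(defaultFs, bloom), bloom)); // true
    }
}
```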