After winning 1st place in the last hackathon, I contacted my colleague Michael in the Fall of 2021 for a new AI hackathon SDSU's AI club was hosting. We wouldn't know the prompt until the day of, and we would be given a weekend to build the project. The format would be the same as the last one, we write code and need to present the project at the end to some judges who will judge our following of the prompt and solving of the problem.
The prompt we were given was totally different from the last one. The company sponsoring the event had given the prompt to "write an email spam sorter, given some emails we will provide". I wasn't a fan of this at all compared to our previous prompt, but we decided to go with it anyway. We spent an hour or so going over some of the sample data and formed a plan.
We first needed to load the emails into a dataframe so we could drop whatever content we didn't need. Michael did research on classification strategies while I wrote the parser. Eventually we settled on the Naiive-Bayes approach to classifying emails. We began doing some manual frequency word counters and classifyed certain emails as spam and others as non-spam.
Our final end goal was to come up with some "activation" function that would classify certain emails as spam and others as not spam. We ended up using the sigmoid activation function, mostly because the math worked out pretty well and we had used it in one of our classes. In the end we were pulling up emails randomly and putting them into our bayes classifier and it was fairly reliably picking out spam and not spam.
Most of the other students had found word frequency analysis tools online but failed to adhere well to the prompt of classifying whether something was spam or not and simply reported scores or if certain keywords were hit. Our project ahdered best to the prompt, even if the solution wasn't perfect, and we ended up getting 1st place combined for our efforts. We ended up winning a gift card and a mousepad for our efforts.
My disinterest in the given prompt was clear, but I still wanted to take a shot at the project anyway, and I'm glad we did in the end. If I were to attempt this again today (Almost 1 year later at the time of writing), I would have done this very differently and would have picked a more solid and well-reasoned activation function to separate our bayseian model.