Post 2 : Reflecting on projects : Chain

Post start : 12/9/24

Chain is a janky “network utility” for loading webpages inside other webpages. In actuality, Chain is a tool I made out of boredom to get around my high school's firewall so I could dick around on “unauthorized” websites, mostly browsing the Internet Archive, reading the SCP wiki, and doing anything but schoolwork.

The original idea came about in 8th grade when I was screwing around with Google Sheets, more specifically XPath, which I was using to automatically pull data from websites and fill out spreadsheets for me, as I hate busywork and blitzing through a week-long project greatly appealed to me. Out of curiosity I put in Reddit, expecting to get junk data back from my school's “unauthorized website” error page. To my surprise I got junk data from Reddit itself, seemingly bypassing the firewall entirely. I chalked this up to firewall jank and moved on with my day.
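For anyone who hasn't abused spreadsheets this way: Google Sheets exposes this through its IMPORTXML function, which takes a URL and an XPath query and dumps whatever matches into your cells. A minimal sketch (the URL and query here are just placeholders, not what I actually used):

    =IMPORTXML("https://old.reddit.com", "//title")

The key detail is that Google's servers make the request, not your machine, which is why it punched straight through the school network.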

In 9th grade I returned to spreadsheet shenanigans, making a tool to pull data from scummy “student talk sites”. In actuality these sites were some weird collective of students ripping questions out of whatever mass-produced quiz their teachers were giving them and going directly to Google to see what the collective answer was. Some group had set up a bunch of sites that would collect the answers and then paywall access to them, but luckily for anyone who knows their way around a webpage, the answers were sitting right there in plaintext, just not being displayed, so a simple web scraper was all that was needed to grab every answer I could need.

For context, this was at the very start of the COVID-19 pandemic and we had just been sent home, so the entire school system was in a panic after realizing that we weren't going to be coming back to school any time soon, and they had nowhere near enough online infrastructure to facilitate “remote learning”. So they went with the first options they could get: shitty Zoom meetings and, most notably, a generic load of tests that were outsourced to some nameless education company.

These tests being so widespread, the answers to them were widely available on the aforementioned “scummy student talk sites”, so all I had to do was strap a bunch of XPath searches together in a spreadsheet and I had a quiz answer generator, quite literally a robot that does your homework. Success rates on actually getting the answer correct were around 90%, so all in all a major success.
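The whole generator boiled down to formulas along these lines; the site, URL shape, and XPath here are hypothetical stand-ins, since I no longer remember the real ones. Cell A2 holds the pasted question text:

    =IMPORTXML("https://answers.example/search?q=" & ENCODEURL(A2), "//div[@class='answer-text']")

Paste in a question, and the hidden-but-present answer text pops out a cell over.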

And that brings me to Chain. In 11th grade I was fed up. I was fed up with annoying teachers, I was fed up with the school system at large, I was fed up with the shitty Chromebooks, and most of all, I was fed up with the firewall. Chain started as a side project in my computer science class, in between learning Java and annoying my asshole of a teacher by completing assignments in the worst possible way, including multithreading things that should not have been multithreaded and a liberal use of java.reflect; after all, if it's a library that ships natively with the language, it is surely fair game and intended for me to use to ignore “private” variables.

Chain started life on one of those online HTML editors that let you build tiny little pseudo-websites to demo CSS, learn JavaScript, etc. I was working under the assumption that the spreadsheet had obtained its information by having Google's servers look it up. While correct, it wasn't going to be that simple.

The initial approach was to just try and load everything in an iframe: take an iframe, point it at a URL, and bingo. However, this approach runs into two issues. Firstly, my computer was now the one requesting the website, which the firewall would block. Secondly, this approach would be directly in violation of SOP. SOP stands for Same-Origin Policy, and it dates back to Netscape 2.0 and the introduction of JavaScript. The idea behind SOP, to my understanding, is to restrict scripts from accessing anything outside the domain of the current website.
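A quick sketch of that idea, and where it dies (the target URL is just an example; on the school network the frame wouldn't even load, and even off it, SOP slams the door the moment you touch the result):

    // Naive attempt: stuff the target page into an iframe.
    const frame = document.createElement('iframe');
    frame.src = 'https://en.wikipedia.org/wiki/Main_Page'; // example target
    document.body.appendChild(frame);

    frame.onload = () => {
      try {
        // Cross-origin, so the browser hands back null or throws a
        // SecurityError here instead of the page's document.
        console.log(frame.contentDocument.title);
      } catch (err) {
        console.error('SOP says no:', err);
      }
    };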

Luckily, both of these problems could be solved by asking someone else to look up the website I wanted and send it back to me; that's what I had seen on the Google spreadsheet. I would send Google the website and the query, it would do all the looking up and return the result to me. The service I ended up using was AllOrigins, a free service that lets you send it a request for a website; it pulls the data and returns it to you. That's exactly what we needed.

This took about a month of work, and at this point I had something kind of working. It could load basic websites, but another problem arose: most web pages are more than a single HTML file, and they link to external scripts, images, and stylesheets that were not being loaded. So after I received the file back from AllOrigins, I would need to find every URL or URI linking to an external file and send those to AllOrigins too, so it could send me back all the external files. Then, with the main file and all the external files in hand, I had to change every link pertaining to an external file to point at an internal blob file instead.
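Stripped down to its core, that pipeline looks something like the sketch below. I'm assuming AllOrigins' raw endpoint here, and only treating img, script, and stylesheet link tags as “external files”; the real Chain has to chase down a lot more than that, and my original 200 lines were certainly not organized this neatly:

    // A stripped-down sketch of the Chain pipeline.
    const PROXY = 'https://api.allorigins.win/raw?url=';

    // Ask AllOrigins to fetch a URL on our behalf, dodging firewall and SOP.
    async function proxyFetch(url) {
      const res = await fetch(PROXY + encodeURIComponent(url));
      return res.blob();
    }

    async function loadPage(pageUrl) {
      // 1. Pull the main HTML file through the proxy and parse it.
      const html = await (await proxyFetch(pageUrl)).text();
      const doc = new DOMParser().parseFromString(html, 'text/html');

      // 2. Find every element linking out to an external file.
      const linked = doc.querySelectorAll(
        'img[src], script[src], link[rel="stylesheet"][href]');

      for (const el of linked) {
        const attr = el.hasAttribute('src') ? 'src' : 'href';
        // Resolve relative links against the page being loaded, not our origin.
        const absolute = new URL(el.getAttribute(attr), pageUrl).href;

        // 3. Pull each external file through the proxy as well...
        const blob = await proxyFetch(absolute);

        // 4. ...and rewrite the link to point at an internal blob file.
        el.setAttribute(attr, URL.createObjectURL(blob));
      }

      // 5. Hand the rewritten page to an iframe via one last blob URL.
      const page = new Blob([doc.documentElement.outerHTML], { type: 'text/html' });
      document.querySelector('iframe').src = URL.createObjectURL(page);
    }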

These were not easy feats to accomplish, much less by someone who, prior to trying this, had very limited experience with web development and JavaScript. But the results speak for themselves: I had a way of loading most websites, even from behind the horrid firewall in the locked-down web browser of a shitty Chromebook. Not bad for 200 lines of JavaScript. This is by far the hardest project I have ever had the pleasure of working on. I was able to dig up the website I used to work on it, and the project itself, if you wish to see its original iteration here, or, using the power of an actual webhost, a slightly nicer-looking one here.

All in all, it was a very informative and useful project/tool, and one of my proudest moments as a programmer.

Post end : 12/9/24