Skip to content

Latest commit

 

History

History
executable file
·
197 lines (155 loc) · 14.5 KB

README.md

File metadata and controls

executable file
·
197 lines (155 loc) · 14.5 KB

Demo URLs:

Project 1: https://youtu.be/dRD2LMxt_2E?si=1sarafHfKaw2HXPM

Project 2: https://youtu.be/N6BvjrIQn2o

Project 3: https://youtu.be/4ZEkeTvs1ow

Project 4: https://youtu.be/WjyvYfRoBrc

Project 5: https://youtu.be/rMztQfZuLXM

Instruction of Deployment

mysql -u root -p
SET GLOBAL general_log = 'ON';
exit
  • Install Tomcat@10
  • Import IntelliJ IDEA
  • Use git to clone this repository locally and import the project to IntelliJ
  • Install Maven
  • Configure Tomcat to run locally on IntelliJ, with the target being fabflix on port 8080. When run, this should deploy the Fabflix site locally.

Contributions:

Albert Wang:

  • Implemented frontend styling with Bootstrap and functionality for the movies page
  • Added sorting, pagination,n (number records per page), and UI updates
  • Adding employee dashboard functionality and casts XML parsing
  • Search page, single movie page for Android
  • Setting up AWS master/slave instances with load balancer

Katrina Mizuo:

  • Worked on the functionality for the single star and single movie pages
  • Added searching and browsing by title and genre
  • mains and actors XML parsing, as well as reCAPTCHA implementation
  • Autocomplete functionality for the results page of Fabflix
  • Movies list page on Android
  • Test plan and time measurements with JMeter

Project 2

LIKE Usage:

We used the LIKE predicate for substring matching with search on title, director, and star for movies in our QueryServlet. This can be seen below:

            String query = "SELECT * from movies, ratings where movies.id = ratings.movieId";

            Map<Integer, String> parameters = new HashMap<>();
            int currentParameter = 1;
            if (!title.isEmpty()) {
                query += " AND movies.title LIKE ?";
                parameters.put(currentParameter, "%" + title + "%");
                currentParameter++;
            }
            if (!year.isEmpty()) {
                query += " AND movies.year = ?";
                parameters.put(currentParameter, year);
                currentParameter++;
            }
            if (!director.isEmpty()) {
                query += " AND movies.director like ?";
                parameters.put(currentParameter, "%" + director + "%");
                currentParameter++;
            }
            if (!star.isEmpty()) {
                query += " AND movies.id IN (SELECT m2.id FROM movies m2, stars_in_movies sim, stars s2 " +
                        "WHERE m2.id = sim.movieId AND s2.id = sim.starId AND s2.name like ?)";
                parameters.put(currentParameter, "%" + star + "%");
            }

            PreparedStatement statement = conn.prepareStatement(query);

We also used LIKE for browsing by title in the QueryServlet:

SELECT * FROM movies, ratings WHERE movies.id = ratings.movieId AND movies.title LIKE '%s';

Project 3

PreparedStatement Files:

CartServlet.java

LoginServlet.java

MoviesServlet.java

PaymentServlet.java

QueryServlet.java

ReceiptServlet.java

SingleMovieServlet.java

SingleStarServlet.java

MetadataServlet.java

StarsServlet.java

MainParser.java

ActorParser.java

CastParser.java

XML Parsing

mains243.xml

For mains243.xml parsing, we decided that genres would be based on the abbreviations provided by the XML website. We did this by using a HashMap with abbreviation: genre name, but we did modify some names from the site if they were already existing in the database. We converted all abbreviations to lowercase to compare in the dict before converting to the full name. For example, the Stanford site mentions "Romt" -> "romantic", but we mapped "romt" -> "Romance" instead because Romance already exists as a genre in the database.

However, if a genre did not match any of the mentioned valid abbreviations' formats, then we would write its incorrect value to mainerrors.txt.

For inserting values with strings, backslashes () and double quotes (", removed specifically from movie titles) were removed to avoid breaking the insert strings.

actors63.xml

For actors63.xml parsing, we decided that stars would be uniquely identified by their stagename only, since that was how they were linked to movies in the cast XML. A HashSet of Strings (star names) was used to keep track of only unique star names.

For dates of birth, any + at the end of a string would be stripped off to extract the year. Additionally, any blank or "n.a." years go as null, and are printed as "N/A" for the star. Otherwise, if the year is not numerical, it would be invalid. These errors are found in actorerrors.txt.

casts124.xml

For casts124.xml parsing, we decided to add any star we found that was not already in the database. We used their stagename as a unique identifier and for their name in the database. We also decided not to add movies found in the XML file that were not already in the database. These movies are written to the file casterrors.txt.

Optimization 1: HashMap and HashSet

For our first optimization method, we used HashMaps and HashSets to hold the results of querying the database before inserting new data. Since the accessing runtime of both are O(1), this would allow us to quickly check for duplicate elements before inserting, instead of having to requery the database every time we wanted to check if we were inserting a new element.

Optimization 2: Generating SQL Statements Into One File

For our second optimization method, we generated SQL INSERT statements as we found new elements to insert into a file, with BEGIN; at the top and COMMIT; at the bottom. This meant all the SQL statements were executed together as one transaction, which was much quicker than making and executing one statement for every single new item.

Project 5

  • Connection Pooling

    • Configuration

      Connection pooling is configured in context.xml with the following line of code:
    url="jdbc:mysql://localhost:3306/moviedb?autoReconnect=true&amp;allowPublicKeyRetrieval=true&amp;useSSL=false&amp;cachePrepStmts=true"
    • The files that use Connection Pooling are here.

    • Explain how Connection Pooling is utilized in the Fabflix code.

      Originally in the Fabflix code, every servlet would create a new Connection to execute PreparedStatements with moviedb. Connection Pooling is utilized in the Fabflix code by being configured in context.xml, which creates a set number of cached connections to moviedb. Then, instead of creating a new Connection every time a servlet is called, each servlet call just grabs an available Connection from the pool and returns it after (by "closing" it). This saves time because the same pool of connections can be reused across servlet calls, rather than wasting time on opening and closing new connections.

    • Explain how Connection Pooling works with two backend SQL.

      With two SQL databases in the backend (or just more than one), Connection Pooling works similarly to described above - a set of connections to either database1 or database2 are defined, and Connections are grabbed and returned to the pool as they are used by servlets. Again, this saves time because a completely new Connection does not have to be opened and closed for every servlet call.

  • Master/Slave

    • Include the filename/path of all code/configuration files in GitHub of routing queries to Master/Slave SQL.

    • PaymentServlet.java

    • StarsServlet.java

    • MoviesServlet.java

    • How read/write requests were routed to Master/Slave SQL?

      When setting up database connections for the Slave instance, its MySQL connections point to the Master's IP address. On the other hand, the Master's MySQL uses its own MySQL setup. This means that Read/Write requests can be sent to the Master, while only Read requests are sent to the Slave (Write requests are redirected to the Master).

  • JMeter TS/TJ Time Logs

    • Instructions of how to use the log_processing.py script to process the JMeter logs.

      The log files for these results are located in logs/. The log files' format has each line as TS TJ. To run the log processing file, run python3 log_processing.py from within the project directory. This may have to be run with sudo, depending on permissions.
  • JMeter TS/TJ Time Measurement Report

Single-instance Version Test Plan Graph Results Screenshot Average Query Time(ms) Average Search Servlet Time(ms) Average JDBC Time(ms) Analysis
Case 1: HTTP/1 thread 86 5.81 5.78 It looks like the average search servlet and JDBC times are very similar, especially since there isn't much difference between their functionality here. However, the average query time is much higher - potentially because the query time accounts for sending the query and receiving a response, not just the work the servlet does.
Case 2: HTTP/10 threads 92 9.81 9.79 The average search servlet and JDBC times were higher than with just one thread, most likely because the throughput was higher than before (green on graph).
Case 3: HTTPS/10 threads 91 9.92 9.90 The times here didn't change too much from 10 threads with HTTP, which was weird because we thought HTTPS would make it slower.
Case 4: HTTP/10 threads/No connection pooling 99 11.94 11.92 The times here were slightly higher than HTTP with 10 threads (connection pooling enabled), but this case had lower throughput with its higher average times. We think the lack of connection pooling might have slowed down the servlet's operations with the database (making new connections).
Scaled Version Test Plan Graph Results Screenshot Average Query Time(ms) Average Search Servlet Time(ms) Average JDBC Time(ms) Analysis
Case 5: HTTP/1 thread 89 6.70 6.64 Similarly to the HTTP 1 thread (Case 1 on single instance), the average search servlet and average JDBC times were very similar. We think these times may have been faster than the single instance counterpart because load balancing might have made it slightly more efficient.
Case 6: HTTP/10 threads 93 10.11 10.08 This case has an average time that's slower than the single thread case, which could be because there are 10 times the amount of requests being made. With a much higher throughput, we think the serlvet would be slower handling so much more requests.
Case 7: HTTP/10 threads/No connection pooling 99 13.08 13.05 With 10 threads and no connection pooling, the average time was slower than the other two scaled cases. We think this is because a bit more time had to be spent opening and closing connections to the database, instead of pulling them from a cached pool.