Question Does anyone know how to copy a full https:// website?

dekw04

Commendable
Oct 6, 2017
Hi everyone.

Does anyone know how to copy an https:// website AND/OR a website that requires you to log in, for which you already have login credentials (i.e. username and password)?

I've been able to successfully copy a full website before using the program HTTrack Website Copier, but I don't think the website I copied was an https:// website nor one that required login credentials.

I tried copying an https:// website using HTTrack Website Copier, but it didn't work for me.

I want to make an archive of a website I subscribed to for offline use.

While I was logged into my account, I tried copying this https:// website using HTTrack Website Copier. The program made a copy of the pages, but the copied pages looked as if I had never even logged in. In other words, they just showed the login screen.

I really believe there must be a way to copy this website with a program, since I was able to make a manual copy of it (sort of) by saving each individual page. Saving each page individually, however, takes a massive amount of time and effort. I really want to automate this process with a program.

If it's possible to make a manual copy, then it's more than likely possible to make one with a website-copying program.

Anyway, with that said, does anyone know how to copy an https:// website AND/OR a website that requires you to log in, for which you already have login credentials (i.e. username and password)?

Any solution to this question would be appreciated.

Thank you.
 

Quarkzquarkz

Distinguished
Sep 18, 2013
It's impossible to copy just the website. You have to understand that login credentials are handled via server requests, and they are usually stored in a database such as MySQL, MongoDB, or some other form of database. And that's not to mention the middleware connecting the two, written in PHP, Python, NodeJS, etc...

This means that even though you have a local copy of the website, the actual login part needs access to the database, and unless you are a full-stack developer with a private copy of the database, or are simulating one locally, you cannot access it.
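
To see what that means from the client side: a login is nothing more than an HTTP POST that the server checks against its database before handing back a session cookie. A rough sketch with wget (example.com and the form field names are placeholders; every site names them differently):

    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data "username=YOU&password=SECRET" \
         https://example.com/login

All your side ever receives back is that cookie. The credential check itself runs on the server, which is exactly why a mirrored copy of the pages can't log anybody in.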
 
If you just think about it: most of the modern web is not made of static HTML files. You can't copy your bank's website, nor can you copy Facebook or Google. So, whatever tool you use, you'll end up with a snapshot of the website at the moment you grabbed it, but nothing else.
 

dekw04

Commendable
Oct 6, 2017
If you just think about it: most of the modern web is not made of static HTML files. You can't copy your bank's website, nor can you copy Facebook or Google. So, whatever tool you use, you'll end up with a snapshot of the website at the moment you grabbed it, but nothing else.

A snapshot of the website at the moment I grabbed it is good enough for me. I just want to archive the information really. I don't really need the website to function like a normal website would.
 

dekw04

Commendable
Oct 6, 2017
It's impossible to copy just the website. You have to understand that login credentials are handled via server requests, and they are usually stored in a database such as MySQL, MongoDB, or some other form of database. And that's not to mention the middleware connecting the two, written in PHP, Python, NodeJS, etc...

This means that even though you have a local copy of the website, the actual login part needs access to the database, and unless you are a full-stack developer with a private copy of the database, or are simulating one locally, you cannot access it.

I think I understand, but seeing as I can save each individual page myself, is there a way to instruct a program to save each page on my behalf?

I can literally just go through each page and save it by right-clicking and choosing "Webpage, Complete". I just have to do this while I'm logged in, and the saved page then looks exactly as it does when I'm logged in.

Is there really no way to get a computer program to do this for me while I'm logged in? This seems like something a program should be able to do.
 

USAFRet

Titan
Moderator
No, and especially no for a site you have to log into.

For instance, here at Tom's: your login grants you access to certain items. My login, as a Moderator, grants me access to other things.
The backend database and the associated code handle what you can and cannot see.
The content you're looking at right now is not ALL of the content that is linked from this page.

Unless some application can pose as you...it cannot know what it should get.

When you click on something and it shows you a new page, that particular content has been granted to you by the DB and the code.
Long gone are the days of static HTML pages.
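
For what it's worth, "pose as you" in practice usually means presenting your session cookie. If you copy the cookie for the site out of your browser (the cookie name and value below are invented for the example), a downloader can send it along and the server will answer with your logged-in view:

    wget --header "Cookie: sessionid=PASTE_YOUR_COOKIE_VALUE_HERE" \
         https://example.com/members/some-page.html

Even then it only sees pages it can reach by following links, and only until the server expires the session.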
 
Download the Windows binaries of GNU Wget.
You'll want version 1.20, as of this writing.

Learn to use it.

If the Windows port is faithful to the Linux utility that I use daily, then you can do what you want to do. I know for certain that it is possible to pass login credentials to the server; I just don't remember whether that is limited to FTP or will also accommodate HTTPS, since I used it in that specific context years ago and haven't needed to since.
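
If memory serves, current wget accepts --user and --password for both FTP and HTTP(S), though those only cover HTTP basic authentication; a site with a login form needs the cookie approach instead. A sketch of a full mirror run, assuming you've already logged in and saved the session cookie to cookies.txt as shown earlier in the thread (the URL is a placeholder):

    wget --load-cookies cookies.txt \
         --mirror --convert-links --page-requisites --adjust-extension \
         --no-parent --wait=1 \
         https://example.com/members/

--mirror recurses through the site, --convert-links rewrites the links so the copy browses offline, --page-requisites pulls in the images and stylesheets each page needs, --adjust-extension saves pages with an .html suffix, and --wait=1 keeps you from hammering the server.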